Energy Efficient Computing in Nanoscale CMOS Vivek De Intel Fellow Director of Circuit Technology Research Intel Labs


Post on 04-Oct-2020


Page 1: Energy Efficient Computing in Nanoscale CMOScdworkshop.eit.lth.se/fileadmin/eit/group/71/2016/2016_Lund_Works… · Near Threshold Voltage (NTV) computing 0 0. 2 0. 4 0. 6 0. 8 1

Energy Efficient Computing in Nanoscale CMOS

Vivek De

Intel Fellow

Director of Circuit Technology Research

Intel Labs


Internet of Everything (IoE)

Need end-to-end energy efficiency


More, better transistors

More cores

Continued benefits from Moore's Law

[Figure: Moore's Law scaling, from 45nm (2007) to 14nm Trigate (2014); axis ticks 10^3, 10^5, 10^7, 10^9]


Dynamic platform control

Deliver best user experience under constraints

[Figure: platform block diagram. Blocks: scalable on-die interconnect fabric; graphics, video and special-purpose engines; integrated memory controllers; off-die interconnect; caches; last-level cache]

Dynamic V/F control

Independent V/F control regions

Workload-based core activation & shutdown

Scenario-based power allocation

Maximize performance & efficiency


Near Threshold Voltage (NTV) computing

[Figure: normalized energy/cycle across voltage/frequency operating points for a 32nm CPU process and a 32nm SOC process. Regions: normal operating range, NTV, sub-threshold. Annotations: 400mV, 500mV, 3X, 5X]


NTV IA processor

Technology: 32nm High-K Metal Gate

Interconnect: 1 Poly, 9 Metal (Cu)

Transistors: 6 Million (Core)

Core Area: 2 mm²

[Figure: die photo, 5 mm x 5 mm, showing the IA-32 core, PLL, JTAG and I/O areas. IA-32 core floorplan, 1.1 mm x 1.8 mm: logic, scan, ROM, L1$-I, L1$-D, level shifters + clk spine]


NTV design techniques

Modified Register File Cell (L1$)

[Figure: cell schematic with transistors m1 to m10; signals wrwl, wrwl#, rdwl, wrbl#, rdbl, bit, bitx]

Robust Flop Topologies

Multi-corner design optimizations (SCL)

[Figure: frequency vs. voltage (V), 0.5 to 1.1 V, with optimization corners marked]

Variation-aware design: 2X min Z, 40% lib cells used

[Figure: 2-input NAND gate delay vs. voltage (V), 0.4 to 1 V, showing delay spread due to random variations]

Narrow muxes, no stack height > 2

[Figure: 4:1 mux with "1" and "0" select inputs]

Robust level converters

[Figure: level converter schematics with input, output, vcch and vccl]


NTV IA – powered by solar cell!


Power performance measurements

[Figure: total power (mW, 0 to 800) and frequency (MHz, log scale 1 to 1000) vs. logic Vcc (0.2 to 1.2 V) / memory Vcc (0.55 to 1.2 V); 32nm CMOS, 25°C. Labeled frequencies: 915MHz, 500MHz, 100MHz, 3MHz. Labeled powers: 737mW, 174mW, 17mW, 2mW]
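Energy per cycle follows from the labeled points as total power divided by clock frequency. The sketch below is a rough illustration only: it assumes the four frequency labels and the four power labels pair up in the order they are listed (915MHz with 737mW, and so on). That pairing is an assumption and is not stated explicitly on the slide.

```python
# Assumed pairing of the labeled (frequency in MHz, power in mW) points,
# taken in the order the labels appear on the slide.
points_mhz_mw = [(915, 737), (500, 174), (100, 17), (3, 2)]

def energy_per_cycle_nj(freq_mhz, power_mw):
    """Energy per clock cycle in nJ (mW divided by MHz gives nJ)."""
    return power_mw / freq_mhz

energies = [energy_per_cycle_nj(f, p) for f, p in points_mhz_mw]
for (f, p), e in zip(points_mhz_mw, energies):
    print(f"{f:>4} MHz, {p:>3} mW -> {e:.3f} nJ/cycle")

# Ratio between the highest- and lowest-energy points under this pairing.
ratio = max(energies) / min(energies)
print(f"max/min energy ratio: {ratio:.1f}X")
```

Under this pairing the lowest energy per cycle falls at the 100MHz point, and the ratio to the 915MHz point comes out near 4.7X, which agrees with the 4.7X annotation on the "Minimum energy operation" slide.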


Power components

Vcc-max (Super-Threshold): Logic Vcc 1.2V, Memory Vcc 1.2V

Vcc-opt (Near-Threshold): Logic Vcc 0.45V, Memory Vcc 0.55V

Vcc-min (Sub-Threshold): Logic Vcc 0.28V, Memory Vcc 0.55V

[Figure: pie charts of power components (logic dynamic, logic leakage, memory dynamic, memory leakage) at the three operating points. Percentage labels as extracted: 81%, 11%, 3%, 5%; 53%, 27%, 15%, 5%; 4%, 33%, 62%, 1%]


Minimum energy operation

[Figure: energy/cycle (nJ, 0.0 to 0.9) vs. logic Vcc (0.2 to 1.2 V) / memory Vcc (0.55 to 1.2 V); curves for total energy, leakage energy and dynamic energy; 32nm CMOS, 25°C. Annotation: 4.7X]
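The three curves are related by the usual first-order CMOS energy decomposition. As a sketch, with $C_{\text{eff}}$ (effective switched capacitance), $I_{\text{leak}}$ (leakage current) and $f$ (clock frequency) as symbols introduced here for illustration rather than taken from the slide:

```latex
E_{\text{total}}(V) = E_{\text{dynamic}}(V) + E_{\text{leakage}}(V),
\qquad
E_{\text{dynamic}}(V) \approx C_{\text{eff}}\, V^{2},
\qquad
E_{\text{leakage}}(V) \approx \frac{I_{\text{leak}}(V)\, V}{f(V)}
```

Dynamic energy falls roughly quadratically with voltage, while leakage energy per cycle grows as the clock slows at low voltage, so the total has a minimum between the two regimes.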


NTV and variability

[Figure: energy/cycle (pJ, log scale 100 to 1000) vs. frequency (MHz, 0 to 1000) for fast, medium and slow dies; 32nm CMOS, 25°C. Annotations: 16%, 30%, 22%, 28%, 18%]

Leakage Comparison

Slow: 1.0X

Medium: 2.5X

Fast: 7.5X


Voltage-frequency margins

[Figure: four frequency vs. voltage panels, each marking a V margin and an F margin]

V variation: IR drop, inductive droops, load line variations

T variation: nominal T vs. worst T

Aging: BOL vs. EOL

Path activity: nominal vs. worst; MIS, signal coupling, critical path


Dynamic adaptation & reconfiguration

Adapt & reconfigure for best power-performance

[Figure: control loop. Sensors (aging, thermal, voltage, current) on the processor feed a dynamic control unit, which drives voltage control (change V), frequency control (change F) and configuration control (reconfigure)]
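The loop in the diagram can be pictured as one control step: read the sensors, then decide whether to change V, change F, or reconfigure. The sketch below is purely illustrative. The `SensorReadings` fields, the threshold constants and the `control_step` function are all hypothetical names and values chosen to make the example concrete; none of them come from the slides.

```python
from dataclasses import dataclass

@dataclass
class SensorReadings:
    temperature_c: float       # thermal sensor
    vcc_droop_pct: float       # voltage sensor: droop as % of nominal Vcc
    current_a: float           # current sensor
    aging_slowdown_pct: float  # aging sensor: measured path slowdown

# Hypothetical limits, not taken from the slides.
MAX_TEMP_C = 90.0
MAX_DROOP_PCT = 5.0
MAX_CURRENT_A = 10.0
MAX_AGING_PCT = 3.0

def control_step(s: SensorReadings) -> list[str]:
    """Return the actions a control unit might request for one sample."""
    actions = []
    if s.vcc_droop_pct > MAX_DROOP_PCT:
        actions.append("change F: lower frequency during droop")
    if s.aging_slowdown_pct > MAX_AGING_PCT:
        actions.append("change V: raise voltage to offset aging")
    if s.temperature_c > MAX_TEMP_C or s.current_a > MAX_CURRENT_A:
        actions.append("reconfigure: reduce active resources")
    return actions

print(control_step(SensorReadings(70.0, 8.0, 4.0, 1.0)))
```

Keeping the decision in a pure function of the sensor sample makes the policy easy to test separately from any hardware interface.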


Dynamic V & F adaptation

[Figure: die photo and block diagram. Labeled blocks: TCP/IP processor core, clocking, input buffer, output port, sensors & analog, DAB control, JTAG, noise generators and noise injector, PLL0, PLL1, PLL2, divider (Div), core clk gate, I/O clk, clocking control, PLL command, thermal sensor, droop sensor, PMOS CBG (PMOS body bias), NMOS CBG (NMOS body bias), VRM. Insets: temperature vs. time; Vcc vs. time with 1st, 2nd and 3rd droops; frequencies F0, F1, F2]

• Adapt F/V to V/T change to reduce V/T margin

• Adapt F/V to aging to reduce aging margin

Environment-aware dynamic adaptation

Prototype chip in 90nm

Source: Intel



Resilient platforms

Resiliency for performance, efficiency & reliability

[Figure: resiliency framework across platform layers (circuit & design, microarchitecture, microcode, firmware, VM, OS, programming system, applications). Labels: lower error rate, less recovery overhead, less silicon overhead]

Resilient platform features: system adaptation, system recovery, error correction, fault confinement, fault diagnosis, error detection, system reconfiguration


Resilient & adaptive core

[Figure: core block diagram. Pipeline stages IF, DE, RA, EX, MEM, X, WB with 16KB I$, 16KB D$ and RF; error control unit with error replay to the PC; clock generator (refclk, PLL, duty cycle, ½ FCLK divider) with adaptive clock control. Die photo labels: I/O, DCache, ICache, core, clock, RF, JTAG]

Technology: 45nm CMOS

Die Area: 13.64 mm²

Core Area: 0.39 mm²

Core FMAX: 1.45GHz at 1.0V

Core Power: 135mW at 1.0V


Performance & efficiency gains

[Figure: throughput gain (%, 0 to 25) by application (edgedetect, linkedlist, bubble) for EDS and TRC]

[Figure: total energy (mJ, 0.0 to 1.6) vs. throughput (BIPS, 0.0 to 1.2) for conventional, EDS and TRC at 10% VCC droop. Annotations: 22%, 41%]

[Figure: input image; output image with resiliency ON; output image with resiliency OFF]


Integrated voltage regulators


Fully integrated VR


Energy efficient interconnects

[Figure: energy efficiency (pJ/bit, log scale 1 to 100) vs. channel loss at symbol rate (dB, 0 to 40). Green = Intel (research)]

[Figure: optical I/O link with driver, laser, modulator, fiber, PIN diode, TIA and receiver]


Memory capacity & bandwidth


Neuromorphic computing


End-to-end efficiency for IoE