CACTI-IO: CACTI With Off-Chip Power-Area-Timing Models
Norman P. Jouppi¥, Andrew B. Kahng†‡, Naveen Muralimanohar¥, Vaishnav Srinivas†
November 6th, 2012
ECE† and CSE‡ Departments, University of California, San Diego
Hewlett-Packard Laboratories¥, Palo Alto
Agenda
• Introduction
• Need for off-chip power-area-timing models
• CACTI-IO models
• Case studies using CACTI-IO:
  • High-capacity DDR3 configurations
  • 3-D stacking
  • LPDDRx for servers
• Summary
Memory Subsystem Performance
• Latency/access times: the Memory Wall
  • Modern architectures try to hide the latency impact
• Capacity: need for large server main memory
• Bandwidth: the Memory Bandwidth Limit
  • Latency-hiding techniques do not help
  • The off-chip interface limits bandwidth
Source: Rogers et al., "Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling"
Memory Subsystem Power
• Memory subsystem power is a significant portion of system power
  • DRAM, buffers, caches, interconnect/IO/PHY
• Off-chip IO power is a key component
Source: Economou et al., "Full-System Power Analysis and Modeling for Server Environments"
Off-chip Performance
• Memory bandwidth is limited by the off-chip interface:
  • Source-synchronous signaling
  • Signal and power integrity: ISI, crosstalk, supply noise
  • Pin count
Off-chip Power
• Off-chip power is a significant portion of the memory subsystem power:
  • Higher off-chip capacitances and voltages
  • Terminations and Vref-biased receivers
  • Clocking elements
Off-chip PAT Models for Architects
• Off-chip models for full-system simulators
  • Simulators today do not account for IO/PHY power
  • Accurate off-chip power and performance numbers
  • Co-optimize off-chip and on-chip power/performance
  • Explore new off-chip topologies and technologies
[Figure: block diagram linking the Full System Simulator, Off-Chip Power/Area/Timing Models, and On-Chip Power/Area/Timing Models to Accurate Off-chip Power/Performance and an Optimal On-chip and Off-chip Configuration]
CACTI-IO
• CACTI is well known among memory architects
• CACTI-IO adds off-chip PAT models
• The CACTI-IO config file includes off-chip parameters
• A CACTI-IO tech report is available
# Memory State (R=Read, W=Write, I=Idle or S=Sleep)
//-iostate "R"
-iostate "W"
//-iostate "I"
//-iostate "S"
# Is ECC Enabled (Y=Yes, N=No)
-dram_ecc "N"
#Address bus timing
//-addr_timing 0.5 //DDR, for LPDDR2 and LPDDR3
-addr_timing 1.0 //SDR for DDR3, Wide-IO
//-addr_timing 2.0 //2T timing
//-addr_timing 3.0 //3T timing
# Bandwidth (Gbytes per second, this is the effective bandwidth)
-bus_bw 12.8 GBps
# Memory Density (Gbit per memory/DRAM die)
-mem_density 2 Gb
# IO frequency (MHz) (frequency of the external memory interface).
-bus_freq 800 MHz
# Duty Cycle (fraction of time in the Memory State defined above)
-duty_cycle 1.0
# Activity factor for Data (0->1 transitions) per cycle (for DDR, need to account for the higher activity in this parameter. E.g. max. activity factor for DDR is 1.0, for SDR is 0.5)
-activity_dq 1.0
# Activity factor for Control/Address (0->1 transitions) per cycle (for DDR, need to account for the higher activity in this parameter. E.g. max. activity factor for DDR is 1.0, for SDR is 0.5)
-activity_ca 0
# Number of DQ pins
-num_dq 1
# Number of DQS pins
-num_dqs 0 //8 differential pairs
# Number of CA pins
-num_ca 0
# Number of CLK pins
-num_clk 2 //1 differential pair
# Number of Physical Ranks
-num_mem_dq 2 //Number of ranks (loads on DQ and DQS) per DIMM or buffer chip
# Width of the Memory Data Bus
-mem_data_width 1 //x4 or x8 or x16 or x32 memories
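A minimal Python sketch of reading a fragment like the configuration above, assuming only the conventions visible in it: `#` starts a comment line, a leading `//` disables an alternative setting, a later `//` starts a trailing comment, and each active line has the form `-key value [unit]`. The function name `parse_io_config` is hypothetical; this is not the parser CACTI-IO itself uses.

```python
import shlex


def parse_io_config(text: str) -> dict[str, list[str]]:
    """Parse '-key value [unit]' lines into {key: [value, unit, ...]}.

    Assumed syntax (a sketch, not CACTI-IO's own parser):
      '#' at line start       -> comment line
      '//' at line start      -> disabled alternative, skipped
      '//' later in the line  -> trailing comment, dropped
    """
    params: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        line = line.split("//", 1)[0].strip()  # drop a trailing comment
        if not line.startswith("-"):
            continue
        tokens = shlex.split(line)  # strips quotes from values such as "W"
        params[tokens[0].lstrip("-")] = tokens[1:]
    return params


sample = '''
# Memory State (R=Read, W=Write, I=Idle or S=Sleep)
//-iostate "R"
-iostate "W"
-bus_freq 800 MHz
-num_clk 2 //1 differential pair
'''
print(parse_io_config(sample))
```

Keeping the unit as a separate token (for example `["800", "MHz"]`) leaves unit conversion to the caller rather than guessing it in the parser.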
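Dynamic Power
• Dynamic power (switching lumped capacitances):
$$P_{dyn} = N_{pins} D_c \alpha \Big(\sum_i C_i V_{SW_i}\Big) V_{dd} f$$
• Interconnect power:
$$P_{int} = N_{pins} D_c \alpha E_{int} f$$
$$E_{int} = \begin{cases} t_L V_{SW} V_{dd} / Z_0 & \text{if } 2t_L \le t_b \\ t_b V_{SW} V_{dd} / Z_0 & \text{if } 2t_L > t_b \end{cases}$$
where $N_{pins}$ is the number of signal pins, $D_c$ the duty cycle, $\alpha$ the activity factor, $C_i$ and $V_{SW_i}$ the lumped capacitances and their voltage swings, $f$ the IO frequency, $t_L$ the line flight time, $t_b$ the bit period, and $Z_0$ the characteristic impedance of the line.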
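A Python sketch of the dynamic and interconnect power expressions on this slide, read as P_dyn = N_pins·D_c·α·(Σ_i C_i·V_SW,i)·V_dd·f, P_int = N_pins·D_c·α·E_int·f, and E_int = t_L·V_SW·V_dd/Z_0 when 2·t_L ≤ t_b, else t_b·V_SW·V_dd/Z_0. All inputs are SI units; the example values (2 pF load, 1.5 V, 200 ps flight time, 50 Ω line) are hypothetical, chosen only to exercise the formulas.

```python
def dynamic_power(n_pins, duty_cycle, activity, caps, swings, vdd, freq):
    """P_dyn = N_pins * D_c * alpha * sum_i(C_i * V_SW_i) * V_dd * f."""
    if len(caps) != len(swings):
        raise ValueError("caps and swings must have the same length")
    switched = sum(c * v for c, v in zip(caps, swings))  # sum_i C_i * V_SW_i
    return n_pins * duty_cycle * activity * switched * vdd * freq


def interconnect_energy(t_line, t_bit, v_sw, vdd, z0):
    """E_int per transition, using the piecewise rule on 2*t_L vs. t_b."""
    t = t_line if 2 * t_line <= t_bit else t_bit
    return t * v_sw * vdd / z0


def interconnect_power(n_pins, duty_cycle, activity, e_int, freq):
    """P_int = N_pins * D_c * alpha * E_int * f."""
    return n_pins * duty_cycle * activity * e_int * freq


# Hypothetical single DQ lane: 800 MHz IO clock, so a 625 ps bit time at DDR.
# activity = 1.0 is the DDR maximum noted in the config-file comments.
p_dyn = dynamic_power(1, 1.0, 1.0, [2e-12], [1.5], 1.5, 800e6)
e_int = interconnect_energy(200e-12, 625e-12, 1.5, 1.5, 50.0)
p_int = interconnect_power(1, 1.0, 1.0, e_int, 800e6)
print(f"P_dyn = {p_dyn * 1e3:.2f} mW, P_int = {p_int * 1e3:.2f} mW")
```

Here 2·t_L = 400 ps is below the 625 ps bit time, so the first branch of E_int applies; lengthening the line past half a bit time switches to the second branch.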
Termination Power
• DQ:
  • Multiple ranks
  • Several termination types
  • READ and WRITE
  • Assumes 50% 0s and 50% 1s
  • Includes both Rx and Tx
• CA:
  • Fly-by topology
  • VDD/2 termination
PHY Power
• Reference generators
• Vref-biased receivers
• Clock distribution
• DLL/PLL
• Phase rotators
Performance: Eye Compliance
• Timing budget: Tx, channel, and Rx (setup/hold)
• Voltage budget: Tx (VOL/VOH), channel, Rx (VIL/VIH)
Channel Jitter
• Design of experiments (DOE) over topology parameters
• RON, RTT, and CDRAM are among the key parameters
• Linear interpolation of the Taguchi-array results
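The slide states only that channel jitter comes from a DOE over topology parameters with linear interpolation of a Taguchi array. A Python sketch of one common way to use such results, assuming an additive main-effects model: each factor's average jitter is tabulated at its DOE levels, and the jitter at an arbitrary point is the overall DOE mean plus, for each factor, its linearly interpolated deviation from that mean. The factor names and all table values (in ps) are made up for illustration.

```python
from bisect import bisect_right


def interp(levels, values, x):
    """Piecewise-linear interpolation over ascending levels, clamped at the ends."""
    if x <= levels[0]:
        return values[0]
    if x >= levels[-1]:
        return values[-1]
    k = bisect_right(levels, x) - 1
    x0, x1, y0, y1 = levels[k], levels[k + 1], values[k], values[k + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def channel_jitter(main_effects, mean_jitter, point):
    """Overall mean plus the sum of each factor's interpolated main effect minus the mean."""
    total = mean_jitter
    for name, (levels, avg_jitter) in main_effects.items():
        total += interp(levels, avg_jitter, point[name]) - mean_jitter
    return total


# Hypothetical DOE summary: factor -> (tested levels, mean jitter in ps at each level).
effects = {
    "r_on": ([34.0, 40.0, 48.0], [31.0, 33.0, 35.0]),   # driver impedance, ohms
    "r_tt": ([40.0, 60.0, 120.0], [32.0, 33.0, 34.0]),  # termination, ohms
}
print(channel_jitter(effects, 33.0, {"r_on": 37.0, "r_tt": 90.0}))
```

The additive form ignores interactions between factors; that is the usual trade-off when a small orthogonal array stands in for a full factorial sweep.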
Timing Budget
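$$T_{jitter} = \sum_i DJ_i + \sqrt{\sum_i RJ_i^2}$$
$$T_{jitter}(F = F_0) = T_{jitter,avg}$$
For parameter values $F_i \neq F_0$, $T_{jitter}$ is obtained from $T_{jitter,avg}$ by linear interpolation over the DOE results.
$$\frac{T_{ck}}{4} - T_{DS} - T_{error} - T_{jitter,setup} - T_{skew,setup} \geq 0$$
$$\frac{T_{ck}}{4} - T_{DH} - T_{error} - T_{jitter,hold} - T_{skew,hold} \geq 0$$
where $DJ_i$ and $RJ_i$ are the deterministic and random jitter components, $T_{ck}$ is the clock period, and $T_{DS}$ and $T_{DH}$ are the data setup and hold times.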
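A Python sketch of the timing budget, assuming it reads as: total jitter = Σ DJ_i + √(Σ RJ_i²), and the setup (hold) slack is T_ck/4 − T_DS (T_DH) − T_error − T_jitter − T_skew, which must be non-negative. The numbers below are hypothetical and in picoseconds; using one jitter value for both setup and hold is a simplification for the example.

```python
import math


def total_jitter(dj, rj):
    """Deterministic jitter adds linearly; random jitter adds root-sum-square."""
    return sum(dj) + math.sqrt(sum(r * r for r in rj))


def timing_margins(t_ck, t_ds, t_dh, t_error, jitter_setup, jitter_hold, skew_setup, skew_hold):
    """Setup and hold slack against T_ck/4; a negative value fails the budget."""
    setup = t_ck / 4 - t_ds - t_error - jitter_setup - skew_setup
    hold = t_ck / 4 - t_dh - t_error - jitter_hold - skew_hold
    return setup, hold


# Hypothetical values in ps for an 800 MHz clock (T_ck = 1250 ps).
jitter = total_jitter(dj=[20.0, 15.0], rj=[3.0, 4.0])  # 35 + 5 = 40 ps
setup, hold = timing_margins(1250.0, t_ds=100.0, t_dh=120.0, t_error=50.0,
                             jitter_setup=jitter, jitter_hold=jitter,
                             skew_setup=30.0, skew_hold=30.0)
print(f"setup slack {setup:.1f} ps, hold slack {hold:.1f} ps")
```

The root-sum-square treatment reflects the usual assumption that random jitter sources are independent, so they rarely peak together.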
Voltage Budget
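$$V_N = K_N V_{SW} + V_{NI}$$
$$K_N = K_{xtalk} + K_{ISI} + K_{SSO} \quad \text{(from DOE)}$$
$$V_M = \frac{V_{SW}}{2} - V_N$$
$$V_M \geq \left|V_{IH/IL} - V_{ref}\right|$$
where $V_{SW}$ is the signal swing, $V_N$ the total noise (a part proportional to swing from crosstalk, ISI, and simultaneous switching outputs, plus a swing-independent part $V_{NI}$), and $V_M$ the voltage margin.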
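A Python sketch of the voltage budget, assuming it reads as V_N = K_N·V_SW + V_NI with K_N = K_xtalk + K_ISI + K_SSO, margin V_M = V_SW/2 − V_N, and a pass condition V_M ≥ |V_IH/IL − V_ref|. The K coefficients would come from the DOE; the values here are hypothetical, in volts.

```python
def noise_voltage(v_sw, k_xtalk, k_isi, k_sso, v_ni):
    """V_N = K_N * V_SW + V_NI, with K_N = K_xtalk + K_ISI + K_SSO."""
    k_n = k_xtalk + k_isi + k_sso
    return k_n * v_sw + v_ni


def voltage_margin(v_sw, v_n):
    """V_M = V_SW / 2 - V_N."""
    return v_sw / 2 - v_n


def passes_voltage_budget(v_m, v_ih_or_il, v_ref):
    """The margin must cover the receiver threshold offset |V_IH/IL - V_ref|."""
    return v_m >= abs(v_ih_or_il - v_ref)


# Hypothetical: 0.6 V swing, V_ref = 0.75 V, V_IH = 0.85 V.
v_n = noise_voltage(v_sw=0.6, k_xtalk=0.05, k_isi=0.10, k_sso=0.05, v_ni=0.03)
v_m = voltage_margin(0.6, v_n)
print(f"V_N = {v_n:.3f} V, V_M = {v_m:.3f} V, pass = {passes_voltage_budget(v_m, 0.85, 0.75)}")
```

Because most noise terms scale with V_SW, shrinking the swing shrinks V_N too; only V_NI does not, which is what eventually limits how low the swing can go.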
Area
$$Area_{IO} = N_{IO} A_0 + \frac{N_{IO}\, k_0}{\min(R_{ON}, 2R_{TT})} + \frac{N_{IO}}{R_{ON}}\left(k_1 f + k_2 f^2 + k_3 f^3\right)$$
• Driver area depends on RON and RTT
• Predriver stages fan out to the driver
• Fixed area for ESD and control logic
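A Python sketch of the IO area model, assuming it reads as Area_IO = N_IO·A_0 + N_IO·k_0/min(R_ON, 2·R_TT) + (N_IO/R_ON)·(k_1·f + k_2·f² + k_3·f³), matching the three bullets: a fixed per-IO area, a driver term set by R_ON and R_TT, and a frequency-dependent predriver term. All coefficients below are hypothetical; area units follow whatever units A_0 and the k_i carry.

```python
def io_area(n_io, a0, k0, r_on, r_tt, f, k1, k2, k3):
    """Area_IO = N_IO*A_0 + N_IO*k_0/min(R_ON, 2*R_TT) + (N_IO/R_ON)*(k1*f + k2*f^2 + k3*f^3)."""
    fixed = a0                                            # ESD and controls, per IO
    driver = k0 / min(r_on, 2 * r_tt)                     # lower resistance -> larger driver
    predriver = (k1 * f + k2 * f**2 + k3 * f**3) / r_on   # predriver stages fanning out to the driver
    return n_io * (fixed + driver + predriver)


# Hypothetical coefficients: two IOs, R_ON = 34 ohm, R_TT = 60 ohm, f = 0.8 GHz.
print(io_area(n_io=2, a0=1000.0, k0=34000.0, r_on=34.0, r_tt=60.0, f=0.8,
              k1=340.0, k2=0.0, k3=0.0))
```

With R_TT = 60 Ω, 2·R_TT = 120 Ω exceeds R_ON, so R_ON sizes the driver in this example.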
Validation
• CACTI-IO models account for off-chip power, area and timing
• Validated against SPICE
  • Within 15% error across all simulations
  • Lookup tables validated by construction
Power for LPDDR2 DQ Single-Lane
[Figure panel: Total IO Power]
Power for DDR3 DQ Single-Lane
[Figure panels: Termination Power; Total IO Power]
Case Studies Using CACTI-IO
• We present three case studies:
  • High-capacity DDR3 configurations
  • 3-D configurations
  • BOOM (Buffered Output On Module): LPDDRx for servers
• We compare the configurations for:
  • Capacity
  • Bandwidth
  • IO power efficiency
• The BOOM case study also covers IO+DRAM power
Case Study 1: High-capacity DDR3
• RDIMM, LRDIMM, BoB (Buffer on Board)
  • BoB uses a serial bus to the host
• LRDIMM offers the highest capacity
• BoB offers the best bandwidth and power efficiency per GB of capacity
Case Study 2: 3-D Stacking
• TSS-based
• Peak bandwidth of 176 GB/s for Micron's Hybrid Memory Cube (HMC)
• Power efficiency varies by around 2X
Source: Micron
BOOM: LPDDRx for servers
• BOOM (Buffered Output On Module) architecture from Hewlett-Packard:
  • Buffer chip on the board
  • LPDDRx memories (lower speed, lower power)
  • Wider bus from the buffer to the DRAMs
• Achieves better power efficiency by using LPDDRx memories
• Still meets performance targets by using the buffer
BOOM Topology
Case Study 3: BOOM
• 50% increase in IO efficiency with LPDDRx
• No terminations needed with wider, slower buses
• A serial bus from the buffer to the host offers further savings
BOOM: IO+DRAM Power
• IO power is a significant portion of the combined (DRAM+IO) power: 50-60%
• IO idle power is a very significant contributor
  • LPDDR2 unterminated signaling reduces idle power
• BOOM-N4-L-400 with a serial bus to the host provides 3.4X energy savings (DRAM+IO) over BOOM-N2-D-800
• Combining IO and DRAM power enables the correct optimizations
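Optimizing Fanout
• IO power vs. number of ranks, with capacity and bandwidth held constant
• Slower, wider buses give lower IO power
• Die area and clock distribution grow as the bus gets wider, so 200-400 MHz appears to be a sweet spot
$$BW = 2 f W_B$$
$$N_R = \frac{Capacity}{W_B / W_M}$$
where $W_B$ is the bus width, $W_M$ the data width of each memory, $f$ the bus frequency, $N_R$ the number of ranks, and Capacity is expressed as a number of memory dies.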
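A Python sketch of the fanout trade-off, assuming BW = 2·f·W_B (double data rate) and N_R = Capacity/(W_B/W_M) with Capacity counted in memory dies. Integer arithmetic keeps widths and rank counts exact; the bandwidth, die count, and x16 width below are hypothetical.

```python
def bus_width_and_ranks(bw_mbps: int, f_mhz: int, capacity_dies: int, w_m: int) -> tuple[int, int]:
    """Return (W_B in bits, N_R) from BW = 2*f*W_B and N_R = Capacity / (W_B / W_M)."""
    if bw_mbps % (2 * f_mhz):
        raise ValueError("bandwidth must be a multiple of 2*f for an integer bus width")
    w_b = bw_mbps // (2 * f_mhz)        # double data rate: two bits per pin per clock
    if w_b % w_m:
        raise ValueError("bus width must be a multiple of the memory data width")
    dies_per_rank = w_b // w_m          # memories placed side by side to fill the bus
    if capacity_dies % dies_per_rank:
        raise ValueError("capacity must fill whole ranks")
    return w_b, capacity_dies // dies_per_rank


# Hypothetical: 12.8 GB/s (102,400 Mb/s) and 32 x16 dies at several bus frequencies.
for f in (800, 400, 200):
    w_b, n_r = bus_width_and_ranks(102_400, f, capacity_dies=32, w_m=16)
    print(f"{f} MHz: W_B = {w_b} bits, N_R = {n_r} ranks")
```

Halving f doubles W_B and halves N_R, so each DQ line sees fewer rank loads; this is the slower-and-wider direction the slide favors, bounded by the die area and clock distribution cost of the wider bus.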
Summary
• Introduced CACTI-IO with off-chip models
• CACTI-IO models include:
  • IO/interconnect dynamic and termination power
  • PHY power
  • Voltage/timing budgets for eye compliance
  • IO area
• Three case studies show the capabilities of CACTI-IO:
  • Calculate off-chip power/area/timing
  • Combine on-chip and off-chip power
  • Identify key configuration choices and optimizations
• Ongoing work:
  • Extend the models to other types of off-chip memory and off-chip configurations, including PCRAM
Thank You!