
DDR4/LPDDR4: A Practical Design Methodology for High-Speed Memory Systems

Stephen Slater

WW Business Development Manager

High Speed Design

Keysight EEsof EDA Division

April 12th, 2015


Agenda

– Context
– Introduction to DDR4
– New Challenges
– New Thinking
– Exploring a Practical Design Methodology for High-Speed Memory
– What’s New and What’s Coming in Advanced Design System (ADS)

Signal Integrity with ADS, EMPro, and SystemVue

Integrated design flow with best-in-class, measurement-hardened technologies, tuned to the needs of the high-speed digital engineer.

[Figure: design flow linking SystemVue (SVU), ADS and PCB Layout]

Copyright 2014 Keysight Technologies Inc.


Introduction: DDR4 memory ramping up in electronic systems

• Server technology with DDR4 RDIMM/LRDIMM
• Performance computing with Corsair DDR4 DIMMs
• Samsung Galaxy S6 with LPDDR4

Image sources: Google Data Center, AnandTech.com, theverge.com

DDR4 Memory in the News

Extreme Pressure on Interface Speeds

[Figure: per-pin data rate (Mb/s, log scale from 100 to 12,800) versus year (2004 to 2017) for DDR2, DDR3, DDR4, DDR5, LPDDR1/2/3/4/5, GDDR2/3/4/5 and Flash. Annotation: 2x nm chipsets force migration to DDR3/4.]

1/28/2015

DDR4 Highlights

Specification                 DDR2          DDR3           DDR4
Voltage                       1.8 V         1.5 / 1.35 V   1.2 V
Per-pin data rate (Mb/s)      400-1066      800-2133       1600-3200
Channel bandwidth (GB/s)      3.2-8.5       6.4-17         12.8-25.6
Component density             512 Mb-2 Gb   1-8 Gb         2-16 Gb
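The bandwidth row follows directly from the per-pin data rate if we assume a 64-bit (8-byte) data channel without ECC bits; the bus width is an assumption, since the table does not state it. A minimal sketch that reproduces the row to within rounding:

```python
# Sketch: peak channel bandwidth from the per-pin data rate.
# Assumption: a 64-bit (8-byte) data bus, no ECC bits.
BUS_WIDTH_BITS = 64


def channel_bandwidth_gb_per_s(data_rate_mtps: float,
                               bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth in GB/s for a data rate in MT/s (Mb/s per pin)."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mtps * bytes_per_transfer / 1000  # MB/s -> GB/s


for name, lo, hi in [("DDR2", 400, 1066), ("DDR3", 800, 2133), ("DDR4", 1600, 3200)]:
    print(f"{name}: {channel_bandwidth_gb_per_s(lo):.1f}-"
          f"{channel_bandwidth_gb_per_s(hi):.1f} GB/s")
```

The DDR3 upper value comes out as 17.1 GB/s; the table rounds it to 17.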

• Lower VDD and pseudo-open-drain (POD) I/O reduce power consumption by 40%

• VREF training is performed inside the IC receiver to optimize the VREF level, with retraining at regular intervals

• Data lines are calibrated at the IC to reduce their skew relative to the strobe

• Data bus inversion (DBI)

[Figure: DDR3 push-pull versus DDR4 pseudo-open-drain I/O. Image source: Micron Technology]
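DBI pairs naturally with POD: a POD line draws termination current when it is driven low, so the transmitter inverts any byte that would put too many zeros on the bus and flags the inversion on DBI_n. The sketch below follows our reading of the DDR4 write-DBI rule (invert when more than four of the eight bits are 0, DBI_n active low); treat the threshold as an assumption and check JESD79-4 for the exact conditions. The function names are hypothetical.

```python
def dbi_encode(byte: int) -> tuple[int, int]:
    """Return (encoded_byte, dbi_n) for one 8-bit byte lane.

    Assumed rule: if more than four bits are 0, invert the byte and drive
    dbi_n low (0). Otherwise send it unchanged with dbi_n high (1).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, 0
    return byte, 1


def dbi_decode(encoded: int, dbi_n: int) -> int:
    """Undo the inversion at the receiver when dbi_n is low."""
    return (~encoded) & 0xFF if dbi_n == 0 else encoded


print(dbi_encode(0x07))  # 0b00000111 has five zeros, so it is inverted
```

After encoding, no byte lane ever drives more than four lines low, which caps the POD termination current per lane.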


DDR4 High Speed => Less Timing Margin

Image Source: Altera

Shrinking Eye due to Package, PCB and Connectors

Challenges in DDR4 Design

• Higher data rates mean a shorter unit interval (UI) and smaller margins
• VDDQ is reduced to meet the power consumption specification
• Timing margin is eroded by ISI and random jitter (RJ)
• Adding a safety margin leads to over-engineered solutions
• Solution: JEDEC introduces a bit error rate (BER) Rx mask test in the DDR4/LPDDR4 specifications

This requires new specifications beyond the traditional electrical and timing ones.

Example: [LP]DDR4 Rx Input Masks

LPDDR4 receiver requirements are defined by masks instead of setup/hold times and DC voltage swings, similar to USB3, SATA and PCI-E Gen 2:

• Simpler definition of DRAM requirements and system design
• More compatible with LPDDR4 training procedures
• Eliminates troublesome slew rate derating
• Bit error rate (BER) spec recovers timing and noise margin

What is a Mask?

[Figure: voltage versus time around Vref, referenced to the CK/DQS crosspoint and the ideal sample position. The mask marks the region where the DRAM receiver is most likely to sample the input signal; its width is set by DRAM internal jitter and its height by DRAM internal noise, at BER 1e-16.]

BER = probability that the DRAM will sample outside the mask region.

DRAM internal jitter:
• Latch timing
• Clock receiver hysteresis
• DRAM internal skews
• Internal noise converted to jitter

DRAM internal noise:
• DAC quantization error
• Comparator hysteresis
• Comparator offset error
• Internal noise and crosstalk


A Big Business Issue: The Cost of Server Downtime

Infonetics Research prepared a market survey in 2015 covering 205 medium and large businesses in North America and found that companies are losing as much as $100 million per year to downtime related to information and communication technology (ICT), with a median loss of $4M per year, or 0.5% of revenue. Infonetics’ survey, The Cost of Server, Application, and Network Downtime, explores the frequency, length, cost and causes of ICT downtime, including downtime related to the network, security, servers, applications and devices.

• On average, 2 outages per month and 4 system degradations
• Each event lasts 6 hours on average
• Top strategies to combat this: speed of diagnosis and additional redundancy in systems

Source: Infonetics.com

The Prevention: Optimize the design for BER mask margin

How long will a SPICE simulation of 1e16 bits take? …

Completely out of the question. 1e16 is 10 quadrillion bits, equivalent to 1.25 petabytes.

What else can we do?

1. Dual-Dirac Extrapolation
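The scale of the problem is simple arithmetic. The sketch below converts 1e16 bits to bytes and estimates run time; the simulation throughput is a hypothetical, deliberately optimistic figure, not a measured SPICE speed.

```python
BITS = 10**16

bytes_total = BITS / 8
petabytes = bytes_total / 1e15  # decimal petabytes

# Hypothetical throughput: assume a transient simulator could process one
# million bits per second of wall-clock time (optimistic, for illustration).
ASSUMED_BITS_PER_SECOND = 1e6
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = BITS / ASSUMED_BITS_PER_SECOND / SECONDS_PER_YEAR

print(f"{petabytes:.2f} PB; about {years:.0f} years at "
      f"{ASSUMED_BITS_PER_SECOND:.0e} bit/s")
```

Even at this optimistic rate the run takes centuries, which is why the slides turn to extrapolation and statistical methods.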

1. Dual-Dirac Extrapolation: Measuring the BER Mask for System Compliance with a Scope

• About 1e6 bits are accumulated by the scope at the DRAM pin
• One Vcent is used for all DQs and one for all CAs
• The mask center time is calculated separately for each signal
• The eye is extrapolated with the dual-Dirac model to BER 1e-16
• The extrapolated eye must not touch the mask
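The dual-Dirac model splits jitter into a bounded deterministic part, DJ(dd), and a Gaussian random part with rms value RJ, giving TJ(BER) = DJ(dd) + 2·Q(BER)·RJ. A minimal sketch in the time axis only, assuming a transition density of 1 (scope software often includes a transition-density term) and hypothetical jitter values; the compliance flow described above extrapolates in voltage as well.

```python
from statistics import NormalDist


def q_factor(ber: float) -> float:
    """Q such that a single Gaussian tail beyond Q sigma has probability `ber`.

    Assumes a transition density of 1.
    """
    return -NormalDist().inv_cdf(ber)


def dual_dirac_tj(dj_dd: float, rj_rms: float, ber: float) -> float:
    """Total jitter at `ber`: TJ = DJ(dd) + 2 * Q(BER) * RJ_rms."""
    return dj_dd + 2.0 * q_factor(ber) * rj_rms


def extrapolated_eye_width(ui: float, dj_dd: float, rj_rms: float, ber: float) -> float:
    """Horizontal eye opening left after subtracting TJ from one UI."""
    return ui - dual_dirac_tj(dj_dd, rj_rms, ber)


# Hypothetical numbers: DDR4-2400 UI, 50 ps DJ(dd), 2 ps RJ rms.
ui_ps = 1e6 / 2400
for ber in (1e-6, 1e-12, 1e-16):
    width = extrapolated_eye_width(ui_ps, 50.0, 2.0, ber)
    print(f"BER {ber:.0e}: Q = {q_factor(ber):.3f}, eye width = {width:.1f} ps")
```

Only the Gaussian RJ term grows as the target BER falls, which is why measuring about 1e6 bits can still support an estimate at 1e-16, provided the RJ/DJ separation is trustworthy.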

1. Dual-Dirac Extrapolation

Requires at least 1e6 bits to make a reasonable extrapolation with high confidence.

2. Worst-Case Bit Pattern

2. Worst-case Bit Pattern

The major contributors to eye degradation are:
- ISI (inter-symbol interference) in the channel
- Frequency-dependent loss
- Impedance mismatch (discontinuities) causing reflections
- Random jitter (e.g. thermal noise in ICs)
- Deterministic jitter (e.g. crosstalk)

So how many bits do we need to simulate to be sure our eye diagram includes the worst-case bit sequence?

[Figure: eye diagrams with no ISI and with jitter caused by ISI, from a channel simulation with impedance discontinuities (both series and stub resonances)]

Worst-case Bit Pattern

The answer to the question is related to the impulse response of our channel.

This server memory design has 3 slots per channel with 2 ranks of DRAM, all positions occupied and terminated with on-die termination (ODT). The total channel length is about 5 inches.

At DDR3-800 (800 MT/s), there are 2 bits in flight throughout the channel. By the time the 3rd bit enters the channel, the energy from the 1st bit has left the channel. Therefore there are 2^2 = 4 possible bit combinations that could produce the worst-case pattern.

At DDR4-2400 (2400 MT/s), there are 6 bits in flight throughout the channel. By the time the 7th bit enters the channel, the energy from the 1st bit has left the channel. There are 2^6 possible bit patterns that could result in the worst-case eye. That’s not bad: only 64 possible bit patterns.
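The bit counts above come from comparing the length of the channel’s impulse response with the unit interval. The slides do not state the response length; the sketch assumes 2.5 ns only because that value reproduces both the 2-bit (DDR3-800) and 6-bit (DDR4-2400) figures. Exact rational arithmetic avoids float rounding pushing an exact 6.0 up to 7.

```python
import math
from fractions import Fraction

# Assumed impulse-response duration (not stated on the slides): 2.5 ns.
RESPONSE_PS = Fraction(2500)


def unit_interval_ps(rate_mtps: int) -> Fraction:
    """UI in picoseconds for a data rate in MT/s, as an exact fraction."""
    return Fraction(10**6, rate_mtps)


def bits_in_flight(rate_mtps: int, response_ps: Fraction = RESPONSE_PS) -> int:
    """Number of UIs spanned by the channel response, rounded up."""
    return math.ceil(response_ps / unit_interval_ps(rate_mtps))


for rate in (800, 2400):
    n = bits_in_flight(rate)
    print(f"{rate} MT/s: UI = {float(unit_interval_ps(rate)):.1f} ps, "
          f"{n} bits in flight, {2**n} patterns")
```

Because the physical response length stays fixed while the UI shrinks, tripling the data rate triples the number of bits in flight.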

What about Crosstalk?

[Figure: responses of the two closest crosstalk channels]

By the time the 10th bit enters the channel, the energy from the 1st bit has left the channel, so there are 2^9 = 512 possible bit patterns per aggressor that could result in the worst-case eye.

We should at least consider 4 crosstalk aggressors (2 on either side), leaving 2^(6 + Nxtlk·9) = 2^42 ≈ 4.4e12 bit patterns!
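The pattern count grows exponentially with the number of aggressors, since each one multiplies the total by 2^9 = 512. A short sketch of the formula above, assuming (as the slide does) that each aggressor’s bits are independent of the victim’s:

```python
VICTIM_BITS = 6   # bits in flight on the victim at DDR4-2400 (from the slides)
XTALK_BITS = 9    # bits spanned by each crosstalk response (from the slides)


def pattern_count(n_aggressors: int) -> int:
    """Distinct bit patterns that can influence one victim sample."""
    return 2 ** (VICTIM_BITS + n_aggressors * XTALK_BITS)


for n in range(5):
    print(n, f"{pattern_count(n):.2e}")
```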

Can this be right? Let’s check the single-bit response:

We can clearly see the effects of crosstalk and ringing in the channel when it is driven with 3 ideal 2133 MHz sources.

[Figure: single-bit responses showing crosstalk (Xtalk) and reflections plus crosstalk]

2. Worst-Case Bit Pattern

If you can’t be sure you’ve captured the worst-case bit pattern, the whole exercise is worthless, and it still doesn’t include the statistical spread of Rj.

3. New Thinking from the world of SERDES


Introducing: The rigorous approach of the Statistical Eye technique

Ultra-low BER contours in seconds, not days.

Step 1: Run a short transient simulation to get the impulse response of the channel, Tx and Rx.

Step 2: Construct the eye metrics from the impulse response and the stochastic properties of a conceptually infinite, non-repeating bit pattern.

This inherently captures the worst-case eye at the desired BER.
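The core idea of Step 2 can be sketched in a few lines: instead of simulating bits, treat each ISI cursor of the sampled pulse response as contributing either nothing or its own value with probability 1/2, and convolve those two-point distributions into the probability distribution of the received level. This is a generic illustration of statistical-eye analysis, not the DDR Bus Simulator’s algorithm. Assumptions: unipolar 0/1 data, independent equiprobable bits, integer-mV cursors, and no jitter or noise.

```python
from collections import defaultdict


def isi_pmf(main_mv: int, isi_mv: list[int]) -> dict[int, float]:
    """PMF of the sampled voltage (mV) when a '1' is transmitted.

    Each ISI cursor adds 0 mV (neighbouring bit = 0) or its own value
    (neighbouring bit = 1) with probability 1/2, independently, so the
    PMF is the convolution of one two-point PMF per cursor.
    """
    pmf = {main_mv: 1.0}
    for c in isi_mv:
        nxt: defaultdict[int, float] = defaultdict(float)
        for v, p in pmf.items():
            nxt[v] += 0.5 * p       # neighbouring bit is 0
            nxt[v + c] += 0.5 * p   # neighbouring bit is 1
        pmf = dict(nxt)
    return pmf


def inner_edge_mv(pmf: dict[int, float], ber: float) -> int:
    """Lowest level whose cumulative probability exceeds the target BER."""
    cum = 0.0
    for v in sorted(pmf):
        cum += pmf[v]
        if cum > ber:
            return v
    return max(pmf)


# Hypothetical pulse response, one sample per UI, in mV.
pmf = isi_pmf(600, [120, -40, 30, -15, 10, -5])
print(len(pmf), min(pmf), inner_edge_mv(pmf, 1e-16))
```

The work scales with the number of distinct voltage levels rather than with 2^n patterns. With only six cursors every pattern is far more likely than 1e-16, so the contour equals the worst case; the statistics start to matter once random jitter and noise distributions are convolved in as well.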

DDR Bus Simulator in ADS 2015.01

• Rigorous statistical calculations for DQ and DQS eye probabilities at arbitrarily low BER
• Eliminates design uncertainty; includes the statistical spread of Rj
• Checks eye contours at the target BER (1e-16) against the DDR4 Rx mask
• Accounts for crosstalk between signal lines

Comprehensive Results in Seconds!

Built-in DQ and DQS Driver Models

• Driver de-emphasis models
• Account for asymmetry between rising and falling edges
• Physical jitter model: transitions in the stimulus are shifted by Tx jitter
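A "physical" jitter model moves the transitions themselves in time rather than adding noise to the voltage. A minimal sketch of that idea with Gaussian random jitter only; the function name and parameters are hypothetical and do not describe the ADS implementation.

```python
import random


def jittered_edges(bits, ui_ps, rj_rms_ps, seed=0):
    """Edge times for an NRZ bit sequence with Gaussian random jitter.

    Only transitions produce edges: each ideal edge at k * UI is shifted by
    an independent Gaussian draw. Returns (time_ps, new_level) tuples.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    edges = []
    for k in range(1, len(bits)):
        if bits[k] != bits[k - 1]:
            edges.append((k * ui_ps + rng.gauss(0.0, rj_rms_ps), bits[k]))
    return edges


edges = jittered_edges([0, 1, 1, 0, 1, 0, 0, 1], ui_ps=416.7, rj_rms_ps=2.0, seed=42)
for t, level in edges:
    print(f"{t:8.2f} ps -> {level}")
```

Because runs of identical bits produce no edges, the jitter seen at the receiver depends on the data pattern as well as on the jitter amplitude.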

Built-in DQ and DQS Receiver Models

• Continuous-time linear equalizer (CTLE) model
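A CTLE is commonly described by a transfer function with one zero and two poles; placing the zero below the first pole boosts high frequencies relative to DC and offsets channel loss. The sketch uses that textbook form with hypothetical settings; the parameters of the built-in ADS receiver model may differ.

```python
import math


def ctle_response(f_hz, dc_gain, f_zero, f_pole1, f_pole2):
    """Complex gain of a one-zero, two-pole CTLE at frequency f_hz.

    H(s) = dc_gain * (1 + s/wz) / ((1 + s/wp1) * (1 + s/wp2)), s = j*2*pi*f.
    """
    s = 1j * 2 * math.pi * f_hz
    wz, wp1, wp2 = (2 * math.pi * f for f in (f_zero, f_pole1, f_pole2))
    return dc_gain * (1 + s / wz) / ((1 + s / wp1) * (1 + s / wp2))


def gain_db(h):
    return 20 * math.log10(abs(h))


# Hypothetical settings: -6 dB DC gain, zero at 300 MHz, poles at 2.4 and 6 GHz.
for f in (1e6, 1.2e9, 10e9):
    h = ctle_response(f, 0.5, 300e6, 2.4e9, 6e9)
    print(f"{f / 1e9:6.3f} GHz: {gain_db(h):6.2f} dB")
```

1.2 GHz is the Nyquist frequency of DDR4-2400, which is where the peaking is most useful.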

Using Other Device Models

• Supports IBIS, netlist, and Verilog-A models for driver and receiver
• Asymmetric rising and falling edges in these models are also captured
• Allows mix-and-match between built-in, IBIS, circuit and Verilog-A models

[Figure: example setups combining an IBIS Tx driver, a netlist Rx and an IBIS Rx, with the built-in driver used as an ideal pattern generator]

BER Mask Margin Measurements

• Comprehensive margin measurements versus the DDR4 Rx mask
• Timing and voltage margins between the mask and the contour at the target BER are reported for each mask corner

• Minimum margin metric, e.g. if ringback is present
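The per-corner margins can be sketched as follows for a rectangular mask centered at (t_center, Vcent); in DDR4 the mask width and height correspond to the TdiVW and VdiVW parameters. This sketch reports voltage margins only and assumes the BER contour is given as sampled (t, v_high, v_low) points; timing margins would be found analogously by intersecting the contour with the mask’s top and bottom edges. The helper names are hypothetical.

```python
def _interp(points, t):
    """Linear interpolation of (t, value) pairs sorted by t."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError(f"t={t} is outside the contour")


def mask_corner_margins(contour, t_center, vcent, mask_width, mask_height):
    """Voltage margin at each corner of a rectangular Rx mask.

    contour: (t, v_high, v_low) tuples tracing the inner eye edge at the
    target BER, sorted by time. A negative margin means the contour cuts
    into the mask at that corner.
    """
    upper = [(t, hi) for t, hi, _ in contour]
    lower = [(t, lo) for t, _, lo in contour]
    top, bottom = vcent + mask_height / 2, vcent - mask_height / 2
    margins = {}
    for side, t in (("left", t_center - mask_width / 2),
                    ("right", t_center + mask_width / 2)):
        margins[f"top_{side}"] = _interp(upper, t) - top
        margins[f"bottom_{side}"] = bottom - _interp(lower, t)
    margins["minimum"] = min(margins.values())
    return margins


# Hypothetical diamond-shaped contour (ps, mV) and a 100 ps x 100 mV mask.
eye = [(0, 700, 500), (100, 800, 400), (200, 700, 500)]
print(mask_corner_margins(eye, t_center=100, vcent=600, mask_width=100, mask_height=100))
```

Reporting the minimum over all corners gives a single pass/fail figure, while the individual corners show whether the eye is closing from the top (e.g. ringback) or from the bottom.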

DDR Bus Simulator is Ideal for Design Space Exploration

Parameter Sweep Using Batch Simulation


Batch Simulation Export

• Supports sweeps on design variables, data files (e.g. Touchstone files) and corners
• Automatically generates a spreadsheet summary for external design-of-experiments tools
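The sweep-and-export pattern is easy to illustrate outside the tool: enumerate every combination of design variables and corners, run one evaluation per combination, and write one spreadsheet row per run. In the sketch below `eye_height_mv` is a hypothetical toy formula standing in for a simulator run, and an in-memory buffer stands in for the exported file.

```python
import csv
import io
import itertools


def eye_height_mv(odt_ohm, trace_z_ohm, corner):
    """Hypothetical stand-in for one simulator run (toy formula only)."""
    derate = {"slow": 0.9, "typ": 1.0, "fast": 1.05}[corner]
    mismatch = abs(odt_ohm - trace_z_ohm) / trace_z_ohm
    return round(200.0 * derate * (1.0 - mismatch), 1)


sweep = {
    "odt_ohm": [34, 40, 48, 60],
    "trace_z_ohm": [40, 50],
    "corner": ["slow", "typ", "fast"],
}

buf = io.StringIO()  # stands in for the exported spreadsheet file
writer = csv.writer(buf)
writer.writerow([*sweep, "eye_height_mv"])
rows = []
for combo in itertools.product(*sweep.values()):
    params = dict(zip(sweep, combo))
    height = eye_height_mv(**params)
    rows.append((params, height))
    writer.writerow([*combo, height])

best = max(rows, key=lambda r: r[1])
print(f"{len(rows)} runs; best: {best}")
```

A flat table with one row per run is what external design-of-experiments tools typically expect, and the same table can be used to pick, for example, the ODT setting with the most open eye.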


Step 1. Begin from a known starting point

• Example of an RDIMM topology with 3 slots per channel, from a reference design provided by an IC vendor
• Apply the new DDR Bus Simulation (a DDR-focused statistical channel simulation) to evaluate the design’s baseline performance
• Simulate the S-parameters and TDR of the channel

DDR4 KEF

DesignCon 2015

4/22/2015

Step 2. Evaluate individual components from chosen vendors

• Use the SnP utility to quickly import files for simulation. Evaluate connectors using S-parameter simulation to view loss, pin-to-pin isolation, and post-processed impulse responses
• Evaluate DIMMs and package effects using channel simulation, looking at the eye for each component under investigation
• Compare the results with those provided by the reference design

Step 3. Evaluate PCB Stack-Up Technology

• Use the Controlled-Impedance Line Designer (CILD) to first understand the reference design stackup.
• Use the built-in analysis tools to sweep, optimize or run statistics on impedance (single-ended and differential) versus a multitude of variables.
• The new line types that are created can then be placed into a system simulation, with the transmitter and receivers in place, to optimize system metrics such as BER margin.

[Figure callout: the new LineType definition has parameterized values available on the schematic]

Controlled Impedance Line Designer

- Microstrip Single-Ended
- Microstrip Edge-Coupled
- Microstrip Broadside-Coupled
- Stripline Single-Ended
- Stripline Edge-Coupled
- Stripline Broadside-Coupled
- Coplanar Waveguide Single-Ended
- Coplanar Waveguide Edge-Coupled

• CILD used to optimize the tracks to 40 Ohm @2400 MHz
• ISI due to impedance mismatch is reduced
• The BER contour @1e-16 is now more open and passes the mask test on the worst DQ line
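For a quick sanity check on a 40 Ohm target before opening a field solver, a closed-form microstrip approximation can be inverted for trace width. This is a swapped-in, much cruder technique than whatever CILD computes internally: the IPC-2141-style formula Z0 = 87/sqrt(er + 1.41) · ln(5.98h / (0.8w + t)) for a surface microstrip, which is only rough outside roughly 0.1 < w/h < 2. The dimensions below are hypothetical.

```python
import math


def microstrip_z0(w, h, t, er):
    """Approximate single-ended surface microstrip impedance in ohms.

    w: trace width, h: dielectric height, t: trace thickness (same unit).
    """
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h / (0.8 * w + t))


def width_for_z0(z0, h, t, er):
    """Invert the same formula for the trace width that gives `z0`."""
    return (5.98 * h * math.exp(-z0 * math.sqrt(er + 1.41) / 87.0) - t) / 0.8


# Hypothetical stackup in mils: 3 mil dielectric, 1.4 mil copper, er = 4.3.
w40 = width_for_z0(40.0, h=3.0, t=1.4, er=4.3)
print(f"w = {w40:.2f} mil gives {microstrip_z0(w40, 3.0, 1.4, 4.3):.1f} ohm")
```

Wider traces lower the impedance, so hitting 40 Ohm on a given stackup usually means a wider trace or a thinner dielectric than a 50 Ohm design.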

Step 5. Finalize Pre-Layout Design and Design Constraints

• By using batch simulation, Monte Carlo or DOE, multiple variables can be swept to understand the sensitivity to small design variations.
• The best on-die termination (ODT) settings can be selected from the sweep values that result in the most open eyes.
• The ODT is swept via the IBIS model name parameter.
• Design constraints are written for the layout engineer.

• Design of experiments (DOE) provides insight into the sensitivity of a system metric to small changes in design variables.
• ParamSweep/BatchSim/MonteCarlo runs can be handled efficiently as parallel processes on Linux platforms, using the built-in Simulation Manager.
• Results can be collated into a single dataset for analysis.
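One common way to extract sensitivity from such a dataset is the main-effect calculation of a two-level full-factorial design: code each variable as -1 (low) or +1 (high), run every combination, and take mean(metric at +1) minus mean(metric at -1). The slides do not say which DOE design the flow uses; the sketch below shows this classic form, with a hypothetical toy response standing in for simulated eye height.

```python
from itertools import product
from statistics import mean


def main_effects(factors, response):
    """Main effect of each factor in a two-level full-factorial design.

    factors: names of design variables, each coded -1 (low) or +1 (high).
    response: callable taking {name: -1 or +1} and returning the metric.
    """
    runs = [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]
    results = [(run, response(run)) for run in runs]
    return {
        f: mean(y for run, y in results if run[f] == 1)
        - mean(y for run, y in results if run[f] == -1)
        for f in factors
    }


def toy_eye_height(x):
    """Hypothetical response surface (mV), including one interaction term."""
    return 150 + 12 * x["odt"] - 5 * x["trace_width"] + 2 * x["odt"] * x["via_stub"]


print(main_effects(["odt", "trace_width", "via_stub"], toy_eye_height))
```

Note that the odt × via_stub interaction averages out of both main effects, so a factor with no main effect can still matter in combination; a full-factorial design lets you estimate such interactions too.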

Step 6. Layout and Post-Layout EM simulation

• The board layout is performed in an enterprise PCB tool and then imported into ADS. The critical nets are cookie-cut, ready for analysis with the Momentum 3D-Planar EM simulator.
• The EM model of the channel is placed into the original system simulation, so the effects of imperfect current return paths, via coupling and via stubs are all taken into account.
• Using the DDR Bus Simulator to calculate BER contours verifies the robustness of the design.

[Figure: layout import from Zuken CR-8000/5000, Cadence Allegro PCB/PD/SiP, and ODB++ (Mentor or any other CAE tool)]

Step 7. Power Integrity

• The ground and power planes of the board can be simulated with an EM analysis for power integrity. Here we look for areas of high current density at the switching frequencies of the Tx/Rx and the VRM.
• Decoupling capacitors can be placed and, after EM simulation, their values can be optimized to dampen resonances below the target impedance profile for the power plane.
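Two textbook relations sit behind this step: the PDN target impedance, Z_target = V_supply × allowed ripple / transient current, and the impedance of a decoupling capacitor modelled as a series R-L-C, whose minimum (equal to its ESR) occurs at the self-resonant frequency 1/(2π√(LC)). A minimal sketch; the ripple budget, transient current, ESL and ESR are hypothetical, and the capacitances are in the range used in this deck’s decap study.

```python
import math


def target_impedance(v_supply, ripple_fraction, i_transient):
    """PDN target impedance: allowed ripple voltage / transient current."""
    return v_supply * ripple_fraction / i_transient


def capacitor_impedance(f_hz, c, esl, esr):
    """|Z| of a decoupling capacitor modelled as series R-L-C."""
    w = 2 * math.pi * f_hz
    return abs(complex(esr, w * esl - 1 / (w * c)))


def self_resonant_frequency(c, esl):
    return 1 / (2 * math.pi * math.sqrt(esl * c))


# Hypothetical budget: 1.2 V VDDQ, 5 % ripple, 2 A transient current.
z_t = target_impedance(1.2, 0.05, 2.0)
ESL, ESR = 0.5e-9, 0.01  # hypothetical mounted ESL (H) and ESR (ohm)
for c in (0.01e-6, 0.1e-6, 1.0e-6, 2.2e-6):
    srf = self_resonant_frequency(c, ESL)
    print(f"{c * 1e6:5.2f} uF: SRF {srf / 1e6:7.1f} MHz, "
          f"|Z| at SRF {capacitor_impedance(srf, c, ESL, ESR):.3f} ohm "
          f"(target {z_t:.3f} ohm)")
```

Each capacitor value is effective only around its own resonance, which is why mixing several values is used to keep the plane impedance below the target across a wide band.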

DDR4-2400 in Write Mode: DeCap Optimization SI/PI Analysis

IBIS v5.0 power-aware simulation of synchronous switching noise (SSN)

[Figure: Xilinx FPGA (MCH) with Xilinx UltraScale PODL12 driver (IBIS v5.0 power-aware), Micron DDR package, and Micron 256Mx16 FBGA [MT40A256M16HA] (IBIS v5.0 power-aware) on an 8-layer PCB, with decap optimization to reduce SSN]

[Figure: DDR4-2400 in write mode, decap optimization SI/PI analysis. Results with DeCap = 10 pF versus multiple decaps (0.1~0.01 uF, 1.0 uF, 2.2 uF), with markers at 1.2 GHz and 2.4 GHz]

Step 8. Compliance Test for Final Design Sign-Off

• Before committing the board to prototype, a final test can be performed by simulating the transient waveforms for the DQ, CA and CTL lines.
• The waveforms are then loaded into the Infiniium Offline software, where the DDR4 Compliance Application is launched to perform compliance tests.
• This is the same compliance test that is used with the final board under test on the bench, so there is no argument as to whether the simulation compliance test has missed anything critical.

[Figure: simulated waveforms saved as .h5 files for Infiniium Offline, alongside the real test bench with an Infiniium oscilloscope]

DDR4-2400 DeCap Analysis: Compliance Test Results

[Figure: DeCap = 10 pF versus multiple decaps (0.1~0.01 uF, 1.0 uF, 2.2 uF), highlighting the waveform region where the test fails compliance]

Danger of Disparate Compliance Tests… 1) from EDAco …and 2) from ScopeCo

[Figure: design and simulate, then prototype in the lab; a bridging script generates the report]

Leverage Keysight’s Unique Position with EDA+Instruments

– Keysight’s unique design verification methodology

A Practical Design Methodology for DDR4

Technical Joint-Paper Presentation: Xilinx & Keysight Technologies

P. Niu, F. Rao, G. Otonari, J. Wang, N. Kamdar, Y. Wang


New! Design Flow Settings

– HSD “personality”
– Rerun the wizard any time you like (ADS 2015.01 Main Window menu bar: “Help”)
– Nothing is “taken away”
– Sets suggested defaults, which can be overridden locally or globally

One-time setup, remembered across versions

New in ADS 2015.01: Favorite Palettes

The Design Flow Settings wizard suggests defaults that you can easily tweak.

Usability: HSD Toolbar in ADS 2015.01

Quick access to popular HSD capabilities. Hover over an icon for a tool tip.

[Figure: toolbar buttons marked “New in ADS 2015.01” and “Enhanced in ADS 2015.01”]

Insert SnP Component

Port names in the file are displayed.

Ease of Learning: 52 One-page Tips

http://signal-integrity.blogs.keysight.com/2014/signal-integrity-qa-collection-using-keysight-ads/

Page 63: DDR4/LPDDR4: A Practical Design Methodology · The Solution: Optimize the design for BER margin Completely out of the question. 1e16 is 10 quadrillion bits, equivalent to 125,000

Page Copyright 2014 Keysight Technologies Inc.

Matching Simulation and Measurements to 40GHz

64

[Figure: T-line model, EM model, and measurement compared for a 1.5-inch, 10 mil wide trace with a 342 mil stub on the CMP28 Starter Kit]
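A stub such as the 342 mil one in this test structure behaves as a quarter-wave resonator, which shows up as insertion-loss notches near f_n = (2n − 1)·c / (4·L·√ε_eff). The sketch below estimates those notch frequencies up to 40 GHz. The effective dielectric constant of 3.5 is an assumption for illustration, not a CMP28 value from the slide, and the formula ignores via and pad parasitics, so the results are only a rough guide to where the model/measurement comparison gets demanding.

```python
import math

C0 = 299_792_458.0   # speed of light in vacuum, m/s
MIL_TO_M = 25.4e-6   # 1 mil = 0.001 inch = 25.4 micrometres


def stub_notches_hz(stub_mils: float, eps_eff: float, f_max_hz: float) -> list[float]:
    """Quarter-wave stub resonances (2n-1)*c / (4*L*sqrt(eps_eff)) up to f_max_hz."""
    length_m = stub_mils * MIL_TO_M
    f1 = C0 / (4.0 * length_m * math.sqrt(eps_eff))  # fundamental quarter-wave notch
    notches = []
    n = 1
    while (2 * n - 1) * f1 <= f_max_hz:  # only odd multiples resonate
        notches.append((2 * n - 1) * f1)
        n += 1
    return notches


if __name__ == "__main__":
    # 342 mil stub from the slide; eps_eff = 3.5 is an assumed value
    for f in stub_notches_hz(342, 3.5, 40e9):
        print(f"{f / 1e9:.2f} GHz")
```

Under these assumptions the first notch lands near 4.6 GHz, with odd harmonics repeating below 40 GHz.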


Large Boards and Packages Inside ADS Layout

– In previous releases, we had to cookie-cut the board outside of ADS Layout:

• Menu picks in Allegro (ADFI flow)

• W2324 High-capacity Layout Pre-processor (aka ACS Netex, ODB++ flow)

– New in ADS 2015.01! Complete workflow for EM modeling

• Import large ODB++ boards

• Pan and zoom quickly

• Net-based connectivity and navigation

• Enhancements to ADS cookie cutter

• EM setup for EM-based models


New in ADS 2015.01

Example: Xilinx Virtex-7 evaluation board


Usability: Layer-Specific and Net-Specific EM settings

– In previous releases, trading off accuracy against simulation time required manually adjusting the EM settings.

– New in ADS 2015.01! Complete workflow for EM modeling

• Global selection of mesh and model detail

• Ability to select model detail to be used for each Net (Net-Specific)

• Ability to select the meshing options for each Net

• Ability to select mesh and model detail options by layer (Layer-Specific)
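Layer-specific and net-specific options amount to a layered override scheme on top of global defaults. The sketch below is purely hypothetical: the field names, the example layer and net names, and the precedence order (global, then layer, then net) are assumptions for illustration, not the ADS settings model or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EmSettings:
    """Hypothetical EM knobs for illustration; not the ADS API."""
    mesh_cells_per_wavelength: int = 20
    model_detail: str = "standard"   # e.g. "coarse", "standard", "fine"
    edge_mesh: bool = False


def resolve_settings(net: str, layer: str, global_defaults: EmSettings,
                     layer_overrides: dict[str, dict],
                     net_overrides: dict[str, dict]) -> EmSettings:
    """Apply overrides in an assumed precedence order: global < layer < net."""
    settings = replace(global_defaults, **layer_overrides.get(layer, {}))
    return replace(settings, **net_overrides.get(net, {}))


if __name__ == "__main__":
    defaults = EmSettings()
    by_layer = {"L3_signal": {"mesh_cells_per_wavelength": 30}}    # hypothetical layer
    by_net = {"DQ0": {"model_detail": "fine", "edge_mesh": True}}  # hypothetical net
    print(resolve_settings("DQ0", "L3_signal", defaults, by_layer, by_net))
    print(resolve_settings("VDD", "L2_power", defaults, by_layer, by_net))
```

Putting the net override last means a critical net (for example a DQ line) can get fine detail even on a layer that is meshed coarsely to save simulation time.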


PAM4 - PHY Simulation and Analysis

System-Level Analysis and Model Development in SystemVue

– System Level PHY simulation

– Confirm TX and RX architecture

– Includes Optical Link Capability

– Direct Creation of IBIS-AMI Models

– FlexDCA for Measurements


PAM4 - Channel Simulation and Analysis

Channel Simulation in ADS

– IBIS-AMI models of TX and RX

– Uses the class-leading ADS Channel Simulator

– Fast and accurate insights

– 1e6 symbols in minutes, not hours

– File-based link to FlexDCA
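PAM4 sends two bits per symbol on four amplitude levels, so 1e6 PAM4 symbols carry 2e6 bits. A minimal sketch of the common Gray-coded bit-to-level mapping follows; the slides do not say which mapping the SystemVue or ADS models use, so the level assignment here is an assumption.

```python
import math

# Gray-coded PAM4: adjacent amplitude levels differ in exactly one bit, so the
# most likely slicer error (to a neighbouring level) costs only one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}


def pam4_encode(bits: list[int]) -> list[int]:
    """Map bit pairs (MSB first) to PAM4 levels in {-3, -1, +1, +3}."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def pam4_decode(levels: list[int]) -> list[int]:
    """Inverse mapping for ideal (noise-free) levels."""
    inverse = {level: pair for pair, level in GRAY_PAM4.items()}
    return [bit for level in levels for bit in inverse[level]]


if __name__ == "__main__":
    bits = [0, 0, 0, 1, 1, 1, 1, 0]
    symbols = pam4_encode(bits)
    print(symbols)
    print(pam4_decode(symbols) == bits)
    # Three eyes share the full swing, so each is 1/3 the height of an NRZ eye.
    print(f"eye-height penalty ~ {20 * math.log10(3):.2f} dB")
```

The roughly 9.5 dB eye-height penalty relative to NRZ is one reason the slide asks whether an existing NRZ link will still work with PAM4.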


Electrical/Optical/Electrical - PHY Simulation and Analysis

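Optical Links through IBIS-AMI and Channel Simulation in ADS

– Use IBIS-AMI modeling for optical links

– Allows a single analysis of the complete E-O-E (electrical-optical-electrical) PHY

– Detailed optical modeling for:

• VCSEL driver

• VCSEL (vertical-cavity surface-emitting laser)

• MMF (multimode fiber)

• PIN (photodiode)

• TIA (transimpedance amplifier)

– “Will my existing NRZ optical link work with PAM4?”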


Device Non-Linearity


EMI/EMC with FD-TD

Rack-mount chassis with 4-layer PCB

– EMI calculation: predict radiated emission

– EMC characterization: frequency dependence of the noise received on the power plane

[Figure: differential pair; E-field plots at 2.235 GHz and 5.68 GHz]

“Will my chassis meet specs?”


DDR4 Workflow Summary (rows: spec; columns: task)

Electrical and timing via waveforms

– Design space exploration: Not recommended because of speed issues

– Pre-fabrication sign-off compliance: W2351 DDR4 Compliance Test Bench, and Infiniium Offline with compliance app

– Post-fabrication (hardware prototype) compliance: Infiniium oscilloscope and compliance app

Electrical and timing via statistical eye

– Design space exploration: W2309 ADS DDR Bus Simulator

– Pre-fabrication sign-off compliance: W2309 ADS DDR Bus Simulator

– Post-fabrication (hardware prototype) compliance: Not applicable

Electrical and timing via dual Dirac extrapolation on waveforms

– Design space exploration: Not recommended because of speed/accuracy issues

– Pre-fabrication sign-off compliance: Not recommended because of speed/accuracy issues

– Post-fabrication (hardware prototype) compliance: Infiniium oscilloscope and compliance app

Slide annotation: “New BER Contour spec…”
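The workflow summary lists dual Dirac extrapolation on waveforms. That method models deterministic jitter (DJ) as two Dirac impulses and random jitter (RJ) as a Gaussian, then extrapolates total jitter to a target BER as TJ(BER) = DJ(δδ) + 2·Q(BER)·RJ_rms. The sketch below only illustrates that arithmetic (the slide itself does not recommend the method for pre-fabrication sign-off); the jitter values and the transition density of 1 are assumptions chosen for illustration.

```python
from statistics import NormalDist


def q_factor(ber: float, transition_density: float = 1.0) -> float:
    """Gaussian tail multiplier Q such that transition_density * P(X > Q) = ber."""
    if not 0.0 < ber < transition_density <= 1.0:
        raise ValueError("need 0 < ber < transition_density <= 1")
    return -NormalDist().inv_cdf(ber / transition_density)


def dual_dirac_tj(dj_dd: float, rj_rms: float, ber: float,
                  transition_density: float = 1.0) -> float:
    """Total jitter TJ(BER) = DJ(dd) + 2 * Q(BER) * RJ_rms, in the inputs' time units."""
    return dj_dd + 2.0 * q_factor(ber, transition_density) * rj_rms


if __name__ == "__main__":
    ui_ps = 1e12 / 3.2e9        # DDR4-3200: 3200 MT/s gives a 312.5 ps unit interval
    dj_ps, rj_ps = 40.0, 2.5    # hypothetical jitter components
    for ber in (1e-12, 1e-16):
        tj = dual_dirac_tj(dj_ps, rj_ps, ber)
        print(f"BER {ber:.0e}: TJ = {tj:.1f} ps, eye width = {ui_ps - tj:.1f} ps")
```

Because Q grows from about 7.0 at 1e-12 to about 8.2 at 1e-16, the extrapolated TJ depends heavily on how well the Gaussian RJ fit holds far into the tails, which is where speed and accuracy concerns about this approach arise.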


Appendix


Power-aware IBIS Models with Pre-driver Current

– New in ADS 2015.01: Pre-standard extension to account for pre-driver current draw

– So-called “over-clocking problem”

– Pre-driver current draw occurs before the bit period begins

– Overlapping composite current ramps in each unit interval


Proposed to IBIS Open Forum

[Figure: pre-driver and driver current waveforms]
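The two key points above, that pre-driver current starts before the bit period and that current ramps overlap within each unit interval, can be illustrated by superposition. This is a minimal sketch under stated assumptions: a triangular current pulse per bit, a fixed lead time before each bit boundary, and normalized units. It is not the syntax proposed to the IBIS Open Forum or the ADS model, only an illustration of why the composite current in one UI depends on neighboring bits.

```python
def ramp_pulse(t: float, start: float, rise: float, fall: float, peak: float) -> float:
    """Assumed triangular current pulse: 0 -> peak over `rise`, back to 0 over `fall`."""
    if t < start or t > start + rise + fall:
        return 0.0
    if t <= start + rise:
        return peak * (t - start) / rise
    return peak * (start + rise + fall - t) / fall


def composite_predriver_current(t: float, ui: float, n_bits: int, lead: float,
                                rise: float, fall: float, peak: float) -> float:
    """Sum one pulse per bit; bit k's pulse starts `lead` before its boundary k*ui."""
    return sum(ramp_pulse(t, k * ui - lead, rise, fall, peak) for k in range(n_bits))


if __name__ == "__main__":
    # Normalized units: ui = 1.0; each pulse lasts 1.2 UI, so neighbours overlap.
    for i in range(-2, 9):
        t = i * 0.25
        i_t = composite_predriver_current(t, 1.0, 4, 0.3, 0.6, 0.6, 1.0)
        print(f"t = {t:+.2f} UI  I = {i_t:.3f}")
```

With these assumed values, current is already flowing at t < 0 (before the first bit period), and between bit boundaries the tail of one pulse adds to the leading ramp of the next.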


Technology Barriers for Memory (Keysight Education Forum, 1/28/2015)

[Figure: per-pin data rate (Gbits/sec per pin, 0.01 to 10000) versus number of I/O pins (8 to 8192), with channel-throughput contours from 32 GBytes/sec to 1 TB; regions labeled Wide-IO, Serial, and Current Technology; a Memory Wall; year markers from ’02 to ’15; and barriers labeled Materials and Process Physics and Signal Propagation Physics.]
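The throughput contours in this figure follow from simple arithmetic: channel throughput in GBytes/sec is the per-pin rate in Gbits/sec times the number of I/O pins, divided by 8 bits per byte. A minimal sketch; the example operating points are illustrative, not values read from the figure.

```python
def channel_throughput_gbytes(gbits_per_pin: float, io_pins: int) -> float:
    """Aggregate channel throughput in GB/s = (Gb/s per pin * pins) / 8 bits per byte."""
    return gbits_per_pin * io_pins / 8.0


if __name__ == "__main__":
    # Illustrative operating points (not taken from the figure)
    print(channel_throughput_gbytes(3.2, 64))    # 64-bit channel at DDR4-3200 rates
    print(channel_throughput_gbytes(1.0, 512))   # wide, slower interface
    print(channel_throughput_gbytes(3.2, 2500))  # pins needed to reach ~1 TB/s at 3.2 Gb/s
```

For example, reaching 1 TB/s (taken as 1000 GB/s) at 3.2 Gb/s per pin would take 2500 pins, which illustrates why the figure frames the problem as a trade-off between pin count and per-pin speed.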


Tough Sledding to Achieve 3D Integration (Keysight Education Forum, 1/28/2015)

[Figure: memory interfaces plotted by year (’10 to ’18) against an “Impediments to 3D Silicon” barrier: LPDDR2, LPDDR3, LPDDR4, LPDDR5?, LPDDR6?, Wide-IO, Wide-IO 2, and HMC.]


We Need to Think Differently All Around (Keysight Education Forum, 1/28/2015)

[Figure: speed × interconnect complexity versus power × cost budget for Graphics, Computer, and Embedded / Mobile / Flash systems. Successive thresholds (Signal Integrity, Jitter & Noise, Impulse Response) separate four regimes: “Low Speed”, “High Speed”, “Serial Speed”, and “Hyper Speed”. Annotations: pushbutton place and route; still a traditional “digital” system; attention to physical design; datacom signaling (jitter); scopes, BERTs, protocol; eye now invisible; new conceptual model needed.]