safely overclocking flash i/o in ssds

17
Safely Overclocking Flash I/O in SSDs Kai Zhao, SanDisk Rino Micheloni, PMC-Sierra Tong Zhang, Rensselaer Polytechnic Institute Flash Memory Summit 2015 Santa Clara, CA 1

Upload: sandisk

Post on 21-Aug-2015

19 views

Category:

Technology


2 download

TRANSCRIPT

Page 1: Safely Overclocking Flash I/O in SSDs

Safely Overclocking Flash I/O in SSDs

Kai Zhao, SanDisk

Rino Micheloni, PMC-Sierra

Tong Zhang, Rensselaer Polytechnic Institute

Flash Memory Summit 2015Santa Clara, CA 1

Page 2: Safely Overclocking Flash I/O in SSDs

Outline

• Motivation – ECC capability may be under utilized

• Benefit – improve system performance, especially for read operations

• Reliability – leverage LDPC codes to mitigate the impact of overlocking

Flash Memory Summit 2015Santa Clara, CA 2

Page 3: Safely Overclocking Flash I/O in SSDs

Page Reliability Variation

• Larger reliability variations for chips even from the same batch

• Raw BER varies dramatically under different P/E cycles and with different retention time

Flash Memory Summit 2015Santa Clara, CA 3

0 5 10 15 20 25 3010

-5

10-4

10-3

10-2

Retention Time (day)

Ra

w B

ER

1000 P/E cycles5000 P/E cycles

C

D

B

A

0 5 10 150.00

0.05

0.10

0.15

Pa

ge

Co

un

t F

req

ue

ncy

Number of page errors

1k P/E cycles, 2 days retention

0 200 4000.00

0.05

0.10

0.15

Point DPoint CPoint B

5k P/E cycles, 30 days retention

Point A

Page 4: Safely Overclocking Flash I/O in SSDs

Data Link Overclocking

• Worst-case based ECC is under utilization in most of the time• We could trade reliability (when raw BER is low) for performance• One possible way – Overclocking on data link

Flash Memory Summit 2015Santa Clara, CA 4

Page 5: Safely Overclocking Flash I/O in SSDs

Potential Benefits

• Total latency = Operation (read/program) latency +

Data transfer latency + ECC decoding latency

• ECC decoding latency is relatively small

• Transfer latency is comparable to read latency

• Smaller transfer latency could significantly improve read performance

Flash Memory Summit 2015Santa Clara, CA 5

𝜏 h𝑝 𝑦−𝑎𝑐𝑐=𝜏𝑜𝑝+𝜏 𝑥𝑓𝑒𝑟+𝜏 𝐸𝐶𝐶

Page 6: Safely Overclocking Flash I/O in SSDs

Practical Considerations

• Characterize errors caused by overclocking

• Errors in the overclocked data link

• Errors in the NAND flash cell array

• Overclocking for read and/or write path

• Evaluate read/write path separately

• Permanent errors on write path

• Adaptation of overclocking in late lifetime of the drive

• Read retry

• Multiple data transfer rate

Flash Memory Summit 2015Santa Clara, CA 6

Page 7: Safely Overclocking Flash I/O in SSDs

Feasibility of Overclocking

• ONFI 2.2 compliant chips supporting data transfer rate up to 166 MBPS according to the datasheet

• Data link overclocked up to 275 MBPS

Flash Memory Summit 2015Santa Clara, CA 7

200 212.5 225 237.5 250 262.5 27510

-6

10-5

10-4

10-3

10-2

Transfer Data Rate (MHz)

RB

ER

Read Overclocking: Plane 0Read Overclocking: Plane 1Write Overclocking: Plane 0Write Overclocking: Plane 1

<

Different overclocking capability for the 2 planes

Page 8: Safely Overclocking Flash I/O in SSDs

System Performance for Read Overclocking

Flash Memory Summit 2015Santa Clara, CA 8

WebSearch prj mds hm Financial0.0

0.5

1.0

1.5

Nor

mal

ized

Rea

d R

espo

nse

Tim

e 166 MBps 200 MBps 250 MBps 275 MBps

WebSearch prj mds hm Financial0.00

0.05

0.10

0.15

0.20

0.25

Rea

d R

espo

nse

Tim

e R

educ

tion 4 kB 8 kB 16 kB

•Read response time, page size is 8 kB

•Data link overclocked up to 275 MBPS

•Comparison of average read response time reduction with respect to page size

•Data transfer rate is 275 MBPS

Simulator: DiskSim with SSD addonTraces: MSR Cambridge trace, UMASS trace repository

Page 9: Safely Overclocking Flash I/O in SSDs

WebSearch prj mds hm Financial0.85

0.90

0.95

1.00

1.05

Nor

mal

ized

Rea

d R

espo

nse

Tim

e

250 MBps Read-Retry, 10% retry probability Read-Retry, 5% retry probability) Multi-Transfer-Rate

Late Lifetime Performance

• Average read response time under different adaption strategies• Read retry vs. multi-transfer-rate for different planes

• Read retry doesn’t sense the flash array, just re-transfer data• Direct lower transfer speed for the worse plane

Flash Memory Summit 2015Santa Clara, CA 9

2500 3000 3500 4000 4500 5000

0.9

0.95

1

1.05

1.1

Number of P/E CyclesN

orm

aliz

ed

Re

ad

Re

spo

nse

Tim

e

WebSearchFinancialhmprj

Read retry only

Page 10: Safely Overclocking Flash I/O in SSDs

Late Lifetime Performance

• Average read response time when combining read retry and multi-transfer-rate

• After 4000 P/E cycles, only lower the transfer rate for Plane 0

Flash Memory Summit 2015Santa Clara, CA 10

2500 3000 3500 4000 4500 5000

0.9

0.95

1

1.05

Number of P/E Cycles

No

rma

lize

d R

ea

d R

esp

on

se T

ime

WebSearchFinancialhmprj

Page 11: Safely Overclocking Flash I/O in SSDs

Link Error Mitigation using LDPC

• Effectiveness of the proposed overclocking design strategy depends on how well the ECC can handle the extra errors

• Leveraging LDPC codes and data link error characteristics to• Improving the tolerance to overclocking induced errors, which can lead to

lower data re-transfer rate

• Reducing the average number of LDPC decoding iterations, which can lead to lower power consumption and higher decoding throughput

• Two design techniques• Overclocking-aware LLR Calculation

• Inter-plane interleaving

Flash Memory Summit 2015Santa Clara, CA 11

Page 12: Safely Overclocking Flash I/O in SSDs

Inter-plane Interleaving

• Each plane has its own page register and I/O circuitry

• Performance after overclocking may be varying

• In our measurement, Plane 0 is much more vulnerable to overclocking-induced I/O errors

Flash Memory Summit 2015Santa Clara, CA 12

0 100 2000.0

0.1

0.2

0.3

Plane 0Plane 1

Pag

e C

ount

Fre

quen

cy

Number of Errors in One Page

Read Overclocking 275 MBps

Page 13: Safely Overclocking Flash I/O in SSDs

• The input of LDPC code decoding is the estimated log-likelihood ratio (LLR)

• Binary Asymmetric Channel (BAC). Different error characteristics for each wire and plane

• In the overclocked SSD read channel, LLR can be expressed as

Overclocking-aware LLR Calculation

Flash Memory Summit 2015Santa Clara, CA 13

LLR (𝑥𝑘)=𝑝 (𝑥𝑘=0∨𝑦 𝑘)𝑝 (𝑥𝑘=1∨𝑦𝑘 )

LLR (𝑥𝑘)=𝛴𝑖=0

𝑙 𝑝𝑚(𝑙 ) ( 𝑦𝑘∨𝑣𝑘=𝑖 )𝑝 (𝑣𝑘=𝑖∨𝑥𝑘=0 , 𝑐 )

𝛴 𝑖=0𝑙 𝑝𝑚

( 𝑙 ) (𝑦 𝑘∨𝑣𝑘=𝑖 )𝑝 (𝑣𝑘=𝑖∨𝑥𝑘=1 , 𝑐 )

Page 14: Safely Overclocking Flash I/O in SSDs

Error Pattern Characterization

• Plane 0 suffers from more overclocking induced errors, moreover, these errors non-uniformly distributed across the 8-bit bus.

• The pattern ‘0’=>’1’ dominates the errors from Plane 0, while on Plane 1, patterns ‘0’=>’1’ and ‘1’=>’0’ evenly distributed.

Flash Memory Summit 2015Santa Clara, CA 14

0

0.2

0.4

0.6

0.8

1

1.2

Rel

ativ

e Fr

eque

ncy Plane 0

'0' => '1'

Plane 0'1' => '0'

Plane 1'1' => '0'

Plane 1'0' => '1'

Error Pattern

Page 15: Safely Overclocking Flash I/O in SSDs

Simulation Results

• LDPC decoding power reduction• The decoding power is determined by the number of iterations• Data re-transfer rate reduction• Data link overclocked to 275 MBPS

Flash Memory Summit 2015Santa Clara, CA 15

0 1000 2000 3000 40000

20

40

60

80

100

120

LDP

C d

ecod

ing

pow

er r

educ

tion

(%)

P/E cycles

Overclocking-aware LDPC decodingInter-plane interleavingCombined

0 1000 2000 3000 4000 500010

-6

10-4

10-2

100

Re-

tran

sfer

pro

babi

lity

P/E cycles

Conventional LDPC decoding

Overclocking-aware LDPC decoding

Inter-plane interleaving

Combined

Page 16: Safely Overclocking Flash I/O in SSDs

Conclusion

• Overclock data link to fully utilize the ECC capability

• Read performance can be significantly improved

• Mitigate the impact of overclocking by leveraging LDPC codes and the error characterization.

Flash Memory Summit 2015Santa Clara, CA 16

Page 17: Safely Overclocking Flash I/O in SSDs

Flash Memory Summit 2015Santa Clara, CA 17

Thank you