
arXiv:1304.1863v1 [cs.PF] 6 Apr 2013

Stochastic Analysis on RAID Reliability for Solid-State Drives

Yongkun Li, Patrick P. C. Lee, John C. S. Lui
The Chinese University of Hong Kong
Email: [email protected], {pclee,cslui}@cse.cuhk.edu.hk

Abstract—Solid-state drives (SSDs) have been widely deployed in desktops and data centers. However, SSDs suffer from bit errors, and the bit error rate is time dependent since it increases as an SSD wears down. Traditional storage systems mainly use parity-based RAID to provide reliability guarantees by striping redundancy across multiple devices, but the effectiveness of RAID in SSDs remains debatable as parity updates aggravate the wearing and bit error rates of SSDs. In particular, an open problem is how different parity distributions over multiple devices, such as the even distribution suggested by conventional wisdom or the uneven distributions proposed in recent RAID schemes for SSDs, may influence the reliability of an SSD RAID array. To address this fundamental problem, we propose the first analytical model to quantify the reliability dynamics of an SSD RAID array. Specifically, we develop a "non-homogeneous" continuous time Markov chain model, and derive the transient reliability solution. We validate our model via trace-driven simulations and conduct numerical analysis to provide insights into the reliability dynamics of SSD RAID arrays under different parity distributions and subject to different bit error rates and array configurations. Designers can use our model to decide the appropriate parity distribution based on their reliability requirements.

Keywords—Solid-state Drives; RAID; Reliability; CTMC; Transient Analysis

I. Introduction

Solid-state drives (SSDs) have emerged as the next-generation storage medium. Today's SSDs mostly build on NAND flash memories, and provide several design enhancements over hard disks, including higher I/O performance, lower energy consumption, and higher shock resistance. As SSDs continue to see price drops, they have been widely deployed in desktops and large-scale data centers [10], [14].

However, even though enterprise SSDs generally provide high reliability guarantees (e.g., with mean-time-between-failures of 2 million hours [17]), they are susceptible to wear-outs and bit errors. First, SSDs regularly perform erase operations between writes, yet they can only tolerate a limited number of erase cycles before wearing out. For example, the erasure limit is only 10K for multi-level cell (MLC) SSDs [5], and even drops to several hundred for the latest triple-level cell (TLC) SSDs [13]. Also, bit errors are common in SSDs due to read disturbs, program disturbs, and retention errors [12], [13], [27]. Although in practice SSDs use error correction codes (ECCs) to protect data [8], [26], the protection is limited since the bit error rate increases as SSDs issue more erase operations [12], [27]. We call a post-ECC bit error an uncorrectable bit error. Furthermore, bit errors become more severe as the density of flash cells increases and the feature size decreases [13]. Thus, SSD reliability remains a legitimate concern, especially when an SSD issues frequent erase operations due to heavy writes.

RAID (redundant array of independent disks) [31] provides an option to improve the reliability of SSDs. Using parity-based RAID (e.g., RAID-4, RAID-5), the original data is encoded into parities, and the data and parities are striped across multiple SSDs to provide storage redundancy against failures. RAID has been widely used to tolerate hard disk failures, and conventional wisdom suggests that parities should be evenly distributed across multiple drives so as to achieve better load balancing, e.g., RAID-5. However, traditional RAID introduces a different reliability problem to SSDs, since parities are updated on every data write and this aggravates the erase cycles. To address this problem, the authors of [2] propose a RAID scheme called Diff-RAID, which aims to enhance SSD RAID reliability by keeping uneven parity distributions. Other studies (e.g., [16], [20]–[22], [25], [30]) also explore the use of RAID in SSDs.

However, there remain open issues on the proper architecture designs of highly reliable SSD RAID [19]. One specific open problem is how different parity distributions influence the reliability of an SSD RAID array subject to different error rates and array configurations. In other words, should we distribute parities evenly or unevenly across multiple SSDs with respect to SSD RAID reliability? This motivates us to characterize SSD RAID reliability using analytical modeling, which enables us to readily tune different input parameters and determine their impacts on reliability. However, analyzing SSD RAID reliability is challenging, as the error rates of SSDs are time-varying. Specifically, unlike hard disk drives, in which error arrivals are commonly modeled as a constant-rate Poisson process (e.g., see [28], [33]), SSDs have an increasing error arrival rate as they wear down with more erase operations.

In this paper, we formulate a continuous time Markov chain (CTMC) model to analyze the effects of different parity placement strategies, such as traditional RAID-5 and Diff-RAID [2], on the reliability dynamics of an SSD RAID array. To capture the time-varying bit error rates in SSDs, we formulate a non-homogeneous CTMC model, and conduct transient analysis to derive the system reliability at any specific time instant. To our knowledge, this is the first analytical study on the reliability of an SSD RAID array.

In summary, this paper makes two key contributions:


• We formulate a non-homogeneous CTMC model to characterize the reliability dynamics of an SSD RAID array. We use the uniformization technique [7], [18], [32] to derive the transient reliability of the array. Since the state space of our model increases with the SSD size, we develop optimization techniques to reduce the computational cost of transient analysis. We also quantify the corresponding error bounds of the uniformization and optimization techniques. Using the SSD simulator [1], we validate our model via trace-driven simulations.

• We conduct extensive numerical analysis to compare the reliability of an SSD RAID array under RAID-5 and Diff-RAID [2]. We observe that Diff-RAID, which places parities unevenly across SSDs, only improves the reliability over RAID-5 when the error rate is not too large, while RAID-5 is reliable enough if the error rate is sufficiently small. On the other hand, when the error rate is very large, neither RAID-5 nor Diff-RAID can provide high reliability, so increasing fault tolerance (e.g., RAID-6 or a stronger ECC) becomes necessary.

The rest of this paper proceeds as follows. In Section II, we formulate our model that characterizes the reliability dynamics of an SSD RAID array, and formally define the reliability metric. In Section III, we derive the transient system state using uniformization and some optimization techniques. In Section IV, we validate our model via trace-driven simulations. In Section V, we present numerical analysis results on how different parity placement strategies influence the RAID reliability. Section VI reviews related work, and finally Section VII concludes.

II. System Model

It is well known that RAID-5 is effective in providing single-fault tolerance for traditional hard disk storage. It distributes parities evenly across all drives and achieves load balancing. Recently, Balakrishnan et al. [2] reported that RAID-5 may result in correlated failures, and hence poor reliability, for SSD RAID arrays if SSDs are worn out at the same time. Thus, they propose a modified RAID scheme called Diff-RAID for SSDs. Diff-RAID improves RAID-5 by (i) distributing parities unevenly and (ii) redistributing parities each time a worn-out SSD is replaced, so that the oldest SSD always has the most parities and wears out first. However, it remains unclear whether Diff-RAID (or placing parities unevenly across drives) really improves the reliability of SSD RAID over RAID-5 under all error patterns, as there is a lack of comprehensive studies on the reliability dynamics of SSD RAID arrays under different parity distributions.

In this section, we first formulate an SSD RAID array, then characterize the age of each SSD based on the age of the array (we formally define the concept of age later in this section). Lastly, we model the error rate based on the age of each SSD, and formulate a non-homogeneous CTMC to characterize the reliability dynamics of an SSD RAID array under various parity distributions, such as those of RAID-5 and Diff-RAID. Table I lists the major notations used in this paper.

Table I: Notations.

Specific notations of an SSD:
M : erasure limit of each block (e.g., 10K)
B : total number of blocks in each SSD
λi(t) : error rate of a chunk in SSD i at time t

Specific notations of a RAID array:
N : number of data drives (i.e., an array has N+1 SSDs)
S : total number of stripes in an SSD RAID array
pi : fraction of parity chunks in SSD i, with $\sum_{i=0}^{N} p_i = 1$
k : total number of erasures performed on the SSD RAID array (i.e., system age of the array)
ki : number of erasures performed on each block of SSD i (i.e., age of SSD i)
T : average inter-arrival time of two consecutive erase operations on the SSD RAID array
πj(t) : probability that the array has j stripes that contain exactly one erroneous chunk each (0 ≤ j ≤ S)
πS+1(t) : probability that at least one stripe of the array contains more than one erroneous chunk, so $\sum_{j=0}^{S+1} \pi_j(t) = 1$
R(t) : reliability at time t, i.e., probability that no data loss happens until time t; $R(t) = \sum_{j=0}^{S} \pi_j(t)$

A. SSD RAID Formulations

An SSD is usually organized in blocks, each of which typically contains 64 or 128 pages. Both read and program (write) operations are performed in units of pages, and each page is of size 4KB. Data can only be programmed to clean pages. SSDs use an erase operation, which is performed in units of blocks, to reset all pages in a block into clean pages. To improve write performance, SSDs use out-of-place writes: to update a page, the new data is programmed to a clean page while the original page is marked as invalid. An SSD is usually composed of multiple chips (or packages), each containing thousands of blocks. Chips are independent of each other and can operate in parallel. We refer readers to [1] for a detailed description of the SSD organization.

We now describe the organization of the SSD RAID array that we consider, as shown in Figure 1. We consider a device-level RAID organization where the array is composed of N+1 SSDs numbered from 0 to N. In this paper, we address the case where the array can tolerate a single SSD failure, as assumed in the traditional RAID-4 and RAID-5 schemes and in the modified RAID schemes for SSDs [2], [16], [20]–[22], [25], [30]. Each SSD is divided into multiple non-overlapping chunks, each of which can be mapped to one or multiple physical pages. The array is further divided into stripes, each of which is a collection of N+1 chunks from the N+1 SSDs. Within a stripe, there are N data chunks and one parity chunk encoded from the N data chunks.


Figure 1: Organization of an SSD RAID array.

We call a chunk an erroneous chunk when uncorrectable bit errors appear in that chunk, and a correct chunk otherwise. Since we focus on single-fault tolerance, a stripe avoids data loss as long as it contains at most one erroneous chunk, which can then be recovered from the other surviving chunks in the same stripe.

Suppose that each SSD contains B blocks, and the array contains S stripes (i.e., S chunks per SSD). For simplicity, we assume that all S stripes are used for data storage. To generalize our analysis, we organize parity chunks in the array according to some probability distribution. We let SSD i contain a fraction pi of the parity chunks. In the special case of RAID-5, parity chunks are evenly placed across all devices, so $p_i = \frac{1}{N+1}$ for all i if the array consists of N+1 drives. For Diff-RAID, the pi's need not equal $\frac{1}{N+1}$, but only need to satisfy $\sum_{i=0}^{N} p_i = 1$.

Each block in an SSD can only sustain a limited number of erase cycles, and is supposed to be worn out after the limit. We denote the erasure limit by M, which corresponds to the lifetime of a block. To enhance the durability of SSDs, efficient wear-leveling techniques are often used to balance the number of erasures across all blocks. In this paper, we assume that each SSD achieves perfect wear-leveling, such that every block has exactly the same number of erasures. Let ki (≤ M) be the number of erasures that have been performed on each block in SSD i, where 0 ≤ i ≤ N. We denote ki as the age of each block in SSD i, or equivalently, the age of SSD i when perfect wear-leveling is assumed. When an SSD reaches its erasure limit, we assume that it is replaced by a new SSD. For simplicity, we treat ki as a continuous value in [0, M]. Let k be the total number of erase operations that the whole array has processed; we call k the system age of the array.
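To make the parity-distribution abstraction concrete, the following minimal Python sketch (our illustration, not code from the paper) draws a parity-holding drive for each stripe according to a given distribution p; the helper name assign_parity_drives is hypothetical.

```python
import random

def assign_parity_drives(S, p):
    """For each of the S stripes, pick the drive that holds its parity
    chunk, so that drive i ends up with roughly a fraction p[i] of all
    parity chunks (the p[i] must sum to 1)."""
    drives = list(range(len(p)))
    return [random.choices(drives, weights=p)[0] for _ in range(S)]

# RAID-5 on N+1 = 4 drives: even parity placement, p_i = 1/(N+1).
print(assign_parity_drives(8, [0.25] * 4))
# A skewed, Diff-RAID-style placement: the last drive holds most parity.
print(assign_parity_drives(8, [0.1, 0.1, 0.1, 0.7]))
```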

B. SSD Age Characterization

In this subsection, we proceed to characterize the age of each SSD for a given RAID scheme. In particular, we derive ki, the age of SSD i, when the whole array has already performed a total of k erase operations. This characterization enables us to model the error rate in each SSD accurately (see Section II-C). We focus on two RAID schemes: traditional RAID and Diff-RAID [2].

We first quantify the aging rate of each SSD in an array. Let ri be the aging rate of SSD i. Note that for each stripe, updating a data chunk also has the parity chunk updated. Suppose that each data chunk has the same probability of being accessed. On average, the ratio of the aging rate of SSD i to that of SSD j can be expressed as [2]:

$\frac{r_i}{r_j} = \frac{p_i N + (1 - p_i)}{p_j N + (1 - p_j)}.$  (1)

Equation (1) states that a parity chunk ages N times faster than a data chunk. Given the aging rates ri, we can quantify the probability of SSD i being the target drive of each erase operation, which we denote by qi. We model qi by making it proportional to the aging rate of SSD i, i.e.,

$q_i = \frac{r_i}{\sum_{i=0}^{N} r_i} = \frac{p_i N + (1 - p_i)}{\sum_{i=0}^{N} (p_i N + (1 - p_i))}.$  (2)
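As a quick illustration of Equations (1) and (2), here is a small Python sketch (ours, with hypothetical helper names) that computes the aging rates and erase-target probabilities for a given parity distribution:

```python
def aging_rates(p, N):
    """Relative aging rates r_i (Eq. (1)): a parity chunk ages N times
    faster than a data chunk, so r_i is proportional to p_i*N + (1 - p_i)."""
    return [pi * N + (1 - pi) for pi in p]

def erase_target_probs(p, N):
    """q_i (Eq. (2)): probability that SSD i receives a given erase
    operation, proportional to its aging rate."""
    r = aging_rates(p, N)
    return [ri / sum(r) for ri in r]

N = 3
print(erase_target_probs([0.25] * (N + 1), N))      # even parity: all 0.25
print(erase_target_probs([0.1, 0.1, 0.1, 0.7], N))  # skewed: [0.2, 0.2, 0.2, 0.4]
```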

We now characterize SSD ages under Diff-RAID, which places parities unevenly and redistributes parity chunks after a worn-out SSD is replaced, so as to maintain the age ratios and always wear out the oldest SSD first. To mathematically characterize the system age under Diff-RAID, define Ai as the remaining fraction of erasures that SSD i can sustain right after an SSD replacement. Clearly, Ai = 1 for a brand-new drive and Ai = 0 for a worn-out drive. Without loss of generality, we assume that the drives are sorted by Ai in descending order, i.e., A0 ≥ A1 ≥ · · · ≥ AN, and we have A0 = 1 as it is the newly replaced drive. Diff-RAID performs parity redistribution to guarantee that the aging ratio in Equation (1) remains unchanged. Therefore, the remaining fraction of erasures for each drive converges, and the values of the Ai's in the steady state are given by [2]:

$A_i = \frac{\sum_{j=i}^{N} r_j}{\sum_{j=0}^{N} r_j} = \frac{\sum_{j=i}^{N} (p_j N + (1 - p_j))}{\sum_{j=0}^{N} (p_j N + (1 - p_j))}, \quad 0 \le i \le N.$  (3)

In this paper, we study Diff-RAID after the age distribution of SSDs right after each drive replacement has converged, i.e., the initial remaining fractions of erasures of the SSDs in Diff-RAID follow the distribution of Ai's in Equation (3).
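The convergent fractions of Equation (3) are straightforward to compute; a sketch (reusing the hypothetical aging_rates helper above):

```python
def convergent_ages(p, N):
    """Steady-state remaining-lifetime fractions A_i (Eq. (3)); drive N
    (with the most parity) has the least remaining life."""
    r = aging_rates(p, N)          # Eq. (1) helper from the sketch above
    total = sum(r)
    return [sum(r[i:]) / total for i in range(N + 1)]

# With p = [0.1, 0.1, 0.1, 0.7] and N = 3: A = [1.0, 0.8, 0.6, 0.4],
# and A_N = q_N = 0.4, consistent with the text.
print(convergent_ages([0.1, 0.1, 0.1, 0.7], 3))
```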

We now characterize ki for Diff-RAID. Recall that each SSD has B blocks. Due to perfect wear-leveling, every block of SSD i has the same probability qi/B of being the target block of an erase operation. Thus, if the array has processed k erase operations, the age of SSD i is:

Diff-RAID: $k_i = \left(\frac{kq_i}{B} \bmod \frac{q_i}{q_N}(M - k_{N0})\right) + k_{i0},$  (4)

where ki0 = M(1−Ai) is the initial number of times that each block of SSD i has been erased right after a drive replacement, and mod denotes the modulo operation. The rationale of Equation (4) is as follows. Since we sort the SSDs by Ai in descending order, SSD N always has the highest aging rate and will be replaced first. Thus, after each block of SSD N has performed (M − kN0) erasures, SSD N will be replaced, and each block of SSD i has just been erased (qi/qN)(M − kN0) times. Therefore, for SSD i, a drive replacement happens every time each block has been erased (qi/qN)(M − kN0) times. Moreover, the initial number of erasures on each block of SSD i right after a drive replacement is ki0. Thus, the age of SSD i is derived as in Equation (4). Since ki0 = M(1−Ai) and AN = qN, Equation (4) can be rewritten as:

Diff-RAID: $k_i = ((kq_i/B) \bmod Mq_i) + M(1 - A_i).$  (5)

For traditional RAID (e.g., RAID-4 or RAID-5), parity chunks are kept intact and will not be redistributed after a drive replacement. So after the array has performed k erase operations, each block of SSD i has just performed kqi/B erasures, and an SSD is replaced every time each block performs M erasures. Thus, the age of SSD i is:

Traditional RAID: $k_i = (kq_i/B) \bmod M.$  (6)
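Putting Equations (2), (3), (5), and (6) together, a self-contained sketch of the age computation (our illustration; ssd_age is a hypothetical name):

```python
def ssd_age(k, i, p, N, B, M, diff_raid=True):
    """Age k_i of SSD i after the array has processed k erasures, under
    perfect wear-leveling: Eq. (5) for Diff-RAID after convergence,
    Eq. (6) for traditional RAID."""
    r = [pj * N + (1 - pj) for pj in p]        # aging rates, Eq. (1)
    q = [rj / sum(r) for rj in r]              # target probabilities, Eq. (2)
    if not diff_raid:
        return (k * q[i] / B) % M              # Eq. (6)
    A_i = sum(r[i:]) / sum(r)                  # convergent A_i, Eq. (3)
    return (k * q[i] / B) % (M * q[i]) + M * (1 - A_i)   # Eq. (5)

# Ages of four drives after 10^6 array erasures (toy B = 80, M = 100):
p = [0.1, 0.1, 0.1, 0.7]
print([round(ssd_age(10**6, i, p, 3, 80, 100), 1) for i in range(4)])
```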

C. Continuous Time Markov Chain (CTMC)

We first model the error rate of an SSD. We assume that the error arrival processes of different chunks in an SSD are independent. Since different chunks in an SSD have the same age, they must have the same error rate. We let λi(t) denote the error rate of each chunk in SSD i at time t, and model it as a function of the number of erasures on SSD i at time t, which is denoted by ki(t) (the notation t may be dropped if the context is clear). Furthermore, to reflect that bit errors increase with the number of erasures, we model the error rate based on a Weibull distribution [34], which has been widely used in reliability engineering. Formally,

$\lambda_i(t) = c\,\alpha\,(k_i(t))^{\alpha-1}, \quad \alpha > 1,$  (7)

where α is called the shape parameter and c is a constant. Note that although the error rates of SSDs are time-varying, they vary only with the number of erasures on the SSDs. If we let tk be the time point of the kth erasure on the array, then during the period (tk, tk+1) (i.e., between the kth and (k+1)th erasures), the number of erasures on each SSD is fixed; hence the error rates during this period are constant, and the error arrivals can be modeled as a Poisson process. In particular, ki(t) = ki(k) if t ∈ (tk, tk+1), and the function ki(k) is expressed by Equations (5) and (6).
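Equation (7) reduces to a rate that is linear in age for the paper's default shape parameter; a one-line sketch (the constants below are illustrative):

```python
def error_rate(k_i, c=1e-13, alpha=2.0):
    """Per-chunk error rate of Eq. (7); with alpha = 2 this is 2*c*k_i."""
    return c * alpha * k_i ** (alpha - 1)

for age in (0, 5_000, 10_000):   # rate grows linearly with age
    print(age, error_rate(age))
```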

We now formulate a CTMC model to characterize the reliability dynamics of an SSD RAID array. Recall that the array provides single-fault tolerance for each stripe. We say that the CTMC is at state i if and only if the array has i stripes that contain exactly one erroneous chunk each, where 0 ≤ i ≤ S. Data loss happens if any one stripe contains more than one erroneous chunk, and we denote this state by S+1. Let X(t) be the system state at time t.

Figure 2: State transition of the non-homogeneous CTMC.

Formally, X(t) ∈ {0, 1, ..., S+1} for all t ≥ 0. To derive the system state, we let πj(t) be the probability that the CTMC is at state j at time t (0 ≤ j ≤ S+1), so the system state can be characterized by the vector π(t) = (π0(t), π1(t), ..., πS+1(t)).

Let us consider the transitions of the CTMC. For each stripe that contains one erroneous chunk, the erroneous chunk can be reconstructed from the other surviving chunks in the same stripe. Assume that only one stripe can be reconstructed at a time, and that the reconstruction time follows an exponential distribution with rate µ. The state transition diagram of the CTMC is depicted in Figure 2. To elaborate, suppose that the RAID array is currently at state j. If an erroneous chunk appears in one of the (S−j) stripes that originally have no erroneous chunk, the chain moves to state j+1 with rate $(S-j)\sum_{i=0}^{N} \lambda_i(t)$; if an erroneous chunk appears in one of the j stripes that already have another erroneous chunk, the system moves to state S+1 (in which data loss occurs) with rate $j \sum_{i=0}^{N} \lambda_i(t)$.

We now define the reliability of an SSD RAID array at time t, denoted by R(t). Formally, it is the probability that no stripe has encountered data loss until time t:

$R(t) = \sum_{j=0}^{S} \pi_j(t).$  (8)

Note that our model captures the time-varying nature of reliability over the lifespan of the SSD RAID array. Next, we show how to analyze this non-homogeneous CTMC.

III. Transient Analysis of CTMC

In this section, we derive π(t), the system state of an SSD RAID array at any time t. Once we have π(t), we can compute the instantaneous reliability R(t) according to Equation (8). There are two major challenges in deriving π(t). First, it involves transient analysis, which differs from conventional steady-state Markov chain analysis. Second, the underlying CTMC {X(t), t ≥ 0} is non-homogeneous, as the error arrival rate λi(t) is time-varying, and it also has a very large state space.

In the following, we first present the mathematical foundation for analyzing the non-homogeneous CTMC so as to compute the transient system state, then formalize an algorithm based on the mathematical analysis. Finally, we develop an optimization technique to address the challenge of the large state space of the CTMC so as to further reduce the computational cost of the algorithm.

A. Mathematical Analysis on the Non-homogeneous CTMC

Note that the error rates of SSDs within a period (tk, tk+1) (k = 0, 1, 2, ...) are constant, so if we only focus on a particular time period of the CTMC, i.e., {X(t), tk < t ≤ tk+1}, it becomes a time-homogeneous CTMC. Therefore, an intuitive way to derive the transient solution of the CTMC {X(t), t ≥ 0} is to divide it into many time-homogeneous CTMCs {X(t), tk < t ≤ tk+1} (k = 0, 1, 2, ...), then use the uniformization technique [7], [18], [32] to analyze these time-homogeneous CTMCs one by one in ascending time order. Specifically, to derive π(tk+1), one first derives π(t1) from the initial state π(0), then takes π(t1) as the initial state and derives π(t2) from π(t1), and so on.

However, this computational approach may take a prohibitively long time to derive π(tk+1) when k is very large, which usually occurs in SSDs. Since k denotes the number of erasures performed on an SSD RAID array, it can grow up to (N+1)BM, where both B (the number of blocks in an SSD) and M (the erasure limit) can be very large, say, 100K and 10K, respectively (see Section V). Therefore, simply applying the uniformization technique is computationally infeasible for deriving the reliability of an SSD RAID array, especially when the array performs a lot of erasures.

To overcome the above challenge, we propose an optimization technique that combines multiple time periods together. The main idea is that since the difference between the generator matrices of two consecutive periods is very small in general, we consider s consecutive periods together, where s is called the step size. For simplicity of discussion, let T be the average inter-arrival time of two consecutive erase operations, i.e., tk = kT. To analyze the non-homogeneous CTMC over s periods {X(t), lsT < t ≤ (l+1)sT} (l = 0, 1, ...), we define another time-homogeneous CTMC {X̄(t), lsT < t ≤ (l+1)sT} to approximate it, and we also quantify the error bound. The derivation of π̄((l+1)sT) given π̄(lsT) proceeds as follows.

Step 1: Constructing a time-homogeneous CTMC {X̄(t), lsT < t ≤ (l+1)sT} with generator matrix Q̄l. Note that there are s periods in the interval (lsT, (l+1)sT]. We denote the generator matrices of the original Markov chain {X(t)} during each of the s periods by Qls, Qls+1, ..., Q(l+1)s−1. To construct {X̄(t), lsT < t ≤ (l+1)sT}, we define Q̄l as a function of the s generator matrices:

$\bar{Q}_l = f(Q_{ls}, Q_{ls+1}, \ldots, Q_{(l+1)s-1}), \quad l = 0, 1, \ldots$  (9)

Intuitively, Q̄l can be viewed as the "average" over the s generator matrices. To illustrate, consider a special case where α in Equation (7) is set to α = 2. Then the error arrival rate of each chunk of SSD i becomes 2cki. In this case, each element of the generator matrix Qk becomes

$q_{i,j}(k) = \begin{cases} -S\Sigma, & i = j = 0, \\ -\mu - S\Sigma, & 0 < i \le S,\ j = i, \\ (S-i)\Sigma, & 0 \le i < S,\ j = i+1, \\ i\Sigma, & 0 < i \le S,\ j = S+1, \\ \mu, & 0 < i \le S,\ j = i-1, \\ 0, & \text{otherwise}, \end{cases}$  (10)

where $\Sigma = \sum_{i=0}^{N} 2ck_i$ and ki is computed by Equations (5) and (6). Now, for the Markov chain X̄(t), we let Q̄l be the average of these s generator matrices Qk. Mathematically,

$\bar{Q}_l = \Big(\sum_{k=ls}^{(l+1)s-1} Q_k\Big) / s, \quad l = 0, 1, \ldots$  (11)
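For concreteness, a small NumPy sketch (our illustration) of building Qk from Equation (10) and averaging it per Equation (11); generator and averaged_generator are hypothetical names:

```python
import numpy as np

def generator(S, Sigma, mu):
    """Generator matrix Q_k of Eq. (10) for alpha = 2, where
    Sigma = sum_i 2*c*k_i; state S+1 (data loss) is absorbing."""
    Q = np.zeros((S + 2, S + 2))
    Q[0, 0], Q[0, 1] = -S * Sigma, S * Sigma
    for i in range(1, S + 1):
        Q[i, i] = -mu - S * Sigma
        if i < S:
            Q[i, i + 1] = (S - i) * Sigma   # one more degraded stripe
        Q[i, S + 1] = i * Sigma             # second error in a stripe
        Q[i, i - 1] = mu                    # one stripe repaired
    return Q                                # last row stays zero

def averaged_generator(Sigmas, S, mu):
    """Q_bar_l of Eq. (11): entrywise average over s consecutive periods."""
    return sum(generator(S, Sig, mu) for Sig in Sigmas) / len(Sigmas)
```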

Note that our analysis is applicable to other values of α, with different choices for defining Q̄l in Equation (9) and different error bounds. We pose the further analysis of different values of α as future work. In the following discussion, we fix α = 2, whose error bound can be derived.

Step 2: Deriving the system state π̄((l+1)sT) under the time-homogeneous CTMC {X̄(t)}. To derive the system state at time (l+1)sT, which we denote by π̄((l+1)sT), we solve Kolmogorov's forward equation and obtain

$\bar{\pi}((l+1)sT) = \bar{\pi}(lsT) \sum_{n=0}^{\infty} (\bar{Q}_l sT)^n / n!, \quad l = 0, 1, \ldots$  (12)

where the initial state is π̄(0) = π(0).

Step 3: Applying uniformization to solve Equation (12). We let $\Lambda_l \ge \max_{ls \le k \le (l+1)s-1} \max_{0 \le i \le S+1} |-q_{i,i}(k)|$ and let $\bar{P}_l = I + \bar{Q}_l / \Lambda_l$. Based on the uniformization technique [7], the system state at time (l+1)sT can be derived as follows:

$\bar{\pi}((l+1)sT) = \sum_{n=0}^{\infty} e^{-\Lambda_l sT} \frac{(\Lambda_l sT)^n}{n!} v_l(n), \quad l = 0, 1, \ldots$  (13)

where vl(n) = vl(n−1)P̄l and vl(0) = π̄(lsT). The initial state is π̄(0) = π(0).

Step 4: Truncating the infinite summation in Equation (13) with a quantifiable error bound. We denote the truncation point for interval (lsT, (l+1)sT] by Ul, and the system state at time (l+1)sT after truncation by π̂((l+1)sT). We also denote by $\hat{\epsilon}_l = \|\hat{\pi}((l+1)sT) - \pi((l+1)sT)\|_1$ the error caused by combining s periods together and truncating the infinite series in interval (lsT, (l+1)sT], where π((l+1)sT) denotes the accurate system state obtained by iteratively analyzing the time-homogeneous CTMCs {X(t), kT < t ≤ (k+1)T} (k = 0, 1, ..., (l+1)s − 1) from the initial state π(0). Now, π̂((l+1)sT) and $\hat{\epsilon}_l$ can be computed using the following theorem.

Theorem 1: After truncating the infinite series, the system state at time (l+1)sT for the Markov chain {X̄(t)} with step size s can be computed as

$\hat{\pi}((l+1)sT) = \sum_{n=0}^{U_l} e^{-\Lambda_l sT} \frac{(\Lambda_l sT)^n}{n!} v_l(n), \quad l = 0, 1, \ldots$  (14)

where vl(n) = vl(n−1)P̄l and vl(0) = π̂(lsT). The initial state is π̂(0) = π(0). The error is bounded as follows:

$\hat{\epsilon}_l \le \hat{\epsilon}_{l-1} + \Big(1 - \sum_{n=0}^{U_l} e^{-\Lambda_l sT} \frac{(\Lambda_l sT)^n}{n!}\Big), \quad l = 0, 1, \ldots$  (15)

where $\hat{\epsilon}_0 = \|\hat{\pi}(0) - \pi(0)\|_1 = 0$.

Proof: Please refer to the Appendix.

B. Algorithm for Computing System State

In the last subsection, we presented the mathematical foundation for computing the system state of SSD RAID arrays and the corresponding error bounds. We now present the algorithm to compute π̂(t) according to Theorem 1. In particular, we aim to compute the system state at the time when the kth erase operation has just occurred, i.e., π̂(kT). Without loss of generality, we assume that k is an integer multiple of the step size s. Moreover, we denote the maximum acceptable error by ε.

Algorithm 1 Computing the system state π̂(kT)

Input: step size s, maximum error ε, and initial state π̂(0) = π(0)
Output: system state at time kT: π̂(kT)

1: for l = 0 → k/s − 1 do
2:   Let Q̄l = (Σ_{m=ls}^{(l+1)s−1} Qm) / s;
3:   Choose Λl ≥ max_{ls≤m<(l+1)s} max_{0≤i≤S+1} |−qi,i(m)|;
4:   Let P̄l = I + Q̄l / Λl;
5:   Initialize: ε̂l ← 0; n ← 0; π̂((l+1)sT) ← 0; vl(0) ← π̂(lsT);
6:   while 1 − ε̂l > sε/k do
7:     ε̂l ← ε̂l + e^{−ΛlsT} (ΛlsT)^n / n!;
8:     π̂((l+1)sT) ← π̂((l+1)sT) + e^{−ΛlsT} (ΛlsT)^n / n! · vl(n);
9:     n ← n + 1;
10:    vl(n) ← vl(n−1) P̄l;
11:  end while
12: end for

Algorithm 1 gives the pseudo-code. Lines 2 to 11 derive the system state over one interval of s time periods based on the flow in Section III-A. In particular, Line 2 constructs the generator matrix of our defined CTMC {X̄(t)}. Lines 3 to 5 initialize the necessary parameters. Lines 6 to 11 implement Equation (14), where the truncation point is determined based on Equation (15) and the given maximum error. Note that the condition in Line 6 caps the maximum allowable error in one interval at sε/k: there are k/s intervals, and the aggregate maximum allowable error is ε. After computing the system state at time kT using Algorithm 1, we can easily compute the RAID reliability based on the definition in Equation (8).
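A compact NumPy sketch of the core of Algorithm 1 for one interval (ours, not the authors' code; for large Λl·sT the Poisson weights should be accumulated in log space, which we omit here):

```python
import numpy as np

def advance_interval(pi0, Qbar, sT, eps_interval, max_terms=10**6):
    """One interval of Algorithm 1 (lines 3-11): push the state vector
    pi0 forward by time sT under the averaged generator Qbar, truncating
    the uniformization series per Eqs. (14)-(15)."""
    Lam = np.max(np.abs(np.diag(Qbar)))        # uniformization rate
    P = np.eye(Qbar.shape[0]) + Qbar / Lam     # P_bar = I + Qbar/Lam
    v, out = pi0.copy(), np.zeros_like(pi0)
    mass, n, w = 0.0, 0, np.exp(-Lam * sT)     # w: current Poisson weight
    while 1.0 - mass > eps_interval and n < max_terms:
        mass += w
        out += w * v
        v = v @ P
        n += 1
        w *= Lam * sT / n                      # next Poisson term
    return out

# Driver sketch: pi = e_0; for each of the k/s intervals l, build Qbar_l
# (e.g., with averaged_generator above) and call
#   pi = advance_interval(pi, Qbar_l, s * T, s * eps / k)
# then the reliability is R(kT) = pi[:S + 1].sum()   (Eq. (8)).
```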

Our implementation of Algorithm 1 uses the following inputs. We fix s = BM/20, meaning that for each SSD we consider at least 20 time points before it reaches its lifetime of BM erasures. The error bound is fixed at ε = 10^−3. We also set π0(0) = 1 and πj(0) = 0 for 0 < j ≤ S+1 to indicate that the array has no erroneous chunk initially.

Figure 3: State transition after truncation.

Note that the dimension of the matrix P̄l is (S+2) × (S+2) (S is the number of stripes), which can be very large for large SSDs. To further speed up our computation, we develop another optimization technique: truncating the states with large state numbers from the CTMC so as to reduce the dimension of P̄l. Intuitively, if an array contains many stripes with exactly one erroneous chunk, it is more likely that a new erroneous chunk appears in one of those stripes (and hence data loss occurs) than in a stripe without any erroneous chunk. That is, the transition rate qi,i+1 becomes very small when i is large. We can thus remove states with large state numbers with negligible loss of accuracy. We present the details of this optimization in the next subsection.

C. Reducing Computational Cost of Algorithm 1

Note that when the state number i increases, the transition rate qi,i+1(k) decreases while the transition rate qi,S+1(k) increases. This indicates that the higher the current state number, the harder it is to transit to states with larger state numbers, while it is easier to transit to the data-loss state S+1. The physical meaning is that the system will not contain too many stripes with exactly one erroneous chunk, as either the erroneous chunk is recovered, or another error appears in the same stripe so that data loss happens. Therefore, to reduce the computational cost of deriving the system state, we can truncate the states with large state numbers so as to reduce the state space of the Markov chain. Specifically, we truncate the states with state numbers larger than E, and let state E+1 represent the case when more than E stripes contain exactly one erroneous chunk each. Moreover, we take state E+1 as an absorbing state. Furthermore, we denote the state of data loss by E+2. The resulting state transition is illustrated in Figure 3.

To compute the system state after state truncation, we denote the new CTMC by {X̄(t), t ≥ 0}, the new generator matrix during period (kT, (k+1)T] by Q̄k, and the system state at time (k+1)T by π̄((k+1)T). We use notations with a bar to represent the case when system states of the CTMC are truncated, if the context is clear. Similar to Equation (12), given the initial state π̄(kT), the system state at time (k+1)T for the CTMC {X̄(t), t ≥ 0} can be derived as follows:

$\bar{\pi}((k+1)T) = \bar{\pi}(kT) \sum_{n=0}^{\infty} \frac{(\bar{Q}_k T)^n}{n!}.$  (16)

If we denote by $\bar{\epsilon}_k$ the error caused by truncating the states at time kT, then $\bar{\epsilon}_k$ can be formally defined as

$\bar{\epsilon}_k = \max_{0 \le i \le E} |\bar{\pi}_i(kT) - \pi_i(kT)|,$

where π̄i(kT) represents the probability of the system being at state i at time kT for the CTMC {X̄(t), t ≥ 0}, i.e., the Markov chain after state truncation, and πi(kT) represents the probability of the system being at state i at time kT for the original CTMC {X(t), t ≥ 0}. Clearly, $\bar{\epsilon}_0 = 0$, as the two Markov chains have the same initial states, i.e., π̄i(0) = πi(0). The error caused by state truncation is bounded as

$\bar{\epsilon}_k \le \bar{\pi}_{E+1}(kT).$  (17)

Again, we can follow the steps in Section III-A, i.e., use Algorithm 1, to compute the system state for the Markov chain after state truncation {X̄(t), t ≥ 0}.
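A sketch of the state-space truncation (our illustration, using the hypothetical generator helper from above): states 0..E are kept, state E+1 absorbs the truncated mass, and E+2 is data loss, so the bound of Equation (17) can be read off the absorbing state:

```python
import numpy as np

def truncate_generator(Q, E):
    """Truncate an (S+2)-state generator from Eq. (10) to E+3 states:
    columns 0..E+1 are kept (row E feeds the absorbing state E+1), and
    the original data-loss column S+1 is remapped to E+2."""
    S = Q.shape[0] - 2
    assert 0 < E < S
    Qt = np.zeros((E + 3, E + 3))
    Qt[:E + 1, :E + 2] = Q[:E + 1, :E + 2]   # transitions among 0..E+1
    Qt[:E + 1, E + 2] = Q[:E + 1, S + 1]     # data-loss transitions
    return Qt                                # rows E+1, E+2: absorbing

# Example: truncate a 10-stripe chain (12 states) down to E = 3 (6 states);
# row sums stay zero, so Qt is still a valid generator.
Qt = truncate_generator(generator(10, 1e-6, 1e-3), 3)
print(Qt.shape, np.allclose(Qt.sum(axis=1), 0))   # (6, 6) True
```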

IV. Model Validation

In this section, we validate via trace-driven simulation the accuracy of our CTMC model in quantifying the RAID reliability R(t). We use Microsoft's SSD simulator [1] based on DiskSim [3]. Since each SSD contains multiple chips that can be configured to be independent of each other and handle I/O requests in parallel, we consider RAID at the chip level (as opposed to the device level) in our DiskSim simulation. Specifically, we configure each chip to have its own data bus and control bus and treat it as one drive, and we treat the SSD controller as the RAID controller where parity-based RAID is built.

To simulate error arrivals, we generate error events based on Poisson arrivals given the current system age k of the array. As the array ages, we update the error arrival rates accordingly by varying the variable ki(t) in Equation (7). We also generate recovery events whose recovery times follow an exponential distribution with a fixed rate µ = 1. Both error and recovery events are fed into the SSD simulator as special types of I/O requests. We consider three cases: error dominant, comparable, and recovery dominant, in which the error rate is larger than, comparable to, and smaller than the recovery rate, respectively.
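In the same spirit (though much simpler than the DiskSim setup), a toy Python sketch of this event generation: between consecutive erase operations the rates are constant, so the next event can be sampled from competing exponentials. simulate_until_loss and lam_of_age are illustrative names, not the paper's simulator; repeating such runs and recording the loss age yields empirical R(t) curves.

```python
import random

def simulate_until_loss(lam_of_age, mu, S, T, max_erases):
    """Simulate the CTMC of Figure 2 event by event. lam_of_age(k) must
    return the aggregate per-chunk error rate sum_i lambda_i after k
    array erases. Returns the system age at first data loss, or None."""
    j = 0                                        # degraded stripes
    for k in range(max_erases):
        t = 0.0
        while True:
            rates = [(S - j) * lam_of_age(k),    # -> state j + 1
                     j * lam_of_age(k),          # -> data loss
                     mu if j > 0 else 0.0]       # -> state j - 1 (repair)
            total = sum(rates)
            t += random.expovariate(total) if total > 0 else float("inf")
            if t > T:                            # next erase comes first
                break
            u = random.random() * total
            if u < rates[0]:
                j += 1
            elif u < rates[0] + rates[1]:
                return k                         # data loss
            else:
                j -= 1
    return None
```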

Our validation measures the reliability of traditional RAID and Diff-RAID with different parity distributions. Recall that Diff-RAID redistributes the parities after each drive replacement, while traditional RAID does not. We consider N+1 chips where N = 3, 5, 7. For traditional RAID, we choose RAID-5, in which parity chunks are evenly placed across the chips; for Diff-RAID, 10% of the parity chunks are placed in each of the first N chips, and the remaining parity chunks are placed in the last flash chip.

We generate a synthetic uniform workload in which the write requests access the addresses of the entire address space with equal probability. The workload lasts until all drives have been worn out and replaced at least once. We run the DiskSim simulation 1000 times, and in each run we record the age at which data loss happens. Finally, we derive the probability of data loss and the reliability based on our definitions. To speed up our DiskSim simulations, we consider a small-scale RAID array, in which each chip contains 80 blocks with 64 pages each, and the chunk size is set to be the same as the page size of 4KB. We also set a low erasure limit of M = 100 cycles for each block.

Figure 4 shows the reliability R(t) versus the system age k obtained from both the model and the DiskSim simulations. We observe that our model accurately quantifies the reliability in all cases. Also, Diff-RAID shows its benefit only in the comparable case. In the error dominant case, traditional RAID always shows higher reliability than Diff-RAID; in the recovery dominant case, there is no significant difference between traditional RAID and Diff-RAID. We further discuss these findings in Section V.

V. Numerical Analysis

In this section, we conduct numerical analysis on the reliability dynamics of a large-scale SSD RAID array with respect to different parity placement strategies. At the end, we summarize the lessons learned from our analysis.

A. Choices of Default Model Parameters

We first describe the default model parameters used in our analysis, and provide justifications for our choices.

We consider an SSD RAID array composed of N+1 SSDs, each modeled by the same set of parameters. By default, we set N = 9. Each block of an SSD has 64 pages of size 4KB each. We consider 32GB SSDs with B = 131,072 blocks. We configure the chunk size equal to the block size, i.e., there are S = B = 131,072 chunks¹. We also have each block sustain M = 10K erase cycles.

We now describe how we configure the error arrival rate, i.e., λi = 2cki, by setting the constant c. We employ 4-bit ECC protection per 512 bytes of data, the industry standard for today's MLC flash. Based on the uncorrectable bit error rates (UBERs) calculated in [2], we choose the UBER in the range [10^−16, 10^−18] when an SSD reaches its rated lifetime (i.e., when the erasure limit M is reached). Since we set the chunk size equal to the block size, the probability that a chunk contains at least one bit error is roughly in the range [2×10^−10, 2×10^−12]. Based on the analysis of real enterprise workload traces [29], a RAID array can have several hundred gigabytes of data accessed per day. If the write request rate is set to 1TB per day (i.e., 50 blocks per second), then the error arrival rate per chunk at its rated lifetime (i.e., λi = 2cM) is approximately in the range [10^−8, 10^−10]. The corresponding parameter c is in the range [0.5×10^−12, 0.5×10^−14].

¹In practice, SSDs are over-provisioned [1], so the actual number of blocks (or chunks) that can be used for storage (i.e., S) should be smaller. However, the key observations of our results here still hold.

Figure 4: Model validation with respect to different values of N and different error rates. (a) Error dominant case (3+1 RAID, c = 1×10^−6); (b) comparable case (3+1 RAID, c = 0.5×10^−6); (c) recovery dominant case (3+1 RAID, c = 0.2×10^−6); (d) error dominant case (5+1 RAID, c = 1.0×10^−6); (e) comparable case (5+1 RAID, c = 0.3×10^−6); (f) recovery dominant case (5+1 RAID, c = 0.1×10^−6); (g) error dominant case (7+1 RAID, c = 0.5×10^−6); (h) comparable case (7+1 RAID, c = 0.2×10^−6); (i) recovery dominant case (7+1 RAID, c = 0.1×10^−6). Each panel plots the reliability R(t) versus the system age (in 10^4) for the model and DiskSim, under RAID-5 and Diff-RAID.

For the error recovery rate µ, we note that the aggregate error arrival rate when all N+1 drives are about to wear out is 2cMS(N+1). If N = 9, then the aggregate error arrival rate is roughly in the range [10^−2, 10^−4]. We fix µ = 10^−3.

We compare different cases in which the error arrivals are more dominant than the error recoveries, and vice versa. We consider three cases of error patterns: c = 1.1×10^−13, c = 0.4×10^−13, and c = 0.1×10^−13, which correspond to the error dominant, comparable, and recovery dominant cases, respectively. Specifically, when c = 0.4×10^−13, the aggregate error arrival rate of the array when all SSDs reach their rated lifetime is around 2cMS(N+1) ≈ 10^−3 (where N = 9, M = 10K, and S = 131,072), i.e., comparable to the recovery rate µ.

We now configure T, the time interval between two neighboring erase operations. Suppose that there is 1TB of writes per day, as described above. The inter-arrival time of write requests is then around 3×10^−4 seconds for a 4KB page size. Thus, the average time between two erase operations is about 1.9×10^−2 seconds, as an erase is triggered after writing 64 pages. In practice, each erase causes additional writes (i.e., write amplification [15]) as it moves data across blocks, so T should be smaller. Here, we fix T = 10^−2 seconds.
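The parameter arithmetic above is easy to reproduce; a quick sketch (the quoted ranges are the paper's, the constants below are our restatement of them):

```python
# Chunk = block = 64 pages x 4KB; probability of >= 1 bit error per chunk.
chunk_bits = 64 * 4 * 1024 * 8
for uber in (1e-16, 1e-18):
    print("P(chunk error) ~", uber * chunk_bits)       # ~2e-10 .. 2e-12

# c from the per-chunk rate at rated lifetime: lambda = 2*c*M.
M = 10_000
for lam in (1e-8, 1e-10):
    print("c ~", lam / (2 * M))                        # ~0.5e-12 .. 0.5e-14

# T from 1TB/day of 4KB writes: an erase follows every 64 page writes.
pages_per_sec = 1e12 / 86_400 / (4 * 1024)
print("write inter-arrival ~", 1 / pages_per_sec, "s") # ~3e-4 s
print("erase interval ~", 64 / pages_per_sec, "s")     # ~2e-2 s
```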

We compare the reliability dynamics of RAID-5 and different variants of Diff-RAID. For RAID-5, each drive holds a fraction $\frac{1}{N+1}$ of the parity chunks; for Diff-RAID, we choose the parity distribution (i.e., pi's for 0 ≤ i ≤ N) based on a truncated normal distribution. Specifically, we consider a normal distribution N(N+1, σ²) with mean N+1 and standard deviation σ, and let f be the corresponding probability density function. We then choose the pi's as follows:

$p_i = \frac{\int_i^{i+1} f(x)\,dx}{\int_0^{N+1} f(x)\,dx}, \quad 0 \le i \le N.$  (18)

We can choose different distributions of pi by tuning the parameter σ. Intuitively, the larger σ is, the more evenly the pi's are distributed. We consider three cases: σ = 1, σ = 2, and σ = 5. Suppose that N = 9. Then for σ = 1, SSD N and SSD N−1 hold 68% and 27% of the parity chunks, respectively; for σ = 2, SSD N, SSD N−1, and SSD N−2 hold 38%, 30%, and 18% of the parity chunks, respectively; for σ = 5, the proportions of parity chunks range from 2.8% (in SSD 0) to 16.6% (in SSD N). After choosing the pi's, the age of each block of SSD i (i.e., ki) can be computed via Equation (5).
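Equation (18) needs only the normal CDF; a self-contained sketch (parity_distribution is our name) that reproduces the σ = 1 figures quoted above:

```python
import math

def parity_distribution(N, sigma):
    """p_i from Eq. (18): N(N+1, sigma^2) truncated to [0, N+1] and
    integrated over [i, i+1]."""
    Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    F = lambda x: Phi((x - (N + 1)) / sigma)   # CDF of N(N+1, sigma^2)
    Z = F(N + 1) - F(0)                        # truncation normalizer
    return [(F(i + 1) - F(i)) / Z for i in range(N + 1)]

p = parity_distribution(9, 1.0)
print(round(p[9], 2), round(p[8], 2))   # 0.68 0.27, matching the text
```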

B. Impact of Different Error Dynamics

We now show the numerical results of RAID reliability based on the parameters described earlier. We assume that drive replacement can be completed immediately after the oldest SSD reaches its rated lifetime. When the oldest drive is replaced, all its chunks (including any erroneous chunks) are copied to the new drive; thus, the reliability (or the probability of no data loss) remains the same. We consider three error cases: the error dominant, comparable, and recovery dominant cases, as described above.

Case 1: Error dominant case. Figure 5(a) first shows the numerical results for the error dominant case. Initially, RAID-5 achieves very good reliability as all drives are brand-new. However, as SSDs wear down, the bit error rate increases, and this makes the RAID reliability decrease very quickly. In particular, the reliability drops to zero (i.e., data loss always happens) when the array performs around 5×10^9 erasures. For Diff-RAID, the more evenly the parity chunks are distributed, the lower the RAID reliability. In the error dominant case, since the error arrival rate is much larger than the recovery rate, the RAID reliability drops to zero very quickly no matter what parity placement strategy is used. We note that Diff-RAID is less reliable than traditional RAID-5 in the error dominant case. The reason is that for Diff-RAID, the initial ages of SSDs when constructing the RAID array are non-zero; instead they follow the convergent age distribution (i.e., based on Ai's in Equation (3)). When the error arrival rate is very large, the array suffers from low reliability even if it has performed only a small number of erasures. However, RAID-5, which is always constructed from brand-new SSDs, starts with a very high reliability.

Case 2: Comparable case. Figure 5(b) shows the results for the comparable case. RAID-5 achieves very good reliability initially, but its reliability decreases dramatically as the SSDs wear down. Also, since all drives wear down at the same rate, the reliability of the array is about zero when all drives reach their erasure limits, i.e., when the system age is around 1.3×10^10 erasures. Diff-RAID shows different reliability dynamics. Initially, Diff-RAID has lower reliability than RAID-5, but its reliability drops much more slowly than that of RAID-5 as SSDs wear down. The reason is that because Diff-RAID has uneven parity placement, SSDs are worn out at different times and are replaced one by one. When the worn-out SSD is replaced, the other SSDs have performed fewer erase operations and have small error rates. This prevents the whole array from suffering a very large error rate as in RAID-5. Also, the reliability is higher when the parity distribution is more skewed (i.e., smaller σ), as also observed in [2].

Case 3: Recovery dominant case. Figure 5(c) shows the results for the recovery dominant case. RAID-5 shows high reliability in general. Between two replacements (which happen every 1.3×10^10 erasures), its reliability drops by within 3%. Its reliability drops slowly right after each replacement, and the drop rate increases as the array is close to being worn out. Diff-RAID shows higher reliability than RAID-5 in general, but the difference is small (e.g., less than 6% between Diff-RAID with σ = 1 and RAID-5). Therefore, in the recovery dominant scenario, we may deploy RAID-5 instead of Diff-RAID, as the latter introduces higher costs in parity redistribution at each replacement and has lower I/O throughput due to load imbalance of parities.

C. Impact of Different Array Configurations

We further study via our model how different array configurations affect the RAID reliability. We focus on Diff-RAID and generate the parity distribution pi's with σ = 1. Our goal is to validate the robustness of our model in characterizing the reliability for different array configurations.

Impact of N. Figure 6(a) shows the impact of the RAID size N. We fix the other parameters as in the comparable case, i.e., µ = 10^−3, c = 0.4×10^−13, and M = 10^4. The larger the system size, the lower the RAID reliability. Intuitively, the probability of having one more erroneous chunk in a stripe increases with the stripe width (i.e., N+1). Note that the reliability drop is significant when N increases. For example, at 2.6×10^10 erasures, the reliability drops from 0.7 to 0.2 when N increases from 9 to 19.

Impact of ECC. Figure 6(b) shows the impact of different ECC lengths. We fix µ = 10^−3, M = 10^4, and N = 9. We also fix the raw bit error rate (RBER) at 1.3×10^−6 [2], and compute the uncorrectable bit error rate using the formulas in [27]. Then, as described in Section V-A, we derive c for ECCs that can correct 3, 4, and 5 bits per 512-byte sector; the corresponding values are 4.4×10^−11, 4.7×10^−14, and 4.2×10^−17, respectively. We observe that the RAID reliability drops to zero very quickly for 3-bit ECC, at around 10^5 erasures, while the RAID reliability for 5-bit ECC does not start to decrease until the array performs 10^11 erasures. This shows that the RAID reliability heavily depends on the reliability of each single SSD, i.e., on the ECC length employed in each SSD.

Impact of M. Figure 6(c) shows the impact of the erasure limit M, or the endurance of a single SSD, on the RAID reliability. We fix the other parameters at µ = 10^−3, N = 9, and c = 0.4×10^−13. We observe that when M decreases, the RAID reliability increases. For example, at 1.3×10^10 erasures, the RAID reliability increases from 0.85 to 0.99 when M decreases from 10K to 1K. Recall that error rates increase with the number of erasures in SSDs; the increase of bit error rates is now capped by the smaller erasure limit. The trade-off is that the SSDs are worn out and replaced more frequently with smaller M.

Figure 5: Reliability dynamics of SSD arrays. (a) Error dominant case (c = 1.1×10^−13); (b) comparable case (c = 0.4×10^−13); (c) recovery dominant case (c = 0.1×10^−13). Each panel plots the reliability R(t) versus the system age (in 10^10) for Diff-RAID (σ = 1, 2, 5) and RAID-5.

Figure 6: Impact of different RAID configurations on the reliability. (a) Impact of N (N = 4, 9, 19); (b) impact of ECC length (ECC = 3, 4, 5); (c) impact of M (M = 10K, 5K, 1K).

D. Discussion

Our results provide several insights into constructing RAID for SSDs.

• The error dominant case may correspond to low-end MLC or TLC SSDs with high bit error rates, especially when these types of SSDs have low I/O bandwidth for RAID reconstruction. Both traditional RAID-5 and Diff-RAID show low reliability. A higher degree of fault tolerance (e.g., using RAID-6 or stronger ECC) becomes necessary in this case.

• When the error arrival and recovery rates are similar, Diff-RAID, with its uneven parity distribution, achieves higher reliability than RAID-5, especially since RAID-5 reaches zero reliability when all SSDs are worn out simultaneously. This conforms to the findings in [2].

• In the recovery dominant case, which may correspond to high-end single-level cell (SLC) SSDs that typically have very small bit error rates, RAID-5 achieves very high reliability. We may choose RAID-5 over Diff-RAID in RAID deployment to save the overhead of parity redistribution in Diff-RAID.

• Our model can effectively analyze the RAID reliability with regard to different RAID configurations.

VI. Related Work

There have been extensive studies on NAND flash-based SSDs. A detailed survey of the algorithms and data structures for flash memories is found in [11]. Recent papers empirically study the intrinsic characteristics of SSDs (e.g., [1], [5]), or develop analytical models for the write performance (e.g., [9], [15]) and garbage collection algorithms (e.g., [23]) of SSDs.

Bit error rates of SSDs are known to increase with the number of erase cycles [12], [27]. To improve reliability, prior studies propose to adopt RAID for SSDs at the device level [2], [16], [21], [22], [25], [30], or at the chip level [20]. These studies focus on developing new RAID schemes that improve the performance and endurance of SSDs over traditional RAID. The performance and reliability implications of RAID on SSDs are also experimentally studied in [19]. In contrast, our work focuses on quantifying the reliability dynamics of SSD RAID from a theoretical perspective. The authors of Diff-RAID [2] also attempt to quantify the reliability, but they only compute the reliability at the instants of SSD replacements, while our model captures the time-varying nature of error rates in SSDs and quantifies the instantaneous reliability during the whole lifespan of an SSD RAID array.

RAID was first introduced in [31] and has been widely used in many storage systems. Performance and reliability analysis of RAID in the context of hard disk drives has been extensively studied (e.g., see [4], [6], [24], [28], [35]). On the other hand, SSDs have a distinct property that their error rates increase as they wear down, so a new model is necessary to characterize the reliability of SSD RAID.

VII. Conclusions

We develop the first analytical model that quantifies the reliability dynamics of SSD RAID arrays. We build our model as a non-homogeneous continuous time Markov chain, and use uniformization to analyze the transient state of the RAID reliability. We validate the correctness of our model via trace-driven DiskSim simulations with SSD extensions.

One major application of our model is to characterize the reliability dynamics of general RAID schemes with different parity placement distributions. To demonstrate, we compare the reliability dynamics of the traditional RAID-5 scheme and the new Diff-RAID scheme under different error patterns and different array configurations. Our model provides a useful tool for system designers to understand the reliability of an SSD RAID array with regard to different scenarios.

REFERENCES

[1] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. Manasse, and R. Panigrahy. Design Tradeoffs for SSD Performance. In Proc. of USENIX ATC, Jun 2008.

[2] M. Balakrishnan, A. Kadav, V. Prabhakaran, and D. Malkhi. Differential RAID: Rethinking RAID for SSD Reliability. ACM Trans. on Storage, 6(2):4, Jul 2010.

[3] J. S. Bucy, J. Schindler, S. W. Schlosser, and G. R. Ganger. The DiskSim Simulation Environment Version 4.0 Reference Manual. Technical Report CMU-PDL-08-101, May 2008.

[4] W. Burkhard and J. Menon. Disk Array Storage System Reliability. In Proc. of IEEE FTCS, Jun 1993.

[5] F. Chen, D. A. Koufaty, and X. Zhang. Understanding Intrinsic Characteristics and System Implications of Flash Memory Based Solid State Drives. In Proc. of ACM SIGMETRICS, 2009.

[6] S. Chen and D. Towsley. A Performance Evaluation of RAID Architectures. IEEE Trans. on Computers, 45(10):1116–1130, 1996.

[7] E. de Souza e Silva and H. R. Gail. Transient Solutions for Markov Chains. In W. K. Grassmann, editor, Computational Probability, pages 43–81. Kluwer Academic Publishers, 2000.

[8] E. Deal. Trends in NAND Flash Memory Error Correction. http://www.cyclicdesign.com/whitepapers/CyclicDesign_NAND_ECC.pdf, Jun 2009.

[9] P. Desnoyers. Analytic Modeling of SSD Write Performance. In Proc. of SYSTOR, Jun 2012.

[10] R. Enderle. Revolution in January: EMC Brings Flash Drives into the Data Center. http://www.itbusinessedge.com/blogs/rob/?p=184, Jan 2008.

[11] E. Gal and S. Toledo. Algorithms and Data Structures for Flash Memories. ACM Comput. Surv., 37(2):138–163, 2005.

[12] L. M. Grupp, A. M. Caulfield, J. Coburn, S. Swanson, E. Yaakobi, P. H. Siegel, and J. K. Wolf. Characterizing Flash Memory: Anomalies, Observations, and Applications. In Proc. of IEEE/ACM MICRO, Dec 2009.

[13] L. M. Grupp, J. D. Davis, and S. Swanson. The Bleak Future of NAND Flash Memory. In Proc. of USENIX FAST, Feb 2012.

[14] K. Hess. 2011: Year of the SSD? http://www.datacenterknowledge.com/archives/2011/02/17/2011-year-of-the-ssd/, Feb 2011.

[15] X.-Y. Hu, E. Eleftheriou, R. Haas, I. Iliadis, and R. Pletka. Write Amplification Analysis in Flash-based Solid State Drives. In Proc. of SYSTOR, May 2009.

[16] S. Im and D. Shin. Flash-Aware RAID Techniques for Dependable and High-Performance Flash Memory SSD. IEEE Trans. on Computers, 60:80–92, Jan 2011.

[17] Intel. Intel Solid-State Drive 710: Endurance. Performance. Protection. http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-710-series.html.

[18] A. Jensen. Markoff Chains as an Aid in the Study of Markoff Processes. Scandinavian Actuarial Journal, 3:87–91, 1953.

[19] N. Jeremic, G. Muhl, A. Busse, and J. Richling. The Pitfalls of Deploying Solid-state Drive RAIDs. In Proc. of SYSTOR, 2011.

[20] J. Kim, J. Lee, J. Choi, D. Lee, and S. H. Noh. Enhancing SSD Reliability Through Efficient RAID Support. In Proc. of APSys, Jul 2012.

[21] S. Lee, B. Lee, K. Koh, and H. Bahn. A Lifespan-aware Reliability Scheme for RAID-based Flash Storage. In Proc. of ACM Symp. on Applied Computing (SAC), 2011.

[22] Y. Lee, S. Jung, and Y. H. Song. FRA: A Flash-aware Redundancy Array of Flash Storage Devices. In Proc. of ACM CODES+ISSS, Oct 2009.

[23] Y. Li, P. P. C. Lee, and J. C. S. Lui. Stochastic Modeling of Large-Scale Solid-State Storage Systems: Analysis, Design Tradeoffs and Optimization. In Proc. of ACM SIGMETRICS, 2013.

[24] M. Malhotra and K. S. Trivedi. Reliability Analysis of Redundant Arrays of Inexpensive Disks. J. Parallel Distrib. Comput., 17(1-2):146–151, Jan 1993.

[25] B. Mao, H. Jiang, S. Wu, L. Tian, D. Feng, J. Chen, and L. Zeng. HPDA: A Hybrid Parity-based Disk Array for Enhanced Performance and Reliability. ACM Trans. on Storage, 8(1):4, Feb 2012.

[26] M. Mariano. ECC Options for Improving NAND Flash Memory Reliability. http://www.micron.com/~/media/Documents/Products/Software%20Article/SWNL_implementing_ecc.pdf, Nov 2011.


[27] N. Mielke, T. Marquart, N. Wu, J. Kessenich, H. Belgal, E. Schares, F. Trivedi, E. Goodness, and L. Nevill. Bit Error Rate in NAND Flash Memories. In Proc. of IEEE Int. Reliability Physics Symp., Apr 2008.

[28] R. R. Muntz and J. C. S. Lui. Performance Analysis of Disk Arrays under Failure. In Proc. of VLDB, Aug 1990.

[29] D. Narayanan, E. Thereska, A. Donnelly, S. Elnikety, and A. Rowstron. Migrating Server Storage to SSDs: Analysis of Tradeoffs. In Proc. of ACM EuroSys, Mar 2009.

[30] K. Park, D.-H. Lee, Y. Woo, G. Lee, J.-H. Lee, and D.-H. Kim. Reliability and Performance Enhancement Technique for SSD Array Storage System Using RAID Mechanism. In Proc. of IEEE Int. Symp. on Comm. and Inform. Tech., 2009.

[31] D. A. Patterson, G. Gibson, and R. H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). In Proc. of ACM SIGMOD, Jun 1988.

[32] A. Reibman and K. S. Trivedi. Transient Analysis of Cumulative Measures of Markov Model Behavior. Communications in Statistics - Stochastic Models, 5:683–710, 1989.

[33] M. Schulze, G. Gibson, R. Katz, and D. Patterson. How Reliable Is A RAID? In IEEE Computer Society International Conference: Intellectual Leverage, Digest of Papers, 1989.

[34] W. Weibull. A Statistical Distribution Function of Wide Applicability. J. of Applied Mechanics, 18:293–297, 1951.

[35] X. Wu, J. Li, and H. Kameda. Reliability Analysis of Disk Array Organizations by Considering Uncorrectable Bit Errors. In Proc. of IEEE SRDS, Oct 1997.

APPENDIX

A. Proof of Theorem 1 in Section III-A

The computation of the system state in Equation (14) is intuitive since the truncation point is $U_l$ in the interval $(lsT, (l+1)sT)$. In the following, we focus on the derivation of the error bound. Note that $\pi((l+1)sT)$ is the system state at time $(l+1)sT$ for the CTMC $\{X(t)\}$. Moreover, given the state at time $lsT$, $\pi((l+1)sT)$ is computed iteratively by computing $\pi((ls+1)T), \pi((ls+2)T), \ldots, \pi((ls+s)T)$ sequentially. During each step, e.g., deriving $\pi((k+1)T)$ from $\pi(kT)$ ($ls \le k < (l+1)s$), uniformization is used. Without loss of generality, we can let $\Lambda_k = \Lambda_l$ ($ls \le k < (l+1)s$), as $\Lambda_l \ge \max_{0 \le i \le S+1} |-q_{i,i}(k)|$ for all $k$ ($ls \le k < (l+1)s$). Since $Q_k$ denotes the generator matrix of the homogeneous CTMC $\{X(t), kT < t \le (k+1)T\}$, to apply uniformization we let $P_k = I + Q_k/\Lambda_l$ ($ls \le k < (l+1)s$). Since every element of $P_k$ is a linear function of $k$, the difference between two consecutive matrices, $P_{k+1} - P_k$, must be the same for all $k$, and we denote it by $D$. Formally, we have

$$D = P_{k+1} - P_k, \quad ls \le k < (l+1)s. \qquad (19)$$

Now, we can easily find that $P_k = P_{ls} + (k - ls)D$ ($ls \le k \le (l+1)s - 1$). Moreover, since $\bar{P}_l = I + \bar{Q}_l/\Lambda_l$ and $\bar{Q}_l$ is defined as $\frac{1}{s} \sum_{k=ls}^{(l+1)s-1} Q_k$ in Equation (11), we have

$$\bar{P}_l = \frac{\sum_{k=ls}^{(l+1)s-1} P_k}{s} = \frac{\sum_{k=ls}^{(l+1)s-1} \left( P_{ls} + (k - ls)D \right)}{s} = P_{ls} + \frac{s-1}{2} D. \qquad (20)$$
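As a quick numerical sanity check of Equation (20), the following sketch (Python with NumPy; the matrix sizes and values are made up for illustration and are not from the model) verifies that averaging $s$ matrices that vary linearly in $k$ yields $P_{ls} + \frac{s-1}{2}D$:

```python
import numpy as np

# Check: if P_k = P_ls + (k - ls) * D for k = ls, ..., (l+1)s - 1, then the
# average of the s matrices is P_ls + ((s - 1)/2) * D, because
# 0 + 1 + ... + (s - 1) = s(s - 1)/2.
rng = np.random.default_rng(1)
dim, s, ls = 4, 8, 16                 # made-up sizes, not from the paper
P_ls = rng.random((dim, dim))
D = 1e-3 * rng.random((dim, dim))     # small perturbation, like P_{k+1} - P_k

P_bar = sum(P_ls + (k - ls) * D for k in range(ls, ls + s)) / s
assert np.allclose(P_bar, P_ls + (s - 1) / 2 * D)
```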

Note that, based on the analysis of $\{X(t), kT < t \le (k+1)T\}$ using uniformization, $\pi((k+1)T)$ ($ls \le k < (l+1)s$) can be rewritten as follows:

$$\pi((k+1)T) = \pi(kT)\, e^{-\Lambda_l T} e^{\Lambda_l T P_k} = \pi(kT)\, e^{-\Lambda_l T} e^{\Lambda_l T (P_{ls} + (k - ls)D)}.$$

Observe that most elements in the difference matrix $D$ are zero, and the non-zero elements are all very small. By examining the elements of $DP_{ls}$ and of $P_{ls}D$, we find that the multiplication of $D$ and $P_{ls}$ can be assumed to be commutative, i.e., $DP_{ls} \approx P_{ls}D$. Therefore, we have

$$\pi((l+1)sT) \approx \pi(lsT)\, e^{-\Lambda_l sT} e^{\Lambda_l T \sum_{k=ls}^{(l+1)s-1} P_k} = \pi(lsT)\, e^{-\Lambda_l sT} e^{\Lambda_l T s \bar{P}_l}.$$

Now, the upper bound on the error $\hat{\epsilon}_l$ is derived as follows:

$$\begin{aligned}
\hat{\epsilon}_l &= \left\| \hat{\pi}((l+1)sT) - \pi((l+1)sT) \right\|_1 \\
&= \Bigl\| \hat{\pi}(lsT) \sum_{n=0}^{U_l} e^{-\Lambda_l sT} \frac{(\Lambda_l sT)^n}{n!} \bar{P}_l^{\,n} - \pi(lsT)\, e^{-\Lambda_l sT} e^{\Lambda_l T s \bar{P}_l} \Bigr\|_1 \\
&\le \left\| \hat{\pi}(lsT)\, e^{-\Lambda_l sT} e^{\Lambda_l sT \bar{P}_l} - \pi(lsT)\, e^{-\Lambda_l sT} e^{\Lambda_l T s \bar{P}_l} \right\|_1 + \Bigl\| \hat{\pi}(lsT) \sum_{n=U_l+1}^{\infty} e^{-\Lambda_l sT} \frac{(\Lambda_l sT)^n}{n!} \bar{P}_l^{\,n} \Bigr\|_1 \\
&\le \left\| \hat{\pi}(lsT) - \pi(lsT) \right\|_1 e^{-\Lambda_l sT} e^{\Lambda_l sT \|\bar{P}_l\|_\infty} + \left( 1 - \sum_{n=0}^{U_l} e^{-\Lambda_l sT} \frac{(\Lambda_l sT)^n}{n!} \right) \\
&= \hat{\epsilon}_{l-1} + \left( 1 - \sum_{n=0}^{U_l} e^{-\Lambda_l sT} \frac{(\Lambda_l sT)^n}{n!} \right).
\end{aligned}$$

The last equality follows from the fact that $\|\bar{P}_l\|_\infty = 1$, as $\bar{P}_l = I + \bar{Q}_l/\Lambda_l$, and $\hat{\epsilon}_{l-1} = \|\hat{\pi}(lsT) - \pi(lsT)\|_1$. Therefore, we have the results stated in Theorem 1.
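In practice, the theorem also suggests how to choose the truncation point: pick the smallest $U_l$ whose Poisson tail mass falls below a per-interval error budget, so that the accumulated error after $L$ intervals is at most $L$ times the budget. A minimal sketch (Python; an illustrative helper of ours, assuming $\Lambda_l sT$ is moderate so that $e^{-\Lambda_l sT}$ does not underflow):

```python
import math

def truncation_point(Lam, sT, tol):
    """Smallest U with 1 - sum_{n=0}^{U} e^{-Lam*sT} (Lam*sT)^n / n! <= tol.
    By the bound above, each interval then adds at most tol to the error."""
    rate = Lam * sT
    term = math.exp(-rate)    # Poisson probability at n = 0
    cdf, n = term, 0
    while 1.0 - cdf > tol:
        n += 1
        term *= rate / n      # Poisson probability at n
        cdf += term
    return n
```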