hard disk drive reliability modeling and failure prediction



3676 IEEE TRANSACTIONS ON MAGNETICS, VOL. 43, NO. 9, SEPTEMBER 2007

Hard Disk Drive Reliability Modeling and Failure Prediction

Brian D. Strom1, SungChang Lee1, George W. Tyndall1, and Andrei Khurshudov2

1Samsung Information Systems America, San Jose, CA 95134 USA
2Seagate Technology, Longmont, CO 80503 USA

A reliability model for the hard disk drive (HDD) is developed, focusing on head–disk separation as the primary independent variable. The model is structured to incorporate the theoretical effects of environmental factors, plus empirical dependence on the product operating mode. An experimental method based on magnetic spacing loss theory is used to characterize the head–media separation as a function of temperature, altitude, humidity, and HDD operating mode. A statistical model based on these empirical data is developed to predict HDD reliability for various operating conditions. The predictions of the model are verified experimentally through comparison with HDD product reliability test data.

Index Terms—Failure prediction, HDD, head–disk clearance, magnetic hard disk drive, reliability model.

I. INTRODUCTION

THE hard disk drive (HDD) is a highly complex, mass-produced, electro-mechanical device that utilizes principles of magnetic recording for data storage. As fundamental elements of modern computer systems and consumer electronic devices, HDDs have managed to combine a steady increase in storage density and capacity with a concomitant decrease in the cost per megabyte. The HDD combines the most recent achievements in the science and technology of magnetic recording, material science, and digital signal processing.

The slider, read/write heads, and magnetic recording media are the key components of the HDD and together form the head–disk interface (HDI), illustrated in Fig. 1. The heads are integrated into the ceramic slider, which includes an air-bearing surface (ABS) formed in relief on its surface facing the disk [Fig. 1(a)]. Air entrained between the ABS and disk generates lift by virtue of the viscous properties of air being squeezed through the gap. The flow of air is guided by the ABS to control the head–media spacing within close tolerance (0.1 nm).

The separation distance between read/write elements and the disk directly affects both signal strength and resolution, and is therefore critical to the recording density of the HDD. As magnetic recording densities increase, the magnetic spacing and hence the head flying height must decrease correspondingly. Today, only a few nanometers separate the slider from the disk surface moving at 30 m/s.

A. Definition of Head–Disk Clearance

We define clearance as the difference between the flying height (FH) and the glide height avalanche (GHA), which is the level of the highest detected disk asperities. The distribution of flying height in a slider population is typically normal, characterized by its mean and standard deviation. In a well-designed interface, the lowest-flying slider of the total population will fly significantly above the GHA, eliminating the possibility that any heads drag on the disk at high speed. Fig. 2(A) illustrates such a condition for an exemplary product operating at room temperature: it has mean clearance 5.3 nm and standard deviation 0.8 nm, thus operating at a 6-sigma level ensuring positive clearance.

Fig. 1. Schematic diagrams of the head–disk interface. (a) The slider contains the read/write heads and is positioned on the disk by its suspension. The slider also includes the air-bearing surface (ABS) facing the disk. (b) The slider in side view, flying over a cross section of the disk surface (roughness exaggerated). Recording transducers write to and read from the disk magnetic layer, while carbon overcoat (COC), lubricant layers, and surface roughness limit how close to the disk the recording elements can fly.

Digital Object Identifier 10.1109/TMAG.2007.902969

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

The demand for HDD products with higher areal densities, coupled with the cost pressures of a mass-production environment where inspection of all incoming components is not practical, can result in HDD product populations having some finite “interference” of the FH distribution with the GHA. This interference can be exacerbated by the environmental conditions in which the HDD is operated. In Fig. 2(B), we illustrate the case of “minor interference” where the tail of the flying height distribution overlaps the disk GHA level. Fig. 2(C) shows the case of major interference, in which exactly 50% of the heads in the population interfere with the disk, that is, their clearance as defined previously is less than zero. We describe below why in this case as many as 50% of the population will eventually fail.

Fig. 2. Schematic representation of (A) the initial clearance distribution in a well-designed interface. The mean clearance of the entire slider population is denoted as z and is defined as: z = FH − GHA. (B), (C) Distributions with higher probability of head–disk interference. The hatched regions represent the portion of the population in contact with the disk asperities.

0018-9464/$25.00 © 2007 IEEE

B. Clearance as a Critical Factor for HDD Reliability

Experience has shown that a conventional HDI will fail if clearance is less than zero. Attempts to eliminate the head–media spacing through the design of contact recording systems have not emerged from the laboratory as products, because of both performance shortfalls and reliability concerns [1]–[5]. A robust, long-lived HDI requires sufficient clearance to prevent the heads and slider from contacting the media during operation.

The hatched regions of the distributions in Fig. 2(B) and (C) represent the portion of the HDI population having negative clearance. The heads in this portion of the population are rubbing against the disk and will likely fail. Following this failure criterion, Fig. 3 summarizes the relationship between the normalized clearance and the cumulative failure rate of the HDD population. As the clearance decreases, the rate of cumulative failures increases rapidly. Also, for a given FH probability density function, the cumulative failure rate increases with increasing number of interfaces, following the equation $F = 1 - (1 - p)^N$, where $N$ is the number of interfaces and $p$ is the probability of one interface failing. Increasing the number of interfaces is a common way to increase the HDD capacity. This analysis illustrates how rapidly the HDD failure rate can increase with decreasing clearance.
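The relationship summarized in Fig. 3 can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes a normal clearance distribution, the failure criterion clearance < 0, and independent interfaces within a drive. The function names are hypothetical.

```python
import math

def interface_failure_probability(mean_clearance, sigma):
    """P(clearance < 0) for a normally distributed clearance (assumed model)."""
    # Normal CDF evaluated at 0: Phi(-mean/sigma) = 0.5 * erfc(mean / (sigma * sqrt(2)))
    return 0.5 * math.erfc(mean_clearance / (sigma * math.sqrt(2.0)))

def drive_failure_probability(p_interface, n_interfaces):
    """Probability that at least one of N independent interfaces fails."""
    return 1.0 - (1.0 - p_interface) ** n_interfaces

if __name__ == "__main__":
    # Sweep the normalized mean clearance z/sigma for drives with 1 to 6 interfaces.
    for z_over_sigma in (3.0, 2.0, 1.0, 0.0):
        p = interface_failure_probability(z_over_sigma, 1.0)
        rates = [drive_failure_probability(p, n) for n in range(1, 7)]
        print(z_over_sigma, ["%.4f" % r for r in rates])
```

At zero mean clearance each interface fails with probability 0.5, so a two-interface drive fails with probability 0.75 under these assumptions, which shows how quickly added interfaces compound the failure rate.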

Moreover, even as recording densities have increased at an average annual rate of 60% [1], the requirements for HDD reliability, such as the Mean Time to Failure (MTTF), increase as a result of the competitive pressure from inside the disk drive industry and from other data storage technologies, such as flash memory. For example, HDD designers have managed to increase MTTF ratings from 100 000 h to over 1 000 000 h in the span of the last 10 to 15 years (Fig. 4, [6]).

It has proven extremely challenging to simultaneously reduce the head flying height (and clearance) while enhancing the reliability of the head–disk interface. Looking forward, this challenge will become more severe as the HDI is exposed to more contact than before and the thickness of the protective films in the HDI is minimized. Continued progress will be facilitated by increased understanding of the factors that impact slider–disk clearance and by our ability to manipulate these factors.

Fig. 3. Cumulative failure rates calculated for HDDs having 1–6 HDIs as a function of mean clearance z, assuming the flying height distribution is normal with standard deviation σ. The failure criterion is: clearance < 0.

Fig. 4. HDD manufacturer’s MTTF specifications [1].

II. THEORETICAL CONSIDERATIONS

Over the past 20 years of HDI technology development, clearance, surface roughness, film thicknesses, and applied loads have all diminished considerably. Consequently, the significance of nanometer scale object dimensions and intermolecular forces has increased, and the theory of HDI behavior has evolved accordingly. Significant factors that impact the HDI clearance and reliability include properties of: a) the disk, such as roughness and lubricant; b) the head, such as dynamic pitch; as well as c) the environment.

A. Disk Factors

Disk roughness can affect clearance through two different mechanisms. First, as is apparent from the illustration in Fig. 1(b), the slider must fly higher than the ridges and asperities on the rough disk to maintain positive clearance. In this regard, higher disk roughness produces a higher glide height avalanche (GHA) and reduces clearance. This consideration has driven technology for ever-smoother disk surfaces over the past 20 years.

Secondly, disk roughness can significantly affect the clearance by attenuating the intermolecular forces (IMF) between the slider and disk. When exposed to the IMF, the slider flies lower and the clearance is reduced. The non-retarded, long-range IMF (predominantly the van der Waals forces) grow as the inverse cube of the separation distance. It follows that increasing the surface roughness results in an overall decrease in the IMF.

Thus, experience of a decade ago showed that lower roughness produced higher clearance because the GHA was reduced. But at today’s slider–disk clearance, the influence of IMF is more significant, so that lowering roughness increases the effective interaction area, thereby strengthening IMF and decreasing clearance [7].

The lubricant film is another important component of disk design, and has become increasingly important as the clearance has been reduced to a few nanometers. This film is typically the weakest mechanical element in the HDI, in that it moves under the influence of air-bearing pressure, disjoining pressure, and intermolecular forces.

In the conventional HDI, peak air-bearing pressures can reach 20 atm, and the lubricant tends to be squeezed thinner as a consequence. The lubricant is displaced until the disjoining pressure, which resists film thinning, comes into hydrostatic equilibrium with the ABS pressure [8]. This lubricant thinning has a relatively minor effect on clearance.

More significant effects on clearance are produced by intermolecular forces acting between the slider and disk, which weaken the adhesion of the lubricant to the disk [9]. Under this effect, the lubricant expands towards the slider and thus reduces the slider–disk clearance. (Should this effect be sufficiently large, the lubricant “jumps” between the disk and slider surfaces, thereby momentarily reducing the effective clearance to zero. This “soft” contact can cause errors in writing and reading data, or even failure of the HDD.) The magnitude of this effect depends on the initial clearance as well as the lubricant structure, thickness, and molecular weight. For example, since the lubricant disjoining pressure decreases with increasing thickness [9], the IMF generated between the flying head and rotating disk will have a larger effect on thicker lubricant films compared to thinner ones. Experiments involving the common lubricant Zdol have also shown that clearance decreased as molecular weight increased, approximately as the square root of the lubricant molecular weight [10], [11]. The longer the lubricant molecule, the greater its vertical expansion is when affected by IMF. In summary, clearance will generally be enhanced by the use of a thin, low molecular weight lubricant film.

B. Head Factors

As described in the discussion of disk roughness above, the roughness of the slider affects clearance through two competing mechanisms. For the conventional HDI, the effects of IMF are most significant, so increased roughness reduces interaction forces between the slider and disk with consequent benefits for HDI reliability [12].

The air bearing is a critical technology for maintaining consistent clearance throughout the HDD operating life, despite rapid changes in radial position on the disk, changing environmental factors, and manufacturing tolerances. One consideration in the design of air bearings is the dynamic pitch angle, defined as the angle between the plane of the disk and the flying slider surface. The pitch angle is largely responsible for establishing the air squeeze film which produces lift. Recently, because intermolecular forces have become stronger at lower clearance, these forces are an important consideration in the design of pitch angle. At lower pitch angles, a greater portion of the slider is close enough to the disk for IMF to be significant, and clearance is consequently reduced. Conversely, higher pitch angles can increase the effective slider–disk clearance. Larger pitch angles have consequently been associated with greater reliability through on-track durability tests [13].

C. Environmental Factors

Temperature, air pressure, and humidity are the environmental parameters that commonly affect HDI clearance. Temperature affects the shape of the slider and disk as a consequence of thermal expansion of materials. The viscosity and density of air, and the energetics of the lubricant, are also affected by temperature. The typical net effect is a reduction of clearance with increasing temperature [14].

Significant changes in air pressure most commonly occur because of a change in altitude. Air entrained between the head and disk generates lift that keeps the head and disk separated (Fig. 1). If ambient pressure decreases, then air density decreases, and the lift provided by the entrained air weakens. Clearance consequently decreases, creating a thinner squeeze film to compensate for the lower ambient air density.

The Role of Water: HDD reliability can be severely reduced when operated under conditions of high humidity and high temperature. As an illustration of this effect, Fig. 5 shows the cumulative failure rates for a HDD product as a function of relative humidity (RH) at high temperature. The reason for increased failure rate at high RH has in the past been attributed to contact tribology issues associated with starting and stopping the HDD [15], [16]. But the reliability of today’s HDD products is significantly affected by RH even when operated continuously. As described below, the increased failure rate with RH results because water vapor is largely ineffective at supporting the air bearing.


Fig. 5. Cumulative failure rates as they depend on humidity, at 70 °C temperature, shown for two groups of HDDs tested on different occasions. The number of samples was 7 (8) for squares (circles). The failure mode in each case was a data error.

Relative humidity is a measure of the water concentration in the air at a given temperature. Mathematically, we can express the relative humidity (RH) as

$\mathrm{RH} = \dfrac{p_w}{p_{sat}} \times 100\%$ (1)

where $p_w$ is the partial pressure of water vapor and $p_{sat}$ is the saturation vapor pressure. The saturation vapor pressure is the maximum partial pressure of water vapor that can be supported by the air at a given temperature. An empirical expression relating these two quantities can be written as [17]:

(2)

where $p_{sat}$ is given in units of atmospheres, and $T$ is the temperature in degrees Celsius. Inspection of this equation, a psychrometric chart, or chemical table [18] reveals that the vapor pressure of water in saturated air increases dramatically with temperature. By reference to (1), the partial pressure of water at a given RH shows a similar sensitivity to temperature.

Should the water partial pressure exceed the saturation vapor pressure, then by definition there is more water in the air than is thermodynamically stable, and water condenses.

We have recently found that the water vapor in a HDD is routinely compressed beyond the saturation vapor pressure when subjected to the high compression in the squeeze film of the air bearing (10× compression is typical). Consider, for example, a HDD operating at ambient pressure (1 atm), 50 °C, and 50% RH. The saturation vapor pressure of water at 50 °C is approximately 0.12 atm. At 50% RH, the partial pressure of water will be approximately 0.5 × 0.12 = 0.06 atm. Following Dalton’s Law, the total pressure can be written in terms of the partial pressures of all the gaseous constituents [16]:

$P_{total} = p_{N_2} + p_{O_2} + p_{H_2O}$ (3)

TABLE I
APPROXIMATE PRESSURES (IN atm) OF GASEOUS COMPONENTS: (A) INSIDE THE DRIVE INTERIOR, (B) FOLLOWING COMPRESSION UNDER THE REAR PAD OF FLYING SLIDER, AND (C) AFTER COALESCENCE OF THE WATER VAPOR DUE TO SUPERSATURATION

If we approximate dry air to be comprised of 80% nitrogen and 20% oxygen, then we have the partial pressures shown in Table I.

Considering a 10× compression typical for a common air bearing, and assuming the saturation vapor pressure at a given temperature is independent of the external pressure, it follows that the water vapor in the compression zone becomes supersaturated. Water molecules flowing under the slider are therefore thermodynamically driven to coalesce until the partial pressure of water in the compression zone is reduced to the saturation vapor pressure. In the current example, coalescence of the water vapor results in a 5% drop in the total pressure (Table I).

This condensation process results in lower clearance, according to details which are published separately [19], [20].
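The arithmetic behind the three stages named in the Table I caption can be sketched as follows. This is an illustrative calculation rather than the authors' table: it assumes a saturation vapor pressure of about 0.12 atm at 50 °C (a steam-table value), the 80%/20% dry-air split stated in the text, and a 10× compression. The function name is hypothetical.

```python
def partial_pressures(p_total=1.0, rh=0.50, p_sat=0.12, compression=10.0):
    """Partial pressures (atm) in the drive, after compression, and after coalescence.

    p_sat = 0.12 atm is an assumed steam-table value for water at 50 C.
    """
    p_water = rh * p_sat                      # definition of relative humidity
    p_dry = p_total - p_water                 # Dalton's law: the remainder is dry air
    interior = {"N2": 0.8 * p_dry, "O2": 0.2 * p_dry, "H2O": p_water}

    # Every component is compressed by the same factor under the slider.
    compressed = {gas: compression * p for gas, p in interior.items()}

    # Supersaturated water coalesces until its partial pressure falls to p_sat.
    coalesced = dict(compressed)
    coalesced["H2O"] = min(compressed["H2O"], p_sat)
    return interior, compressed, coalesced

if __name__ == "__main__":
    interior, compressed, coalesced = partial_pressures()
    before = sum(compressed.values())
    after = sum(coalesced.values())
    print("interior  :", interior)
    print("compressed:", compressed)
    print("coalesced :", coalesced)
    print("relative pressure drop: %.1f%%" % (100.0 * (before - after) / before))
```

Under these assumptions the water partial pressure falls from about 0.6 atm to 0.12 atm in the compression zone, a loss of roughly 0.48 atm out of 10 atm, which is consistent with the approximately 5% drop quoted in the text.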

III. EXPERIMENTAL PROCEDURE

The theoretical considerations described above predict the effects of HDI design and environment on HDD reliability, through their effects on clearance. Many of these theoretical relationships have been established through careful study of the behavior of HDI components, or of model systems, not of complete HDDs. To verify the validity of these predictions, we measure the product clearance and reliability independently and on a statistically significant number of HDD samples. The discussion below concerns the methods used to collect these data.

A. Clearance Measurements

Several methods have been proposed to measure clearance [13], [21]–[25]. Some require an external sensor such as an acoustic emission sensor; others employ transducers internal to the HDD. Our chosen method bears some similarity to that employed in [13], and uses an internal sensor (the data reader) and an external instrument for signal processing (a digital oscilloscope).

Our HDD clearance measurement system, called the Altitude Clearance Tester, includes the following hardware: a general-purpose computer with ATA interface card, a digital oscilloscope, differential probe, vacuum chamber with controller, and the HDD sample itself (Fig. 6). Most of the software controlling the HDD, oscilloscope, and vacuum chamber controller resides on the computer. Communication with the HDD is accomplished through its ATA interface, while communication with the oscilloscope and vacuum controller is accomplished through TCP/IP and RS-232 interfaces, respectively.

Fig. 6. Diagram of altitude clearance tester.

The process flow for the Altitude Clearance Test is shown in Fig. 7. First, the temperature and humidity conditions of interest are established in the chamber containing the HDD. The clearance measurements proceed by first writing a regular, short-wavelength (about 0.15 µm) data pattern to the media using a specified head at the radial location of interest. The head’s reader is then positioned over this track, and the read-back amplitude is probed directly downstream from the HDD’s preamplifier and processed by the oscilloscope. Subsequent measurements are conducted at successively lower air pressure. The air-bearing response to decreasing air pressure results in lower clearance following the mechanism described in Section II-C. This smaller head–disk spacing is characterized by higher signal amplitude in accordance with the Wallace spacing loss equation [20]

$A = A_0 \exp\left(-\dfrac{2\pi d}{\lambda}\right)$ (4)

where $d$ is the head–media spacing, $\lambda$ is the wavelength of the recorded data pattern, $A$ is the signal amplitude, and $A_0$ is a constant value that depends on static properties of the recording system. Applying (4) to two operating conditions and computing the difference in head–media spacing for each leads to the relation

$d_2 - d_1 = \dfrac{\lambda}{2\pi} \ln\left(\dfrac{A_1}{A_2}\right)$ (5)

which allows us to compute the change in spacing between any two pressures.
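The conversion from read-back amplitude to spacing change can be sketched in a few lines. This is a minimal illustration of the standard Wallace spacing loss relation, with hypothetical function names and made-up amplitude values; it is not the tester's actual software.

```python
import math

def amplitude(spacing_nm, wavelength_nm, a0=1.0):
    """Wallace spacing loss: A = A0 * exp(-2*pi*d / lambda)."""
    return a0 * math.exp(-2.0 * math.pi * spacing_nm / wavelength_nm)

def spacing_change_nm(amp_1, amp_2, wavelength_nm):
    """Spacing change d2 - d1 from amplitudes measured at conditions 1 and 2.

    A negative result means the head moved closer to the media.
    """
    return (wavelength_nm / (2.0 * math.pi)) * math.log(amp_1 / amp_2)

if __name__ == "__main__":
    wavelength = 150.0  # nm, the approximate pattern wavelength quoted in the text
    # Hypothetical readings: amplitude rises 10% when the chamber pressure is lowered.
    print("spacing change: %.2f nm" % spacing_change_nm(1.00, 1.10, wavelength))
```

Because only the amplitude ratio enters the relation, the constant $A_0$ cancels and the measurement yields spacing changes relative to a reference condition rather than absolute spacing.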

Fig. 8 shows a typical pattern of head–media spacing changes in response to pressure changes for one HDD having six heads. The magnetic spacing decreases from its zero reference at 101.3 kPa air pressure to as low as −6 nm, at which point the heads contact the disk and the signal cannot be properly resolved. From this graph we obtain the clearance altitude sensitivity (0.13 nm/kPa in this case) as well as the total clearance to disk contact (5–6 nm, depending on the head).

We measure the effect of temperature on clearance by repeated measurements of total clearance as described above but at several temperatures. Fig. 9 shows the temperature sensitivity of total clearance for several heads from similar HDDs (−0.035 nm/°C). Many temperature effects combine to produce this overall result: lubricant response, slider deformation, disk distortion, etc.

Fig. 7. Sequence of events in altitude clearance test.

Fig. 8. Pressure (altitude) sensitivity of clearance showing average sensitivity a = 0.13 nm/kPa and total clearance 5–6 nm.

Fig. 9. Clearance sensitivity to temperature measured on 18 HDIs showing average sensitivity b = −0.035 nm/°C.

B. HDD Reliability Tests

Several HDDs are installed in an environmental chamber, each HDD being connected to a central controller providing both power and data communication. For this study, two test sequences were used to characterize the reliability of a HDD sample population: the Operational Altitude Test and the CSS test.

The Operational Altitude Test comprises a sequence of environmental conditions and operational tasks described in Fig. 10. A sample population of HDDs is subjected to ever lower air pressure (thus simulating higher altitudes) while continuously reading and writing at randomly chosen portions of the HDD’s storage capacity. As the environment steps to higher altitude levels, clearance decreases, increasing the likelihood of failure for HDIs having the lowest clearance in the population. Typically, an increasing number of HDDs fail as altitude increases, as described in Section II-C, and as illustrated in Fig. 16.

Fig. 10. Operational altitude test profile. Drives spend 7 h at each stage before progressing to the next.

Fig. 11. 50 k CSS test under 70 °C/80% RH condition. Total test time at 70 °C, 80% RH is 14 days. Preceding and succeeding operations each last several hours.

The Contact Start Stop (CSS) test, described schematically in Fig. 11, includes power cycle stress as well as sequential read and write tasks. After first collecting performance data at moderate environmental conditions, the environment is changed to a more extreme condition (e.g., 70 °C, 80% RH) for extended power cycling interrupted by occasional read and write tasks. In this test, stress remains constant over time; HDI failure occurs as a consequence of extended exposure to the stress.

IV. A MODEL FOR HDD RELIABILITY

Our primary objective is to develop a statistical model of the HDD reliability under the influence of all significant environmental and operational parameters. The benefits of such a model are: improved diagnosis of product reliability problems, improved capability to design for reliability using conventional technology, and the ability to evaluate candidate HDD technologies for improved reliability through a simulated technology integration.

A. Structure of the Model

Since clearance is the key factor determining the mechanical reliability of the HDI, we place population clearance statistics at the center of the model. We begin by specifying an initial clearance distribution for the population, characterized for fixed, moderate environment and operating conditions. This initial clearance distribution is represented by sample population data of the type shown in Fig. 12.

Fig. 12. Anderson–Darling normal probability plot of initial clearance distribution of the target product, measured while operating in read mode at 25 °C, 20% RH. Insert shows clearance mean (5.4 nm) and standard deviation (0.6 nm).

Each member of the HDD population operating under these conditions is characterized by an initial clearance $z_0$. Any deviation in environment or operating condition from its initial state results in a new clearance, $z$, such that

(6)

where $T$, $P$, and $p_w$ are temperature, total air pressure, and water partial pressure, respectively, and $f$ is a function of the HDD operating condition. The sensitivities $a$ and $b$ are determined from controlled experiments (described in Section III-A) as represented in Figs. 8 and 9. Similarly, the details of $f$ are defined through experiments on a sample HDD population under controlled conditions, for example to include the effects of operating in write mode versus read mode.

Each parameter and sensitivity in (6) is associated with a distribution of values, because the response of each head–disk interface is slightly different. For example, the distribution of temperature sensitivity of our six-headed HDD has mean value −0.035 nm/°C and standard deviation 0.014 nm/°C, assuming a normal distribution (Fig. 9). To determine the population response to a given input change in environment or operating condition, the initial clearance distributions and distributions of clearance changes are combined by the Monte Carlo method. The new clearance distribution thus calculated represents the HDD population operating at the new, fixed condition, or distribution of conditions (Fig. 13).

Table II and Fig. 14 show the effect of selected changes to environment on the population clearance distribution. Each change both broadens the distribution and shifts its mean to lower values.
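The Monte Carlo combination described above can be sketched as follows. This is a simplified illustration, not the authors' model: it keeps only the temperature and pressure terms, assumes they add linearly to the initial clearance, and omits the humidity and operating-mode terms. The means of the initial clearance and of the sensitivities are taken from the text and figure captions; the 0.02 nm/kPa spread of the pressure sensitivity and the 69.7 kPa standard-atmosphere pressure at 10 000 ft are assumptions.

```python
import random
import statistics

def simulate_clearance(delta_t_c, delta_p_kpa, n=2000, seed=0):
    """Sample the population clearance (nm) after a change in temperature and pressure."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z0 = rng.gauss(5.41, 0.62)     # initial clearance, nm (ambient values, Fig. 14)
        b = rng.gauss(-0.035, 0.014)   # temperature sensitivity, nm per deg C (Fig. 9)
        a = rng.gauss(0.13, 0.02)      # pressure sensitivity, nm/kPa; spread is assumed
        samples.append(z0 + b * delta_t_c + a * delta_p_kpa)
    return samples

if __name__ == "__main__":
    d_temp = 70.0 - 25.0    # deg C
    d_pres = 69.7 - 101.3   # kPa, sea level to about 10 000 ft (assumed value)
    cases = {
        "Ambient": (0.0, 0.0),
        "Temp": (d_temp, 0.0),
        "Altitude": (0.0, d_pres),
        "Temp & Altitude": (d_temp, d_pres),
    }
    for name, (dt, dp) in cases.items():
        z = simulate_clearance(dt, dp)
        print("%-16s mean %.2f nm, std %.2f nm" % (name, statistics.mean(z), statistics.stdev(z)))
```

Because each sensitivity is itself a random variable, multiplying it by a larger environmental change widens the resulting distribution as well as shifting its mean, which is the broadening behavior noted in the text.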

B. Failure Prediction

As described in Section I-B, it stands to reason that an HDI operating at negative clearance will eventually fail. Indeed, the HDI may fail if it should operate at small, positive clearance values, because of intermittent disturbances in the HDI that cause contact when clearance is small. Such disturbances could include operating vibration, lubricant transfer between slider and disk, or others. The threshold for failure will depend on the design of the head–disk interface, and may also depend on environmental conditions. Ultimately, the failure threshold must be determined by reconciling the model results with HDD reliability test data.

Fig. 13. Graphical representation of the model structure, showing the distribution of sensitivities combined through (6) to predict clearance distributions at any operating condition.

TABLE II
EFFECT OF SELECTED PARAMETERS ON FLYING CLEARANCE STATISTICS, FURTHER ILLUSTRATED IN FIG. 14

Fig. 14. Effect of temperature and altitude changes on flying clearance distributions, as predicted by the model. “Ambient” is initial clearance distribution at 25 °C, sea level; temperature and altitude changes are to 70 °C, 10 000 ft. For each condition, (Mean, Standard Deviation) are: Ambient (5.41, 0.62), Temp (3.83, 0.91), Altitude (1.31, 0.92), Temp & Altitude (−0.27, 1.13). The population comprised 2 000 samples.

For that purpose we consider reliability test data acquired through the Operational Altitude Test (described in Section III-B), conducted at different temperatures. We apply our model to the same environmental conditions as used in the test and determine the clearance level associated with failure.

Considering the tests conducted at 25 °C, we evaluate three candidate values of the clearance limit, i.e., the minimum clearance value for reliable operation. Compared to the experimental data plotted in Fig. 15, it appears the best fit is obtained if we assume model HDDs fail at a clearance limit of −0.4 nm. That is, the model HDD can survive the Operational Altitude Test with up to 0.4 nm interference. For tests conducted at 60 °C, the best fit is obtained when the model failure criterion is selected as clearance below −1.0 nm, whereas at 0 °C, failure occurs when clearance is below 0.5 nm (Fig. 16). For tests conducted at successively lower temperatures, the best-fit failure criterion is found to match successively higher clearance values. Apparently, the HDD product requires greater clearance for reliable operation at lower temperatures.

Fig. 15. Model failure predictions for three chosen values of clearance limit: 0, −0.4, and −1.0 nm, plotted as a function of altitude. Also shown are the Operational Altitude Test failure data. The temperature for both model and test was 25 °C.

Fig. 16. Operational Test Failure data and best-fit model results for temperatures (a) 60 °C, clearance limit −1.0 nm and (b) 0 °C, clearance limit 0.5 nm. More clearance margin may be required at lower temperatures to avoid failure.
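To make the role of the clearance limit concrete, the sketch below computes the fraction of head–disk interfaces whose clearance falls below each candidate limit. It is a simplified stand-in for the model's Monte Carlo population, not the authors' procedure. It assumes the clearance distribution is normal, and the helper name `fraction_below` is hypothetical. The mean and standard deviation are the "Altitude" values quoted in the Fig. 14 caption (25 °C, 10 000 ft), and the candidate limits are those of Fig. 15.

```python
from statistics import NormalDist


def fraction_below(limit_nm, mean_nm, sd_nm):
    """Fraction of head-disk interfaces with clearance below limit_nm.

    Assumes a normal clearance distribution (an illustrative assumption).
    """
    return NormalDist(mu=mean_nm, sigma=sd_nm).cdf(limit_nm)


# Clearance statistics from the Fig. 14 caption for 25 deg C, 10 000 ft.
mean_nm, sd_nm = 1.31, 0.92
for limit in (0.0, -0.4, -1.0):   # candidate clearance limits from Fig. 15
    share = fraction_below(limit, mean_nm, sd_nm)
    print(f"limit {limit:+.1f} nm -> {share:.2%} of heads below limit")
```

Lowering the assumed limit reduces the predicted share of failing heads, which is why the choice of limit can be tuned against the test data.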

We conclude from these results that the minimum clearance required for successful HDD operation is temperature dependent. This is the first report of any such phenomenon. To understand this temperature dependence, we examine the details of the HDD failure mode (Table III): At high temperatures, failures are due to uncorrectable errors, which are events where an error is detected and confirmed on each re-try. At lower temperatures, more failures are due to correctable errors, which are events where an error is detected, but can be recovered upon re-try. Intermittent contacts at 0.5 nm clearance are apparently more severe at lower temperatures, possibly due to stronger surface attractive forces [27], [28]. At higher temperatures, such intermittent contacts may occur at 0.5 nm clearance, but are not severe enough to cause errors.

TABLE III
OPERATIONAL ALTITUDE EXPERIMENT DATA UNDER DIFFERENT TEMPERATURE CONDITIONS

Fig. 17. Clearance limit for reliable operation as a function of temperature, determined from Operational Altitude Test results. Greater clearance is needed for reliable operation at lower temperatures.

TABLE IV
EXTENDED 50 K CSS EXPERIMENT DATA AND MODEL PREDICTIONS UNDER DIFFERENT TEMPERATURE AND HUMIDITY
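The temperature dependence of the clearance limit can be illustrated with a simple fit to the three best-fit limits reported for the Operational Altitude Test (0.5 nm at 0 °C, −0.4 nm at 25 °C, −1.0 nm at 60 °C). The paper does not state the functional form of the curve in Fig. 17. The straight-line least-squares fit below is therefore an assumption, and the names `fit_line` and `clearance_limit` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Best-fit clearance limits from the Operational Altitude Test results.
temps_c = [0.0, 25.0, 60.0]
limits_nm = [0.5, -0.4, -1.0]
slope, intercept = fit_line(temps_c, limits_nm)


def clearance_limit(temp_c):
    """Clearance limit (nm) at temp_c, assuming a linear trend."""
    return intercept + slope * temp_c


print(f"slope = {slope:.4f} nm/deg C")
print(f"limit at 70 deg C = {clearance_limit(70.0):.2f} nm")
```

With these three points the linear trend extrapolates to roughly −1.3 nm at 70 °C. A different curve shape would give a different extrapolated value.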

Having established the failure criteria for these three operating temperatures, failure criteria at other temperatures can be inferred from the curve in Fig. 17. For further confirmation of the model’s capacity to predict the population reliability, we consider results from CSS Tests of the kind described in Section III-B, and summarize the results in Table IV. At 0 °C, the model shows that 0.65% of HDIs are operating below 0.5 nm clearance. Assuming any HDI operating below this clearance limit will fail, we calculate that HDDs containing six heads will fail at the rate 3.8%, which matches closely the HDD sample failure rate 1/29, or 3.5%. At 70 °C and 80% RH, the model shows 3.0% of HDIs are operating below −1.3 nm clearance, the clearance limit at 70 °C extrapolated from the experiments at lower temperatures. The implied failure rate of 16.7% for six-headed HDDs is consistent with the actual HDD failure rate of 2/10, or 20%.
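The step from a per-head fraction to a per-drive failure rate can be reproduced with a short calculation. The sketch below assumes that a drive fails if any one of its heads operates below the clearance limit and that heads are statistically independent. Independence is an assumption made here for illustration, and the function name `drive_failure_rate` is hypothetical.

```python
def drive_failure_rate(p_head, n_heads=6):
    """Probability that at least one of n_heads is below the clearance limit.

    Assumes independent heads, each below the limit with probability p_head.
    """
    return 1.0 - (1.0 - p_head) ** n_heads


# Per-head fractions quoted in the text for the two CSS test conditions.
for label, p_head in (("0 deg C", 0.0065), ("70 deg C, 80% RH", 0.030)):
    print(f"{label}: {drive_failure_rate(p_head):.1%} of six-head drives")
```

For the quoted per-head fractions this gives 3.8% and 16.7%, the same drive-level rates stated in the text.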

The success of the model in predicting failure for HDDs operating at such variable environmental and operating conditions confirms the central thesis that clearance is the primary factor determining HDI reliability. This success also justifies using this model as the central design tool for reliable HDD products.

TABLE V
NOMENCLATURE

V. CONCLUSION

A reliability model for the HDD, until recently beyond the capabilities of applied science, is now within reach. The first steps toward producing a truly predictive model are described in the current work. The critical elements put forward are: 1) a theoretical framework necessary to predict the impact of some slider, disk, and environmental conditions on the resulting clearance; and 2) an experimental methodology that is sensitive enough to measure the nanoscopic separations characteristic of the head–disk interface. This clearance measurement methodology, along with HDD reliability test results, provides the data by which different aspects of the theoretical model can be verified, modified, or discarded.

While development of the reliability model is still in its infancy, the understanding derived to date allows us to target design parameters for greatest impact on HDD reliability. Specifically, our work suggests that sustained effort needs to be devoted to: a) alternative ABS designs with less dependence on the pressure drops associated with high altitude or humid conditions; b) improved design tolerances on incoming components, e.g., flying height variance on heads; and c) selection of materials that minimize the impact of the intermolecular attractive forces, thus minimizing the frequency and severity of head–disk contacts.

An important outcome of this work is the novel capability to quantitatively evaluate the impact of various design alternatives on the resulting clearance, and hence the HDD failure rates. Having such a capability will reduce the development time, and hence cost, of bringing new products to market while ensuring satisfactory reliability.


REFERENCES

[1] D. P. Danson, “Pseudo-contact recording,” IDEMA Insight, vol. 9, 1996.

[2] W. C. Cain, “Modeling of various magnetoresistive head designs for contact recording,” IEEE Trans. Magn., vol. 31, no. 6, pp. 2645–2647, Nov. 1995.

[3] H. Hamilton, R. Anderson, and K. Goodson, “Contact perpendicular recording on rigid media,” IEEE Trans. Magn., vol. 27, no. 6, pp. 4921–4926, Nov. 1991.

[4] Y. Li and A. R. Kumaran, “The determination of flash temperature in intermittent magnetic head/disk contacts using magnetoresistive heads: Part 1: Model and laser simulation,” Trans. ASME J. Tribol., vol. 115, pp. 170–184, 1993.

[5] J. K. Spong, M. M. Dovek, and G. Vurens, “Mechanically-induced readback errors in contact recording,” IEEE Trans. Magn., vol. 30, no. 6, pp. 4152–4154, Nov. 1994.

[6] E. Grochowsky, Data Storage Industry Roadmap 2003 [Online]. Available: http://www.hitachigst.com/hdd/hddpdf/tech/hdd_technology2003.pdf

[7] A. Khurshudov and V. Raman, “Roughness effects on head-disk interface durability and reliability,” Trib. Int., vol. 38, no. 6–7, pp. 646–651, 2005.

[8] R. J. Waltman and G. W. Tyndall, “Lubricant and overcoat systems for rigid magnetic recording media,” J. Magn. Soc. Jpn., vol. 26, no. 3, pp. 97–107, 2002.

[9] R. J. Waltman, A. Khurshudov, and G. Tyndall, “Autophobic dewetting of perfluoropolyether films on amorphous-nitrogenated carbon surfaces,” Trib. Lett., vol. 12, no. 3, pp. 163–169, 2002.

[10] A. Khurshudov and R. J. Waltman, “The contribution of thin PFPE lubricants to slider-disk spacing,” Trib. Lett., vol. 11, no. 3–4, pp. 143–149, 2001.

[11] R. J. Waltman and A. Khurshudov, “The contribution of thin PFPE lubricants to slider-disk spacing. 2. Effect of film thickness and lubricant end groups,” Trib. Lett., vol. 13, no. 3, pp. 197–202, 2002.

[12] L. Zhou, M. Beck, H. H. Gatzen, K. Kato, and F. E. Talke, “Tribology of textured sliders in near contact situations,” IEEE Trans. Magn., vol. 38, no. 5, pp. 2159–2161, Sep. 2002.

[13] K. Schouterden, M. Suk, and V. Raman, “Influence of slider air-bearing design on disk effective take-off height,” in MIPE’03 Conf. Dig., Yokohama, Japan, JAST, 2003.

[14] D. W. Meyer, “Slider with temperature responsive transducer positioning,” U.S. Patent 5 991 113, Nov. 23, 1999.

[15] Y. Kokaku and M. Kitoh, “Influence of exposure to an atmosphere of high relative humidity on tribological properties of diamondlike carbon films,” J. Vac. Sci. Technol. A, vol. 7, no. 3, 1989.

[16] R. G. Walmsley, B. Natarajun, and J. Brandt, “Temperature and humidity effects in lubricant loss and recovery,” in Proc. IEEE Int. Conf. Magnetics, 1993.

[17] A. L. Buck, “New equations for computing vapor pressure and enhancement factor,” J. Appl. Meteorol., vol. 20, p. 1527, 1981.

[18] “CRC Handbook of Chemistry and Physics,” 59th ed. 1979, D-232.

[19] B. D. Strom, S. Zhang, S. C. Lee, A. Khurshudov, and G. W. Tyndall, “Effects of humid air on air bearing flying height,” IEEE Trans. Magn., vol. 43, no. 7, pp. 3301–3304, Jul. 2007.

[20] S. Zhang, B. D. Strom, S. C. Lee, and G. W. Tyndall, “Calculating air bearing pressure and flying height in a humid environment,” in Proc. APMRC, 2006.

[21] W. K. Shi, L. Y. Zhu, and D. B. Bogy, “Use of readback signal modulation to measure head/disk spacing variations in magnetic disk files,” IEEE Trans. Magn., vol. MAG-23, no. 1, pp. 233–240, Jan. 1987.

[22] K. B. Klaassen and J. C. K. van Peppen, “Slider-disk clearance measurements in magnetic disk drives using the readback transducer,” IEEE Trans. Instrum. Meas., vol. 43, no. 2, pp. 121–126, Apr. 1994.

[23] A. Khurshudov and P. Ivett, “Head-disk contact detection in the hard-disk drives,” Wear, vol. 255, no. 7–12, pp. 1314–1322, 2003.

[24] G. R. Briggs and P. G. Herkart, “Unshielded capacitor probe technique for determining disk file ceramic slider flying characteristics,” IEEE Trans. Magn., vol. MAG-7, no. 3, pp. 428–431, Sep. 1971.

[25] M. Suk, T. Ishii, and D. B. Bogy, “Evaluation of capacitance displacement sensors used for slider-disk spacing measurements in magnetic disk drives,” IEEE Trans. Magn., vol. 28, no. 5, pp. 2542–2544, Sep. 1992.

[26] R. L. Wallace, Jr., “The reproduction of magnetically recorded signals,” Bell Syst. Tech. J., pp. 1145–1173, 1951.

[27] J. N. Israelachvili, Intermolecular and Surface Forces, 2nd ed. New York: Academic, 1991.

[28] A. W. Adamson and A. P. Gast, Physical Chemistry of Surfaces, 6th ed. New York: Wiley, 1997.

Manuscript received December 15, 2006. Corresponding author: B. Strom (e-mail: [email protected]).