2009-01-3274


The Systems Engineering Relationship between Qualification, Environmental Stress Screening and Reliability

James A. Robles
The Boeing Company

Copyright 2009 SAE International

ABSTRACT

The Systems Engineering Relationship between Qualification, Environmental Stress Screening (ESS), and Reliability is often poorly understood: as a consequence resources are expended on efforts that degrade inherent hardware reliability and vitiate reliability predictions. This article expatiates on the Systems Engineering relationship between Qualification and ESS, and how their proper application enhances inherent reliability and supports credible reliability predictions. Examples of how their uninformed application degrades inherent hardware reliability and vitiates reliability predictions, and how program/equipment managers can avoid this, are presented.

INTRODUCTION

There is a problem with the reliability of recently fielded systems: Department of Defense (DoD) concerns have been widely reported.

Emerging data shows that a significant number of U.S. Army systems are failing to demonstrate established reliability requirements during operational testing and many of these are falling well short of their established requirement. . . . Enclosure 1 outlines the process for establishing and reporting the new reliability threshold, as well as a mechanism for detecting and reporting threshold breaches. The routine use of this process and the implementation of reliability best practices (Enclosure 2) will help the Army achieve its reliability requirements. [1]

Ensure programs are formulated to execute a viable systems engineering strategy from the beginning, including a RAM growth program, as an integral part of design and development. [2]

I share your concerns regarding the recent downward trend of reliability, availability, and maintainability (RAM) test results, and agree with your assessment that RAM considerations must be strengthened as our weapons systems move through the development and production phases and into operational service. [3]

This is not, as discussed below, a commercial-off-the-shelf (COTS) vs. custom or military specification design issue: the focus of the above references is on Program Management and System Engineering processes and best practices.

This focus on processes and practices is a positive development; however, it is essential that we get the content right. Two areas where there seems to be widespread failure to do so are the definition of durability environments, and the application of ESS: showing why this is so requires some review of fatigue engineering principles, the bathtub curve, and the limitations of our reliability prediction methods.

FATIGUE ENGINEERING PRINCIPLES

Materials will fatigue [4, 5, 6, 7] under the repeated application of stresses and strains that do not cause failure on the first application: imagine bending a paper clip back and forth. For engineering materials the stress level (SL) vs. cycles to failure typically plots as a straight line on a log-log scale: this is a power law relationship (see ARP 5890 [8], Equation B-9). The stress level may be expressed as pounds per square inch, as strain (inch per inch), power spectral density (PSD) for random vibration, or as the magnitude of a temperature cycle, etc. The cycles to failure may be expressed as cycles or as time (assuming a consistent cyclic rate).

SAE Int. J. of Aerosp. | Volume 2 | Issue 1
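As a rough illustration of the power law relationship described above, the sketch below scales a known point on a stress-life curve to another stress level. The function name, the reference point, and the exponent value are hypothetical and chosen only for illustration; they are not taken from ARP 5890 or from any program data.

```python
def cycles_to_failure(stress, ref_stress, ref_cycles, exponent):
    """Power-law stress-life curve: N = N_ref * (S_ref / S) ** b.

    A straight line on a log-log plot of stress level vs. cycles to failure
    corresponds to this form. All arguments are illustrative assumptions.
    """
    if stress <= 0 or ref_stress <= 0 or ref_cycles <= 0:
        raise ValueError("stress levels and reference cycles must be positive")
    return ref_cycles * (ref_stress / stress) ** exponent


# Hypothetical numbers: with an exponent of 4, halving the stress level
# multiplies the expected life by 2 ** 4 = 16.
life = cycles_to_failure(stress=50.0, ref_stress=100.0, ref_cycles=1.0e4, exponent=4.0)
print(life)  # 160000.0
```

Because the exponent is typically well above one, modest changes in stress level produce large changes in life, which is why the level of a screen matters far more than its duration.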

There is scatter in the data. The time or cycles to failure for a number of identical samples tested at the same stress level will form a Gaussian distribution. This scatter is due to the variety of defects in each sample: as a consequence larger samples, with a greater probability of containing a more severe defect, will fail sooner.

We use Miner's Rule [9] to determine fatigue damage accumulation and a Composite Damage Index (CDI) for items subjected to combinations of different stress levels.

$$\mathrm{CDI} = \sum_{x} \frac{n_x}{N_x}$$

where $n_x$ is the number of applied stress cycles at stress level $x$, and $N_x$ is the number of cycles to failure at stress level $x$.

Failure is expected to occur when the CDI is approximately equal to one.
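The CDI summation can be sketched in a few lines. The function name and the loading values below are hypothetical, chosen only to show the arithmetic of Miner's Rule; they do not come from the paper.

```python
def composite_damage_index(loadings):
    """Miner's Rule: sum of n_x / N_x over all stress levels x.

    `loadings` is an iterable of (applied_cycles, cycles_to_failure) pairs,
    one pair per stress level.
    """
    cdi = 0.0
    for applied_cycles, cycles_to_failure in loadings:
        if cycles_to_failure <= 0:
            raise ValueError("cycles to failure must be positive")
        cdi += applied_cycles / cycles_to_failure
    return cdi


# Hypothetical exposure at three stress levels (illustrative values only).
loadings = [(2.0e4, 1.0e5), (5.0e3, 2.0e4), (1.0e2, 1.0e3)]
cdi = composite_damage_index(loadings)
print(round(cdi, 3))  # 0.2 + 0.25 + 0.1 = 0.55, so 45% of fatigue life remains
```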

THE BATHTUB CURVE

The bathtub curve [10], describing failure rate as a function of time, is described in a number of sources and shown in Figure 1. This bathtub curve can be used to describe a range of phenomena including human death rates as a function of age, and electronic failure rates as a function of time.

The Infant Mortality portion of the curve is the initial section for which the failure (death) rate decreases with time (age). For military electronics this higher initial failure rate is purported to be due to latent manufacturing defects. Environmental Stress Screening (ESS), comprising random vibration and temperature cycling, is used to precipitate these defects as failures so that they can be repaired to produce items without infant mortality defects.

The Constant Failure Rate portion of the curve is the section after Infant Mortality defects have been eliminated, but before Wearout has begun to occur. Failures are random. This is the period for which Constant Failure Rate statistical prediction techniques (MIL-HDBK-217 [11], VITA 51.1 [12], etc.) have some validity.
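For readers less familiar with what a constant failure rate implies, the sketch below uses the standard exponential model, in which the probability of surviving to time t is exp(-λt) and the mean time between failures is 1/λ. The function name and the numbers are hypothetical; they are not outputs of MIL-HDBK-217 or VITA 51.1.

```python
import math


def reliability(failure_rate_per_hour, hours):
    """Probability of no failure over `hours` under a constant failure rate."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and time must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


# Hypothetical item with a predicted MTBF of 5000 hours (illustrative only).
mtbf_hours = 5000.0
print(round(reliability(1.0 / mtbf_hours, 1000.0), 4))  # exp(-0.2) = 0.8187
```

The model is only meaningful while the item is on the flat portion of the bathtub curve; it says nothing about infant mortality or wearout.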

The Wearout portion of the curve is the last section and has a Gaussian distribution that goes to zero when the last item in a set starting population has failed. Failures in this portion of the curve are due to fatigue, and follow the Gaussian distribution that was previously discussed for fatigue phenomena. Durability [8] verification (analysis and/or test) during item Qualification, as shown in Topic 2.2 Bathtub Curve [10], is commonly used to demonstrate that wearout will not occur during the planned life of the item.

Typically, reliability analysis is aimed at assessing the random failures that will occur in the equipment during its useful life. These failures are usually assumed to be repairable, and may be due to a variety of causes, such as defects in the equipment, improper use, damage due to unusual conditions, inadequate maintenance, etc. Durability analysis, on the other hand, assesses failures due to wearout of certain elements of the design. [8]

LIMITATIONS OF THE RELIABILITY PREDICTION PROCESS

MIL-HDBK-217 (as do most similar analysis techniques) relies on a number of assumptions, two of which are germane here: 1) infant mortality failures have been eliminated by good process control, or screened out by an effective ESS program that consumes a relatively small tranche of demonstrated life, and 2) the period of performance, after ESS, is within the demonstrated life of the item, so that wearout failures will not occur: these analysis techniques typically do not model wearout mechanisms [13].

Selection of the appropriate MIL-HDBK-217 environment factor (π_E) will not remotely compensate for the failure to adequately specify durability environments. The π_E factor ratios assume, as does everything in the MIL-HDBK-217 methodology, that durability has been demonstrated and that the item is in the constant failure rate portion of the bathtub curve: they do not account for limited life due to wearout.

DURABILITY ENVIRONMENTS

The salient contributors to equipment durability environments are vibration (high cycle fatigue) and temperature (low cycle fatigue). The deleterious effects of these environments are widely understood, and have been thoroughly investigated in a number of venues.

AVIP also broadens the tools and focus of electronic packaging design to address the life cycle issues through fatigue analysis [14]

Complexities include 1) Surface Mount Technology, . . . Thermal cycling fatigue life of electronics was improved through 1) Coefficient of thermal expansion (CTE) matching, 2) Omega and other strain relieving lead wire designs for large devices, 3) Plated Thru hole improvements . . . The F-22 program's typical durability life test requires from 500 to 1500 thermal cycles on one unit. . . . The design analysis included consideration of the damaging effects on electronics from thermal fatigue. [15]


Figure 1 The Bathtub Curve
[Figure: Failure Rate vs. Time. Regions: Infant Mortality, Constant Failure Rate, Wearout. Annotations: ESS; Qualification; Margin Against Wearout; Margin on Elimination of Infant Mortality Failures.]

Figure 2 The Bathtub Curve with 95% to 100% of Demonstrated Life consumed by ESS
[Figure: Failure Rate vs. Time. Regions: ESS, Wearout.]

Table 1 -- Vibration Durability Life Consumed by ESS

ESS:        Duration = 10 minutes; PSD = 0.04 g^2/Hz; Duration x PSD^4 = 2.56E-05
Durability: Duration = 300 minutes at each PSD level listed below

Durability PSD    Durability            Percent of Demonstrated
(g^2/Hz)          Duration x PSD^4      Life Consumed by ESS
0.002             4.8E-09               99.98%
0.004             7.68E-08              99.70%
0.008             1.2288E-06            95%
0.016             1.96608E-05           57%
0.032             0.000314573           8%
0.040             0.000768              3%
0.064             0.005033165           1%
0.128             0.080530637           0.03%



The Bolton Memorandum [1] Enclosure 2 Reliability Best Practices also confirms the need to address the fatigue aspects of both thermal and vibration fatigue.

The supplier routinely conducts thermal and vibration analyses to address potential failure mechanisms and failure sites (i.e., a physics-of-failure approach to reliable design). These analyses would likely include the use of fatigue analysis tools, finite element modeling, dynamic simulation, heat transfer analyses, etc. [1]

Appendix 1.4 Standard Evaluation Criteria, Reliability Analysis

Comprehensive Thermal and Vibration analyses and/or Finite Element Analyses (FEA) are conducted to address potential failure mechanisms and failure sites.

From ANSI/GEIA-STD-0009, Figure 1

Engineering analysis and test data identifying the system/product failure modes and distributions that will result from the life-cycle loads. [16]

Ideally the durability environments should be derived from the planned usage of the item. The Bolton Memorandum [1] Enclosure 2 Reliability Best Practices also affirms that

The supplier has characterized the critical loads and stresses. A good design team will characterize the life cycle environment and operational duty cycle stresses that their components will see.

From the Halpin Highlights [14]

a. Realistic systems requirements derived from the user's intended application.
b. Thorough understanding of operational usage and environments.

Enclosure 1. Section C Statement of Work Reliability Language and Tailoring Instructions, 4. System-Level Operational & Environmental Life-Cycle Loads.

The contractor shall estimate and periodically update the operational & environmental loads (e.g., mechanical shock, vibration, and temperature cycling) that the system is expected to encounter in actual usage throughout the life cycle.

From ANSI/GEIA-STD-0009, Figure 2

User and environmental profile that defines the system/product's life cycle (operating and non-operating environments, expected operating and non-operating times, etc.). [16]

From ANSI/GEIA-STD-0009, 4.5.1.4

The developer shall estimate the user and environmental loads (e.g., mechanical shock, vibration, and temperature/humidity cycles . . .)

The temperature cycling fatigue environment is usually the result of the combination of diurnal nighttime low temperatures and the maximum temperature achieved at each potential failure site (solder joint, component lead, etc.) as a result of diurnal daytime high temperatures, cooling system performance, operational cycles and equipment power on-off cycles. Experience on programs where Durability fatigue analyses have been conducted, and validated, shows that the temperature cycling fatigue contribution is typically eighty percent (80%) to ninety percent (90%) of the Composite Damage Index (CDI): this is true even for platforms with relatively severe vibration environments.

Vibration and Temperature Cycling Environments are Orthogonal to Each Other - In the case of a circuit card assembly (CCA), vibration fatigue (primarily component leads and solder joints) is typically due to the flexure (Figure 3) of the CCA, perpendicular to the plane of the CCA: as the CCA flexes repeatedly the strains imposed on the component leads and solder joints lead to the accumulation of fatigue damage.

Again in the case of a CCA, temperature cycling fatigue (again primarily component leads and solder joints) is due to coefficient of thermal expansion (CTE) mismatch (Figure 4) between the component and the CCA in the plane of the CCA: as the CCA goes through repeated thermal cycles the strains imposed on the component leads and solder joints lead to the accumulation of fatigue damage.

Figure 3 CCA Flexure in Vibration

Figure 4 CTE Mismatch Strains Leads and Solder Joints



Changes to improve performance in one durability environment can degrade performance in the other durability environment. For example, stiffening the card to improve vibration performance could degrade performance in temperature cycling. It follows that long life in one durability environment does not imply any life in the other.

ENVIRONMENTAL STRESS SCREENING

As noted above the intent of ESS is to precipitate infant mortality (latent manufacturing) flaws so that they can be repaired, and the fielded item will be at the beginning of the flat (Constant Failure Rate) portion of the bathtub curve.

A long-standing industry rule of thumb holds that power spectral density (PSD) levels below 0.04 g^2/Hz [17] are insufficient to precipitate flaws: vibration at lower levels is otiose.

We have another industry rule of thumb that ESS should not consume more than five percent (5%) of the demonstrated durability life of the item: this is to increase the probability that the item remains on the flat portion of the Bathtub Curve (Figure 1) for its planned useful life.

Table 1 uses the equation from MIL-STD-810F [18], Paragraph 2.2 Fatigue Relationship, to determine the percentage of demonstrated durability life consumed by ESS on a hypothetical program.

For this hypothetical program ESS is performed for 10 minutes at 0.04 g^2/Hz. Durability vibration testing is conducted for five (5) hours (300 minutes) at different levels depending on the item installation zone. In this hypothetical case conducting ESS for items installed in installation zones with PSDs of 0.04 g^2/Hz or above may, assuming that the items do have infant mortality defects, make sense. For items installed in the zones with lower PSDs, the conduct of ESS is non-value added (the field/durability vibration level is too low to precipitate any infant mortality defects), and deleterious (an excessive portion of demonstrated durability vibration life is consumed) to the item's reliability.
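The Table 1 percentages can be reproduced with a short sketch. It assumes, consistent with the tabulated values, that relative fatigue damage is Duration x PSD^4 and that demonstrated life is the sum of the ESS exposure and the durability exposure; the function names are hypothetical.

```python
PSD_EXPONENT = 4  # exponent applied to PSD in Table 1 (Duration x PSD^4)


def damage(duration_min, psd):
    """Relative vibration fatigue damage: duration times PSD to the 4th power."""
    return duration_min * psd ** PSD_EXPONENT


def percent_consumed_by_ess(ess_min, ess_psd, dur_min, dur_psd):
    """ESS damage as a percentage of total demonstrated (ESS + durability) damage."""
    ess_damage = damage(ess_min, ess_psd)
    total_damage = ess_damage + damage(dur_min, dur_psd)
    return 100.0 * ess_damage / total_damage


# ESS: 10 minutes at 0.04 g^2/Hz; durability: 300 minutes at each zone level.
for zone_psd in (0.002, 0.004, 0.008, 0.016, 0.032, 0.040, 0.064, 0.128):
    pct = percent_consumed_by_ess(10, 0.04, 300, zone_psd)
    print(f"{zone_psd:.3f} g^2/Hz: {pct:.2f}% of demonstrated life consumed by ESS")
```

Because damage scales with the fourth power of PSD, ten minutes at 0.04 g^2/Hz outweighs five hours at 0.008 g^2/Hz by a factor of roughly twenty.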

ESS is an attempt to inspect in quality for low production rate equipment. Defects in high production rate equipment can be reduced or eliminated by the application of statistical process control and automation. High production rate equipment is far more likely to be COTS than custom military specification design. It follows that COTS is far more likely to be defect free (at least prior to ESS) than custom military specification design.

One way to decompose reliability is into two questions. First, is the item inherently robust enough (durability environments address this)? Second, is the item defect free (ESS is intended to address this)? We have experience flying COTS items such as Ricoh Printers, Sony Satellite Dish Receivers, and HP Servers (without conducting ESS) on military derivative aircraft: in this relatively benign environment (commercial aircraft converted to a military application) these COTS items have proven to be considerably more reliable than the military specification Government Furnished Equipment (GFE). These COTS items are clearly not robust enough for severe environment platforms, such as fighter aircraft, but their reliable performance on military derivative aircraft confirms that ESS would be non-value added since field experience has shown these items to be relatively free of infant mortality defects. In addition, given that they were not designed for a flight environment, ESS would be more likely to degrade reliability by consuming an excessive portion of the item's durability life.

ESS vs. Burn-In - ESS is distinct from "Burn-In" which is typically applied to components and subassemblies to accelerate/screen time-temperature dependent thermally activated failure mechanisms that can be modeled using the Arrhenius relationship: this includes solid state reactions such as diffusion, grain growth, etc. As with ESS, Burn-In is intended to ensure that components or subassemblies start off in the constant failure rate region. ESS focuses on thermo-mechanical mechanisms that are related more to component assembly (leads, solder joints, etc.), but does not necessarily drive solid state (thermally activated) mechanisms to the constant failure rate region.
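To make the contrast with ESS concrete, the sketch below evaluates the commonly used Arrhenius acceleration factor, AF = exp[(Ea/k)(1/T_use - 1/T_stress)], with temperatures in kelvin. The activation energy and temperatures are hypothetical placeholders; real values depend on the specific failure mechanism and are not given in this paper.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c):
    """Acceleration of a thermally activated mechanism at the burn-in temperature."""
    use_temp_k = use_temp_c + 273.15
    stress_temp_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / use_temp_k - 1.0 / stress_temp_k
    )
    return math.exp(exponent)


# Hypothetical values: 0.7 eV activation energy, 55 C use, 125 C burn-in.
af = arrhenius_acceleration_factor(0.7, use_temp_c=55.0, stress_temp_c=125.0)
print(f"One burn-in hour is roughly equivalent to {af:.0f} use hours")
```

This time-at-temperature model does not describe the cyclic thermo-mechanical fatigue that ESS exercises, which is why the two screens are not interchangeable.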

ESS DEGRADING INHERENT RELIABILITY AND VITIATING RELIABILITY PREDICTIONS

Consider a hypothetical example from the hypothetical program described above. As noted above the validity of our reliability predictions, for fielded items, rests on two assumptions: 1) Infant mortality defects have been eliminated; and 2) the item does not enter the wearout portion of the Bathtub Curve during its planned useful life.

In this hypothetical case:

The durability vibration requirement (Table 1) is five (5) hours at 0.008 g^2/Hz.

A durability temperature cycling requirement is not specified.

The ESS requirement is five (5) minutes of random vibration, followed by twelve thermal cycles, and then another five (5) minutes of random vibration. The last five (5) thermal cycles and the final five (5) minutes of vibration must be failure free: there is no limit on the number of repeats allowed to achieve the required five thermal cycles and five minutes of vibration failure free.



An item that is no better than required by this set of requirements would be inherently unreliable the moment it was fielded.

In this hypothetical case, the item went through ESS, prior to qualification, without having to repeat the last five (5) temperature cycles or the final five (5) minutes of vibration. The total demonstrated vibration durability life is the sum of five (5) hours (300 minutes) at 0.008 g^2/Hz and ten minutes at 0.040 g^2/Hz: as discussed above, assuming that a production unit went through ESS without having to repeat the last five (5) minutes of vibration, ninety-five percent (95%) of the demonstrated useful life would have been consumed before the item was fielded. If the unit had to repeat (again, there is no limit to how many times this could happen) the last five minutes of vibration, after correction of a failure, then well over 100% of demonstrated useful life would have been consumed.
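The effect of repeated ESS vibration in this hypothetical case can be checked with a short sketch. It assumes the Duration x PSD^4 damage measure of Table 1 and takes demonstrated life as 300 minutes at 0.008 g^2/Hz plus the ten ESS minutes the qualification unit saw at 0.040 g^2/Hz; the function names are hypothetical.

```python
def damage(duration_min, psd, exponent=4):
    """Relative vibration fatigue damage, following Table 1 (Duration x PSD^4)."""
    return duration_min * psd ** exponent


def percent_of_demonstrated_life(ess_minutes_on_production_unit):
    """Percent of demonstrated vibration life a production unit consumes in ESS."""
    demonstrated = damage(300, 0.008) + damage(10, 0.040)
    consumed = damage(ess_minutes_on_production_unit, 0.040)
    return 100.0 * consumed / demonstrated


# Each repeat adds another five failure-free minutes at 0.040 g^2/Hz.
for repeats in range(3):
    minutes = 10 + 5 * repeats
    pct = percent_of_demonstrated_life(minutes)
    print(f"{repeats} repeat(s), {minutes} min of ESS: {pct:.0f}% of demonstrated life")
```

Under these assumptions a single repeat already pushes the production unit past the life that was demonstrated in qualification.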

For temperature cycling, even assuming that there are no repeated cycles following correction of a failure, at least 100% of demonstrated useful life has been consumed when ESS is completed, since in the absence of a durability temperature cycling requirement one pass through ESS is all that is included in the demonstrated temperature cycling durability life. If there are repeat ESS cycles then the situation would be considerably worse.

In this case, the bathtub curve would be as shown in Figure 2: the actual item might be better than the requirements, but there would be no evidence or data to show that this is the case. The flat (constant failure rate) portion of the Bathtub Curve, where our reliability predictions have some validity, does not exist, so our reliability prediction is vitiated. The inherent reliability of the unit has been degraded by the fatigue damage it has accumulated. In the case of vibration, this was done in the attempt to eliminate latent defects that the field level is too low to precipitate, thus artificially activating failure mechanisms not relevant to the field environment.

HOW PROGRAM/EQUIPMENT MANAGERS CAN AVOID THESE PITFALLS

1. Durability environments must include vibration and temperature cycling requirements that are consistent with the planned usage and the planned useful life. Note: the temperature cycling verification does not have to be an expensive test, but in many cases may be accomplished by analysis or similarity.

2. ESS vibration and temperature cycling must be limited, in each case, to some small portion (typically five percent [5%]) of demonstrated useful life, including a specified number of allowed repeat/repair cycles.

3. Vibration ESS should not be conducted when the durability vibration level is too low to precipitate Infant Mortality (latent manufacturing) defects.

4. ESS should not be conducted on items (typically COTS) that have been shown to be free of infant mortality (latent manufacturing) defects.

These measures can enhance reliability while reducing cost.
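The second measure above can be turned into a simple budget check. The sketch below assumes the Duration x PSD^4 damage measure of Table 1 and, as in that table, counts the ESS exposure as part of demonstrated life; the function name and the five percent default are illustrative, not a prescribed method.

```python
def max_ess_minutes(dur_min, dur_psd, ess_psd, max_fraction=0.05, exponent=4):
    """Longest total ESS vibration time (including repeats) that keeps ESS
    within `max_fraction` of demonstrated life.

    With E the ESS damage and D the durability damage, E / (E + D) <= f
    rearranges to E <= f / (1 - f) * D.
    """
    if not 0.0 < max_fraction < 1.0:
        raise ValueError("max_fraction must be between 0 and 1")
    durability_damage = dur_min * dur_psd ** exponent
    allowed_ess_damage = max_fraction / (1.0 - max_fraction) * durability_damage
    return allowed_ess_damage / ess_psd ** exponent


# 300-minute durability test, ESS at 0.04 g^2/Hz, two hypothetical zones.
for zone_psd in (0.040, 0.008):
    budget = max_ess_minutes(300, zone_psd, 0.04)
    print(f"durability at {zone_psd:.3f} g^2/Hz: ESS budget {budget:.2f} minutes")
```

Under these assumptions the 0.040 g^2/Hz zone leaves room for the ten-minute screen plus one five-minute repeat, while the 0.008 g^2/Hz zone leaves a budget of only a second or two, which is consistent with the third measure.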

CONCLUSION

Finally, Program/Equipment Managers should have a Useful Life Strategy that reflects the expected field fatigue life of each class of items, and the customer's desire for technology insertion/refresh. For example, if the item can only be expected to survive three to five years in the field and the customer desires technology insertion (how often do you replace your laptop?) every three years, then attempting to ruggedize/qualify/ESS the item for a longer life will only add cost while degrading reliability. The proper application of qualification, ESS and reliability prediction methods to determine a useful life strategy, while avoiding the system engineering pitfalls described herein, will minimize total ownership cost while enhancing effectiveness for the war fighter.

All activities, methods and tools used should be evaluated and applied in a manner that adds demonstrated value to the program, at optimized life cycle cost and utilization of resources [16]

. . . . (which may include COTS, NDI, and CFI, as well . . .) shall identify and confirm through analysis, test, or accelerated test, the failure modes and distributions that will result when these life-cycle loads are imposed on these items.

REFERENCES

1. MEMORANDUM FOR SEE DISTRIBUTION, SUBJECT: Reliability of U.S. Army Materiel Systems; 06 DEC 2007; Claude M. Bolton Jr.; DEPARTMENT OF THE ARMY, Assistant Secretary of the Army, Acquisition Logistics and Technology.

2. MEMORANDUM FOR DIRECTOR, OPERATIONAL TEST AND EVALUATION, DEPUTY UNDER SECRETARY OF DEFENSE FOR ACQUISITION AND TECHNOLOGY; SUBJECT: Report of Reliability Improvement Working Group; Office of the Secretary of Defense.

3. MEMORANDUM FOR UNDER SECRETARY OF DEFENSE (ACQUISITION, TECHNOLOGY, AND LOGISTICS); SUBJECT: Reliability, Availability, and Maintainability Policy; Department of the Air Force.

4. Robert C. Juvinall; Stress, Strain, and Strength; McGraw-Hill.

5. Joseph Edward Shigley; Mechanical Engineering Design; McGraw-Hill.

6. Joseph H. Faupel; Engineering Design; John Wiley and Sons, Inc.

7. Rao R. Tummala and Eugene J. Rymaszewski (editors); Microelectronics Packaging Handbook; Van Nostrand Reinhold.

8. Guidelines for Preparing Reliability Assessment Plans for Electronic Engine Controls; ARP 5890.

9. Dave S. Steinberg; Vibration Analysis for Electronic Equipment; John Wiley & Sons; Chapter 10, Structural Fatigue.

10. RAC (Reliability Analysis Center); Reliability Toolkit: Commercial Practices Edition, A Practical Guide for Commercial Products and Military Systems Under Acquisition Reform.

11. Military Handbook, Reliability Prediction of Electronic Equipment; MIL-HDBK-217F, Notice 2; Rome Air Development Center; 28 February 1995.

12. Reliability Prediction, MIL-HDBK-217, Subsidiary Specification; VITA 51.1; June 2008.

13. Lori Bechtold; Physics of Failure in Handbook Reliability Predictions; Components for Military & Space Electronics (CMSE), 2009.

14. Halpin, Dr. J. C.; Avionics/Electronics Integrity (AVIP) Highlights.

15. Glista, Stefan; Lessons Learned from the F-22 Avionics Integrity Program; 0-7803-5086-3/98 IEEE.

16. ITAA Standard, Reliability Program Standard for Systems Design, Development, and Manufacturing; ANSI/GEIA-STD-0009-2008; November 13, 2008.

17. Navy Manufacturing Screening Program, Decrease Corporate Costs, Increase Fleet Readiness; Department of the Navy; NAVMAT P-9492; May 1979.

18. Department of Defense Test Method Standard for Environmental Engineering Considerations and Laboratory Tests; MIL-STD-810F; 1 January 2000.

CONTACT

[email protected]
