
Source: web.cortland.edu/romeu/RACJour1ST_Q1998.pdf

the Journal of the Reliability Analysis Center

RAC is a DoD Information Analysis Center sponsored by the Defense Technical Information Center and operated by IIT Research Institute

First Quarter 1998

Pioneering Reliability Laboratory Addresses Information Technology

Written by the RAC Staff with the assistance of the Air Force Research Laboratory Information Directorate history office.

The Air Force Research Laboratory Information Directorate at the Rome (NY) Research Site is the current name for a laboratory which has been the DoD focal point for electronic reliability since the 1950s. Established in 1951 as the Rome Air Development Center (RADC), becoming the Rome Laboratory (RL) in December 1990, and officially adopting its present title in October 1997, the organization has been the primary DoD electronic reliability research and development agency since reliability was recognized as an engineering discipline. It has twice received the annual award of the IEEE Reliability Society, in 1974 and in 1998.

One of the first publications considering reliability as an engineering discipline was Reliability Factors for Ground Electronic Equipment, published by RADC in 1955. Its author was Joseph J. Naresky (see biography on page 5), who created the laboratory's reliability program and developed it from a personal commitment to a multi-faceted effort by about 100 specialists.

Another of the first reliability efforts of RADC was the publication of reliability prediction models. An RCA document, TR 1100, became the basis of an RADC technical report which was the first military reference on failure rates of electronic components. A series of RADC Reliability Notebooks provided updates, until RADC became the preparing activity for MIL-HDBK-217, Reliability Prediction of Electronic Equipment. Prediction models for electronic parts became the specialty of Lester Gubbins and, on his retirement, Seymour Morris.
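The parts-count style of prediction that grew out of these models can be sketched in a few lines: the failure rate of an equipment is estimated as the sum, over part types, of quantity times base failure rate times an adjustment factor, and the mean time between failures is the reciprocal. The part types, rates, and factors below are invented for illustration and are not values from MIL-HDBK-217.

```python
# Parts-count reliability prediction (illustrative sketch).
# System failure rate = sum over part types of
# (quantity x base failure rate x quality factor); MTBF = 1 / system rate.
# The part names and numbers below are hypothetical, not MIL-HDBK-217 values.

parts = [
    # (part type, quantity, base failure rate per 1e6 hours, quality factor)
    ("microcircuit", 120, 0.040, 1.0),
    ("resistor",     300, 0.002, 0.3),
    ("capacitor",    150, 0.005, 0.3),
    ("connector",     10, 0.100, 2.0),
]

lambda_system = sum(qty * lam * pi_q for _, qty, lam, pi_q in parts)  # per 1e6 h
mtbf_hours = 1e6 / lambda_system

print(f"system failure rate: {lambda_system:.3f} failures per million hours")
print(f"predicted MTBF: {mtbf_hours:.0f} hours")
```

The same structure carries over to the part-stress variant of the method; only the number of adjustment factors per part grows.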

The laboratory also produced references on failure rates for non-electronic parts, the prediction and demonstration of maintainability, and the use of Bayesian statistics for reliability demonstration. It was one of the first to develop the concept of system effectiveness, a figure of merit combining availability, reliability, and capability measures into a single measure of the overall worth of a system to its user. Early studies on operational influences on reliability foreshadowed more recent development of a means for translating desired operational parameters into reliability requirements. The original team created by Naresky for reliability technology development was a four-man group: Edward Krzysiak (group leader), Anthony Coppola, Anthony D. Pettinato, Jr., and John Fuchs. As the number of reliability personnel grew, statistical studies in reliability were performed by various individuals and groups under the overall direction of David F. Barber and his successor Anthony J. Feduccia, division chiefs in a directorate headed by Naresky.
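The system effectiveness figure of merit described above is, in one common formulation, simply the product of its constituent probabilities. The sketch below assumes that multiplicative formulation; the three input values are hypothetical.

```python
# System effectiveness as a single figure of merit (illustrative sketch).
# One common formulation multiplies the probabilities that the system is
# available when needed, stays up through the mission, and can do the job.
# All values below are hypothetical.

def system_effectiveness(availability, dependability, capability):
    """Product of the three measures, each a probability in [0, 1]."""
    return availability * dependability * capability

se = system_effectiveness(availability=0.95,   # up when the mission starts
                          dependability=0.90,  # survives the mission duration
                          capability=0.98)     # performs within specification
print(f"system effectiveness: {se:.4f}")
```

A weak link in any one factor pulls down the overall worth of the system, which is the point of folding the three measures into one number.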

Volume 6 Number 1

1-888-RAC-USER (1-888-722-8737)

Inside this Issue
ISSRE '97 ................................ 7
Calls for Papers ...................... 9
Call for Memories ................... 9
New from RAC ....................... 10
Industry Brief ......................... 12
From the Editor ...................... 13
Letter to the Editor ................. 16
Some Observations on Demonstrating Availability ... 17
Mark Your Calendar ............... 20
Help from RAC ....................... 22
RAC Order Form ................... 23

2 RAC Journal, Volume 6, No. 1

In 1961, the laboratory began to create facilities for research in reliability physics. These facilities included a variety of equipment useful for analyzing the cause of failures in microelectronic devices. These have been used to isolate and correct problems in operational Air Force systems such as the Minuteman missile. In 1962, the laboratory sponsored the first symposium on reliability physics, which has continued under IEEE sponsorship as the International Reliability Physics Symposium (IRPS).

With knowledge gained from an in-house thin film and monolithic microcircuit manufacturing facility created by Richard Nelson, the center produced RADC Exhibit 2867, Quality and Reliability Assurance Procedures for Monolithic Microcircuits. This document was the direct ancestor of MIL-STD-883, Test Methods and Procedures for Microcircuits, the foundation of both military and commercial microcircuit quality assurance, and of MIL-M-38510, the general specification for microcircuits under the military qualified parts program, both authored by center personnel. The laboratory later created MIL-I-38535, General Specification for Integrated Circuits (Microcircuits) Manufacturing; MIL-H-38534, General Specification for Hybrid Microcircuits; and MIL-STD-1772, Certification Requirements for Hybrid Microcircuit Facility and Lines, which are the basis of the current dual-use qualified manufacturers system.

The laboratory has always been heavily involved in the reliability and quality assurance of new technologies. John Farrell, for example, chaired the VHSIC (Very High Speed Integrated Circuits) qualification committee. More recently, Daniel Fayette led a program for multichip module (MCM) reliability and performance assessment. Hybrids, MMIC (Microwave Monolithic Integrated Circuits), solid-state Transmit/Receive modules, and most recently, Microelectromechanical (MEM) devices have all benefited from laboratory programs.

Noted leaders of the laboratory's microcircuit reliability and quality assurance activities through the years include Joseph Vaccaro, Joseph Brauer, Edward P. O'Connell, Al Tamburrino, Regis Hilow, Charles Messenger, and Dr. Robert Thomas, among others. In 1994, both the statistical studies and reliability physics were integrated into the Electronics Reliability Division under Eugene Blackburn. The new division instituted a program for integrating diagnostics for multichip modules, permitting efficient chip-to-system testability.

One of the first pictures showing physics of failure phenomena, this 1960s RADC photo shows the beginnings of corrosion of aluminum circuit interconnections.

As reliability studies began in the 1950s, the laboratory also began studies into electromagnetic compatibility (EMC) which resulted in, among other things, models used for analyzing the EMC problems of Air Force systems, such as the airborne command post, and the "HAVE NOTE" program, under which the laboratory tested a variety of Air Force weapons systems for susceptibility to electromagnetic interference. The laboratory also created numerous test sites in which aircraft were mounted inverted on pedestals so that their radiation patterns could be quickly and inexpensively measured, in comparison to in-flight tests. From these facilities, RADC became known as the keeper of the "upside down Air Force." Air Force aircraft ranging in vintage from the F-4 to the F-22 have appeared on the test stands, and recent participants have been the Navy EA-6B Prowler and some automotive communications systems. Other recent developments have been the use of infrared imaging to map electromagnetic fields without the perturbation of conventional probing techniques, and the development of an electromagnetic performance monitor (EMPM) which can record the electromagnetic environment inside a system. Leading EMC and related studies through the years were Samuel Zaccari, Robert McGregor and Carmen Luvera.

The laboratory has often applied its knowledge of advanced technologies to create prototype equipment. In 1967, it contracted with General Electric for the development of MIRAGE (Microelectronic Indicator for Radar Ground Equipment), a technology demonstration model of a display having one-tenth the volume of the UPA-35 radar indicators in the field, and 100 times the reliability. RADC was designated the manager for a joint Air Force-FAA procurement of the FYQ-47 radar data digitizer, which was so superior in reliability to the FYQ-40 it replaced that a requirement for backup units in the field was rescinded. More recent activity included the development of the Time Stress Measurement Device (TSMD) to measure the environmental stresses in localized areas of an equipment, and the integration of the electromagnetic performance monitor (EMPM) into the TSMD. The development of the RH-32 RISC (Radiation Hardened 32-bit Reduced Instruction Set Computer) is also a recent laboratory effort. These efforts were, and are, performed by teams cutting across the separate reliability offices at the Laboratory.

RADC also pioneered the use of computer aided design for reliability and the application of artificial intelligence as a diagnostic aid. Anthony Coppola led a Reliability Panel of the Air Force Project Forecast II in 1985. Leading a team of reliability specialists from various Air Force agencies, he proposed and evaluated novel ideas for improving Air Force effectiveness. Coppola also chaired a committee on artificial intelligence applications to maintenance for a 1984 study on reliability and maintainability by the Institute for Defense Analysis. Since then, led by Dale Richards, the laboratory has been involved with the development of "SMART BIT" applying artificial intelligence to fault detection in the Joint Surveillance and Target Attack Radar System (JSTARS), and the development of an R&M Design Expert System integrating reliability, maintainability and testability analysis into a software tool capable of improving itself by learning from its experiences.

The laboratory ultimately became the custodian of most of the DoD Specifications, Standards, and Handbooks on reliability and maintainability. Many of these were written by laboratory personnel recognized as industry leaders in the specialties involved, including Jerome Klion, Eugene Fiorentino, et al. These experts also produced or technically guided the production of a library of technical references, such as a redundancy notebook, a thermal design guide, reliability management manuals, a built-in-test design guide, and many others. One of the organization's most popular products has been the Reliability Toolkit, first produced in 1988 as the RADC Reliability Toolkit, updated in 1993 as the Rome Laboratory Reliability Toolkit, and last produced in 1995 as the Reliability Toolkit: Commercial Practices Edition. The latter provides a guide for commercial products and military systems under the DoD acquisition reform policies. Preston MacDiarmid (now Director of The Reliability Analysis Center) initiated the series of toolkits and Seymour Morris was the principal architect of their evolution.

The laboratory has been a consultant in reliability engineering to system program offices since 1957, when Coppola initiated support to the acquisition of the 465-L Strategic Air Command Control System. Later systems support was managed principally by Anthony D. Pettinato, Jr., and, on his retirement, Bruce W. Dudley. Programs supported included the 412-L European Air Warning and Control System, 414-L Over the Horizon Radar, 425-L NORAD Command and Control System, 474-L Ballistic Missile Early Warning System, 440-L Over the Horizon Radar, and many other large systems. Laboratory personnel were significantly involved with JSTARS, which received its baptism of fire in the Gulf War. Laboratory reliability engineers also supported virtually all Air Force ground radar programs and a host of other smaller equipment acquisitions.

Lee Lopez of General Electric and Capt. Edwin Deady of RADC compare the prototype MIRAGE display, on table, to the current (1967) standard UPA-35, right.


Laboratory personnel were also asked to support developments by the Wright Laboratory (the SPN/GEANs navigation system and the Electronically Agile Radar) and procurements by the Aeronautical Systems Division (the center won an Air Force Systems Command award for its performance in achieving unprecedented reliability in the AN/ARC-164 command radio, an ASD program). Center personnel were name-requested for a series of command level "tiger teams" convened in the early 1970s to review reliability progress on avionic systems, including the F-111 Mark II avionics suite and the radars for the F-111, F-4 and F-20 aircraft. Air Force Divisions using laboratory reliability support included the Electronic Systems Division (ESD), the Aeronautical Systems Division (ASD), the Space and Missile Systems division (SAMSO) and the Armaments Division (AD), all since renamed.

The FAA was a joint sponsor with the Air Force for the AN/FYQ-40 and AN/FYQ-47 radar digitizer developments, and was so impressed by the reliability expertise of the laboratory personnel supporting these that it requested their support for FAA radar procurements. Among the FAA systems supported under an interagency agreement were the ASR-8, ASR-9 and ASR-10 airport radars and the ARSR-2, ARSR-3 and ARSR-4 long range radars. The United States Bureau of Mines also obtained laboratory reliability support under an interagency agreement.

Reliability support activities also included advice on electronic part selection and control, based on the center-produced MIL-STD-965, Parts Control Program. Support in this area was led by John Farrell and provided to the F-16 and Advanced Medium Range Air To Air Missile (AMRAAM) programs, among others.

In 1961, the center initiated an electronics parts data repository, to collect, analyze and distribute information about the reliability of electronic parts. The initial in-house facility soon became a DoD asset called the Reliability Analysis Center (RAC) which is now one of 13 information analysis centers (IACs) funded by the Defense Technical Information Center (DTIC). Technical direction to RAC is still provided by the laboratory, specifically by Richard Hyle, DTIC's Contracting Officer's Technical Representative (COTR).

The laboratory has also pioneered the development of software engineering tools, including metrics and a software engineering framework, in another laboratory directorate concerned with software and not administered by Naresky or his successors. Principal contributors in these were Al Sukert, Samuel Dinitto and Andrew Chruscicki. These software specialists and Naresky's reliability specialists, notably Fiorentino, collaborated on developing models for combining hardware and software reliability predictions.

Many of the traditional reliability activities, such as maintenance of the specifications and standards, are being transitioned to other government agencies (although the organization recently produced a new maintainability handbook in response to DoD acquisition reform requirements for references providing technical guidance rather than mandating procedures).

The new Information Directorate, in fiscal year 98, is completing all commitments in the area of reliability sciences and transitioning activities where applicable towards research and development in information technologies. For example, the experiences and knowledge obtained while working in the field of reliability are now being applied to development of computer aided design (CAD) tools and techniques for micro-electro-mechanical (MEM) devices; design and advanced packaging of wafer scale signal processors; design and development of adaptable re-configurable computers; and modeling and simulation of information systems. After fiscal year 98, all reliability related tasks will be referred to other government organizations. "Rome Laboratory" has had a long heritage in the area of reliability sciences and accomplished many significant projects to advance the field of reliability. The efforts of "Rome Laboratory" personnel will be missed but their legacy will go on.

For more information on technical activities at the Information Directorate of the Air Force Laboratory at the Rome Research Site, contact John Bart, AFRL/IF, 36 Electronic Parkway, Rome NY 13340. Tel: (315) 330-7701. E-mail: [email protected]. The Directorate has a web site at URL: http://www.rl.af.mil.


Joseph J. Naresky: Reliability Pioneer

Joseph J. Naresky, the architect of reliability engineering research and development at the then Rome Air Development Center (RADC) in 1955, was a leading figure in reliability engineering until his accidental death in July 1982.

Mr. Naresky's accomplishments in reliability began when he produced the first general reference manual for reliability engineers, Reliability Factors for Ground Electronic Equipment. The document was widely used, including a Chinese translation by the People's Republic of China. Mr. Naresky also served on the Advisory Group for Reliability of Electronic Equipment, whose report in 1957 launched reliability as an engineering discipline throughout the Department of Defense.

From these pioneering efforts and a four-man reliability group he founded at RADC, he developed a program recognized for its significant contributions and the largest concentration of reliability specialists in the Department of Defense. In 1980, RADC was awarded the Air Force Outstanding Unit Citation for its reliability contributions. Mr. Naresky's leadership earned him Sustained Superior Performance Awards in 1959 and 1966, and the Air Force Decoration for Exceptional Civilian Service, the highest award that can be given to an Air Force civilian, in 1968. He also received the IEEE Reliability Society Award in 1974 and was elected an IEEE Fellow in 1976.

Mr. Naresky served with the U.S. Army Air Corps during World War II. After the war, he was employed by the Army's Watson Laboratories, Red Bank, NJ, which was transferred to Rome, NY as the Rome Air Development Center in 1951. While at RADC, he earned a Bachelor's degree in Physics and Master's Degrees in both Electrical Engineering and Engineering Administration from Syracuse University.

Mr. Naresky was a member of Sigma Pi Sigma, a physics honor society, the IEEE Reliability Society, Engineering Management Society and Electromagnetic Compatibility Group, and the AIAA Committee on Reliability and Maintainability. He was elected an Associate Fellow of the AIAA. He served as President of the Reliability Society, a member of the management committee for the Annual Reliability and Maintainability Symposium, and U.S. representative on the International Electrotechnical Commission Technical Committee 56 (Reliability). He was also President of the Rome Rotary Club in 1967 and 1968.

In 1979, he retired from government service and joined the IIT Research Institute, operator of the Reliability Analysis Center. His last project before his untimely death three years later was to complete the Reliability Design Handbook, now MIL-HDBK-338.

Reliability Through the Years at the Air Force Rome, NY Facility: A Sampling

1951 Army Watson Laboratories becomes Rome Air Development Center (RADC)

1955 Reliability Factors for Ground Electronic Equipment published

1956 Four-man reliability group organized

1957 Reliability support provided to "Big-L" System programs

1958 RADC Exhibit 2629 uses parts count to establish reliability requirements and sequential tests for verification

1959 MIL-R-26474, Reliability Requirements for Production Ground Electronic Equipment, prepared

1960 Microelectronics testing facility established

1961 Requirements for an electronic parts data repository established (led to creation of RAC)


1962 RADC sponsored first reliability physics symposium

1963 Reliability data collected on 412-L Air Weapons Control System sites in Germany

1964 Nondestructive screening techniques for electronic parts developed

1965 RADC supported Weapons Systems Effectiveness Industry Advisory Committee

1966 First compendium of storage failure rates for electronic parts published

1966 RADC Specification 2867 established screening requirements for integrated circuits (ancestor of MIL-STD-883)

1967 First microelectronic packaging handbook published

1968 Minuteman integrated circuit failures analyzed

1969 Tests of plastic encapsulated integrated circuits begin

1970 Reliability support provided to F-111 Mark II avionics development

1971 "Tiger Team" reviews of avionics systems reliability performed by command request

1972 Antenna measurement facility established (start of "upside down Air Force")

1973 F-16 Avionics Reliability Review Team chaired by RADC

1975 Nonelectronic Reliability Notebook published

1976 Guidelines for application of warranties formulated

1977 Liquid crystals used to test and analyze failures in large-scale integrated circuits

1978 Design guide for built-in-test published

1979 Reliability and Maintainability Management Manual published

1980 Artificial intelligence applications to testability identified

1981 Bayesian reliability test procedures developed for repairable equipment

1982 Parts derating guide published

1983 RADC sponsors Air Force Academy development of Bayesian tests for one-shot systems

1984 Design considerations for fault tolerant systems identified

1985 "Smart BIT" concepts formulated

1986 Guide to environmental stress screening (ESS) published

1987 Finite element analysis applied to surface mounted package

1988 RADC Reliability Engineer's Toolkit published

1989 MIL-I-38535, General Specification for Integrated Circuits (Microcircuits) Manufacturing, and MIL-H-38534, General Specification for Hybrid Microcircuits, published

1990 RADC becomes Rome Laboratory

1990 Thermal analysis program written for use on personal computers

1991 Guide to finite element analysis published

1992 Quantitative evaluation made of environmental stress screening (ESS) effectiveness

1993 Rome Laboratory Reliability Engineer's Toolkit published

1994 Techniques for on-wafer reliability testing of microwave monolithic integrated circuits (MMIC) developed

1995 Reliability Toolkit: Commercial Practices Edition published

1996 Integrated Diagnostics for Multichip Modules (MCM) developed

1997 Rome Laboratory becomes the Air Force Research Laboratory Information Directorate at the Rome site

1997 CAD for microelectromechanical (MEM) devices started

1998 Organization receives second IEEE Reliability Society award

A complete history of the organization now designated the Air Force Research Laboratory Information Directorate at Rome will be available in 1998. Contact: Thomas W. Thompson, Chief, History Office, AFRL/IFOIHO, 36 Electronic Parkway, Rome, NY 13440. Tel: (315) 330-2757.


ISSRE '97
By: Ellen Walker, Attendee

ISSRE '97, the 8th International Symposium on Software Reliability Engineering, was held Nov 2-5, 1997 in beautiful, sunny and warm Albuquerque, New Mexico. It was attended by more than 150 participants from government, industry, and academia, with a strong showing from the telecommunications industry and from the research community. Approximately one-third of the attendees were presenters. Other attendees were primarily leaders in quality or testing departments of their respective organizations. Of the 40 or more presentations given in three days, 80% addressed research in Software Reliability Engineering (SRE), with only a small number addressing actual experience.

The opening keynote address was given by Dieter Rombach, Director of the Fraunhofer Institute for Experimental Software Engineering, Kaiserslautern, Germany, and was titled "Inspections and Testing: Core Competence for Reliability Engineering". The talk focused on the use of systematic inspections for early defect detection and the use of testing for reliability assessment and prediction. Much of what was said is also proposed in the "cleanroom" approach to software development and embodies some of the principles of total quality management, although the "TQM" term was not mentioned.

His address was followed by a keynote titled "Launching Automated SRE Company Wide" given by James Tierney from Microsoft, a co-sponsor of the conference. ("Automated" refers to the automation of testing and measurement.) Tierney claims that SRE has had great success at Microsoft and that improved customer usage data and useful predictions of ship dates have been invaluable. He later stated, however, that adoption of SRE is closer to 50% than 100%, leaving the interpretation of "great success" to be pondered.

Tuesday's keynote, "Software Reliability in Theory and Practice", was given by Larry Dalton, Manager, High Integrity Software Systems Engineering, Sandia National Laboratories, Albuquerque, NM. His observation is that in surety critical applications such as nuclear weapons control, software-based systems are to be avoided, and if unavoidable, then "expect the unexpected" and make provisions to protect against it. He described dousing an aircraft (loaded with a nuclear weapon) with 1000 gallons of jet fuel and setting it on fire as a test to determine behavior of the detonator in the "unexpected" realm. He ended his keynote with Dalton's Axioms for Reliability in Theory and Practice:

Specify the RIGHT THING

Construct the THING RIGHT

The THING may fail, so reduce the consequences

This is not unlike the phrase "Do the Right Thing Right the First Time" which has been spoken so often in total quality training courses given in the last ten years. It is not unique to software engineering.

Presentations

Underlying threads of the presentations ranged from "Yes, you can test reliability into your software" to "You need to design reliability in because by the time you get to test it's often too late or too expensive to test it in." There was general consensus that you need a way to determine when the software is reliable, and that is accomplished only through testing and tracking failures throughout the testing interval. Although much discussion centered around how to structure the testing, all the experts seemed to be in agreement that fault intensity is the primary metric and should be measured relative to the testing interval in terms of time or natural units. While it is difficult for many software engineers to accept this paradigm as relevant for software, it is nevertheless asserted to be necessary for determining software reliability.

In most sessions, regardless of the main emphasis of the topic, some form of reliability growth modeling was used to determine when the software would meet a pre-determined reliability objective.
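As an illustration of this kind of analysis, the sketch below uses the Goel-Okumoto model, one well-known reliability growth model (not necessarily the one used by any presenter), to estimate how long testing must continue before the failure intensity falls to a stated objective. The parameter values are invented; in practice they would be fitted to observed failure data.

```python
import math

# Reliability growth with a simple NHPP model (illustrative sketch).
# In the Goel-Okumoto model the expected cumulative failures by test time t is
# m(t) = a * (1 - exp(-b*t)), so the failure intensity is a*b*exp(-b*t).
# The parameters a and b below are hypothetical, not fitted values.

a = 100.0   # expected total failures if testing ran forever
b = 0.02    # per-hour failure detection rate

def intensity(t):
    """Failure intensity (failures per hour) after t hours of testing."""
    return a * b * math.exp(-b * t)

objective = 0.1  # failures per hour at which the product is deemed shippable
# Solve a*b*exp(-b*t) = objective for the required test time t:
t_release = math.log(a * b / objective) / b

print(f"intensity at start of test: {intensity(0):.2f} failures/hour")
print(f"test hours to reach {objective}/hour: {t_release:.0f}")
```

The same calculation run against fitted parameters is how a growth model answers the "when is it reliable enough?" question raised in the sessions.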

Another recurring thread in many of the presentations was the concept of an "operational profile" driving the testing. In an organization adapting cleanroom methodology this might be called a "statistical usage probability distribution". If the organizational culture has roots in TQM, this might be referred to as focusing on the customer. In essence it reflects how the software is used rather than how it is built. The profile weights functionality according to frequency of use or criticality of the function. Weighting is then used to determine the quantity and types of test cases that will comprise testing, ensuring that the most important or most serious errors will be detected early in the test cycle.
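A minimal sketch of this weighting step, with hypothetical function names and usage weights, might look like this:

```python
# Allocating test cases by operational profile (illustrative sketch).
# Each function receives a share of the test budget proportional to its usage
# weight, so the most-used functionality is exercised earliest and most often.
# The function names and weights below are hypothetical.

profile = {"call_setup": 0.50, "billing": 0.25, "handoff": 0.15, "admin": 0.10}
test_budget = 200  # total test cases to be run

allocation = {fn: round(w * test_budget) for fn, w in profile.items()}
for fn, n in sorted(allocation.items(), key=lambda kv: -kv[1]):
    print(f"{fn}: {n} test cases")
```

Criticality can be folded in the same way, by inflating the weight of a rarely used but safety-significant function before normalizing.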

Major Challenges

There were very few sessions that addressed software reliability relative to system reliability. The thrust of one session was that true assessment of software reliability is very complex and we really can't have much confidence in the results given our current technology usage, so assume the worst and ensure that the system as a whole can tolerate an uncertain software reliability element. Determining what software reliability number to plug into the system reliability equation is perhaps still a mystery.
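One reason the question matters: in a simple series model the hardware and software reliabilities multiply, so an uncertain software figure propagates directly into the system number. The sketch below illustrates that sensitivity; all values are hypothetical.

```python
# Folding a software element into a system reliability equation
# (illustrative sketch). If hardware and software must both work, a series
# model multiplies their mission reliabilities; sweeping candidate software
# figures shows how the uncertainty dominates. All numbers are hypothetical.

r_hw = 0.995  # hardware mission reliability, from a hardware prediction

for r_sw in (0.99, 0.95, 0.90):  # candidate software reliability figures
    r_sys = r_hw * r_sw
    print(f"software {r_sw:.2f} -> system {r_sys:.4f}")
```

A five-point swing in the assumed software figure moves the system number by nearly the same amount, which is why the choice of that single input was seen as a mystery worth solving.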

Software integration and its impact on reliability were also addressed. Software products are no longer used in isolation. They must work with a multitude of other software products from competing vendors. Who is responsible for the reliability of the integration? Customers are often left in a vacuum because no one owns the integration. No one is responsible for the big picture. Each vendor accounts for what they perceive to be their part of the big picture. Standards for interoperability are becoming more important.

Concerns of Constituents

Some participants were expecting to hear about practical experiences in predicting and measuring software reliability in safety/surety critical applications. None were presented. A gentleman who works for a company that makes pacemakers seemed very confused by the reference to the operational profile and a comment that one can often negotiate reliability objectives with the customer.

Few, if any, presentations delineated how a specific reliability objective was set and then actually tracked to attainment. It is not clear whether this was due to the reviewing process, or simply to a lack of papers addressing actual implementation of software reliability engineering.

Concluding Thoughts

Throughout the conference I found myself relating topics under discussion to an underlying question posed by guest speaker Gerry Weinberg, renowned software development consultant and author of “Software Reliability: Why Aren’t We Doing What We Know How To Do?” He asked the question “Why don’t we build reliable software?” of conference attendees and got a plethora of excuses that boiled down to the following:

• The business area is not aligned with the technical area.

• External issues force organizations to release software before it is ready.

• Software engineers do not have equal status with hardware or system engineers and their recommendations are not taken seriously, or they are not consulted at all.

• It takes too much time to do the detailed requirements definition and testing needed to ensure the software is reliable. If we did that we would never make deadlines, and our products would cost too much.

• There is little confidence in the reliability metrics. Why waste time measuring if the results have little value to the decision-makers?

As I reviewed in my mind the various strategies and research presented, I discovered that very little of what was presented is “new”. We really do know how to test. We know how to gather requirements, and whom to communicate with in requirements definition. We know that our organizational culture impacts what we actually do versus what we know we should do. Technology is providing us with new challenges so fast that we are caught up in it and the basic principles of software engineering often go by the wayside.

Is ISSRE ’97 simply an instance of IEEE “preaching to the choir” in that it is attended primarily by the research community and not by software people with real-world problems seeking real-world solutions? Is there another forum for addressing practical software reliability issues than this conference? Or is it simply reflecting the possibility that proportionately few software engineers are concerned specifically with software reliability? Is software reliability engineering actually being practiced by the software community at large? Do the “real world” software people use other names for software reliability issues?

In closing, in spite of the concern over attendance, and the fact that the conference was research focused, I came back with a sense of personal responsibility to my role as a software engineer that I did not have before the conference. I simply have to start “doing what I know how to do”. I wonder what the impact would be if every attendee were called to action in the same way?

About the Author

Ellen Walker is a RAC specialist in software reliability. She holds bachelor’s degrees in Mathematics and Computer Science and a master’s degree in Management. In a twelve-year tenure as a computer scientist, she has worked with all phases of the software development cycle and supported both engineering services and business processes. She has been a facilitator and technical consultant for several long-term quality initiatives, and is an Examiner for New York State’s Excelsior (Quality Awards) program.

Calls for Papers

The 1998 Military/Aerospace (Transportation) COTS Conference, to be held August 26-28, 1998 in Albuquerque, NM, seeks papers on assuring the highest quality, availability, reliability and cost effectiveness in microelectronic technology and its insertion into high performance, affordable systems. Authors are requested to submit five copies of a one-page abstract by May 31, 1998 to:

Edward B. Hakim
The Center for Commercial Component Insertion, Inc.
2412 Emerson Ave.
Spring Lake NJ 07762
Tel. and Fax: (732) 449-4729
E-mail: [email protected]

The International Journal of Quality and Reliability Management (IJQRM) is seeking papers that focus on the practice of quality, reliability and maintainability. IJQRM is a major journal that will be entering its 15th year of publication. Authors should submit three copies of the article to the North American Editor:

Professor Christian N. Madu
Dept. of Management and Management Science
Lubin School of Business
Pace University
1 Pace Plaza
New York, NY 10038
Tel: (212) 346-1919
Fax: (212) 346-1573
E-mail: [email protected]

The 69th Shock and Vibration Symposium, to be held October 12-16, 1998 in Minneapolis/St. Paul, Minnesota, is seeking two categories of papers: full papers to be published in the symposium proceedings and short topics to be discussed at the conference but not published. Abstracts are due by May 4, 1998. Submittal details were not available at receipt of this announcement, but will be posted on the Shock and Vibration Information Analysis Center (SAVIAC) website at URL: http://saviac.xservices.com, and published in future issues of the SAVIAC Current Awareness Newsletter. For subscription to the latter, mail request to SAVIAC, 2231 Crystal Drive, Suite 711, Arlington VA 22202, or Fax to (703) 412-7500.

Call for Memories

In December 1998, the IEEE Reliability Society will publish a special issue of the IEEE Transactions on Reliability commemorating the 50th anniversary of the founding of the society. The issue will include a feature presenting interesting experiences of practitioners in any of the assurance sciences. Your funny, poignant or historically relevant memories may be submitted at any time up to 1 August 1998. Reminiscences will not be refereed, but will be edited to fit the available space. Send to the special issue editor:

Anthony Coppola
IITRI
201 Mill Street
Rome NY 13440
Tel: (315) 339-7075
Fax: (315) 337-9932
E-mail: [email protected]


New from RAC

New Catalog Available

Just released is the new RAC Catalog of Products and Services. Besides presenting the latest listing of RAC printed and software products and their prices, the catalog provides descriptions and contacts for:

• RAC consulting services

• RAC training courses

• The RAC Data Sharing Consortium

• Selected Topics in Assurance Related Technologies (START), a set of introductory pamphlets available free from RAC

• RAC Journal subscriptions (also free) and advertising rates (not free)

• The RAC website

The catalog includes descriptions of products still in development. For example, computer text files for MIL-HDBK-338, Electronic Reliability Design Handbook, and for MIL-HDBK-470, Maintainability Design Handbook, are described, priced, and labeled “available soon.”

The catalog is free on request to RAC. Use any of the contact data listed on the back cover of this issue, or call (888) 722-8737.

Failure Modes and Mechanisms Reference Updated

In 1991, RAC published FMD-91, Failure Modes/Mechanism Distributions, providing component failure modes and their relative probabilities of occurrence as an aid to fault tolerant design, the preparation of diagnostics, and failure modes, effects and criticality analysis (FMECA). A new document, FMD-97, adds about 50% new data to that of FMD-91. The hardcopy document contains data on electronic, mechanical and electromechanical parts and assemblies. It is available at a cost of $100 for U.S. orders, and $120 for non-U.S. orders. The order blank on the inside rear cover of this issue may be used to obtain a copy. For more information, call (888) 722-8737.

FMD-97 will also soon be made available on CD-ROM or 3.5” diskette software packages for use on personal computers under Windows 95 or NT. When available, the software will include graphical user interfaces and help files at an expected price of $100 for U.S. orders and $120 for non-U.S. orders. Order code will be FMD-CD. Availability will be announced in the RAC Journal.

Optoelectronics to be Next Component Applications Guide Subject

Available in the Spring of 1998 will be Reliable Applications of Optoelectronic Devices, a compilation of information to assist optoelectronic equipment designers in enhancing the reliability of their products. It will provide criteria for selection of devices, failure rate data, failure mode data, potential reliability concerns and other information necessary to apply the part in a reliable manner. The product will be the latest in a series of application guides (see Help From RAC, this issue). Ordering information will be presented in the RAC Journal when the publication is released.

New Software Tool Will Aid COTS Selection

A software tool, Selection of Equipment to Leverage Commercial Technology (SELECT), will be offered by RAC to quantify the estimated reliability and risk of using equipment designed for relatively benign environments in applications where the expected stresses (temperature, vibration, shock, humidity, etc.) are considerably more severe. The tool was originally developed by IIT Research Institute (operator of the RAC) under contract to Rome Laboratory (now called Air Force Research Laboratory Information Directorate - Rome Site). In addition to allowing relative risk comparisons between different commercial off-the-shelf (COTS) equipment, the tool identifies the predominant environmental and process risk drivers of each equipment being considered, quantifying the impact of design changes and testing options on the risk scores and estimated reliability. Release is expected in Spring, 1998. Order code will be SELECT, and the price will be $300 for U.S. orders and $340 for non-U.S. orders.

[SELECT graphic: Selection of Equipment to Leverage Commercial Technology — commercial equipment, military applications]


Industry Brief

ISO 9000 in the News

Recent news about ISO 9000, the family of Quality Management System international standards, includes:

• AS 9000, Aerospace Basic Quality System Standard, is now available from the Society of Automotive Engineers (SAE) as a guide to quality management in the aerospace industry. Like QS 9000, the Automotive Quality System Standard, it contains a verbatim citation of ISO 9000 provisions and industry-specific additional requirements and notes. For more information on AS 9000, contact SAE, 400 Commonwealth Drive, Warrendale PA 15086-0001. Tel: (412) 776-4970.

• Despite recent statements that the automotive sponsors of QS 9000 would discontinue the verbatim citation of ISO 9001, present plans are to retain the ISO standard.

• ISO plans to “integrate” ISO 9000 and ISO 14000, Environmental Management System, apparently do not intend a merger of the documents, but rather the creation of the capability to audit a site for compliance to either or both in a single visit.

SLDA Coolers Developed Through SBIR Program

Semiconductor laser diode arrays (SLDAs) require a high degree of thermal stability to maintain specified laser beam wavelengths and optimum efficiency. However, they generate large amounts of heat which must be removed, and existing cooling technology has been inadequate. Under a Small Business Innovation Research (SBIR) program sponsored by the former Air Force Phillips Laboratory, Saddleback Aerospace, Los Alamitos, CA, has developed water-cooled heat sinks using microchannel coolant flow passages. One of the prototype units exhibited the lowest thermal resistance ever reported for these types of devices. Saddleback is now manufacturing units in a variety of shapes, sizes and materials.

MTIAC Supporting Internet Product News Network

The Manufacturing Technology Information Analysis Center (MTIAC), one of three Defense Technical Information Center (DTIC) information analysis centers operated by IIT Research Institute, is providing manufacturing expertise to assist the Thomas Magazine Group in the launch of its new commercial internet venture, the “Product News Network.” MTIAC is developing hierarchical nomenclatures for six categories of industrial products: adhesives and sealants; data acquisition; machine tools; quality control; automatic identification systems; and plant maintenance and equipment. The Product News Network is intended to be an on-line source of up-to-date product information. The Thomas Magazine Group is the publisher of the “Thomas Register of Manufacturers.”

ADPA + NSIA = NDIA

On March 1, 1997, the American Defense Preparedness Association (ADPA) and the National Security Industrial Association (NSIA) merged. On October 1, 1997, the resulting organization was officially named The National Defense Industrial Association (NDIA). The association headquarters is located at 2111 Wilson Boulevard, Suite 400, Arlington VA 22201. Tel: (703) 522-1820. Fax: (703) 522-1885. E-mail: [email protected].

DTIC Offers IAC Directory

The Defense Technical Information Center (DTIC) is offering a directory of the DoD and service sponsored Information Analysis Centers (IACs). For a copy contact: Mr. Ron Hale, IAC Program Management Office, Defense Technical Information Center, 8725 John J. Kingman Road, Suite 0944, Ft. Belvoir VA 22060-6218. Tel: (703) 767-9171. Fax: (703) 767-9119. E-mail: [email protected]. The directory can also be downloaded from the IAC hub page on the DTIC website: http://www.dtic.mil/iac/. An Adobe reader is needed.

The appearance of advertising in this publication does not constitute endorsement by the Department of Defense or RAC of the products or services advertised.


From the Editor

The Quality Troika

How often has been heard the complaint that schedule and cost are the enemies of quality (or reliability, maintainability, testability, logistics planning, etc.)? I would like to suggest, rather, that cost and schedule are appropriate and valid measures of quality. David Garvin once identified five categories of quality measures, including value-based measures and user-based measures. Value-based measures integrate “goodness” and the cost involved (Phil Crosby’s “cost of quality” would be a value-based quality measure). User-based measures try to quantify the ability to satisfy the customer. I submit all measures must ultimately be user-based (no one else’s opinion really counts) and that cost and schedule are customer measures of quality, just as important as freedom from defects or features of the product.

For example, when in the market for a car, would you consider a Rolls-Royce? It has a reputation for unsurpassed engineering, elegant appointments, and freedom from defects, making it one of the highest quality automobiles in conventional quality (“goodness”) terms. Yet the average person considers the cost of an automobile important, and the Rolls-Royce does not have the property of being priced low enough for consideration by most car buyers. I contend this demonstrates cost is indeed a quality factor.

How about schedule as a quality factor? If you picked a new car to replace the one you have, but had to wait for a year before it was delivered, wouldn’t you pick another to buy? Perhaps you would wait, but I suspect you would think long and hard about it. QED: quality includes timely delivery.

My conclusion is that measures of “goodness” join with measures of cost and measures of schedule to form a “quality troika,” the three legs of which must all be considered in making decisions. This is essentially a premise of total quality management (TQM), where the word “total” implicitly recognizes the existence of the troika.

So where does this lead? First, we need to measure our efforts in three dimensions. Among all the different parameters, we can perhaps select major factors for “goodness,” cost and schedule and use this troika as an overall indicator of our performance. Continuing our automotive example, an automaker might measure defect rate, sticker price and time from order to delivery. With this, or an equivalent set of three measures, he could track his progress and benchmark his performance against the competition. These three parameters, and many others, are certainly used now by the automotive industry, but I am suggesting that a troika of the top three goodness, cost and schedule parameters may have value as an overall figure of merit. Also, each component of an organization can select their own appropriate troika as their metric for overall performance. And the recommendations of the assurance specialists should include a three-dimensional estimate of impact, using an appropriate quality troika.

A characteristic of the quality troika is that any change which improves one leg without adversely affecting the other two is always a winner. Many process improvements fall into this category, and pay off handsomely. It is the case where an improvement in one leg hurts another that requires a trade-off analysis. In the past, the relatively less visible “goodness” parameters suffered when they crossed cost or schedule. In the current climate where quality (in terms of “goodness”) is recognized as a customer demand, I think management is more willing to listen. However, it’s up to the analyst to have his facts ready to clearly show the trade-off involved, or, better, to find a way where no leg of the troika suffers a loss (TQM, again).

Considering cost and schedule as quality parameters, the assurance specialist may begin to understand why his recommendations have not always been accepted. More importantly, this may be a way for the specialist to communicate better with the decision makers, and it can improve the value of his recommendations. Better communication of better recommendations should improve his success ratio.

Anthony Coppola, Editor

What do you think? Letters to the editor on this, or any other topic, are always welcome. Those of general interest may be printed, unless the author requests otherwise.


Letter to the Editor

Dear Sir,

Reference “FMEA - A Curse or Blessing” by S.J. Jakuba, in the RAC Journal, Volume 5, Number 4, I would like to make the following comments:

The article was very well developed and written, and I feel it hit the right note regarding the impressions the subject gives to analysts, designers, and other personnel normally exterior to reliability, who regard the usefulness of a FMEA without any real understanding of its aims and achievability.

In discussing the uses of a FMEA, Stan Jakuba does not address the uses of a FMEA when an item of equipment or a system has to be modified or is an existing item (off-the-shelf) where no previous FMEA has been developed or produced. In such cases one of the main uses of the FMEA is to define the types, numbers, frequencies and safety-related hazard quantifications, so that the information can be used in developing the Logistics Support Analysis for Maintenance Planning. It also provides the basis for Life-Cycle Cost optimization, and is one of the most important building blocks in the development of a support plan.

Nevertheless, in his appraisal of the design and development of systems he has put his finger on the most misused and misunderstood analysis, which if implemented and used in the manner he has suggested will benefit all areas of system/equipment development.

Currently in the European environment most weapon platform acquisition programmes utilize a mixture of existing/modified/new development equipments or a system comprising existing/modified items with new software. With software and software/hardware configurations, existing FMEA methods do not suffice, but as Stan points out, it is the manner in which the FMEA is approached, the timing and how it is assessed, that is most important, and which if implemented correctly will provide the most rewards.

However, what must be clearly defined in each programme is why the FMEA is to be developed, and what are the aims of the FMEA. Most people involved in any system/equipment development programme fail to lift their heads out of the water and remind themselves what they are trying to achieve, and why they are doing such tasks.

On the matter of training, Stan has it just right; experience, team effort, and teamwork are the only answer. Too often the team leader thinks that the rest of the team do the work and he/she is the team! Education is one possibility, but the real answers are far reaching, and go well beyond the subject of quality and reliability.

Please, let’s have more articles like Stan Jakuba’s. The more people that read and digest such articles, the better are the chances of improving the quality of our programmes.

Yours very truly,
Stewart Allen
Logistics Executive Director,
Philotech GmbH


Some Observations on Demonstrating Availability

By: Anthony Coppola

Availability is easy to measure. It is also easy to measure MTBF and MTTR and calculate inherent availability by the formula:

Ai = MTBF / (MTBF + MTTR)

The problem is in assigning risks. For example, suppose we wanted an availability of 95%. If we had one failure in 100 hours and it took one hour to repair, we have measured an availability of 99%, well over spec. However, we would be basing the conclusion on one failure and one repair, which would be quite risky. We would feel more confidence if we had more data (i.e., more failures and repairs). There are established procedures for computing confidence intervals around MTBF measurements and for confidence intervals around MTTR estimates. (Confidence = 1 - risk. A confidence of 90% that a calculated MTBF or better has been achieved means a 10% risk of being in error. In turn, a 10% risk of error means a probability of 0.10 that the true MTBF will be lower than the calculated value.) As might be expected, the more data available, the closer will the calculated values for a specified confidence be to the measured value.

The problem with availability is that there is no easy way to compute confidence limits about a measured availability.

Following is a procedure for calculating risks on availability from confidence intervals on MTBF and MTTR.

First, collect data on MTBF and MTTR. (MTBF is operating hours/failures; MTTR is downtime/failures, not counting administrative and logistics delays.) The data is collected to the end of the last repair so that the operating time does not extend beyond the time of the last failure.

Then use the Chi-square procedure described below to calculate MTBF and MTTR at a desired risk. We shall be using 10% risks, and calculating a value of MTBF for which there is only a .10 probability that the true MTBF will be lower, and a value of MTTR for which there is only a .10 probability that the true MTTR will be higher. The resultant values are an MTBF and MTTR demonstrated to a 90% confidence. (Note: this method assumes the exponential distribution applies to both times between failure and times to repair; there are procedures for computing confidence intervals based on other distributions, such as the log-normal, which is more commonly assumed for MTTR.)

Selected values from a Chi-square table are:

Degrees of freedom        Chi-square value       Chi-square value
(= 2 x no. of failures)   at tenth percentile    at 90th percentile
        2                       .211                   4.61
        4                      1.06                    7.78
        6                      2.20                   10.64
        8                      3.49                   13.36
       10                      4.87                   15.99
       12                      6.30                   18.55
       14                      7.79                   21.06
       16                      9.31                   23.54
       18                     10.86                   25.99
       20                     12.44                   28.41
       22                     14.04                   30.81
       24                     15.66                   33.20
       26                     17.29                   35.56
       28                     18.94                   37.92
       30                     20.60                   40.26
       40                     29.05                   51.81
       50                     37.69                   63.17
       60                     46.46                   74.40
       80                     64.28                   96.58
      100                     82.36                  118.5

We then use the formulas:

MTBF = (2 x total operating time) / (Chi-square value for 2 x no. of failures at 90th percentile)

MTTR = (2 x total downtime) / (Chi-square value for 2 x no. of repairs at tenth percentile)

The MTBF formula gives the value of MTBF which we are 90% sure will be exceeded by the true MTBF (there is a .10 probability that the true MTBF is less than that calculated by the formula).

The MTTR formula gives the value of MTTR which we are 90% sure will not be exceeded by the true MTTR (there is a .10 probability that the true MTTR is greater than that calculated by the formula).

These values may be used to calculate an availability at some risk determined by the risks on MTBF and MTTR. We shall discuss later what the risk on the calculated availability might be.

Example: for 1000 hours of operating time with 10 failures which required 10 hours of repair time:


Measured MTBF = 1000/10 = 100 hours;

Measured MTTR = 10/10 = 1 hour.

Using the formulas given above:

MTBF = 2000/28.41 = 70.4 hours

MTTR = 20/12.44 = 1.6 hours

The results are interpreted as showing only a 10% chance that the true MTBF will be less than 70.4 hours, and that there is only a 10% chance that the true MTTR will exceed 1.6 hours.

Calculating availability by the formula A = MTBF/(MTBF + MTTR):

Measured availability = 100/(100 + 1) = 100/101 = .99

Availability using calculated MTBF and MTTR:

A = 70.4/(70.4 + 1.6) = 70.4/72 = .978

If the desired availability were .98, the measured results would indicate achievement, but the calculated results would show that when the MTBF and MTTR are calculated to a 90% confidence, the availability is not demonstrated as being achieved. More test data would be needed to resolve the issue.
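The worked example can be reproduced in a few lines. The sketch below is not part of the original article; it hardcodes the two chi-square percentiles for 20 degrees of freedom from the table above, and the function name is illustrative:

```python
# Chi-square percentiles for 20 degrees of freedom (2 x 10 events),
# transcribed from the table in the article.
CHI2_10TH_20DF = 12.44
CHI2_90TH_20DF = 28.41

def demonstrated_availability(op_time, downtime):
    """90%-confidence MTBF lower bound, MTTR upper bound, and the
    availability computed from them (10 failures and 10 repairs assumed)."""
    mtbf = 2 * op_time / CHI2_90TH_20DF   # true MTBF exceeds this w.p. 0.90
    mttr = 2 * downtime / CHI2_10TH_20DF  # true MTTR is below this w.p. 0.90
    return mtbf, mttr, mtbf / (mtbf + mttr)

mtbf, mttr, avail = demonstrated_availability(1000.0, 10.0)
# mtbf = 70.4 hours, mttr = 1.6 hours, avail = .978, matching the example
```

For other failure counts, the appropriate table row (or a chi-square quantile routine) would be substituted for the two constants.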

The important, as yet unanswered, question is: what is the risk on the calculated availability number?

In attempting to answer the question, note that the calculated MTBF will be exceeded by the true MTBF with probability .90 and the true MTTR will be less than the calculated MTTR with probability .90. Hence, the probability of both figures of merit being better than calculated is .90 x .90 = .81. When both figures of merit are better than calculated, the true availability must exceed the value calculated from the calculated MTBF and MTTR. So our confidence in the calculated availability (the probability that the true availability will be greater than the calculated value) is at least 81% (we shall show that it is more).

Note that the probability of the true MTBF being worse than that calculated is .10 and the probability of the true MTTR being worse than that calculated is also .10. Hence the probability of both values being worse is .10 x .10 = .01. In such a case, the true availability must be worse than the calculated value using the calculated MTBF and MTTR. Hence the risk on the calculated availability is at least 1% (we shall show it is more) and the confidence (1 - risk) cannot exceed 99%.

The above cases cover 82% of all possibilities. This leaves 18% probability that the true value of one of the figures of merit (MTBF or MTTR) is better than the calculated value and the true value of the other is worse. In this case the true value of availability may be either better or worse than the calculated availability. Unfortunately, the majority of the time the true availability will be worse, rather than better, than that calculated. This is because a slight deviation in the percentile towards the tails of the distribution (the bad way) makes a bigger difference (in calculating MTBF or MTTR) than the same deviation towards the center (the good way). For example, the difference between the 90th and 91st percentile values is greater than the difference between the 90th and 89th percentiles. Hence, in less than half the cases will the true availability exceed the calculated value. So the contribution of this 18% probable situation to the confidence in the calculated value will be less than 9%. The exact value will depend on how sharply the curves are skewed and the ratio of MTBF to MTTR.

We can conclude that calculating the availability using the 90% confidence values of MTBF and MTTR will provide a confidence of something more than 81% but less than 90% (81% + less than 9%, as shown above) that the true availability will exceed the calculated value. This will often be satisfactory.

Using other values of confidence (e.g., 95%) for the calculation of MTBF and MTTR will result in other ranges of confidence on availability. The above procedure and a more complete chi-square table will be needed. Note, however, that the higher the confidence demanded, the longer the test time needed to achieve it. Also note that all this assumes that the measured availability (that calculated from the measured MTBF and MTTR) exceeds the specified availability. If the measured availability is inadequate, the confidence in any higher value must be less than 50% (a poor bet).


Mark Your Calendar

May 18-22, 1998, Tucson, AZ
24th Annual Reliability Testing Institute
Contact: Dr. Kececioglu, University of Arizona, Aerospace and Mechanical Eng. Dept., 1130 N. Mountain Avenue, Bldg. 119, Room N517, PO Box 210119, Tucson, AZ 85721-0119. Tel: (520) 621-6120. Fax: (520) 621-8191. E-mail: [email protected].

May 19-21, 1998, Adelaide, Australia
Maintenance Engineering ‘98
Contact: Secretariat, Maintenance Engineering ‘98, PO Box 5142, Clayton Victoria 3168, Australia. Tel: 61 3 9544 0066. Fax: 61 3 9543 5905.

June 4, 1998, Pittsburgh, PA
November 5, 1998, Arlington, VA
SEI Visitor’s Days
Contact: Customer Relations, Software Engineering Institute (SEI), Carnegie Mellon University, Pittsburgh, PA 15213-3890. Tel: (412) 268-5800. E-mail: [email protected]. More info at website at URL: http://www.sei.cmu.edu.

June 8-10, 1998, Crystal City, VA
SEI Conference on Risk Management
Contact: Customer Relations, Software Engineering Institute (SEI), Carnegie Mellon University, Pittsburgh, PA 15213-3890. Tel: (412) 268-5800. E-mail: [email protected]. More information at: http://www.sei.cmu.edu.

June 15-17, 1998, Dallas, TX
10th Annual RMSL: Reliability, Maintainability, Supportability, and Logistics Workshop
Contact: Professional Development Division, SAE, 400 Commonwealth Drive, Warrendale, PA 15096-0001. Tel: (724) 772-7148. Fax: (724) 776-4955. E-mail: [email protected].

June 23-25, 1998, Munich, Germany
FTCS-28: Fault Tolerant Computing Symposium
Contact: Ram Chillarege, IBM Watson Research, 30 Saw Mill River Road, Hawthorne, NY 10352. Tel: (914) 784-7375. Fax: (914) 784-6267. E-mail: [email protected]

July 15-17, 1998, Tokyo, Japan
5th ISPE International Conference on Concurrent Engineering
Contact: Prof. Shuichi Fukuda, Dept. of Production, Information and Systems Engineering, Tokyo Metropolitan Institute of Technology, 6-6, Asahigaoka, Hino, Tokyo 191, Japan. Tel: 81 425 83 5111 x3605. Fax: 81 425 83 5119. E-mail: [email protected].

July 27-30, 1998, San Diego, CA
Seventh Applied Reliability Engineering and Product Assurance Institute for Engineers and Managers
Contact: Dr. Kececioglu, University of Arizona, Aerospace and Mechanical Eng. Dept., 1130 N. Mountain Avenue, Bldg. 119, Room N517, PO Box 210119, Tucson, AZ 85721-0119. Tel: (520) 621-6120. Fax: (520) 621-8191. E-mail: [email protected].

August 26-28, 1998, Albuquerque, NM
1998 Military/Aerospace (Transportation) COTS Conference
Contact: Edward B. Hakim, The Center for Commercial Component Insertion, Inc., 2412 Emerson Ave., Spring Lake, NJ 07762. Tel: (732) 449-4729. E-mail: [email protected]

September 13-17, 1998, New York, NY
International Conference on Probabilistic Safety Assessment and Management
Contact: Dr. R. A. Bari, Brookhaven National Laboratory, PO Box 5000, Upton, NY 11973-5000. Tel: (516) 344-5266. Fax: (516) 344-5266. E-mail: [email protected].

September 14-19, 1998, Seattle, WA
16th International System Safety Conference
Contact: Clif Ericson, 18247 150th Ave SE, Renton, WA 98058. Tel: (253) 657-5245. Fax: (253) 657-2585. E-mail: [email protected]

September 22-25, 1998, Ypsilanti, MI
Practical Implementation of Probabilistic Methods Workshop
Contact: SAE Professional Development, 400 Commonwealth Drive, Warrendale, PA 15096-0001. Tel: (724) 772-7148. Fax: (724) 776-4955. E-mail: [email protected].

October 6-8, 1998, Reno, NV
20th Annual EOS/ESD Symposium
Contact: ESD Association, 7900 Turin Rd., Bldg. 3, Suite 2, Rome, NY 13440-2069. Tel: (315) 339-6937.

October 26-29, 1998, Annapolis, MD
20th Space Simulation Conference
Contact: Institute of Environmental Sciences & Technology, 940 East Northwest Highway, Mount Prospect, IL 60056. Tel: (847) 255-1561. Fax: (847) 255-1699. E-mail: [email protected].

November 5, 1998, Arlington, VA
SEI Visitor’s Day
Contact: Software Engineering Institute (SEI), Carnegie Mellon University, Pittsburgh, PA 15213-3890. Tel: (412) 268-5800. E-mail: [email protected].

November 16-20, 1998, Tucson, AZ
36th Annual Reliability Engineering & Management Institute
Contact: University of Arizona, 1130 N. Mountain Ave., Tucson, AZ 85721-0119. Tel: (520) 621-6120. Fax: (520) 621-8191. E-mail: [email protected].





Help from RAC

Complementing the RAC product in process on the reliable application of optoelectronics devices (see New From RAC, this issue) are several currently available publications on applying other part types and on general parts management. Descriptions follow:

Reliable Application of Plastic Encapsulated Microcircuits addresses the interest in applying plastic encapsulated microcircuits (PEMs) in all environments, including military and telecommunication systems with strict reliability demands and harsh environments. This interest stems from real and perceived improvements in PEM reliability, quality and cost. The publication provides information on PEM parts control, procurement, failure modes and mechanisms, reliability and quality. A reliability prediction model for PEMs is also provided.
Order Code: PEM2
Price: $75 U.S., $85 Non-U.S.

Reliable Application of Multichip Modules addresses hybrid microcircuits, multichip packaging and multichip modules (MCMs). It provides an overview of the methodology and tools to ensure that these components and their vendors are selected, controlled and tested effectively for the specific end product and usage, so that the favorable weight and size advantages of these devices may be successfully exploited.
Order Code: MCM
Price: $50 U.S., $60 Non-U.S.

Reliable Application of Capacitors provides information on the failure modes of the various capacitor types, their observed failure rates and specific areas of concern when selecting a capacitor. It also contains quantitative relationships describing the change in electrical parameters (capacitance, dissipation factor and equivalent series resistance) as a function of time and temperature. A thorough bibliography is provided.
Order Code: CAP
Price: $50 U.S., $60 Non-U.S.

The first publication in RAC’s series on the reliable application of parts was Parts Selection, Application and Control, an overview document addressing topics of a general nature affecting all types of parts. It identifies specific selection and procurement methods and tools for properly applying the electrical and electronic parts most suitable for the unique requirements of a given application, military or commercial.
Order Code: PSAC
Price: $50 U.S., $60 Non-U.S.

To help bridge the gap between the use of military specifications and standards, and new policies in acquiring military equipment under the DoD acquisition reform program, RAC compiled a handbook, Processes for Using Commercial Parts in Military Applications. The handbook provides an overview of the acquisition reform process, its relationship to system engineering functions, and its impact on reliability and quality issues. It also provides guidance for a parts management program under the new ground rules, and case histories to illustrate practical experience and “lessons learned.”
Order Code: PUCP
Price: $25 U.S., $35 Non-U.S.

Orders may be placed using the order blank on the following page, or by telephone to (888) 722-8737. Other RAC products are listed in the RAC Product and Services Catalog, recently revised. Ask for your free copy.

RAC on the Web

RAC maintains a home page on the World Wide Web at URL: http://rome.iitri.com/RAC/. From there, the user can obtain:

• Current awareness information: copies of the RAC Journal, START sheets, a calendar of events

• RAC products and training information: catalog of products, on-line order form, training schedules, available courses

• RAC services: RAC user guide, bibliographic searches, technical support, databases

• Links to other DoD Information Analysis Centers (IACs), and other reliability, maintainability and quality information sources

Pay us a visit at:

http://rome.iitri.com/RAC/

[Cover images: Reliable Application of Plastic Encapsulated Microcircuits and Reliable Application of Capacitors, from the RAC Parts Selection, Application and Control Series. RAC is a DoD Information Analysis Center sponsored by the Defense Technical Information Center.]



RAC Order Form
Send to: Reliability Analysis Center, 201 Mill Street, Rome, NY 13440-6916

Call: (888) RAC-USER (722-8737), (315) 339-7047 Fax: (315) 337-9932

E-mail: [email protected]

Qty Title Price Each Total

Shipping & Handling:
US orders: add $4.00 per book for first class shipments ($2.00 for RAC Blueprints).
Non-US orders: add $10.00 per book for surface mail (8-10 weeks), or $15.00 per book for air mail ($25.00 for NPRD and VZAP, $40.00 for EPRD); $4.00 for RAC Blueprints.

Total

Name

Company

Division

Address

City State Zip

Country Phone Ext.

Fax: E-mail:

Method of Payment:

❒ Personal check enclosed

❒ Company check enclosed (make checks payable to IITRI/RAC)

❒ Credit Card # Expiration Date

Type (circle): AMERICAN EXPRESS   VISA   MASTERCARD
A minimum of $25 is required for credit card orders.

Name on card:

Signature:

Billing address:

❒ DD1155 (Government personnel) ❒ Company purchase order

❒ Send product catalog


RAC Training Courses

Course Dates:

June 9-11, 1998

Course Fee:

$995

Location and On-Site Accommodations:

Virginia Beach Resort Hotel & Conference Center

2800 Shore Drive
Virginia Beach, VA 23451
(757) 481-9000

System Software Reliability
Featuring hands-on software reliability measurement, analyses and design, this course is intended for those responsible for measuring, analyzing, designing, automating, implementing, or ensuring software reliability for commercial or government programs.

Design Reliability
This intensive overview covers theoretical and practical aspects of reliability through concurrent engineering. Reliability analysis, test and evaluation, parts selection, circuit analysis, and applicable standards and information sources are addressed.

Mechanical Reliability
Practical applications of mechanical engineering to system and component reliability are covered in this popular course. Basic theories of mechanical reliability and the essential tools of mechanical reliability analysis are covered and reinforced through problem solving and discussion.

Accelerated Testing
The course describes the statistical models for accelerated tests, how to plan efficient tests, and how to estimate and improve product reliability. The methods will be illustrated with many applications to electronic, mechanical, and other products, including your own data, which you are encouraged to bring.

Bring RAC Training Courses to Your Facility
RAC provides training courses to government and industry in virtually every aspect of reliability and quality. On-site training is often more cost-effective for organizations, particularly since on-site “closed” courses can be tailored to specific customer products and processes. Subjects addressed by RAC training include but are not limited to:

• Design for Reliability
• Reliability Modeling
• Fault Tree Analysis
• Failure Analysis
• Statistical Process Control
• Software Engineering
• Software Reliability
• Failure Data Systems
• Mechanical Reliability
• Reliability Testing
• Testability Analysis
• Reliability Analysis
• Reliability Management/Planning
• Quantitative Methods
• Microelectronics Standardization
• Worst Case Circuit Analysis
• Maintainability Testing
• Failure Mode, Effects & Criticality Analysis
• Total Quality Management
• Environmental Stress Screening
• Parts Selection and Control
• Reliability-Centered Maintenance
• Probabilistic Mechanical Design
• Qualified Manufacturers List

For further information on scheduled RAC training courses, contact Ms. Nan Pfrimmer at the Reliability Analysis Center, (800) 526-4803 or (315) 339-7036. For information about on-site and custom training arrangements, contact Mr. Patrick Hetherington at (315) 339-7084.

IIT Research Institute/Reliability Analysis Center
201 Mill Street
Rome, NY 13440-6916

RAC INQUIRIES AND ORDERS:

Reliability Analysis Center
201 Mill Street

Rome, NY USA 13440-6916

(315) 337-0900 .............................. Main Telephone
(315) 337-9932 .............................. Facsimile
(315) 337-9933 .............................. Technical Inquiries
(800) 526-4802 .............................. Publication Orders
(800) 526-4803 .............................. Training Course Info
(888) RAC-USER (722-8737) ................... General Information
[email protected] ............................. Product Info
[email protected] ............................. Technical Inquiries
URL: http://rome.iitri.com/RAC .............. Web Address

Contact Gina Nash at (800) 526-4802 or the above address for our latest Product Catalog or a free copy of the RAC Users Guide.

Non-Profit Org.
U.S. Postage Paid
Utica, NY
Permit No. 566