Design for Testability (DfT) Seminar
TRANSCRIPT
1
Test Engineering
Courtesy of Patrick D.T. O'Connor
62 Whitney Drive, Stevenage
Herts. SG1 4BJ, UK
www.pat-oconnor.co.uk
www.pat-oconnor.co.uk/testengineering/htm
[email protected]
[email protected]
2
Test Engineering
Outline (day 1):
1. Introduction
2. Stress, strength, failure of materials
3. Stress, strength, failure of electronics
4. Variation and reliability
5. Design analysis
6. Development test principles
3
Test Engineering
Outline (day 2):
7. Materials and systems test
8. Electronics test
9. Software
10. Manufacturing test
11. Testing in service
12. Data collection and analysis
13. Laws, regulations, standards
14. Managing test
4
Test Engineering
Why test?
• Design uncertainty
• Manufacturing
• Variation
• Maintenance
• Regulations
• Contracts
5
Test Engineering
Causes of failure:
• Design inherently incapable
• Variation (parameters, environments)
• Wearout
• Other time-dependent mechanisms
• Sneaks
• Errors
We must know them all!
6
Test Engineering
How to test?
• Test to succeed / test to fail?
• Accelerated test
• Systems and components
• Technologies
• Processes
• Analysis and simulation
7
Test Engineering
Testing tales:
• "Our engineers are paid to design right"
• "Trains don't need testing"
• Ship engine for a locomotive?
• "We have always done this test"
• The telecomms system
• MIL-STD-883 IC burn-in test
• "Don't overstress"
• Too much test?
8
Test Engineering
Development test principles
• Failure costs exceed costs of test to detect & remove (Deming).
• Failure-free design: selection, training, teams, leadership
• Optimise test programme
• Test adds value!
9
Test Engineering
Development test costs
• Test articles ("UUT")
• People × time
• Facilities
• Delay to market
• Downstream opportunities (warranty, fixes, reputation, etc.)
10
Test Engineering
Management aspects:
• Design capability/risks
• Markets, competition
• Product environment, life
• Suppliers
• Regulations
• Manufacturing, service
11
FAILURE CAUSES: MECHANICAL
• Maximum stress, fracture
• Stress cycling, fatigue, creep (vibration, temperature cycle)
• Wear
• Corrosion
• Manufacture
• Variation
• Other (leaks, backlash, friction, ...)
12
MATERIAL STRESS, STRENGTH, FAILURE
Properties:
• Strength/elasticity (Hooke's Law)
  – Stress (σ) = Young's Modulus (E) × strain (ε)
• Yield strength, ultimate tensile strength (UTS)
• Toughness/brittleness (resistance to fracture: energy/volume)
• Crack growth (Griffith's Law)
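Hooke's Law above can be checked numerically. The Young's modulus and yield strength used here (values typical of mild steel) are illustrative assumptions, not figures from the seminar:

```python
# Hooke's Law in the elastic region: stress = E * strain.
# E ~ 200 GPa and yield ~ 250 MPa are typical mild-steel values,
# assumed for the example only.

def stress_mpa(strain, youngs_modulus_gpa):
    """Elastic stress (MPa) for a given dimensionless strain."""
    return youngs_modulus_gpa * 1000.0 * strain  # GPa -> MPa

E_STEEL_GPA = 200.0
YIELD_MPA = 250.0

strain = 0.001  # 0.1% strain
sigma = stress_mpa(strain, E_STEEL_GPA)
print(f"stress = {sigma:.0f} MPa, yielded: {sigma > YIELD_MPA}")
```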
13
MATERIAL STRESS, STRENGTH, FAILURE
Hooke's Law:
Figure 2.1 Material behaviour in tensile stress (stress σ vs. strain ε: elastic region, yield point, plastic region, fracture)
14
MATERIAL STRESS, STRENGTH, FAILURE
Figure 2.2 Tensile stress/strain behaviour of different materials (generalised) (stress σ in MPa, 0-400, vs. strain ε in %, 0-30)
• Brittle: cast iron, ceramics, glass
• Ductile: plastics, copper, solder
• Tough: Kevlar, steels, alloys (Al, Ti, etc.)
15
FINITE ELEMENT ANALYSIS (MECHANICAL STRESS) (MSC)
16
MECHANICAL FAILURE CAUSES
• Shock overload
  → constant failure/hazard rate (CFR/CHR) (load-strength analysis)
• Strength deterioration
  → increasing failure/hazard rate (IFR/IHR) (durability)
17
CAUSES OF STRENGTH DETERIORATION
• Fatigue (cyclic stress: vibration, handling, temperature cycling)
• Creep (high temperature + mech. stress)
• Wear (parts moving in contact: connectors)
• Corrosion (electrolytic, contamination, ...)
• etc.
18
FATIGUE: S - N CURVE
Figure: stress amplitude S (from the UTS down to the fatigue limit) vs. cycles to failure N (log scale, 1 to 100,000+)
19
FATIGUE: MINER'S RULE

M1/n1 + M2/n2 + … + Mk/nk = 1

(Mi = cycles applied at stress level i; ni = cycles to failure at that level)
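Miner's rule can be applied directly as a damage summation; failure is predicted when the sum reaches 1. The load spectrum below is invented for illustration:

```python
# Miner's rule: cumulative fatigue damage D = sum(M_i / n_i);
# failure is predicted when D reaches 1.

def miner_damage(spectrum):
    """spectrum: list of (cycles_applied, cycles_to_failure) pairs."""
    return sum(m / n for m, n in spectrum)

spectrum = [
    (10_000, 100_000),  # 10k cycles at a level that fails at 100k
    (1_000, 20_000),    # 1k cycles at a harsher level
    (100, 1_000),       # 100 cycles near the worst case
]
damage = miner_damage(spectrum)
print(f"damage fraction = {damage:.2f}")  # < 1: life remaining
```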
20
“CLASSIC” FATIGUE FAILURE
• Initiating crack or damage
• Crack growth rings
• Granular fracture surface
21
DESIGN AGAINST FATIGUE
• Reduce mech. stress concentrations (FEA)
• Provide support for heavy components, connectors, etc.
• Minimise thermal gradients
• Know material fatigue properties (particularly solder!)
• Design for safe life
• Design for fail-safe
• Design for inspection & test
22
VIBRATION
Leads to:
• Fatigue
• Wear
• Loosening
• Leaks
• Noise
23
VIBRATION
Measures:
• Frequency (Hz)
• Displacement (m)
• Velocity (m/s)
• Acceleration (peak) (m/s² or gn)
• Damping (reduces amplitude)
• Noise, vibration and harshness (NVH)
24
VIBRATION: WATERFALL PLOT
Figure 2.5 Waterfall plot of vibration data
25
TEMPERATURE EFFECTS
• Expansion/contraction (TCE)
• Softening, weakening, melting (metals, some plastics)
• Charring (plastics, organics)
• Drying/condensation/freezing
• Other physical/chemical (Arrhenius' Law)
• Viscosity change, lubricant loss
• Interactions (corrosion, …)
26
WEAR MECHANISMS
• Adhesive• Fretting• Abrasive• Cavitation/Erosion• Corrosive
27
WEAR REDUCTION
• Examine
• Test/analyse
• Lubricate (oils, MoS₂, …)
• Surface treatment (PTFE, …)
• Stress reduction (mech., temp., vibration)
• Material change (e.g. non-abrasive)
28
CORROSION
• Ferrous alloys (rust)
• Non-ferrous: Al, Mg
• Chemical
• Electrolytic
29
PREVENTING CORROSION
• Material selection
• Surface protection
  - Anodising
  - Plating (Cr, Sn, …)
  - Painting
  - Lubricating
• Environmental protection (seals, desiccants)
30
OTHER MECHANICAL FAILURE MECHANISMS
• Backlash (wear?)
• Adjustments
• Leaks
• Loosening (fasteners)
  - Wear?
  - Maintenance?
• etc.
31
MATERIAL SELECTION FOR RELIABILITY/DURABILITY
• Metals: corrosion, protection, fatigue
• Plastics, rubbers: chemical attack, temperature stability, UV sensitivity
• Ceramics: fracture toughness
• Composites: impact strength, delamination, erosion
32
Electrical/Electronics Stress, Strength & Failure
• Component selection
• Stress derating (electrical, thermal)
• EMI, EMC, ESD
• Parameter variation
• Connectors
• Mechanical
33
Stress Effects
• Current
  – temperature rise
  – drift
• Voltage
  – current/overstress (EOS)
  – arcing, corona discharge
• Power (W = I²R)
• Temperature
34
Arrhenius' Law

λ = K exp[−E/(kT)]   or   λ = K exp[−A/T]

E = activation energy (0.3 - 1.5 eV)
k = Boltzmann's constant (8.62 × 10⁻⁵ eV K⁻¹)
T = absolute temperature (K)
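A common use of Arrhenius' Law in testing is the acceleration factor between a use temperature and a test temperature, AF = exp[(E/k)(1/T_use − 1/T_stress)]. The 0.7 eV activation energy and the temperatures below are typical assumed values, not figures from the seminar:

```python
# Arrhenius acceleration factor between two temperatures.
import math

BOLTZMANN_EV = 8.617e-5  # eV per kelvin

def acceleration_factor(e_ev, t_use_c, t_stress_c):
    """AF = exp[(E/k) * (1/T_use - 1/T_stress)], temperatures in Celsius."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp((e_ev / BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

af = acceleration_factor(0.7, 55.0, 125.0)
print(f"AF(55C -> 125C, E = 0.7 eV) = {af:.0f}")
```

Note that this thermal-acceleration model assumes a single dominant, temperature-activated failure mechanism; the seminar questions that assumption ("Arrhenius Law for thermal acceleration?").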
35
Temperature Effect on Reliability
Figure: failure rate λ vs. temperature (20 to 200 °C?). MIL-217/Bellcore models predict λ rising with temperature up to the rated limits (85/125 °C); the "Reality" curve shows a much weaker dependence.
36
Drift Characteristics: Carbon Resistor at +70 °C
Figure: change in R (%) vs. time (thousands of hours, 1.0 to 2.0), drifting from 0 towards −1.5%; drift is greater at 100% rated power (PSR) than at 50% PSR.
37
Semiconductor Device Construction Features
• Si preparation
• Diffusion
• Passivation*
• Metallization*
• Glassivation
• Connection
• Packaging
(*multilayer)
38
Semiconductor Device Technologies
• ASIC
• Mixed signal (analog/digital/RF)
• III-V (GaAs, InP)
• Power (transistors, thyristors, GTO, IGBT)
• Microwave (MMIC)
39
Microcircuit Mounting and Connection
• DIP in PTH
• Flat pack / SOIC
• Surface mounting
  − Leadless chip carrier (LCC)
  − Pin grid array (PGA) / ball grid array (BGA)
  − Chip scale packaging (CSP)
  − Tape automated bonding (TAB)
• IC sockets (DIP, LCC)
40
Semiconductor Device Failure Mechanisms
1. Die Related
• Crystal structure / impurity
• Diffusion / masking
• Passivation / dielectric breakdown (TDDB)
• Electromigration
• Latch-up
• Slow trapping, hot carriers, alpha particles
• External: ESD / EOS / EMP
41
Semiconductor Device Failure Mechanisms
2. Package Related
• Adhesion
• Bonding
• Impurity / corrosion / inclusions
• Hermeticity
• Solderability
42
Passive Device Failure Mechanisms
1. Resistors (fixed)
• Parameter drift
• Open circuit
• Noise

2. Resistors (variable)
• As above, plus:
• Mechanical failure
• Contact failure
• Seal failure
43
Passive Device Failure Mechanisms
3. Capacitors
• Short circuit (dielectric breakdown)
• Open circuit (high V)
• Leakage (wet types)
• Wire bond failure (open circuit)
44
Passive Device Failure Mechanisms
4. Interconnections
• PCB
  - ball bonds
  - track cracks (opens)
  - through-hole opens
  - shorts
• Wire/ribbon
  − breaks (fatigue, damage)
  − solder attach
• Intermittents
45
Solder
Major contributor to failures! (SMT, BGA, >10K joints/board)
• Inadequate wetting (contamination, oxidation)
• Insufficient time ("second drop")
• Fatigue
• Creep
46
Insulation
• Damaged, cut, chafed, trapped, …
• Overheated
• Aged, embrittled
• Eaten (rodents)
47
System/circuit Problems
• Distortion
• Jitter
• Timing
• Interference/compatibility ("noise") (EMI/EMC)
• Intermittents / no fault found (NFF)
48
EMI: Problems
• High frequencies (MHz - GHz) (VHF-UHF!)• Close spacing (SMT, narrow tracks)• ASICs, mixed signals (digital, RF)• New regulations (UL, CE, etc.)• Lack of knowledge (designers, managers)• Basic EDA does not simulate
49
EMI Sources (internal)
• Current loops (Lenz’s Law: reduce loop area)
• Signal noise (components, conductors)
• Ground noise
50
EMI Sources (external)
• ESD
• Switched inductive loads
• Supply transients
• Other systems (motors, radars, computers, peripherals)
51
EMI Protection
• Shielding− Faraday Shield− Coax cables
• Circuit protection− Capacitive (decoupling)− Inductive− Opto-couplers− Filters, regulators (on PCB)
52
Electrical Overstress/Electrostatic Damage
EOS/ESD
• ICs ARE VULNERABLE!!
• People generate 1 - 5 kV / 50 - 100 μJ
• EOS / ESD can kill ICs
• It can also do GBH
• On-chip protection
53
EOS/ESD Protection
• Connector separation for different voltage levels
• Decoupling of ICs
• Isolation (opto-couplers)
• Handling / packaging / bonding
• On-chip protection
54
Probability Distributions
Histogram and Probability Density Function
Figure: histogram of variable x with fitted probability density function f(x)
55
Normal Distribution
Figure: probability vs. variable; mean at centre, axis marked in standard deviations s from −4 to +4
56
”Natural” Variation
• Constant in time. Past = Future
• ”Normal” Distribution Function(Mean, Standard Deviation)
• ”Made by God”
57
Normal (Gaussian) Distribution
• Central Limit Theorem
• Symmetrical about mean/median μ
• Standard deviation (SD) σ; variance = σ²
• Proportion lying within ±nσ:
  ±1σ: 68%   ±2σ: 95%   ±3σ: 99.7%   ±6σ: 99.999999%
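The ±nσ proportions above follow directly from the normal CDF: the fraction within ±n standard deviations is erf(n/√2). A quick check:

```python
# Fraction of a normal population within +/- n standard deviations
# of the mean: P = erf(n / sqrt(2)). Reproduces the 68/95/99.7 rule.
import math

def within_n_sigma(n):
    return math.erf(n / math.sqrt(2.0))

for n in (1, 2, 3, 6):
    print(f"+/-{n} sigma: {within_n_sigma(n) * 100:.7f}%")
```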
58
Variation in Engineering
• Not "normal"
• Not constant in time: past NOT = future
• Selection effects
• Often deterministic (V = IR, F = ma)
• Sometimes due to failures, errors, ...
• Occasionally catastrophic (discontinuous, e.g. fatigue)
• "Made by man"
59
Curtailed Distribution
Figure: probability vs. variable; a normal-like distribution with its tails cut off, axis marked in standard deviations s from −4 to +4
60
Effect of Selection
Figure: probability vs. parameter (−10% to +10% of nominal) for a selected population
61
Skewed Distribution
Figure: probability vs. variable (skewed)
62
Bimodal Distribution (typical human mortality)
Figure: probability of death at a given age vs. age in years (10 to 110); peaks in infancy and in old age
63
Normal Distributions?
Figure: four distributions (1-4) with the same mean and SD, plotted between −nσ and +nσ (from Shewhart)
64
Weibull Distribution

R = exp[−(t/μ)^β]

μ = characteristic life
β = shape parameter (slope)
  β = 1 : CHR
  β < 1 : DHR
  β > 1 : IHR

If failure-free life = γ, replace t with (t − γ)
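The Weibull reliability function, including the optional failure-free life γ, can be sketched in a few lines; the numerical values below are illustrative:

```python
# Weibull reliability R(t) = exp[-((t - gamma)/mu)**beta], with an
# optional failure-free life gamma.
import math

def weibull_reliability(t, mu, beta, gamma=0.0):
    if t <= gamma:
        return 1.0  # no failures before the failure-free life
    return math.exp(-(((t - gamma) / mu) ** beta))

# beta = 1 reduces to the constant-hazard-rate exponential e^(-t/mu):
r = weibull_reliability(1000.0, 1000.0, 1.0)
print(f"R at t = mu, beta = 1: {r:.4f}")  # e^-1, about 0.3679
```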
65
Distributed Load and Strength
Figure: probability vs. value for load (L) and strength (S) distributions:
a. Non-overlapping distributions
b. Overlapping distributions: wide strength variation (low LR)
c. Curtailed strength distribution
d. Overlapping distributions: wide load distribution (high LR)
66
Distributed Load & Strength
For normally distributed load L and strength S, the safety margin is:

SM = (μS − μL) / √(σS² + σL²)
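The safety margin formula can be evaluated directly, and for normal load and strength the reliability is Φ(SM), the standard normal CDF. The mean and SD values below are invented for illustration:

```python
# Safety margin for normally distributed load and strength:
# SM = (mean_S - mean_L) / sqrt(sd_S**2 + sd_L**2); R = Phi(SM).
import math

def safety_margin(mean_s, sd_s, mean_l, sd_l):
    return (mean_s - mean_l) / math.sqrt(sd_s ** 2 + sd_l ** 2)

def reliability_from_sm(sm):
    """Standard normal CDF evaluated at SM."""
    return 0.5 * (1.0 + math.erf(sm / math.sqrt(2.0)))

sm = safety_margin(mean_s=500.0, sd_s=40.0, mean_l=300.0, sd_l=30.0)
print(f"SM = {sm:.2f}, R = {reliability_from_sm(sm):.5f}")
```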
67
Time-Dependent Load and Strength
Figure: load and strength vs. time/load cycles (log scale); strength degrades until it meets the load distribution at time t'
68
Strength v. Specification (time dependent)
Figure 6.3 Strength vs. specification (time-dependent): the strength distribution degrades with time, so the probability of failing at the maximum specified stress grows
69
Summary of High Reliability Design Principles
• Determine most likely distributions of load and strength
• Evaluate SM for intrinsic reliability
• Determine protection methods (load limit, derate, screen, QC)
• Analyse strength degradation modes
• Test to corroborate, analyse results
• Correct or control (redesign, safe life, maintenance, ...)
70
Multiple Variations
Traditional Method:
• Test effect of one variable at a time
• Cannot test interactions
71
Statistical Design of ExperimentsDoE
• Test all variables simultaneously
• Randomisation
• Analysis of variance (ANOVA):
  1. Determines effects of all variables
  2. Determines effects of all interactions
(R. A. Fisher, 1926)
72
Genichi Taguchi
• ”Loss to Society”• System Design• Parameter Design• Tolerance Design• Control & Noise Factors• Orthogonal Arrays• Brainstorm
73
DoE: Engineering Aspects
• Statistical v. engineering significance• Randomisation• Cost effectiveness• Confirmation• SPC• CAE• Nonlinearity• Management
74
Confidence and Risk
• s-confidence = probability that population parameter lies between "confidence limits"
• Bigger sample, narrower confidence limits
• Risk = (1 − confidence) (probability that parameter lies outside confidence limits)
• s-confidence vs. engineering confidence
75
Statistical, Scientific and Engineering Confidence
• Statistical test (binomial):
  items tested, 0 failures:      0    1     10    20
  80% s-confidence that R >      0    0.90  0.98  0.99
  Data is entirely statistical, no prior knowledge

• Scientific test:
  items dropped, all fall:       0    1     10    20
  confidence that all will fall: 1    1     1     1
  Information is deterministic

• Engineering: can range from deterministic to statistical
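A standard way to compute a zero-failure (success-run) reliability bound is R = (1 − C)^(1/n) at confidence C after n items pass; this is the usual binomial relation, and the figures in the table above may be based on a different convention:

```python
# Standard zero-failure "success-run" relation: after n items pass
# with no failures, the lower confidence bound on reliability at
# confidence C is R = (1 - C)**(1/n).

def reliability_lower_bound(n_passed, confidence):
    if n_passed == 0:
        return 0.0  # no data: nothing demonstrated
    return (1.0 - confidence) ** (1.0 / n_passed)

for n in (1, 10, 20):
    r = reliability_lower_bound(n, 0.80)
    print(f"n = {n}: R > {r:.3f} at 80% confidence")
```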
76
Measures of Reliability
• Failure Rate (FR) (λ)
• Hazard Rate (HR for non-repairable items) (λ)
• Mean Time Between Failures (MTBF) (M)*
• Mean Time to Failure (MTTF) (M)*
• Durability (failure free life; FR = 0)
• Reliability R = probability of no failures in time t
  R = e^(−λt) = e^(−t/M) *

*(for constant failure/hazard rate)
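For a constant hazard rate, R(t) = e^(−t/M) gives the often-surprising result that the probability of surviving to the MTBF is only about 37%. The MTBF value below is illustrative:

```python
# Constant-hazard-rate reliability: R(t) = exp(-t / MTBF).
import math

def reliability(t, mtbf):
    return math.exp(-t / mtbf)

print(f"R at t = MTBF: {reliability(5000.0, 5000.0):.3f}")  # about 0.368
```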
77
Patterns of Failure
The Bathtub Curve
Figure: hazard rate vs. time; the total curve is the sum of:
• DFR (weak items): infant mortality
• CFR: useful life
• IFR (wearout): wearout period
78
Variation: summary
• Variation is seldom (never?) "normal"
• Most important variation is in the tails
  – Less data
  – More uncertain
  – Conventional stats most misleading
• Variation can change over time
• Interaction effects
• Variation made by people
• Most engineering education covers the maths only
79
Development Test Principles
Categories of test:
• Functional (design proving / proof of principle)
• Reliability/durability
• Contractual/safety/regulatory
• Test and evaluation (T&E)
• Beta testing
80
Development Test Principles
Fill ”uncertainty gap”
• Performance/safety:
  – demonstrate success
  – perform once
• Reliability/durability:
  – test to fail
  – accelerated tests
• Variation:
  – Taguchi/statistical experiments
  – multiple tests?
81
Development Test Principles
• Components, systems, interfaces• Software• External suppliers• FRACAS• Integrated test programme
82
Development Test Principles
Test economics: major driver of development cost & time, BUT:
• Failure costs increase during project phases (×10 rule: design, development, production, service)
• Failure-free design is cheaper! (experience, training, integrated engineering, design analysis)
83
Development Test Principles
Strength v. Specification
Figure: probability vs. strength (stress to fail); the strength distribution lies above the specification
84
Development Test Principles
Strength v. Specification (transient & permanent failures)
Figure: probability vs. strength (stress to fail); separate distributions for transient and permanent failure limits, the transient limits lying below the permanent ones
85
Development Test Principles
Strength v. Specification (time dependent)
Figure: probability vs. strength (stress to fail); the strength distribution moves down towards the specification as time passes
86
Development Test Principles
• Failures are often due to combined stresses/strengths (uncertain)
• Failures are often influenced by interactions (uncertain)
• Failures are often time-dependent (uncertain)
• Causes of service failures can be shown by different test stresses, e.g.:
  – vibration / temperature cycle
  – high frequency / low frequency
87
Development Test Principles
Fundamental principle: increase (combined) stresses to cause failures, then use the information to make the product stronger.
Limits:
• Technology (e.g. solder melt)
• Test capability
• Economics
88
Development Test Principles
Testing at "representative" stresses, and hoping for no failures, is ineffective and a waste of resources.
Examples:
• Engines on test beds
• Cars on test tracks
• “Simulated” environmental test (MIL-STD-781, MIL-STD-810, etc.)
89
Environments (1):
• All relevant environments• Combined environments (CERT)• User• Environmental simulation?
Development Test Principles
90
Development Test Principles
Environments (2):
• Thermal
• Thermal fatigue (switching)
• Vibration
• Shock
• Humidity
• Power supply/load
• Transients (ESD, EOS)
• Pollution, corrosion
• People, other animals
• Etc.
91
Development Test Principles
Accelerated stress test
• Miner’s Law for fatigue (mech, thermal)
• Arrhenius Law for thermal acceleration?
• Step-stress testing
• Failure modes relevant, not stress levels!
92
Development Test Principles
Highly accelerated life test (HALT) (1)
• Highly accelerated combined stresses (temperature, cycling, multi-axis vibration, others ...)
• Step stress to discover transient and permanent limits
• Time compression: orders of magnitude
• Developed by Gregg Hobbs
93
Development Test PrinciplesHALT (2)
• Special chambers, facilities (QualMark, Thermotron, Screening Systems, TEAM, ...)
• Savings: time, space, energy
• Optimise manufacturing screens (HASS)
• Similar approaches:
  – Highly accelerated stress test (HAST)
  – Stress-induced failure test (STRIFE)
  – Failure mode verification test (FMVT® Entela)
  – Etc.
94
HALT Philosophy (1)
Figure: combined stress axis showing, from the product spec. outwards, the lower/upper operating limits and then the lower/upper destruct limits
• High stresses = small samples!
95
HASS Philosophy
Figure: the same stress-limits diagram (product spec., operating limits, destruct limits); the precipitation screen is set between the operating and destruct limits, the detection screen within the operating limits
96
HALT/HASS Philosophy (2)
Figure: S-N curve (stress S vs. cycles to fail, log N); HALT/HASS operates at high stress and few cycles, ESS lower, in-use stress lowest
97
Accelerated Test Approach
TE p105
1. What failures might occur in service? (FMEA, etc.)
2. List/analyse stresses, combinations.
3. Plan how to apply.
4. Apply single stresses, step increases to failure.
5. Analyse failure, strengthen design.
6. Iterate 4 & 5 to fundamental limits.
7. Repeat with combined stresses.
8. Iterate 5 & 6.
98
Accelerated Test Approach
Examples:
• Mechanical (rotating, engines, etc.)
  – Old lubricants, filters
  – Low fluid levels (oil, coolant)
  – Out-of-balance
• Electro-mech (printers, etc.)
  – Temp, vib, power V level, humidity, ...
  – Misaligned shafts, etc.
  – Out-of-spec. materials (paper, friction, ...)
• Electronic components/packages, etc.
  – Temp, vib (high frequencies), etc.
  – Use vibration transducers (speaker coils?)
99
Accelerated Test Approach
Questions (TE p109):
• How many to test? As many as practicable/economic.
• Can reliability (MTBF, durability) be measured? NO! It will be increased!
• How do we know if a failure on test could occur in service? Analyse, use experience, THINK!
• Product will see no vibration in service. Why vibrate on test? Vibration on test can stimulate failures caused by temp. cycle, handling, etc. in service, QUICKLY!
• Is the principle limited to temp, vib, elec stress? Not at all. Apply to fluid systems, mech tolerances, etc.
100
HALT/HASS Payoffs
• Robust designs + capable processes = high reliability
• Reduced test time and cost
• Feedback to design: reduce "uncertainty gap" on future products
• Continuous improvement ("kaizen") of design capability (products, processes)
101
Accelerated Test or DoE?
Important variables, effects, etc.                    DoE/HALT?
Parameters: electrical, dimensions, etc.              DoE
Effects on measured performance parameters, yields    DoE
Stress: temperature, vibration, etc.                  HALT
Effects on reliability/durability                     HALT
Several uncertain variables                           DoE
Not enough items available for DoE                    HALT
Not enough time available for DoE                     HALT
102
Circuit Test Principles: Analog
• DC: current, potential, resistance (AVO), capacitance, ...
• AC: current, potential, impedance, waveforms, ...
• Signals: waveforms, gain, distortion, jitter, ...
103
Circuit Test Principles: Digital

Truth table for 2-input AND gate (inputs A, B; output O):
A B O
0 0 0
0 1 0
1 0 0
1 1 1

Test vectors: 4 (combinational logic)
"Stuck at" faults (SA0, SA1)
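The four test vectors of the truth table are enough to detect an output stuck at 0 or 1, which a small simulation can demonstrate:

```python
# Exhaustive test of a 2-input AND gate against output stuck-at
# faults: every input vector is applied, and any output stuck at
# 0 or 1 is exposed by at least one vector.

def and_gate(a, b, stuck_output=None):
    """AND gate; stuck_output (0 or 1) models an SA0/SA1 fault."""
    if stuck_output is not None:
        return stuck_output
    return a & b

VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def detects_fault(stuck_output):
    """True if some vector gives a different output than a good gate."""
    return any(and_gate(a, b) != and_gate(a, b, stuck_output)
               for a, b in VECTORS)

print("SA0 detected:", detects_fault(0))  # True (by vector 1,1)
print("SA1 detected:", detects_fault(1))  # True (e.g. by vector 0,0)
```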
104
Circuit Test Principles: Digital
Logic classes:
• Combinational: outputs follow inputs
• Sequential: input dependent; also data flow, memory allocation
• Dynamic: requires refresh / "keep alive"
105
Circuit Test Principles: Digital
Fault types:
• SA0, SA1
• Stuck-at input
• "At speed"
• Pattern sensitive
• Etc.
106
Manual Test Equipment
• Basic instruments – DMMs, power meters, ...
• Instruments – oscilloscopes, waveform generators, spectrum analysers, logic analysers, ...
• Special instruments– RF testers, optical signal testers, hi volt, ...
• PC - based
107
Automatic Test Equipment (ATE)
• Vision: automatic optical inspection (AOI), X-ray (AXI)
• Manufacturing defects analyser (MDA)
• In-circuit test (ICT)
• Fixtureless / flying probe
• Functional test (FT) (via circuit connectors)
• Combined ICT/FT
• Special test (RF, power supplies, manual, "hot rig", ...)
108
Test Capability
ATE must:
• Confirm correct operation of good circuits
• Not classify good as faulty
• Detect faulty items
• Diagnose fault causes
109
Design for Test (DFT)
Design must allow ATE to:
• Initialize (start clocks, set logic states)
• Control (e.g. open feedback loops, force logic, generate inputs)
• Observe (access to important nodes)
• Partition (reduce test program complexity)
110
Layout for ICT
• Keep PCB edges clear
• Location holes
• Large components on top (for double-sided PCBs)
• Resistors between power lines and control signals (resets, enables, tristates)
• Clock disable (provide link)
111
Built-in Test (BIT)
• Boundary scan (IEEE 1149.1)• ASICs• Logic and function tests• Complexity, false alarms
112
EMI/EMC Test
Must test for:
• Radiated emissions
• Conducted emissions (power lines, signal lines)
• Compatibility (susceptibility) (radiated, power, signals)
• Internal problems
• Special situations (rail signalling, avionics, lightning, nuclear (NEMP), etc.)
Standards and regulations
113
Test Control and Data Acquisition (DAQ)
Test databus standards:
• General purpose interface bus (GPIB) (IEEE 488)
• PC interface bus (PCI), PCI extensions for instrumentation (PXI)
• VMEbus extensions for instrumentation (VXI)
114
IC Test
• Special/expensive ATE
• Test cost ≅ IC manufacture cost!
• IDDQ test
• BIST
• Standard tests (MIL-STD-883, etc.)
• Rely on IC manufacturer’s tests
115
IDDQ Test
Figure 8.11 IDDQ plot: quiescent supply current (mA, 0.1 to 0.3) vs. node state (1 to 15, etc.); a good device draws a low, steady IDDQ, while a defective device shows elevated current at states 2, 3, 10, ...
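The IDDQ pass/fail decision is essentially a threshold test on the quiescent current at each logic state. A minimal sketch; the current values and the 0.15 mA threshold are invented for illustration:

```python
# IDDQ screening sketch: flag the logic states at which a device's
# quiescent supply current exceeds a threshold.

def iddq_failures(currents_ma, threshold_ma=0.15):
    """Return the (1-based) state numbers with excessive IDDQ."""
    return [state for state, current in enumerate(currents_ma, start=1)
            if current > threshold_ma]

good = [0.10, 0.11, 0.10, 0.12, 0.11]
defective = [0.10, 0.28, 0.30, 0.12, 0.11]

print("good device fails at:", iddq_failures(good))            # []
print("defective device fails at:", iddq_failures(defective))  # [2, 3]
```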
116
Standards, References, Software
• MIL-STD-2165 (USA)
• DEF STAN 00-13 (UK)
• 'Design for Testability' - Jon Turino
• 'Testability Advisor' - Logical Solutions Inc.
117
Software Reliability
• All new systems involved (operating & test)
• Cannot predict failure modes and effects
• Cannot test complete system*
• Errors are present in all copies*
• S/W - H/W interfaces (keyboards, sensors, devices, EMI)

*Compare VLSI hardware
118
Hardware/Software Reliability Differences (1)

Hardware:
1. Failures can be caused by deficiencies in design, production, use and maintenance.
2. Failures can be due to wear or other energy-related phenomena.
3. No two items are identical. Failures can be caused by variation.
4. Repairs can be made to make equipment more reliable.
5. Reliability may be time-related, with failures occurring as a function of operating (or storage) time, cycles, etc.
6. Reliability may be related to environmental factors (temperature, vibration, humidity, etc.).
7. Reliability can be predicted, in principle but mostly with large uncertainty, from knowledge of design, parts, usage, and environmental stress factors.

Software:
1. Failures are primarily due to design faults.
2. There are no wearout phenomena. Software failures occur without warning.
3. There is no variation: all copies of a program are identical.
4. There is no repair. The only solution is redesign (reprogramming).
5. Reliability is not time related. Failures occur when a specific program step or path is executed or a specific input condition is encountered, which triggers a failure.
6. The external environment does not affect reliability except insofar as it might affect program inputs.
7. Reliability cannot be predicted from any physical bases, since it entirely depends on human factors in design.
119
Hardware/Software Reliability Differences (2)

Hardware:
8. Reliability can be improved by redundancy.
9. Failures can occur in components of a system in a pattern that is, to some extent, predictable from the stresses on the components and other factors. Reliability critical lists are useful to identify high risk items.
10. Hardware interfaces are visual; one can see a 10-pin connector.
11. Computer-aided design systems exist that can be used to create and analyse designs.
12. Hardware products use standard components as basic building blocks.

Software:
8. Reliability cannot be improved by redundancy if the parallel paths are identical, since if one path fails, the other will have the same error.
9. Failures are rarely predictable from analyses of separate statements. Errors are likely to exist randomly throughout the program, and any statement may be in error. Reliability critical lists are not appropriate.
10. Software interfaces are conceptual rather than visual.
11. There are no computerised methods for software design and analysis.
12. There are no standard parts in software, although there are standardised logic structures. Software reuse is being deployed, but on a limited basis.
120
Software in Engineering
• "Real time"
• Wide range of interfaces (hardware, human, timing, ...)
• Different levels of embedding (ASICs, PGAs, BIOS, ...)
• Hardware/software options for functions
• Electrically "noisy" environments
• Usually smaller
121
Software Reliability: ERROR → FAULT → FAILURE

Sources of error:
• Specification (60%)
• Design (20%)
• Code (20%) (typos, numerical, omissions, etc.)
• Timing/EMI
• Data (information) integrity
122
Error Reduction
• Modular design• Error traps• Remarks• Spec & code review• Test
123
Fault Tolerance
• Internal tests (rates of change, cycle times, logic)
• Resets, fault indications
• Redundancy, voting
• Hardware failure protection
124
Languages
• Machine code/microcode
• Assembly level/symbolic assemblers
  – Both processor specific
  – Faster, less memory
  – Difficult, error prone
• High level (HLL) (BASIC, Fortran, *Pascal, *Ada, *C, *C++)
  – Processor independent
  – Easier, error protection*
  – Assemblers, compilers
• Programmable logic controllers (PLCs)
125
Software Testing (1)
• Total paths = 2^n (n = branches + loops)
• Test specs:
  – All requirements ("must do", "must not do")
  – Extreme conditions (timing, parameter values, rates of change, memory utilisation, ...)
  – Input sequences
  – Fault tolerance/error recovery
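The 2^n path count above is why exhaustive path testing is infeasible for real programs; even modest branch counts explode:

```python
# Path-count explosion: with n independent branch/loop decisions,
# the number of distinct execution paths is 2**n.

def total_paths(n_branches_and_loops):
    return 2 ** n_branches_and_loops

for n in (10, 30, 60):
    print(f"n = {n}: {total_paths(n):,} paths")
```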
126
Software Testing (2)
• Module & interface tests ("white box")
  – Data/control flow
  – Memory allocation
  – Lookups
  – Etc.
• System tests
  – Verification
  – Validation ("black box")
127
Documentation
• Specifications
• Code, remarks
• Notebooks
• Changes, corrections
• Test results:
  – Version
  – Test
  – Faults
128
Software Reliability Prediction and Measurement
• Methods:
  – Error/bug count
  – Time-based (hours, days, CPU seconds)
• "Cleanroom" approach (IBM)
• Do not use!
129
Test in Manufacture
Manufactured items are either:
1. Good
2. Defective, but detected and fixed or scrapped
3. Defective, but shipped, and might/will fail later
We must inspect/test to discriminate
130
Manufacturing Test Principles (1)
• All testing costs. So minimise (ideal = zero)
• But:
  – Manufacturing processes generate variation & defects
  – Later costs of variation & defects can exceed costs of detection & correction/removal
• So:
  – Must consider total life cycle (manufacturing, use, ...)
Value-added testing
131
Manufacturing Test Principles (2)
Test cost justification is difficult, because:
• Test costs arise in manufacture; failure costs arise later
• Failure occurrences and costs cannot be predicted
Some testing might be obligatory: calibration, EMI/EMC, safety, etc.
132
Test Capability
Tests must:
• Identify good items
• Detect defects (parts, processes, suppliers, ...)
• Indicate defect source/location
133
Test Pass - Fail Logic
Figure 10.6 Test pass-fail logic (flowchart: test → pass? → if failed, detect? → diagnose and repair → retest → next test)
134
Test Criteria and Stresses
• Manufacturing tests are not tests of the design
• Manufacturing tests must not damage good items (contrast with development)
135
Manufacturing Test Economics
Aspects to consider:
• Cost of test(s) (setup, run, repairs, ...)
• Defects that might be generated upstream
• Test capability
• Alternatives to test (inspection, ...)
• Methods to reduce/prevent defects
• Downstream costs of undetected defects
• 100% or sample test?
136
Manufacturing Test Economics
Examples:
• Screw
• Integrated circuit
• Automotive gearbox
• Car
• Spacecraft
• Electronics assembly
137
Inspection and Measurement
Inspection:
• Visual (manual, automatic)
Measurement:
• Dimensional (metrology)
  – Micrometers, CMMs, ...
• Parameters
  – mech. (strength, torque, ...)
  – elec. (instruments, ATE, ...) (Module 8)
Inspection, measurement, test: not absolute definitions
138
Stress Screening
Definition: application of stresses to cause defective items to fail/show without damaging good ones

Alternative terms:
• Environmental stress screening (ESS)
• Burn-in (electronic components & systems)
• STRIFE test
• etc.

Guidelines, etc.:
• US NAVMAT P-9492
• US MIL-STD-2164
• IEST ESSEH Guidelines
139
Highly Accelerated Stress Screening (HASS)
• Highly accelerated stresses (temp., vib., elec., ...)
• Developed via HALT in development testing
• Stresses are not extrapolations of service conditions
• Can be applied only to products that have been subjected to HALT in development
140
HASS Philosophy (1)
Figure: stress-limits diagram (product spec., operating limits, destruct limits) with the precipitation and detection screens marked
141
HALT/HASS Philosophy (2)
Figure: S-N curve (stress S vs. cycles to fail, log N) comparing HALT/HASS, ESS and in-use stress levels
142
HASS Philosophy (3)
• Proof (safety) of screen (POS)
• HASA (audit): sample v. 100%
• Review/adapt (e.g. repeat POS)
• Can apply to any technology (elec., mech.)
• Keep flexible (no standard procedures)
143
Electronics Manufacturing Faults
In rough order:
• Solder problems (permanent/intermittent o/c or s/c, weak, ...)
• Parts missing / wrong place / wrong value
• Part parameters/functions
• Damage (physical, ESD, ...)
• System/assembly level (cables/connectors, variation, EMI/EMC, ...)
In the 1970s the list could have been reversed!
144
Electronics Test Options/Economics
Board test:
Figure 10.3 Electronics assembly test flow example: Assemble (cost CA) → AOI (cost CI, fail proportion dI) → MDA (cost CM, fail dm) → ICT/FT (cost CF, fail df) → Ship; failed units are diagnosed and repaired (cost CR) and re-enter the flow. C = cost; d = proportion failed.
145
Electronics Test Options/Economics
A simple model for the manufacturing and test cost per unit is:

C = CA + CI + CM + CF + (CR + CM + CF)(dI + dm + df)

If, for example, CA = $200, CI = $10, CM = $10, CF = $20, CR = $50, and dI = dm = df = 0.05, then the total cost per unit would be $252.
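The cost model above can be coded directly, reproducing the $252 worked example:

```python
# Per-unit manufacturing and test cost:
# C = CA + CI + CM + CF + (CR + CM + CF) * (dI + dm + df)

def unit_cost(ca, ci, cm, cf, cr, d_i, d_m, d_f):
    return ca + ci + cm + cf + (cr + cm + cf) * (d_i + d_m + d_f)

c = unit_cost(ca=200, ci=10, cm=10, cf=20, cr=50,
              d_i=0.05, d_m=0.05, d_f=0.05)
print(f"total cost per unit = ${c:.0f}")  # $252, as in the example
```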
146
Fault Proportions & Coverage

                              faults        Coverage %
Fault                         %      AOI  AXI  MDA/ICT  FT   HASS
Open circuit                  25     40   95   85       95   *
Insufficient solder           18     40   80   0        0    20-80
Short circuit                 13     60   99   99       95   *
Component missing             12     90   99   85       85   *
Component misaligned          8      80   80   50       0    0
Component elec. para error    8      0    0    20/80    80   *
Wrong component               5      15   10   80       90   *
Other non-electrical          4      80   0    0        0    20-80
Excess solder                 3      90   90   0        0    0
Component reversed            2      90   90   80       90   *
147
Assembly Test
Figure: assembly under test (board 1, board 2, PSU, backplane, keypad, display) with test connections
148
Electronic Assembly Burn-In (ESS)
• Typically -30ºC to 70ºC, 5 cycles• Power on (monitor)• (Vibrate)• Finds production defects
– Solder– Damage
• Not effective against component defects (low temp, low stress)
149
Integrating Stress Screening
• Integrate with functional test (FT)
• Before/after AOI/ICT?
• Assembly stages?
  – Board
  – Intermediate
  – Final
• Re-screen after repair? YES
No fixed rules!
150
Post-Production Economics
• TE Page 183
151
Electronic Component Test
• All components tested by manufacturers
• Generally not practicable/economic for OEMs/CEMs to test (IC tester: $5M!)
• No repair possible
• Special cases:
  – Power devices?
  – Etc.?
152
Electronic Component Population Categories
Figure: failure probability vs. time (10 to 10,000 h, log scale): an infant-mortality population, a "freak" population, and the good population (zero failures)
153
IC Test
• MIL-STD-883 (TE p. 186)
  – Level A, B, C screens
  – Burn-in (125 °C, 168 h)
  – Plastic/hermetic packages (autoclave test)
• Other standards (CECC, IEC, ...)
Don't use!
154
In-Service Test Philosophy
Test only:
• If only way to determine correct function• To determine failure cause (diagnostic)• To confirm repair
Optimise during development
155
Test Schedules
• Continuous (BIT, monitors, ...)
• Time run (electronics, aircraft, engines, ...)
• Distance travelled (cars, trains, ...)
• Operating cycles (electronics, aircraft engines, ...)
• Calendar (calibration, seasonal, ...)
Must be measured. Intervals, tolerances.
156
Examples
• TE pages 191-193
157
Built-in (Self) Test (BIT/BIST)
• Apply only to functions that are not otherwise observable
• Keep it simple!
– Sensors etc. fail
– False alarms
• Implement in software (no weight, power, complexity)
158
“No Fault Found” (NFF)

Causes:
• Intermittent failures (components, connections, ...)
• Tolerance effects
• Connectors
• BIT false alarms
• Incorrect diagnosis/repair
• Inconsistent test criteria
• People
• Ambiguous cause: >1 suspect unit changed
(Also “retest OK” (RTOK), etc.)
50% - 80% of repairs!
159
RCM (Reliability Centred Maintenance) Objectives
• Optimises preventive maintenance (PM)
• Balances cost, availability, reliability, safety
160
Maintenance Categories (1)
Corrective (CM):
• Failure repair
• Unplanned
• Expensive/unsafe
Minimise by high reliability and durability, + effective PM
161
Maintenance Categories (2)
Preventive (PM):
• Failure prevention
• Planned
• Less expensive/safe
Optimise by RCM
162
RCM Decision Logic (1)
Failure Pattern:
• Increasing (wearout)? Consider replacement
– Failure-free life (light bulbs/tubes, drive belts, bearings, ...)
• Decreasing/constant? No replacement (electronics, ...)
163
RCM Replacement Intervals (1)
(Chart: hazard rate vs. time, replacement at m, 2m, 3m — decreasing hazard rate: scheduled replacement increases failure probability)

(Chart: hazard rate vs. time, replacement at m, 2m, 3m — constant hazard rate: scheduled replacement has no effect on failure probability)
164
RCM Replacement Intervals (2)
(Chart: hazard rate vs. time, replacement at m, 2m, 3m — increasing hazard rate: scheduled replacement reduces failure probability)

(Chart: hazard rate vs. time, replacement at m, 2m, 3m — increasing hazard rate with failure-free life > m: scheduled replacement makes failure probability = 0)
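These hazard-rate cases can be checked numerically. A sketch using the Weibull survival function R(t) = exp(-(t/η)^β), where β < 1, β = 1 and β > 1 give decreasing, constant and increasing hazard rates (the parameter values here are arbitrary, chosen only for illustration):

```python
import math

def survival(t, beta, eta=1.0):
    """Weibull reliability R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-(t / eta) ** beta)

m = 0.5  # scheduled replacement interval (arbitrary units)
for beta, hazard in [(0.5, "decreasing"), (1.0, "constant"), (2.0, "increasing")]:
    run_on  = survival(3 * m, beta)   # run to 3m with no replacement
    replace = survival(m, beta) ** 3  # replace at m and 2m: the clock restarts
    print(f"{hazard:10s} hazard: R(3m) = {run_on:.3f} vs R(m)^3 = {replace:.3f}")
```

Replacement lowers survival for β < 1, leaves it unchanged for β = 1 and raises it for β > 1, matching the charts; with a failure-free life longer than m, every interval ends before failures can begin, so the failure probability is zero.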
165
RCM Decision Logic (2)
Failure Effect (FMECA):
• Critical? Consider replacement / PM
• Detectable? Consider PM (e.g. fatigue)
166
RCM Decision Logic (3)

Failure Cost:
• High? Consider replacement (gearboxes, engines, ...)
• Low? Consider replacement on failure (light bulbs/tubes, hydraulic hoses (?), ...)
167
RCM Decision Logic (4)
(Flowchart: decision nodes "FR increasing?", "FE critical?", "Failure detectable?" and "Failure cost high?" lead, via Yes/No branches, to the outcomes Scheduled Replacement, PM, Replace on Failure or No Replacement)
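One plausible reading of this decision logic (combining slides 162–167) can be written as a small function. The branch ordering is an assumption, since the original flowchart did not survive transcription:

```python
def rcm_decision(fr_increasing, fe_critical, failure_detectable, cost_high):
    """Hypothetical sketch of the RCM decision logic, not the exact chart."""
    if fr_increasing:                       # wearout: replace before failure
        return "scheduled replacement"
    if fe_critical and failure_detectable:  # watch for incipient failure
        return "PM"
    if cost_high:                           # expensive failures: pre-empt
        return "scheduled replacement"
    return "replace on failure"             # cheap, non-critical items

print(rcm_decision(False, True, True, False))  # prints PM
```

For example, a critical but detectable failure mode (such as fatigue cracking) comes out as PM: preventive inspection rather than fixed-interval replacement.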
168
(Incipient) Failure Detection Methods
Mechanical:
• Manual (corrosion, wear, condition, ...)
• NDT for fatigue (ultrasonic, dye penetrant, radiographic, ...)
• Oil analysis (spectroscopic, magnetic)
• Vibration/acoustic

Electrical/Electronic:
• Built-in test
• Functional test/calibration
169
Stress Screens for Repairs
• Proves repair effectiveness
• Reduces NFF
• Use HASS if units were subjected to HALT/HASS
170
Calibration
• Regular test to ensure accuracy
– Measuring devices
– Instruments
– Sensors
• Traceability
• Accuracy (ISO 5725)
• Management, records, labels
171
Organisation and Responsibilities
Test Department:
• Provide facilities (strategic, tactical)
• Knowledge (methods, requirements, regulations, standards, ...)
• External facilities (contracts, hire, ...)
• Maintenance and calibration
• Training
172
Organisation and Responsibilities
Projects:
• Create and manage team
• Plan and manage testing
• Liaison with Test Department
• Identify/obtain project-specific requirements
173
Organisation and Responsibilities
Design:
• Design product
• Design processes (manufacture, test, maintenance)
• Integrate design analysis & development test
• Design review (specification, pre-test, pre-production)
174
Test Procedures
Include:
• Organisation and responsibilities
• Methods (design analysis, test)
• Test planning and action
• Failure reporting (FRACAS)
• Project/design reviews
• Integration (development, production, maintenance test)
• Test equipment maintenance & calibration
• In-service maintenance & calibration
175
Development Test Programme
What/when to test?
• Components, modules, system
• Component test:
– earlier
– more/cheaper
– higher stresses
– selection
• External suppliers’ products
• Output module(s) first
176
Development Test Programme
How many to test?
• As many as practicable (components/modules/systems)
• Consider design analyses, risks, time, costs
• Rotate items through tests (e.g. software, proving, environmental, ...)

Ever heard of too much testing?
177
Testing Purchased Items
Base testing on:
• Project requirements
• Existing knowledge
– supplier’s data
– past use
• Application/risks/novelty/costs ...
• Supplier’s test programme/results

Integrate!
Retain
Repeat
178
In-House v. External Facilities
In-house:
• Core technologies/confidentiality
• Designers more involved
• More flexible (?)
• Cheaper (?)

External:
• Lower capital outlay (?)
• Better facilities/expertise (?)

Consider balanced use of both
TE homepage (/testservices.htm)
179
Project Test Plan (1)
Include:
• Requirements (performance, reliability, standards, ...)
• Failures that must/should not occur
• Design/design analysis inputs (design review)
• Tests to be performed
• Test items/allocations
• Suppliers’ test requirements
• Integration through project phases
• Responsibilities (primary, support)
• Schedules
180
Project Test Plan (2)
• Single test plan
• Link to other project plans
– reliability
– safety
– quality, ...
• Link/refer to procedures, standards, ...

Flowchart: TE Fig. 14.1 (p. 241)
Example: Appendix 3
181
Manufacturing Test Plan
• Develop from development test results
• HALT/HASS

Flowchart: TE Fig. 14.2 (p. 242)
Example: Appendix 4
182
Management Issues
• Training
– degree courses
– short courses
– on-the-job (HALT/HASS)
• Integration
– across functions
– through phases
• Economics
– Long v. short term
– Test adds value
The Practice of Engineering Management, P.D.T. O’Connor (Wiley)
183
The Future of Test
• Virtual test
– EDA, FEA, CFD, ...
– Simulation
– Virtual reality
• “Intelligent” CAE
– Integrated physics, variation, ergonomics, ...
– Automatic design
• Internet
• Test hardware (BIT, “Sentient™”, ...)
• Computer-based test
• Teaching (?)