Design for Testability (DfT) Seminar
TRANSCRIPT
1
Test Engineering
Courtesy of Patrick D.T. O'Connor
62 Whitney Drive, Stevenage
Herts. SG1 4BJ, UK
www.pat-oconnor.co.uk
www.pat-oconnor.co.uk/testengineering/htm
[email protected]
[email protected]
2
Test Engineering
Outline (day 1):
1. Introduction
2. Stress, strength, failure of materials
3. Stress, strength, failure of electronics
4. Variation and reliability
5. Design analysis
6. Development test principles
3
Test Engineering
Outline (day 2):
7. Materials and systems test
8. Electronics test
9. Software
10. Manufacturing test
11. Testing in service
12. Data collection and analysis
13. Laws, regulations, standards
14. Managing test
4
Test Engineering
Why test?
• Design uncertainty
• Manufacturing
• Variation
• Maintenance
• Regulations
• Contracts
5
Test Engineering
Causes of failure:
• Design inherently incapable
• Variation (parameters, environments)
• Wearout
• Other time-dependent mechanisms
• Sneaks
• Errors
We must know them all!
6
Test Engineering
How to test?
• Test to succeed / test to fail?
• Accelerated test
• Systems and components
• Technologies
• Processes
• Analysis and simulation
7
Test Engineering
Testing tales:
• "Our engineers are paid to design right"
• "Trains don't need testing"
• Ship engine for a locomotive?
• "We have always done this test"
• The telecomms system
• MIL-STD-883 IC burn-in test
• "Don't overstress"
• Too much test?
8
Test Engineering
Development test principles
• Failure costs exceed costs of test to detect & remove (Deming).
• Failure-free design: selection, training, teams, leadership
• Optimise test programme
• Test adds value!
9
Test Engineering
Development test costs
• Test articles ("UUT")
• People × time
• Facilities
• Delay to market
• Downstream opportunities (warranty, fixes, reputation, etc.)
10
Test Engineering
Management aspects:
• Design capability/risks
• Markets, competition
• Product environment, life
• Suppliers
• Regulations
• Manufacturing, service
11
FAILURE CAUSES: MECHANICAL
• Maximum stress, fracture
• Stress cycling, fatigue, creep (vibration, temperature cycle)
• Wear
• Corrosion
• Manufacture
• Variation
• Other (leaks, backlash, friction, ...)
12
MATERIAL STRESS, STRENGTH, FAILURE
Properties:
• Strength/elasticity (Hooke's Law)
  – Stress (σ) = Young's Modulus (E) × strain (ε)
• Yield strength, ultimate tensile strength (UTS)
• Toughness/brittleness (resistance to fracture: energy/volume)
• Crack growth (Griffith's Law)
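Hooke's Law above can be checked numerically. The Young's modulus and yield strength used here (values typical of mild steel) are illustrative assumptions, not figures from the seminar:

```python
# Hooke's Law in the elastic region: stress = E * strain.
# E ~ 200 GPa and yield ~ 250 MPa are typical mild-steel values,
# assumed for the example only.

def stress_mpa(strain, youngs_modulus_gpa):
    """Elastic stress (MPa) for a given dimensionless strain."""
    return youngs_modulus_gpa * 1000.0 * strain  # GPa -> MPa

E_STEEL_GPA = 200.0
YIELD_MPA = 250.0

strain = 0.001  # 0.1% strain
sigma = stress_mpa(strain, E_STEEL_GPA)
print(f"stress = {sigma:.0f} MPa, yielded: {sigma > YIELD_MPA}")
```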
13
MATERIAL STRESS, STRENGTH, FAILURE
Hooke's Law:
Figure 2.1 Material behaviour in tensile stress (stress σ vs. strain ε: elastic region, yield point, plastic region, fracture)
14
MATERIAL STRESS, STRENGTH, FAILURE
Figure 2.2 Tensile stress/strain behaviour of different materials (generalised) (stress σ in MPa, 0-400, vs. strain ε in %, 0-30)
• Brittle: cast iron, ceramics, glass
• Ductile: plastics, copper, solder
• Tough: Kevlar, steels, alloys (Al, Ti, etc.)
15
FINITE ELEMENT ANALYSIS (MECHANICAL STRESS) (MSC)
16
MECHANICAL FAILURE CAUSES
• Shock overload
  → constant failure/hazard rate (CFR/CHR) (load-strength analysis)
• Strength deterioration
  → increasing failure/hazard rate (IFR/IHR) (durability)
17
CAUSES OF STRENGTH DETERIORATION
• Fatigue (cyclic stress: vibration, handling, temperature cycling)
• Creep (high temperature + mech. stress)
• Wear (parts moving in contact: connectors)
• Corrosion (electrolytic, contamination, ...)
• etc.
18
FATIGUE: S - N CURVE
Figure: stress amplitude S (from the UTS down to the fatigue limit) vs. cycles to failure N (log scale, 1 to 100,000+)
19
FATIGUE: MINER'S RULE

M1/n1 + M2/n2 + … + Mk/nk = 1

(Mi = cycles applied at stress level i; ni = cycles to failure at that level)
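Miner's rule can be applied directly as a damage summation; failure is predicted when the sum reaches 1. The load spectrum below is invented for illustration:

```python
# Miner's rule: cumulative fatigue damage D = sum(M_i / n_i);
# failure is predicted when D reaches 1.

def miner_damage(spectrum):
    """spectrum: list of (cycles_applied, cycles_to_failure) pairs."""
    return sum(m / n for m, n in spectrum)

spectrum = [
    (10_000, 100_000),  # 10k cycles at a level that fails at 100k
    (1_000, 20_000),    # 1k cycles at a harsher level
    (100, 1_000),       # 100 cycles near the worst case
]
damage = miner_damage(spectrum)
print(f"damage fraction = {damage:.2f}")  # < 1: life remaining
```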
20
“CLASSIC” FATIGUE FAILURE
• Initiating crack or damage
• Crack growth rings
• Granular fracture surface
21
DESIGN AGAINST FATIGUE
• Reduce mech. stress concentrations (FEA)
• Provide support for heavy components, connectors, etc.
• Minimise thermal gradients
• Know material fatigue properties (particularly solder!)
• Design for safe life
• Design for fail-safe
• Design for inspection & test
22
VIBRATION
Leads to:
• Fatigue
• Wear
• Loosening
• Leaks
• Noise
23
VIBRATION
Measures:
• Frequency (Hz)
• Displacement (m)
• Velocity (m/s)
• Acceleration (peak) (m/s² or gn)
• Damping (reduces amplitude)
• Noise, vibration and harshness (NVH)
24
VIBRATION: WATERFALL PLOT
Figure 2.5 Waterfall plot of vibration data
25
TEMPERATURE EFFECTS
• Expansion/contraction (TCE)
• Softening, weakening, melting (metals, some plastics)
• Charring (plastics, organics)
• Drying/condensation/freezing
• Other physical/chemical (Arrhenius' Law)
• Viscosity change, lubricant loss
• Interactions (corrosion, …)
26
WEAR MECHANISMS
• Adhesive• Fretting• Abrasive• Cavitation/Erosion• Corrosive
27
WEAR REDUCTION
• Examine
• Test/analyse
• Lubricate (oils, MoS₂, …)
• Surface treatment (PTFE, …)
• Stress reduction (mech., temp., vibration)
• Material change (e.g. non-abrasive)
28
CORROSION
• Ferrous alloys (rust)
• Non-ferrous: Al, Mg
• Chemical
• Electrolytic
29
PREVENTING CORROSION
• Material selection
• Surface protection
  - Anodising
  - Plating (Cr, Sn, …)
  - Painting
  - Lubricating
• Environmental protection (seals, desiccants)
30
OTHER MECHANICAL FAILURE MECHANISMS
• Backlash (wear?)
• Adjustments
• Leaks
• Loosening (fasteners)
  - Wear?
  - Maintenance?
• etc.
31
MATERIAL SELECTION FOR RELIABILITY/DURABILITY
• Metals: corrosion, protection, fatigue
• Plastics, rubbers: chemical attack, temperature stability, UV sensitivity
• Ceramics: fracture toughness
• Composites: impact strength, delamination, erosion
32
Electrical/Electronics Stress, Strength & Failure
• Component selection
• Stress derating (electrical, thermal)
• EMI, EMC, ESD
• Parameter variation
• Connectors
• Mechanical
33
Stress Effects
• Current
  – temperature rise
  – drift
• Voltage
  – current/overstress (EOS)
  – arcing, corona discharge
• Power (W = I²R)
• Temperature
34
Arrhenius' Law

λ = K exp[−E/(kT)]   or   λ = K exp[−A/T]

E = activation energy (0.3 - 1.5 eV)
k = Boltzmann's constant (8.62 × 10⁻⁵ eV K⁻¹)
T = absolute temperature (K)
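A common use of Arrhenius' Law in testing is the acceleration factor between a use temperature and a test temperature, AF = exp[(E/k)(1/T_use − 1/T_stress)]. The 0.7 eV activation energy and the temperatures below are typical assumed values, not figures from the seminar:

```python
# Arrhenius acceleration factor between two temperatures.
import math

BOLTZMANN_EV = 8.617e-5  # eV per kelvin

def acceleration_factor(e_ev, t_use_c, t_stress_c):
    """AF = exp[(E/k) * (1/T_use - 1/T_stress)], temperatures in Celsius."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp((e_ev / BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

af = acceleration_factor(0.7, 55.0, 125.0)
print(f"AF(55C -> 125C, E = 0.7 eV) = {af:.0f}")
```

Note that this thermal-acceleration model assumes a single dominant, temperature-activated failure mechanism; the seminar questions that assumption ("Arrhenius Law for thermal acceleration?").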
35
Temperature Effect on Reliability
Figure: failure rate λ vs. temperature (20 to 200 °C?). MIL-217/Bellcore models predict λ rising with temperature up to the rated limits (85/125 °C); the "Reality" curve shows a much weaker dependence.
36
Drift Characteristics: Carbon Resistor at +70 °C
Figure: change in R (%) vs. time (thousands of hours, 1.0 to 2.0), drifting from 0 towards −1.5%; drift is greater at 100% rated power (PSR) than at 50% PSR.
37
Semiconductor Device Construction Features
• Si preparation
• Diffusion
• Passivation*
• Metallization*
• Glassivation
• Connection
• Packaging
(*multilayer)
38
Semiconductor Device Technologies
• ASIC
• Mixed signal (analog/digital/RF)
• III-V (GaAs, InP)
• Power (transistors, thyristors, GTO, IGBT)
• Microwave (MMIC)
39
Microcircuit Mounting and Connection
• DIP in PTH
• Flat pack / SOIC
• Surface mounting
  − Leadless chip carrier (LCC)
  − Pin grid array (PGA) / ball grid array (BGA)
  − Chip scale packaging (CSP)
  − Tape automated bonding (TAB)
• IC sockets (DIP, LCC)
40
Semiconductor Device Failure Mechanisms
1. Die Related
• Crystal structure / impurity
• Diffusion / masking
• Passivation / dielectric breakdown (TDDB)
• Electromigration
• Latch-up
• Slow trapping, hot carriers, alpha particles
• External: ESD / EOS / EMP
41
Semiconductor Device Failure Mechanisms
2. Package Related
• Adhesion
• Bonding
• Impurity / corrosion / inclusions
• Hermeticity
• Solderability
42
Passive Device Failure Mechanisms
1. Resistors (fixed)
• Parameter drift
• Open circuit
• Noise

2. Resistors (variable)
• As above, plus:
• Mechanical failure
• Contact failure
• Seal failure
43
Passive Device Failure Mechanisms
3. Capacitors
• Short circuit (dielectric breakdown)
• Open circuit (high V)
• Leakage (wet types)
• Wire bond failure (open circuit)
44
Passive Device Failure Mechanisms
4. Interconnections
• PCB
  - ball bonds
  - track cracks (opens)
  - through-hole opens
  - shorts
• Wire/ribbon
  − breaks (fatigue, damage)
  − solder attach
• Intermittents
45
Solder
Major contributor to failures! (SMT, BGA, >10K joints/board)
• Inadequate wetting (contamination, oxidation)
• Insufficient time ("second drop")
• Fatigue
• Creep
46
Insulation
• Damaged, cut, chafed, trapped, …
• Overheated
• Aged, embrittled
• Eaten (rodents)
47
System/circuit Problems
• Distortion
• Jitter
• Timing
• Interference/compatibility ("noise") (EMI/EMC)
• Intermittents / no fault found (NFF)
48
EMI: Problems
• High frequencies (MHz - GHz) (VHF-UHF!)• Close spacing (SMT, narrow tracks)• ASICs, mixed signals (digital, RF)• New regulations (UL, CE, etc.)• Lack of knowledge (designers, managers)• Basic EDA does not simulate
49
EMI Sources (internal)
• Current loops (Lenz’s Law: reduce loop area)
• Signal noise (components, conductors)
• Ground noise
50
EMI Sources (external)
• ESD
• Switched inductive loads
• Supply transients
• Other systems (motors, radars, computers, peripherals)
51
EMI Protection
• Shielding− Faraday Shield− Coax cables
• Circuit protection− Capacitive (decoupling)− Inductive− Opto-couplers− Filters, regulators (on PCB)
52
Electrical Overstress/Electrostatic Damage
EOS/ESD
• ICs ARE VULNERABLE!!
• People generate 1 - 5 kV / 50 - 100 μJ
• EOS / ESD can kill ICs
• It can also do GBH
• On-chip protection
53
EOS/ESD Protection
• Connector separation for different voltage levels
• Decoupling of ICs
• Isolation (opto-couplers)
• Handling / packaging / bonding
• On-chip protection
54
Probability Distributions
Histogram and Probability Density Function
Figure: histogram of variable x with fitted probability density function f(x)
55
Normal Distribution
Figure: probability vs. variable; mean at centre, axis marked in standard deviations s from −4 to +4
56
”Natural” Variation
• Constant in time. Past = Future
• ”Normal” Distribution Function(Mean, Standard Deviation)
• ”Made by God”
57
Normal (Gaussian) Distribution
• Central Limit Theorem
• Symmetrical about mean/median μ
• Standard deviation (SD) σ; variance = σ²
• Proportion lying within ±nσ:
  ±1σ: 68%   ±2σ: 95%   ±3σ: 99.7%   ±6σ: 99.999999%
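The ±nσ proportions above follow directly from the normal CDF: the fraction within ±n standard deviations is erf(n/√2). A quick check:

```python
# Fraction of a normal population within +/- n standard deviations
# of the mean: P = erf(n / sqrt(2)). Reproduces the 68/95/99.7 rule.
import math

def within_n_sigma(n):
    return math.erf(n / math.sqrt(2.0))

for n in (1, 2, 3, 6):
    print(f"+/-{n} sigma: {within_n_sigma(n) * 100:.7f}%")
```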
58
Variation in Engineering
• Not "normal"
• Not constant in time: past NOT = future
• Selection effects
• Often deterministic (V = IR, F = ma)
• Sometimes due to failures, errors, ...
• Occasionally catastrophic (discontinuous, e.g. fatigue)
• "Made by man"
59
Curtailed Distribution
Figure: probability vs. variable; a normal-like distribution with its tails cut off, axis marked in standard deviations s from −4 to +4
60
Effect of Selection
Figure: probability vs. parameter (−10% to +10% of nominal) for a selected population
61
Skewed Distribution
Figure: probability vs. variable (skewed)
62
Bimodal Distribution (typical human mortality)
Figure: probability of death at a given age vs. age in years (10 to 110); peaks in infancy and in old age
63
Normal Distributions?
Figure: four distributions (1-4) with the same mean and SD, plotted between −nσ and +nσ (from Shewhart)
64
Weibull Distribution

R = exp[−(t/μ)^β]

μ = characteristic life
β = shape parameter (slope)
  β = 1 : CHR
  β < 1 : DHR
  β > 1 : IHR

If failure-free life = γ, replace t with (t − γ)
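The Weibull reliability function, including the optional failure-free life γ, can be sketched in a few lines; the numerical values below are illustrative:

```python
# Weibull reliability R(t) = exp[-((t - gamma)/mu)**beta], with an
# optional failure-free life gamma.
import math

def weibull_reliability(t, mu, beta, gamma=0.0):
    if t <= gamma:
        return 1.0  # no failures before the failure-free life
    return math.exp(-(((t - gamma) / mu) ** beta))

# beta = 1 reduces to the constant-hazard-rate exponential e^(-t/mu):
r = weibull_reliability(1000.0, 1000.0, 1.0)
print(f"R at t = mu, beta = 1: {r:.4f}")  # e^-1, about 0.3679
```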
65
Distributed Load and Strength
Figure: probability vs. value for load (L) and strength (S) distributions:
a. Non-overlapping distributions
b. Overlapping distributions: wide strength variation (low LR)
c. Curtailed strength distribution
d. Overlapping distributions: wide load distribution (high LR)
66
Distributed Load & Strength
For normally distributed load L and strength S, the safety margin is:

SM = (μS − μL) / √(σS² + σL²)
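The safety margin formula can be evaluated directly, and for normal load and strength the reliability is Φ(SM), the standard normal CDF. The mean and SD values below are invented for illustration:

```python
# Safety margin for normally distributed load and strength:
# SM = (mean_S - mean_L) / sqrt(sd_S**2 + sd_L**2); R = Phi(SM).
import math

def safety_margin(mean_s, sd_s, mean_l, sd_l):
    return (mean_s - mean_l) / math.sqrt(sd_s ** 2 + sd_l ** 2)

def reliability_from_sm(sm):
    """Standard normal CDF evaluated at SM."""
    return 0.5 * (1.0 + math.erf(sm / math.sqrt(2.0)))

sm = safety_margin(mean_s=500.0, sd_s=40.0, mean_l=300.0, sd_l=30.0)
print(f"SM = {sm:.2f}, R = {reliability_from_sm(sm):.5f}")
```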
67
Time-Dependent Load and Strength
Figure: load and strength vs. time/load cycles (log scale); strength degrades until it meets the load distribution at time t'
68
Strength v. Specification (time dependent)
Figure 6.3 Strength vs. specification (time-dependent): the strength distribution degrades with time, so the probability of failing at the maximum specified stress grows
69
Summary of High Reliability Design Principles
• Determine most likely distributions of load and strength
• Evaluate SM for intrinsic reliability
• Determine protection methods (load limit, derate, screen, QC)
• Analyse strength degradation modes
• Test to corroborate, analyse results
• Correct or control (redesign, safe life, maintenance, ...)
70
Multiple Variations
Traditional Method:
• Test effect of one variable at a time
• Cannot test interactions
71
Statistical Design of ExperimentsDoE
• Test all variables simultaneously
• Randomisation
• Analysis of variance (ANOVA):
  1. Determines effects of all variables
  2. Determines effects of all interactions
(R. A. Fisher, 1926)
72
Genichi Taguchi
• ”Loss to Society”• System Design• Parameter Design• Tolerance Design• Control & Noise Factors• Orthogonal Arrays• Brainstorm
73
DoE: Engineering Aspects
• Statistical v. engineering significance• Randomisation• Cost effectiveness• Confirmation• SPC• CAE• Nonlinearity• Management
74
Confidence and Risk
• s-confidence = probability that population parameter lies between "confidence limits"
• Bigger sample, narrower confidence limits
• Risk = (1 − confidence) (probability that parameter lies outside confidence limits)
• s-confidence vs. engineering confidence
75
Statistical, Scientific and Engineering Confidence
• Statistical test (binomial):
  items tested, 0 failures:      0    1     10    20
  80% s-confidence that R >      0    0.90  0.98  0.99
  Data is entirely statistical, no prior knowledge

• Scientific test:
  items dropped, all fall:       0    1     10    20
  confidence that all will fall: 1    1     1     1
  Information is deterministic

• Engineering: can range from deterministic to statistical
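A standard way to compute a zero-failure (success-run) reliability bound is R = (1 − C)^(1/n) at confidence C after n items pass; this is the usual binomial relation, and the figures in the table above may be based on a different convention:

```python
# Standard zero-failure "success-run" relation: after n items pass
# with no failures, the lower confidence bound on reliability at
# confidence C is R = (1 - C)**(1/n).

def reliability_lower_bound(n_passed, confidence):
    if n_passed == 0:
        return 0.0  # no data: nothing demonstrated
    return (1.0 - confidence) ** (1.0 / n_passed)

for n in (1, 10, 20):
    r = reliability_lower_bound(n, 0.80)
    print(f"n = {n}: R > {r:.3f} at 80% confidence")
```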
76
Measures of Reliability
• Failure Rate (FR) (λ)
• Hazard Rate (HR for non-repairable items) (λ)
• Mean Time Between Failures (MTBF) (M)*
• Mean Time to Failure (MTTF) (M)*
• Durability (failure free life; FR = 0)
• Reliability R = probability of no failures in time t
  R = e^(−λt) = e^(−t/M) *

*(for constant failure/hazard rate)
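For a constant hazard rate, R(t) = e^(−t/M) gives the often-surprising result that the probability of surviving to the MTBF is only about 37%. The MTBF value below is illustrative:

```python
# Constant-hazard-rate reliability: R(t) = exp(-t / MTBF).
import math

def reliability(t, mtbf):
    return math.exp(-t / mtbf)

print(f"R at t = MTBF: {reliability(5000.0, 5000.0):.3f}")  # about 0.368
```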
77
Patterns of Failure
The Bathtub Curve
Figure: hazard rate vs. time; the total curve is the sum of:
• DFR (weak items): infant mortality
• CFR: useful life
• IFR (wearout): wearout period
78
Variation: summary
• Variation is seldom (never?) "normal"
• Most important variation is in the tails
  – Less data
  – More uncertain
  – Conventional stats most misleading
• Variation can change over time
• Interaction effects
• Variation made by people
• Most engineering education covers the maths only
79
Development Test Principles
Categories of test:
• Functional (design proving / proof of principle)
• Reliability/durability
• Contractual/safety/regulatory
• Test and evaluation (T&E)
• Beta testing
80
Development Test Principles
Fill ”uncertainty gap”
• Performance/safety:
  – demonstrate success
  – perform once
• Reliability/durability:
  – test to fail
  – accelerated tests
• Variation:
  – Taguchi/statistical experiments
  – multiple tests?
81
Development Test Principles
• Components, systems, interfaces• Software• External suppliers• FRACAS• Integrated test programme
82
Development Test Principles
Test economics: major driver of development cost & time, BUT:
• Failure costs increase during project phases (×10 rule: design, development, production, service)
• Failure-free design is cheaper! (experience, training, integrated engineering, design analysis)
83
Development Test Principles
Strength v. Specification
Figure: probability vs. strength (stress to fail); the strength distribution lies above the specification
84
Development Test Principles
Strength v. Specification (transient & permanent failures)
Figure: probability vs. strength (stress to fail); separate distributions for transient and permanent failure limits, the transient limits lying below the permanent ones
85
Development Test Principles
Strength v. Specification (time dependent)
Figure: probability vs. strength (stress to fail); the strength distribution moves down towards the specification as time passes
86
Development Test Principles
• Failures are often due to combined stresses/strengths (uncertain)
• Failures are often influenced by interactions (uncertain)
• Failures are often time-dependent (uncertain)
• Causes of service failures can be shown by different test stresses, e.g.:
  – vibration / temperature cycle
  – high frequency / low frequency
87
Development Test Principles
Fundamental principle: increase (combined) stresses to cause failures, then use the information to make the product stronger.
Limits:
• Technology (e.g. solder melt)
• Test capability
• Economics
88
Development Test Principles
Testing at "representative" stresses, and hoping for no failures, is ineffective and a waste of resources.
Examples:
• Engines on test beds
• Cars on test tracks
• “Simulated” environmental test (MIL-STD-781, MIL-STD-810, etc.)
89
Environments (1):
• All relevant environments• Combined environments (CERT)• User• Environmental simulation?
Development Test Principles
90
Development Test Principles
Environments (2):
• Thermal
• Thermal fatigue (switching)
• Vibration
• Shock
• Humidity
• Power supply/load
• Transients (ESD, EOS)
• Pollution, corrosion
• People, other animals
• Etc.
91
Development Test Principles
Accelerated stress test
• Miner’s Law for fatigue (mech, thermal)
• Arrhenius Law for thermal acceleration?
• Step-stress testing
• Failure modes relevant, not stress levels!
92
Development Test Principles
Highly accelerated life test (HALT) (1)
• Highly accelerated combined stresses (temperature, cycling, multi-axis vibration, others ...)
• Step stress to discover transient and permanent limits
• Time compression: orders of magnitude
• Developed by Gregg Hobbs
93
Development Test PrinciplesHALT (2)
• Special chambers, facilities (QualMark, Thermotron, Screening Systems, TEAM, ...)
• Savings: time, space, energy
• Optimise manufacturing screens (HASS)
• Similar approaches:
  – Highly accelerated stress test (HAST)
  – Stress-induced failure test (STRIFE)
  – Failure mode verification test (FMVT® Entela)
  – Etc.
94
HALT Philosophy (1)
Figure: combined stress axis showing, from the product spec. outwards, the lower/upper operating limits and then the lower/upper destruct limits
• High stresses = small samples!
95
HASS Philosophy
Figure: the same stress-limits diagram (product spec., operating limits, destruct limits); the precipitation screen is set between the operating and destruct limits, the detection screen within the operating limits
96
HALT/HASS Philosophy (2)
Figure: S-N curve (stress S vs. cycles to fail, log N); HALT/HASS operates at high stress and few cycles, ESS lower, in-use stress lowest
97
Accelerated Test Approach
TE p105
1. What failures might occur in service? (FMEA, etc.)
2. List/analyse stresses, combinations.
3. Plan how to apply.
4. Apply single stresses, step increases to failure.
5. Analyse failure, strengthen design.
6. Iterate 4 & 5 to fundamental limits.
7. Repeat with combined stresses.
8. Iterate 5 & 6.
98
Accelerated Test Approach
Examples:
• Mechanical (rotating, engines, etc.)
  – Old lubricants, filters
  – Low fluid levels (oil, coolant)
  – Out-of-balance
• Electro-mech (printers, etc.)
  – Temp, vib, power V level, humidity, ...
  – Misaligned shafts, etc.
  – Out-of-spec. materials (paper, friction, ...)
• Electronic components/packages, etc.
  – Temp, vib (high frequencies), etc.
  – Use vibration transducers (speaker coils?)
99
Accelerated Test Approach
Questions (TE p109):
• How many to test? As many as practicable/economic.
• Can reliability (MTBF, durability) be measured? NO! It will be increased!
• How do we know if a failure on test could occur in service? Analyse, use experience, THINK!
• Product will see no vibration in service. Why vibrate on test? Vibration on test can stimulate failures caused by temp. cycle, handling, etc. in service, QUICKLY!
• Is the principle limited to temp, vib, elec stress? Not at all. Apply to fluid systems, mech tolerances, etc.
100
HALT/HASS Payoffs
• Robust designs + capable processes = high reliability
• Reduced test time and cost
• Feedback to design: reduce "uncertainty gap" on future products
• Continuous improvement ("kaizen") of design capability (products, processes)
101
Accelerated Test or DoE?
Important variables, effects, etc.                    DoE/HALT?
Parameters: electrical, dimensions, etc.              DoE
Effects on measured performance parameters, yields    DoE
Stress: temperature, vibration, etc.                  HALT
Effects on reliability/durability                     HALT
Several uncertain variables                           DoE
Not enough items available for DoE                    HALT
Not enough time available for DoE                     HALT
102
Circuit Test Principles: Analog
• DC: current, potential, resistance (AVO), capacitance, ...
• AC: current, potential, impedance, waveforms, ...
• Signals: waveforms, gain, distortion, jitter, ...
103
Circuit Test Principles: Digital

Truth table for 2-input AND gate (inputs A, B; output O):
A B O
0 0 0
0 1 0
1 0 0
1 1 1

Test vectors: 4 (combinational logic)
"Stuck at" faults (SA0, SA1)
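The four test vectors of the truth table are enough to detect an output stuck at 0 or 1, which a small simulation can demonstrate:

```python
# Exhaustive test of a 2-input AND gate against output stuck-at
# faults: every input vector is applied, and any output stuck at
# 0 or 1 is exposed by at least one vector.

def and_gate(a, b, stuck_output=None):
    """AND gate; stuck_output (0 or 1) models an SA0/SA1 fault."""
    if stuck_output is not None:
        return stuck_output
    return a & b

VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def detects_fault(stuck_output):
    """True if some vector gives a different output than a good gate."""
    return any(and_gate(a, b) != and_gate(a, b, stuck_output)
               for a, b in VECTORS)

print("SA0 detected:", detects_fault(0))  # True (by vector 1,1)
print("SA1 detected:", detects_fault(1))  # True (e.g. by vector 0,0)
```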
104
Circuit Test Principles: Digital
Logic classes:
• Combinational: outputs follow inputs
• Sequential: input dependent; also data flow, memory allocation
• Dynamic: requires refresh / "keep alive"
105
Circuit Test Principles: Digital
Fault types:
• SA0, SA1
• Stuck-at input
• "At speed"
• Pattern sensitive
• Etc.
106
Manual Test Equipment
• Basic instruments – DMMs, power meters, ...
• Instruments – oscilloscopes, waveform generators, spectrum analysers, logic analysers, ...
• Special instruments– RF testers, optical signal testers, hi volt, ...
• PC - based
107
Automatic Test Equipment (ATE)
• Vision: automatic optical inspection (AOI), X-ray (AXI)
• Manufacturing defects analyser (MDA)
• In-circuit test (ICT)
• Fixtureless / flying probe
• Functional test (FT) (via circuit connectors)
• Combined ICT/FT
• Special test (RF, power supplies, manual, "hot rig", ...)
108
Test Capability
ATE must:
• Confirm correct operation of good circuits
• Not classify good as faulty
• Detect faulty items
• Diagnose fault causes
109
Design for Test (DFT)
Design must allow ATE to:
• Initialize (start clocks, set logic states)
• Control (e.g. open feedback loops, force logic, generate inputs)
• Observe (access to important nodes)
• Partition (reduce test program complexity)
110
Layout for ICT
• Keep PCB edges clear
• Location holes
• Large components on top (for double-sided PCBs)
• Resistors between power lines and control signals (resets, enables, tristates)
• Clock disable (provide link)
111
Built-in Test (BIT)
• Boundary scan (IEEE 1149.1)• ASICs• Logic and function tests• Complexity, false alarms
112
EMI/EMC Test
Must test for:
• Radiated emissions
• Conducted emissions (power lines, signal lines)
• Compatibility (susceptibility) (radiated, power, signals)
• Internal problems
• Special situations (rail signalling, avionics, lightning, nuclear (NEMP), etc.)
Standards and regulations
113
Test Control and Data Acquisition (DAQ)
Test databus standards:
• General purpose interface bus (GPIB) (IEEE 488)
• PC interface bus (PCI), PCI extensions for instrumentation (PXI)
• VMEbus extensions for instrumentation (VXI)
114
IC Test
• Special/expensive ATE
• Test cost ≅ IC manufacture cost!
• IDDQ test
• BIST
• Standard tests (MIL-STD-883, etc.)
• Rely on IC manufacturer’s tests
115
IDDQ Test
Figure 8.11 IDDQ plot: quiescent supply current (mA, 0.1 to 0.3) vs. node state (1 to 15, etc.); a good device draws a low, steady IDDQ, while a defective device shows elevated current at states 2, 3, 10, ...
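The IDDQ pass/fail decision is essentially a threshold test on the quiescent current at each logic state. A minimal sketch; the current values and the 0.15 mA threshold are invented for illustration:

```python
# IDDQ screening sketch: flag the logic states at which a device's
# quiescent supply current exceeds a threshold.

def iddq_failures(currents_ma, threshold_ma=0.15):
    """Return the (1-based) state numbers with excessive IDDQ."""
    return [state for state, current in enumerate(currents_ma, start=1)
            if current > threshold_ma]

good = [0.10, 0.11, 0.10, 0.12, 0.11]
defective = [0.10, 0.28, 0.30, 0.12, 0.11]

print("good device fails at:", iddq_failures(good))            # []
print("defective device fails at:", iddq_failures(defective))  # [2, 3]
```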
116
Standards, References, Software
• MIL-STD-2165 (USA)
• DEF STAN 00-13 (UK)
• 'Design for Testability' - Jon Turino
• 'Testability Advisor' - Logical Solutions Inc.
117
Software Reliability
• All new systems involved (operating & test)
• Cannot predict failure modes and effects
• Cannot test complete system*
• Errors are present in all copies*
• S/W - H/W interfaces (keyboards, sensors, devices, EMI)

*Compare VLSI hardware
118
Hardware/Software Reliability Differences (1)

Hardware:
1. Failures can be caused by deficiencies in design, production, use and maintenance.
2. Failures can be due to wear or other energy-related phenomena.
3. No two items are identical. Failures can be caused by variation.
4. Repairs can be made to make equipment more reliable.
5. Reliability may be time-related, with failures occurring as a function of operating (or storage) time, cycles, etc.
6. Reliability may be related to environmental factors (temperature, vibration, humidity, etc.).
7. Reliability can be predicted, in principle but mostly with large uncertainty, from knowledge of design, parts, usage, and environmental stress factors.

Software:
1. Failures are primarily due to design faults.
2. There are no wearout phenomena. Software failures occur without warning.
3. There is no variation: all copies of a program are identical.
4. There is no repair. The only solution is redesign (reprogramming).
5. Reliability is not time related. Failures occur when a specific program step or path is executed or a specific input condition is encountered, which triggers a failure.
6. The external environment does not affect reliability except insofar as it might affect program inputs.
7. Reliability cannot be predicted from any physical bases, since it entirely depends on human factors in design.
119
Hardware/Software Reliability Differences (2)

Hardware:
8. Reliability can be improved by redundancy.
9. Failures can occur in components of a system in a pattern that is, to some extent, predictable from the stresses on the components and other factors. Reliability critical lists are useful to identify high risk items.
10. Hardware interfaces are visual; one can see a 10-pin connector.
11. Computer-aided design systems exist that can be used to create and analyse designs.
12. Hardware products use standard components as basic building blocks.

Software:
8. Reliability cannot be improved by redundancy if the parallel paths are identical, since if one path fails, the other will have the same error.
9. Failures are rarely predictable from analyses of separate statements. Errors are likely to exist randomly throughout the program, and any statement may be in error. Reliability critical lists are not appropriate.
10. Software interfaces are conceptual rather than visual.
11. There are no computerised methods for software design and analysis.
12. There are no standard parts in software, although there are standardised logic structures. Software reuse is being deployed, but on a limited basis.
120
Software in Engineering
• "Real time"
• Wide range of interfaces (hardware, human, timing, ...)
• Different levels of embedding (ASICs, PGAs, BIOS, ...)
• Hardware/software options for functions
• Electrically "noisy" environments
• Usually smaller
121
Software Reliability: ERROR → FAULT → FAILURE

Sources of error:
• Specification (60%)
• Design (20%)
• Code (20%) (typos, numerical, omissions, etc.)
• Timing/EMI
• Data (information) integrity
122
Error Reduction
• Modular design• Error traps• Remarks• Spec & code review• Test
123
Fault Tolerance
• Internal tests (rates of change, cycle times, logic)
• Resets, fault indications
• Redundancy, voting
• Hardware failure protection
124
Languages
• Machine code/microcode
• Assembly level/symbolic assemblers
  – Both processor specific
  – Faster, less memory
  – Difficult, error prone
• High level (HLL) (BASIC, Fortran, *Pascal, *Ada, *C, *C++)
  – Processor independent
  – Easier, error protection*
  – Assemblers, compilers
• Programmable logic controllers (PLCs)
125
Software Testing (1)
• Total paths = 2^n (n = branches + loops)
• Test specs:
  – All requirements ("must do", "must not do")
  – Extreme conditions (timing, parameter values, rates of change, memory utilisation, ...)
  – Input sequences
  – Fault tolerance/error recovery
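The 2^n path count above is why exhaustive path testing is infeasible for real programs; even modest branch counts explode:

```python
# Path-count explosion: with n independent branch/loop decisions,
# the number of distinct execution paths is 2**n.

def total_paths(n_branches_and_loops):
    return 2 ** n_branches_and_loops

for n in (10, 30, 60):
    print(f"n = {n}: {total_paths(n):,} paths")
```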
126
Software Testing (2)
• Module & interface tests ("white box")
  – Data/control flow
  – Memory allocation
  – Lookups
  – Etc.
• System tests
  – Verification
  – Validation ("black box")
127
Documentation
• Specifications
• Code, remarks
• Notebooks
• Changes, corrections
• Test results:
  – Version
  – Test
  – Faults
128
Software Reliability Prediction and Measurement
• Methods:
  – Error/bug count
  – Time-based (hours, days, CPU seconds)
• "Cleanroom" approach (IBM)
• Do not use!
129
Test in Manufacture
Manufactured items are either:
1. Good
2. Defective, but detected and fixed or scrapped
3. Defective, but shipped, and might/will fail later
We must inspect/test to discriminate
130
Manufacturing Test Principles (1)
• All testing costs. So minimise (ideal = zero)
• But:
  – Manufacturing processes generate variation & defects
  – Later costs of variation & defects can exceed costs of detection & correction/removal
• So:
  – Must consider total life cycle (manufacturing, use, ...)
Value-added testing
131
Manufacturing Test Principles (2)
Test cost justification is difficult, because:
• Test costs arise in manufacture; failure costs arise later
• Failure occurrences and costs cannot be predicted
Some testing might be obligatory: calibration, EMI/EMC, safety, etc.
132
Test Capability
Tests must:
• Identify good items
• Detect defects (parts, processes, suppliers, ...)
• Indicate defect source/location
133
Test Pass - Fail Logic
Figure 10.6 Test pass-fail logic (flowchart: test → pass? → if failed, detect? → diagnose and repair → retest → next test)
134
Test Criteria and Stresses
• Manufacturing tests are not tests of the design
• Manufacturing tests must not damage good items (contrast with development)
135
Manufacturing Test Economics
Aspects to consider:
• Cost of test(s) (setup, run, repairs, ...)
• Defects that might be generated upstream
• Test capability
• Alternatives to test (inspection, ...)
• Methods to reduce/prevent defects
• Downstream costs of undetected defects
• 100% or sample test?
136
Manufacturing Test Economics
Examples:
• Screw
• Integrated circuit
• Automotive gearbox
• Car
• Spacecraft
• Electronics assembly
137
Inspection and Measurement
Inspection:
• Visual (manual, automatic)
Measurement:
• Dimensional (metrology)
  – Micrometers, CMMs, ...
• Parameters
  – mech. (strength, torque, ...)
  – elec. (instruments, ATE, ...) (Module 8)
Inspection, measurement, test: not absolute definitions
138
Stress Screening
Definition: application of stresses to cause defective items to fail/show without damaging good ones

Alternative terms:
• Environmental stress screening (ESS)
• Burn-in (electronic components & systems)
• STRIFE test
• etc.

Guidelines, etc.:
• US NAVMAT P-9492
• US MIL-STD-2164
• IEST ESSEH Guidelines
139
Highly Accelerated Stress Screening (HASS)
• Highly accelerated stresses (temp., vib., elec., ...)
• Developed via HALT in development testing
• Stresses are not extrapolations of service conditions
• Can be applied only to products that have been subjected to HALT in development
140
HASS Philosophy (1)
Figure: stress-limits diagram (product spec., operating limits, destruct limits) with the precipitation and detection screens marked
141
HALT/HASS Philosophy (2)
Figure: S-N curve (stress S vs. cycles to fail, log N) comparing HALT/HASS, ESS and in-use stress levels
142
HASS Philosophy (3)
• Proof (safety) of screen (POS)
• HASA (audit): sample v. 100%
• Review/adapt (e.g. repeat POS)
• Can apply to any technology (elec., mech.)
• Keep flexible (no standard procedures)
143
Electronics Manufacturing Faults
In rough order:
• Solder problems (permanent/intermittent o/c or s/c, weak, ...)
• Parts missing / wrong place / wrong value
• Part parameters/functions
• Damage (physical, ESD, ...)
• System/assembly level (cables/connectors, variation, EMI/EMC, ...)
In the 1970s the list could have been reversed!
144
Electronics Test Options/Economics
Board test:
Figure 10.3 Electronics assembly test flow example: Assemble (cost CA) → AOI (cost CI, fail proportion dI) → MDA (cost CM, fail dm) → ICT/FT (cost CF, fail df) → Ship; failed units are diagnosed and repaired (cost CR) and re-enter the flow. C = cost; d = proportion failed.
145
Electronics Test Options/Economics
A simple model for the manufacturing and test cost per unit is:

C = CA + CI + CM + CF + (CR + CM + CF)(dI + dm + df)

If, for example, CA = $200, CI = $10, CM = $10, CF = $20, CR = $50, and dI = dm = df = 0.05, then the total cost per unit would be $252.
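The cost model above can be coded directly, reproducing the $252 worked example:

```python
# Per-unit manufacturing and test cost:
# C = CA + CI + CM + CF + (CR + CM + CF) * (dI + dm + df)

def unit_cost(ca, ci, cm, cf, cr, d_i, d_m, d_f):
    return ca + ci + cm + cf + (cr + cm + cf) * (d_i + d_m + d_f)

c = unit_cost(ca=200, ci=10, cm=10, cf=20, cr=50,
              d_i=0.05, d_m=0.05, d_f=0.05)
print(f"total cost per unit = ${c:.0f}")  # $252, as in the example
```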
146
Fault Proportions & Coverage

                              faults        Coverage %
Fault                         %      AOI  AXI  MDA/ICT  FT   HASS
Open circuit                  25     40   95   85       95   *
Insufficient solder           18     40   80   0        0    20-80
Short circuit                 13     60   99   99       95   *
Component missing             12     90   99   85       85   *
Component misaligned          8      80   80   50       0    0
Component elec. para error    8      0    0    20/80    80   *
Wrong component               5      15   10   80       90   *
Other non-electrical          4      80   0    0        0    20-80
Excess solder                 3      90   90   0        0    0
Component reversed            2      90   90   80       90   *
147
Assembly Test
Figure: assembly under test (board 1, board 2, PSU, backplane, keypad, display) with test connections
148
Electronic Assembly Burn-In (ESS)
• Typically -30ºC to 70ºC, 5 cycles• Power on (monitor)• (Vibrate)• Finds production defects
– Solder– Damage
• Not effective against component defects (low temp, low stress)
149
Integrating Stress Screening
• Integrate with functional test (FT)
• Before/after AOI/ICT?
• Assembly stages?
  – Board
  – Intermediate
  – Final
• Re-screen after repair? YES
No fixed rules!
150
Post-Production Economics
• TE Page 183
151
Electronic Component Test
• All components tested by manufacturers
• Generally not practicable/economic for OEMs/CEMs to test (IC tester: $5M!)
• No repair possible
• Special cases:
  – Power devices?
  – Etc.?
152
Electronic Component Population Categories
Figure: failure probability vs. time (10 to 10,000 h, log scale): an infant-mortality population, a "freak" population, and the good population (zero failures)
153
IC Test
• MIL-STD-883 (TE p. 186)
  – Level A, B, C screens
  – Burn-in (125 °C, 168 h)
  – Plastic/hermetic packages (autoclave test)
• Other standards (CECC, IEC, ...)
Don't use!
154
In-Service Test Philosophy
Test only:
• If only way to determine correct function• To determine failure cause (diagnostic)• To confirm repair
Optimise during development
155
Test Schedules
• Continuous (BIT, monitors, ...)
• Time run (electronics, aircraft, engines, ...)
• Distance travelled (cars, trains, ...)
• Operating cycles (electronics, aircraft engines, ...)
• Calendar (calibration, seasonal, ...)
Must be measured. Intervals, tolerances.
156
Examples
• TE pages 191-193
157
Built-in (Self) Test (BIT/BIST)
• Apply only to functions that are not otherwise observable
• Keep it simple!
– Sensors etc. fail
– False alarms
• Implement in software (no weight, power, complexity)
158
“No Fault Found” (NFF)

Causes:
• Intermittent failures (components, connections, ...)
• Tolerance effects
• Connectors
• BIT false alarms
• Incorrect diagnosis/repair
• Inconsistent test criteria
• People
• Ambiguous cause: >1 suspect unit changed
(Also “retest OK” (RTOK), etc.)
50% - 80% of repairs!
159
RCM (Reliability Centred Maintenance) Objectives
• Optimises preventive maintenance (PM)
• Balances cost, availability, reliability, safety
160
Maintenance Categories (1)
Corrective (CM):
• Failure repair
• Unplanned
• Expensive/unsafe
Minimise by high reliability and durability, + effective PM
161
Maintenance Categories (2)
Preventive (PM):
• Failure prevention
• Planned
• Less expensive/safe
Optimise by RCM
162
RCM Decision Logic (1)
Failure Pattern:
• Increasing (wearout)? Consider replacement
– Failure-free life (light bulbs/tubes, drive belts, bearings, ...)
• Decreasing/constant? No replacement (electronics, ...)
163
RCM Replacement Intervals (1)
(Chart: hazard rate vs. time, replacement at m, 2m, 3m — decreasing hazard rate: scheduled replacement increases failure probability)

(Chart: hazard rate vs. time, replacement at m, 2m, 3m — constant hazard rate: scheduled replacement has no effect on failure probability)
164
RCM Replacement Intervals (2)
(Chart: hazard rate vs. time, replacement at m, 2m, 3m — increasing hazard rate: scheduled replacement reduces failure probability)

(Chart: hazard rate vs. time, replacement at m, 2m, 3m — increasing hazard rate with failure-free life > m: scheduled replacement makes failure probability = 0)
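These hazard-rate cases can be checked numerically. A sketch using the Weibull survival function R(t) = exp(-(t/η)^β), where β < 1, β = 1 and β > 1 give decreasing, constant and increasing hazard rates (the parameter values here are arbitrary, chosen only for illustration):

```python
import math

def survival(t, beta, eta=1.0):
    """Weibull reliability R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-(t / eta) ** beta)

m = 0.5  # scheduled replacement interval (arbitrary units)
for beta, hazard in [(0.5, "decreasing"), (1.0, "constant"), (2.0, "increasing")]:
    run_on  = survival(3 * m, beta)   # run to 3m with no replacement
    replace = survival(m, beta) ** 3  # replace at m and 2m: the clock restarts
    print(f"{hazard:10s} hazard: R(3m) = {run_on:.3f} vs R(m)^3 = {replace:.3f}")
```

Replacement lowers survival for β < 1, leaves it unchanged for β = 1 and raises it for β > 1, matching the charts; with a failure-free life longer than m, every interval ends before failures can begin, so the failure probability is zero.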
165
RCM Decision Logic (2)
Failure Effect (FMECA):
• Critical? Consider replacement / PM
• Detectable? Consider PM (e.g. fatigue)
166
RCM Decision Logic (3)

Failure Cost:
• High? Consider replacement (gearboxes, engines, ...)
• Low? Consider replacement on failure (light bulbs/tubes, hydraulic hoses (?), ...)
167
RCM Decision Logic (4)
(Flowchart: decision nodes "FR increasing?", "FE critical?", "Failure detectable?" and "Failure cost high?" lead, via Yes/No branches, to the outcomes Scheduled Replacement, PM, Replace on Failure or No Replacement)
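One plausible reading of this decision logic (combining slides 162–167) can be written as a small function. The branch ordering is an assumption, since the original flowchart did not survive transcription:

```python
def rcm_decision(fr_increasing, fe_critical, failure_detectable, cost_high):
    """Hypothetical sketch of the RCM decision logic, not the exact chart."""
    if fr_increasing:                       # wearout: replace before failure
        return "scheduled replacement"
    if fe_critical and failure_detectable:  # watch for incipient failure
        return "PM"
    if cost_high:                           # expensive failures: pre-empt
        return "scheduled replacement"
    return "replace on failure"             # cheap, non-critical items

print(rcm_decision(False, True, True, False))  # prints PM
```

For example, a critical but detectable failure mode (such as fatigue cracking) comes out as PM: preventive inspection rather than fixed-interval replacement.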
168
(Incipient) Failure Detection Methods
Mechanical:
• Manual (corrosion, wear, condition, ...)
• NDT for fatigue (ultrasonic, dye penetrant, radiographic, ...)
• Oil analysis (spectroscopic, magnetic)
• Vibration/acoustic

Electrical/Electronic:
• Built-in test
• Functional test/calibration
169
Stress Screens for Repairs
• Proves repair effectiveness
• Reduces NFF
• Use HASS if units were subjected to HALT/HASS
170
Calibration
• Regular test to ensure accuracy
– Measuring devices
– Instruments
– Sensors
• Traceability
• Accuracy (ISO 5725)
• Management, records, labels
171
Organisation and Responsibilities
Test Department:
• Provide facilities (strategic, tactical)
• Knowledge (methods, requirements, regulations, standards, ...)
• External facilities (contracts, hire, ...)
• Maintenance and calibration
• Training
172
Organisation and Responsibilities
Projects:
• Create and manage team
• Plan and manage testing
• Liaison with Test Department
• Identify/obtain project-specific requirements
173
Organisation and Responsibilities
Design:
• Design product
• Design processes (manufacture, test, maintenance)
• Integrate design analysis & development test
• Design review (specification, pre-test, pre-production)
174
Test Procedures
Include:
• Organisation and responsibilities
• Methods (design analysis, test)
• Test planning and action
• Failure reporting (FRACAS)
• Project/design reviews
• Integration (development, production, maintenance test)
• Test equipment maintenance & calibration
• In-service maintenance & calibration
175
Development Test Programme
What/when to test?
• Components, modules, system
• Component test:
– earlier
– more/cheaper
– higher stresses
– selection
• External suppliers’ products
• Output module(s) first
176
Development Test Programme
How many to test?
• As many as practicable (components/modules/systems)
• Consider design analyses, risks, time, costs
• Rotate items through tests (e.g. software, proving, environmental, ...)

Ever heard of too much testing?
177
Testing Purchased Items
Base testing on:
• Project requirements
• Existing knowledge
– supplier’s data
– past use
• Application/risks/novelty/costs ...
• Supplier’s test programme/results

Integrate!
Retain
Repeat
178
In-House v. External Facilities
In-house:
• Core technologies/confidentiality
• Designers more involved
• More flexible (?)
• Cheaper (?)

External:
• Lower capital outlay (?)
• Better facilities/expertise (?)

Consider balanced use of both
TE homepage (/testservices.htm)
179
Project Test Plan (1)
Include:
• Requirements (performance, reliability, standards, ...)
• Failures that must/should not occur
• Design/design analysis inputs (design review)
• Tests to be performed
• Test items/allocations
• Suppliers’ test requirements
• Integration through project phases
• Responsibilities (primary, support)
• Schedules
180
Project Test Plan (2)
• Single test plan
• Link to other project plans
– reliability
– safety
– quality, ...
• Link/refer to procedures, standards, ...

Flowchart: TE Fig. 14.1 (p. 241)
Example: Appendix 3
181
Manufacturing Test Plan
• Develop from development test results
• HALT/HASS

Flowchart: TE Fig. 14.2 (p. 242)
Example: Appendix 4
182
Management Issues
• Training
– degree courses
– short courses
– on-the-job (HALT/HASS)
• Integration
– across functions
– through phases
• Economics
– Long v. short term
– Test adds value
The Practice of Engineering Management, P.D.T. O’Connor (Wiley)
183
The Future of Test
• Virtual test
– EDA, FEA, CFD, ...
– Simulation
– Virtual reality
• “Intelligent” CAE
– Integrated physics, variation, ergonomics, ...
– Automatic design
• Internet
• Test hardware (BIT, “Sentient™”, ...)
• Computer-based test
• Teaching (?)