FYS4260/FYS9260: Microsystems and Electronics Packaging and Interconnect
Reliability and Failure Mechanisms
Learning objectives
• Topics:
– Definition of reliability-related terms
– Origins of failures in electronics
– Accelerated testing
– Statistical models for estimating failure modes, and how we can apply them to estimate time to failure
• Background literature:
– Advanced Electronic Packaging, 2nd edition (Ulrich): Chapter 16: Reliability Considerations
Reliability issues become increasingly important as dimensions become smaller
• Small features mean that failure modes need shorter time to become critical
• Small features often mean more features, which increases failure risk at the system level
• Failures at IC level increasingly show up at package level due to shrinking dimensions
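The point that more features raise the failure risk at system level can be illustrated with a minimal sketch. It assumes, purely for illustration, that the system works only if every one of n identical and independent features works; the function name and the survival probability 0.99999 are hypothetical and not taken from the lecture.

```python
def series_system_survival(feature_survival, n_features):
    """Survival probability of a system that needs all n independent,
    identical features to work (illustrative assumption)."""
    return feature_survival ** n_features


# Hypothetical per-feature survival probability over some fixed period
P_FEATURE = 0.99999
for n in (10, 1_000, 100_000):
    print(n, round(series_system_survival(P_FEATURE, n), 4))
```

Under these assumptions, even a very reliable single feature gives a poor system once the feature count becomes large enough.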
FYS4260/FYS9260 Frode Strisland 3
X-ray inspection of BGA package
What is a failure?
• Has your PC failed when
– you cannot use the CD drive?
– the F button does not work?
– the PC will not start?
• We define failure as the state in which a system is not operational
• It is therefore important to define the requirements a system must meet to count as operational.
What is failing?
• A system (like a computer) consists of components
• Failures are usually observed at the system level, but actually happen in a small part of a component of the system
• In a system, the term component is usually used for replaceable parts, that is, the smallest parts that can be replaced during repair
• A component may consist of multiple non-replaceable units, for example SMT components
Failure mechanisms
• Failure mechanisms are the stresses that cause components to move from operational to failed.
• Stresses include electrical, chemical and mechanical effects
Yield and reliability
• Yield is the fraction of manufactured devices that passes final testing
– Target 100% yield, corresponding to no scrap
• We start calculating reliability from day 1 of the manufactured device's operational lifetime
Figure labels: Yield a prime concern; Reliability a prime concern
Two equivalent definitions of reliability:
• The probability that a specific unit will be operational for a given period of time
• The fraction of a group of units manufactured together that are operational for a given period of time
• Reliability is a number between 1 (all operational) and 0 (all failed)
• Reliability is time dependent: it never increases, and drops towards 0 with time.
Mathematical representation of reliability
• Example: R(5 years) = 0.90
– Meaning: After 5 years, 90% of units will still be operational
• R(0) = 1
– We start counting reliability at the time when units have passed final manufacturing tests, assuming all units are then operational
• $\frac{dR(t)}{dt} \le 0$ for all values of t
– Reliability drops with time, R(∞) = 0.
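The three properties above (R(0) = 1, a non-increasing R(t), and R(∞) = 0) can be checked on a concrete curve. The sketch below assumes a constant failure rate, which gives the exponential form R(t) = exp(-rate · t). That model is an illustrative assumption, not something the slide prescribes; only the data point R(5 years) = 0.90 comes from the example.

```python
import math


def exponential_reliability(t, failure_rate):
    """R(t) = exp(-failure_rate * t), assuming a constant failure rate."""
    return math.exp(-failure_rate * t)


def failure_rate_from_point(t, reliability):
    """Solve R(t) = exp(-rate * t) for the rate, given one known point."""
    return -math.log(reliability) / t


# Fit the assumed model to the slide's example: R(5 years) = 0.90
rate = failure_rate_from_point(5.0, 0.90)  # failures per unit-year
for years in (0, 5, 10, 20):
    print(years, round(exponential_reliability(years, rate), 3))
```

With this assumed model, the same 10% loss repeats every 5 years, so R(10 years) is 0.9 × 0.9.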
Drivers for reliability in electronic systems
• Cost-driven reliability: The unit's reliability should be sufficient to maximise profit.
– Example: Consumer electronics
• Performance-driven reliability: No reasonable money or effort will be spared to maximise reliability
– Example: Safety-critical components, such as fire alarms, avionics parts
Patterns of Failure – the bathtub curve
Figure: the bathtub curve. Failure rate (arbitrary units) versus time, with three contributions:
• Early failures (infant mortality): caused by out-of-spec manufacturing failures
• Overstress failures: caused by out-of-spec use
• Wearout failures: caused by normal failure modes starting to have effect
What should be the manufacturer's warranty time?
Reliability curve corresponding to the bathtub curve
Figure: reliability R(t) versus time t, falling from 1 towards 0. Annotations:
• Avoid infant mortality!
• Minimise risk of overstress failures!
• Understand wearout failure modes (if you need to make design improvements)
Failure modes in electronics packaging
• Hardware 1) failures are caused by some sort of stress:
– Electrical
– Chemical
– Mechanical
• Failures can usually be categorized as either
– Overstress failures (caused by a single high-stress event)
– Wearout failures (the accumulated effect of lower levels of stress over a longer period of time)
1) We neglect failures caused by (embedded) software
Examples of common failure mechanisms
• We will consider three common cases:
– Corrosion
– Mechanical stresses
– Electrical stresses
Corrosion
• Corrosion is an electrochemical reaction that takes place when metals come in contact with water and certain dissolved ions
• Atoms of the electrically conductive metal are oxidised into positive valence states, and either become part of a poorly conducting metal oxide crust (rust) or are dissolved.
Figure: Aggressive PCB corrosion attack due to NiCd battery leakage
Corrosion (continued)
• Removal of either water or ions stops the corrosion attack
• Corrosion is an increasingly important issue as feature sizes go down
– Traces of humidity diffused through non-hermetic package material can be sufficient to destroy micro-/nanometer thick metal layers
Thermodynamic stability of metals towards corrosion
• The more strongly a metal holds its electrons, the slower the corrosion rate will be
• The noble metals gold and platinum are thermodynamically more stable as metals than as oxides.
• Other metals are more stable as oxides
Thermodynamic stability of metals towards corrosion
The electrochemical potential scale is relative to the oxidation of hydrogen, defined as 0 V
Anodic oxidation
A consequence of the difference in thermodynamic stability is that when two metals, e.g. Al and Au, are in contact, electrons will seek to flow towards the more stable metal, leaving the less stable metal even more vulnerable to corrosion.
Au wire bonding to Al is an example of a situation where anodic oxidation effects can be seen
Mechanical stress failures
Mechanical stress failures are often traceable to differences in CTE (coefficient of thermal expansion) between different materials
Mechanical failures
• Even carefully matched CTE materials can build up stresses in assembly
• Catastrophic failures, e.g. broken bond balls or cracked dies, usually occur in early product life
• Stress can cause slowly emerging fatigue, which might not be catastrophic, but can cause drift in component or sensor values.
Electrical stress
• Electrostatic discharge (ESD) is by far the most common electrical failure mechanism.
• This is an overstress failure mode.
• Prevention:
– Careful grounding practices for personnel and assembly sites.
– Build overstress protection into the circuitry
– Stringent destructive ESD testing of sample products
Administrative – upcoming lectures
• 11/5: Deadline for submitting the mandatory project report to the course e-mail (by 23:59)
• 11/5: Summary lecture
• 18/5: Presentation of project assignment (details to be agreed on 11/5)
• Wednesday 20/5, 13-14: Guest lecture on 3D packaging
• 25/5: No lecture (Whit Monday)
• Wednesday 27/5, 13-14: Question hour. General feedback on the reports
• Monday 1 June: Exam
Administrative (2)
• Guidance/advice for exam preparation
• Language of the exam
– English: has the technical terms!
– Norwegian: the language many are most comfortable with, but translations can be a challenge, e.g. trådbonding (wire bonding)
– Possible alternative: exam set in English, but answers in a language of your choice (Norwegian/English, or Norwegian with English technical terms)
Techniques for failure analysis
• Optical (light) microscope (~ 1 µm resolution)
– Inspection of polished cross-section
• Electron microscope (~ 10 nm resolution)
• Scanning Acoustic Microscope
• Surface analysis tools
– Auger electron spectroscopy, X-ray photoelectron spectroscopy, Secondary Ion Mass Spectrometry and other surface microanalytic methods can give input on surface compositions and thicknesses
• Package opening (for hermetic packages)
• X-ray or Ultrasonic Imaging
Inspection sequence following accelerated test procedures
• 3D X-ray imaging
– Check alignment
– Find areas of interest for cross-section
• Scanning Acoustic Microscope
– Find cracks, voids, delaminations
• Microscopy of cross-sectioned samples
Accelerated testing
Accelerated testing is applied to make failure mechanisms happen in a much shorter time than they would in normal/field usage. Purposes include:
• Determine product failure modes
• Estimate product operation time in normal use
• Compare reliability of different production methods
Figure: Thermal cycling chamber with elevator bringing components between extreme temperatures within seconds
Accelerated test procedures
Multiple accelerated test procedures are used. Some central examples are:
Test type and description:
• Environmental test
– High temperature to induce material disintegration
– Combined high temperature and high humidity to induce corrosion failures
• Mechanical
– Thermal cycling to exacerbate CTE mismatch
– Mechanical bending of assemblies
• Electrical testing
– Constant high voltage
– Pulsed electrostatic discharge testing
Failure rates increase with increasing temperature
The failure rate F of an electronic component increases with temperature, and can often be described by the Arrhenius equation derived in statistical mechanics
$F = A e^{-E_A / (k_B T)}$
where
A = constant
F = failure rate
E_A = activation energy (Joule)
k_B = Boltzmann's constant
T = local temperature in Kelvin
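One common use of the Arrhenius equation in accelerated testing is to compare the failure rate at an elevated test temperature with the rate at the use temperature. In that ratio the constant A cancels. The sketch below is a minimal illustration under stated assumptions: the activation energy (0.7 eV) and the two temperatures are hypothetical values, the energy is given in eV rather than Joule (so Boltzmann's constant is taken in eV/K), and the function names are invented here.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def arrhenius_failure_rate(prefactor, activation_energy_ev, temp_kelvin):
    """F = A * exp(-E_A / (k_B * T)), with E_A in eV and T in Kelvin."""
    return prefactor * math.exp(-activation_energy_ev / (K_B_EV * temp_kelvin))


def acceleration_factor(activation_energy_ev, temp_use_kelvin, temp_test_kelvin):
    """Ratio F(T_test) / F(T_use); the constant A cancels out."""
    return math.exp(activation_energy_ev / K_B_EV
                    * (1.0 / temp_use_kelvin - 1.0 / temp_test_kelvin))


# Hypothetical numbers, for illustration only
E_A = 0.7               # activation energy in eV
T_USE = 55 + 273.15     # use temperature in Kelvin
T_TEST = 125 + 273.15   # test temperature in Kelvin

af = acceleration_factor(E_A, T_USE, T_TEST)
print(f"acceleration factor: {af:.1f}")
```

Because the activation energy sits in the exponent, the acceleration factor is very sensitive to the value assumed for it, so an estimate of this kind is only as good as the identification of the failure mechanism.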
Activation Energies for common Failure Mechanisms in Electronic Circuits
Table 3.2 in Electronics Packaging and Interconnection Handbook 4th ed
Accelerated testing: Electrostatic Discharge testing
• Human Body Model: Controlled charges are released to the Device Under Test (DUT) until failure
Common pitfalls in accelerated failure testing
• The accelerated stress may trigger failure modes that are not a problem at normal stress levels
• Existence of multiple failure modes complicates interpretation
• Dependence between different failure modes
– Failures may depend on a combination of factors
Reliability Metrology: How can we measure and quantify electronics reliability?
$\text{Failure rate} = \frac{\text{total number of failures}}{\text{total number of device hours}}$
Mean Time Between Failures: $\text{MTBF} = \frac{1}{\text{Failure rate}}$
Failure rate and MTBF example
Assume that you have observed 4 failures after having 500 devices in test for 1000 hours:
• Number of device hours: 500 × 1000 = 500 000
• Failure rate: 4 / 500 000 = $8 \times 10^{-6}$ failures per hour
• MTBF: 125 000 hours between failures
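The arithmetic of this example can be reproduced with a short sketch. The function names are chosen here for illustration. Like the slide, it counts every device for the full test duration, which is a simplification when failed devices are removed from the test.

```python
def failure_rate(n_failures, n_devices, test_hours):
    """Failures per device hour, counting every device for the full test."""
    device_hours = n_devices * test_hours
    return n_failures / device_hours


def mtbf(n_failures, n_devices, test_hours):
    """Mean time between failures: device hours per observed failure."""
    return (n_devices * test_hours) / n_failures


rate = failure_rate(4, 500, 1000)
print(f"failure rate: {rate:.1e} failures per hour")
print(f"MTBF: {mtbf(4, 500, 1000):.0f} hours")
```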
Reliability functions
• Reliability functions are shaped to predict failure versus time.
• If the curves are known, it is possible to estimate the future behaviour from a small data set
R(t) = reliability function
– The fraction of devices still operating at time t
F(t) = the cumulative failure function
– The fraction of devices that have failed at time t
R(t) + F(t) = 1
Reliability functions: example
• Test data from 1800 test units subjected to 1000 days of testing
• After 1000 days, 74% of units had failed
Reliability functions
Failure density function
• The failure density function f(t) gives the fractional rate at which the original (initial) devices are failing at a given time.
• The higher the rate, the more rapidly devices are failing
$f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt}$
$F(t) = \int_0^t f(t')\,dt'$
Reliability functions
Hazard rate function
• The failure density function will eventually always drop, because it refers to the initial number of test devices
• The hazard rate h(t) gives a more intuitive description:
$h(t) = \frac{\text{rate of failure}}{\text{number of devices still operating}} = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)}$
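The difference between the failure density f(t) and the hazard rate h(t) is easiest to see on binned test data. The sketch below is an illustration under stated assumptions: the failure counts are hypothetical, failures are counted per fixed interval, and f and h are approximated as constant within each interval. The function name and data layout are invented here.

```python
def reliability_functions(n_initial, failures_per_interval, interval_length):
    """Return (t_start, R, F, f, h) for each interval of binned failure counts.

    f: failures per unit time relative to the initial population.
    h: failures per unit time relative to the units still operating
       at the start of the interval.
    """
    rows = []
    survivors = n_initial
    for i, failed in enumerate(failures_per_interval):
        t_start = i * interval_length
        R = survivors / n_initial          # fraction still operating
        F = 1.0 - R                        # fraction already failed
        f = failed / (n_initial * interval_length)
        h = failed / (survivors * interval_length) if survivors else float("nan")
        rows.append((t_start, R, F, f, h))
        survivors -= failed
    return rows


# Hypothetical test: 1000 units, failures counted in 100-day intervals
rows = reliability_functions(1000, [100, 90, 81, 73], 100)
for t, R, F, f, h in rows:
    print(f"t={t:4d}  R={R:.3f}  F={F:.3f}  f={f:.5f}  h={h:.5f}")
```

In this made-up data set f falls from one interval to the next simply because fewer units remain, while h stays close to 0.001 per day, which is the behaviour the hazard rate is meant to expose.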
Introduction to statistical treatment of reliability data
Fitting to statistical models is helpful in order to quantify reliability data. Two common models for this are
• The Weibull distribution
• The Normal Distribution
Not a topic for the exam
The Weibull distribution
The Weibull distribution failure density function is
$f(t) = \frac{\beta}{\lambda} \left(\frac{t}{\lambda}\right)^{\beta - 1} e^{-(t/\lambda)^{\beta}}$
where
λ = the lifetime parameter: the characteristic time to failure, at which about 63% of the units have failed (equal to the average time to failure only for β = 1)
β = the shape parameter: a measure of how the failure function is distributed around the average lifetime
The Weibull distribution function
• Small β (β < 1) indicates a higher chance of early failure
• β > 1 indicates that wearout failures cause increasing failure rates at later times
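The effect of the shape parameter β can be checked numerically. The sketch below evaluates the Weibull failure density together with the corresponding reliability R(t) = exp(-(t/λ)^β) and hazard rate h(t) = f(t)/R(t) = (β/λ)(t/λ)^(β-1), which follow from the density by integration. The value λ = 1000 and the two sample times are hypothetical, and the function names are chosen here for illustration.

```python
import math


def weibull_density(t, lam, beta):
    """Weibull failure density f(t)."""
    return (beta / lam) * (t / lam) ** (beta - 1) * math.exp(-((t / lam) ** beta))


def weibull_reliability(t, lam, beta):
    """R(t) = exp(-(t/lam)^beta), the fraction still operating at time t."""
    return math.exp(-((t / lam) ** beta))


def weibull_hazard(t, lam, beta):
    """h(t) = f(t) / R(t) = (beta/lam) * (t/lam)^(beta-1)."""
    return (beta / lam) * (t / lam) ** (beta - 1)


LAM = 1000.0  # hypothetical lifetime parameter (e.g. in hours)
for beta in (0.5, 1.0, 3.0):
    early = weibull_hazard(100.0, LAM, beta)
    late = weibull_hazard(2000.0, LAM, beta)
    trend = "falling" if late < early else "rising" if late > early else "constant"
    print(f"beta={beta}: h(100)={early:.2e}, h(2000)={late:.2e}, hazard is {trend}")
```

The three shape values correspond to the three regions of the bathtub curve: a falling hazard (early failures), a constant hazard, and a rising hazard (wearout).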
The Weibull distribution function Example data
Normal distribution applied for failure patterns
Must first determine the normal distribution expectation value t_avg and standard deviation σ (the square root of the variance), after all units have failed:
$t_{avg} = \frac{\text{sum of times to failure for all units}}{\text{initial number of units}} = \frac{\sum_i t_i}{N}$
$\sigma = \sqrt{\frac{\sum_i (t_i - t_{avg})^2}{N}}$
Normal distribution applied for failure patterns
Failure density function for the normal distribution:
$f(t) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{1}{2} \left(\frac{t - t_{avg}}{\sigma}\right)^2\right]$
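The two steps, first estimating t_avg and σ from the observed failure times and then evaluating the normal failure density, can be sketched as follows. The failure times are hypothetical and the function names are chosen here for illustration.

```python
import math


def mean_and_sigma(times_to_failure):
    """Average time to failure and standard deviation (dividing by N)."""
    n = len(times_to_failure)
    t_avg = sum(times_to_failure) / n
    sigma = math.sqrt(sum((t - t_avg) ** 2 for t in times_to_failure) / n)
    return t_avg, sigma


def normal_failure_density(t, t_avg, sigma):
    """Normal-distribution failure density f(t)."""
    z = (t - t_avg) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical times to failure (hours) for five units, all of which have failed
times = [820.0, 950.0, 1000.0, 1050.0, 1180.0]
t_avg, sigma = mean_and_sigma(times)
print(f"t_avg = {t_avg:.0f} h, sigma = {sigma:.1f} h")
print(f"f(t_avg) = {normal_failure_density(t_avg, t_avg, sigma):.2e} per hour")
```

Note that the normal density is symmetric around t_avg and formally extends to negative times, so it only makes sense as a failure model when σ is small compared with t_avg.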
End of lecture: Reliability and Failure Models
• Important issues:
– Understand how failures develop with time (Bathtub curve)
– Understand the nature of some overstress and wearout failure modes
– Understand how acceleration factors can speed up test and verification (and what can go wrong in these processes)
– Know the main reliability metrology terms
– Have a forward look at statistical treatment of failure data (Weibull and Normal Distribution)