FYS4260/FYS9260: Microsystems and Electronics Packaging and Interconnect

Reliability and Failure Mechanisms

Upload: duongdieu

Post on 26-Aug-2018



Learning objectives

• Topics:
  – Definition of reliability-related terms
  – Origins of failures in electronics
  – Accelerated testing
  – Statistical models of failure behaviour, and how we can apply them to estimate time to failure

• Background literature:

– Advanced Electronic Packaging, 2nd edition (Ulrich): Chapter 16: Reliability Considerations

Reliability issues become increasingly important as dimensions become smaller

• Small features mean that failure mechanisms need less time to become critical

• Small features often mean more features, which increases failure risk at the system level

• Failures at IC level increasingly show up at package level due to shrinking dimensions

FYS4260/FYS9260 Frode Strisland 3

X-ray inspection of BGA package

What is a failure?

• Has your PC failed when
  – you cannot use the CD drive?
  – the F key does not work?
  – the PC will not start?

• We define failure as the state in which a system is not operational.
• It is therefore important to define the requirements for an operational system.

What is failing?

• A system (like a computer) consists of components.
• Failures are usually observed at the system level, but actually happen in a small part of one component of the system.
• In a system, the term component is usually used for replaceable parts, that is, the smallest parts that can be replaced in a repair.
• A component may consist of multiple non-replaceable units, for example SMT components.

Failure mechanisms

• Failure mechanisms are the stresses that cause components to go from an operational to a failed state.

• Stresses include electrical, chemical and mechanical effects.

Yield and reliability

• Yield is the fraction of manufactured devices that pass final testing
  – Target: 100% yield, corresponding to no scrap
• We start calculating reliability on day 1 of the manufactured device's operational lifetime

[Figure labels: "Yield a prime concern", "Reliability a prime concern"]

Two equivalent definitions of reliability:

• The probability that a specific unit will be operational for a given period of time
• The fraction of a group of units manufactured together that are operational for a given period of time

• Reliability is a number between 1 (all operational) and 0 (all failed)

• Reliability is time dependent and decreases with time.

Mathematical representation of reliability

• Example: R(5 years) = 0.90
  – Meaning: after 5 years, 90% of the units will still be operational
• R(0) = 1
  – We start counting reliability at the time when units have passed final manufacturing tests, assuming all units are operational at that point
• $\frac{dR(t)}{dt} \le 0$ for all values of t
  – Reliability decreases with time, and R(∞) = 0.
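The properties above can be checked mechanically on a reliability curve that is only known at discrete times (R(∞) = 0 cannot be checked on a finite set of samples). A minimal Python sketch; the function name `check_reliability_curve` is hypothetical, and the intermediate value 0.97 in the usage example is illustrative:

```python
def check_reliability_curve(times, r_values, tol=1e-12):
    """Return the list of reliability properties that a sampled R(t) violates.

    times    -- strictly increasing sample times, the first one at t = 0
    r_values -- R(t) at those times, as fractions between 0 and 1
    """
    if len(times) != len(r_values) or not times:
        raise ValueError("times and r_values must be non-empty and of equal length")
    if times[0] != 0 or any(t2 <= t1 for t1, t2 in zip(times, times[1:])):
        raise ValueError("times must start at 0 and be strictly increasing")
    problems = []
    if abs(r_values[0] - 1.0) > tol:
        problems.append("R(0) = 1 is violated")
    if any(r < -tol or r > 1.0 + tol for r in r_values):
        problems.append("R(t) must lie between 0 and 1")
    # dR/dt <= 0: a sampled curve may never increase between two samples
    if any(r2 > r1 + tol for r1, r2 in zip(r_values, r_values[1:])):
        problems.append("dR/dt <= 0 is violated")
    return problems


# Times in years, ending with R(5 years) = 0.90 from the example
print(check_reliability_curve([0, 2, 5], [1.0, 0.97, 0.90]))  # []
print(check_reliability_curve([0, 2, 5], [1.0, 0.90, 0.95]))
```

The second call reports the increasing step from 0.90 to 0.95, which no physical population of units can show.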

Drivers for reliability in electronic systems

• Cost-driven reliability: The unit's reliability should be sufficient to maximise profit.
  – Example: Consumer electronics
• Performance-driven reliability: No reasonable money or effort will be spared to maximise reliability.
  – Example: Safety-critical components, such as fire alarms and avionics parts

Patterns of Failure – the bathtub curve

[Figure: the bathtub curve, failure rate (arbitrary units) versus time, with three regions:]

• Early failures (infant mortality): caused by out-of-spec manufacturing
• Overstress failures: caused by out-of-spec use
• Wearout failures: caused by normal failure modes starting to take effect

What should the manufacturer's warranty period be?
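One simple way to sketch a bathtub-shaped failure rate is to add three terms: one that decreases with time (early failures), one that is constant (random overstress failures) and one that increases with time (wearout). A minimal Python sketch; the power-law forms and all coefficients are assumptions chosen only to reproduce the shape, in arbitrary units:

```python
def bathtub_hazard(t, a_early=0.02, c_random=0.01, b_wearout=1e-6):
    """Illustrative bathtub-shaped failure rate (arbitrary units), t > 0.

    a_early / sqrt(t) -- early failures (infant mortality), decreasing with time
    c_random          -- constant rate of random / overstress failures
    b_wearout * t**4  -- wearout failures, increasing with time
    The functional forms and all coefficients are assumptions for illustration.
    """
    early = a_early / t ** 0.5
    wearout = b_wearout * t ** 4
    return early + c_random + wearout


for t in (0.1, 1.0, 5.0, 15.0, 30.0):
    print(f"t = {t:5.1f}  h(t) = {bathtub_hazard(t):.4f}")
```

The printed rates first fall, reach a minimum near the middle of the time range, and then rise steeply, which is the bathtub shape. A warranty period that ends before the wearout term dominates keeps the manufacturer on the flat part of the curve.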


Reliability curve corresponding to the bathtub curve

[Figure: R(t) versus t, falling from 1 towards 0, annotated as follows:]

• Avoid infant mortality!
• Minimise the risk of overstress failures!
• Understand wearout failure modes (if you need to make design improvements)

Failure modes in electronics packaging

• Hardware¹ failures are caused by some sort of stress:
  – Electrical
  – Chemical
  – Mechanical
• Failures can usually be categorised as either
  – Overstress failures (caused by a single high-stress event), or
  – Wearout failures (the accumulated effect of lower levels of stress over a longer period of time)

¹ We neglect failures caused by (embedded) software.

Failure modes in electronics packaging


Examples of common failure mechanisms

• We will consider three common cases:
  – Corrosion
  – Mechanical stresses
  – Electrical stresses

Corrosion

• Corrosion is an electrochemical reaction that takes place when metals come into contact with water and certain dissolved ions.
• The electrically conducting metal is oxidised to positive valence states, and the resulting ions either become part of a poorly conducting metal oxide crust (rust) or are dissolved.

Aggressive PCB corrosion attack due to NiCd battery leakage

Corrosion (continued)

• Removing either the water or the ions stops the corrosion attack.
• Corrosion becomes increasingly important as feature sizes go down
  – Traces of humidity diffused through non-hermetic package material can be sufficient to destroy micro- or nanometre-thick metal layers

Thermodynamic stability of metals towards corrosion

• The more strongly a metal holds on to its electrons, the slower the corrosion rate will be.
• The noble metals gold and platinum are thermodynamically more stable as metals than as oxides.
• Other metals are more stable as oxides.

Thermodynamic stability of metals towards corrosion

The electrochemical series (standard electrode potentials) is referenced to the standard hydrogen electrode (H⁺/H₂), defined as 0 V.

Anodic oxidation

A consequence of the difference in thermodynamic stability is that when two metals, e.g. Al and Au, are in contact, electrons will tend to flow towards the more stable metal, leaving the less stable metal even more vulnerable to corrosion.

Au wire bonds on Al bond pads are an example of a situation where anodic oxidation effects can be seen.
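Which metal of a pair is attacked can be read off a table of standard electrode potentials: the metal with the lower potential becomes the anode and corrodes. A minimal Python sketch; the potentials are approximate textbook values (V versus the standard hydrogen electrode, 25 °C), and the helper `galvanic_couple` is hypothetical. Real galvanic behaviour also depends on the electrolyte, temperature and surface oxides (aluminium is normally covered by a protective native oxide):

```python
# Approximate standard reduction potentials E0 in volts vs. the standard
# hydrogen electrode at 25 degC, from common textbook tables.
STANDARD_POTENTIAL_V = {
    "Au": +1.50,   # Au3+ + 3e- -> Au
    "Ag": +0.80,   # Ag+ + e- -> Ag
    "Cu": +0.34,   # Cu2+ + 2e- -> Cu
    "Sn": -0.14,   # Sn2+ + 2e- -> Sn
    "Ni": -0.25,   # Ni2+ + 2e- -> Ni
    "Al": -1.66,   # Al3+ + 3e- -> Al
}


def galvanic_couple(metal_a, metal_b):
    """Return (anode, cathode, potential difference in volts) for two metals in contact.

    The metal with the lower (more negative) potential is the anode: it gives up
    electrons to the more stable metal and is the one that corrodes.
    """
    ea, eb = STANDARD_POTENTIAL_V[metal_a], STANDARD_POTENTIAL_V[metal_b]
    if ea < eb:
        return metal_a, metal_b, eb - ea
    return metal_b, metal_a, ea - eb


anode, cathode, dv = galvanic_couple("Au", "Al")
print(f"{anode} corrodes next to {cathode}; standard cell potential about {dv:.2f} V")
```

For the Au/Al pair the difference is about 3.2 V, with Al as the anode, consistent with Al being the vulnerable side of an Au wire bond.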

Common metals used in microelectronics


Mechanical stress failures

Mechanical stress failures can often be traced to differences in the coefficient of thermal expansion (CTE) between different materials.
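A first-order measure of the CTE mismatch is the free thermal mismatch strain |α_a − α_b|·ΔT: the difference in how much two materials would expand if they were not bonded together. When they are bonded, this difference has to be taken up as stress in the joint. A minimal Python sketch; the CTE values are approximate handbook magnitudes (the FR-4 value in particular varies between laminates), and the sketch ignores elastic moduli and geometry, so it indicates the size of the problem rather than giving a stress:

```python
# Approximate in-plane CTE values in 1/K (typical handbook magnitudes;
# the FR-4 value in particular varies with resin and glass content).
CTE_PER_K = {
    "Si": 2.6e-6,
    "Cu": 17e-6,
    "FR-4": 16e-6,
}


def mismatch_strain(material_a, material_b, delta_t_k):
    """Free thermal mismatch strain |alpha_a - alpha_b| * delta_T (dimensionless)."""
    return abs(CTE_PER_K[material_a] - CTE_PER_K[material_b]) * delta_t_k


def mismatch_displacement_um(material_a, material_b, delta_t_k, length_mm):
    """Difference in free expansion over a joint length, in micrometres."""
    return mismatch_strain(material_a, material_b, delta_t_k) * length_mm * 1000.0


# A silicon die on an FR-4 board, 100 K temperature swing, 10 mm die edge
eps = mismatch_strain("Si", "FR-4", 100.0)
print(f"mismatch strain: {eps:.2e}")
print(f"edge displacement: {mismatch_displacement_um('Si', 'FR-4', 100.0, 10.0):.1f} um")
```

With these values a 10 mm silicon die on FR-4 sees roughly 13 µm of expansion mismatch at its edge for every 100 K swing, which is repeated in every thermal cycle.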

Mechanical failures

• Even materials with carefully matched CTEs can build up stresses in an assembly.
• Catastrophic failures, e.g. broken bond balls or cracked dies, usually occur early in the product life.
• Stress can cause slowly emerging fatigue, which might not be catastrophic but can cause drift in component or sensor values.

Electrical stress

• Electrostatic discharge (ESD) is by far the most common electrical failure mechanism.
• This is an overstress failure mode.
• Prevention:
  – Careful grounding practices for personnel and assembly sites
  – Built-in overstress protection in the circuitry
  – Stringent destructive ESD testing of sample products

Administrative matters – upcoming lectures

• 11/5: Deadline for submitting the mandatory project report to the course e-mail (by 23:59)
• 11/5: Summary lecture
• 18/5: Presentation of the project assignment (details to be agreed on 11/5)
• Wednesday 20/5, 13-14: Guest lecture on 3D packaging
• 25/5: No lecture (Whit Monday)
• Wednesday 27/5, 13-14: Question session. General feedback on the reports
• Monday 1 June: Exam

Administrative matters (2)

• Guidance/advice for exam preparation
• Language at the exam
  – English: Has the technical terms!
  – Norwegian: What most of you are most comfortable with, but translations can be a challenge, e.g. trådbonding (wire bonding)
  – Possible alternative: Exam paper in English, but answers in the language of your choice (Norwegian/English, possibly Norwegian with English technical terms)

Techniques for failure analysis

• Optical (light) microscope (~1 µm resolution)
  – Inspection of polished cross-sections
• Electron microscope (~10 nm resolution)
• Scanning acoustic microscope
• Surface analysis tools
  – Auger electron spectroscopy, X-ray photoelectron spectroscopy, secondary ion mass spectrometry and other surface microanalytical methods can give input on surface compositions and thicknesses
• Package opening (for hermetic packages)
• X-ray or ultrasonic imaging

Inspection sequence following accelerated test procedures

• 3D X-ray imaging
  – Check alignment
  – Find areas of interest for cross-sectioning
• Scanning acoustic microscope
  – Find cracks, voids, delaminations
• Microscopy of cross-sectioned samples

Accelerated testing

Accelerated testing is applied to make failure mechanisms happen in a much shorter time than they would in normal/field usage. Purposes include:

• Determine product failure modes
• Estimate product operation time in normal use
• Compare the reliability of different production methods

Thermal cycling chamber with elevator bringing components between extreme temperatures within seconds

Accelerated test procedures

Several accelerated test procedures are in use; some central examples are (test type, followed by description):

• Environmental test
  – High temperature to induce material disintegration
  – Combined high temperature and high humidity to induce corrosion failures
• Mechanical test
  – Thermal cycling to exacerbate CTE mismatch
  – Mechanical bending of assemblies
• Electrical test
  – Constant high voltage
  – Pulsed electrostatic discharge testing

Failure rates increase with increasing temperature

The failure rate F of an electronic component increases with temperature, and can often be described by the Arrhenius equation derived in statistical mechanics:

$$F = A\,e^{-E_A/(k_B T)}$$

where
• A = constant
• F = failure rate
• $E_A$ = activation energy (joule)
• $k_B$ = Boltzmann's constant
• T = local temperature in kelvin
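In accelerated testing the Arrhenius equation is usually used as a ratio: the unknown constant A cancels when the failure rate at the test temperature is divided by the rate at the use temperature, giving the acceleration factor AF = exp[(E_A/k_B)(1/T_use − 1/T_test)]. A minimal Python sketch; it takes E_A in eV (1 eV = 1.602 × 10⁻¹⁹ J) with k_B in eV/K, the function names are hypothetical, and the 0.7 eV, 55 °C and 125 °C values are illustrative only. It also assumes that the same failure mechanism dominates at both temperatures:

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # k_B in eV/K (CODATA 2018 value)


def arrhenius_rate(a, ea_ev, temp_k):
    """Failure rate F = A * exp(-E_A / (k_B * T)), with E_A in eV and T in kelvin."""
    return a * math.exp(-ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


def acceleration_factor(ea_ev, t_use_c, t_test_c):
    """Ratio F(T_test) / F(T_use); the prefactor A cancels."""
    t_use_k = t_use_c + 273.15
    t_test_k = t_test_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_test_k))


af = acceleration_factor(ea_ev=0.7, t_use_c=55.0, t_test_c=125.0)
print(f"acceleration factor: {af:.0f}")
print(f"1000 test hours correspond to roughly {1000 * af:.0f} hours of use")
```

With these assumptions one test hour at 125 °C corresponds to somewhat under 80 hours at 55 °C. The factor is very sensitive to E_A, which is why the activation energy of the relevant mechanism has to be known.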

Activation Energies for common Failure Mechanisms in Electronic Circuits


Table 3.2 in Electronics Packaging and Interconnection Handbook 4th ed

Accelerated testing: Electrostatic discharge testing

• Human Body Model: Controlled charges are released into the device under test (DUT) until failure
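In the Human Body Model the charged human body is commonly represented by a 100 pF capacitor discharged through a 1.5 kΩ resistor (the network used in the joint ESDA/JEDEC HBM standard JS-001). The sketch below treats this as an ideal RC discharge into a short-circuited DUT, ignoring parasitic inductance and the DUT's own impedance, to show how peak current, charge and energy scale as the test voltage is stepped up. The function names are hypothetical:

```python
import math

HBM_CAPACITANCE_F = 100e-12   # 100 pF "body" capacitance
HBM_RESISTANCE_OHM = 1500.0   # 1.5 kOhm series resistance


def hbm_pulse(voltage_v):
    """Idealised HBM discharge into a short circuit: i(t) = (V/R) * exp(-t / (R*C))."""
    peak_current_a = voltage_v / HBM_RESISTANCE_OHM
    tau_s = HBM_RESISTANCE_OHM * HBM_CAPACITANCE_F
    charge_c = HBM_CAPACITANCE_F * voltage_v
    energy_j = 0.5 * HBM_CAPACITANCE_F * voltage_v ** 2
    return peak_current_a, tau_s, charge_c, energy_j


def hbm_current(voltage_v, t_s):
    """Current at time t after the discharge starts (same idealisation)."""
    peak, tau, _, _ = hbm_pulse(voltage_v)
    return peak * math.exp(-t_s / tau)


# Step the charging voltage upwards, as in a test run until the DUT fails
for v in (500, 1000, 2000, 4000):
    ipk, tau, q, e = hbm_pulse(v)
    print(f"{v:5d} V: Ipk = {ipk:.2f} A, tau = {tau * 1e9:.0f} ns, "
          f"Q = {q * 1e9:.0f} nC, E = {e * 1e6:.1f} uJ")
```

At 2 kV the idealised pulse peaks at about 1.3 A and decays with a 150 ns time constant, while the energy grows with the square of the voltage.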

Test structures for accelerated testing


Common pitfalls in accelerated failure testing

• The elevated stress used for acceleration may trigger failure modes that are not a problem at normal stress levels
• The existence of multiple failure modes complicates interpretation
• Dependence between different failure modes
  – Failures may depend on a combination of factors

Reliability Metrology: How can we measure and quantify electronics reliability?

$$\text{Failure rate} = \frac{\text{total number of failures}}{\text{total number of device hours}}$$

Mean Time Between Failures:

$$\text{MTBF} = \frac{1}{\text{Failure rate}}$$

Failure rate and MTBF example

Assume that you have observed 4 failures after testing 500 devices for 1000 hours:
• Number of device hours: 500 × 1000 = 500 000
• Failure rate: 4 / 500 000 = 8 × 10⁻⁶ failures per hour
• MTBF: 1 / (8 × 10⁻⁶) = 125 000 hours between failures
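The same calculation as a minimal Python sketch. Like the example, it counts the full 1000 hours for every device, although strictly a failed device stops accumulating hours when it fails; the function names are hypothetical:

```python
def failure_rate(n_failures, device_hours):
    """Failure rate = total number of failures / total number of device hours."""
    if device_hours <= 0:
        raise ValueError("device_hours must be positive")
    return n_failures / device_hours


def mtbf(rate_per_hour):
    """MTBF = 1 / failure rate (undefined when no failures have been observed)."""
    if rate_per_hour <= 0:
        raise ValueError("MTBF is undefined for a zero failure rate")
    return 1.0 / rate_per_hour


# The example: 4 failures, 500 devices tested for 1000 hours each
device_hours = 500 * 1000
rate = failure_rate(4, device_hours)
print(f"device hours: {device_hours}")        # device hours: 500000
print(f"failure rate: {rate:.1e} per hour")   # failure rate: 8.0e-06 per hour
print(f"MTBF: {mtbf(rate):,.0f} hours")       # MTBF: 125,000 hours
```

Note that an MTBF of 125 000 hours is far longer than the 1000-hour test itself; the number is only meaningful if the failure rate stays constant over that time.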

Reliability functions

• Reliability functions are shaped to predict failures versus time.
• If the curves are known, it is possible to estimate future behaviour from a small data set.

R(t) = reliability function
  – The fraction of devices still operating at time t
F(t) = the cumulative failure function
  – The fraction of devices that have failed by time t

R(t) + F(t) = 1
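Estimating R(t) and F(t) from test data is a counting exercise: F(t) is the number of units that have failed by time t divided by the initial number of units. A minimal Python sketch with a hypothetical data set; the function name `empirical_reliability` is also hypothetical:

```python
from bisect import bisect_right


def empirical_reliability(failure_times, n_initial, t):
    """Return (R(t), F(t)) from observed failure times.

    failure_times -- times at which individual units failed (units still
                     running at the end of the test simply do not appear)
    n_initial     -- number of units at the start of the test
    """
    times = sorted(failure_times)
    n_failed = bisect_right(times, t)          # failures at or before t
    f_cum = n_failed / n_initial               # cumulative failure function F(t)
    return 1.0 - f_cum, f_cum                  # R(t) + F(t) = 1


# Hypothetical test: 10 units, 6 failed during the test (times in days)
fails = [120, 340, 410, 560, 780, 950]
for t in (0, 400, 1000):
    r, f = empirical_reliability(fails, 10, t)
    print(f"t = {t:4d} d: R = {r:.1f}, F = {f:.1f}")
```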

Reliability functions example

• Test data from 1800 test units subjected to 1000 days of testing
• After 1000 days, 74% of the units had failed
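Applying R(t) + F(t) = 1 to the numbers of this example gives the state of the test population at the end of the test:

$$F(1000\ \text{days}) = 0.74 \quad\Rightarrow\quad R(1000\ \text{days}) = 1 - 0.74 = 0.26$$

$$N_{\text{failed}} = 0.74 \times 1800 = 1332, \qquad N_{\text{operating}} = 0.26 \times 1800 = 468$$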

Reliability function and Failure Function


Reliability functions

Failure density function

• The failure density function f(t) gives the fractional rate at which the original (initial) devices are failing at a given time.
• The higher the rate, the more rapidly devices are failing.

$$f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt}$$

$$F(t) = \int_0^t f(t')\,dt'$$
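With tabulated data, f(t) can be estimated as the change in F(t) over each time interval divided by the interval length, and F(t) is recovered by summing f(t)·Δt, the discrete counterpart of the integral. Because R = 1 − F, the same numbers equal −ΔR/Δt. A minimal Python sketch; the function names are hypothetical and the F(t) values are hypothetical apart from ending at the 74% of the example:

```python
def failure_density(times, f_cum):
    """Estimate f(t) = dF/dt from a tabulated cumulative failure function.

    Uses forward differences, so each value belongs to the interval
    [times[i], times[i+1]]. Returns a list one element shorter than the input.
    """
    return [(f2 - f1) / (t2 - t1)
            for t1, t2, f1, f2 in zip(times, times[1:], f_cum, f_cum[1:])]


def cumulative_from_density(times, density):
    """Rebuild F(t) = integral_0^t f(t') dt' by summing the interval estimates."""
    f_cum = [0.0]
    for t1, t2, f in zip(times, times[1:], density):
        f_cum.append(f_cum[-1] + f * (t2 - t1))
    return f_cum


# Hypothetical F(t) sampled every 250 days
t_days = [0, 250, 500, 750, 1000]
F = [0.00, 0.10, 0.30, 0.55, 0.74]
dens = failure_density(t_days, F)
print([f"{x:.4f}" for x in dens])      # fraction of the initial units failing per day
print(cumulative_from_density(t_days, dens))
```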

Failure density function Example


Reliability functions

Hazard rate function

• The failure density function will eventually always drop, because it refers to the initial number of test devices.
• The hazard rate h(t) gives a more intuitive description:

$$h(t) = \frac{\text{rate of failure}}{\text{number of devices still operating}} = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)}$$
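The difference between f(t) and h(t) is easiest to see from interval counts: f(t) divides the failures in an interval by the initial number of units, h(t) by the number still operating. A minimal Python sketch with hypothetical data in which half of the surviving units fail in every interval; it uses the survivors at the start of each interval, which is one simple convention among several, and the function name is hypothetical:

```python
def hazard_from_counts(edges, failures, n_initial):
    """Estimate f(t) and h(t) = f(t) / R(t) per interval from failure counts.

    edges     -- interval boundaries, len(edges) == len(failures) + 1
    failures  -- number of units failing in each interval
    n_initial -- number of units at the start of the test
    Returns (f, h): f is normalised to the initial population,
    h to the units still operating at the start of each interval.
    """
    f_list, h_list = [], []
    survivors = n_initial
    for t1, t2, n_fail in zip(edges, edges[1:], failures):
        dt = t2 - t1
        f_list.append(n_fail / (n_initial * dt))   # failure density f(t)
        r = survivors / n_initial                  # R(t) at interval start
        h_list.append(f_list[-1] / r)              # hazard rate h = f / R
        survivors -= n_fail
    return f_list, h_list


# Hypothetical data: 1000 units, half of the survivors fail in every 100-hour interval
edges = [0, 100, 200, 300, 400]
fails = [500, 250, 125, 62]
f, h = hazard_from_counts(edges, fails, 1000)
print(["%.5f" % x for x in f])   # falls because it is normalised to the 1000 initial units
print(["%.5f" % x for x in h])   # stays close to 0.005 per hour
```

The failure density halves from interval to interval, while the hazard rate stays at about 0.005 per hour: the surviving units are failing just as quickly as before.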

Hazard rate function Example


Introduction to statistical treatment of reliability data

Fitting data to statistical models is helpful for quantifying reliability. Two common models for this are:
• The Weibull distribution
• The normal distribution

Not topic for exam

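The Weibull distribution

The Weibull distribution failure density function is

$$f(t) = \frac{\beta}{\lambda}\left(\frac{t}{\lambda}\right)^{\beta-1} e^{-(t/\lambda)^{\beta}}$$

where
• λ = the lifetime (scale) parameter: a characteristic time to failure, by which about 63% of the units have failed (it equals the average time to failure only when β = 1)
• β = the shape parameter, a measure of how the failures are distributed around the characteristic lifetime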

Not topic for exam
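Integrating the Weibull density gives R(t) = exp[−(t/λ)^β], so F(λ) = 1 − e⁻¹ ≈ 0.632 for any β, and the mean time to failure is λ·Γ(1 + 1/β). A minimal Python sketch that evaluates these and checks numerically that f(t) = −dR/dt; the parameter values are illustrative and the function names are hypothetical:

```python
import math


def weibull_pdf(t, beta, lam):
    """Weibull failure density f(t) = (beta/lam) * (t/lam)**(beta-1) * exp(-(t/lam)**beta)."""
    return (beta / lam) * (t / lam) ** (beta - 1) * math.exp(-((t / lam) ** beta))


def weibull_reliability(t, beta, lam):
    """R(t) = exp(-(t/lam)**beta); F(t) = 1 - R(t)."""
    return math.exp(-((t / lam) ** beta))


def weibull_mean(beta, lam):
    """Mean time to failure, lam * Gamma(1 + 1/beta); equals lam only for beta = 1."""
    return lam * math.gamma(1.0 + 1.0 / beta)


beta, lam = 2.0, 1000.0   # illustrative parameters (hours)
print(f"F(lam) = {1 - weibull_reliability(lam, beta, lam):.3f}")   # F(lam) = 0.632
print(f"mean time to failure = {weibull_mean(beta, lam):.0f} h")
# f(t) should equal -dR/dt: compare with a central difference at t0 = 700 h
t0, dt = 700.0, 1e-3
num = -(weibull_reliability(t0 + dt, beta, lam)
        - weibull_reliability(t0 - dt, beta, lam)) / (2 * dt)
print(f"f(t0) = {weibull_pdf(t0, beta, lam):.6e}, -dR/dt = {num:.6e}")
```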

The Weibull distribution function

• Small β (β < 1) indicates a higher chance of early failure: the failure rate decreases with time.
• β > 1 indicates that wearout failures cause increasing failure rates at later times.

Not topic for exam
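Dividing the Weibull density by R(t) = exp[−(t/λ)^β] gives the hazard rate h(t) = (β/λ)(t/λ)^(β−1), so the sign of β − 1 alone decides whether the failure rate falls, stays constant (β = 1) or rises with time. A small Python sketch with an illustrative λ:

```python
def weibull_hazard(t, beta, lam):
    """Weibull hazard rate h(t) = f(t) / R(t) = (beta/lam) * (t/lam)**(beta - 1)."""
    return (beta / lam) * (t / lam) ** (beta - 1)


lam = 1000.0  # illustrative characteristic life in hours
for beta in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(t, beta, lam) for t in (100.0, 500.0, 2000.0)]
    trend = ("decreasing" if rates[0] > rates[-1]
             else "increasing" if rates[0] < rates[-1] else "constant")
    print(f"beta = {beta}: h(t) is {trend}")
```

This prints a decreasing hazard for β = 0.5 (early failures), a constant hazard for β = 1 and an increasing hazard for β = 3 (wearout).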

The Weibull distribution function Example data


Not topic for exam
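Weibull parameters are often estimated from example data by linear regression on a "Weibull plot": taking logarithms twice of R(t) = exp[−(t/λ)^β] gives ln(−ln(1 − F)) = β ln t − β ln λ, a straight line in ln t with slope β. The sketch below is one common way of doing this, not necessarily how the example data on this slide were fitted. It assumes all units have failed (no censored units) and estimates F at the i-th of n ordered failures with Benard's median-rank approximation (i − 0.3)/(n + 0.4). The function name and the failure times are hypothetical:

```python
import math


def fit_weibull(failure_times):
    """Least-squares Weibull plot fit; returns (beta, lam).

    Assumes every unit in the sample has failed and estimates F at the
    i-th ordered failure with Benard's median rank (i - 0.3) / (n + 0.4).
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2:
        raise ValueError("need at least two failure times")
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        f_est = (i - 0.3) / (n + 0.4)
        xs.append(math.log(t))                       # x = ln t
        ys.append(math.log(-math.log(1.0 - f_est)))  # y = ln(-ln(1 - F))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx                      # slope of the Weibull plot
    intercept = y_mean - beta * x_mean    # intercept = -beta * ln(lam)
    lam = math.exp(-intercept / beta)
    return beta, lam


# Hypothetical failure times in hours
print(fit_weibull([420, 610, 790, 950, 1120, 1340]))
```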

Normal distribution applied to failure patterns

We must first determine the mean (expectation value) and the standard deviation σ of the normal distribution (after all units have failed); σ² is the variance:

$$t_{avg} = \frac{\text{sum of times to failure for all units}}{\text{initial number of units}} = \frac{\sum_i t_i}{N}$$

$$\sigma = \sqrt{\frac{\sum_i \left(t_i - t_{avg}\right)^2}{N}}$$

Not topic for exam
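The two estimates translate directly into code. A minimal Python sketch with hypothetical failure times and a hypothetical function name; it divides by N as in the formula, whereas many tools default to the sample formula with N − 1 (in Python's standard library, `statistics.pstdev` divides by N and `statistics.stdev` by N − 1):

```python
import math


def normal_parameters(failure_times):
    """Return (t_avg, sigma) from the times to failure of ALL units.

    t_avg = sum(t_i) / N
    sigma = sqrt(sum((t_i - t_avg)**2) / N)   (N, not N - 1)
    """
    n = len(failure_times)
    if n == 0:
        raise ValueError("no failure times given")
    t_avg = sum(failure_times) / n
    sigma = math.sqrt(sum((t - t_avg) ** 2 for t in failure_times) / n)
    return t_avg, sigma


times = [800.0, 900.0, 1000.0, 1100.0, 1200.0]   # hypothetical, all 5 units failed
t_avg, sigma = normal_parameters(times)
print(f"t_avg = {t_avg:.0f} h, sigma = {sigma:.1f} h")   # t_avg = 1000 h, sigma = 141.4 h
```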

Normal distribution applied to failure patterns

Failure density function for the normal distribution:

$$f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{t - t_{avg}}{\sigma}\right)^2\right]$$

Not topic for exam
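The density can be evaluated directly, and the reliability function R(t) = 1 − F(t) follows from the complementary error function, which Python provides as `math.erfc`. Combining the two gives the hazard rate h(t) = f(t)/R(t), which keeps rising for a normal distribution, the pattern expected for wearout failures. A minimal Python sketch with illustrative parameters; the function names are hypothetical:

```python
import math


def normal_pdf(t, t_avg, sigma):
    """Failure density f(t) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((t - t_avg)/sigma)**2)."""
    z = (t - t_avg) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_reliability(t, t_avg, sigma):
    """R(t) = 1 - F(t), with F the normal cumulative distribution function."""
    return 0.5 * math.erfc((t - t_avg) / (sigma * math.sqrt(2.0)))


def normal_hazard(t, t_avg, sigma):
    """Hazard rate h(t) = f(t) / R(t)."""
    return normal_pdf(t, t_avg, sigma) / normal_reliability(t, t_avg, sigma)


t_avg, sigma = 1000.0, 141.4   # illustrative parameters (hours)
for t in (700.0, 1000.0, 1300.0):
    print(f"t = {t:.0f} h: f = {normal_pdf(t, t_avg, sigma):.2e}, "
          f"R = {normal_reliability(t, t_avg, sigma):.3f}, "
          f"h = {normal_hazard(t, t_avg, sigma):.2e}")
```

At t = t_avg exactly half of the units have failed (R = 0.5), and the hazard rate is several times higher at 1300 h than at 700 h.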

End of lecture: Reliability and Failure Models

• Important issues:
  – Understand how failures develop with time (bathtub curve)
  – Understand the nature of some overstress and wearout failure modes
  – Understand how acceleration factors can speed up test and verification (and what can go wrong in these processes)
  – Know the main reliability metrology terms
  – Get a first look at the statistical treatment of failure data (Weibull and normal distribution)