rla -reliability2

DATA DRIVEN APPROACH FOR RESIDUAL LIFE PREDICTION

Dr. Bijay Kumar Rout

Assistant Professor (Mechanical Engg. Dept)

Birla Institute of Technology and Science, Pilani

Two Day Workshop on RLA of Thermal Power Plants

Presentation outline

• Introduction
• Commonly used terms in the field of Reliability & Maintainability Engg.
• Prediction of reliability and failure rate
• Discussion on Weibull failure (modeling and estimation)
• Failure due to load and capacity interaction
• Concept of residual useful life and mean residual life
• Proportional Hazard Modeling
• Conclusion

OBJECTIVE

• To discuss prognostics strategies based on assessing the degeneration/degradation of components.

• To use the reliability and hazard rate models to determine Residual Useful Life (RUL), which gives the organization an economic advantage and a safe working environment.

[Figure: Associated cost with time of replacement. Horizontal axis: time, with plant health condition passing through Functioning, Degrading and Failing. Prescriptive replacement of a functioning "good" item costs $$$$$$; replacing the item with maximum usage before failure is the optimum ($); loss of life and/or system due to catastrophic failure costs $$$$$$$$$$$$$$.]

Data-driven methodology

Data-driven approaches can be divided into two categories:

• Artificial intelligence (AI) techniques (neural networks, fuzzy systems, decision trees, etc.).
• Statistical techniques (multivariate statistical methods (PCA and others), PHM, linear and quadratic discriminators, partial least squares, HMM, etc.).

“The function of maintenance in a world-class operating environment is not to simply maintain, but to provide reliable systems and to extend the life of systems at optimum costs.” Stanley Lasday, June 1997, Industrial Heating

Evolving Maintenance Strategies

[Figure: System availability versus time (axis marks at 1930, 1950, 1990, 2000) as maintenance strategies evolve. Strategies labelled on the figure: run to failure, inspection, preventive maintenance, RCM, Computerized Maintenance Management Systems (CMMS), TPM, Condition-Based Maintenance (CBM), Prognostics & Health Management (PHM).]

1. Modeling of stress and damage in electronic parts, structures and equipment, utilizing exposure conditions (e.g., usage, temperature, vibration, radiation) to compute accumulated damage.

2. Prediction of the expected life of the plant and its major parts.

3. Prediction of the availability of the plant.

4. Prediction of the expected maintenance load.

5. Prediction of the support system resources needed for effective operation.

Commonly used terms

MTBF: Mean Time Between Failures. When applied to repairable products, this is the average time that a system will operate until the next failure.

Failure Rate (λ): The number of failures per unit of stress. The stress can be expressed in various units (for example operating hours or cycles). For a constant failure rate, λ = 1/MTBF.

MTTF or MTFF: The mean time to failure or mean time to first failure. This is the measure applied to systems that can’t be repaired during their mission.

MTTR: Mean time to repair. This is the average elapsed time between a unit failing and its being repaired and returned to service.

Availability: The proportion of time a system is operable. This is only relevant for systems that can be repaired and is given by: Availability = MTBF/(MTBF + MTTR)

B10 and B50 Life: the life values by which 10% and 50% of the population, respectively, have failed. The B50 life is also called the median life.

Fault Tree Analysis (FTA): Fault trees are diagrams used to trace symptoms to their root causes. Fault Tree Analysis is the term used to describe the process involved in constructing a fault tree.

Derating: Assigning a product to an application at a stress level lower than the rated stress level for the product. This is analogous to providing a safety factor.

Censored Test: A life test where some units are removed before the end of the test period, even though they have not failed.

Maintainability: A measure of the ability to place a system that has failed back into service. Figures of merit for maintainability include availability and MTTR.
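As a small illustration of the availability and failure-rate definitions, the sketch below evaluates them for one repairable unit. The MTBF and MTTR figures are hypothetical values chosen only for the arithmetic, and the failure rate is taken as 1/MTBF, which assumes a constant failure rate.

```python
def failure_rate(mtbf_hours: float) -> float:
    """Failure rate as the reciprocal of MTBF (assumes a constant failure rate)."""
    return 1.0 / mtbf_hours


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures, for illustration only.
mtbf = 950.0  # hours between failures
mttr = 50.0   # hours to repair and return to service

lam = failure_rate(mtbf)
avail = availability(mtbf, mttr)
print(f"failure rate = {lam:.6f} failures per hour")
print(f"availability = {avail:.3f}")
```

With these numbers the unit is operable 950 hours out of every 1,000, so shortening repairs (lower MTTR) raises availability even when MTBF is unchanged.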

Different Views about Failure

[Figure: Timeline of a developing oil leak. Stages along the time axis: leak starts, pool of oil, leak deteriorates, high oil consumption, equipment stops working. The safety officer, the engineer and the production engineer each say "FAILED" at a different stage.]

Mechanism of Failure

Distortion
• Buckling
• Creep
• Yielding
• Warped
• Thermal relaxation
• Elastic deformation
• Brinelling

Fatigue and Fracture
• Ductile fracture
• Brittle fracture
• Fatigue fracture
• High cycle fatigue
• Low cycle fatigue
• Residual stress
• Torsional fatigue

Corrosion
• Corrosion fatigue
• Stress corrosion
• Galvanic corrosion
• Biological corrosion
• Chemical attack

Wear
• Abrasive wear
• Adhesive wear
• Cavitation
• Fretting wear
• Scoring
• Surface-origin fatigue (pitting)

Reliability Modeling

Construction of a model (life distribution) that represents the times-to-failure (TTF) of the entire system based on the life distributions of the components, subassemblies and/or assemblies ("black boxes") from which it is composed.

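Reliability Measures

• Cumulative failure function (the probability of failure by time t):

$$F(t) = P(0 \le T \le t) = \int_0^t f(x)\,dx$$

• Mean-value function (the expected number of failures experienced by time t according to the model): $\mu(t) = E[N(t)]$, where N(t) counts the failures up to time t. For a single time to failure T with density f(t), the mean life is:

$$\mu = E(T) = \int_0^\infty t\,f(t)\,dt$$

• Reliability function (the probability that the time to failure is larger than t):

$$R(t) = P(T > t) = 1 - F(t) = \int_t^\infty f(x)\,dx$$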

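Empirical Reliability Measures

• n(t) = no. of survivors (no. still alive or functioning adequately) at time t; N = n(0) is the number of units at the start.

• empirical hazard rate, for $t_i < t \le t_i + \Delta t_i$:

$$h(t) = \frac{n(t_i) - n(t_i + \Delta t_i)}{n(t_i)\,\Delta t_i}$$

• empirical failure density, for $t_i < t \le t_i + \Delta t_i$:

$$f(t) = \frac{n(t_i) - n(t_i + \Delta t_i)}{N\,\Delta t_i}$$

• empirical failure CDF (i-th ordered failure out of n):

$$\hat{F}(t_i) = \frac{i}{n+1}$$

• empirical reliability function:

$$\hat{R}(t_i) = 1 - \hat{F}(t_i) = 1 - \frac{i}{n+1} = \frac{n+1-i}{n+1}$$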

Empirical Reliability Measures (example)

Suppose 10 hypothetical components on life test fail at times {5, 10, 17.5, 30, 40, 55, 67.5, 82.5, 100, 117.5}. Determine f(t) and h(t) from the data.
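One way to work the example is sketched below. It assumes the test starts at t = 0 with all 10 units, that each interval runs from one failure to the next, and that the empirical density and hazard use the failures in an interval divided by (N Δt) and (n(t) Δt) respectively; the function name is illustrative.

```python
failure_times = [5, 10, 17.5, 30, 40, 55, 67.5, 82.5, 100, 117.5]


def empirical_f_and_h(times):
    """Return rows (start, end, f, h) for the intervals between failures.

    f = failures in interval / (N * dt), h = failures in interval / (n(start) * dt),
    assuming one failure closes each interval and the test starts at t = 0.
    """
    total = len(times)      # N, units on test at t = 0
    survivors = total       # n(t) at the start of the current interval
    start = 0.0
    rows = []
    for end in sorted(times):
        dt = end - start
        f = 1.0 / (total * dt)
        h = 1.0 / (survivors * dt)
        rows.append((start, end, f, h))
        survivors -= 1
        start = end
    return rows


table = empirical_f_and_h(failure_times)
for start, end, f, h in table:
    print(f"{start:6.1f} to {end:6.1f}:  f = {f:.4f}  h = {h:.4f}")
```

The density f(t) depends only on the interval length, while h(t) also grows as the number of survivors shrinks, so the last intervals show the largest hazard estimates.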

• Hazard rate function: the conditional probability of failure per unit time in the interval (t, t + Δt], given that no failure has occurred before t.

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t} = \frac{f(t)}{R(t)}$$

• Mean Time To Failure (MTTF): the expected time during which the system will function successfully without maintenance or repair.

$$MTTF = \int_0^\infty R(t)\,dt$$

• Failure intensity function: the instantaneous rate of change of the expected number of failures with respect to time.

$$\lambda(t) = \frac{d\mu(t)}{dt}$$

• Cumulative hazard function:

$$H(t) = \int_0^t h(\tau)\,d\tau = -\ln R(t)$$


Probability of Failing at a Given Age

[Figure: Failure probability versus time (t), with the infant mortality and wear-out regions marked. Two curves: one derived from the bathtub curve, one from a constant failure rate (exponential).]


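Cumulative Failure Probability

[Figure: F(t) versus time (t), rising from 0 to 1, with the infant mortality and wear-out regions marked. Two curves: one derived from the bathtub curve, one from a constant failure rate (1 - exponential). Annotation: eventually everything fails.]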

Exponential Distribution Function

• The simplest distribution function, exponential, is characterized by a constant failure rate λ over the lifetime of the device.

• This is useful for representing a device in which all early failure mechanisms have been eliminated.

– $h(t) = \lambda$
– $R(t) = \exp(-\lambda t)$
– $F(t) = 1 - \exp(-\lambda t)$
– $f(t) = \lambda \exp(-\lambda t)$
– $MTTF = \int_0^\infty t\,\lambda \exp(-\lambda t)\,dt = 1/\lambda$

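Weibull Distribution Function

• h(t) varies as a power of the age of the device, where α (scale), β (shape) and γ (location) are constants.
• For β < 1 the failure rate decreases with time and can be used to represent early failure.
• For β = 1, h(t) is constant and can be used to represent steady state; this constant failure rate (exponential) case is a special case of the Weibull distribution.
• For β > 1, h(t) increases with time and can be used to represent the wear-out condition.

$$f(t) = \frac{\beta}{\alpha}\left(\frac{t-\gamma}{\alpha}\right)^{\beta-1}\exp\left[-\left(\frac{t-\gamma}{\alpha}\right)^{\beta}\right], \quad t \ge \gamma$$

$$R(t) = \exp\left[-\left(\frac{t-\gamma}{\alpha}\right)^{\beta}\right]$$

$$h(t) = \frac{f(t)}{R(t)} = \frac{\beta}{\alpha}\left(\frac{t-\gamma}{\alpha}\right)^{\beta-1}$$

$$MTTF = \gamma + \alpha\,\Gamma\left(1 + \frac{1}{\beta}\right)$$

With γ = 0 these reduce to the two-parameter Weibull form.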
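The three regimes of the shape parameter can be checked numerically. The sketch below is only illustrative: the scale α = 100 and the two ages compared are hypothetical, the location parameter is assumed to default to zero, and the MTTF uses the standard Weibull result α Γ(1 + 1/β) shifted by the location.

```python
import math


def weibull_reliability(t, alpha, beta, gamma=0.0):
    """R(t) = exp(-((t - gamma) / alpha) ** beta)."""
    return math.exp(-(((t - gamma) / alpha) ** beta))


def weibull_hazard(t, alpha, beta, gamma=0.0):
    """h(t) = (beta / alpha) * ((t - gamma) / alpha) ** (beta - 1)."""
    return (beta / alpha) * ((t - gamma) / alpha) ** (beta - 1.0)


def weibull_mttf(alpha, beta, gamma=0.0):
    """MTTF = gamma + alpha * Gamma(1 + 1 / beta)."""
    return gamma + alpha * math.gamma(1.0 + 1.0 / beta)


alpha = 100.0  # hypothetical scale parameter
for beta in (0.5, 1.0, 3.0):
    h_young = weibull_hazard(10.0, alpha, beta)
    h_old = weibull_hazard(200.0, alpha, beta)
    if h_old < h_young:
        trend = "decreasing"
    elif h_old > h_young:
        trend = "increasing"
    else:
        trend = "constant"
    print(f"beta = {beta}: hazard is {trend}, MTTF = {weibull_mttf(alpha, beta):.1f}")
```

For β = 1 the hazard equals 1/α at every age and the MTTF equals α, which is the exponential special case.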

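Weibull Distribution Function

[Figure: Weibull cumulative failure probability plotted on Weibull graph paper. Vertical axis: F, on a scale based on log log 1/(1-F), with marks at .01, .63 and .99. Horizontal axis: time (t), log scale. Straight lines are shown for β < 1, β = 1 and β > 1; t = α is marked at F = .63.]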

Weibull Interpretation

β < 1
• Implies infant mortality

β = 1
• Implies failures are random
• An old part is as good as a new part

β > 1
• Occurs for: low cycle fatigue, most bearing and gear failures, corrosion or erosion

Large β
• Implies rapid wear out in old age
• Occurs for: wear-through

Weibull Distribution Function

• Given a certain form for cumulative failure:

$$F(t) = 1 - \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right]$$

• we can rearrange and take natural logarithms and get:

$$\ln \ln \frac{1}{1 - F(t)} = \beta\,(\ln t - \ln \alpha)$$

• If we plot log log {1/[1-F(t)]} vs log t, the result is a straight line with slope β. Special graph paper exists that does these transformations.

• Cumulative hazard function is

$$H(t) = \left(\frac{t}{\alpha}\right)^{\beta}$$

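Weibull Plot

MATLAB software commands:

>> X = [72, 82, 97, 103, 113, 117, 126, 127, 127, 139, 154, 159, 199, 207];
>> wblplot(X)            % Weibull probability plot of the failure times
>> parmhat = wblfit(X)   % maximum likelihood estimates [scale, shape]

parmhat = 144.2727 3.6437

Here the first value is the scale parameter α = 144.2727 and the second is the shape parameter β = 3.6437.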
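The straight-line relation behind Weibull graph paper can also be fitted by ordinary least squares. The sketch below reuses the failure times from the MATLAB example and assumes mean-rank plotting positions i/(n+1); it is a graphical-method approximation, not a reproduction of `wblfit`, which uses maximum likelihood, so the estimates are close to but not equal to the MATLAB values.

```python
import math


def weibull_plot_fit(times):
    """Least-squares fit of ln ln(1/(1-F)) = beta * ln t - beta * ln alpha.

    Uses mean-rank plotting positions F_i = i / (n + 1) (an assumption here).
    Returns (alpha, beta) = (scale, shape).
    """
    data = sorted(times)
    n = len(data)
    xs = [math.log(t) for t in data]
    ys = [math.log(-math.log(1.0 - i / (n + 1))) for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                    # slope of the fitted line
    intercept = y_bar - beta * x_bar    # equals -beta * ln(alpha)
    alpha = math.exp(-intercept / beta)
    return alpha, beta


X = [72, 82, 97, 103, 113, 117, 126, 127, 127, 139, 154, 159, 199, 207]
alpha_hat, beta_hat = weibull_plot_fit(X)
print(f"scale alpha = {alpha_hat:.1f}, shape beta = {beta_hat:.2f}")
```

A shape estimate well above 1 points to wear-out behaviour for this data set, in line with the interpretation of β given earlier.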

Bath Tub Curve

[Figure: Equipment failure profile (bathtub curve). Hazard rate h(t) versus time (t): decreasing during infant mortality, constant failure rate during useful life, increasing during wear out.]

The burn-in phase (also known as infant mortality, break-in, debugging): during this phase the hazard rate decreases and failures occur due to causes such as:

• Incorrect use procedures
• Poor test specifications
• Incomplete final test
• Poor quality control
• Over-stressed parts
• Wrong handling or packaging
• Inadequate materials
• Incorrect installation or setup
• Poor technical representative training
• Marginal parts
• Poor manufacturing processes or tooling
• Power surges

The useful life phase: during this phase the hazard rate is constant and the failures occur randomly or unpredictably. Some of the causes of failure include:

• Insufficient design margins
• Incorrect use environments
• Undetectable defects
• Human error and abuse
• Unavoidable failures


The wear-out phase: the hazard rate increases. Some of the causes of failure include:

• Wear due to aging, degradation in strength
• Inadequate or improper preventive maintenance
• Limited-life components
• Wear-out due to friction, misalignments, corrosion and creep
• Materials fatigue
• Incorrect overhaul practices

Remedies

• Early: Quality manufacture/Robust design
• Wear out: Physically-based models, preventative maintenance, robust design (FMEA)
• Chance: Tight customer linkages, testing, HAST

Failures vs time as a function of Stress/Load

[Figure: Hazard rate h(t) versus time, with separate curves for high stress, medium stress and low stress.]

Failure due to Load & Capacity variations

[Figure: Probability of occurrence versus applied or failure stress, showing the load and strength distributions; their overlap is the failure region. Panels: new units; after infant mortality.]

[Figure: The same load and strength distributions with the failure region marked. Panels: new units; after wear-out.]

Ex: Both strength and stress are normally distributed with respective (μ, σ²) combinations of (50,000, 5,000²) and (30,000, 3,000²), so that the safety margin (difference = strength - stress) has (μ, σ²) = (20,000, 5,831²). A critical failure occurs when the difference < 0 (that is, stress exceeds strength).

[Figure: Normal distribution of the difference, with axis marks at 0 and at μ ± 1σ, 2σ, 3σ: 2,507; 8,338; 14,169; 20,000; 25,831; 31,662; 37,493.]

P(critical failure) = P(difference < 0) = P[Z < (0 - 20,000)/5,831] = P(Z < -3.43) ≈ 0.0003, so that reliability ≈ 0.9997 (99.97%)

NOTE: “reliability” = “safety”
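The load and strength example can be evaluated without normal tables. The sketch below assumes strength and stress are independent, so the variances add, and builds the standard normal CDF from `math.erfc`; the variable names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


mu_strength, sd_strength = 50_000.0, 5_000.0
mu_stress, sd_stress = 30_000.0, 3_000.0

# Difference = strength - stress; independence is assumed, so variances add.
mu_margin = mu_strength - mu_stress
sd_margin = math.sqrt(sd_strength ** 2 + sd_stress ** 2)

z = (0.0 - mu_margin) / sd_margin
p_failure = normal_cdf(z)          # P(difference < 0)
reliability = 1.0 - p_failure

print(f"sigma of difference = {sd_margin:.0f}")
print(f"z = {z:.2f}")
print(f"P(critical failure) = {p_failure:.5f}, reliability = {reliability:.5f}")
```

Because the failure probability sits far in the tail, small changes in either standard deviation move it noticeably, which is why the overlap of the two distributions matters more than the gap between their means.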

Repairable and Non-Repairable Systems

• Non-Repairable
– Only need to track first failure

• Repairable
– Track Mean Time Between Failures (MTBF)
– Time to Repair
– Preventive maintenance schedule (interval T0)
– The smaller the time period, the more notable the improvement

$$R_M(jT_0 + x) = [R(T_0)]^{j}\,R(x), \quad 0 \le x < T_0, \; j = 0, 1, 2, \ldots$$
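The effect of a preventive maintenance interval can be sketched as follows, assuming the relation reads R_M(jT0 + x) = [R(T0)]^j R(x) and that each maintenance action restores the unit to as-good-as-new. The Weibull parameters and the interval T0 = 20 are hypothetical.

```python
import math


def weibull_reliability(t, alpha, beta):
    """Two-parameter Weibull reliability R(t) = exp(-(t / alpha) ** beta)."""
    return math.exp(-((t / alpha) ** beta))


def maintained_reliability(t, t0, reliability):
    """R_M(j*T0 + x) = R(T0)**j * R(x), with j completed intervals and 0 <= x < T0.

    Assumes every preventive maintenance action renews the unit completely.
    """
    j = int(t // t0)
    x = t - j * t0
    return reliability(t0) ** j * reliability(x)


def wear_out_unit(t):
    # Hypothetical wear-out unit: alpha = 100, beta = 2.
    return weibull_reliability(t, 100.0, 2.0)


t = 90.0
without_pm = wear_out_unit(t)
with_pm = maintained_reliability(t, 20.0, wear_out_unit)
print(f"R({t:.0f}) without maintenance = {without_pm:.3f}")
print(f"R_M({t:.0f}) with T0 = 20      = {with_pm:.3f}")
```

The gain shown here relies on an increasing hazard (β > 1); for a constant hazard the two values coincide, so renewal brings no benefit.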

RUL and Mean RUL

To provide a sufficiently distant planning horizon, the remaining useful operating life of the machinery is estimated.

Given an accumulated operating time t, R*(x) is the probability that the equipment survives an additional time x:

$$R^{*}(x) = P(X_t > x) = P(T > t + x \mid T > t) = \frac{R(t + x)}{R(t)}$$

If a failure is to be prevented, the system should be stopped safely before the mean residual life (MRL) has elapsed.

From the reliability model based on the degradation process, the mean remaining useful life may then be estimated.

$$MRL(t) = E(X_t) = E(T - t \mid T > t) = \frac{\int_t^\infty R(\tau)\,d\tau}{R(t)}$$

At t = 0, R(0) = 1 and MRL(0) = MTTF.
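The mean residual life integral can be approximated numerically. The sketch below assumes MRL(t) is the integral of R from t to infinity divided by R(t), truncates the integral at a finite upper limit, and uses a plain trapezoidal rule; the two reliability functions are hypothetical examples.

```python
import math


def mean_residual_life(t, reliability, upper, steps=20000):
    """Trapezoidal estimate of MRL(t) = integral_t^upper R(u) du / R(t).

    'upper' truncates the infinite integral and should be chosen so that
    R(upper) is negligible.
    """
    width = (upper - t) / steps
    area = 0.5 * (reliability(t) + reliability(upper))
    for k in range(1, steps):
        area += reliability(t + k * width)
    return area * width / reliability(t)


def exponential_unit(u):
    # Hypothetical constant failure rate of 0.01 per hour.
    return math.exp(-0.01 * u)


def wear_out_unit(u):
    # Hypothetical Weibull unit with alpha = 100, beta = 2.
    return math.exp(-((u / 100.0) ** 2))


mrl_exp_new = mean_residual_life(0.0, exponential_unit, upper=3000.0)
mrl_exp_aged = mean_residual_life(100.0, exponential_unit, upper=3000.0)
mrl_wear_new = mean_residual_life(0.0, wear_out_unit, upper=1000.0)
mrl_wear_aged = mean_residual_life(100.0, wear_out_unit, upper=1000.0)

print(f"exponential: MRL(0) = {mrl_exp_new:.1f}, MRL(100) = {mrl_exp_aged:.1f}")
print(f"wear-out:    MRL(0) = {mrl_wear_new:.1f}, MRL(100) = {mrl_wear_aged:.1f}")
```

For the constant failure rate the residual life does not depend on age, whereas for the wear-out unit it shrinks as operating time accumulates.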

Ex: A device has a decreasing failure rate characterized by a two-parameter Weibull distribution with α = 180 years and β = 1/2. The device is required to have a design life reliability of 0.90. What is the design life if the device is first subjected to a wear-in period of one month?
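The wear-in example can be worked in closed form. The sketch below assumes the design life t must satisfy R(t + T0)/R(T0) = 0.90 for a wear-in period T0, which for a two-parameter Weibull distribution gives t = α[-ln(0.90) + (T0/α)^β]^(1/β) - T0; the function name is illustrative.

```python
import math


def weibull_reliability(t, alpha, beta):
    """Two-parameter Weibull reliability R(t) = exp(-(t / alpha) ** beta)."""
    return math.exp(-((t / alpha) ** beta))


def design_life(target, alpha, beta, wear_in=0.0):
    """Solve R(t + wear_in) / R(wear_in) = target for t."""
    exponent = -math.log(target) + (wear_in / alpha) ** beta
    return alpha * exponent ** (1.0 / beta) - wear_in


alpha, beta = 180.0, 0.5      # years, from the example
one_month = 1.0 / 12.0        # wear-in period in years

life_new = design_life(0.90, alpha, beta)
life_after_wear_in = design_life(0.90, alpha, beta, wear_in=one_month)
print(f"design life without wear-in = {life_new:.2f} years")
print(f"design life after one month = {life_after_wear_in:.2f} years")
```

Under these assumptions the design life is about 2.0 years without wear-in and about 2.8 years after one month of wear-in. The improvement comes from the decreasing failure rate (β < 1): units that survive the first month are less likely to fail afterwards.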

Proportional Hazard Model (PHM)

The reliability of equipment or a system is greatly influenced by its operating conditions, called covariates.

The proportional hazard model (PHM) was proposed by Cox (1972).

All reliability models use failure time to model reliability.

With PHM it is also possible to include the effect of operating conditions, such as type of failure, stress, etc., in the reliability function.

Proportional Hazard Model

The proportional hazard model is the most general of the regression models:

$$h(t, z) = h_0(t)\,h_z(t)$$

• h0(t): this part is the same for all individuals.
• hz(t): this part is a function of the individual covariate (z) values. It adjusts h0 up or down as a function of degradation. We generally use the parametric form $h_z(t) = e^{\beta z}$.

Z(t) is the covariate information (measurement) available at time t (which may also include all past information).

h0(t) is the baseline hazard; it is the hazard for the respective individual when all independent variable values are equal to zero.

The failure rate of a system is affected by its operation time and also by the covariates. Ex: a unit may have been tested under a combination of different accelerated stresses such as humidity, temperature, voltage, etc.

$$h(t, z_1, z_2, \ldots, z_m) = h_0(t)\exp(b_1 z_1 + \cdots + b_m z_m)$$

$$\log\left[\frac{h(t, z_1, z_2, \ldots, z_m)}{h_0(t)}\right] = b_1 z_1 + \cdots + b_m z_m$$
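A minimal numerical sketch of the model follows, assuming the form h(t, z) = h0(t) exp(b1 z1 + ... + bm zm). The Weibull baseline, the two covariates and the regression coefficients are all hypothetical values chosen for illustration; in practice the coefficients would be estimated from data.

```python
import math


def weibull_baseline_hazard(t, alpha=100.0, beta=2.0):
    """Hypothetical baseline hazard h0(t) with Weibull form."""
    return (beta / alpha) * (t / alpha) ** (beta - 1.0)


def proportional_hazard(t, covariates, coefficients, baseline):
    """h(t, z) = h0(t) * exp(sum of b_j * z_j)."""
    linear_term = sum(b * z for b, z in zip(coefficients, covariates))
    return baseline(t) * math.exp(linear_term)


# Hypothetical covariates: temperature rise (deg C) and vibration level (mm/s).
coefficients = [0.02, 0.5]
covariates = [15.0, 1.2]

for t in (50.0, 500.0):
    h = proportional_hazard(t, covariates, coefficients, weibull_baseline_hazard)
    ratio = h / weibull_baseline_hazard(t)
    print(f"t = {t:5.0f}: h(t, z) = {h:.5f}, hazard ratio = {ratio:.3f}")
```

The hazard ratio is the same at both times because the covariates scale the baseline multiplicatively, which is the proportionality assumption of the model.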

Proportional Hazards Model: Two Parametric Functional Forms

$$h(t, z) = h_0(t)\,h_z(t)$$

• Exponential baseline: $h(t, z) = \lambda\,e^{\beta z}$
• Weibull baseline: $h(t, z) = \frac{\kappa}{\alpha}\left(\frac{t}{\alpha}\right)^{\kappa-1} e^{\beta z}$, where κ and α denote the Weibull shape and scale (β is used here for the regression coefficient).

The baseline hazard h0(t) plays the role of the intercept, but it changes with time, whereas in multiple regression the intercept remains constant.

We call h(t, z)/h0(t) the hazard ratio.

Regression coefficients are estimated by maximizing the partial likelihood.

The baseline hazard function is not fitted to a specific model and has a non-parametric form.

It represents the hazard rate of a system when all the covariates are equal to zero.

The model assumes that the covariates act multiplicatively on the hazard function, so that for different values of the explanatory variables the hazard functions at each time are proportional to each other.

Key merits of PHM

• Used to investigate the effects of various explanatory variables on the hazard of assets/individuals

• It is distribution free, thus it does not have to assume a specific form for the baseline hazard function

• Regression coefficients are estimated using partial likelihood without the need of specifying the baseline hazard function

• This model is available for both static and dynamic explanatory variables (more realistic and reasonable assumption)

• This model handles truncated data, non-truncated data, and tied values

• Many goodness-of-fit tests and graphical methods are available for this model

Key limitations of PHM

• A vulnerable approach when covariates are deleted or the precision of covariate measurements is changed.

• Mixing different types of covariates in one model may cause some problems

• An asset/individual life is assumed to be terminated at the first failure time. In other words, this model depends only on the time elapsed between the starting event (e.g. diagnosis) and the terminal event (e.g. failure), and not on the chronological time t.

• The influence of a covariate in PHM is assumed to be time-independent. Due to the proportionality assumption, a common baseline hazard is assumed for all assets/individuals, even in cases where they should be stratified according to baseline.

• Any Questions??

Conclusion

Engineering components experience a variety of environmental conditions and stresses while in service. We need to anticipate failure through suitable use of modeling techniques.

Though mathematical models attempt to capture all the degradation and failure mechanisms, they are not an end in themselves.
