DATA DRIVEN APPROACH FOR RESIDUAL LIFE PREDICTION
Dr. Bijay Kumar Rout
Assistant Professor (Mechanical Engg. Dept)
Birla Institute of Technology and Science, Pilani
Two Day Workshop on RLA of Thermal Power Plants
Presentation outline
• Introduction
• Commonly used terms in the field of Reliability & Maintainability Engg.
• Prediction of reliability and failure rate
• Discussion on Weibull failure (modeling and estimation)
• Failure due to load and capacity interaction
• Concept of residual useful life and mean residual life
• Proportional Hazard Modeling
• Conclusion
OBJECTIVE
• To discuss prognostics strategies based on assessing the degeneration/degradation of components.
• To use reliability and hazard rate models to determine the Residual Useful Life (RUL), which gives the organization an economic advantage and a safe working environment.
[Figure: Associated cost with time of replacement. Horizontal axis: time, over which the plant health condition moves from Functioning through Degrading to Failing.]
• Replacement too early (moderate cost): prescriptive replacement of a functioning "good" item.
• Optimum (lowest cost): replace the item with maximum usage before failure.
• Replacement too late (highest cost): loss of life and/or system due to catastrophic failure.
Data-driven methodology
Data-driven approaches can be divided into two categories:
• Artificial intelligence (AI) techniques (neural networks, fuzzy systems, decision trees, etc.).
• Statistical techniques (multivariate statistical methods (PCA and others), PHM, linear and quadratic discriminators, partial least squares, HMM, etc.).
“The function of maintenance in a world-class operating environment is not to simply maintain, but to provide reliable systems and to extend the life of systems
at optimum costs.” Stanley Lasday, June 1997, Industrial Heating
Evolving Maintenance Strategies
[Figure: system availability (vertical axis) versus year (horizontal axis: 1930, 1950, 1990, 2000), showing the successive maintenance strategies: Run to failure; Inspection; Preventive Maintenance; RCM; Computerized Maintenance Management Systems (CMMS); TPM; Condition-Based Maintenance (CBM); Prognostics & Health Management (PHM).]
1. Modeling of stress and damage in electronic parts, structures and equipment, utilizing exposure conditions (e.g., usage, temperature, vibration, radiation) to compute accumulated damage.
2. Prediction of the expected life of the plant and its major parts.
3. Prediction of the availability of the plant.
4. Prediction of the expected maintenance load.
5. Prediction of the support system resources needed for effective operation.
Commonly used terms
MTBF: Mean Time Between Failures. When applied to repairable products, this is the average time that a system will operate until the next failure.
Failure Rate (λ): The number of failures per unit of stress. The stress can be expressed in various units (for example operating hours or cycles). For a constant failure rate, λ = 1/MTBF.
MTTF or MTFF: The mean time to failure or mean time to first failure. This is the measure applied to systems that cannot be repaired during their mission.
MTTR: Mean time to repair. This is the average elapsed time between a unit failing and its being repaired and returned to service.
Availability: The proportion of time a system is operable. This is only relevant for systems that can be repaired and is given by: Availability = MTBF/(MTBF + MTTR)
B10 and B50 Life: The life values by which 10% and 50% of the population, respectively, have failed. The B50 life is also called the median life.
Fault Tree Analysis (FTA): Fault trees are diagrams used to trace symptoms to their root causes. Fault Tree Analysis is the term used to describe the process involved in constructing a fault tree.
Derating: Assigning a product to an application whose stress level is lower than the rated stress level for the product. This is analogous to providing a safety factor.
Censored Test: A life test where some units are removed before the end of the test period, even though they have not failed.
Maintainability: A measure of the ability to place a system that has failed back into service. Figures of merit for maintainability include availability and MTTR.
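The failure rate and availability definitions above can be illustrated with a minimal Python sketch. The MTBF and MTTR values (950 h and 50 h) and the function names are hypothetical, chosen only for illustration; they are not taken from the workshop material.

```python
def failure_rate(mtbf_hours):
    """Constant failure rate, lambda = 1 / MTBF (failures per hour)."""
    return 1.0 / mtbf_hours


def availability(mtbf_hours, mttr_hours):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical values, for illustration only.
mtbf = 950.0   # mean time between failures, hours
mttr = 50.0    # mean time to repair, hours

lam = failure_rate(mtbf)
avail = availability(mtbf, mttr)
print(f"failure rate = {lam:.6f} per hour")
print(f"availability = {avail:.3f}")   # 950 / (950 + 50) = 0.950
```

Shortening the repair time raises availability even when the MTBF is unchanged, which is why MTTR is listed as a figure of merit for maintainability.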
Different Views about Failure
[Figure: timeline of a developing oil leak, showing the point at which different people regard the equipment as failed.]
TIME →
• LEAK STARTS
• POOL OF OIL: "FAILED" SAYS SAFETY OFFICER
• LEAK DETERIORATES
• HIGH OIL CONSUMPTION: "FAILED" SAYS ENGINEER
• EQUIPMENT STOPS WORKING: "FAILED" SAYS PRODUCTION ENGINEER
Mechanism of Failure
Distortion:
• Buckling
• Creep
• Yielding
• Warping
• Thermal relaxation
• Elastic deformation
• Brinelling
Fatigue and Fracture:
• Ductile fracture
• Brittle fracture
• Fatigue fracture
• High cycle fatigue
• Low cycle fatigue
• Residual stress
• Torsional fatigue
Corrosion:
• Corrosion fatigue
• Stress corrosion
• Galvanic corrosion
• Biological corrosion
• Chemical attack
Wear:
• Abrasive wear
• Adhesive wear
• Cavitation
• Fretting wear
• Scoring
• Surface-origin fatigue (pitting)
Reliability Modeling
Construction of a model (life distribution) that represents the times-to-failure (TTF) of the entire system, based on the life distributions of the components, subassemblies and/or assemblies ("black boxes") of which it is composed.
Reliability Measures
• Cumulative failure function (the probability of failure by time t):
$$F(t) = P(0 \le T \le t) = \int_0^t f(x)\,dx$$
• Mean-value function μ(t) (the expected number of failures experienced by time t according to the model). For a single time to failure T with density f(t), the corresponding mean is
$$E[T] = \int_0^\infty t\, f(t)\,dt$$
• Reliability function (the probability that the time to failure is larger than t):
$$R(t) = P(T > t) = 1 - F(t) = \int_t^\infty f(x)\,dx$$
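Empirical Reliability Measures
• n(t) = number of survivors (units still alive or functioning adequately) at time t; N = n(0) is the number of units at the start of the test.
• Empirical hazard rate and failure density over the interval $t_i \le t < t_{i+1}$, with $\Delta t_i = t_{i+1} - t_i$:
$$h(t) = \frac{n(t_i) - n(t_{i+1})}{n(t_i)\,\Delta t_i}$$
$$f(t) = \frac{n(t_i) - n(t_{i+1})}{N\,\Delta t_i}$$
• Empirical failure CDF at the i-th ordered failure time, for a sample of n units:
$$\hat{F}(t_i) = \frac{i}{n+1}$$
• Empirical reliability function:
$$\hat{R}(t_i) = 1 - \hat{F}(t_i) = 1 - \frac{i}{n+1} = \frac{n+1-i}{n+1}$$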
Empirical Reliability Measures (example)
Suppose the failure times of 10 hypothetical components on life test are {5, 10, 17.5, 30, 40, 55, 67.5, 82.5, 100, 117.5}. Determine f(t) and h(t) from the data.
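One way to work this example is sketched below in Python. It applies the interval definitions of the empirical density and hazard to the ten failure times, taking the test to start at t = 0 and assuming a complete sample with distinct failure times (so exactly one unit fails at the end of each interval). The function name is hypothetical.

```python
def empirical_f_and_h(failure_times):
    """Return one row (t_start, t_end, f, h) per interval between failures.

    Assumes a complete (uncensored) sample with distinct failure times,
    so exactly one unit fails at the end of each interval.
    """
    times = sorted(failure_times)
    n_total = len(times)              # N = n(0), units on test at t = 0
    rows = []
    t_start = 0.0
    for i, t_end in enumerate(times):
        survivors = n_total - i       # n(t_i): units working at t_start
        dt = t_end - t_start          # interval width
        f = 1.0 / (n_total * dt)      # one failure / (N * dt)
        h = 1.0 / (survivors * dt)    # one failure / (n(t_i) * dt)
        rows.append((t_start, t_end, f, h))
        t_start = t_end
    return rows


data = [5, 10, 17.5, 30, 40, 55, 67.5, 82.5, 100, 117.5]
rows = empirical_f_and_h(data)
for t0, t1, f, h in rows:
    print(f"{t0:6.1f} - {t1:6.1f}   f = {f:.4f}   h = {h:.4f}")
```

In the first interval f and h coincide (both 1/(10 × 5) = 0.02), because all units are still surviving; afterwards h exceeds f, since it divides by the shrinking number of survivors rather than by N.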
Reliability Measures (continued)
• Hazard rate function: the conditional probability of failure per unit time in the interval (t, t + Δt], given that no failure has occurred before t:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t} = \frac{f(t)}{R(t)}$$
• Mean Time To Failure (MTTF): the expected time during which the system will function successfully without maintenance or repair:
$$\mathrm{MTTF} = \int_0^\infty R(t)\,dt$$
• Failure intensity function: the instantaneous rate of change of the expected number of failures with respect to time:
$$\lambda(t) = \frac{d\mu(t)}{dt}$$
• Cumulative hazard function:
$$H(t) = \int_0^t h(\tau)\,d\tau = -\ln R(t)$$
Reliability Measures (figures)
[Figure: Probability of failing at a given age. Failure probability versus time (t), with infant mortality and wear-out regions marked, comparing the curve derived from the bathtub curve with the curve from a constant failure rate (exponential).]
[Figure: Cumulative failure probability F(t), from 0 to 1, versus time (t), with infant mortality and wear-out regions marked, comparing the curve derived from the bathtub curve with the curve from a constant failure rate (1 - exponential). Eventually everything fails.]
Exponential Distribution Function
• The simplest distribution function, the exponential, is characterized by a constant failure rate λ over the lifetime of the device.
• This is useful for representing a device in which all early failure mechanisms have been eliminated.
  - $h(t) = \lambda$
  - $R(t) = \exp(-\lambda t)$
  - $F(t) = 1 - \exp(-\lambda t)$
  - $f(t) = \lambda \exp(-\lambda t)$
  - $\mathrm{MTTF} = \int_0^\infty t\,\lambda \exp(-\lambda t)\,dt = 1/\lambda$
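Weibull Distribution Function
• h(t) varies as a power of the age of the device, where α (scale), β (shape) and γ (location) are constants.
• For β < 1 the failure rate decreases with time and can be used to represent early failure.
• For β = 1, h(t) is constant and can be used to represent steady state; the constant failure rate is a special case of the Weibull distribution.
• For β > 1, h(t) increases with time and can be used to represent the wear-out condition.
$$f(t) = \frac{\beta}{\alpha}\left(\frac{t-\gamma}{\alpha}\right)^{\beta-1} \exp\left[-\left(\frac{t-\gamma}{\alpha}\right)^{\beta}\right], \quad t \ge \gamma$$
$$R(t) = \exp\left[-\left(\frac{t-\gamma}{\alpha}\right)^{\beta}\right]$$
$$h(t) = \frac{f(t)}{R(t)} = \frac{\beta}{\alpha}\left(\frac{t-\gamma}{\alpha}\right)^{\beta-1}$$
$$\mathrm{MTTF} = \gamma + \alpha\,\Gamma\left(1 + \frac{1}{\beta}\right)$$
With γ = 0 these reduce to the two-parameter Weibull form.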
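The effect of the shape parameter on the hazard rate can be checked with a short Python sketch of the two-parameter Weibull functions (location parameter taken as zero). The scale value α = 100 and the two evaluation times are hypothetical, chosen only to show the three regimes.

```python
import math


def weibull_reliability(t, alpha, beta):
    """R(t) = exp(-(t/alpha)**beta)."""
    return math.exp(-((t / alpha) ** beta))


def weibull_pdf(t, alpha, beta):
    """f(t) = (beta/alpha) * (t/alpha)**(beta-1) * R(t)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1) * weibull_reliability(t, alpha, beta)


def weibull_hazard(t, alpha, beta):
    """h(t) = f(t)/R(t) = (beta/alpha) * (t/alpha)**(beta-1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1)


def weibull_mttf(alpha, beta):
    """MTTF = alpha * Gamma(1 + 1/beta)."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


alpha = 100.0                      # hypothetical scale parameter
for beta in (0.5, 1.0, 3.0):       # early failure, constant rate, wear-out
    h_early = weibull_hazard(10.0, alpha, beta)
    h_late = weibull_hazard(200.0, alpha, beta)
    print(f"beta = {beta}: h(10) = {h_early:.5f}, h(200) = {h_late:.5f}, "
          f"MTTF = {weibull_mttf(alpha, beta):.1f}")
```

For β = 1 the hazard is the same at both times and the MTTF equals α, which is the exponential case with λ = 1/α.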
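Weibull Distribution Function
[Figure: Weibull cumulative failure probability plotted on Weibull graph paper. Vertical axis: F from .01 to .99, on a scale based on log log 1/(1-F); horizontal axis: time (t) on a log scale. Straight lines are shown for β < 1, β = 1 and β > 1; the lines cross the F = .63 level at t = α.]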
Weibull Interpretation
• β < 1: implies infant mortality.
• β = 1: implies failures are random; an old part is as good as a new part.
• β > 1: implies wear-out. Occurs for: low cycle fatigue; most bearing and gear failures; corrosion or erosion.
• β ≫ 1: implies rapid wear-out in old age. Occurs for: wear-through.
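Weibull Distribution Function
• Given a certain form for the cumulative failure (two-parameter Weibull):
$$1 - F(t) = \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right]$$
• we can rearrange and take natural logarithms twice to get:
$$\ln \ln\left[\frac{1}{1 - F(t)}\right] = \beta \ln t - \beta \ln \alpha$$
• If we plot ln ln {1/[1-F(t)]} vs ln t, the result is a straight line with slope β. Special graph paper exists that does these transformations.
• The cumulative hazard function is
$$H(t) = \left(\frac{t}{\alpha}\right)^{\beta}$$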
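The straight-line relation above suggests a simple graphical estimate of the parameters, sketched here in Python: fit a least-squares line to the points (ln t, ln ln 1/(1 - F)), read β from the slope and α from the intercept. The sketch assumes a complete sample, uses the i/(n+1) estimate of F at the i-th ordered failure time, and reuses the ten hypothetical failure times of the earlier empirical example; the function name is hypothetical.

```python
import math


def weibull_plot_fit(failure_times):
    """Least-squares line on Weibull-plot coordinates.

    x = ln t, y = ln ln(1 / (1 - F)), with F estimated as i / (n + 1).
    Slope = beta, intercept = -beta * ln(alpha).
    """
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        F = i / (n + 1)                               # estimate of F(t_i)
        xs.append(math.log(t))
        ys.append(math.log(math.log(1.0 / (1.0 - F))))
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                                  # slope of the line
    intercept = y_bar - beta * x_bar
    alpha = math.exp(-intercept / beta)               # from -beta * ln(alpha)
    return alpha, beta


data = [5, 10, 17.5, 30, 40, 55, 67.5, 82.5, 100, 117.5]
alpha_hat, beta_hat = weibull_plot_fit(data)
print(f"alpha = {alpha_hat:.2f}, beta = {beta_hat:.3f}")
```

This regression estimate generally differs somewhat from a maximum likelihood estimate on the same data, because it weights the plotted points rather than the likelihood of the observations.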
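Weibull Plot
MATLAB SOFTWARE COMMAND
>> X = [72, 82, 97, 103, 113, 117, 126, 127, 127, 139, 154, 159, 199, 207];
>> wblplot(X)
>> parmhat = wblfit(X)
parmhat =
  144.2727    3.6437
Here parmhat(1) = 144.2727 is the scale parameter α and parmhat(2) = 3.6437 is the shape parameter β.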
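For readers without MATLAB, the following Python sketch estimates the same two Weibull parameters by maximum likelihood for a complete (uncensored) sample, solving the shape equation by bisection. It is an independent illustration using only the standard library, not the algorithm used inside wblfit; the function name, the bracketing interval and the tolerance are assumptions.

```python
import math


def weibull_mle(times, lo=0.01, hi=50.0, tol=1e-10):
    """Maximum likelihood fit of a two-parameter Weibull, complete data.

    Solves  sum(t^b ln t)/sum(t^b) - 1/b - mean(ln t) = 0  for the shape b
    by bisection, then alpha = (sum(t^b)/n)^(1/b).
    """
    n = len(times)
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / n
    t_max = max(times)                 # scaling avoids overflow in t**b

    def weights(b):
        return [(t / t_max) ** b for t in times]

    def score(b):
        w = weights(b)
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / b - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:           # score increases with b
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    alpha = t_max * (sum(weights(beta)) / n) ** (1.0 / beta)
    return alpha, beta


X = [72, 82, 97, 103, 113, 117, 126, 127, 127, 139, 154, 159, 199, 207]
alpha_hat, beta_hat = weibull_mle(X)
print(f"alpha (scale) = {alpha_hat:.4f}, beta (shape) = {beta_hat:.4f}")
```

The printed estimates can be compared with the parmhat values quoted from MATLAB for the same data.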
Bathtub Curve
[Figure: hazard rate h(t) versus time (t), showing three regions: infant mortality, constant failure rate, and wear-out.]
Equipment Failure Profile
The burn-in phase (also known as infant mortality, break-in or debugging): During this phase the hazard rate decreases and failures occur due to causes such as:
• Incorrect use procedures
• Poor test specifications
• Incomplete final test
• Poor quality control
• Over-stressed parts
• Wrong handling or packaging
• Inadequate materials
• Incorrect installation or setup
• Poor technical representative training
• Marginal parts
• Poor manufacturing processes or tooling
• Power surges
The useful life phase: During this phase the hazard rate is constant and the failures occur randomly or unpredictably. Some of the causes of failure include:
• Insufficient design margins
• Incorrect use environments
• Undetectable defects
• Human error and abuse
• Unavoidable failures
The wear-out phase: During this phase the hazard rate increases. Some of the causes of failure include:
• Wear due to aging, degradation in strength
• Inadequate or improper preventive maintenance
• Limited-life components
• Wear-out due to friction, misalignments, corrosion and creep
• Materials fatigue
• Incorrect overhaul practices
Remedies
• Early failures: quality manufacture / robust design
• Wear-out failures: physically-based models, preventive maintenance, robust design (FMEA)
• Chance failures: tight customer linkages, testing, HAST
Failures vs time as a function of Stress/Load
[Figure: hazard rate h(t) versus time, with separate curves for high stress, medium stress and low stress.]
Failure due to Load & Capacity variations
[Figure: probability of occurrence versus applied or failure stress, showing the load and strength distributions for new units, after infant mortality and after wear-out. The region where the load and strength distributions overlap is the failure region.]
Ex: Both strength and stress are normally distributed, with (μ, σ²) of (50,000, 5,000²) for strength and (30,000, 3,000²) for stress, so that the safety margin (strength minus stress) has (μ, σ²) = (20,000, 5,831²), since 5,000² + 3,000² ≈ 5,831². A critical failure occurs when the difference < 0 (that is, stress exceeds strength).
[Figure: normal distribution of the difference, with axis ticks at μ - 3σ to μ + 3σ (2,507; 8,338; 14,169; 20,000; 25,831; 31,662; 37,493) and the failure boundary at 0.]
P(critical failure) = P(difference < 0) = P[(difference - μ)/σ < (0 - 20,000)/5,831] = P(Z < -3.43) ≈ 0.0003, so that reliability ≈ 0.9997 (99.97%).
NOTE: "reliability" = "safety"
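The load and strength calculation can be reproduced numerically without a normal table. The Python sketch below assumes, as the variance of the difference implies, that strength and stress are independent; the variable and function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


mu_strength, sd_strength = 50_000.0, 5_000.0
mu_stress, sd_stress = 30_000.0, 3_000.0

# Difference = strength - stress; variances add for independent variables.
mu_diff = mu_strength - mu_stress
sd_diff = math.sqrt(sd_strength ** 2 + sd_stress ** 2)

z = (0.0 - mu_diff) / sd_diff        # standardized failure boundary
p_fail = normal_cdf(z)               # P(difference < 0)
reliability = 1.0 - p_fail

print(f"sd of difference = {sd_diff:.0f}")
print(f"z = {z:.2f}")
print(f"P(critical failure) = {p_fail:.6f}")
print(f"reliability = {reliability:.6f}")
```

Because the failure probability sits far in the tail, small changes in either standard deviation move it noticeably, so the result is only as good as the estimates of the two spreads.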
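Repairable and Non-Repairable Systems
• Non-repairable
  - Only need to track first failure
• Repairable
  - Track Mean Time Between Failures (MTBF)
  - Time to repair
  - Preventive maintenance schedule with interval T0: the smaller the time period, the more notable the improvement
Reliability under preventive maintenance at time t = jT0 + x (j completed maintenance intervals, 0 ≤ x < T0):
$$R_M(jT_0 + x) = [R(T_0)]^{j}\, R(x)$$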
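The effect of the preventive maintenance interval can be sketched in Python as follows. The sketch assumes that each maintenance action restores the unit to as-good-as-new and that the unit follows a two-parameter Weibull law with hypothetical parameters α = 1000 h and β = 2 (a wear-out case); none of these values come from the workshop material.

```python
import math


def weibull_reliability(t, alpha, beta):
    """R(t) = exp(-(t/alpha)**beta) for a two-parameter Weibull."""
    return math.exp(-((t / alpha) ** beta))


def reliability_with_pm(t, pm_interval, alpha, beta):
    """R_M(t) = [R(T0)]**j * R(x), where t = j*T0 + x and 0 <= x < T0."""
    j = int(t // pm_interval)          # completed maintenance intervals
    x = t - j * pm_interval            # time since the last maintenance
    return (weibull_reliability(pm_interval, alpha, beta) ** j
            * weibull_reliability(x, alpha, beta))


alpha, beta = 1000.0, 2.0              # hypothetical wear-out behaviour
t = 1500.0                             # mission time, hours

r_none = weibull_reliability(t, alpha, beta)
r_500 = reliability_with_pm(t, 500.0, alpha, beta)
r_250 = reliability_with_pm(t, 250.0, alpha, beta)

print(f"no maintenance   : {r_none:.4f}")
print(f"maintain at 500 h: {r_500:.4f}")
print(f"maintain at 250 h: {r_250:.4f}")
```

With an increasing hazard rate (β > 1) the shorter interval gives the higher reliability; for β = 1 the three values would coincide, and for β < 1 this kind of maintenance would reduce reliability.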
RUL and Mean RUL
To provide a sufficiently distant planning horizon, the remaining useful operating life of machinery is estimated.
For equipment with accumulated operating time t, the probability of surviving an additional time x is
$$R^{*}(x) = P(X_t > x) = P(T - t > x \mid T > t) = \frac{R(t + x)}{R(t)}$$
If a failure is to be prevented, the system should be stopped safely before the MRL is reached.
From the reliability model based on the degradation process, the mean remaining useful life can then be estimated:
$$MRL(t) = E(X_t) = E(T - t \mid T > t) = \frac{1}{R(t)} \int_t^\infty R(\tau)\,d\tau$$
At t = 0, R(0) = 1 and MRL(0) = MTTF.
Ex: A device has a decreasing failure rate characterized by a two-parameter Weibull distribution with α = 180 years and β = 1/2. The device is required to have a design life reliability of 0.90. What is the design life if the device is first subjected to a wear-in period of one month?
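A possible way to work this example is sketched below in Python. It sets the conditional reliability R(T0 + x)/R(T0) equal to the target value and solves for the additional life x, which has a closed form for the two-parameter Weibull. The function names are hypothetical, and the wear-in period is taken as 1/12 year.

```python
import math


def weibull_reliability(t, alpha, beta):
    """R(t) = exp(-(t/alpha)**beta)."""
    return math.exp(-((t / alpha) ** beta))


def conditional_reliability(x, t, alpha, beta):
    """R*(x) = R(t + x) / R(t): survive x more, given survival to t."""
    return weibull_reliability(t + x, alpha, beta) / weibull_reliability(t, alpha, beta)


def design_life(target, wear_in, alpha, beta):
    """Solve R(wear_in + x) / R(wear_in) = target for x.

    From ((wear_in + x)/alpha)**beta = -ln(target) + (wear_in/alpha)**beta.
    """
    exponent = -math.log(target) + (wear_in / alpha) ** beta
    return alpha * exponent ** (1.0 / beta) - wear_in


alpha, beta = 180.0, 0.5               # years, shape (from the example)
target = 0.90

life_new = design_life(target, 0.0, alpha, beta)
life_worn_in = design_life(target, 1.0 / 12.0, alpha, beta)

print(f"design life without wear-in    : {life_new:.2f} years")
print(f"design life after 1 month wear-in: {life_worn_in:.2f} years")
```

The design life comes out at roughly 2.0 years for a new device and roughly 2.8 years after the one-month wear-in, which is the expected direction of change for a decreasing failure rate (β < 1).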
Proportional Hazard Model (PHM)
The reliability of an equipment or system is greatly influenced by the operating conditions called covariates.
The proportional hazard model (PHM) was proposed by Cox (1972).
Conventional reliability models consider only the failure time to model reliability.
It is possible to include the effect of operating conditions like type of failure, stress etc. in the reliability function.
Proportional Hazard Model
The proportional hazard model is the most general of the regression models:
$$h(t, z) = h_0(t)\, h_z(t)$$
• $h_0(t)$: this part is the same for all individuals.
• $h_z(t)$: this part is a function of the individual covariate values. It adjusts $h_0$ up or down as a function of degradation.
We generally use the parametric form $h_z(t) = e^{bz}$, where b is a regression coefficient.
Z(t) is the covariate information (measurement) available at time t (which may also include all past information).
$h_0(t)$ is the baseline hazard; it is the hazard for the respective individual when all independent variable values are equal to zero.
The failure rate of a system is affected by its operation time and also by the covariates. Ex: a unit may have been tested under a combination of different accelerated stresses such as humidity, temperature, voltage, etc.
$$h(t, z_1, z_2, \ldots, z_m) = h_0(t)\, \exp(b_1 z_1 + \cdots + b_m z_m)$$
$$\log\left[\frac{h(t, z_1, z_2, \ldots, z_m)}{h_0(t)}\right] = b_1 z_1 + \cdots + b_m z_m$$
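The multiplicative action of the covariates can be illustrated with a short Python sketch. It assumes a two-parameter Weibull baseline hazard; the baseline parameters, the two covariates (a temperature rise and a vibration level) and their coefficients are all hypothetical values chosen for illustration.

```python
import math


def baseline_hazard(t, alpha, beta):
    """Weibull baseline hazard h0(t) = (beta/alpha) * (t/alpha)**(beta-1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1)


def ph_hazard(t, z, b, alpha, beta):
    """h(t, z) = h0(t) * exp(b1*z1 + ... + bm*zm)."""
    linear = sum(bi * zi for bi, zi in zip(b, z))
    return baseline_hazard(t, alpha, beta) * math.exp(linear)


# Hypothetical baseline and covariates, for illustration only.
alpha, beta = 1000.0, 2.0
b = [0.02, 0.5]        # coefficients: per deg C of temperature rise, per mm/s
z = [10.0, 1.2]        # measured temperature rise (deg C), vibration (mm/s)

for t in (100.0, 900.0):
    ratio = ph_hazard(t, z, b, alpha, beta) / baseline_hazard(t, alpha, beta)
    print(f"t = {t:5.0f} h: h = {ph_hazard(t, z, b, alpha, beta):.6f}, "
          f"hazard ratio = {ratio:.4f}")
```

The hazard ratio is exp(0.02 × 10 + 0.5 × 1.2) = exp(0.8) at both times: the covariates shift the whole hazard curve by a constant factor, which is the proportionality assumption of the model.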
Proportional Hazards Model
Two parametric functional forms of $h(t, z) = h_0(t)\, h_z(t)$, with regression coefficient b:
• Exponential baseline: $h(t, z) = \lambda\, e^{bz}$
• Weibull baseline: $h(t, z) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1} e^{bz}$
The baseline hazard $h_0(t)$ plays the role of the intercept in multiple regression, except that it changes with time, whereas in multiple regression the intercept remains constant.
We call $h(t, z)/h_0(t)$ the hazard ratio.
Regression coefficients are estimated by maximizing the partial likelihood.
In the general (semi-parametric) model, the baseline hazard function is not fitted to a specific model and has a non-parametric form.
It represents the hazard rate of a system when all the covariates are equal to zero.
The model assumes that the covariates act multiplicatively on the hazard function, so that for different values of the explanatory variables the hazard functions at each time are proportional to each other.
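The partial likelihood that is maximized to estimate the regression coefficients can be written down compactly. The Python sketch below evaluates the log partial likelihood for a single covariate, assuming no tied failure times, and locates its maximum with a coarse grid search rather than a proper optimizer. The six observations (times, failure/censoring flags and covariate values) are invented for illustration.

```python
import math


def cox_partial_log_likelihood(b, times, events, z):
    """Log partial likelihood for one covariate, no tied failure times.

    times  : observed time for each unit
    events : 1 if the unit failed at that time, 0 if it was censored
    z      : covariate value for each unit
    Each failure contributes b*z_i - ln(sum of exp(b*z_j) over units
    still at risk at that failure time); the baseline hazard cancels out.
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue                       # censored units add no term
        risk_sum = sum(math.exp(b * z[j])
                       for j, t_j in enumerate(times) if t_j >= t_i)
        total += b * z[i] - math.log(risk_sum)
    return total


# Invented data, for illustration only.
times = [120.0, 200.0, 260.0, 310.0, 400.0, 450.0]
events = [1, 1, 0, 1, 1, 0]
z = [1.8, 0.6, 1.1, 1.5, 0.4, 0.2]

grid = [k / 100.0 for k in range(-300, 301)]   # candidate b from -3 to 3
b_hat = max(grid, key=lambda b: cox_partial_log_likelihood(b, times, events, z))

print(f"log partial likelihood at b = 0: "
      f"{cox_partial_log_likelihood(0.0, times, events, z):.4f}")
print(f"grid estimate of b: {b_hat:.2f}")
```

Note that the baseline hazard never appears in the function, which is why the coefficients can be estimated without specifying it. Statistical packages handle ties, several covariates and standard errors, which this sketch does not.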
Key merits of PHM
• Used to investigate the effects of various explanatory variables on the hazard of assets/individuals
• It is distribution free, thus it does not have to assume a specific form for the baseline hazard function
• Regression coefficients are estimated using partial likelihood without the need to specify the baseline hazard function
• This model can be used with both static and dynamic explanatory variables (a more realistic and reasonable assumption)
• This model handles truncated and non-truncated data, and tied values
• Many goodness-of-fit tests and graphical methods are available for this model
Key limitations of PHM
• A vulnerable approach when covariates are deleted or the precision of covariate measurements is changed.
• Mixing different types of covariates in one model may cause some problems
• The life of an asset/individual is assumed to end at the first failure time. In other words, this model depends only on the time elapsed between the starting event (e.g. diagnosis) and the terminal event (e.g. failure), and not on the chronological time t.
• The influence of a covariate in PHM is assumed to be time-independent. Because of the proportionality assumption, a common baseline hazard is assumed for all assets/individuals, even in cases where they should be stratified according to baseline.
Conclusion
Engineering components experience a variety of environmental conditions and stresses while in service. We need to anticipate failure through suitable use of modeling techniques.
Though mathematical models attempt to capture all the degradation and failure mechanisms, they are not an end in themselves.
Any Questions?