MODELING AND OPTIMIZATION OF MAINTENANCE SYSTEMS
Xiaoyue Jiang
A thesis submitted in conformity with the requirements
for the degree of doctor of philosophy
Graduate Department of Mechanical and Industrial Engineering
University of Toronto
© Copyright by Xiaoyue Jiang (2001)
National Library of Canada, Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library
of Canada to reproduce, loan, distribute or sell copies of this thesis in
microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the
thesis nor substantial extracts from it may be printed or otherwise
reproduced without the author's permission.
To my parents
MODELING AND OPTIMIZATION OF MAINTENANCE SYSTEMS
Xiaoyue Jiang (Ph.D. 2001)
Department of Mechanical and Industrial Engineering, University of Toronto
Abstract
This thesis focuses on modeling and optimization of maintenance systems.
Although the terminology we use comes from the manufacturing industry,
the results also have potential applications in IT sectors, such as software
reliability engineering and communication network management, to name a few.
The basic problem we address is how to arrange preventive replacement
optimally based on the available information about the system's health
condition. Instead of emphasizing concrete models, which are extremely
rich and diverse, we focus on the fundamental methodologies in order to
grasp the essence of this subject. In Chapters 2 to 6, we propose five
models, which can be roughly classified into two categories: age-based
models (Chapters 2, 3 and 4) and condition-based models (Chapters 5 and 6).
While each of the models is of practical interest in its own right, it also
serves as a vehicle to convey the methodologies we integrated from the
literature or developed in this thesis.
We solve these models in a fairly unified manner. The unified methodology is
further summarized in Chapter 7 in terms of a common modeling framework
and the associated optimization procedure. We expect that this framework
will be valuable for a wide range of applications.
Acknowledgements
I wish to thank my thesis supervisors, Professor Viliam Makis and Professor
Andrew K.S. Jardine, for their technical insights and guidance during the
course of my thesis. Their generous support and encouragement at critical
times were much appreciated.
Working on the thesis would not have been the same without the community and
support of the CBM Lab researchers and students. Their suggestions,
inspiration, and friendship over the years have made a big difference to me.
In particular, I wish to mention 4111 Yang, Dr. Daming Lin, Dr. Dragan
Banjevic, Walter Wei Hlia Mi, Jayne Beardsmore, Yimin Zhan, Babak Karimi and
Kevin Doyle.
I am deeply indebted to my parents for their loving support, and to my
talented brother Zhongyie, who has given me more than he will ever know.
Finally, I wish to thank my wife Jun, who has been there with me every step
of the way.
Contents
1 INTRODUCTION
1.1 Overview
1.2 Core Methodologies
1.3 Thesis Outline
2 OPTIMALITY OF REPAIR-COST-LIMIT POLICIES
2.1 Introduction
2.2 Background and Model Description
2.3 Repair/Replacement Problem
2.4 Appendix
3 OPTIMAL PREVENTIVE REPLACEMENT UNDER MINIMAL REPAIR AND RANDOM REPAIR COST
3.1 Introduction
3.2 Problem Formulation
3.3 Optimal Policy
3.4 Computational Algorithm
3.5 Optimal Policy in the Discounted Cost Case
4 OPTIMAL MAINTENANCE POLICY FOR A GENERAL REPAIR MODEL
4.1 Introduction
4.2 Model Description and the Main Result
4.3 Problem Formulation and Analysis
4.4 Dynamic Programming Approach
4.5 Proof of the Theorem
4.6 Special Cases
4.7 Example
5 OPTIMALITY OF LEVEL-CROSSING POLICY FOR A CBM MODEL
5.1 Introduction
5.2 Problem Formulation and Existence of the Optimal Policy
5.3 Optimal Control-Limit Policy
5.4 Conclusions
6 A CBM FRAMEWORK BASED ON HIDDEN MARKOV MODELS
6.1 Introduction
6.2 Literature Survey
6.3 Model Description
6.4 Problem Reduction
6.5 Optimal Policy
7 SUMMARY AND FUTURE RESEARCH DIRECTIONS
7.1 Summary
7.2 Future Research Directions
Bibliography
List of Tables
4.1 Optimal control limits g(t) and T(t) for different values of t
5.1 Optimal control limits for different values of n
List of Figures
2.1 Optimal age replacement policy
3.1 Optimal repair cost limits
4.1 Sample path of the failure rate and the repair cost
4.2 Optimal policy for a minimal repair model
4.3 Optimal repair/replacement policy
6.1 Optimal values 1,
6.2 Optimal preventive replacement policy
Chapter 1
INTRODUCTION
1.1 Overview
Maintenance is now a significant activity in industrial practice. According
to Halasz et al. (1999), reporting on the 1996 costs of maintenance across 11
Canadian industry sectors, "in addition to every dollar spent on new
machinery, an additional 58 cents is spent on maintaining existing equipment.
This amounts to repair costs of approximately $15 billion per year". As a
consequence, the importance of maintenance optimization becomes obvious.
Essentially, the problem of maintenance optimization can be described as
follows. Consider a system that is prone to failure. Instead of running the
system to failure, one can arrange preventive replacement in high-risk
situations to avoid costly failure. Also, one may have the opportunity at the
failure epoch to decide whether to repair the system or to replace it by a
new one. The objective is to optimize the system performance based on a given
criterion, such as the average cost, discounted cost, or total net profit
criterion. The two fundamental questions are: when should a replacement be
carried out, and how well does the system perform?
To obtain concrete mathematical models, one needs to specify all the
components of the above conceptual model, such as the deterioration dynamics,
the cost structure, the information level, the available maintenance options,
etc. Moreover, proper interpretation of the model is required for real-life
applications. For instance, we may use "cost" to represent the economic
expenses or the duration of down time (which leads to availability analysis),
and use "age" to represent the operating time, the mileage, or the number of
takeoffs for an airplane, to adapt to the specific real situation. For models
with random "observations", the observations can be either raw data or
information preprocessed from the raw data. From this standpoint, we see that
the modeling aspect is inseparable from the optimization aspect of
maintenance research and practice.
As an academic subject, research on reliability and maintenance crosses
multiple disciplines, such as operations research, applied probability,
statistics, engineering and management science. Originating in the mid-1940s,
this field has been undergoing explosive growth. Hundreds of models and
policies have appeared in the literature annually in recent years, and they
are distributed among mathematical, engineering and management science
journals. According to a survey conducted by Jensen (1996) based on the MATH
DATABASE of STN, from 1972 to 1994 the number of publications with keyword
"Reliability" is $3521 and, in addition, 1909 papers have keywords
"Maintenance" or "Repair". These papers, which are related to reliability and
maintenance, account for about 0.8% of all mathematical publications. This
shows the importance of this field and, at the same time, the difficulty of
providing a complete overview of the subject.
Several extensive surveys can be found in the journal Naval Research
Logistics Quarterly, where Pierskalla and Voelker (1976) has 259 references,
Sherif and Smith (1981) has an extensive bibliography of 524 references, and
Valdez-Flores and Feldman (1989) has 129 references. Certainly, it is getting
harder and harder to grasp this huge and growing field. Attempting to
summarize this field with several universal optimization models is definitely
infeasible.
A more appropriate way to review this field is to study the core
mathematical and modeling methodologies, to investigate typical models from
each methodology domain, and to develop one's own models and methodologies
that meet the needs of the wide range of real-world applications. It turns
out that the core mathematical methods in this field are much more compact
than the concrete models. In the next section, we will focus on those core
methodologies.
1.2 Core Methodologies
Generally speaking, there are three major categories of approaches that are
widely used in maintenance: the age-based approach, the Markovian approach,
and the optimal stopping approach. We will review each of them in turn, and
it will be clear by the end of this review that proper integration of these
approaches is beneficial.
Age-based approach
Age-based maintenance models are the most classical ones, dating back
to the origin of this subject. The basic idea is to describe the system's
deterioration by a single index, the age. This quantity possesses some nice
analytical properties (it is deterministic, one-dimensional, and monotone),
which make the analysis of this kind of model elementary. A general procedure
of this approach is the following:
Step 1. Propose a class of maintenance policies, with one or several control
variables. Normally, the control variables are the preventive replacement
time, the number of repairs before failure replacement, etc.
Step 2. Explicitly derive the objective function, for example, the average
cost, as a function of the given control variables.
Step 3. Find the optimal solution by using one- or multi-dimensional
optimization schemes in the framework of calculus.
This approach is the most direct one to find a maintenance policy, and it is
accessible to researchers and practitioners with various backgrounds. Another
advantage of this approach is that it is very intuitive. Many fundamental
concepts, such as failure rate, minimal repair and replacement, are defined
in this framework. More sophisticated concepts can also be built on top of
it, which makes it the most popular framework in the whole maintenance
area. Yet this approach has a severe drawback. As there is no rigorous
justification in Step 1 of the optimality of the proposed policy classes, this
approach results in a huge number of policies that are neither optimal nor
provide much insight into the field.
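As an illustration of the three steps above, the following minimal sketch treats the classical age replacement policy, in which the unit is replaced preventively at age T or at failure, whichever comes first. It is not one of the models of this thesis. It assumes a Weibull lifetime, a preventive cost `c_p`, a failure replacement cost `c_f`, and the standard renewal-reward expression for the average cost, g(T) = (c_p R(T) + c_f (1 - R(T))) / ∫_0^T R(t) dt; all function names and numerical values are hypothetical.

```python
import math

def survival(t, shape, scale):
    """Weibull survival function R(t) = exp(-(t/scale)**shape) (assumed lifetime law)."""
    return math.exp(-((t / scale) ** shape))

def average_cost(T, c_p, c_f, shape, scale, steps=400):
    """Step 2: long-run average cost when replacing at age T or at failure."""
    # Expected cycle length is the integral of R(t) over [0, T] (trapezoidal rule).
    h = T / steps
    area = 0.5 * (survival(0.0, shape, scale) + survival(T, shape, scale))
    area += sum(survival(i * h, shape, scale) for i in range(1, steps))
    expected_length = area * h
    r_T = survival(T, shape, scale)
    expected_cost = c_p * r_T + c_f * (1.0 - r_T)
    return expected_cost / expected_length

def best_replacement_age(c_p, c_f, shape, scale, t_max, grid=300):
    """Step 3: one-dimensional grid search over the control variable T."""
    candidates = [t_max * (i + 1) / grid for i in range(grid)]
    return min(candidates, key=lambda T: average_cost(T, c_p, c_f, shape, scale))

# Step 1: the policy class is "replace at age T"; the values below are illustrative.
T_star = best_replacement_age(c_p=1.0, c_f=5.0, shape=2.0, scale=10.0, t_max=30.0)
print(T_star, average_cost(T_star, 1.0, 5.0, 2.0, 10.0))
```

The search only compares policies inside the proposed class, which is exactly the limitation noted above: nothing in the procedure shows that an age-T policy is optimal among all policies.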
One direction in which to extend classical age-based models is to introduce
additional random factors, such as a random repair cost, into the models.
Another possibility is to generalize the concept of age itself to a new one,
the virtual age. Originating from Kijima and Sumita (1986) and Kijima (1989),
the concept of virtual age, together with the concept of repair degree, is
used to describe the effect of maintenance actions. The major difference
between age and virtual age is that the virtual age is no longer monotone,
nor deterministic. These two extensions of the model, and the justification
of optimality, are beyond the scope of the conventional age-based approach.
Both the Markovian approach and the optimal stopping approach are required to
solve the problems with the above extensions.
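The virtual age idea can be made concrete with a short sketch. It assumes the update rules commonly given for the Kijima Type I and Type II models, in which a repair of degree a in [0, 1] following an operating period of length x changes the virtual age v to v + a x (Type I) or a (v + x) (Type II). The function names and the numbers are illustrative assumptions, and the repair degrees are fixed here rather than random.

```python
def kijima_type1(virtual_age, operating_time, degree):
    """Type I: the repair only offsets damage accrued since the last repair."""
    return virtual_age + degree * operating_time

def kijima_type2(virtual_age, operating_time, degree):
    """Type II: the repair acts on the whole accumulated virtual age."""
    return degree * (virtual_age + operating_time)

def virtual_age_path(operating_times, degrees, update):
    """Virtual age right after each repair, starting from a new unit (age 0)."""
    path, age = [], 0.0
    for x, a in zip(operating_times, degrees):
        age = update(age, x, a)
        path.append(age)
    return path

# degree = 1 reproduces minimal repair (virtual age equals actual age);
# degree = 0 under Type II reproduces replacement by a new unit.
minimal = virtual_age_path([2.0, 3.0, 1.0], [1.0, 1.0, 1.0], kijima_type1)
mixed = virtual_age_path([2.0, 3.0, 1.0], [0.5, 0.5, 0.0], kijima_type2)
print(minimal)  # [2.0, 5.0, 6.0]
print(mixed)    # [1.0, 2.0, 0.0]
```

The second path drops back to zero after the last repair, which shows why the virtual age, unlike the age, need not be monotone.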
Markovian approach
Conceptually, the Markovian approach, which includes modeling and
optimization with Markov decision processes, is rather simple. The basic idea
is the following: the system deterioration is modeled by the state of a Markov
process, where the state at the next "time period" depends only on its present
state. Consequently, one needs only to take into account the system's present
state to make the best maintenance decision.
Nowadays, the Markovian approach is completely mature, and its optimization
procedures, such as value iteration and policy iteration, have become
well-known routines to all researchers. At the same time, with the full range
of combinations of discrete/continuous time horizons, state spaces and action
sets, the Markov approach has tremendous modeling power. Extensions of Markov
models, including hidden Markov models, semi-Markov models and Markov renewal
models, further increase the flexibility of the Markov approach. In fact, all
of the aforementioned extensions can be transformed equivalently to standard
Markov models with larger state spaces.
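A minimal sketch of value iteration for a replacement problem is given below. It assumes a discounted cost criterion, a small number of deterioration levels, and two actions in every state (continue operating, or replace and then operate as new); the transition matrix, the costs and the function name are hypothetical values chosen only for illustration.

```python
def value_iteration(P, op_cost, replace_cost, beta=0.9, tol=1e-10, max_iter=10_000):
    """Discounted-cost value iteration for a continue/replace decision problem.

    P[s][j] is the one-period transition probability while operating,
    op_cost[s] the operating cost in state s, and state 0 is 'as new'.
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(max_iter):
        # Cost of continuing from each state, using the current value estimate.
        cont = [op_cost[s] + beta * sum(P[s][j] * V[j] for j in range(n))
                for s in range(n)]
        # Replacing costs replace_cost and puts the unit back in state 0.
        repl = replace_cost + cont[0]
        V_new = [min(cont[s], repl) for s in range(n)]
        converged = max(abs(a - b) for a, b in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    policy = ["continue" if cont[s] <= repl else "replace" for s in range(n)]
    return V, policy

# Three deterioration levels; state 2 is the worst (illustrative numbers).
P = [[0.7, 0.2, 0.1],
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
V, policy = value_iteration(P, op_cost=[0.0, 2.0, 10.0], replace_cost=5.0)
print(policy)
```

The decision in each state depends only on that state, which is the defining feature of the approach described above.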
From the application point of view, the modeling power of the Markovian
approach is limited only by computational complexity and modeling
inefficiency, rather than by theoretical restrictions. Therefore, a successful
application of the Markovian approach must involve careful modeling and
careful design of the computational procedure, among other steps, including
optimization techniques and statistical issues. Several complementary
techniques, such as Generalized Stochastic Petri Networks and Dynamic Decision
Tree Analysis, have been developed in different application domains, such as
computer science and artificial intelligence, to improve the efficiency of the
representation for certain Markov models, and to attack the computational
issues within the Markovian framework.
From the above discussion, we see two major advantages of the Markovian
approach: the flexibility with respect to its modeling power, and the maturity
and simplicity with respect to the decision making procedure. On the other
hand, some drawbacks coexist with these advantages.
We have mentioned that the optimization procedure is now very mature,
and this fact leads some people to be satisfied with merely building a Markov
model and deriving the dynamic equation. This may prevent researchers
from gaining more insight into finer structures. An example of this kind can
be found in Chapters 2 and 3, where, by incorporating ideas from the optimal
stopping approach, we eventually derive a differential equation instead of a
dynamic equation in integral equation form. This improvement not only
simplifies the computational task, but also uncovers the simple and intuitive
interpretation behind the optimal policy. Several attempts based on the
standard Markovian approach have been reported in the literature, but they
have not achieved the complete solution.
Moreover, as the decision-making is based on present information instead of
the whole history, it is difficult to conduct policy comparison between
different models. For more details on policy comparison between different
models which might use different kinds of information, see Jiang and Cheng
(1995).
An alternative way to attack these two problems is through the optimal
stopping approach, which has a fundamentally different point of view from the
Markovian approach.
Optimal stopping approach
Instead of utilizing only the information about the current system state
as in the Markovian approach, the optimal stopping approach tries to use the
full information from the past up to the current decision epoch. This approach
is heavily based on the general theory of stochastic processes, especially on
martingale theory. Many sophisticated mathematical objects and techniques
are utilized, by which intuitive concepts are defined and treated in a rigorous
and systematic manner.
The first important concept is the "filtration", which is basically the
growing database that holds information about the system up to the present
time. The second important concept is the "stopping time", which is a random
time that is completely determined by the available information up to the
present time. In other words, for any given time epoch, whether this "stopping
time" has happened or not depends only on our knowledge about the history
of the system.
It is easy to see the natural connection between stopping time and
replacement. The class of stopping times represents the wide range of
maintenance policies that are associated with the available information. The
optimal stopping problem is therefore to find the optimal policy within the
whole class of stopping time policies for a given objective criterion.
An attractive advantage of the optimal stopping approach is that for a given
filtration, i.e., the information level, the optimality is among all stopping
times belonging to this filtration, including both Markovian and non-Markovian
policies. Therefore, this optimality surpasses the optimality among merely
Markovian policies. An implicit advantage is that certain policy comparisons
can be conducted by comparing models with different information levels.
Obviously, a higher information level means a larger stopping time class, and
therefore a better optimal policy.
With regard to modeling power, the optimal stopping approach and the
Markovian approach are complementary to each other. On the one hand, the
formulation of the optimal stopping approach is not restricted to the Markov
assumption. On the other hand, its action sets are always binary (stop or
not), which is much more restrictive than general Markov models.
The major limitation of the optimal stopping approach is that, while its
formulation is general, the computational procedures will still involve some
specific assumptions, such as a certain kind of Markov property, or
monotonicity, or both. In fact, all computable non-Markovian optimal stopping
rules obtained in the literature possess a very strong monotonicity
condition. As those strong monotonicity assumptions are not realistic in many
practical situations, the Markov assumption becomes a requirement for
numerical computation. Therefore, proper integration of the Markov approach
and the optimal stopping approach has the potential to extract considerable
value out of both.
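To illustrate how the Markov property makes an optimal stopping rule computable, the following minimal sketch solves a finite-horizon stopping problem on a finite Markov chain by backward induction. The setting (finite state space, finite horizon, payoff depending only on the current state) and all names are assumptions made for this sketch; it is far more restrictive than the general formulation discussed above.

```python
def optimal_stopping(P, reward, horizon):
    """Backward induction for stopping a finite Markov chain by time `horizon`.

    P[x][y] is the one-step transition probability and reward[x] the payoff
    collected when stopping in state x. Returns (values, stop), where
    values[n][x] is the optimal expected payoff from state x at time n and
    stop[n][x] is True when stopping immediately is optimal.
    """
    n_states = len(reward)
    values = [None] * (horizon + 1)
    stop = [None] * (horizon + 1)
    values[horizon] = list(reward)          # stopping is forced at the horizon
    stop[horizon] = [True] * n_states
    for n in range(horizon - 1, -1, -1):
        # Expected optimal payoff if we wait one more period.
        cont = [sum(P[x][y] * values[n + 1][y] for y in range(n_states))
                for x in range(n_states)]
        values[n] = [max(reward[x], cont[x]) for x in range(n_states)]
        # Stop as soon as the immediate payoff attains the optimal value.
        stop[n] = [reward[x] >= cont[x] for x in range(n_states)]
    return values, stop

# Two states visited independently with probability 1/2 each (illustrative).
P = [[0.5, 0.5], [0.5, 0.5]]
values, stop = optimal_stopping(P, reward=[0.0, 1.0], horizon=2)
print(values[0], stop[0])  # [0.75, 1.0] [False, True]
```

The resulting rule stops at the first time the current payoff equals the optimal value, and the binary stop/continue action set is visible in the `stop` table.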
1.3 Thesis Outline
As discussed in the previous sections, the diversity and complexity
of maintenance problems imply the infeasibility of fitting all practical
situations into one universal model. The alternative we take in this thesis
is to jointly utilize the core methodologies in order to establish and
gradually expand our maintenance model pool in a systematic manner.
From Chapter 2 to Chapter 6, we will develop one model in each chapter.
Each model solves one particular problem of practical interest and, at the
same time, tries to represent a general treatment for solving a range of
similar problems. Some repetition in the modeling and the treatment is kept
to make each chapter self-contained and to stress the logical connection
among them. Instead of pursuing maximal generality in each model, we emphasize
the methodology and the procedures that are common to all of the models.
Many more variants and extensions can then be naturally developed along the
same direction.
We summarize the main results in this thesis as follows.
Chapter 2
In this chapter, we solve the repair/replacement problem for a single unit
system with random repair cost, which is proposed as an "important" and
"complicated" open problem in Beichelt (1993). When the unit fails, the
repair cost is observed and a decision is made whether to replace the unit or
repair it. We assume that the repair is minimal, i.e., the unit is restored to
its functioning condition just prior to failure, without changing its age. We
formulate this age-based model as a discrete time optimal stopping problem,
establish the existence of the optimal policy, and show that the optimal
policy is a "repair-cost-limit" policy, that is, there is a series of
repair-cost-limit functions $g_n(t)$, $n = 1, 2, \ldots$, such that the unit
of age $t$ is replaced at the $n$-th failure if and only if the repair cost
$C(n, t) \ge g_n(t)$; otherwise it is minimally repaired. If the repair cost
does not depend on $n$, then there is a single repair cost limit function
$g(t)$, which is uniquely determined by a first-order differential equation
with a boundary condition.
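The form of a repair-cost-limit policy can be illustrated by simulation. The sketch below does not compute the optimal limit function of Chapter 2; it only evaluates the average cost of a given, user-supplied limit function `limit(t)`. It assumes a Weibull failure rate, exponentially distributed repair costs that do not depend on n, and a constant replacement cost `c_r`, and it folds the failure loss into these costs; every name and number is hypothetical.

```python
import random

def simulate_cycle(limit, c_r, mean_repair, shape, scale, rng):
    """One replacement cycle under a repair-cost-limit rule; returns (cost, length)."""
    age, cost = 0.0, 0.0
    while True:
        # Under minimal repair, failures follow a nonhomogeneous Poisson process
        # with Weibull cumulative hazard H(t) = (t/scale)**shape.
        cum_hazard = (age / scale) ** shape + rng.expovariate(1.0)
        age = scale * cum_hazard ** (1.0 / shape)
        repair_cost = rng.expovariate(1.0 / mean_repair)  # observed at failure
        if repair_cost >= limit(age):
            return cost + c_r, age      # cost exceeds the limit: replace
        cost += repair_cost             # otherwise minimal repair, age unchanged

def average_cost(limit, c_r, mean_repair, shape, scale, cycles=20000, seed=0):
    """Renewal-reward estimate: total cost divided by total operating time."""
    rng = random.Random(seed)
    total_cost = total_time = 0.0
    for _ in range(cycles):
        cost, length = simulate_cycle(limit, c_r, mean_repair, shape, scale, rng)
        total_cost += cost
        total_time += length
    return total_cost / total_time

# Compare "always replace" with a constant limit (illustrative numbers).
always_replace = average_cost(lambda t: 0.0, 10.0, 1.0, 2.0, 10.0)
constant_limit = average_cost(lambda t: 3.0, 10.0, 1.0, 2.0, 10.0, cycles=2000)
print(always_replace, constant_limit)
```

Any candidate limit function can be passed in the same way, so the simulation can serve as a rough check on a limit obtained by other means.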
Chapter 3
We extend the previous model by incorporating preventive replacement
optimization into the original repair/replacement problem. The resulting
problem becomes a combination of two optimal stopping problems of different
nature: one is in discrete time, and the other is in continuous time. We first
develop a general result for characterizing the stopping times of jump
processes. This characterization, together with other mathematical tools such
as semi-martingale decomposition and the $\lambda$-maximization technique,
enables us to solve the two optimal stopping problems sequentially without
loss of optimality. Again, we establish the existence of the optimal policy,
and show that the optimal policy is an age preventive replacement,
repair-cost-limit policy. The optimal preventive replacement time and the
repair cost limits can be obtained by solving the same system of ordinary
differential equations as that in Chapter 2, with different boundary
conditions. A very intuitive interpretation of the optimal policy is obtained
based on the concept of "residual value". Both the average and the discounted
cost criteria are treated with this approach. An algorithm for finding the
optimal policy is presented for the average cost case and a numerical example
is given to illustrate the algorithm.
Chapter 4
The age-based models considered in the previous two chapters are further
extended to a virtual age-based model by generalizing the minimal repair to
a general repair. We use the Kijima Type I general repair model, and the
analysis is also valid for the Kijima Type II model, among others. We assume
that the repair degree that affects the virtual age of the system is a random
function of the repair cost and the virtual age at failure time. The objective
is to find the optimal maintenance policy that minimizes the long-run expected
average cost per unit time. With a novel formulation of the problem as a
continuous time Markov model, we then apply the optimization procedure
developed in Chapter 3 to solve this problem. While more arguments on
monotonicity issues are involved, we are able to show that a generalized
repair-cost-limit policy is optimal, and that the preventive replacement time
depends on both the virtual age of the system and the length of the operating
time since the last repair. Computational procedures for finding the optimal
average cost, the optimal repair cost limit function, and the optimal
preventive replacement time function (with respect to the virtual age) are
developed. This model includes many well-known models as special cases and the
approach provides a unified treatment for a wide class of (virtual) age-based
maintenance models.
Chapter 5
While various condition monitoring techniques have been widely deployed
in practice, there are still relatively few mathematical models capable of
fully utilizing the available information for optimal maintenance decision
making. A very intuitive scheme, called the "level-crossing" policy, is
commonly used in maintenance practice. Under this policy, the system is
preventively replaced as soon as a certain performance parameter reaches a
prespecified level. While this scheme is practically plausible, its optimality
remains to be justified, and the control limit remains to be optimized as
well. In this chapter, we propose a simple condition-based maintenance (CBM)
model to address these two problems. The monitored signal process is a
one-dimensional Markov process over a discrete time horizon. It represents
only partial information because the failure occurrence does not completely
correspond to a particular signal level. The objective is to find the
preventive replacement policy that maximizes the total expected profit during
the lifetime of the system. We formulate this problem as an optimal stopping
problem, and show that under weak monotonicity assumptions on the signal
process, the optimal policy is a level-crossing policy. We also develop an
algorithm for finding the control limit for an $\varepsilon$-optimal policy.
Chapter 6
In this chapter, we first provide a summarized literature survey of the
condition-based maintenance area, and then propose a comprehensive CBM
model to represent a general framework for CBM optimization. The principal
approach is optimal stopping for hidden Markov models. In this model, the
system state is driven by an unobservable Markov process, which is defined on
a continuous time horizon. The observations are described by another random
process, which is defined at discrete epochs $L, 2L, \ldots, nL, \ldots$ and
conditionally depends on the hidden system state. By applying results from
the general theory of stochastic processes, we reduce the original problem
significantly without loss of optimality, and derive the dynamic equation for
obtaining the optimal value function and the optimal policy. Furthermore, we
prove the optimality of the discrete policy when the observation interval $L$
is small enough, i.e., optimal preventive replacement is performed only at the
observation epochs. While the computational issue is not the major concern of
this chapter, we provide a simple algorithm based on a fixed point theorem,
and we solve a concrete example as an illustration.
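In a hidden Markov setting, decisions are typically based on the conditional distribution of the hidden state given the observations. The sketch below shows one step of the standard Bayesian filter for that distribution. It is an illustration only, not the reduction derived in Chapter 6: it assumes finitely many hidden states and observation values, and it takes as given a matrix `P` of transition probabilities over one observation interval L and a matrix `B` of observation probabilities, both hypothetical.

```python
def belief_update(belief, P, B, obs):
    """One filtering step: predict over one interval, then condition on `obs`.

    belief[i] : current probability that the hidden state is i
    P[i][j]   : probability of moving from i to j over one observation interval
    B[j][y]   : probability of observing y when the hidden state is j
    """
    n = len(belief)
    # Prediction: distribution of the hidden state at the next observation epoch.
    predicted = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight by the likelihood of the observation and normalize.
    unnormalized = [predicted[j] * B[j][obs] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]

# Two hidden states (0 = healthy, 1 = deteriorated), two observation values.
P = [[0.9, 0.1],
     [0.0, 1.0]]
B = [[0.8, 0.2],
     [0.3, 0.7]]
posterior = belief_update([1.0, 0.0], P, B, obs=1)
print(posterior)
```

Starting from a unit known to be healthy, one alarming observation raises the probability of the deteriorated state from 0.1 (prediction only) to 0.28, and a replacement rule can then be expressed in terms of this posterior.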
Chapter 7
In this final chapter, we summarize the modeling framework and the
optimization procedures that are common to the models presented in the thesis.
Some subjective thoughts are then presented, and more research on the
hidden Markov model framework for CBM optimization is advocated. At
the end of the thesis, we indicate some theoretical problems and practical
applications along this direction for future research.
Chapter 2
OPTIMALITY OF
REPAIR-COST-LIMIT
POLICIES
2.1 Introduction
We consider a maintenance model where the failed unit can be restored by a
minimal repair or a replacement. The minimal repair cost $C(n, t)$ is a random
variable which depends on $n$ and $t$, where $n$ is the number of failures
since the last replacement and $t$ is the age of the unit at the $n$-th
failure.
This model was proposed but left unsolved in Beichelt (1993), where it
was used as a unified treatment of many well-known models as special
cases. For research related to unified treatment of maintenance models, see
also Makis and Jardine (1992) and Jiang and Cheng (1995).
The repair-cost-limit policy has been linked to random repair cost models
from the very beginning (see e.g. Hastings (1969)). Several classes of
repair-cost-limit functions have been studied in the literature, e.g. the
constant cost limit (e.g. Cleroux et al. (1979)) or a class of parameterized
decreasing functions (see e.g. Berg et al. (1986), Park (1953, 19%)). Within
these particular classes, the optimal repair-cost-limit function has been
found under additional assumptions on the distribution of the repair cost.
Obviously, these approaches do not lead to the optimal policy. In the
discrete case (the decisions are made at times $t = 1, 2, \ldots$), the
optimality results have been obtained by Hastings (1969) and later extended by
White (1989) for the expected average cost case and for the discounted cost
case on the infinite horizon using dynamic programming. The repair cost limit
policy is easily implementable and it is frequently used in practice (see e.g.
Hastings (1969), Drinkwater and Hastings (1967) and Beichelt (1993)).
We formulate this continuous time maintenance problem in the framework
of optimal stopping theory in discrete time. First, we establish the
existence of the optimal policy and then prove the optimality of the
repair-cost-limit policy. We will show that in a special case, where the
random repair cost $C(n, t)$ does not depend on $n$, the repair-cost-limit
function is uniquely determined by a first-order differential equation with a
boundary condition. The results obtained for the deterministic cost function
$C(n, t)$ agree with the results obtained by Makis and Jardine (1992b).
2.2 Background and Model Description
The following assumptions will be made throughout this chapter.
1. The failure rate $h(t)$ of the unit is a nondecreasing function of $t$, and
$h(t) < \infty$ for all $t$.
2. Two kinds of maintenance actions, $A_m$ and $A_r$, are considered, where
$A_m$ and $A_r$ denote the minimal repair and replacement, respectively. All
maintenance actions take negligible time.
3. The costs of the maintenance actions $A_m$ and $A_r$ are denoted by
$C(n, t)$ and $C_r$, respectively, where $C_r$ is a constant and $C(n, t)$ is
a random variable; $n$ is the number of minimal repairs since the last
replacement and $t$ is the age of the unit at the $n$-th failure. We assume
that $C_r$ includes a failure loss $C_f$, $C_r \ge C_f > 0$, and $C(n, t)$ is
the sum of a repair cost $C_m(n, t) \ge 0$ and the failure loss $C_f$. Next,
we assume that for each $t \ge s > 0$ and for each $i \ge j \ge 1$,
$C_m(i, t) \ge C_m(j, s)$ stochastically. Finally, the repair costs are
mutually independent and observable at failure times.
4. The objective is to minimize the expected average cost per unit time over
an infinite time horizon.
We will formulate the problem in the framework of optimal stopping
theory. First, we will summarize the optimal stopping results that will be
used in this paper (see Chow et al. (1971)).
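Let $(\Omega, \mathcal{F}, P)$ be a probability space and
$\{\mathcal{F}_n, n \in N_+\}$ be a right continuous and complete filtration.
Let $Y = \{Y_n, n \in N_+\}$ be a sequence of random variables such that
$(Y_n)$ is adapted to $(\mathcal{F}_n)$. A stopping time $\tau$ is a random
variable $\tau: \Omega \to N_+ \cup \{+\infty\}$ such that
$\{\tau = n\} \in \mathcal{F}_n$ for all $n \in N_+$, and we define
$$Y_\tau = Y_n \ \text{on } \{\tau = n\}, \qquad Y_\infty = \limsup_{n \to \infty} Y_n.$$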
Then, the optimal stopping problem is formulated as follows:
find a stopping time $\tau^*$, if it exists, such that
$$E Y_{\tau^*} = \sup_{\tau \in D'} E Y_\tau,$$
where $D' = \{\tau : \tau \text{ is an } (\mathcal{F}_n)\text{-stopping time, and } E Y_\tau \text{ exists}\}$.
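We will introduce the following notation:
$$D = \{\tau : \tau \text{ is an } (\mathcal{F}_n)\text{-stopping time},\ \tau < \infty,\ E Y_\tau^- < \infty\},$$
$$\bar{D} = \{\tau : \tau \text{ is an } (\mathcal{F}_n)\text{-stopping time},\ E Y_\tau^- < \infty\},$$
$$\sigma_n = \begin{cases} \text{first } i \ge n \text{ such that } Y_i = \gamma_i, \\ \infty \quad \text{if no such } i \text{ exists,} \end{cases}$$
with $\sigma = \sigma_1$.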
Remark 2.1. As a consequence of Theorem 4.7, p. 81 in Chow et al. (1971),
$V(D) = V(\bar{D})$. Obviously, also $V(D') = V(D)$.
Theorem 2.2. [Chow et al., Theorem 4.5', p. 82] If $E(\sup_n Y_n^+) < \infty$, then $\sigma$
is optimal in $D$. If $Y_n \to -\infty$, then $\sigma \in D$.
For the Markov case, the optimal stopping time has a more specific form.
Assume that $\mathcal{F}_n = \mathcal{B}(Y_1, \ldots, Y_n)$, and for each $n = 1, 2, \ldots$ there is a measurable
space $(X_n, \mathcal{X}_n)$ and an $\mathcal{F}_n$-measurable random variable $x_n$ taking values
in $X_n$ such that $Y_n = \varphi_n(x_n)$ for some $\mathcal{X}_n$-measurable function $\varphi_n(\cdot)$. We say
that $\{x_n, \mathcal{X}_n\}_1^\infty$ provides a Markov representation of the sequence $\{Y_n, \mathcal{F}_n\}_1^\infty$ if
$P\{x_{n+1} \in B \mid \mathcal{F}_n\} = P\{x_{n+1} \in B \mid x_n\}$ $(B \in \mathcal{X}_{n+1}, n = 1, 2, \ldots)$. In addition,
if $X_1 = \cdots = X_n \equiv X$, $\mathcal{X}_1 = \cdots = \mathcal{X}_n \equiv \mathcal{X}$, and $P\{x_{n+1} \in B \mid x_n\}$ is for each
$n$ and $B \in \mathcal{X}$ a function on $X$ which does not depend on $n$, then we have a
stationary Markov representation of $\{Y_n, \mathcal{F}_n\}_1^\infty$.
For $n = 0, 1, 2, \ldots$, denote
( x ) = ess supDn E( I ; l t , = I )
Then, we have
Theorem 2.3. [Chow et al., Theorem 5.2, p. 104] In the stationary Markov
case, there is a version of $(\gamma_n)$ such that for each $n = 1, 2, \ldots$,
C,(X) = E ( y n I I I in = x ) ?
L(4 = - / n W
Corollary 2.4. [Chow et al.. Remark. p.1051 if k, = ZFK: Bk(xt) + pn(xn):
denote
Then
and
Corollary 2.5. Using the same notation as in Corollary 2.4, if $\{\theta_n(x_n)\}$ is
a sequence of mutually independent random variables, which depend only on
state $x_n$ and on $n$, we have the same result as in Corollary 2.4.
Proof. Construct a new probability space (.V0 x S x R.N+ x X x B. P )
to substitute for (S. X. P"). Denote yk = ( k . x k . ~k(xt)). Let &-(yk) =
(0,O. 1)~; = ek(xk): & ( g n ) = qn((O. 1.0)~;) = qn(xn). Thus {gn/,,,VC x X x
n I - D)? forms a Slarkov representation of {I,.F,);O. where 1.; = 1,- Bk(!ll;) + - i j n ( ~ n ) +
Put IJ = (n. x. O,(x)). Then from Corollary 2.4 and the independence of
e n (~n))?
Also, we have
and
l if no such n exists.
Q.E.D.
We will also need the following result.
Lemma 2.6. [Chow et al., Theorem 4.13, p. 92] Assume that $\theta, \theta_1, \theta_2, \ldots$ are
independent and identically distributed random variables with $E\theta = 0$. Let
$Y_n = \sum_{i=1}^{n} \theta_i$ and $\beta > 0$. If $E((\theta^+)^{1+\beta}) < \infty$, then for all $b > 0$,
$E(\sup_n (Y_n - nb)^+)^\beta < \infty$.
Lemma 2.7. The following almost trivial result will also be useful later in
this chapter. Assume that $\{Y_n^i, \mathcal{F}_n\}$, $i = 1, 2$, are two stochastic sequences on
the same probability space and for each $\omega$ and $n$, $Y_n^1(\omega) \le Y_n^2(\omega)$. Then:
2.3 Repair/Replacement Problem
We will use the $\lambda$-maximization technique (see e.g. Aven and Bergman (1986))
to treat the following minimization problem:
Put
where $\lambda > 0$ is a parameter, $S_i$ is the $i$-th failure time, and each failure is
removed by $A_m$. $C(n, S_n)$ is the $n$-th repair cost. Denote $V_\lambda(0) = \sup_\sigma E Y_\sigma(\lambda)$.
Then,
If there is a $\lambda$ such that $V_\lambda(0) = 0$, then $\lambda = \lambda^*$.
Denote $y_n = (n, S_n, C(n, S_n))$, $\mathcal{H}_n = \mathcal{B}(y_i, i \le n)$. Then $P\{y_{n+1} \in B \mid
\mathcal{H}_n\} = P\{y_{n+1} \in B \mid y_n\}$; thus $(y_n)$ forms a stationary Markov sequence. We
also denote $\sigma$ as $\sigma_\lambda$ to indicate its dependence on the parameter $\lambda$. To prove
the existence of the optimal stopping time, we only need to check the condition
in Theorem 2.2.
Denote $\lambda_0 = EC(\infty, \infty)/\gamma(\infty) = EC(\infty, \infty)h(\infty)$, $\lambda_1 = C_r/\gamma(0)$ and
$\lambda_2 = \lambda_0 \wedge \lambda_1$, where $\gamma(t)$ is the mean residual time to failure at age $t$. It is
easy to see that $\lambda_0$ and $\lambda_1$ are the expected average costs under the minimal repair
policy and the failure replacement policy, respectively. Hence, the optimal cost $\lambda^*$
must be less than or equal to $\lambda_2$.
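The two benchmark costs can be evaluated numerically. The following is a minimal sketch, not part of the original derivation: it assumes the cumulative hazard $H(t) = \int_0^t h(x)dx$ is available as a Python function, approximates $\gamma(t) = \int_t^\infty \exp(-(H(x) - H(t)))dx$ by the trapezoid rule on a truncated range, and takes the limiting hazard $h(\infty)$ and the limiting mean repair cost $EC(\infty, \infty)$ as given numbers. The function names, the truncation horizon and the example values are illustrative assumptions.

```python
import math


def mean_residual_life(cum_hazard, t, horizon=50.0, steps=100_000):
    """Approximate gamma(t) = int_t^inf exp(-(H(x) - H(t))) dx (trapezoid rule).

    `horizon` truncates the infinite integral; it is an assumption of this sketch.
    """
    dx = horizon / steps
    h_t = cum_hazard(t)
    total = 0.0
    prev = 1.0  # conditional survival probability at x = t
    for k in range(1, steps + 1):
        cur = math.exp(-(cum_hazard(t + k * dx) - h_t))
        total += 0.5 * (prev + cur) * dx
        prev = cur
    return total


def benchmark_costs(cum_hazard, limiting_hazard, limiting_mean_cost, c_r):
    """Return (lambda_0, lambda_1, lambda_2) as defined in the text."""
    lam0 = limiting_hazard * limiting_mean_cost        # minimal repair policy
    lam1 = c_r / mean_residual_life(cum_hazard, 0.0)   # failure replacement policy
    return lam0, lam1, min(lam0, lam1)


# Hypothetical example: constant hazard h = 2, mean repair cost 3, C_r = 5.
lam0, lam1, lam2 = benchmark_costs(lambda x: 2.0 * x, 2.0, 3.0, 5.0)
```

With a constant hazard the mean residual life is $1/h$, so the sketch can be checked by hand: $\lambda_0 = 6$ and $\lambda_1 = 10$ in the hypothetical example.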
From Lemma 2.7, we can see that if the repair cost $C(n, t)$ is replaced
by $C(n, t) \wedge C_r$, then the optimal value for the latter stopping problem is not
worse than for the former one. We will prove that in the latter case, the optimal
stopping time exists and it never prescribes a repair action when $C(n, t) > C_r$.
Hence, these two stopping problems have the same optimal solution and the
same optimal value. More details are provided in Lemma A.3 in the Appendix.
Thus, we only need to consider the case $C(n, t) \le C_r$. From the definition of
$C(n, t) = C_f + C_m(n, t)$, we have
It follows from the previous discussion that we can restrict ourselves to the
case $\lambda \le \lambda_0$.
Lemma 3.1. If $\lambda < \lambda_0$, then $E(\sup_n Y_n^+) < \infty$, and $Y_n \to -\infty$.
Proof. When $\lambda < \lambda_0$, there is a finite real number $S > 0$ and an integer
$N > 0$ such that $\lambda\gamma(S) - EC(N, S) = -b < 0$. Then,
where $w_i = S_i - S_{i-1}$, $\theta_i = \lambda z_i - C(N, S) + b$, and $z, z_i$ are i.i.d. random variables
which have the residual life time distribution at age $S$, $F_S(\cdot)$. Hence, $E\theta_i =
E(\lambda z_i - C(N, S) + b) = \lambda\gamma(S) - EC(N, S) + b = 0$. Obviously, $E(\theta^+)^2 < \infty$, so
from Lemma 2.6 with $\beta = 1$, $E(\sup_n(\sum_{i=1}^{n}(\lambda z_i - C(N, S) + b) - nb)^+) < \infty$.
Consequently, $E(\sup_n Y_n^+) < \infty$. Obviously, $Y_n \to -\infty$.
Thus, $\sigma_\lambda \in D$ is the optimal stopping time which maximizes $E Y_\sigma(\lambda)$.
Q.E.D.
Furthermore, since we have a Markov representation for (2.3.2), we get
from Corollary 2.5 by putting
Denote
Thus.
Obviously, $g_n(t)$ is a deterministic function of $n$ and $t$.
Hence, $\sigma_\lambda$ has the form of a repair-cost-limit policy. From the monotonicity
of the failure rate $h(t)$ and the stochastic monotonicity and boundedness of the repair
cost $C(n, t)$, it is easy to see that $g_n(t)$ is monotonically decreasing in $n$ and $t$,
and $g_n(t)$ is continuous. A proof is in the Appendix (Lemma A.3).
When the repair cost $C(n, t)$ is a deterministic function of $n$ and $t$, then
the optimal policy in (2.3.6) has the following form: replace the unit as soon
as the $n$-th failure time exceeds $t_n$. This model is a special case in Makis and
Jardine (1992) where the optimal policy is obtained by applying semi-Markov
decision processes.
Next, we consider the case where the repair cost does not depend on $n$, i.e.,
$C(n, t) = C(t)$. In this case, a single function $g(t)$ exists which is the optimal
cost-limit function. Now, from the definition of $g_n(t)$, (2.3.5), we have
$g(t) = V_\lambda(t) - \lambda t + C_r$,
where $V_\lambda(t)$ is given by (2.3.7), i.e.,
and $S_k(t)$ is the $k$-th failure time after $t$. When $t = 0$,
$g(0) = V_\lambda(0) + C_r$. (2.3.8)
Remark 3.2. It is easy to see that the repair-cost-limit $(g(s), s > t)$ is also
optimal in the $[t, \infty)$ period. Consider a repair/replacement problem for a unit
which has age $t$, i.e., setting $t$ to be 0 in the new system. Then the failure
rate is $h(s)$ and the repair cost is $C(s)$ at time $s - t$ for every $s \ge t$. For this
problem, the optimal stopping time exists and it is a repair-cost-limit policy,
where $g(s)$ is the repair-cost-limit at time $s - t$. Furthermore, its optimal value
is equal to $V_\lambda(t) - \lambda t$, where $V_\lambda(t)$ is given by (2.3.7). Then,
which gives a clear intuitive meaning to $g(t)$.
In order to obtain the form of $g(t)$, define
$T$: the replacement time under the repair-cost-limit policy $g(t)$
$G(t)$: the distribution function of $T$
$F(t)$: the distribution function of the first failure time
$A(t)$: the expected cost of a cycle initiated from age $t$
$B(t)$: the expected length of a cycle initiated from age $t$
Using the well known results on thinning of nonhomogeneous Poisson processes
(see, e.g., Block et al. (1985)), we have
Differentiating the above equation and taking into account (2.3.8), we have:
In this equation, $\lambda$ is a parameter, so that in order to determine $g(t)$, $\lambda$ must
be uniquely determined first. We will consider the following equation:
$$\begin{cases} \lambda = -g'(t) + h(t)\int_0^{g(t)} \bar{R}_t(x)\,dx, \\ g(0) = C_r, \\ g(t) \in (0, C_r), \quad \forall t > 0. \end{cases} \qquad (2.3.9)$$
Lemma 3.3. There is a unique $\lambda$ for which the solution to (2.3.9) exists.
This solution is unique.
A proof is in the Appendix (Lemma A.2).
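The existence argument suggests a simple numerical procedure: for a trial $\lambda$, integrate the differential equation in (2.3.9) forward from $g(0) = C_r$; if $g$ reaches 0 the trial value is too large, and if $g$ rises above $C_r$ it is too small. The sketch below applies bisection to this classification. It is only an illustration under assumed data, none of which come from the text: hazard $h(t) = 2t$, repair cost $C = C_f + C_m$ with $C_m$ exponential and independent of age, explicit Euler integration with a fixed step, and a finite horizon `t_max`.

```python
import math

C_R = 5.0        # replacement cost C_r (assumed value)
C_F = 0.5        # failure loss C_f (assumed value)
MEAN_CM = 0.5    # mean of the exponential pure repair cost C_m (assumed)


def hazard(t):
    """Assumed nondecreasing failure rate h(t) = 2t."""
    return 2.0 * t


def expected_min_cost(g):
    """Integral over [0, g] of the survival function of C = C_f + C_m, i.e. E[min(C, g)]."""
    if g <= C_F:
        return max(g, 0.0)
    return C_F + MEAN_CM * (1.0 - math.exp(-(g - C_F) / MEAN_CM))


def classify(lam, t_max=5.0, dt=1e-3):
    """Integrate g' = -lam + h(t) * E[min(C, g)] from g(0) = C_r by Euler steps.

    Returns "high" if g reaches 0, "low" if g exceeds C_r, else "undecided".
    """
    g, t = C_R, 0.0
    while t < t_max:
        g += dt * (-lam + hazard(t) * expected_min_cost(g))
        t += dt
        if g <= 0.0:
            return "high"
        if g > C_R:
            return "low"
    return "undecided"


def solve_lambda(tol=1e-6):
    """Bisection on lam between a value that is too low and one that is too high."""
    lo, hi = 0.0, C_R + hazard(1.0) * C_R
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if classify(mid) == "low":
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam = solve_lambda()
```

The solution separating the two regimes is unstable in forward time, which is why a shooting method with bisection is used here rather than a single forward integration; a higher-order integrator or a longer horizon could be substituted without changing the idea.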
Denote this unique $\lambda$ as $\lambda_\bullet$. Recall that the optimal expected average cost
is $\lambda^*$.
Lemma 3.4. $\lambda_\bullet = \lambda^*$, and $\sigma_{\lambda_\bullet}$ is the optimal stopping time for the original
problem (2.3.1).
Proof. Obviously, $\lambda^* \le \lambda_0$. Now, we will show that $\lambda_\bullet \le \lambda_0$. We only need
to consider the case $\lambda_0 < \infty$.
From (2.3.9).
From Theorem 2.2 and Lemma 3.1, for all $\lambda < \lambda_0$, the optimal stopping
time for $\{Y_n(\lambda), \mathcal{H}_n\}_1^\infty$ exists in problem (2.3.2), and it is $\sigma_\lambda \in C$.
Now, we examine the solution of Equation (2.3.9). Only two cases can
happen:
Case 1. There exists $t > 0$ such that $R_t(g(t)) < 1$.
Case 2. For every $t > 0$, $R_t(g(t)) = 1$.
In case 1. it is easy to see that < m. o ~ . E C and A. < Xo. In fact.
A. = h ( x ) J$") Rm(u)du < h ( x ) E C ( x ) = X o . Thus. from Theorem 2.2 and
Lemma 3.1. ox. is the optimal stopping time for {k&\.). li,);". Therefore.
together with El,,. (X.) = \;.(0) = g(0) - Cr = 0. we obtain h. = A * . anti
hence. a~. is the optimal stopping time of the A-maximization problem (2.3.2)
and consequently. the optimal stopping time of the original problem (2.3.1 ).
t 2 O,gA,( t ) 2 gA,(oc) 2 ess s~p,,~C(t). Since ho is the expected average cost
under minimal repair policp and a~. = q, is the minimal repair policy. so a,,.
is again an optimal policy fur problem (2.3.1).
Q.E.D.
Furthermore. since
g l ( t ) = - A o + h ( t ) J o g " ' ~ t ( z ) d x = h( t )EC( t ) - h ( m ) E C ( m )
g(0) = c r .
then
= Cr - ( h ( m ) E C ( m ) - h(r)EC(r))z lk - J,' x d ( h ( x ) ~ C ( r ) ) .
Since J; rd (h (x ) E C ( r ) ) is increasing monotonically. it has limit as t + x. If
this limit is +m. then gAo ( x ) < ess supt,,C(t). Thus. we only need to consider
the finite limit case. Together with the boundedness and the monotonicity of
gA, ( t ) . ( h ( x ) E C ( x ) - h( t )EC( t ) ) t has a limit. and it must be zero. 0 t h -
envise, J,,(h(m) E C ( ~ ) - h ( r ) E C ( x ) )dx 2 5; (a/.r)dx + +x. Therefore.
g x o ( m ) 2 ess sup,,,C(t) C, - r d ( h ( x ) E C ( r ) ) 2 ess sup,,,C(t) e
zd (h (x )EC(z ) ) 5 C, - ess sop,,,C(t).
We will summarize the results in the following theorem.
Theorem 3.5. The optimal policy for problem (2.3.1) exists. It is a repair-cost-limit
policy and the repair-cost-limit function $g_n(t)$ becomes $g(t)$ when the
minimal repair cost $C(n, t)$ does not depend on $n$. This $g(t)$ is uniquely determined
by the following first-order differential equation
where $\lambda$ is the optimal expected average cost. The repair-cost-limit policy is the
minimal repair policy if and only if
$\int_0^\infty x\, d(h(x)EC(x)) \le C_r - \operatorname{ess\,sup}_{t \ge 0} C(t)$.
Example. Assume $h(t) = h > 0$ and $EC(t) = EC$ for all $t \ge 0$.
Since $\int_0^\infty x\, d(h(x)EC(x)) = 0 \le C_r - \operatorname{ess\,sup}_{t \ge 0} C(t)$, the minimal repair
policy is optimal. In fact, we can also see that $h(t)EC(t) = h(0)EC(0)$ for all
$t$ implies $g(t) = C_r$ for all $t$.
Figure 2.1: Optimal age replacement policy.
If $C(t)$ is a deterministic, nondecreasing and continuous function of $t$, then
the optimal policy is the age replacement policy, i.e.,
$\sigma$ = first $n$ such that $C(S_n) \ge g(S_n)$.
This is equivalent to
$\sigma$ = first $n$ such that $S_n \ge t^*$, where $t^* = \inf\{t : C(t) = g(t)\}$.
Obviously, $C(t^*) = g(t^*)$. From (2.3.9), when $C(t) \le g(t)$,
$\lambda = -g'(t) + h(t)C(t)$,
and combined with $g(0) = C_r$,
When $C(t) \ge g(t)$,
$\lambda = -g'(t) + h(t)g(t)$.
Since $g(\infty) \in (0, C_r)$ we have
$g(t) = \lambda \int_t^\infty \bar{F}(x)\,dx / \bar{F}(t)$.
Thus,
Therefore, from the above two expressions
(2.3.13)
for g ( t ' ) , (2.3.11) and (2.3.12).
This coincides with the results obtained by other methods. Therefore, we
have proved that, in the model with deterministic repair cost $C(t)$, the optimal
age replacement policy is optimal in the class of stopping times.
2.4 Appendix
In the following lemma, we restrict our attention to functions $g(t)$ which satisfy
the first two equations in (2.3.9). We will also be using the notation $g_\lambda(t)$ to
indicate its dependence on $\lambda$.
Lemma A.1. If for all $t > 0$, $g(t) \in (0, C_r)$, then it must be nonincreasing.
Proof. First, we prove that if there is $t > 0$ such that $g'(t) > 0$, then
$g(t) \to \infty$.
Consider the first $t_0 > 0$ such that $g'(t_0) = 0$. Then from the differential
equation and $\lambda > 0$, we have that $g(t_0) > 0$. It is easy to see that for all $t >
t_0$, $g'(t) \ge 0$ and $g''(t) \ge 0$. Hence, if there is $t > t_0$ such that $g'(t) > 0$, then $g(t) \uparrow \infty$.
Therefore, if for all $t > 0$, $g(t) \in (0, C_r)$, then it must be nonincreasing.
Q.E.D.
Lemma A.2. There exists a unique $\lambda$ for which the solution to (2.3.9) exists.
This solution is unique.
Proof.
(i) Uniqueness.
From Lemma A.1, if (2.3.9) is satisfied, then $g(t)$ is nonincreasing. Consider
$\lambda_1 > \lambda_2 > 0$ and for $i = 1, 2$, denote $g_{\lambda_i}(t)$ as $g_i(t)$. From (2.3.9),
$(g_2(t) - g_1(t))' = h(t)\int_{g_1(t)}^{g_2(t)} \bar{R}_t(x)\,dx + (\lambda_1 - \lambda_2) > 0$, so $(g_2(t) -
g_1(t))$ is increasing. Since $\bar{R}_t(x)$ decreases when $x$ increases, we can see that
$\int_{g_1(t)}^{g_2(t)} \bar{R}_t(x)\,dx$ increases in $t$, and consequently
$(g_2(t) - g_1(t))' = h(t)\int_{g_1(t)}^{g_2(t)} \bar{R}_t(x)\,dx + (\lambda_1 - \lambda_2)$
increases. Then, $(g_2(t) - g_1(t)) \to +\infty$. Therefore, at most one $\lambda$ exists such
that $g_\lambda(t)$ is a solution to (2.3.9). Finally, the uniqueness of the solution $g_\lambda(t)$
for this fixed $\lambda$ can be easily seen.
(ii) Existence.
Denote
$\Lambda = \{\lambda : g_\lambda(t)$ reaches 0 at some $t > 0\}$
$\Lambda' = \{\lambda : g_\lambda(t)$ reaches $C_r$ at some $t > 0\}$
$\lambda(\Lambda) = \inf\{\lambda \mid \lambda \in \Lambda\}$
$\lambda(\Lambda') = \sup\{\lambda \mid \lambda \in \Lambda'\}$.
First we show that neither $\Lambda$ nor $\Lambda'$ is empty, so that the above four definitions
are well defined.
Choose = C, + h(1) J? z l ( x ) d x . and denote the solution for this as
g ( t ) . Hence, when t E (0 .1) . g ( t ) < 4,. and g ( t ) reaches 0 a t some point
to E (01 1). Thus. X E .I.
Regarding to A'. if for all t 2 0. h ( t ) E C ( t ) = h(O)EC(O) = Ao. according
to Example 3.6. gx, ( t ) = C, for all t 2 0. hence. A. E .\I: othenvise. there is
0 < t o < m such that gA,(t()) > Cr. It is easy to see that when X increase to
X o . then gA(t) converges to gA, ( t ) in (0. to ] uniformly. Hence. there is a X I < Xo
such that g x l ( t o ) > 0. Since g ~ , ( 0 ) = C, and & ( 0 ) = XI - X o < 0. then. there
is a 0 < t l < to such that g,&) = Cr. Therefore. Al E .\I.
We have $\lambda(\Lambda) = \lambda(\Lambda')$. To see this, if $\lambda(\Lambda) > \lambda(\Lambda')$, then for every $\lambda \in
(\lambda(\Lambda'), \lambda(\Lambda))$, $g_\lambda(t) \in (0, C_r)$ for every $t > 0$. This is a contradiction to (i), the
uniqueness of $\lambda$.
Thus, we can construct two sequences $\{\lambda_n\}$ and $\{\lambda'_n\}$ such that $\{\lambda_n\}$ monotonically
decreases to $\lambda(\Lambda)$ and $\{\lambda'_n\}$ monotonically increases to $\lambda(\Lambda') = \lambda(\Lambda)$.
Obviously, $\{\lambda_n\} \subset \Lambda$ and $\{\lambda'_n\} \subset \Lambda'$.
Now, we prove that $\lambda(\Lambda) = \lambda_\bullet$, i.e., $g_{\lambda(\Lambda)}(t) \in (0, C_r)$ for all $t > 0$.
Since for all $N > 0$, $i > j$, and $t \in [0, N]$: $0 \le g_{\lambda_i}(t) - g_{\lambda_j}(t) \le g_{\lambda_i}(N) -
g_{\lambda_j}(N)$, then $\{g_{\lambda_n}(t)\}$ converges to a function, say $g(t)$, uniformly on $(0, N]$.
From (2.3.9), $g'_{\lambda_n}(t)$ converges to a function, say $w(t)$, uniformly. It is easy
to see that $w(t) = g'(t)$. Hence, $\lambda(\Lambda) + g'(t) = h(t)\int_0^{g(t)} \bar{R}_t(x)\,dx$. Finally,
$g(t) = g_{\lambda(\Lambda)}(t)$. Also, $g_{\lambda(\Lambda)}(t)$ is the limit of $g_{\lambda'_n}(t)$. Obviously, for this $\lambda(\Lambda)$,
its solution $g_{\lambda(\Lambda)}(t) \in (0, C_r)$ for all $t > 0$.
Q.E.D.
Lemma A.3. $g_n(t)$ monotonically decreases in $n$ and $t$, is continuous in $t$ and
$0 \le g_n(t) \le C_r$ for all $n \ge 1$, $t \ge 0$.
Proof. Following results in stochastic order theory, we can construct a
probability space such that for each $\omega$ and $i \ge j$, $t \ge s$, $C(i, t)(\omega) \ge C(j, s)(\omega)$,
and $S_i(t) - t \le S_i(s) - s$, where $S_i(t)$ is the $i$-th failure time after $t$, $S_0(t) = t$.
It is easy to see that for $t > s$, $S_i(t) \ge S_i(s)$. We have
Then the monotonicity follows from
Also, it is easy to see that $0 < g_i(t) \le g_j(s) \le C_r$. Hence, following the
optimal policy, the system is never repaired if $C(n, t) > C_r$. Thus, we can
assume that $C(n, t) \le C_r$ without loss of generality.
The continuity of $g_n(t)$ follows from the following inequalities:
$\lambda\Delta t + g_n(t + \Delta t) \ge g_n(t) > g_n(t + \Delta t)$.
This completes the proof.
Summary of Notation
$a^+$: $\max\{0, a\}$
$a \wedge b$: $\min\{a, b\}$
$A(t)$: expected cost of a cycle initiated from age $t$
$B(t)$: expected length of a cycle initiated from age $t$
$C(n, t)$: repair cost of the $n$-th failure at age $t$
$C_f$: failure loss
$C_r$: failure replacement cost
$F(t)$: distribution function of the first failure time
$g_n(t)$: repair cost limit function for the $n$-th failure
$g(t)$: repair cost limit function if the repair cost is stochastically independent
of the number of failures
$G(t)$: distribution function of $T$
$h(t)$: failure rate
$R_t(x)$: distribution function of repair cost at age $t$
$S(t)$: $\int_0^{g(t)} x\, dR_t(x)$
$S_n$: $n$-th failure time
$T$: replacement time under the repair-cost-limit policy $g(t)$
$V_\lambda(t)$: value function for the $\lambda$-maximization problem with initial age $t$
$Y_\sigma(\lambda)$: the objective function for the $\lambda$-maximization problem
$\lambda^*$: optimal average cost
$\sigma(\lambda)$: optimal stopping time for the $\lambda$-maximization problem
$\gamma(t)$: mean residual time to failure at age $t$
$\lambda_0$: $EC(\infty, \infty)/\gamma(\infty)$
Chapter 3
OPTIMAL PREVENTIVE
REPLACEMENT UNDER
MINIMAL REPAIR AND
RANDOM REPAIR COST
3.1 Introduction
This chapter is a natural extension of the previous chapter.
We consider a single-unit repairable system subject to random failure. A
new unit is installed at time $t = 0$ and when the unit fails, the repair cost
is observed and a decision is made whether to replace (overhaul) the unit or
repair it. We assume that the repair is minimal, i.e., the unit is restored
to its functioning state just prior to failure without changing its age. The
repair cost is a random variable that is a function of the unit's age and may
depend on the number of repairs since the last replacement. The unit can be
preventively replaced at any time prior to failure. The failure and preventive
replacement costs are assumed to be given constants. The objective is to find
the repair/replacement policy minimizing the long-run expected average cost
per unit time.
Although this model is formulated as a model of a single-unit system, it
can also be used as a suitable representation of a complex, multi-unit repairable
system. If only a small part of the system is repaired or replaced upon failure,
this does not considerably affect the failure rate and the minimal repair
assumption is acceptable. For such systems, the repair cost typically depends
on the age (or operating age) and may also depend on the number of failures
since the last replacement or overhaul of the system. Usually, the main concern
is the proper planning of a major (and costly) maintenance action such as an
overhaul or replacement of the system, which is the focus of this work.
A similar model was investigated in L'Ecuyer and Haurie (1987) under
more restrictive assumptions: for example, the discount factor in the objective
criterion is bounded away from zero (therefore, the average cost criterion
is excluded), and the repair cost does not depend on the number of repairs.
The structural result regarding the optimal policy is obtained there by using
a Markovian approach.
We will study this model using the optimal stopping
approach. The major advantages are listed below.
1. Wider policy class. The obtained optimal policy is now optimal within the whole
stopping time class instead of the Markovian policy class only. Therefore,
a range of policy comparison results can be obtained by varying the
information level within the general framework.
2. More explicit form. We obtain a differential equation for computing the
optimal repair-cost-limit and the optimal preventive replacement time,
which is much simpler than solving the integral equation derived from
the Markovian approach.
3. More intuitive interpretation. The optimal policy can be expressed in terms
of the "residual value", which is naturally introduced with the optimal stopping
approach.
The optimal stopping theory has been applied in preventive maintenance
mainly to analyze preventive replacement problems with quite general deteriorating
processes (e.g. Bergman (1978), Aven (1983), Aven and Bergman
(1986) and Jensen (1989)).
The main mathematical difficulty involved in this work is that we now
need to consider jointly two kinds of stopping problems: a repair/replacement
problem at failure times, and a preventive replacement in continuous time.
Obviously, the former is a discrete time and the latter a continuous time stopping
problem. In addition, we no longer have the closed forms of the expected
cost in a cycle, $A(t)$, and the expected cycle length $B(t)$ as in Chapter 2 when
the repair cost is assumed to depend on $n$, the number of repairs.
The chapter is organized as follows. In Section 3.2, we formulate the
problem in the optimal stopping framework. By developing a characterization
result for the stopping times of general jump processes, we reduce the
continuous time optimization problem to a discrete time stopping problem.
In Section 3.3, we prove the existence of the optimal policy and find its
form by applying the $\lambda$-maximization technique (see e.g. Aven and Bergman
(1986)) and the semi-martingale decomposition approach (see e.g. Jensen (1989)).
The $\lambda$-maximization technique transforms the original fractional optimization
problem to a parameterized (with parameter $\lambda$) optimization problem with an
additive objective function for which the optimal stopping theory can be applied.
It is also exactly this $\lambda$ that absorbs much of the complexity, resulting
in the simplification from the integral equation to the differential equation.
The semi-martingale decomposition approach further simplifies the objective
function by removing its martingale part without loss of optimality. The optimal
policy has the following form: replace the unit at the first failure time
when the repair cost exceeds a certain limit or at age $T$, whichever occurs first.
In Section 3.4, we address the computational issues and present an algorithm
for finding an $\varepsilon$-optimal policy. We will show that an $\varepsilon$-optimal policy can
be obtained by solving a system of ordinary differential equations with boundary
conditions. A numerical example is given to illustrate the computational
procedure. In Section 3.5, the discounted cost case is solved in parallel to the
average cost case by applying the same optimization procedure. Interestingly,
the result coincides with that for the average cost criterion when the discount
factor degenerates to zero.
3.2 Problem Formulation
We will make the following assumptions.
1. The time to failure of a unit is a generally distributed random variable
with distribution function $F(t)$, density $f(t)$ and failure rate $h(t) =
f(t)/(1 - F(t))$, which is a continuous and non-decreasing function of $t$.
2. At failure time, the unit can be either minimally repaired at cost $C(n, t)$,
where $n$ is the number of repairs since the last replacement and $t$ is the
age of the unit, or replaced at cost $C_r$. The unit can be preventively
replaced at any time prior to failure at cost $C_p < C_r$. We assume that
$C_p$ and $C_r = C_p + C_f$ are given constants and $\{C(n, t)\}$ are random
variables stochastically increasing in $n$ and $t$. Further, we assume that
$EC(n, t)$ is continuous in $t$, and that $C_f \le C(n, t) \le C_r$ (without loss of
generality).
3. All maintenance actions take negligible time.
The objective is to find the repair/replacement policy minimizing the long-run
expected average cost per unit time.
Let t be the time of the first replacement if a new system is installed at
time zero. We define TC( t ) . the total cost incurred up to time t. as
where Si is the i-th failure time, So = 0. iV(t) is the number of failures before
t : X(t) = I i s t5 , ) . and I is the set indicator function.
Let $(\mathcal{F}_t)$ be the (completed) natural filtration of the process $\{TC(t), t \ge 0\}$.
For the average cost criterion, the maintenance decision problem can be
formulated as follows. Find an $(\mathcal{F}_t)$-stopping time $\tau^*$, if it exists, minimizing
the long-run expected average cost per unit time given by
$$\frac{E(TC(\tau))}{E\tau}. \qquad (3.2.2)$$
Using the methodology presented in Lemmas 2.2-2.4, we will show
that this continuous time stopping problem can be reduced to a discrete time
stopping problem. The methodology is quite general and can be applied also
to other kinds of continuous time stopping problems for jump processes.
We will need the following result from the theory of jump processes.
Lemma 2.1. [Davis 1993, Theorem A2.3, p. 261] Let $\zeta_i = (S_i, C(i, S_i))$, $\mathcal{H}_n =
\sigma\{\zeta_i, i \le n\} = \mathcal{F}_{S_n}$ and $\tau$ be an $(\mathcal{F}_t)$-stopping time. Then there exist a
constant $T_0$ and $\mathcal{H}_n$-measurable functions $T_n$ for $n = 1, 2, \ldots$, such that
and for $n = 1, 2, \ldots$
$\tau I_{\{S_n < \tau \le S_{n+1}\}} = (T_n \wedge S_{n+1}) I_{\{S_n < \tau \le S_{n+1}\}}$,
where $a \wedge b = \min\{a, b\}$. Note that $T_n$ is a function of $(\zeta_1, \ldots, \zeta_n)$.
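Lemma 2.2. For any $(\mathcal{F}_t)$-stopping time $\tau$ and $\{T_n\}_0^\infty$ defined in Lemma
2.1, we have for $n \ge 1$,
$\tau \ge S_n \Rightarrow T_i \ge S_{i+1}$ for $i \le n - 1$. (3.2.5)
Proof. We use mathematical induction.
For $n = 1$, we want to prove that $\tau \ge S_1 \Rightarrow T_0 \ge S_1$. We have from
Lemma 2.1,
$\tau I_{\{\tau \le S_1\}} = (T_0 \wedge S_1) I_{\{\tau \le S_1\}}$.
Define $\tau' = \tau \wedge S_1$. Then $\tau'$ is also an $(\mathcal{F}_t)$-stopping time, and from Lemma
2.1, there exists a constant $T'_0$ such that $\tau' = T'_0 \wedge S_1$ (because $\tau' \le S_1$).
Hence, $\tau' = S_1 \Rightarrow T'_0 \ge S_1$. But $\tau' = S_1$ if and only if $\tau \ge S_1$, so that
$\tau \ge S_1 \Rightarrow T'_0 \ge S_1$. We will prove that $T'_0 = T_0$. For $\tau < S_1$, $T_0 = \tau = \tau' = T'_0$
and (3.2.5) holds for $n = 1$.
Now assume that (3.2.5) is true for $n \ge 1$. We want to prove that $\tau \ge
S_{n+1} \Rightarrow T_i \ge S_{i+1}$ for $i \le n$.
Obviously, $\tau \ge S_{n+1} \Rightarrow \tau \ge S_n$, and we have from the induction assumption
that $T_i \ge S_{i+1}$ for $i \le n - 1$. Thus, it suffices to prove that $\tau \ge
S_{n+1} \Rightarrow T_n \ge S_{n+1}$. Define $\tau' = \tau \wedge S_{n+1}$. Then from Lemma 2.1, $\tau' = T'_n \wedge S_{n+1}$.
From the definition of $\tau'$, $\tau \ge S_{n+1} \Leftrightarrow \tau' = S_{n+1} \Rightarrow T'_n \ge S_{n+1}$. But for
$S_n < \tau < S_{n+1}$, we have from Lemma 2.1 that $T_n = \tau = \tau' = T'_n$, and the result
follows, since $T_n$ and $T'_n$ are $\mathcal{H}_n$-measurable.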
Q.E.D.
Lemma 2.3. Any $(\mathcal{F}_t)$-stopping time $\tau$ has a representation
where $\sigma$ is an $(\mathcal{H}_n)$-stopping time, $T_0$ is a constant and $T_n$ is $\mathcal{H}_n$-measurable for
$n \ge 1$. Conversely, for any $(\mathcal{H}_n)$-stopping time $\sigma$, constant
$T_0 \ge 0$ and $\mathcal{H}_n$-measurable functions $T_n \ge S_n$, $n = 1, 2, 3, \ldots$, $\tau(\sigma, \{T_n\})$ defined
by (3.2.6) is an $(\mathcal{F}_t)$-stopping time.
Proof. For a given $(\mathcal{F}_t)$-stopping time $\tau$, choose $\{T_n\}_0^\infty$ as in Lemma 2.1,
and define
$\sigma = n \ge 1$ if $\tau = S_n$,
$\sigma = +\infty$ otherwise.
Since $\{\sigma = n\} \in \mathcal{H}_n$, $\sigma$ is an $(\mathcal{H}_n)$-stopping time. We will show that the
right-hand side of (3.2.6) is equal to $\tau$.
First, assume that $\sigma = +\infty$. This implies that $\tau \ne S_n$ for $n = 1, 2, \ldots$, so
that $0 \le \tau < S_1$, or $S_m < \tau < S_{m+1}$ for some $m \ge 1$, or $\tau = +\infty$.
If $S_m < \tau < S_{m+1}$ for some $m \ge 1$, then from Lemma 2.2, $T_i \ge S_{i+1}$ for
$i \le m - 1$ and from Lemma 2.1, $\tau = T_m < S_{m+1}$, so that
Similarly for $0 \le \tau < S_1$.
If $\tau = \infty$, then from Lemma 2.2, $T_i \ge S_{i+1}$ for all $i$ and
Now, assume that $\sigma = n$ for some $n \ge 1$. Then $\tau = S_n$ and from Lemma
2.2, $T_i \ge S_{i+1}$ for $i \le n - 1$. We have,
Conversely, let $\tau(\sigma, \{T_i\})$ be defined by (3.2.6) for a given $(\mathcal{H}_n)$-stopping
time $\sigma$, constant $T_0 \ge 0$ and $\mathcal{H}_n$-measurable functions $T_n \ge S_n$ for $n = 1, 2, \ldots$
Define
First, we will show that
For any $\omega \in \Omega$, there exist $n$, $1 \le n \le +\infty$, such that $\sigma = n$, and $m$, $0 \le m \le
+\infty$, such that $T_i \ge S_{i+1}$ for $0 \le i < m$ and $S_{m+1} > T_m$.
Hence,
and $\tau_1 \wedge \tau_2 = T_m \wedge S_n$. On the other hand,
We will show that $\tau_1$ and $\tau_2$ are $(\mathcal{F}_t)$-stopping times.
We have for any $t \ge 0$,
Since $S_n$ is an $(\mathcal{F}_t)$-stopping time, and $\{\sigma = n\} \in \mathcal{F}_{S_n}$, by the definition of $\mathcal{F}_{S_n}$,
so that $\tau_1$ is an $(\mathcal{F}_t)$-stopping time.
Next, we will show that $\tau_2$ is an $(\mathcal{F}_t)$-stopping time.
From the definition,
Since
we have that
$\{T_i \ge S_{i+1}$ for $0 \le i \le m - 1$, $T_m < S_{m+1}\} \in \mathcal{F}_{T_m}$,
and
$\{T_i \ge S_{i+1}$ for $0 \le i \le m - 1$, $T_m < S_{m+1}\} \cap \{T_m \le t\} \in \mathcal{F}_t$,
so that $\{\tau_2 \le t\} \in \mathcal{F}_t$.
Since $\tau_1$ and $\tau_2$ are $(\mathcal{F}_t)$-stopping times, $\tau(\sigma, \{T_i\}) = \tau_1 \wedge \tau_2$ is also an
$(\mathcal{F}_t)$-stopping time.
Q.E.D.
Lemma 2.4. Let $\tau$ be an $(\mathcal{F}_t)$-stopping time, $(\sigma, \{T_i\})$ be a representation of
$\tau$ and let $TC(\tau)$ be the total cost incurred up to time $\tau$. Then,
where $C(0, 0) = C_p$.
Proof. For given $\omega \in \Omega$, we have the following four possibilities: $0 \le \tau <
S_1$, $S_m < \tau < S_{m+1}$ for some $m \ge 1$, $\tau = S_n$ for some $n \ge 1$, or $\tau = +\infty$.
First, assume that there exists $m \ge 1$ such that $S_m < \tau < S_{m+1}$. Then,
$N(\tau) = m$, $\tau \ne S_{N(\tau)}$ and from (3.2.1),
On the other hand, we have from Lemmas 2.1-2.3 that $\sigma \ge m + 1$, $\tau = T_m <
S_{m+1}$ and $T_i \ge S_{i+1}$ for $i \le m - 1$, so that
Similarly for $0 \le \tau < S_1$.
If there is $n \ge 1$ such that $\tau = S_n$, then $N(\tau) = n$, $\tau = S_{N(\tau)}$, and from (3.2.1),
n-L n-l
T C ( r ) = C, + C C(i . S,) + Cf = C C(i. S i ) -
It follows from the proof of Lemma 2.3 that in this case. either o = n and
T, 2 Si+l for i 5 n - 1. or o > n. T, 2 Si,, for i 5 n - 1 and T,, = S,,. In
both cases, we have
n-1
T C ( q (T,}) = C(i. S,) = TC(r) . i=O
Finally. if T = xt we have from (3.2.1) that
X X)
TC(r) = C, + C C(i. S,) + GI = C C(i . S i ) = S . i= t r=O
In this case, a = +w, Ti >_ Si+' for i = 0.1.2, ... and
Q.E.D.
Lemmas 2.3 and 2.4 provide the characterizations of any $(\mathcal{F}_t)$-stopping
time $\tau$ and the corresponding total cost $TC(\tau)$ in terms of $\sigma$ and $\{T_i\}_0^\infty$.
This representation of $\tau$ corresponds to a decomposition of a replacement plan
into competing failure and preventive replacement schedules. We have seen
this explicitly in (3.2.9), where $\tau(\sigma, \{T_i\})$ was written as $\tau_1 \wedge \tau_2$; $\tau_1$ represents
replacement at failure times and $\tau_2$ preventive replacement between failures.
This representation together with the result in Lemma 2.4 leads to the following
equivalent formulation of the continuous time stopping problem in (3.2.2).
Find
$$\lambda^* = \inf_{\sigma} \inf_{\{T_i\}} \frac{E(TC(\sigma, \{T_i\}))}{E(\tau(\sigma, \{T_i\}))} \qquad (3.2.11)$$
and $(\sigma^*, \{T_i^*\})$ minimizing $E(TC(\sigma, \{T_i\}))/E(\tau(\sigma, \{T_i\}))$ (if they exist), where the first
infimum in (3.2.11) is through $(\mathcal{H}_n)$-stopping times $\sigma$ and the second infimum is
through $\{T_i\}$, where $T_i$ is $\mathcal{H}_i$-measurable, $T_i \ge S_i$ for $i \ge 1$, and $T_0 \ge 0$ is a constant.
The optimal stopping time $\tau(\sigma^*, \{T_i^*\})$ and the minimum average cost $\lambda^*$
can be found by solving the following $\lambda$-maximization problem. For $\lambda > 0$,
find
where
$\lambda^* = \sup\{\lambda : V^*(\lambda) < 0\}$ (3.2.14)
and $(\sigma^*, \{T_i^*\})$ maximize the right-hand side of (3.2.12) for $\lambda = \lambda^*$.
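The relation between the ratio problem and the $\lambda$-maximization problem can be illustrated on a much simpler special case than the model of this chapter. The sketch below assumes a classical age replacement policy without minimal repair: replace preventively at age $T$ at cost $C_p$, or at failure at cost $C_p + C_f$, with an assumed hazard $h(t) = 2t$. On a grid of candidate ages it computes the value $\max_T(\lambda \cdot \text{length} - \text{cost})$, finds by bisection the largest $\lambda$ for which this value is negative, and compares it with the directly minimized cost ratio. All names, cost values and the grid are illustrative assumptions.

```python
import math

C_P, C_F = 1.0, 4.0   # assumed preventive replacement cost and failure loss


def cycle_stats(grid_size=2000, t_max=3.0):
    """Return a list of (T, expected cycle length, expected cycle cost).

    Assumes survival function exp(-t^2), i.e. hazard h(t) = 2t.
    """
    dt = t_max / grid_size
    stats = []
    length = 0.0
    prev_surv = 1.0
    for k in range(1, grid_size + 1):
        t = k * dt
        surv = math.exp(-t * t)
        length += 0.5 * (prev_surv + surv) * dt   # E[min(failure time, T)]
        cost = C_P + C_F * (1.0 - surv)           # C_p + C_f * P(failure before T)
        stats.append((t, length, cost))
        prev_surv = surv
    return stats


def value(lam, stats):
    """Value of the lambda-maximization problem on the grid."""
    return max(lam * length - cost for _, length, cost in stats)


def lambda_star(stats, tol=1e-10):
    """Bisection for sup{lam : value(lam) < 0}."""
    lo, hi = 0.0, max(cost / length for _, length, cost in stats)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value(mid, stats) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


stats = cycle_stats()
direct = min(cost / length for _, length, cost in stats)
via_lambda = lambda_star(stats)
```

On the same grid the two numbers agree up to the bisection tolerance, because $\lambda \cdot \text{length} - \text{cost} < 0$ for every candidate exactly when $\lambda$ is below the smallest cost ratio.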
In the next section, we prove the existence of the optimal policy, find its
form and derive a system of differential equations that determines the optimal
policy.
3.3 Optimal Policy
The maximization problem in (3.2.12) can be simplified by removing the martingale
part from $Y_\lambda(\sigma, \{T_i\})$ defined by (3.2.13) through conditioning (semi-martingale
decomposition, see e.g. Jensen (1989)), i.e., we can consider the
following problem. For $\lambda > 0$, find
where
= ZZiY-C(i. Si) + Cf + f . . ( A - h ( t ) C j ) G , (t)dt)Its,+r,} n;;b Its,ii
(3.3.2)
$\bar{F}_s(t) = P(\zeta > t \mid \zeta > s)$, and $\zeta$ is the time to failure of the unit. Then, $\lambda^*$
defined by (3.2.11) can be obtained from (3.2.14) and the optimal stopping
time $\tau(\sigma^*, \{T_i^*\})$ for the problem in (3.2.11) is the stopping time maximizing
$E(\tilde{Y}_\lambda(\sigma, \{T_i\}))$ for $\lambda = \lambda^*$.
Define for $\lambda > 0$,
$T_\lambda = \inf\{t : \lambda \le C_f h(t)\}$.
Then, we have
The first inequality in (3.3.5) follows from the following result:
which holds for any $T_i \ge S_i$, and the third inequality in (3.3.5) follows from
Hence,
$\sup_\sigma E(\tilde{Y}_\lambda(\sigma, \{T_\lambda \vee S_i\}))$
$\ge \sup_\sigma (\sup_{\{T_i\}} E(\tilde{Y}_\lambda(\sigma, \{T_i\}))) = V^*(\lambda)$.
On the other hand, $T_\lambda \vee S_i$ is $\mathcal{F}_{S_i}$-measurable for $i \ge 0$; therefore, $E(\tilde{Y}_\lambda(\sigma, \{T_\lambda \vee
S_i\})) \le V^*(\lambda)$ for any $(\mathcal{H}_n)$-stopping time $\sigma$, so that $\sup_\sigma E(\tilde{Y}_\lambda(\sigma, \{T_\lambda \vee
S_i\})) = V^*(\lambda)$ and (3.3.4) holds. To summarize, we have proved that for a given
$\lambda > 0$, $T_\lambda$ is the optimal preventive replacement time for problem (3.3.1).
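Because the failure rate is continuous and nondecreasing, the preventive replacement time for a given $\lambda$ can be located by bisection. The sketch below reads the threshold as the first age at which $C_f h(t)$ reaches $\lambda$, which is consistent with the integrand $\lambda - h(t)C_f$ appearing in the objective; that reading, the function name, the search bound `t_hi` and the example hazard are assumptions of this illustration.

```python
import math


def preventive_time(lam, c_f, hazard, t_hi=1e6, tol=1e-9):
    """Return inf{t >= 0 : lam <= c_f * hazard(t)} for a nondecreasing hazard.

    Returns math.inf if the threshold is not reached before t_hi
    (t_hi is a search bound assumed for this sketch).
    """
    if lam <= c_f * hazard(0.0):
        return 0.0
    if lam > c_f * hazard(t_hi):
        return math.inf
    lo, hi = 0.0, t_hi   # invariant: threshold not reached at lo, reached at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lam <= c_f * hazard(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical example: h(t) = 2t, C_f = 1, lambda = 3.
t_lam = preventive_time(3.0, 1.0, lambda t: 2.0 * t)
```

For a bounded failure rate that never reaches $\lambda / C_f$, the function returns infinity, which corresponds to never scheduling a preventive replacement.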
To finish solving the maintenance decision problem, it is now sufficient to
find the optimal stopping time for the sequence $\{\tilde{Y}_n(\lambda)\}$, where
LG(X) r 1::; ( - C ( i Sj) + CI + ~2 ( A - h(t)CI)Fs, ( t ) d t ) ~ ~ ~ < ~ + , ) .
(3.3.6)
which is obviously a discrete time stopping problem.
We only need to consider the values of $\lambda < \lambda_0 = \lim_{n, t \to \infty} h(t)EC(n, t)$,
i.e., $\lambda_0$ is the average cost when the minimal repair policy is applied at failure
times and no preventive replacement is planned. It follows from Theorem 4.5'
in Chow et al. (1971), p. 82, that for $\lambda < \lambda_0$ the optimal stopping time $\sigma_\lambda$
maximizing $E\tilde{Y}_\sigma(\lambda)$ exists. From the Remark on page 105 in the same book, $\sigma_\lambda$
has the following form:
Denote for $n \ge 0$,
In particular, for $n = 0$, $t = 0$ and $\lambda = \lambda^*$, where $\lambda^*$ is defined by (3.2.11).
Then for $S_n \le T_\lambda$, the optimal stopping time $\sigma_\lambda$ has the form
From (3.3.8), we have for $S_n > T_\lambda$,
and from (3.3.9), we get for $S_n > T_\lambda$,
It follows from (3.3.4) and (3.3.3) that for given $\lambda < \lambda_0$,
so that $\tau(\sigma_\lambda, \{T_\lambda \vee S_i\})$ is the optimal stopping time maximizing $E(\tilde Y_\lambda(\sigma, \{T_i\}))$.
Hence, if we denote $g_n(t) \equiv g_n(\lambda^*, t)$ and $T \equiv T_{\lambda^*}$, where $\lambda^*$ is defined by
(3.2.11), the optimal repair/replacement policy has the following form: the
unit is minimally repaired at the nth failure time $S_n$ if and only if $S_n \le T$ and
$g_n(S_n) > C(n, S_n)$; otherwise it is replaced. If the unit has not been replaced
at failure times before $T$, it is preventively replaced at time $T$.
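The decision rule just stated can be written as a small function. This is a minimal sketch, not the thesis's implementation: the function name `action_at_failure` and the callable `limit`, which stands in for the repair cost limit functions $g_n(t)$, are hypothetical.

```python
def action_at_failure(n, s_n, repair_cost, limit, t_replace):
    """Decide what to do at the nth failure, which occurs at age s_n.

    limit(n, t) is a hypothetical stand-in for the repair cost limit g_n(t);
    t_replace is the preventive replacement age T.
    """
    # Repair only if the failure occurs no later than T and the observed
    # repair cost is strictly below the repair cost limit.
    if s_n <= t_replace and limit(n, s_n) > repair_cost:
        return "repair"
    return "replace"


# Illustrative (made-up) limit function that decreases in age and in n.
def example_limit(n, t):
    return max(2.0 - 0.5 * t - 0.1 * n, 0.0)


print(action_at_failure(1, 1.0, 1.0, example_limit, 2.97))
print(action_at_failure(1, 1.0, 1.5, example_limit, 2.97))
```

The preventive replacement at age $T$ is a separate, scheduled event; the function only covers the decision taken at a failure epoch.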
Remark 3.1. The function $g_n(t)$ has the following interpretation. From (3.3.5)
and (3.3.9) we have for $\lambda = \lambda^*$,
The last equality in (3.3.14) gives a clear intuitive meaning to $g_n(t)$, which
can be explained as follows. Consider a repair/replacement problem for a unit
installed at age $t \le T$, i.e., we reset $t$ to zero for this system. The failure
rate at time $s$ is then $h'(s) = h(t + s)$ and the repair cost incurred at the ith
failure, occurring at time $s$, is $C(n + i, t + s)$. Noticing that $C(0, S_0) - C'(0, S_0') = C_p$,
which is the installation cost of a new unit, we can assume that the installation
cost of this old unit is equal to zero. The last equality in (3.3.14) says that
$g_n(t) - C_f$ is the optimal value for the corresponding $\lambda$-maximization problem
for the old unit. Hence, $g_n(t) - C_f$ can be interpreted as the residual value of
the unit.
Accordingly, the optimal policy can be described as follows: the unit is
replaced at the nth failure time $S_n$ if and only if the pure repair cost $C(n, S_n) - C_f$
equals or exceeds its residual value $g_n(S_n) - C_f$, or at the time when
its residual value reaches zero (because $g_n(T) = C_f$). In addition, it is
obvious from the above analysis that the optimal policy for the operating unit
installed at age $t$ has the same form as the optimal policy for the new unit, i.e.,
$g_i'(s) \equiv g_{n+i}(t + s)$ is the optimal repair cost limit for the ith failure at time $s$
and a preventive replacement is scheduled to be carried out at time $T - t$.
In the next theorem, we summarize the results concerning the form of the
optimal policy and derive a system of ordinary differential equations for finding
the optimal control limit functions and the optimal preventive replacement
time.
Theorem 3.2. Let $T = T_{\lambda^*}$ and $g_n(t) = g_n(\lambda^*, t)$ for $n \ge 0$, where $\lambda^*$ is the
optimal average cost and $T_\lambda$ and $g_n(\lambda, t)$ are defined by (3.3.3) and (3.3.9),
respectively. Then the optimal policy has the following form:
The unit is repaired at the nth failure time $S_n \le T$ if and only if $g_n(S_n) > C(n, S_n)$;
otherwise it is replaced. If the unit has not been replaced at failure
times before $T$, it is preventively replaced at time $T$.
The optimal control limit functions $\{g_n(t), n \ge 0\}$ and the optimal average
cost $\lambda^*$ satisfy the following system of differential equations with boundary
conditions:
where $R_{n,t}(u) = P(C(n, t) \le u)$, and $T$ is the optimal preventive replacement
time determined by the equation $C_f h(T) = \lambda$. If $C_f h(t) < \lambda$ for all $t$, $T = +\infty$
and the optimal policy is a repair-cost-limit policy.
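For a concrete failure rate, the condition $C_f h(T) = \lambda$ can be inverted in closed form. The sketch below assumes a Weibull hazard $h(t) = (k/s)(t/s)^{k-1}$ with shape $k > 1$ and scale $s$; this parametrization and the function names are illustrative choices, not part of the thesis.

```python
import math


def weibull_hazard_inverse(y, shape, scale):
    """Solve (shape/scale) * (t/scale)**(shape - 1) = y for t."""
    if shape <= 1.0:
        # For shape <= 1 the hazard is not strictly increasing,
        # so the inverse used here is not defined.
        raise ValueError("shape must exceed 1 for an increasing hazard")
    return scale * (y * scale / shape) ** (1.0 / (shape - 1.0))


def preventive_replacement_age(lam, c_f, shape, scale):
    """Age T at which C_f * h(T) equals the average cost lam."""
    return weibull_hazard_inverse(lam / c_f, shape, scale)


# With shape 2 and scale sqrt(2) the hazard is h(t) = t, so T = lam / c_f.
print(preventive_replacement_age(2.97, 1.0, 2.0, math.sqrt(2.0)))
```

With $h(t) = t$ and $C_f = 1$ this returns $T = \lambda$, which agrees with the numerical example in Section 3.4, where $\lambda^*$ and $T$ coincide.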
Proof. From (3.3.4), we have for $t \le T$ by conditioning on $S_{n+1}$ and
By differentiating (3.3.16), we can see that the optimal control functions
$\{g_n(t), n \ge 0\}$ and the optimal average cost $\lambda^*$ satisfy the differential
equations in (3.3.15). The boundary conditions are obtained from (3.2.14), (3.3.3),
(3.3.9) and (3.3.12).
Q.E.D.
In the next theorem, we investigate the case where the repair cost $C(n, t)$
does not depend on $n$.
Theorem 3.3. If $C(n, t) = C(t)$ for all $n$, then $g_n(t) = g(t)$ and the optimal
control function $g(t)$ and the optimal average cost $\lambda^*$ are uniquely determined
by
where $R_t(u) = P(C(t) > u)$.
Proof. It follows from (3.3.8) and (3.3.9) that if $C(n, t) = C(t)$ for all
$n$, $g_n(t)$ does not depend on $n$ and we have from Theorem 3.2 that $(\lambda^*, g(t))$
satisfy (3.3.17) and (3.3.18).
To prove uniqueness, we use the standard arguments from the theory of
ordinary differential equations, namely the existence and uniqueness of local
and global solutions (e.g. Theorem 1.1 and Theorem 3.1 in Hartman (1964)).
Define $A(\lambda) = \inf\{t < \infty : g_\lambda(t) = C_f \text{ and } g'_\lambda(t) \le 0\}$, and $\bar A = \sup_\lambda\{A(\lambda)\}$.
It is easy to see that $\{A(\lambda)\}$ is not empty and therefore $\bar A = \infty$
or $\bar A < \infty$. If $\bar A < \infty$, then there is a sequence of decreasing positive real $\lambda_n$
such that $g_{\lambda_n}(t) = C_f$ and $g'_{\lambda_n}(t) \le 0$, where $\lambda_n \to \lambda$ and $A(\lambda_n) \to \bar A$. We
will show that $g_\lambda(\bar A) = C_f$ and $g'_\lambda(\bar A) = 0$. If $g'_\lambda(\bar A) < 0$, then we can find an
$\epsilon > 0$ such that $g_\lambda(\bar A + \epsilon) < C_f$. Since $g_\lambda(t)$ is continuous in $\lambda$, we can find a
$\lambda' < \lambda$ such that $A(\lambda') > \bar A$, which contradicts the definition of $\bar A$. Similarly,
we can prove $g_\lambda(\bar A) = C_f$.
Hence, $(\lambda, g_\lambda(t))$ is a solution to (3.3.17) and (3.3.18). Obviously, there is
no other $\lambda$ satisfying (3.3.17) and (3.3.18). From $g(0) = C_p$ and from (3.3.9),
we see that $V(\lambda) = 0$, which implies that $g(t) = g_\lambda(t)$ and $T = \bar A$ form the
optimal policy and $\lambda$ is the optimal average cost.
If $\bar A = \infty$, then there is a sequence of decreasing positive real $\lambda_n$ such
that $g_{\lambda_n}(t) = C_f$ and $g'_{\lambda_n}(t) \le 0$, where $A(\lambda_n) \to \infty$. Since $\lambda_n \downarrow \lambda$,
$g_{\lambda_n}(t) \to g_\lambda(t)$ for all $t > 0$. Thus, $g_\lambda(t) \in (C_f, C_p)$ for all $t > 0$. For the
case $P\{g(\infty) < C(\infty)\} > 0$, it is clear from (3.3.17) that $\lambda < \lambda_0$ and the
optimal stopping time satisfies $E\sigma_\lambda < \infty$. $V(\lambda) = 0$ again implies that $g(t) = g_\lambda(t)$
and $T = \infty$ form the optimal policy and $\lambda$ is the optimal average cost. Hence,
in this case, the repair-cost-limit policy is optimal.
In case $P\{g(\infty) < C(\infty)\} = 0$, we see from (3.3.17) that $\lambda = \lambda_0$, which
is the average cost under the minimal repair policy. Hence, although $\sigma_\lambda = \infty$
does not belong to the finite case, (3.3.17) still suggests the form of the optimal
policy, which is the minimal repair policy. The optimality of the minimal repair
policy can be seen from the fact that for each $\lambda < \lambda_0$, $V(\lambda) < 0$. In fact, in
Chapter 2 (see also Jiang et al. (1998)), we proved that this occurs if and only if
00
rd (h ( r ) E C ( r ) ) 5 C, - ess sup,,,C(f).
which is a very rare case. Finally, we should emphasize that in this case, $V(\lambda_0)$
is not necessarily equal to zero.
In general, it is difficult to solve (3.3.15) because the number of
equations is infinite. In the next section, we develop a computational procedure for
finding an $\epsilon$-optimal policy.
3.4 Computational Algorithm
Lemma 4.1. Assume that the unit can be repaired at most $N - 1$ times for
some $N \ge 1$. Then, the optimal repair/replacement policy has the following
form:
the unit is repaired at the nth failure time $S_n \le T$, $n < N$, if and only if
$g_n(S_n) > C(n, S_n)$; otherwise it is replaced. If the unit has not been replaced at
failure times $S_n < T$, $n < N$, it is replaced at the Nth failure time $S_N$ or at $T$,
whichever occurs first. The optimal control functions $g_n(t)$, $0 \le n \le N - 1$, and
the optimal average cost $\lambda_N$ are uniquely determined by the following system
of equations:
If $C_f h(t) < \lambda$ for all $t$, $T = +\infty$.
Proof. Using the same approach as in Sections 2 and 3, one can show that
the optimal stopping time $\tau$ has the following form:
$S_n$, if $C(n, S_n) \ge g_n(S_n)$, $n < N$,
$S_N$, if $C(n, S_n) < g_n(S_n)$ for all $n < N$,
$T$, if no replacement occurred before $T$,
and $(\lambda_N, \{g_n(t), 0 \le n \le N\})$ satisfy (3.4.1). For given $\lambda > 0$, the unique
solution to the differential equations is obtained from (3.3.16).
Observe that $g_0^\lambda(0)$ is a continuous strictly increasing function of $\lambda$, $g_0^0(0) = C_f$
and $\lim_{\lambda\to\infty} g_0^\lambda(0) = \infty$. Hence, there is a unique $\lambda_N$ satisfying $g_0^{\lambda_N}(0) = C_p$,
so that (3.4.1) determines the optimal control functions and the optimal
average cost uniquely.
Q.E.D.
Theorem 4.2. For any $\epsilon > 0$ there exists $N \equiv N(\epsilon)$, such that the solution
to (3.4.1) determines an $\epsilon$-optimal policy, i.e., $0 \le \lambda_N - \lambda^* < \epsilon$, where $\lambda^*$ is
the optimal average cost.
Proof. From Theorem 3.2, the optimal stopping time $\tau$ is either equal to
$+\infty$ (the minimal repair policy with no planned replacement is optimal), or
$E\tau < +\infty$.
i) If $\tau = \infty$, the optimal average cost is $\lambda_0 = \lim_{n,t\to\infty} h(t)EC(n, t)$, and,
obviously, the following policy:
repair the unit $N - 1$ times and replace at the Nth failure time $S_N$ is
$\epsilon$-optimal when $N$ is large enough.
ii) If $E\tau < +\infty$, then,
so that
Hence, for any $\epsilon > 0$, there exists $N(\epsilon)$, such that $\lambda_{N(\epsilon)} - \lambda^* < \epsilon$.
If we further assume that
then we can find an easily computable upper bound for $N(\epsilon)$.
We have,
where $T = h^{-1}(\lambda^*/C_f)$. Put $\lambda' = C_r/ES_1$ and $T' = h^{-1}(\lambda'/C_f) < +\infty$. Then
$T' \ge T$ and since $K_N(t)$ is nondecreasing in $t$, $K_N(T') \ge K_N(T)$. If we choose
$N^*$ to be the smallest $N$ such that $\lambda' K_N(T') \le \epsilon$, then $\lambda_{N^*} - \lambda^* \le \epsilon$, and the
policy determined uniquely by (3.4.1) for $N = N^*$ is an $\epsilon$-optimal policy.
Q.E.D.
Based on Theorem 4.1 and assumption (3.4.2), which is satisfied in most
practical applications, we have the following algorithm for the computation of
the $\epsilon$-optimal policy.
The algorithm.
Step 0. Choose $\epsilon > 0$ and put $\lambda_L = 0$, $\lambda_U = C_r/ES_1$,
$T' = h^{-1}(C_r/(ES_1 C_f))$, $N^* = \min\{N \ge 1 : \lambda' K_N(T') \le \epsilon/2\}$,
$M = \ln(2C_r/(\epsilon ES_1))/\ln 2$, $i = 1$.
Step 1. Put $\lambda = (\lambda_L + \lambda_U)/2$.
Step 2. Put $T = h^{-1}(\lambda/C_f)$, $g_{N^*}^\lambda(t) = C_f$.
Step 3. Find the solution $\{g_0^\lambda(t), \dots, g_{N^*-1}^\lambda(t)\}$ to (3.4.1).
Step 4. If $i \ge M$, or $g_0^\lambda(0) = C_p$, go to Step 5.
Otherwise, put $i = i + 1$ and compare $g_0^\lambda(0)$ and $C_p$.
If $g_0^\lambda(0) < C_p$, put $\lambda_L = \lambda$ and go to Step 1.
If $g_0^\lambda(0) > C_p$, put $\lambda_U = \lambda$ and go to Step 1.
Step 5. Stop. $\{g_n^\lambda(t), n \le N^*\}$ and $T$ determine an $\epsilon$-optimal policy and
$\lambda$ is the corresponding average cost.
It follows from Lemma 2.2 that $\lambda_{N^*} - \lambda^* \le \epsilon/2$ and from the definition of
$M$ in Step 0, $|\lambda - \lambda_{N^*}| \le \epsilon/2$, so that $|\lambda - \lambda^*| \le \epsilon$.
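The outer loop of the algorithm (Steps 1 to 4) is a bisection on the average cost. The following sketch shows only that loop and rests on stated assumptions: the callable `g0_at_zero` is a hypothetical stand-in for Step 3, i.e. for solving (3.4.1) at a given value of the average cost and returning the resulting value of the control function at age zero, which the text states is continuous and strictly increasing in the average cost. The iteration count is rounded up to an integer here.

```python
import math


def bisect_average_cost(g0_at_zero, c_p, lam_upper, eps):
    """Bisection for the average cost at which g0_at_zero(lam) equals c_p.

    g0_at_zero: hypothetical callable standing in for the ODE solve of Step 3.
    lam_upper:  upper bound on the average cost (the Step 0 upper value).
    """
    lam_lo, lam_hi = 0.0, lam_upper
    # Number of halvings after which the interval is no longer than eps / 2.
    max_iter = max(math.ceil(math.log(2.0 * lam_upper / eps) / math.log(2.0)), 0)
    lam = 0.5 * (lam_lo + lam_hi)
    for _ in range(max_iter):
        lam = 0.5 * (lam_lo + lam_hi)
        value = g0_at_zero(lam)
        if value == c_p:
            break
        if value < c_p:
            lam_lo = lam  # the average cost is too small
        else:
            lam_hi = lam  # the average cost is too large
    return lam


# Made-up increasing stand-in for Step 3; its root is sqrt(0.7).
def toy_g0(lam):
    return lam * lam + 0.3


print(bisect_average_cost(toy_g0, 1.0, 4.0, 0.01))
```

Because the returned value is the midpoint of an interval that still brackets the root, its distance from the root is at most eps / 2 once the loop has run its full length.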
Remark 4.3. In case the repair cost $C(n, t)$ does not depend on $n$, the
algorithm for the computation of an $\epsilon$-optimal policy remains the same except for
Step 3, where we need to solve only one differential equation, which is easier.
To illustrate the computational algorithm, we consider the following
numerical example.
Example. Put $\epsilon = 0.01$, $C_p = 1$, $C_f = 1$, $h(t) = t$ (Weibull hazard function
with the shape parameter $\alpha = 2$) and let $C(n, t) = (n + 5)/6 + U$, where $U$ is
uniformly distributed on the interval $[0, 2]$.
Figure 3.1: Optimal repair cost limits.
Using the above algorithm and MATLAB, we obtained $N^* = 15$, $M = 9.3$,
$\lambda^* = T = 2.97$ and the optimal repair cost limit functions depicted in
Figure 3.1. The optimal repair cost limit functions $\{g_n(t)\}$ determine the
optimal policy. When the ith failure occurs at time $t < T$, the observed repair
cost $C(i, t)$ is compared with $g_i(t)$. If $C(i, t)$ is less than $g_i(t)$, the unit is
repaired; otherwise it is replaced by a new unit. If no replacement has been
carried out before $T$, the unit is preventively replaced at time $T$. This plot
also provides useful information regarding the residual value of the unit. If a
unit that has age t has been repaired n times. then its residual value is equal
to g,(t) - C,. In particular. a new unit has residual value go( t ) = C, - Cf.
which is equal to the preventive replacement cost C,.
3.5 Optimal Policy in the Discounted Cost Case
In this section, we examine the structure of the optimal policy minimizing
the total expected discounted cost associated with the maintenance model
described in Section 3.3.
Let $\alpha > 0$ be the discount factor and $TC_\alpha(\tau_i)$ be the total cost of the i-th
replacement cycle.
Consider the repair/replacement policy determined by a sequence of
stopping times $\tau = (\tau_i, i = 1, 2, \dots)$, where $\tau_i$ is the replacement time of the i-th
unit. Then the total discounted cost over an infinite time horizon has the form
It is not difficult to see that it is sufficient to consider the stationary
policy, i.e., $\tau_i = \tau$ for all $i$. The original optimization problem can then be
reduced to the following optimal stopping problem. Find
$$\inf_\tau \frac{E(TC_\alpha(\tau))}{E(1 - e^{-\alpha\tau})} \qquad (3.5.2)$$
and the optimal stopping time $\tau_\alpha^*$ (if it exists) minimizing the expression in
(3.5.2).
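The ratio form can be checked with a short renewal argument. The derivation below is a sketch under two assumptions that the stationary policy suggests but that are stated here explicitly: the replacement cycles are independent and identically distributed, and the cycle cost $TC_\alpha(\tau_i)$ is discounted to the start of the i-th cycle. The symbol $K_\alpha$ for the total expected discounted cost is a label chosen for this sketch.

```latex
\begin{align*}
K_\alpha
  &= E\sum_{i \ge 1} e^{-\alpha(\tau_1 + \dots + \tau_{i-1})}\, TC_\alpha(\tau_i) \\
  &= E\bigl(TC_\alpha(\tau)\bigr) \sum_{i \ge 1} \bigl(E e^{-\alpha\tau}\bigr)^{i-1} \\
  &= \frac{E\bigl(TC_\alpha(\tau)\bigr)}{1 - E e^{-\alpha\tau}}
   = \frac{E\bigl(TC_\alpha(\tau)\bigr)}{E\bigl(1 - e^{-\alpha\tau}\bigr)} .
\end{align*}
```

The second line uses independence of the cycles, and the geometric series converges because $E e^{-\alpha\tau} < 1$ whenever $\alpha > 0$ and $\tau > 0$ with positive probability.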
Using the notation of Section 3.3, the total discounted cost $TC_\alpha(\tau)$ has
the form
TGAr)
= ~ ~ ~ ~ ( ( C ( i . s,) - C1)e-." + Cfe-QS1+l Its.+lgJ )I{s.+T.I n;=b ~ s , ~ s T , 1
where C(0,O) 6 C,: and the stopping time T has the representation given by
(3 .213): T = ~ ( q {r}).
Next?
= X~Z: $2 e - Q ' ~ ~ s , + 1 2 t l d t ~ f , b 4sJt,g,) 1 do. {T , ) ) ,
Applying the $\lambda$-maximization technique and the semi-martingale
decomposition, we obtain the formulas for $Y_\lambda^\alpha(\sigma, \{T_i\})$ and $\tilde Y_\lambda^\alpha(\sigma, \{T_i\})$, corresponding to
$Y_\lambda(\sigma, \{T_i\})$ and $\tilde Y_\lambda(\sigma, \{T_i\})$ in Section 3.3. We have,
and
nf=b rt,+,,,}.
As in Section 3.3,
$T_\lambda = \inf\{t : \lambda \le C_f h(t)\}$
is the optimal preventive replacement time.
We have that
where
and
$V^\alpha(\lambda) = \sup_\sigma(\sup_{\{T_i\}} E(\tilde Y_\lambda^\alpha(\sigma, \{T_i\})))$ (3.5.10)
Finally, the maintenance decision problem reduces to the optimal stopping
problem for the sequence $\{W_n^\alpha(\lambda)\}$ defined by
The optimal stopping time $\sigma_\alpha(\lambda)$ has the form
Hence, the optimal repair/replacement policy has the same form as in the
average cost case, i.e., the unit is minimally repaired at the nth failure time
$S_n$ if and only if $S_n \le T$ and $g_n^\alpha(\lambda_\alpha^*, S_n) > C(n, S_n)$; otherwise it is replaced.
$\lambda_\alpha^*$ is obtained from (3.5.8). If the unit has not been replaced at failure times
before $T$, it is preventively replaced at time $T$, where $T \equiv T_{\lambda_\alpha^*}$ is determined
by (3.5.6).
Similar to the average cost case, we have the following recursive equation
for $g_n^\alpha(\lambda_\alpha^*, t)$. For simplicity, denote $g_n^\alpha(\lambda_\alpha^*, t)$ as $g_n(t)$, and $\lambda_\alpha^*$ as $\lambda$.
And the differential form of the above equation is,
If the repair cost $C(n, t)$ does not depend on $n$, the optimal control function
$g^\alpha(\lambda_\alpha^*, t)$ and $\lambda_\alpha^*$ are obtained as the unique solution to the following differential
equation with the boundary conditions,
$g(0) = C_p$, $g(T) = C_f$, $C_f h(T) = \lambda$, (3.5.16)
where $R_t(u) = P(C(t) > u)$.
Note that when $\alpha \to 0$, equation (3.5.15) has the same form as equation
(3.4.1) for the average cost case. Therefore, $\lambda_\alpha^*$ decreases to $\lambda^*$ (where $\lambda^*$
is the optimal expected average cost), and the repair cost limit function for
the discounted cost case $g^\alpha(\lambda_\alpha^*, t)$ increases to $g(\lambda^*, t)$, the repair cost limit
function for the average cost case. Therefore, the expected average cost case
can be viewed as a special case of the expected discounted cost case.
Summary of Notation
$C(n, t)$: repair cost of the nth failure at age $t$
$C_f$: failure loss
$C_p$: preventive replacement cost
$F(t)$: distribution function of the first failure time
$g_n(t)$: repair cost limit function for the n-th failure
$h(t)$: failure rate
$K_\alpha$: total discounted cost over an infinite time horizon
$S_n$: n-th failure time
$T_\lambda$: $\inf\{t : \lambda \le C_f h(t)\}$
$TC(t)$: total cost incurred up to age $t$
$V(\lambda)$: value function for the $\lambda$-maximization problem
$Y_\lambda(\sigma, \{T_i\})$: the objective function for the $\lambda$-maximization problem
$\alpha$: discount factor
$\lambda^*$: optimal average cost
$\tau(\sigma^*, \{T_i^*\})$: optimal stopping time
Chapter 4
OPTIMAL MAINTENANCE
POLICY FOR A GENERAL
REPAIR MODEL
4.1 Introduction
The concept of general repair was introduced by Kijima et al. (1988). They
assumed that the general repair can improve the condition of the system by
decreasing its virtual age. Minimal repair and replacement are special
cases of the general repair: the former does not change the system's age and
the latter reduces the system's age to zero.
The effect of a general repair on the system's condition is described by
a repair degree. The repair degree $\theta = 0$ corresponds to replacement, $\theta = 1$ to
minimal repair, and $0 < \theta < 1$ to a general repair that improves the system's
condition.
The repair degree determines the virtual age that describes the condition
of the system. Two variants of the definition were proposed in Kijima (1989):
Type I, $V_n = V_{n-1} + \theta_n X_n$, and Type II, $V_n = \theta_n(V_{n-1} + X_n)$, where $V_n$ is
the virtual age after the n-th repair, $X_n$ is the length of the operating time
between the (n-1)-th and the n-th repair and $\theta_n$ is the repair degree of the
n-th repair.
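The two virtual-age recursions are easy to compare numerically. The sketch below applies the Type I update (previous virtual age plus repair degree times the last operating time) and the Type II update (repair degree times the sum of previous virtual age and last operating time) to the same history; the function names and the sample data are illustrative only.

```python
def kijima_type1(v_prev, x, theta):
    """Type I: the repair only affects the age accumulated since the last repair."""
    return v_prev + theta * x


def kijima_type2(v_prev, x, theta):
    """Type II: the repair affects the whole accumulated virtual age."""
    return theta * (v_prev + x)


def virtual_ages(operating_times, repair_degrees, update):
    """Virtual ages after each repair, starting from a new unit (age 0)."""
    ages = []
    v = 0.0
    for x, theta in zip(operating_times, repair_degrees):
        v = update(v, x, theta)
        ages.append(v)
    return ages


times = [1.0, 2.0]
degrees = [0.5, 0.5]
print(virtual_ages(times, degrees, kijima_type1))  # [0.5, 1.5]
print(virtual_ages(times, degrees, kijima_type2))  # [0.5, 1.25]
```

With repair degree 1 both updates add the full operating time, which is the minimal repair case; with repair degree 0 the Type II update returns the unit to age zero.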
Some generalizations and modifications have also been proposed in the
literature. For example, Baxter et al. (1996) considered a model with $V_n = Y$,
where $Y$ is a random variable.
Kijima et al. (1988) considered a periodic replacement policy for a
repairable system with general repair. Makis and Jardine (1991) proved the
optimality of T-policy for their model, and Stadje and Zuckerman (1991) showed
that a bang-bang policy is optimal if the repair degree is a decision variable.
Dagpunar (1998) studied properties of a general repair process considering
Kijima's model 11, established monotonicity results for such a process and pre-
sented a computational method for evaluating repair density and the expected
number of repairs in a given time interval.
Scarsini and Shaked (2000) obtained bounds on the expected total profit
generated by an item subject to general repair and Zhang and Love (2000)
considered a Markov chain model under general repair and developed a simple
recursion to determine the optimal replacement policy.
Models with minimal repair and random repair cost have been studied
under different assumptions by Beichelt (1993), L'Ecuyer and Haurie (1987),
Jiang et al. (1998), and Makis et al. (2000), among others. In each case,
the optimality of a repair-cost-limit policy was established. The
repair-cost-limit policy prescribes replacement of a unit at failure time if the repair cost
exceeds a certain limit, and this repair cost limit at age $t$ can be interpreted as
the residual value of the unit. The age replacement policy is a special case of
the repair-cost-limit policy, i.e., when the residual value of the unit decreases
to zero, a preventive replacement should be carried out.
The objective of this chapter is to investigate the structure of the average
cost optimal policy for a model with general repair and preventive replacement.
This model is a natural extension of the minimal repair models presented
in Chapters 2 and 3. Using results from the theory of jump processes, the
$\lambda$-maximization technique, semi-martingale decomposition and a dynamic
programming approach, we will show that the optimal policy is a combination of
a repair-cost-limit policy for failure replacement and a virtual age-based
preventive replacement. We consider Kijima's Type I model with general repair;
other models can be analyzed using the same approach.
4.2 Model Description and the Main Result
We make the following assumptions.
1. System deterioration.
The time to failure has distribution function $F(t)$, density $f(t)$ and the
failure rate $h(t) = f(t)/(1 - F(t))$, which is a nondecreasing function of
$t$, i.e., $F$ is IFR (increasing failure rate). For simplicity assume that $F$
has full support on $(0, +\infty)$, and $h(\infty) = \infty$.
2. Maintenance actions considered: preventive replacement, failure
replacement, and general repair. All actions take negligible time.
3. Cost structure. Preventive replacement cost $C_p$ and failure loss $C_f$
are assumed to be given constants. The repair cost $C(v)$ is a random
function of the virtual age $v$ with distribution function $G_v(c)$, and
$C(v_1) \le C(v_2)$ (stochastically) for any $v_1 < v_2$. The costs incurred
at the n-th repair epoch, preventive replacement epoch and failure
replacement epoch are $C(V_n^-) + C_f$, $C_p$ and $C_p + C_f$, respectively, where
$V_n^- = V_{n-1} + X_n$ is the virtual age just prior to the n-th repair. The
repair cost is observable at failure time.
4. Type of general repair. Kijima's Type I general repair model is considered,
i.e.,
$V_n = V_{n-1} + \theta(V_n^-, C(V_n^-)) X_n.$
The repair degree $\theta(v, C(v))$ is a random function of the virtual age $v$
and the random repair cost $C(v)$. We assume that $\theta(u, G_u^{-1}(p)) \le \theta(v, G_v^{-1}(p))$
for any $0 \le p \le 1$ and $u < v$, where $G_v^{-1}$ is the inverse
function of $G_v$. We will write $\theta(v, C)$ as $\theta(v)$. The repair degree represents
the available information about the system's condition after a repair.
We assume that $1 \ge \theta(v, C) \ge \epsilon > 0$.
5. Objective.
Find the repair/replacement policy minimizing the long-run expected
average cost per unit time.
Figure 4.1 illustrates a sample path of the failure rate and the repair cost,
where $S_i$ is the ith failure time and $C_i$ is the repair cost at time $S_i$. The
state of the operating unit is described by $(v, x)$, where $v$ is the virtual age
immediately after the last repair and $x$ is the length of the operating time since
the last repair. The state of the unit at the n-th failure epoch is described by
$(v, x, C)$, where $(v, x) = (V_{n-1}, X_n)$ and $C = C(V_n^-)$ is the repair cost. Finally,
the state immediately after the n-th repair is $(V_n, 0)$. We write this state as
$V_n$.
Figure 4.1: Sample path of the failure rate and the repair cost.
The main result obtained in this chapter is summarized in the following
theorem.
Theorem 2.1. The optimal policy exists, and it is a combination of a
generalized repair-cost-limit policy for failure replacement and an age-based policy
for preventive replacement, i.e., the optimal replacement time $\tau$ is determined
$S_{n-1} + (T(V_{n-1}) - V_{n-1})$, if no replacement occurred at or before $S_{n-1}$
and $S_{n-1} + (T(V_{n-1}) - V_{n-1}) < S_n$,
(4.2.1)
where $S_n$ is the n-th failure time and $C_n \equiv C(V_n^-)$ is the repair cost at $S_n$.
The repair cost limit $g_1^\theta(v, x, C)$ is calculated from
$g(t)$ satisfies the following integral equation
$$g(t) = \int_t^{\infty} \bar F(u)\,[\lambda - h(u)(C_f - g_2(t, u - t))]^+\,du \,/\, \bar F(t), \qquad (4.2.3)$$
where
$$g_2(v, x) = E^C(g_1(v, x, C)), \qquad (4.2.4)$$
and
$$g_1(v, x, C) = [E^\theta[g(v + \theta(v + x, C)x)] - C]^+, \qquad (4.2.5)$$
$g(0) = C_p$, and $g(T) = 0$ for $T = \inf\{t : \lambda \le h(t)C_f\}$. Here we use the
notation $(x)^+ = \max(x, 0)$.
where
and $\lambda$ is the optimal average cost.
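To make the two nested expectations concrete, the sketch below evaluates the residual value of a failed unit for an observed repair cost (the positive part of the expected post-repair value minus the cost) and then averages it over the repair cost. It is a simplified illustration: the repair degree and the repair cost are given finite discrete distributions that do not depend on the state, whereas in the model both may depend on the virtual age and the cost. The function names and the example value function `g` are hypothetical.

```python
def residual_given_cost(g, v, x, cost, theta_dist):
    """Positive part of E_theta[g(v + theta * x)] - cost.

    theta_dist is a list of (theta, probability) pairs.
    """
    expected_after_repair = sum(p * g(v + theta * x) for theta, p in theta_dist)
    return max(expected_after_repair - cost, 0.0)


def mean_residual_at_failure(g, v, x, cost_dist, theta_dist):
    """Average of residual_given_cost over a list of (cost, probability) pairs."""
    return sum(q * residual_given_cost(g, v, x, c, theta_dist) for c, q in cost_dist)


# Made-up decreasing value function that vanishes at age 2.
def g(t):
    return max(1.0 - t / 2.0, 0.0)


theta_dist = [(0.5, 0.5), (1.0, 0.5)]
cost_dist = [(0.2, 0.5), (0.5, 0.5)]
print(residual_given_cost(g, 0.5, 1.0, 0.2, theta_dist))
print(mean_residual_at_failure(g, 0.5, 1.0, cost_dist, theta_dist))
```

A zero value of `residual_given_cost` corresponds to the case in which the observed repair cost is at least the expected post-repair value, which is the failure replacement condition of the theorem.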
Remark 2.2. Functions $g$, $g_0$, $g_1$, $g_1^\theta$ and $g_2$ have the following meaning.
$g(v) = g_0(v, 0)$ is the residual value of the repaired unit in state $(v, 0)$.
$g_0(v, x)$ is the residual value of the unit in operating state $(v, x)$.
$g_1(v, x, C)$ is the residual value of the failed unit before repair, given that
the repair cost is $C$.
$g_1^\theta(v, x, C)$ is the residual value of the unit after repair.
$g_2(v, x)$ is the mean residual value of the failed unit in state $(v, x)$ before
assessing the repair cost.
The optimal policy can be expressed as follows.
Preventive replacement is carried out at virtual age $T(v)$ when the residual
value of the unit $g_0(v, T(v) - v)$ reaches 0.
Failure replacement is carried out at failure time $S_n$ when $g_1(V_{n-1}, X_n, C_n)$
reaches 0, or equivalently, when the repair cost $C_n = C(V_n^-)$ is greater than or
equal to $g_1^\theta(V_{n-1}, X_n, C_n)$.
To summarize, the unit is replaced when its residual value reaches zero.
In addition, it is easy to see that $T$ is the first time such that $g(T) = 0$,
i.e., the residual value of a unit starting with virtual age $T$ is equal to zero.
To prove the theorem, we formulate the decision problem in continuous
time and then reduce it to a discrete time optimal stopping problem by using
results obtained in Chapter 3. Finally, the form of the optimal policy is found
by using a dynamic programming approach.
4.3 Problem Formulation and Analysis
Let $t$ be the time of the first replacement if a new system is installed at time
zero. We define $TC(t)$, the total cost incurred up to time $t$, as
where $S_i$ is the i-th failure time, $S_0 = 0$, $C_i$ is the repair cost at the i-th
failure time, $N(t)$ is the number of failures before $t$, $N(t) = \sum_{i=1}^{\infty} I_{\{S_i < t\}}$, and
$I$ is the set indicator function. We denote $\theta_i$ as the repair degree of the i-th
repair.
Let $(\mathcal F_t)$ be the (completed) natural filtration of process $\{TC(t), t \ge 0\}$.
For the average cost criterion, the maintenance decision problem can be
formulated as follows. Find an $(\mathcal F_t)$-stopping time $\tau^*$, if it exists, minimizing
the long-run expected average cost per unit time given by
By using Lemmas 2.3 and 2.1 in Chapter 3, we will show that this
continuous time stopping problem can be reduced to a discrete time stopping problem.
To keep this chapter self-contained and the notation consistent, we
restate these two lemmas as Lemmas 3.1 and 3.2 below.
Lemma 3.1. Any (Ft)-stopping time T has a representation
where 0 is an (%.)-stopping time, TA is a constant and TA i s %,-measurable
for n 2 1. R, = C T { ( S , . C ~ , O ~ ) , ~ 5 n } FSn. We define 1. Conversely,
for any (?in)-stopping time a. constant Ti 2 0 and R,-measurable functions
TA 2 S,, n = 1: 2.3 ...: r(o. {T;}) defined bg (4.3.3) is an (Ft)-stopping time.
Lemma 3.2. Let $\tau$ be an $(\mathcal F_t)$-stopping time, $(\sigma, \{T_i\})$ be a representation
of $\tau$ and let $TC(\tau)$ be the total cost incurred up to time $\tau$. Then,
where $C_0 = C_p + C_f$.
Lemmas 3.1 and 3.2 provide the characterizations of any $(\mathcal F_t)$-stopping
time $\tau$ and the corresponding total cost $TC(\tau)$ in terms of $\sigma$ and $\{T_i\}_{i=0}^{\infty}$. This
representation of $\tau$ corresponds to a decomposition of a replacement policy
into competing unscheduled failure replacement $\sigma$ and scheduled preventive
replacement $\{T_i\}_{i=0}^{\infty}$, and it leads to the following equivalent formulation of
the continuous time stopping problem in (4.3.2). Find
and $(\sigma^*, \{T_i^*\})$ minimizing $E(TC(\sigma, \{T_i\}))/E(\tau(\sigma, \{T_i\}))$ (if they exist), where the infimum is taken
over $(\mathcal H_n)$-stopping times $\sigma$ and random variables $\{T_i\}$ such that $T_i$ is $\mathcal H_i$-measurable and
$T_i \ge S_i$ for $i \ge 1$, and $T_0 \ge 0$ is a constant.
It is clear that this problem is equivalent to the following sequential mini-
mization problem
A' = inf
In fact, if $\lambda_s < \lambda^*$, then there exists a $\tau' = (\sigma', \{T_i'\})$ such that its
corresponding average cost $\lambda' < \lambda^*$,
which is a contradiction.
The optimal stopping time $\tau(\sigma^*, \{T_i^*\})$ and the minimum average cost $\lambda^*$
can be found by solving the following $\lambda$-maximization problem (see e.g. Aven
and Bergman (1986)). For $\lambda > 0$, find
where
Then,
$\lambda^* = \sup\{\lambda : V(\lambda) < 0\}$
and $(\sigma^*, \{T_i^*\})$ maximizes the right-hand side of (4.3.7) for $\lambda = \lambda^*$.
The maximization problem in (4.3.6) is further simplified by removing the
martingale part from $Y_\lambda(\sigma, \{T_i\})$ defined by (4.3.7) through conditioning
(semi-martingale decomposition, see e.g. Jensen (1989)), i.e., we can consider the
following problem. For $\lambda > 0$, find
where
be obtained from (4.3.8) and the optimal stopping time $\tau(\sigma^*, \{T_i^*\})$ for the
problem in (4.3.5) is the stopping time maximizing $E(\tilde Y_\lambda(\sigma, \{T_i\}))$ for $\lambda = \lambda^*$.
From the last expression in (4.3.10), it is clear that, although $T_n$ by
definition is an $\mathcal H_n$-measurable function, it can be further restricted to depend
only on $V_n$ without loss of optimality because of the Markovian property of
the model. $T_n$ represents the virtual age of the system just prior to the n-th
scheduled preventive replacement time.
Thus we have reduced the original decision problem to a discrete time
stopping problem (for $\sigma$), with an embedded one-dimensional optimization
problem at each stage (for $T_n$ at stage $n$). We can now use a dynamic programming
approach to solve this problem.
4.4 Dynamic Programming Approach
To find the form of the optimal policy, we only need to consider the following
maximization problem:
where $E_t$ is the expectation for the system starting at virtual age $t$. For $\lambda$
equal to the optimal average cost, $g(\lambda, t)$ represents the residual value of the
unit with the initial (virtual) age $t$. In particular, $g(\lambda, 0) = V(\lambda) + C_p$.
Conditioning on the information at the first failure epoch, we obtain the
following dynamic equation:
where $g_{2,\lambda}(v, x) = E^C[g_{1,\lambda}(v, x, C)]$, $g_{1,\lambda}(v, x, C) = (g_{1,\lambda}^\theta(v, x, C) - C)^+$,
and $g_{1,\lambda}^\theta(v, x, C) = E^\theta[g(\lambda, v + \theta(v + x, C)x)]$.
The first equality is obtained from the backward optimality property of
the optimal policy, and obviously, "$\le$" becomes "$=$" iff $\sigma$ is of the following
form: replace the unit at the first failure time, i.e., $\sigma = 1$, if $g_{1,\lambda}(t, s - t, C) = 0$,
or $g_{1,\lambda}^\theta(t, s - t, C) \le C$; otherwise repair it, i.e., $\sigma > 1$. Obviously, this
$\sigma$ corresponds to the optimal policy for failure replacement, and $g_{2,\lambda}(v, x)$
and $g_{1,\lambda}(v, x, C)$ have the interpretation of the residual values as discussed in
Remark 2.2.
Later we prove that $g(\lambda, t)$ decreases in $t$. From this and from the
monotonicity of $C(t)$ and $\theta(t)$, one can show that $g_{2,\lambda}(v, x - v)$ decreases in $x$ for
$x > v$. Hence,
$[\lambda - h(s)(C_f - g_{2,\lambda}(t, s - t))]\bar F(s)$
decreases in $s$ beyond zero. From (4.4.2), we have that
$g(\lambda, t)$
$= \int_t^{\infty} [\lambda - h(s)(C_f - g_{2,\lambda}(t, s - t))]^+ \bar F(s)\,ds / \bar F(t)$
$= \int_t^{T(t)} [\lambda - h(s)(C_f - g_{2,\lambda}(t, s - t))] \bar F(s)\,ds / \bar F(t),$
where $T(t) = \inf\{s \ge t : \lambda \le h(s)[C_f - g_{2,\lambda}(t, s - t)]\}$.
Thus, if $g(\lambda, t)$ decreases in $t$, $T(t)$ maximizes the right side of (4.4.2).
Obviously, $T(t)$ is the virtual age corresponding to the optimal preventive
replacement time.
Lemma 4.1. $T(T) = T$, and it is the lower bound of $T(v)$. $T(v)$ is also
bounded from above. Denote this upper bound as $T_U < \infty$.
Proof. From the definitions of $T(v)$ and $T$, and from $g_{2,\lambda}(v, x) \ge 0$, we have
that $T(v) \ge T$. Since $g_{2,\lambda}(T, 0) = 0$, we have that $T(T) = T$. Hence, $T$ is the
lower bound of $T(v)$.
On the other hand, $T_U \le T/\epsilon$, where $\epsilon$ is the lower bound for $\theta(v, c)$. To
see this, we need to prove that for any $v < T$, $T(v) \le T/\epsilon$.
We first notice that from (4.4.2), $g(\lambda, s) = 0$ for $s > T$. Therefore,
$v + \theta(T/\epsilon, c)(T/\epsilon - v) > T$ and
$g_{2,\lambda}(v, T/\epsilon - v)$
$= E^C[g_{1,\lambda}(v, T/\epsilon - v, C)]$
$= E^C[(E^\theta[g(\lambda, v + \theta(T/\epsilon, C)(T/\epsilon - v))] - C)^+]$
$= 0,$
and from $h(T/\epsilon)C_f > \lambda$, we have that $T(v) \le T/\epsilon$. Consequently, $T_U = \sup_v T(v) \le T/\epsilon$.
This completes the proof.
Q.E.D.
Now, the key step is to show the monotonicity of $g(\lambda, t)$. In the rest of this
section, we first prove several lemmas which will lead to the monotonicity of
$g(\lambda, t)$.
Define operator $L$ on $R_p$, the space of continuous, decreasing, non-negative
functions with support belonging to $[0, T]$, as
and write (4.4.2) using (4.4.4) as
$L$ can be represented as
$$L(W)(t) = \int_t^{T_U} \bar F(u)\,[\lambda - h(u)(C_f - W_2(t, u - t))]^+\,du \,/\, \bar F(t), \qquad (4.4.5)$$
where $W_2(t, u - t) = E^C[(E^\theta[W(t + \theta(t, C)(u - t))] - C)^+]$. We prove that $L$
is a contraction operator. Define
as the norm. Clearly, this function space is a complete metric space, i.e.,
Banach space.
Lemma 4.2. If $W(t)$ decreases in $t$, then $W_2(v, x)$ decreases in both $v$ and $x$.
In addition, $W_2(v, t - v)$ decreases in $v$ for $v \le t$.
Proof. First. we prove the monotonicit? of i;(o, x) in both arguments.
We have for ul 5 L I Z ? xl 5 1 2 :
II'; (,ul . x L )
= LV&*, 4.
The first inequality follows from the monotonicity of W ( t ) and B(t. C) and the
second one from the monotonicity of W ( t ) and C( t ) .
Next. we prove that W 2 ( u . t - u ) is decreasing in u for u 5 t. We have for
u l < u2 < t ,
Therefore,
LC>(ul, t - u l )
Q.E.D.
Lemma 4.3. $L$ is a contraction operator with contraction factor $\alpha = F(T_U) < 1$.
Proof. We have for any bounded functions $W^1$ and $W^2$,
1 1 L(LV1) - L(W2)l1
= S U ~ ~ I S U ~ ( ~ ~ ~ ~ , . ~ $:[A - h(s)(CI - CC;l(t. s - t ) ) ] F ( s ) d s / F ( t )
- S U P ( S S T ~ ) $:[A - h( s ) (Cf - bv(t+ s - t ) ) ] F ( s ) d s / F ( t ) 1
5 supt { s u ~ ( ~ < ~ ~ - J: 1 Vl;?l ( t . s - t ) - CC;2 ( t , s - t ) ( f ( . s ) d s / T ( t ) }
5 ~ ~ p t { s ~ ~ ( s g ~ , I: IEC[E6(W + 8(#s+ C ) ( S - t ) ) ) ] (4.4.8)
- ~ ~ [ E ' ( t r * ~ ( t + B ( s . C)(S - t ) ) ) ] l f ( s ) d s / T ( t ) }
I S " P ~ { ~ " P { S ~ T ~ } f'; l - qIf ( s ) d s / m ) }
5 supt 1 lwl - w21 I ( F ( t ) - P ( r , ) ) / F ( t )
F(Tu)lllC'l - W2jI,
which proves the contraction property.
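A contraction can be iterated to its fixed point, which is how an operator of this kind would be used numerically. The sketch below is an illustration under strong simplifying assumptions that are not from the thesis: the failure rate is $h(t) = t$, the repair degree is a constant `theta`, the repair cost is a fixed number `cost`, so that the inner expectation reduces to the positive part of a single interpolated value, and the integral is approximated by the trapezoidal rule on a uniform grid up to `t_max`. All names and parameter values are hypothetical.

```python
import math


def hazard(t):
    # Assumed failure rate h(t) = t (Weibull with shape 2).
    return t


def survival(t):
    # Survival function matching h(t) = t.
    return math.exp(-0.5 * t * t)


def apply_operator(w, grid, lam, c_f, theta, cost):
    """Apply a discretized version of the operator once to the grid values w."""
    step = grid[1] - grid[0]
    t_max = grid[-1]

    def w_at(s):
        # Linear interpolation; the function is taken to be zero beyond the grid.
        if s >= t_max:
            return 0.0
        k = min(int(s / step), len(w) - 2)
        frac = s / step - k
        return (1.0 - frac) * w[k] + frac * w[k + 1]

    result = []
    for i, t in enumerate(grid):
        integrand = []
        for u in grid[i:]:
            # Mean residual value at failure for constant theta and fixed cost.
            w2 = max(w_at(t + theta * (u - t)) - cost, 0.0)
            integrand.append(survival(u) * max(lam - hazard(u) * (c_f - w2), 0.0))
        # Trapezoidal rule on [t, t_max].
        integral = sum(0.5 * step * (a + b) for a, b in zip(integrand, integrand[1:]))
        result.append(integral / survival(t))
    return result


def iterate_operator(lam=1.0, c_f=1.0, theta=0.5, cost=0.2, t_max=2.0, n=40, rounds=30):
    """Iterate from the zero function and record the sup-norm change per round."""
    grid = [t_max * j / n for j in range(n + 1)]
    w = [0.0] * (n + 1)
    gaps = []
    for _ in range(rounds):
        new_w = apply_operator(w, grid, lam, c_f, theta, cost)
        gaps.append(max(abs(a - b) for a, b in zip(new_w, w)))
        w = new_w
    return grid, w, gaps


grid, w, gaps = iterate_operator()
print(gaps[0], gaps[-1])
```

With these inputs the threshold age is 1 (the average cost divided by the failure loss), so `t_max = 2.0` corresponds to that age divided by the lower bound of the repair degree. The recorded sup-norm changes shrink from round to round, which is the behaviour the lemma predicts.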
It follows from the boundedness of $T(v)$ and uniform continuity of functions
on this function space that $L$ maps a continuous function to a continuous
function. Also, $L$ maps a positive function to a positive function, which has
support in $[0, T]$. In the next lemma, we prove that $L$ maps a decreasing
function to a decreasing function.
Lemma 4.4. $L$ maps a decreasing function to a decreasing function.
Proof. It follows from Lemma 4.2 that if $W(t)$ decreases in $t$, then $W_2(v, x)$
decreases in $v$ and $x$. From this, we have that for any decreasing function $W$,
In order to show that L ( W ) ( t ) 5 L ( W ) (s) for t > s 2 0, it is sufficient to
prove that for any t l > t . there exists sl > s, such that
Since F ( x ) is continuous by assumption. we can choose sl satisfying
First. we prove that
To see this. notice that from (4.4.11), we have
Therefore, $\int_t^{t_1} h(u)\,du = \int_s^{s_1} h(u)\,du$ and, consequently, $t_1 - t \le s_1 - s$, and
$t_1 \ge s_1$.
We have for u > t.
u-(t-s)
h(u)dvl
The last inequality is equivalent to (4.4.12).
Next, we will show that
Indeed, for any $0 \le z \le 1 - \bar F(s_1)/\bar F(s) = 1 - \bar F(t_1)/\bar F(t)$, define $x$ and $y$
It is easy to see that $y \ge x$, i.e.,
(4.4.14) is equivalent to
Denote $\bar W(u) = W_2(t, u-t)$. It follows from Lemma 4.2 that $\bar W(u)$ is a
decreasing function of $u$. We also have from Lemma 4.2 that $W_2(s, u-s) \ge
W_2(t, u-t)$. Hence,
$\int_s^{s_1} W_2(s, u-s)\,dG_s(u) \ge \int_s^{s_1} \bar W(u)\,dG_s(u).$
It is now sufficient to prove that
Then, combining (4.4.19) and (4.4.18), we obtain (4.4.17).
By using the variable transformation $z = G_s(u)$, $z = G_t(u)$ on both sides
of (4.4.19), we get
From (4.4.16) and the monotonicity of $W_2(s, u-s)$ in $u$, we see that (4.4.20)
holds and this completes the proof of (4.4.14) and, consequently, the proof of
Lemma 4.4.
4.5 Proof of the Theorem
To prove the Theorem, we first need to verify the monotonicity of $g(\lambda, t)$.
Since it is difficult to do this directly, we consider the following truncated
problem, which not only leads to the monotonicity of $g(\lambda, t)$, but also provides
an algorithm for computing $g(\lambda, t)$.
Lemma 5.1. Assume that the unit can be repaired at most $N - 1$ times for
some $N \ge 1$. Then, there exists a series of functions $g_n(\lambda, t)$, $n \le N$, such
that
s n , the first Sn such that
if no replacement occurred at or before
if none o f the aboue occurs be fore Slv (4.5.1)
is the optimal replacement time, where g:,, (A, u , r , C) G E' (~ , (X . u + 6(1; +
The repair cost limit functions $g_n(\lambda, t)$ are the unique solution of the
following equations
where $T = \inf\{t : \lambda \le C_f h(t)\} < \infty$, and $\lambda$ is the optimal average cost for
this truncated model. In addition, $g_n(\lambda, t)$ increases in $n$ and decreases in $t$.
Proof. This truncated problem can be formulated as a standard finite stage
dynamic programming problem. In fact,
The monotonicity of $g_n(\lambda, t)$ in $t$ follows from Lemma 4.3. The monotonicity
in $n$ is obvious from (4.5.3). The other boundary conditions can be verified
easily.
Q.E.D.
We can now provide the proof of the main theorem.
Proof of the Theorem. From (4.4.1) and (4.5.3), we have that
$\lim_{n\to\infty} g_n(\lambda, t) = g(\lambda, t)$.
On the other hand, from Lemma 4.3, Lemma 4.4, and the comments in
between, $L$ is properly defined on the space $R_p$ and is a contraction operator
with contraction factor $\alpha = F(T_u) < 1$. Consequently, $\lim_{n\to\infty} g_n(\lambda, \cdot) =
\lim_{n\to\infty} L^n(g_0)(\lambda, \cdot) = \tilde g(\lambda, \cdot)$. Applying the fixed point theorem to the contraction
mapping $L$, we have that $\tilde g(\lambda, \cdot)$ is the unique fixed point of operator $L$, i.e.,
$\tilde g(\lambda, \cdot) = L(\tilde g)(\lambda, \cdot)$.
Therefore, $\tilde g(\lambda, \cdot) = g(\lambda, \cdot)$, and consequently, $g(\lambda, \cdot) = L(g)(\lambda, \cdot)$.
Since the optimal value function $g(t)$ of the original problem equals
$g(\lambda, t)$ for $\lambda = \lambda^*$, where $\lambda^*$ is the optimal average cost, we obtain that $g = L(g)$.
The monotonicity of $g$ follows from the monotonicity and convergence of
$g_n(\lambda, \cdot)$.
To verify the boundary conditions, notice that $g_n(\lambda, T) = 0$ and, consequently,
$g(\lambda, T) = 0$ since $g(\lambda, T) = \lim_{n\to\infty} g_n(\lambda, T)$. In particular, $g(\lambda^*, T) =
0$, i.e., $g(T) = 0$. $g(0) = C_p$ is obtained directly from (4.4.1).
$g(t)$ is the optimal value of the maximization problem (4.4.1) obtained by
applying policy (4.2.1) with parameter $\lambda = \lambda^*$. Therefore, policy (4.2.1) is the
optimal policy. This completes the proof of the Theorem.
Finally, it is easy to show that $\lim_{N\to\infty} \lambda_N^* = \lambda^*$, where $\lambda_N^*$ is the optimal
average cost for the truncated problem.
Q.E.D.
Based on Lemma 5.1, and the proof of the main theorem, we propose the
following computational algorithm to compute g ( t ) .
The algorithm.
Step 0. Choose c > 0. Put X L = 0. XI; = Cr/ESI.
T' = h-'(C,/(E&C,)).
Step 1. Put $\lambda = (\lambda_L + \lambda_U)/2$.
Step 2. Put $T = h^{-1}(\lambda/C_f)$, $g_0(\lambda, t) = 0$.
Step 3. Compute $g_n(\lambda, t) = L(g_{n-1}(\lambda, t)) = L^n(g_0(\lambda, t))$. Based on the
contraction property of $L$, $g_n(\lambda, t)$ converges to $g(\lambda, t)$. If $\|g_{n+1}(\lambda, \cdot) -
g_n(\lambda, \cdot)\| < \epsilon$, put $g(\lambda, t) = g_{n+1}(\lambda, t)$ and go to Step 4.
Step 4. If $|\lambda_U - \lambda_L| < \epsilon$, go to Step 5. Otherwise, compare $g(\lambda, 0)$ and $C_p$.
If $g(\lambda, 0) < C_p$, put $\lambda_L = \lambda$ and go to Step 1.
If $g(\lambda, 0) > C_p$, put $\lambda_U = \lambda$ and go to Step 1.
Step 5. Stop. $g(\lambda, t)$ is the optimal repair cost limit function and $\lambda$ is the
corresponding average cost.
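The control flow of Steps 1 to 5 can be sketched in Python as a bisection on $\lambda$ wrapped around a fixed-point iteration. This is a minimal sketch, not the MATLAB program used in the thesis: the names `value_iteration`, `bisect_average_cost` and `make_toy_operator` are hypothetical, the function $g(\lambda, \cdot)$ is represented by its values on a grid, and the operator $L$ is replaced by a toy contraction (factor 0.5) purely so that the loop can be run. The real operator of Section 4.4 would have to be supplied in its place.

```python
from typing import Callable, List, Tuple

Grid = List[float]  # values of g(lambda, .) on a fixed time grid; index 0 is t = 0


def value_iteration(L: Callable[[Grid], Grid], g0: Grid, eps: float,
                    max_iter: int = 10_000) -> Grid:
    """Step 3: iterate g_{n+1} = L(g_n) until the sup-norm change is below eps."""
    g = g0
    for _ in range(max_iter):
        g_next = L(g)
        if max(abs(a - b) for a, b in zip(g_next, g)) < eps:
            return g_next
        g = g_next
    raise RuntimeError("value iteration did not converge")


def bisect_average_cost(make_L: Callable[[float], Callable[[Grid], Grid]],
                        c_p: float, lam_lo: float, lam_hi: float,
                        eps: float, grid_size: int) -> Tuple[float, Grid]:
    """Steps 1-5: bisection on lambda until g(lambda, 0) matches C_p."""
    while lam_hi - lam_lo >= eps:                  # Step 4 stopping test
        lam = 0.5 * (lam_lo + lam_hi)              # Step 1
        g = value_iteration(make_L(lam), [0.0] * grid_size, eps)  # Steps 2-3
        if g[0] < c_p:                             # g(lambda, 0) too small
            lam_lo = lam
        else:
            lam_hi = lam
    lam = 0.5 * (lam_lo + lam_hi)
    return lam, value_iteration(make_L(lam), [0.0] * grid_size, eps)  # Step 5


def make_toy_operator(lam: float) -> Callable[[Grid], Grid]:
    """Toy stand-in for L (an assumption): fixed point is g = 2 * lam."""
    return lambda g: [lam + 0.5 * v for v in g]


lam_star, g_star = bisect_average_cost(make_toy_operator, c_p=2.0,
                                       lam_lo=0.0, lam_hi=4.0,
                                       eps=1e-6, grid_size=3)
```

For the toy operator the fixed point is $g = 2\lambda$, so the bisection should settle near $\lambda = 1$ when $C_p = 2$. The bisection direction relies on $g(\lambda, 0)$ being increasing in $\lambda$, as in Step 4.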
4.6 Special Cases
Case 1. $\theta(t, C)$ is a deterministic function of $(t, C)$.
The optimal policy is a combination of a repair-cost-limit policy for failure
replacement and an age-control-limit policy for preventive replacement, i.e.,
there exists a repair cost limit function $g(t)$, and a control-limit function $T(v)$,
such that
the first n such that C, 3 g(C,).
S n + ( T I . - I ) if no replacement occurred be fore S,-l
(4.6.1)
is the optimal replacement time, where C, is the virtual age after the n-th
repair, and X is the optimal average cost. Obviously. in this case. I,, is deter-
ministic for given (CL-, , S,) and C, = C(I.k).
The repair cost limit $g(t)$ satisfies the equation $g = L(g)$, where $L$ is defined as
(4.6.2)
$g(0) = C_p$, $g(T) = 0$ for $T = \inf\{t : \lambda \le h(t) C_f\}$.
The age-control-limit function $T(v) = \inf\{t : \lambda \le h(t)(C_f - E_C[g(v +
\theta(t, C)(t - v)) - C(t)]^+)\}$.
Remark 6.1. It is obvious that the above policy is optimal for the following
generalization of this special case: the repair degree $\theta(t, C)$ can be a random
variable of $t$ and $C$, and both the repair cost and repair degree are observable
at each failure time before repair is carried out.
Case 2. $\theta(t) = \theta(t, C)$ is a random function of $t$ independent of $C$.
The optimal policy is a combination of a repair-cost-limit policy for failure
replacement and an age-control-limit policy for preventive replacement, i.e.,
the first n such that Cn 3 gi(C,-l. S,).
S n + ( ( I . - ) if no replacement occurred bejore
(4.6.3)
is the optimal replacement time, where g:(zl? r ) E g t ( u . x? C) = E'~(u + 8 ( u +
42).
Case 3. Deterministic repair cost $C(t)$ and constant repair degree $\theta_0$.
Since this is a special case of Case 1, the optimal policy is further simplified
and the optimal replacement time has the following form:
the first n such that X,, 2 Tl(I.,-1) - L - 1 .
S n L + ( ( ) - ) if no replacement occurred be f we
(4.6.4)
$g(t) = \int_t^{T(t)} \bar F(u)\{\lambda - h(u)(C_f - [g(t + \theta_0(u - t)) - C(u)]^+)\}\,du / \bar F(t).$ (4.6.5)
Case 4. Minimal repair model. For $\theta = 1$, the model is a minimal repair
model, and we have from (4.4.3)
where $T = \inf\{t : \lambda \le C_f h(t)\}$ and $G_t(x)$ is the distribution function of $C(t)$.
Differentiating (4.6.6), we get
with boundary conditions $g(\lambda, 0) = C_p$, $g(\lambda, T) = 0$. This result coincides with
the result in Makis et al. (2000) in the case when the repair cost does not
depend on the number of repairs.
The optimal policy has the following form: preventively replace the unit
at age $T$ if no replacement occurred before $T$. If a failure occurred at $t < T$,
replace the unit if the repair cost $C(t) > g(t)$, and repair otherwise.
Figure 4.2: Optimal policy for a minimal repair model.
4.7 Example
Consider Special Case 1 with the random repair cost $C$ uniformly distributed
on $[0, 0.5]$. Put $\epsilon = 0.01$, $C_p = C_f = 2$, $h(t) = t$ and
$\theta(t, C) = 1/2$ when $C \in [0, 0.3)$
We programmed the algorithm in Section 4.5 in MATLAB. We discretized
the time interval using $\Delta = 0.02$, applied spline interpolation, and iteratively
solved the truncated repair problems for $N \le 7$. It took about 5 minutes on
a Pentium II 366 PC to obtain the results. The value of the optimal average
cost is $\lambda^* = 2.692$. The values of the optimal repair cost limit function $g(t)$ and
the optimal age-preventive replacement limit function $T(t)$ are listed in Table
4.1 and the graphs of these functions are in Figure 4.3.
Table 4.1: Optimal control limits $g(t)$ and $T(t)$ for different values of $t$.
The optimal policy can be described as follows. The unit starts in state
$(v, x) = (0, 0)$. If the unit does not fail in the interval $[0, T(0)]$, it is preventively
replaced at time $T(0)$.
If the first failure occurs at time $x_1 < T(0)$, and the repair cost is $C_1$, then
a replacement is carried out if $C_1 \ge g(v_1)$, where $v_1 = x_1 \theta(x_1, C_1)$. The unit
is repaired if $C_1 < g(v_1)$.
Figure 4.3: Optimal repair/replacement policy.
After the first repair, if the operating time $x_2$ of the unit is greater than
$T(v_1) - v_1$, the unit is preventively replaced at time $x_1 + T(v_1) - v_1$. If the
second failure occurs in the interval $[x_1, x_1 + T(v_1) - v_1)$, the repair cost $C_2$ is
estimated and the unit is repaired if $C_2 < g(v_2)$ and replaced if $C_2 \ge g(v_2)$,
where $v_2 = v_1 + \theta(v_1 + x_2, C_2) x_2$, etc.
For minimal repair, the virtual age is the same as the real age, and from
the definition of $T(t)$, $T(t) - t = T - t$ (a straight line with angle $-\pi/4$), i.e.,
the control-limit function $T(t)$ is a constant and Figures 4.3 and 4.2 coincide.
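One decision step of the policy described above can be written as a small Python function. This is a sketch only: the function name `next_event` is hypothetical, and the particular shapes of `g`, `T` and `theta` used in the usage lines are invented placeholders (the computed values in Table 4.1 are not reproduced here). In particular, setting $\theta = 1$ for repair costs of 0.3 and above is an assumption, since only the first branch of $\theta(t, C)$ is stated in the example.

```python
from typing import Callable, Optional, Tuple


def next_event(v_prev: float, x: float, cost: float,
               theta: Callable[[float, float], float],
               g: Callable[[float], float],
               T: Callable[[float], float]) -> Tuple[str, Optional[float]]:
    """Decide what happens after the unit has run x time units since the last repair.

    v_prev: virtual age right after the previous repair (0 for a new unit)
    x:      operating time until the next failure would occur
    cost:   repair cost observed at that failure
    """
    # Preventive replacement is scheduled T(v_prev) - v_prev after the last repair.
    if x >= T(v_prev) - v_prev:
        return "preventive replacement", None
    # Virtual age after a repair of degree theta: v_n = v_{n-1} + theta * x_n.
    v_new = v_prev + theta(v_prev + x, cost) * x
    if cost >= g(v_new):
        return "failure replacement", None
    return "repair", v_new


# Hypothetical illustrative functions (assumptions, not the thesis results).
theta = lambda t, c: 0.5 if c < 0.3 else 1.0
g = lambda v: max(0.0, 2.0 * (1.0 - v / 1.3))
T = lambda v: 1.3 + 0.5 * v

first = next_event(0.0, 0.4, 0.2, theta, g, T)
second = next_event(0.2, 1.5, 0.1, theta, g, T)
third = next_event(0.9, 0.3, 0.45, theta, g, T)
```

With these placeholder functions the three calls illustrate a repair, a preventive replacement, and a failure replacement, in that order.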
Summary of Notation
$C(t)$: repair cost at age $t$
$C_f$: failure loss
$C_p$: preventive replacement cost
$f(t)$: density function of the first failure time
$F(t)$: distribution function of the first failure time
$h(t)$: failure rate
$V(\lambda)$: value function for the $\lambda$ maximization problem (with initial age 0)
$g(\lambda, t)$: value function for the $\lambda$ maximization problem with initial age $t$
$g(v)$: residual value of the repaired unit in state $(v, 0)$
go(o: x): residual value of the unit in operating state ( u . x)
g; ( u . x ! C): residual value of the failed unit before repair. given that the repair
cost is C
g:(u, x, C): residual value of the unit after repair
gz (v , z): mean residual value of the failed unit in state ( u , x) before assessing
the repair cost
$G_v(c)$: distribution function of repair cost at virtual age $v$
$L$: operator that defines the value iteration
$T(v)$: scheduled preventive replacement time at virtual age $v$
$TC(t)$: total cost incurred up to age $t$
$T$: $\inf\{t : g(t) = 0\}$
: virtual age after the n - th failure
k;: the objective function for the X maximization problem
$\theta$: repair degree
$\lambda^*$: optimal average cost
r(ob , {T; 1): optimal stopping time
Chapter 5
OPTIMALITY OF
LEVEL-CROSSING POLICY
FOR A CBM MODEL
5.1 Introduction
We consider a maintenance model with partial information about the state of
the system, obtained through monitoring a signal process at equally spaced
inspection times. An example of a signal process is the overall vibration level of a
machine, which is considered to be a good indicator of the machine condition (e.g.
Mitchell (1981)).
The evolution of the signal process is determined by random factors and
minor maintenance actions between inspections. We assume that a major
failure, which requires an overhaul or replacement of the unit, occurs when the
signal process first exceeds a critical level. To model the situation where the
signal process carries only partial information about the machine state, we
assume that the critical level is a random variable independent of the signal
process. In practical situations, a failure may occur even when the signal level
temporarily decreases, which is expressed in this model by the assumption that
the critical levels might be different at different inspection times. The profit
in the $i$th period (between the $i$th and the $(i+1)$th inspection) is a random
function of the signal level at the $i$th inspection, and it includes the cost of
minor maintenance in this period. The preventive and failure replacement
costs are given constants. The objective is to find the replacement policy that
maximizes the total expected profit during the machine lifetime.
A distinguishing feature of this model is that we do not assume monotonicity
of the signal process. We consider two kinds of maintenance actions and
random critical levels that define major failures. The effect of minor
maintenance actions between two subsequent inspections on the signal process is
a decrease or increase of the signal level at the next inspection by a random
amount that represents an improvement or worsening of the machine condition,
respectively. This assumption is similar to the assumption of a general
repair, which was introduced by Kijima and Sumita (1986) and has been
discussed in detail in Chapter 4. Studies of these kinds of systems have also been
conducted by Stadje and Zuckerman (1991) and Makis and Jardine (1993).
The model also bears certain similarities with the shock models found in the
literature (see e.g. Taylor (1973), Zuckerman (1978) and Stadje (1994)).
Examples of other condition-based maintenance models include a state
space model considered by Christer and Wang et al. (1997) for furnace erosion
prediction and replacement, a counting process model by Aven (1996), and a
proportional hazards decision model developed by Makis and Jardine (1992),
where the hazard function of the system depends on its operating age and
on stochastic covariates that can represent the information obtained through
condition monitoring, such as spectrometric analysis of engine oil, over time.
The chapter is organized as follows. In Section 5.2, we describe the model,
formulate the maintenance decision problem and prove the existence of the
optimal policy. In Section 5.3, we show that under weak monotonicity
assumptions the optimal policy is of a control-limit type, i.e., replace the unit
if and only if the signal level exceeds a certain critical limit. An algorithm for
the computation of the optimal control limit is developed and a numerical
example is given to illustrate the computational procedure.
5.2 Problem Formulation and Existence of the
Optimal Policy
We assume that the signal process is observable at equidistant points of time
$i\Delta$, $i = 0, 1, \ldots$, and the signal level at time $i\Delta$ is determined by
$X_{i+1} = X_i \zeta_i$, (5.2.1)
where $X_0 = 1$ is the normalized initial level and $\{\zeta_i\}$ is an i.i.d. sequence of
random variables, independent of $X_i$. $\zeta_i$ represents the effect of random factors
and possible minor maintenance actions in the $i$th period (between the $i$th and
the $(i+1)$th inspection) on the signal level at time $(i+1)\Delta$, $0 < \zeta_i < D < +\infty$.
$\zeta_i$ can have the following interpretation. We can write $\zeta_i = d_i m_i$, where $d_i > 0$
represents the effect of random factors, $m_i = 1$ if there was no maintenance in
the $i$th period, and $m_i = \theta$ if there was a maintenance, where $\theta > 0$ is a random
variable. $\theta < 1$ ($\theta > 1$) can be interpreted as an improvement (or worsening) of
the machine condition by the repair and $\theta = 1$ represents minimal repair, which
has no effect on the signal level. We assume that $E(\ln \zeta_i) > 0$, i.e., $E(\ln X_{i+1}) >
E(\ln X_i)$. The major system failure time $T$ is defined as the first time the
signal process exceeds a critical level, i.e.,
$T = \inf\{n \ge 1 : X_n > H_n\}$, (5.2.2)
where $H_n$ is the critical level at time $n\Delta$. Since the signal process $\{X_n\}$ carries
only partial information about the system and the system can fail even when
the signal level decreases, it is reasonable to assume that the critical levels $H_n$
are random variables. We further assume that $\{H_n\}$ is an i.i.d. sequence with
distribution function $F(\cdot)$, and that $1 < H_n \le A < +\infty$.
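The model can be illustrated with a short simulation of the signal path and the failure time, using the multiplicative recursion $X_n = X_{n-1}\zeta_{n-1}$ and the definition of $T$ as the first inspection at which the signal exceeds its random critical level. This is a sketch under stated assumptions: the function name `simulate_failure_time` is hypothetical, and the two uniform distributions in the usage lines are arbitrary choices made only to have something to sample from (they satisfy $E(\ln \zeta) > 0$ and $H_n > 1$).

```python
import random
from typing import Callable, Tuple


def simulate_failure_time(sample_zeta: Callable[[], float],
                          sample_critical: Callable[[], float],
                          x0: float = 1.0,
                          max_steps: int = 100_000) -> Tuple[int, float]:
    """Return (T, X_T): the first n >= 1 with X_n > H_n and the signal level there."""
    x = x0
    for n in range(1, max_steps + 1):
        x *= sample_zeta()            # X_n = X_{n-1} * zeta_{n-1}
        if x > sample_critical():     # a fresh critical level H_n at each inspection
            return n, x
    raise RuntimeError("no failure within max_steps")


rng = random.Random(0)
n_fail, level = simulate_failure_time(
    sample_zeta=lambda: rng.uniform(0.8, 1.5),       # assumed deterioration factor
    sample_critical=lambda: rng.uniform(2.0, 4.0),   # assumed critical level
)
```

Because the critical level is redrawn at every inspection, a failure can occur at a step where the signal has just decreased, which is the feature the model is meant to capture.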
The profit in period $i$ (after subtracting the possible minor maintenance
cost in that period) is $S(X_i)$, a random function of the signal level $X_i$, $-B \le
S(X_i) \le B < +\infty$; $C_p > 0$ is the preventive replacement cost and $C_p + C_f$
is the failure replacement cost, $C_f > 0$. Both $C_p$ and $C_f$ are assumed to be
constants.
The objective is to find the replacement policy that maximizes the total
expected profit during the system lifetime.
Define for $n \ge 1$,
The problem can be formulated as follows. Find
$\sup_\tau E Y_\tau$ (5.2.4)
in the class of stopping times relative to the process history $\{\mathcal{F}_n\}$, where
$\mathcal{F}_n = \sigma\{(X_i, I_{\{T>i\}}), i \le n\}$, and a stopping time $\tau^*$ (if it exists), for which the
supremum in (5.2.4) is attained.
Lemma 2.1. $E T < +\infty$.
Proof. From the definition of $T$ in (5.2.2),
$T = \inf\{n \ge 1 : X_n > H_n\}$
$= \inf\{n \ge 1 : \sum_{i=0}^{n-1} \ln \zeta_i > \ln H_n\}$
$\le T_A \equiv \inf\{n \ge 1 : \sum_{i=0}^{n-1} \ln \zeta_i > \ln A\}$.
Since $\{\zeta_i\}$ is an i.i.d. sequence with $E(\ln \zeta_i) > 0$, it follows from Theorem 2.4
in Chow et al. (1971), p. 29 that $E T_A < +\infty$ and hence, $E T < +\infty$. This
completes the proof.
Q.E.D.
Remark 2.2. If we denote
then it follows from Lemma 2.1 that $E T(x) < +\infty$ for any $x > 0$.
In the next lemma, we will prove the existence of the optimal stopping time
for the sequence $\{Y_n\}$.
Lemma 2.3. The optimal stopping time $\tau^*$ maximizing $E Y_\tau$ exists.
Proof. It follows from Theorem 4.5' in Chow et al. (1971), p. 82, that if
$E(\sup_n Y_n^+) < +\infty$, the optimal stopping time exists. We have from (5.2.3):
This completes the proof.
Q.E.D.
The following lemma will be useful to find the form of the optimal stopping
time for our problem.
Lemma 2.4. [Chow et al. (1971), Remark, p. 105] Define
where $\{Z_k\}$ is a homogeneous Markov chain, $w_n(\cdot)$ and $y_n(\cdot)$ are deterministic
functions. Then, the optimal stopping time for $\{Y_n\}$ has the following form:
and $Z_0 = z$.
The optimal stopping time for our problem is determined by the following
theorem.
Theorem 2.5. Denote $Z_n = (X_n, I_{\{T>n\}})$. The optimal stopping time $\tau^*$ for
$\{Y_n\}$ is given by
where
if $z = (x, 0)$
$C(Z_k) = [E(S(X_k) \mid X_k) - C_f P(X_k \zeta_k > H_{k+1} \mid X_k)] I_{\{T>k\}}$
(5.2.9)
$= C(X_k) I_{\{T>k\}}$,
and $Z_0 = z$. If $z = (x, 0)$, the system is in a failure state and the current signal
level is $x$.
Proof. We have from (5.2.3),
(5.2.10)
It follows from the definition of $Z_n$ in Theorem 2.5 and from (5.2.1) that $\{Z_n\}$
forms a homogeneous Markov chain. Therefore, the optimal stopping time
$\tau^*$ can be found by applying Lemma 2.4, and (5.2.7) is obtained easily from
(5.2.5) and (5.2.10). This completes the proof.
Q.E.D.
Remark 2.6. The form of the optimal stopping time in (5.2.7) has been
obtained under weak assumptions regarding the signal process. We have assumed
that $E(\ln \zeta_i) > 0$, which is equivalent to the assumption of the monotonicity
of $E(\ln X_n)$, but the signal process need not be monotone. This is important
for practical applications. For example, the overall vibration level typically
does not exhibit monotone behavior, but the process tends to increase on
average, which can be expressed by the monotonicity assumption regarding
the mean value.
The optimal stopping time in (5.2.7) can be found in the general case by
computing the function $C(z)$ in (5.2.8), but the optimal replacement policy is not
necessarily of a control-limit type.
In the next section, we will show that if the expected profit in a period is
a monotone function of the signal level, a control-limit policy is optimal and
we provide an algorithm for finding the optimal control limit.
5.3 Optimal Control-Limit Policy
In this section, we assume that $E(S(X) \mid X = x)$ is a non-increasing function
of $x$, $E(S(X) \mid X = x)$ and $F(x) \equiv P(H_n \le x)$ are continuous, and $C(1) > 0$.
Denote
where $C(X_k)$ is defined by (5.2.9). Then, it follows from (5.2.7) and (5.2.5)
that the optimal stopping time $\tau^*$ has the form:
Obviously, $V(X_0) - C_p$ is the optimal expected profit for our stopping problem.
In the next lemma, we will show how to obtain $V(x)$.
Lemma 3.1. Define for $n \ge 1$,
$V_n(x) = \sup_{\tau : \tau \le n} E^{(x)} Y_\tau + C_p$. (5.3.3)
Then $\{V_n(x)\}$ satisfies the following system of equations:
where $C(x)$ is defined by (5.2.9) and $G(z) = P(\zeta_i \le z)$. Furthermore, $V_n(x)$
converges to $V(x)$ uniformly on $[a, A)$ for any $a > 0$.
Proof. The equations in (5.3.4) are easily obtained using the definition
of $V_n(x)$ in (5.3.3) and dynamic programming. Let $a$ be a positive
number. Obviously, for each $x > a$, $\{V_n(x)\}$ is a non-decreasing sequence, so
that $\lim_{n\to\infty} V_n(x)$ exists. Denote by $\tau_n^*$ the optimal stopping time for which the
supremum in (5.3.3) is attained. Since $V_n(x) \le V(x)$ for all $x$, we have
$\le B \lim_{n\to\infty} E((T(a) - n) I_{\{T(a)>n\}}) = 0$,
so that $V_n(x) \to V(x)$ and the convergence is uniform. The last equation
follows from Remark 2.2. This completes the proof.
Q.E.D.
In the next theorem, we will prove the existence of an $\epsilon$-optimal
control-limit policy for any $\epsilon > 0$.
Theorem 3.2. For any $\epsilon > 0$, there exists $X_\epsilon^* \le +\infty$ such that the following is
an $\epsilon$-optimal policy:
replace the unit on failure or at the first time the signal level exceeds $X_\epsilon^*$,
whichever occurs first.
Proof. It follows from the continuity and monotonicity of $E(S(X) \mid X = x)$
and $F(x)$ that $V_n(x)$ is continuous and non-increasing in $x$ for $n \ge 1$. Choose
any $\epsilon > 0$ and define
If $X_n^* = +\infty$ for some $n$, i.e., $V_n(x) > 0$ for all $x$, then $V(x) > 0$ for $x > 0$
and the optimal policy is the policy that replaces the unit only on failure. In
this case, we can define $X_\epsilon^* = +\infty$.
Next, we will prove that if $X_n^* < +\infty$ for all $n$, there exists $n_0 \ge 1$ such
that $0 \le V(X_{n_0}^*) \le \epsilon$.
We have for $n \ge 1$,
$0 \le V(X_n^*) - V_n(X_n^*)$
$\le B E((T(X_n^*) - n) I_{\{T(X_n^*) > n\}})$
$\le B E((T(X_1^*) - n) I_{\{T(X_1^*) > n\}}) \to 0$ as $n \to +\infty$. (5.3.6)
The last two inequalities follow from the fact that $\{X_n^*\}$ is non-decreasing and
from Remark 2.2.
From (5.3.6), we can see that there is $n_0 \ge 1$ such that $0 \le V(X_n^*) \le \epsilon$ for
all $n \ge n_0$.
Define $X_\epsilon^* = X_{n_0}^*$, and the stopping time $\tau_\epsilon$:
We will show that $\tau_\epsilon$ defines an $\epsilon$-optimal control-limit policy. It follows
from the assumptions and from Lemma 3.1 that $\{V_n(x)\}$ are continuous and
non-increasing in $x$, non-decreasing in $n$, and $V_n(x) \to V(x)$ uniformly, so
that $V(x)$ is continuous and non-increasing in $x$. Since $0 \le V(X_\epsilon^*) \le \epsilon$, we
have from (5.3.2) and (5.3.7) that $\tau_\epsilon \le \tau^*$ and
$E Y_{\tau^*} = \sup_\tau E Y_\tau$
$= E(\sum_{i=0}^{\tau_\epsilon - 1} C(X_i) I_{\{T>i\}} + \sum_{i=\tau_\epsilon}^{\tau^* - 1} C(X_i) I_{\{T>i\}}) - C_p$
$\le E Y_{\tau_\epsilon} + \epsilon$,
so that $E Y_{\tau_\epsilon} \ge \sup_\tau E Y_\tau - \epsilon$ and hence, $\tau_\epsilon$ defines an $\epsilon$-optimal policy.
Q.E.D.
Remark 3.3. Since $V(x)$ is continuous and non-increasing, the optimal
stopping time $\tau^*$ defined by (5.3.2) has the following form:
where $X^\dagger = \inf\{x : V(x) = 0\}$, i.e., a control-limit policy is optimal.
Denote $X^* = \lim_{n\to+\infty} X_n^*$. From (5.3.6), $V(X_n^*) \ge 0$ and $\lim_{n\to+\infty} V(X_n^*) =
0$. From this, and from the continuity of $V(x)$, we have that $V(X^*) = 0$ so
that $X^* \ge X^\dagger$. Assume that $X^* > X^\dagger$. Then there is $m_0$ such that $X_{m_0}^* > X^\dagger$
and we have from the monotonicity of $V_{m_0}(x)$, that
which contradicts (5.3.5). Hence, the optimal control limit $X^\dagger = X^*$.
Lemma 3.1 and Theorem 3.2 provide a computational procedure for the
control limit $X_\epsilon^*$ for any $\epsilon > 0$, which can be summarized as follows.
Step 1. Choose $\epsilon > 0$, set $n = 1$, calculate $C(x)$ and find
$X_1^* = \inf\{x > 0 : C(x) = 0\}$.
Step 2. Calculate (or estimate) $\epsilon_n = B E((T(X_n^*) - n) I_{\{T(X_n^*) > n\}})$.
If $\epsilon_n \le \epsilon$, set $X_\epsilon^* = X_n^*$ and stop.
Step 3. If $\epsilon_n > \epsilon$, set $n = n + 1$, calculate
$V_n(x) = C(x) + \int V_{n-1}(xa) \bar F(xa)\,dG(a)$
and
$X_n^* = \inf\{x : V_n(x) = 0\}$.
Go to Step 2.
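Step 2 allows the error bound to be estimated rather than calculated. One way to do that is a Monte Carlo estimate of $B\,E((T(x) - n)^+)$, sketched below. Several things here are assumptions: the function names are hypothetical; $T(x)$ is taken to be the failure time of the process started at signal level $x$; the distributions are those of the numerical example that follows ($d \sim U(1, 2)$, $H \sim U(2, 4)$, $m \in \{1, 0.5\}$); and maintenance is assumed to occur with probability 1/2 in each period, which is not stated explicitly in the surviving text.

```python
import random
from typing import List, Sequence


def sample_failure_time(x0: float, rng: random.Random,
                        max_steps: int = 100_000) -> int:
    """Draw one failure time T(x0) for the process started at level x0."""
    x = x0
    for n in range(1, max_steps + 1):
        d = rng.uniform(1.0, 2.0)                  # random deterioration
        m = 0.5 if rng.random() < 0.5 else 1.0     # assumed maintenance probability 1/2
        x *= d * m
        if x > rng.uniform(2.0, 4.0):              # critical level H_n
            return n
    raise RuntimeError("no failure within max_steps")


def estimate_eps(x0: float, n_values: Sequence[int], B: float = 1.0,
                 n_samples: int = 2000, seed: int = 0) -> List[float]:
    """Estimate B * E[(T(x0) - n)^+] for each n from one common sample."""
    rng = random.Random(seed)
    times = [sample_failure_time(x0, rng) for _ in range(n_samples)]
    return [B * sum(max(t - n, 0) for t in times) / n_samples for n in n_values]


eps_hat = estimate_eps(x0=1.9, n_values=[1, 10, 100])
```

Using one common sample for all values of $n$ keeps the estimates non-increasing in $n$, which mirrors the monotone behavior of the exact bound.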
We now illustrate the computational procedure by the following numerical
example.
Example. Assume that the initial signal level $X_0 = 1$ and $X_i = X_{i-1} d_{i-1} m_{i-1}$
for $i \ge 1$, where $\{d_i\}$ is an i.i.d. sequence, $d_i \sim U(1, 2)$ and $\{m_i\}$ is an i.i.d.
sequence independent of $\{d_i\}$. $m_i$ describes a minor maintenance action in
period $i$, $m_i = 1$ if no maintenance was performed and $m_i = 0.5$ if there was a
maintenance, i.e., maintenance improves machine condition. We assume that
and the random critical level $H_i \sim U(2, 4)$, where $U(a, b)$ denotes the uniform
distribution on $[a, b]$.
The expected profit in a period given that the signal level $X = x$ is
$E(S(X) \mid X = x) = \exp(-x/2)$, the failure penalty cost $C_f = 2$ and $B = 1$.
We have from (5.2.9),
Obviously, for $0 < x < 1$, $P(H < x d m) = 0$. Assume that $1 \le x \le 2$. By
conditioning on $m$ and then on $d$, we get
\begin{align*}
P(H < x d m) &= \tfrac12 P(H < x d) + \tfrac12 P(H < x d / 2) \\
&= \tfrac12 \int_1^2 P(H < xz)\,dz + \tfrac12 \int_1^2 P(H < xz/2)\,dz \tag{5.3.10} \\
&= \tfrac12 \int_{2/x}^2 \tfrac12 (xz - 2)\,dz = \tfrac12 x + \tfrac{1}{2x} - 1.
\end{align*}
Finally, we have the following formula for $C(x)$:
$C(x) = e^{-x/2}$, $x \in (0, 1)$;
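The closed form for $P(H < xdm)$ on $1 \le x \le 2$ can be checked numerically, and the same expression gives the first control limit of Step 1 by root finding. The sketch below assumes that maintenance occurs with probability 1/2 in each period (this is how the weights in the final expression $\tfrac12 x + \tfrac{1}{2x} - 1$ read) and that $C(x) = e^{-x/2} - C_f P(H < xdm)$ with $C_f = 2$, as in (5.2.9). All function names are hypothetical.

```python
import math


def cdf_H(h: float) -> float:
    """Distribution function of the critical level H ~ U(2, 4)."""
    return min(1.0, max(0.0, (h - 2.0) / 2.0))


def failure_prob_numeric(x: float, steps: int = 20_000) -> float:
    """Midpoint rule over d ~ U(1, 2), averaging over m = 1 and m = 0.5."""
    dz = 1.0 / steps
    total = 0.0
    for k in range(steps):
        z = 1.0 + (k + 0.5) * dz
        total += 0.5 * cdf_H(x * z) + 0.5 * cdf_H(0.5 * x * z)
    return total * dz


def failure_prob_closed(x: float) -> float:
    """Closed form, valid for 1 <= x <= 2."""
    return 0.5 * x + 0.5 / x - 1.0


def net_profit(x: float) -> float:
    """C(x) on [1, 2] with C_f = 2."""
    return math.exp(-x / 2.0) - 2.0 * failure_prob_closed(x)


def first_control_limit(lo: float = 1.0, hi: float = 2.0, tol: float = 1e-10) -> float:
    """Bisection for the zero of C(x) on [1, 2], where C changes sign."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_profit(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x1 = first_control_limit()
```

Since $C(x) = e^{-x/2} > 0$ on $(0, 1)$ and $C$ is decreasing on $[1, 2]$ with $C(1) > 0 > C(2)$, the zero found on $[1, 2]$ is the first zero of $C$ under these assumptions.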
To apply the computational procedure, we first find the expected value and
the variance of $\ln \zeta$, where $\zeta = md$. We have
We have considered $\epsilon = 0.01$ and computed the optimal control limits $\{X_n^*\}$
for $n = 1, \ldots, 500$, which takes about 3 minutes on a 586 PC. After 40
steps, the optimal control limit stabilized at the value 2.10. Then, using (5.3.9), we
have found an upper bound for $\epsilon_{500}$, which was 0.00066. Taking into account
other possible computational errors in each computation step, we are quite
sure that the policy with critical level 2.10 is at least a 0.01-optimal policy.
The computational results are in Table 5.1.
5.4 Conclusions
In this chapter, we have proposed a condition-based maintenance model for
situations where the observed process carries only partial information about the
system and does not necessarily exhibit monotone behavior. The optimization
problem has been formulated as an optimal stopping problem and the
structure of the optimal replacement policy has been found in the general case. We
have shown that under weak monotonicity assumptions, the optimal policy is
of a control-limit type and a computational procedure has been developed for
finding the optimal control limit. Numerical results indicate fast convergence
and the policy is easily implementable. The model is suitable in situations
where the production unit is frequently monitored and information is used for
planning major maintenance activities such as an overhaul or replacement of
the unit.
Summary of Notation
$a^+$: $\max(a, 0)$
$C_f$: failure loss
$C_p$: preventive replacement cost
$C(X_n)$: expected net profit in period $n$ with signal level $X_n$
$D$: upper bound of $\zeta$
$F$: distribution function of $H$
$G$: distribution function of $\zeta$
$H_n$: random critical level
$S(X_n)$: profit in period $n$
$T$: failure time
$V(x)$: optimal expected net profit with signal level $x$
$V_n(x)$: $n$-th truncated optimal value function
$X_i$: signal level at time $i\Delta$
$X^*$: optimal preventive replacement level
$Y_n$: net profit up to period $n$
$\tau^*$: optimal stopping time
$\zeta$: random deterioration factor
Chapter 6
A CBM FRAMEWORK
BASED ON HIDDEN
MARKOV MODELS
6.1 Introduction
In recent years, Condition-Based Maintenance (CBM) has been gradually gaining
popularity in the reliability/maintenance area from both the practitioner's and
the researcher's perspective (see two recent survey papers on maintenance, Scarf
(1997) and Dekker and Scarf (1998), for more details).
In industrial practice, the engineering aspect of CBM has been undergoing
rapid development for decades. Many maintenance information systems
have been developed and are commercially available, with emphasis on
condition monitoring, fault detection, diagnosis and automation. Typical
condition monitoring techniques include vibration monitoring and oil analysis.
With these maintenance information systems in place, large amounts of data
become available, which provides great potential for improving maintenance
performance.
Yet, at the present time, most of these systems serve merely as maintenance
databases, used only for producing simple statistics for management
reporting. Maintenance decision making is in general still based on field experts'
experience, which is normally not quantitatively justified. For those systems
that incorporate maintenance optimization features, the policies in most cases
remain of the age-based or level-crossing type, which are too rudimentary
with respect to the rich availability of information.
In the research community, CBM is not a new concept. Relevant work
can be found under the titles of CBM, information-based maintenance,
predictive/proactive maintenance, etc. Yet it seems to me that there is a lag in
the theoretical development of the management aspect of maintenance,
especially in maintenance optimization. As a result, a gap between the engineering
aspect and the management aspect of maintenance exists. Fortunately, the
common awareness of the CBM concept now serves as an umbrella that enables
researchers to share and combine strength in this promising area.
It is beneficial to view the general maintenance optimization problem,
which is the main theme of this thesis, from the CBM perspective. The
essence of CBM is simply utilizing available information to support optimal
maintenance decision making. In this sense, any maintenance system has to
be condition-based.
We now summarize this chapter as follows.
In Section 6.2, we provide a literature survey, focusing on various
mathematical methodologies and models that are related to CBM optimization.
In Section 6.3, we propose a CBM model which matches the abstract CBM
framework closely. This model is then transformed to a simpler form in Section
6.4, and finally solved in Section 6.5.
6.2 Literature Survey
The French school of "general theory of stochastic processes" provided the
theoretical foundation for CBM mathematical modeling and optimization. For
good survey papers see Arjas (1989) and Jensen (1996). While major efforts
in this category are on statistical analysis and filtering of stochastic processes,
there is also a considerable amount of research on maintenance optimization,
which is quite often based on optimal stopping theory.
Based on the concept of filtration (as well as subfiltration), different levels
of information can be considered. Transformation from high to low level
information can be carried out by applying the "projection theorem", which
corresponds to estimating the system condition based on partial information.
Heinrich and Jensen (1992) considered in detail an optimal replacement problem
for a two-unit non-repairable system with different information levels; Jiang
and Cheng (1995) applied this approach to single-unit repairable systems, and
conducted policy optimization and policy comparison for a dozen well-known
policies based on the information level each policy can utilize.
The concept of stopping time is defined based on the filtration. The optimal
stopping time implies that full information in the filtration has been utilized.
Thus this approach has an advantage over those methods in which policies are
limited to presumed forms, such as the well-known age replacement, block
replacement, and so on. Interestingly enough, many of those well-known policies
derived from intuition are indeed optimal stopping rules with respect to a
properly selected filtration. Standard references on optimal stopping theory
are Chow et al. (1971) and Shiryayev (1978). For its application to
reliability/maintenance, see e.g. Bergman (1978) and Jensen (1989).
Optimal stopping rules often take the control-limit form, which is intuitive,
easy to calculate and easy to implement in practice. In particular, when the
system has a certain monotonicity property, the control limits are easy to obtain
and have the following intuitive meaning: when the loss (rate) is larger
than the gain (rate), stop the system; otherwise, repair (or continue) it.
This is the reason that almost all calculated models using optimal stopping
assume certain kinds of monotonicity property (see e.g. Aven (1983), Aven
and Bergman (1986)).
One interesting paper that incorporates both Markov-modulated process
state estimation and optimal stopping is by Jensen and Hsu (1993). In that
paper, a weak form of monotonicity assumption is used, i.e., the monotonicity
assumption is replaced by a submartingale property. A stopping rule with
weak-sense optimality is derived, i.e., an optimal stopping time with respect
to a subfiltration, which means less information is utilized. This paper provides
a good example of providing a suboptimal policy when the optimal one is
not easy to generate. Also, it provides a good example of combining state
estimation and replacement optimization.
The level-crossing approach is well accepted by the industrial community.
Even though it may not always provide the optimal solution, it is well under-
stood and is easy to implement.
The level-crossing approach refers to the following scenario: a set of variables,
which are selected using technical considerations, are observed, and a repair or
replacement is initiated whenever any of them exceeds a preset control limit.
Normally, these monitored variables are certain measurements that reflect the
degree of wear or damage of the system. Commonly used measurements come from
vibration monitoring and oil analysis, among others. Properly selected
variables provide an informative indication of the system's condition.
A considerable number of level-crossing related models exist, such as
(observable) diffusion processes with drifts, and discrete time independent
increment models. The random failure limit can be incorporated to emphasize that
the observed signals are just partial information (Jiang et al. (1998)). See also
Christer and Wang (1997), where the determination of optimal inspection is
the major concern.
Accelerated aging models can also be thought of as special cases of the
level-crossing model, in which a transformation (random or deterministic)
from age to accumulated stress is used to describe the random environment
effects, see e.g. Doksum (1991).
Marked point processes are suitable for modeling shock processes, where
the underlying counting processes represent the number of shocks that have
occurred, and the marks represent the damage degree of each shock. For the first work
of this kind see Taylor (1975), where the optimal replacement policy is proved
to have a level-crossing form. See also Arjas (1989) for a comprehensive
theoretical summary.
The Proportional Hazards Model (PHM) is another widely accepted model
which incorporates both age information and condition information (covariates)
in the most natural manner. A few maintenance policies have been
developed for these kinds of models, among which Makis and Jardine (1992a)
derived the optimal replacement policy for a model with equal inspection
intervals. Kumar and Westberg (1997) combined this model with a TTT plot
approach.
Some extensions of age-based models, such as group maintenance models,
random repair cost models and general repair models, can also be viewed as
CBM models in the sense that additional randomness is introduced in the
modeling of system deterioration, and consequently, the optimal maintenance
decision is based on the system condition instead of the calendar time. The
group replacement model is an extension from a single-unit system to a multi-unit
system, where components in a system are mutually dependent because any
replacement of a component is subject to a fixed installation fee in addition to
the cost of replacing each component. For a good survey, see Van der Duyn
Schouten (1996). In this class of models, the age information of all components
forms an entire information database and good maintenance should properly
utilize it.
The random repair cost model is a single-unit system model with random
repair cost as the information additional to its age. As economic considerations
are important for maintenance practice, this kind of generalization is
appealing in practice. In addition, under certain monotonicity assumptions
representing the deterioration of the system, it can be proved that the
repair-cost-limit policy is optimal, i.e., when the repair cost exceeds an age-dependent
limit, then replace the system, otherwise, repair it. This cost limit has a very
intuitive meaning, i.e., it is the residual cost of the system after the repair.
The preventive replacement time can also be expressed in terms of residual
value, i.e., it is the time when the residual value decreases to zero. See Chapters
2, 3 for more detailed information.
The general repair concept was first introduced by Kijima et al. (1988), and
several maintenance models have been proposed, see e.g. Kijima et al. (1988)
for a periodic replacement policy; Makis and Jardine (1991) for the optimality
of T-policy; and Stadje and Zuckerman (1991) for the optimality of a
bang-bang policy when the repair degree is a decision variable.
This concept generalizes the concepts of minimal repair and replacement,
because it assumes that the general repair improves the condition of the
repaired system to a certain degree better than that after a minimal repair, but
worse than that after a replacement. A concept of virtual age can then be
defined as a function of the real age and the repair degree, which directly
represents the condition of the system. Therefore it is natural to view the virtual
age concept in the domain of CBM. See Chapter 5 for a comprehensive
optimization model based on general repair, which incorporates features such as
random repair cost, preventive replacement/failure replacement and optimal
stopping.
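To make the virtual age idea concrete, the sketch below uses the two update rules commonly attributed to Kijima (1989), usually called Type I and Type II. These specific formulas, the repair degree parameter `a` (with `a = 0` for replacement and `a = 1` for minimal repair) and the function names are assumptions introduced here for illustration; the text above does not spell them out.

```python
def virtual_age_type1(prev_virtual_age: float, run_time: float, a: float) -> float:
    """Type I: the repair only offsets damage accrued since the previous repair."""
    return prev_virtual_age + a * run_time


def virtual_age_type2(prev_virtual_age: float, run_time: float, a: float) -> float:
    """Type II: the repair acts on all damage accumulated so far."""
    return a * (prev_virtual_age + run_time)


def virtual_age_path(run_times, a, update):
    """Virtual age right after each repair, starting from a new system (age 0)."""
    ages, v = [], 0.0
    for x in run_times:
        v = update(v, x, a)
        ages.append(v)
    return ages


run_times = [2.0, 1.0, 3.0]  # hypothetical operating times between repairs
minimal = virtual_age_path(run_times, 1.0, virtual_age_type1)  # a = 1: tracks the real age
renewal = virtual_age_path(run_times, 0.0, virtual_age_type2)  # a = 0: as good as new
general = virtual_age_path(run_times, 0.5, virtual_age_type2)  # in between
print(minimal, renewal, general)
```

With `a` strictly between 0 and 1 the virtual age stays below the real age, which is how the repair degree enters the condition of the system.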
The time series and state space approach represents a wide and mature area,
including optimal filtering and control, which can be applied to the CBM area.
A typical paper following this direction is by Christer and Wang (1997), where
Kalman filtering is used for predicting the residual life and a suboptimal
replacement policy is derived based on the prediction. The major feature of this
model is that there are two processes involved, an observation process and a
state process, which suggests a more general framework: partially observable
process modeling and control.
A promising approach is the hidden Markov model (HMM), which is also
called partially observable Markov decision process (POMDP). Developed in
the early 60's, it has shown a wide range of applications in engineering. Typical
applications include speech recognition, fault diagnosis, demodulation, robotic
control, artificial intelligence, etc. For a standard theoretical reference, see
Elliott (1995). There has been some research work on maintenance optimization
based on HMM, see e.g. Fernandez-Gaucherand et al. (1991), Smallwood
and Sondik (1973), and Hernandez-Hernandez et al. (1999). However,
maintenance models in these works were mainly used as illustrative examples. More
modeling work has to be carried out to transfer the theoretical developments
into practically applicable results.
Besides the aforementioned research, which can be safely classified as
maintenance models, there are also interesting works from other areas, such as
survival analysis, sequential analysis, signal processing, diagnosis and
experiment design among many others, see e.g. Saaty and Vargas (1998) and Zhou
et al. (1996). A broader survey of the literature would be highly valuable for CBM
research.
In this chapter, we propose a CBM framework based on HMM. The HMM
model we use has the form of continuous time horizon and discrete observation,
which has not been seen in the literature. We expect that this model will be
a starting point of a continuous effort for CBM modeling and optimization in
the HMM framework.
6.3 Model Description
We make the following assumptions.
1. System Dynamics: the system operates in one of $N$ unobservable states
$\{1, 2, \ldots, N\} = S^X$, over a continuous time horizon. Denote the state at
time $t$ as $X_t$. Then, $(X_t)$ forms a right continuous homogeneous Markov
chain with transition rates
$$q_{ji} = \lim_{h \to 0+} \frac{P(X_h = j \mid X_0 = i)}{h} < \infty, \quad j \neq i \in S^X.$$
The system failure at random time $\zeta$ is self-announced, and the failure
rate in state $i$ is $\mu_i < \infty$, $i \in S^X$.
2. Observations: measurements are taken at discrete times $kL$, $k = 1, 2, \ldots$,
with value $Y_k \in \{1, 2, \ldots, M\} = S^Y$, satisfying
$$P(Y_k = j \mid X_{kL} = i) = D_{ji}, \quad j \in S^Y,\ i \in S^X.$$
3. Cost Structure: the profit rate in state $i$ is $C_i$, and the cost of system
failure from state $i$ is $K_i$ for $i \in S^X$.
4. Maintenance Actions: preventive replacement and failure replacement are
considered.
5. Objective Criterion: find the preventive replacement policy maximizing
the expected net profit over the system's lifetime.
To simplify the presentation, we define the extended state space and
observation space
$$S^{X'} = S^X \cup \{N'\}, \quad N' = N + 1; \qquad S^{Y'} = S^Y \cup \{M'\}, \quad M' = M + 1,$$
where $N'$, $M'$ represent the failure state and the failure signal, respectively.
In addition, we denote for $i \in S^X$, $q_{iN'} = 0$, $q_{N'i} = \mu_i$,
$q_{ii} = -\sum_{j=1, j \neq i}^{N} q_{ji} - \mu_i$, and the state transition matrix
$$Q = (q_{ji})_{N' \times N'}.$$
Similarly, denote for $i \in S^X$, $j \in S^Y$, $D_{M'N'} = 1$, $D_{M'i} = D_{jN'} = 0$, and the
observation matrix
$$D = (D_{ji})_{M' \times N'}. \qquad (6.3.4)$$
Denote also for $j \in S^{Y'}$, $D_j = (D_{j1}, \ldots, D_{jN'})$, and $\mathrm{diag}(D_j)$ as the matrix
with diagonal equal to $D_j$, and the remainder of the elements equal to 0.
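The extended matrices can be assembled mechanically from the working-state data. The following is a minimal sketch under the column convention used above (entry `[j][i]` holds the rate or probability of going from state `i` to `j`); the function names and the numerical values are hypothetical.

```python
def extended_generator(rates, mu):
    """Build Q with Q[j][i] = q_ji and the failure state N' as the last index.

    rates[j][i]: transition rate from working state i to working state j (j != i).
    mu[i]: failure rate in working state i.
    """
    n = len(mu)
    Q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            if j != i:
                Q[j][i] = rates[j][i]
        Q[n][i] = mu[i]  # q_{N'i} = mu_i
        Q[i][i] = -sum(rates[j][i] for j in range(n) if j != i) - mu[i]
    return Q  # last column stays zero: q_{iN'} = 0, the failure state is absorbing


def extended_observation(D_work):
    """Extend D_work[j][i] = P(signal j | state i) with the failure signal M'."""
    n = len(D_work[0])
    D = [list(row) + [0.0] for row in D_work]  # D_{jN'} = 0
    D.append([0.0] * n + [1.0])                # D_{M'i} = 0, D_{M'N'} = 1
    return D


# Hypothetical two-state example: state 1 degrades to state 2 at rate 0.1.
Q = extended_generator(rates=[[0.0, 0.0], [0.1, 0.0]], mu=[0.01, 0.2])
D = extended_observation([[0.9, 0.3], [0.1, 0.7]])
print(Q)
print(D)
```

Each column of `Q` sums to zero and each column of `D` sums to one, which is a quick consistency check on the inputs.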
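Denote the net profit rate vector by $r$,
where $r_i = C_i - K_i \mu_i$, $i \in S^X$.
Finally, we define the indicator vector $I_t$ of the system state, with $i$-th
component $I_t(i) = I_{\{X_t = i\}}$.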
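Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, and let $(\mathcal{X}_t)$, $(\mathcal{Y}_t)$, and $(\mathcal{G}_t)$ be the
(complete) natural filtrations generated by the stochastic processes $X_t$, $Y_t$, and
both $X_t$ and $Y_t$, respectively, so that $\mathcal{G}_t = \mathcal{X}_t \vee \mathcal{Y}_t$.
For any filtration $(\mathcal{F}_t)$, an $(\mathcal{F}_t)$-stopping time $\tau$ is a random variable taking values in
$R_+ \cup \{\infty\}$ with $\{\tau \le t\} \in \mathcal{F}_t$ for all $t \in R_+$.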
With the above assumptions and notation, we have the following expression
for the total net profit over the time interval $(0, t \wedge \zeta]$.
The optimization problem can now be formulated as follows.
Find a $(\mathcal{Y}_t)$-stopping time $\tau^*$, if it exists, maximizing the total expected
net profit over the system's lifetime
where $C^Y = \{\tau \mid \tau \text{ is a } (\mathcal{Y}_t)\text{-stopping time}\}$.
6.4 Problem Reduction
In this section, we follow several steps to transform the original optimization
problem (6.3.9) into a format which is easy to solve.
First, as the objective function $Z_t$ of (6.3.9) is not $(\mathcal{Y}_t)$-adapted, we turn
to consider the following maximization problem
where $\hat{Z}_t = E(Z_t \mid \mathcal{Y}_t)$ is the conditional expectation of $Z_t$ with respect to
filtration $(\mathcal{Y}_t)$, and it is $(\mathcal{Y}_t)$-adapted.
Directly from the definition of conditional expectation (see e.g. Elliott
(1992), P3.), we have for $\tau \in C^Y$,
$$E Z_\tau = E(E(Z_\tau \mid \mathcal{Y}_\tau)) = E \hat{Z}_\tau. \qquad (6.4.2)$$
Therefore, the optimization problems (6.3.9) and (6.4.1) are equivalent.
Now we need the following definition and lemmas to characterize $\hat{Z}_t$.
Definition 4.1 [Jensen (1989)] A process $Z$ is called a smooth semimartingale
(F-SSM) if it has a decomposition $Z_t = Z_0 + \int_0^t f_s\,ds + M_t$, where $(f_t)$ is a real
progressively measurable process with respect to filtration $F$, $E(\int_0^t |f_s|\,ds) < \infty$
for $\forall t \in R_+$, $E|Z_0| < \infty$ and $M = (M_t)$ is a martingale with paths which
are right-continuous, have left limits and start with $M_0 = 0$. Short notation:
$Z = (f, M)$.
Lemma 4.2 [Projection Theorem (Van Schuppen (1977), Bremaud (1981),
and Kallianpur (1980))] Let $Z = (f, M)$ be an F-SSM, and $A = (A_t)$ a
subfiltration of $F$. Then $\hat{Z} = (\hat{f}, \bar{M})$ is an A-SSM, where
i). $\hat{Z}$ is A-adapted and $\hat{Z}_t = E(Z_t \mid A_t)$, $\forall t \in R_+$;
ii). $\hat{f}$ is A-progressive with $\hat{f}_t = E(f_t \mid A_t)$ for almost all $t \in R_+$ (Lebesgue
measure);
iii). $\bar{M}$ is an A-martingale.
In addition, if $Z_0$ and $\int_0^\infty |f_s|\,ds$ are square integrable and $M$ is a square
integrable martingale, then the same properties hold true for the corresponding
terms $\hat{Z}_0$, $\int_0^\infty |\hat{f}_s|\,ds$, and $\bar{M}$ respectively.
Lemma 4.3 A $(\mathcal{Y}_t)$-stopping time $\tau$ has the following representation
where $\sigma$ is an $(\mathcal{H}_n)$-stopping time, $\mathcal{H}_n = \sigma\{(Y_k, I_{\{\zeta > s\}});\ k \le n,\ s \le kL\}$, $T_0$
is a constant and $T_n \ge nL$ is $\mathcal{H}_n$-measurable for $n \ge 1$. We define
Conversely, for any $(\mathcal{H}_n)$-stopping time $\sigma$, constant $T_0 \ge 0$ and $\mathcal{H}_n$-measurable
functions $T_n \ge nL$, $n = 1, 2, 3, \ldots$, $\tau(\sigma, \{T_n\})$ defined by (6.4.3) is a $(\mathcal{Y}_t)$-stopping
time.
Lemma 4.3 can be proved in exactly the same way as Lemma 4.4 in Chapter
4. Intuitively, $\sigma$ and $T_n$ correspond to the preventive replacement between
observation epochs, or at observation epochs, respectively.
With Lemma 4.3, it is clear that the preventive replacement decision is
made only at observation epochs, with $\sigma$ for immediate replacement, and $T_\sigma$
for planning the preventive replacement before the next observation epoch.
Lemma 4.4 We have the following (yt)-SSM representation for it a-l i-L ( t + L ) L G r . PS
z t = E II I ( ( ~ + ~ I L ~ T 1 lL I - P ~ ( L V ~ ) ~{<,s}ds + m(t), i=O j =O
where $P_s = E(I_s \mid \mathcal{Y}_s)$ is obtained iteratively as follows.
$P_s = \mathrm{EXP}((s - [s/L]L)Q)\,P_{[s/L]L}$, $\forall s \neq iL$, and
Proof. First we show that $Z_t$ has a $(\mathcal{G}_t)$-SSM representation as follows.
t i = Sol I {z ,=i )~ .~ ,ds + m2 ( t ) ,
= $Is( i )pids+ m2( t ) :
Therefore,
Zt = ~ ~ L $ ~ C i I s ( i ) d ~ - ~ ~ , ~ i $ , L ~ s ( i ) C I X , d ~
$$= \int_0^t \sum_{i=1}^{N} (C_i - \mu_i K_i) I_s(i)\,ds + m(t)$$
$$= \int_0^t r I_s\,ds + m(t).$$
Clearly, from the definition of $(\mathcal{Y}_t)$, we have
where $e_{N'} = (0, \ldots, 0, 1)_{1 \times N'}$.
For $kL < t < (k + 1)L$, $\forall k \ge 1$,
$$\frac{dP_t}{dt} = \lim_{h \to 0} E(I_{t+h} - I_t \mid \mathcal{Y}_t)/h = QP_t.$$
Consequently, we have $P_s = \mathrm{EXP}((s - kL)Q)\,P_{kL}$.
For $t = kL$, $P_t$ corresponds to the Bayesian posterior distribution of the
system state with the prior distribution equal to $P_{t-}$, and the observation equal
to $Y_k$. In fact, for $Y_k = j$, $i \in S^X$, we have
In matrix form, we have
This completes the proof.
Q.E.D.
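The two-stage recursion in the proof (deterministic flow between observations, Bayesian jump at observation epochs) can be sketched numerically as follows. This is an illustration under stated assumptions: the posterior is taken proportional to $\mathrm{diag}(D_j)P$, which is the standard form of Bayes' rule for this setting, the matrix exponential is a simple pure-Python routine, and all names and numbers are hypothetical.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def expm(A, squarings=10, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    S = [[x / 2.0 ** squarings for x in row] for row in A]
    E = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, S)]
        E = [[e + t for e, t in zip(er, tr)] for er, tr in zip(E, term)]
    for _ in range(squarings):
        E = mat_mul(E, E)
    return E


def predict(P, Q, dt):
    """Between observations: P_{s+dt} = EXP(dt * Q) P_s, with P a column vector."""
    E = expm([[q * dt for q in row] for row in Q])
    return [sum(e * p for e, p in zip(row, P)) for row in E]


def bayes_update(P_prior, D, j):
    """At an observation epoch: posterior proportional to diag(D_j) P_prior."""
    weighted = [D[j][i] * p for i, p in enumerate(P_prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical system: two working states plus the failure state (last index).
Q = [[-0.11, 0.0, 0.0], [0.10, -0.20, 0.0], [0.01, 0.20, 0.0]]
D = [[0.9, 0.3, 0.0], [0.1, 0.7, 0.0], [0.0, 0.0, 1.0]]
prior = predict([1.0, 0.0, 0.0], Q, dt=1.0)   # one observation interval, L = 1
posterior = bayes_update(prior, D, j=1)        # signal 2 observed (index 1)
print(prior, posterior)
```

Because a working signal has zero probability in the failure state, the update also removes the mass that the flow had moved to the failure state.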
Finally, notice that the $(\mathcal{Y}_t)$-stopping time $\tau$ is dominated by the failure time
$\zeta$. From the well-known Optional Stopping Theorem (see e.g., Elliott (1982),
P36.), we have $E\bar{m}_\tau = E\bar{m}_0 = 0$, and consequently, the optimization problem
(6.4.) is equivalent to the following one, where the martingale part $\bar{m}_t$ is
removed from it, i.e.,
In the next section, we will see that the above problem is ready to be solved
within the Markov framework.
6.5 Optimal Policy
From the proof of Lemma 4.4, we have seen that $E(I_t \mid \mathcal{Y}_t)$ forms a piecewise
deterministic process (PDP) with jumps at discrete times $iL$, $i = 1, 2, \ldots$. As
a consequence of the strong Markov property of PDP (see Davis (1993)),
we may restrict $\sigma$ and $T_n$ to depend only on the posterior distribution at the
last observation epoch without loss of optimality. Therefore, we establish the
dynamic equation as follows,
where we denote $V(P)$ as the value function of distribution $P$.
To further simplify the equation, we extend the variable space of the value
function $V$ to un-normalized measure space, i.e., $\|P\| = \sum_i P(i)$ need not
equal 1. Define
Then, we have
$\max(V_1(q_0), V_2(q_0))$,
with $q_s = \mathrm{EXP}(sQ)q_0$ for $0 < s \le L$.
Define operator $T$ on function space $C$, which is formed by the continuous
functions defined on $\|q\| \le 1$, and satisfying (6.5.2) as follows,
It is clear that $T$ is a contraction operator. In fact, for $\|q\| = 1$:
From this contraction property, we have
$$V(q) = \lim_{n \to \infty} T^n(V_1)(q) = \lim_{n \to \infty} V_{n+1}(q), \qquad (6.5.7)$$
where $V_1$ is defined in (6.5.3).
Consequently, the value function $V$ can be obtained by applying the following
algorithm within any accuracy $\epsilon$.
Algorithm.
Step 0. $V_0 = 0$;
Step n. $V_n = T(V_{n-1})$;
Stopping Rule. Stop at $K$ if
Now, with the value function $V(q)$, the optimal policy can be described as
follows:
At each observation epoch, say time $kL \in \{1L, 2L, \ldots, nL, \ldots\}$:
If $V(q_k) \le 0$, then replace the system immediately.
If $V(q_k) > 0$, then if $V_1(q_k) \ge V_2(q_k)$, then run the system till
t
r* = nry rnirt(1 rq3ds).
Otherwise, run the system till the next observation epoch, i.e., time $(k + 1)L$.
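The value iteration algorithm above (start from $V_0 = 0$, apply $T$ repeatedly, stop at a finite stage $K$) has the following generic shape. The sketch makes two assumptions that are not taken from the text: functions are represented by their values on a finite grid, and the stopping rule is a sup-norm test on successive iterates. The operator `toy_T` is a hypothetical contraction used only to exercise the loop; it is not the operator $T$ of this chapter.

```python
def value_iteration(T, v0, eps, max_iter=10_000):
    """Iterate V_n = T(V_{n-1}) until successive iterates differ by at most eps."""
    v = list(v0)
    for n in range(1, max_iter + 1):
        v_next = T(v)
        gap = max(abs(a - b) for a, b in zip(v_next, v))
        v = v_next
        if gap <= eps:
            return v, n
    raise RuntimeError("no convergence within max_iter")


# Hypothetical one-step rewards on a 3-point grid.
reward = [1.0, 0.5, -3.0]


def toy_T(v):
    """Stop now for 0, or collect a reward and continue from the next grid point."""
    last = len(v) - 1
    return [max(0.0, reward[i] + 0.5 * v[min(i + 1, last)]) for i in range(len(v))]


V, stages = value_iteration(toy_T, [0.0, 0.0, 0.0], eps=1e-8)
print(V, stages)
```

The outer `max` with 0 mirrors the structure of an optimal stopping problem, where stopping immediately is always an available action.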
Now we are ready to show the convexity property of $V(q)$, which implies
the convexity of the replacement region $O = \{q \mid V(q) \le 0\}$, and the convexity of
O2 = {qlV2(q) 0) > 0, which is the region that the system need to be
replaced before the next observation epoch.
Lemma 5.5 $V(q)$ is a convex function.
Proof. We will use the mathematical induction method.
i). Case K = 1. We want to show that
is convex. In fact, for any positive measures $P_0$, $Q_0$ such that $\|P_0\|, \|Q_0\| \le 1$,
ii). Case K = n. Assume the convexity holds for $V_n(q)$.
iii). Case K = n+1. We have recursion
The operator "max" is a convexity preserving operator. $V_n$ are convex
functions from i), ii). $\int_0^t r q_s\,ds$ is linear, and consequently, convex. Finally, $\mathrm{diag}(D_j) q_L$
is linear, and therefore convex. Hence, we obtain the convexity of $V_{n+1}$, i.e.,
iv). Case $K \to +\infty$. As we have shown, $V_K \to V$ in (6.5.7), then from
(6.5.12), we get the convexity of the value function $V(q)$.
Q.E.D.
Example.
Suppose a system has 3 working states, with $Q$, $D$ and $r$ defined as
follows.
Then, the value function $V$ and the finite-stage value functions $V_n$ are shown
in Figure 6.1.
Figure 6.1: Optimal values $V_n$.
Figure 6.2: Optimal preventive replacement policy.
The optimal replacement region and the finite-stage optimal replacement
region are shown in Figure 6.2.
The last part of this section will focus on the optimality of discretized policies,
i.e., the preventive replacement is not carried out between observations.
Theorem 5.6 Assume that
and the observation is non-trivial, i.e., there exists $i \in S^X$, $j \in S^Y$ such that
then there exists an $\bar{L} > 0$ such that for the system with observation interval
$L < \bar{L}$, the optimal policy (for $N \ge 2$) is a discretized policy, i.e.,
replace the system immediately;
or, run the system till the next observation epoch.
Proof. From the structure of the optimal policy, we notice that the optimal
preventive replacement between observation times occurs only at $\tau^*$, i.e., the
first time such that
$$P_{\tau^*} \in B, \qquad (6.5.16)$$
where
is clearly an $N - 2$ dimensional simplex.
We need to prove that
Once (6.5.18) holds, then for $L$ small enough, the possible loss in one observation
interval $\int_0^L r P_s\,ds \to 0$, and the prior distribution at the next observation
time is about the same as the present one. Therefore, as long as the value function
is uniformly bounded away from zero for $P \in B$, the optimal policy is to run
the unit to the next observation time, and determine whether to replace it or not
at that time based on the new observation.
By the same token, it is clear that we need only to prove that $\forall P \in B$,
there exists $j$, such that
$$r\,\mathrm{diag}(D_j) P > 0. \qquad (6.5.19)$$
We prove (6.5.19) by contradiction. Notice that from (6.4.17), $\sum_{j} r\,\mathrm{diag}(D_j)P =
rP = 0$ for $P \in B$. Therefore, we have that $\forall P \in B$, $j \in S^Y$,
because otherwise, (6.5.19) is true for some $j \in S^Y$.
But as $B$ is an $N - 2$ dimensional simplex in an $N - 1$ dimensional linear
space, we have that $r$ and $r\,\mathrm{diag}(D_j)$ are parallel for all $j \in S^Y$, i.e., $\forall j \in S^Y$,
Consequently, $\forall j \in S^Y$,
which implies that
This is a contradiction to (6.5.15), which completes the proof.
Q.E.D.
In addition, if the observation matrix is trivial, then it is clear that the
optimal policy is an age replacement policy with preventive replacement time
T* satisfying
T T* = org min rE.YP(tQ) Podt.
This gives the classic age-replacement models a new interpretation as a
trivial case of CBM models.
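For the trivial-observation case, the replacement age can be found numerically. The sketch below reads the criterion as maximizing the expected net profit $G(T) = \int_0^T r\,\mathrm{EXP}(tQ)P_0\,dt$, in line with the objective stated in Section 6.3; this reading, the grid search, the trapezoid rule and the example data are assumptions made for illustration.

```python
def step_matrix(Q, dt, terms=12):
    """EXP(dt * Q) by a truncated Taylor series (adequate for small dt)."""
    n = len(Q)
    E = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms + 1):
        term = [[sum(term[i][m] * Q[m][j] for m in range(n)) * dt / k
                 for j in range(n)] for i in range(n)]
        E = [[E[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return E


def best_replacement_age(r, Q, P0, horizon, dt=1e-3):
    """Grid search for the T maximizing G(T), the integral of r EXP(tQ) P0 over [0, T]."""
    E = step_matrix(Q, dt)
    P = list(P0)
    rate = sum(a * b for a, b in zip(r, P))
    G, best_G, best_T = 0.0, 0.0, 0.0
    for k in range(1, int(round(horizon / dt)) + 1):
        P = [sum(e * p for e, p in zip(row, P)) for row in E]
        new_rate = sum(a * b for a, b in zip(r, P))
        G += 0.5 * (rate + new_rate) * dt  # trapezoid rule
        rate = new_rate
        if G > best_G:
            best_G, best_T = G, k * dt
    return best_T, best_G


# Hypothetical data: state 1 earns profit and degrades to state 2 at rate 1;
# state 2 earns nothing and fails at rate 1 with failure cost 2, so r = (1, -2, 0).
Q = [[-1.0, 0.0, 0.0], [1.0, -1.0, 0.0], [0.0, 1.0, 0.0]]
r = [1.0, -2.0, 0.0]
T_star, G_star = best_replacement_age(r, Q, [1.0, 0.0, 0.0], horizon=5.0)
print(T_star, G_star)
```

In this example the integrand is $e^{-t}(1 - 2t)$, so the profit stops growing at $t = 1/2$, which the grid search recovers up to the step size.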
Summary of Notation
$C_i$: profit rate
$D_{ji}$: conditional probability of the observation
$\mathcal{G}$: filtration of system state and observation
$I_t$: indicator function of the system state
$K_i$: failure cost
$L$: observation interval
$P_t$: estimation of the system state
$q_{ji}$: system state transition rate
$r_i$: net profit rate
$S^X$: system state space
$S^Y$: observation space
$T$: operator that defines value iteration
$V(P)$: value function of initial distribution $P$
$V_n(P)$: n-step (optimal) value function of initial distribution $P$
$X_t$: system state at time $t$
$\mathcal{X}$: filtration of the system state
$Y_n$: observation at time $nL$
$\mathcal{Y}$: filtration of the observation
$Z_t$: total net profit up to time $t$
$\hat{Z}_t$: estimation of the net profit up to time $t$
$\tau^*$: optimal stopping time
$\mu_i$: failure rate at state $i$
$\zeta$: failure time
$C^Y$: set of $\mathcal{Y}$-stopping times
Chapter 7
SUMMARY AND FUTURE
RESEARCH DIRECTIONS
7.1 Summary
It is natural to identify the following relationship between maintenance
terminologies and mathematical concepts:
Replacement ⇔ Stopping Time
Condition Monitoring ⇔ Filtration
Information Processing ⇔ Filtering
CBM Optimization ⇔ Optimal Stopping.
These correspondences lay the foundation of our mathematical modeling
and optimization for maintenance systems.
A common modeling framework recurs in the models presented
in this thesis. It is our expectation that this common framework, together with
its supporting mathematical techniques, will grasp the essence of this subject
better than a pool of concrete models.
Modeling framework
The maintenance models considered in this thesis can be defined by the
following 6 attributes:
1). Time Horizon
2). Deterioration Dynamics
3). Maintenance Actions
4). Cost Structure
5). Information Level
6). Optimization Criterion.
This framework can also be thought of as the framework for CBM without
loss of generality. This is because, from CBM's perspective, any maintenance
optimization has to be condition-based.
Optimization procedure
The generic optimization procedure we developed in this thesis can be
summarized by the following 6 steps.
Step 1. $\lambda$-maximization technique
This technique is applied to average cost and discounted cost criteria for
the following two reasons:
i). It transforms the original objective function to an additive function,
to which results from the optimal stopping theory can be applied.
ii). A large amount of computational complexity is absorbed into the
parameter $\lambda$; therefore, more insight is gained on the structural results
and less computational effort is needed to obtain numerical results. In
addition, the value of $\lambda$ in the dynamic equation is exactly the optimal
value of the original objective function.
Step 2. Characterization of stopping times of jump processes
This result is developed for general jump processes in continuous time.
The purpose of its application here is to separate two optimal stopping
problems, failure replacement and preventive replacement, without loss
of optimality.
Step 3. Smooth semi-martingale decomposition (SSM)
This technique essentially separates the informative trend of deterioration
from the non-informative randomness (the martingale part), and it
allows one to consider only the trend part without loss of optimality.
Step 4. Subfiltration SSM projection
This result is used to estimate the true system condition based on the
given information level. Consequently, the maintenance policy with respect
to that information level is optimized based on this estimation
instead of the unobservable system state.
Step 5. Dynamic programming approach
With steps 1-4, the model is equivalently transformed to a Markov model,
and can be solved through the dynamic programming approach. Therefore,
the optimal Markov policy is guaranteed to exist and possesses
optimality in the whole stopping time class.
Step 6. Approximation with Truncated Problem
This is a standard procedure to solve the dynamic programming equation
with continuous time or infinite discrete time horizon.
While not every model needs to go through all the steps, the integration
of these steps provides a general procedure to solve a wide class of problems
using the aforementioned modeling framework.
7.2 Future Research Directions
In general, future reliability/maintenance research will proceed along the
following two mutually-dependent directions:
1. Theoretical development
2. Theory/practice interactions
Theoretical development has been the focus of this thesis. One major issue
that was not addressed here is the statistical issue. Certainly, statistical
issues are of fundamental importance for maintenance optimization research.
The interactions between information processing and control impose
additional challenges, which are of theoretical interest. For relevant research
work related to this aspect, see Pena et al. (2000) on weak convergence of
recurrent and renewal data, and Jensen and Wiedmann (2000) on the analysis
of dependent censoring.
It is natural to think of an adaptive control scheme that merges the statistics
and optimization procedures into a recursive one. A recent work by Burnetas
and Katehakis (1997) found an adaptive control policy that has the
optimal convergence rate to the optimal policy for finite-state Markov Decision
Processes. It would be a very interesting and challenging problem to find
out whether a similar result can be established for Hidden Markov Models.
In addition, the computational issues for HMM are far from trivial, and
there is a definite need for further investigation. It is expected that more
specific structural results can be obtained for the HMM optimal stopping problem,
which consequently would reduce the computational burden significantly.
HMM is also appealing because of its potential applicability. The following
two other practical scenarios, which might also be thought of as CBM in
a broad sense, can be properly modeled by HMM to a certain extent.
1. software reliability/maintenance
2. communication network management.
First, the following duality between software reliability/maintenance and
CBM can be easily identified: profit of software release vs. failure loss; release
time vs. replacement time; reliability growth vs. condition deterioration;
debugging vs. (general) repair; bugs vs. unobservable system defects, etc. It
is expected that a similar modeling framework and optimization techniques can
be applied to this scenario.
In the communication network management scenario, due to the layered,
distributed, and hierarchical nature of networks, network administrators
constantly face information of great volume that is still incomplete, delayed
and error prone. Certain decisions related to controlling and managing the
network have to be made based on the estimation of the system's current true
condition instead of the available information directly. Again, the generic
optimization procedure (condition monitoring, information processing, and
decision making) is valid.
Certainly, whether the decision making is an optimal stopping problem or
not depends on the nature of the concrete applications to handle.
Ultimately, practical application is the driving force for theoretical
development. Unfortunately, the procedure to transfer the theory to practice
is more complex than it seems to be. Intense additional efforts, including
model interpretation, implementation, verification and validation, are required
to make the optimization models work properly.
Many theoretical developments, such as papers in academic journals or
presentations at conferences, are not immediately available for application.
The most welcome format of these developments from the practitioners' point of
view is their implementation as software/hardware systems. So, a proper
balance between the efforts of abstraction and user-friendliness has to be
carefully maintained in any researcher's mind.
Bibliography
[1] Arjas, E. (1989). Survival models and martingale dynamics. Scand. J.
Statist. 16, 177-225.
[2] Aven, T. (1983). Optimal replacement under a minimal repair strategy -
a general failure model. Advances in Applied Probability 15, 198-211.
[3] Aven, T. (1996). Condition-based replacement policies - a counting process
approach. Reliability Engineering and System Safety 51, 275-281.
[4] Aven, T. and B. Bergman (1986). Optimal replacement times - a general
set-up. Journal of Applied Probability 23, 432-442.
[5] Bai, D.S. and W.Y. Yun (1986). An age replacement policy with minimal
repair cost limit. IEEE Transactions on Reliability R-35, 452-455.
[6] Baxter, L.A., M. Kijima and M. Tortorella (1996). A point process model
for reliability of maintained system subject to general repair. Stochastic
Models 12, 37-65.
[7] Barlow, R.E., C.A. Clarotti and F. Spizzichino (ed.) (1993). Reliability
and Decision Making. Chapman & Hall.
[8] Beichelt, F. (1993). A unifying treatment of replacement policies with
minimal repair. Naval Research Logistics 40, 51-67.
[9] Berg, M., M. Bienvenu and R. Cleroux (1986). Age replacement policy
with age dependent minimal repair. INFOR 24, 26-32.
[10] Bergman, B. (1978). Optimal replacement under a general failure model.
Advances in Applied Probability 10, 431-451.
[11] Block, H.W., W.S. Borges and T.H. Savits (1985). Age-dependent minimal
repair. Journal of Applied Probability 22, 370-385.
[12] Bremaud, P. (1981). Point Processes and Queues: Martingale Dynamics.
Springer-Verlag, Berlin.
[13] Burnetas, A.N. and M.N. Katehakis (1997). Optimal adaptive policies
for Markov decision processes. Mathematics of Operations Research 22.
[14] Chow, Y.S., H. Robbins and D. Siegmund (1971). Optimal Stopping
Theory. Dover Publication Inc., New York.
[15] Christer, A.H., W. Wang and J.Y. Sharp (1997). A state space condition
monitoring model for furnace erosion prediction and replacement.
European Journal of Operational Research 101, 1-14.
[16] Christer, A.H. and W. Wang (1995). A simple condition monitoring model
for a direct monitoring process. European Journal of Operational Research
82, 258-269.
[17] Cleroux, R., S. Dubuc and C. Tilquin (1979). The age replacement problem
with minimal repair and random repair costs. Journal of the Operations
Research Society of America 27(6), 1158-1167.
[18] Dagpunar, J.S. (1998). Some properties and computational results for a
general repair process. Naval Research Logistics 45, 391-405.
[19] Davis, M.H.A. (1993). Markov Models and Optimization. Chapman &
Hall, London.
[20] Dekker, R. and P. Scarf (1998). On the impact of optimization models in
maintenance decision making: the state of the art. Reliability Engineering
and System Safety 60, 111-119.
[21] Doksum, K. (1991). Degradation rate models for failure time and survival
data. CWI Quarterly 4, 195-203.
[22] Drinkwater, R.W. and N.A.J. Hastings (1967). An economic replacement
model. Operational Research Quarterly 18, 121-138.
[23] Elliott, R. (1995). Hidden Markov Models. Springer-Verlag.
[24] Fernandez-Gaucherand, E., A. Arapostathis and S.I. Marcus (1991). On
the average cost optimality equation and the structure of optimal policies
for partially observable Markov decision processes. Annals of Operations
Research 29, 439-470.
[25] Hartman, P. (1964). Ordinary Differential Equations. John Wiley & Sons,
Inc., New York.
[26] Halasz, M., F. Dube, R. Orchard and R. Ferland (1999). The integrated
diagnostic system (IDS): remote monitoring and decision support for
commercial aircraft - putting theory into practice.
http://ai.iit.nrc.ca/IR-public/ids/papers/aaai99idspaper.pdf.
[27] Hastings, N.A.J. (1969). The repair limit replacement method. Operational
Research Quarterly 20(3), 337-349.
[28] Heinrich, G. and U. Jensen (1992). Optimal replacement rules based
on different information levels. Naval Research Logistics Quarterly 39,
937-955.
[29] Hernandez-Hernandez, D., S.I. Marcus and P.J. Fard (1999). Analysis
of a risk-sensitive control problem for hidden Markov chains. IEEE
Transactions on Automatic Control 44, 1093-1100.
[30] Jensen, U. and G.H. Hsu (1993). Optimal stopping by means of point
process observation with applications in reliability. Mathematics of
Operations Research 18, 645-657.
[31] Jensen, U. (1996). Stochastic models of reliability and maintenance.
In Reliability and Maintenance of Complex Systems. Ozekici S. (ed.),
Springer.
[32] Jensen, U. (1989). Monotone stopping rules for stochastic processes in a
semimartingale representation with applications. Optimization 20,
837-852.
[33] Jensen, U. and d. Wiedmann (2000). Estimation of survival curve under
dependent censoring. Abstract book of the Second International Conference
on Mathematical Methods in Reliability - Methodology, Practice
and Inference. Bordeaux, France, July 2000.
[34] Jiang, S. and K. Cheng (1995). On the optimality and comparison of
some standard maintenance policies. Operations Research and Its
Applications. World Publishing Corporation, Beijing.
[35] Jiang, X., V. Makis and A.K.S. Jardine (1998). IMA Journal of Mathematics
Applied in Business & Industry 9, 201-210.
[36] Jiang, X., K. Cheng and V. Makis (1998). On the optimality of
repair-cost-limit policies. Journal of Applied Probability 35, 936-949.
[37] Jiang, X., V. Makis and A.K.S. Jardine (2001). Optimal Repair/Replacement
Policy for a General Repair Model. Advances in Applied Probability, to
appear.
[38] Kijima, M. (1989). Some results for repairable systems with general
repair. Journal of Applied Probability 26, 89-102.
[39] Kijima, M., H. Morimura and Y. Suzuki (1988). Periodical replacement
without assuming minimal repair. European Journal of Operational
Research 37, 194-203.
[40] Kijima, M. and U. Sumita (1986). A useful generalization of renewal
theory: counting processes governed by nonnegative Markovian increments.
Journal of Applied Probability 23, 71-88.
[41] Kobbacy, K.A.H., N.C. Proudlove and M.X. Harper (1995). Towards
an intelligent maintenance optimization system. Journal of the Operational
Research Society 46, 831-853.
[42] Kumar, D. and U. Westberg (1997). Maintenance scheduling under age
replacement policy using proportional hazards modeling and
total-time-on-test plotting. European Journal of Operational Research 99, 507-515.
[43] L'Ecuyer and P.. A. Haurie (1987). The repair vs. replacement problem:
a stochastic approach. Optimal Control Application and Methods 8 , 219-
230.
[44] Love C.E. and R. Guo (1991). Using proportional hazards modeling in
plant maintenance. Quality and Reliability Engineering International 7,
[45] Makis, V. and A.K.S. Jardine (1992a). Optimal replacement in the
proportional hazards model. INFOR 30, 172-183.
[46] Makis, V. and A.K.S. Jardine (1992b). Optimal replacement policy for a
general model with imperfect repair. Journal of the Operational Research
Society 43, 111-120.
[47] Makis, V. and A.K.S. Jardine (1993). A note on optimal replacement
policy under general repair. European Journal of Operational Research
69, 75-82.
[48] Makis, V., X. Jiang and K. Cheng (2000). Optimal preventive
replacement under minimal repair and random repair cost. Mathematics of
Operations Research 25, 141-156.
[49] Mitchell, J.S. (1981). An Introduction to Machinery Analysis and
Monitoring. Pennwell Publishing Company, Tulsa, Oklahoma.
[50] Ozekici, S. (ed.) (1996). Reliability and Maintenance of Complex
Systems. Springer.
[51] Park, K.S. (1983). Cost limit replacement policy under minimal repair.
Microelectronics and Reliability 23, 347-349.
[52] Park, K.S. (1985). Pseudodynamic cost-limit replacement models under
minimal repair. Microelectronics and Reliability 25, 573-579.
[53] Pena, E., R.L. Strawderman and M. Hollander (2000). A weak
convergence result in recurrent and renewal models. Recent Advances in
Reliability - Methodology, Practice and Inference. Limnios, N. and
M. Nikulin (ed.). Birkhauser.
[54] Pierskalla, W. and J. Voelker (1976). A survey of maintenance models:
the control and surveillance of deteriorating systems. Naval Research
Logistics Quarterly 23, 353-388.
[55] Saaty, T. and L. Vargas (1998). Diagnosis with dependent symptoms:
Bayes theorem and the analytic hierarchy process. Operations Research
46, 491-503.
[56] Scarf, P. (1997). On the application of mathematical models in
maintenance. European Journal of Operational Research 99, 493-506.
[57] Scarsini, M. and M. Shaked (2000). On the value of an item subject to
general repair or maintenance. European Journal of Operational Research
[58] Sherif, Y. and M. Smith (1981). Optimal maintenance models for systems
subject to failure: a review. Naval Research Logistics Quarterly 28, 47-74.
[59] Shiryayev, A.N. (1978). Optimal Stopping Rules. Springer, New York.
[60] Smallwood S. and E.J. Sondik (1973). The optimal control of partially
observable Markov processes over a finite horizon. Journal of Operations
Research 21, 1071-1088.
[61] Van der Duyn Schouten, F. (1996). Maintenance policies for
multicomponent systems: an overview. In Reliability and Maintenance of Complex
Systems. Ozekici, S. (ed.), Springer.
[62] Stadje, W. and D. Zuckerman (1991). Optimal maintenance strategies
for repairable systems with general degree of repair. Journal of Applied
Probability 28, 384-396.
[63] Stadje, W. (1994). Maximal wearing-out of a deteriorating system: an
optimal stopping approach. European Journal of Operational Research
73, 472-479.
[64] Taylor, H.M. (1975). Optimal replacement under additive damage and
other failure models. Naval Research Logistics Quarterly 22, 1-18.
[65] Valdez-Flores, C. and R.M. Feldman (1989). A survey of preventive
maintenance models for stochastically deteriorating single-unit systems.
Naval Research Logistics 36 (4), 419-446.
[66] White, D.J. (1989). Repair Limit Replacement. OR Spektrum 11,
143-149.
[67] Zhang, Z.G. and C.E. Love (2000). A simple recursive Markov chain
model to determine the optimal replacement policy under general repairs.
Computers & Operations Research 27, 321-333.
[68] Zhou, H., L. Qu and A. Li (1996). Test sequencing and diagnosis in
electronic system with decision table. Microelectronics Reliability 36,
1167-1175.
[69] Zuckerman, D. (1978). Optimal stopping in a semi-Markov shock model.
Journal of Applied Probability 15, 629-634.