
MODELING AND OPTIMIZATION OF MAINTENANCE SYSTEMS

Xiaoyue Jiang

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Mechanical and Industrial Engineering

University of Toronto

© Copyright by Xiaoyue Jiang (2001)

National Library of Canada, Acquisitions and Bibliographic Services

395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

To my parents

MODELING AND OPTIMIZATION OF MAINTENANCE SYSTEMS

Xiaoyue Jiang (Ph.D. 2001)

Department of Mechanical and Industrial Engineering, University of Toronto

Abstract

This thesis focuses on modeling and optimization of maintenance systems. Although the terminology we use is within the domain of the manufacturing industry, we can identify its potential in IT sectors, such as software reliability engineering and communication network management, to name a few.

The basic problem we are attacking is how to arrange preventive replacement optimally based on the available information about the system's health condition. Instead of emphasizing the concrete models, which are extremely rich and diverse, we focus on the fundamental methodologies in order to grasp the essence of this subject. In Chapters 2 to 6, we propose five models, which can be roughly classified into two categories: age-based models (Chapters 2, 3 and 4) and condition-based models (Chapters 5 and 6). While each of the models is of practical interest in its own right, it also serves as the vehicle to convey the methodologies we integrated from the literature or developed in this thesis. We solve these models in a fairly unified manner. The unified methodology is further summarized in Chapter 7 in terms of a common modeling framework and the associated optimization procedure. We expect that this framework will be valuable for a wide range of applications.

Acknowledgements

I wish to thank my thesis supervisors, Professor Viliam Makis and Professor Andrew K.S. Jardine, for their technical insights and guidance during the course of my thesis. Their generous support and encouragement at critical times were much appreciated.

Working on the thesis would not have been the same without the community and support of the CBM Lab researchers and students. Their suggestions, inspiration, and friendship over the years have made a big difference to me. In particular, I wish to mention 4111 Yang, Dr. Daming Lin, Dr. Dragan Banjevic, Walter Wei Hlia Mi, Jayne Beardsmore, Yimin Zhan, Babak Karimi and Kevin Doyle.

I am deeply indebted to my parents for their loving support, and to my talented brother Zhongyie, who has given me more than he will ever know.

Finally, I wish to thank my wife Jun, who has been there with me every step of the way.

Contents

1 INTRODUCTION

1.1 Overview

1.2 Core Methodologies

1.3 Thesis Outline

2 OPTIMALITY OF REPAIR-COST-LIMIT POLICIES

2.1 Introduction

2.2 Background and Model Description

2.3 Repair/Replacement Problem

2.4 Appendix

3 OPTIMAL PREVENTIVE REPLACEMENT UNDER MINIMAL REPAIR AND RANDOM REPAIR COST

3.1 Introduction

3.2 Problem Formulation

3.3 Optimal Policy

3.4 Computational Algorithm

3.5 Optimal Policy in the Discounted Cost Case

4 OPTIMAL MAINTENANCE POLICY FOR A GENERAL REPAIR MODEL

4.1 Introduction

4.2 Model Description and the Main Result

4.3 Problem Formulation and Analysis

4.4 Dynamic Programming Approach

4.5 Proof of the Theorem

4.6 Special Cases

4.7 Example

5 OPTIMALITY OF LEVEL-CROSSING POLICY FOR A CBM MODEL

5.1 Introduction

5.2 Problem Formulation and Existence of the Optimal Policy

5.3 Optimal Control-Limit Policy

5.4 Conclusions

6 A CBM FRAMEWORK BASED ON HIDDEN MARKOV MODELS

6.1 Introduction

6.2 Literature Survey

6.3 Model Description

6.4 Problem Reduction

6.5 Optimal Policy

7 SUMMARY AND FUTURE RESEARCH DIRECTIONS

7.1 Summary

7.2 Future Research Directions

Bibliography

List of Tables

4.1 Optimal control limits $g(t)$ and $T(t)$ for different values of $t$

5.1 Optimal control limits for different values of $n$

List of Figures

2.1 Optimal age replacement policy

3.1 Optimal repair cost limits

4.1 Sample path of the failure rate and the repair cost

4.2 Optimal policy for a minimal repair model

4.3 Optimal repair/replacement policy

6.1 Optimal values

6.2 Optimal preventive replacement policy

Chapter 1

INTRODUCTION

1.1 Overview

Maintenance is now a significant activity in industrial practice. According to Halasz et al. (1999), reporting on the 1996 costs of maintenance across 11 Canadian industry sectors, "in addition to every dollar spent on new machinery, an additional 58 cents is spent on maintaining existing equipment. This amounts to repair costs of approximately $15 billion per year". The importance of maintenance optimization is therefore obvious.

Essentially, the problem of maintenance optimization can be described as follows. Consider a system that is prone to failure. Instead of running the system to failure, one can arrange preventive replacement in high-risk situations to avoid costly failures. Also, one may have the opportunity at a failure epoch to decide whether to repair the system or to replace it with a new one. The objective is to optimize the system performance with respect to a given criterion, such as the average cost, the discounted cost, or the total net profit. The two fundamental questions are: when should a replacement be carried out, and how well does the system then perform?

To obtain concrete mathematical models, one needs to specify all the components of the above conceptual model, such as the deterioration dynamics, the cost structure, the information level and the available maintenance options. Moreover, proper interpretation of the model is required for real-life applications. For instance, we may use "cost" to represent economic expenses or the duration of downtime (which leads to availability analysis), and use "age" to represent the operating time, the mileage, or the number of takeoffs of an airplane, to adapt to the specific real situation. For models with random "observations", the observations can be either raw data or information preprocessed from the raw data. From this standpoint, we see that the modeling aspect is inseparable from the optimization aspect of maintenance research and practice.

As an academic subject, research on reliability and maintenance crosses multiple disciplines, such as operations research, applied probability, statistics, engineering and management science. Originating in the mid-1940s, this field has undergone explosive growth. In recent years, hundreds of models and policies have appeared in the literature annually, distributed among mathematical, engineering and management science journals. According to a survey conducted by Jensen (1996) based on the MATH DATABASE of STN, from 1972 to 1994 the number of publications with the keyword "Reliability" is 3521, and in addition 1909 papers have the keywords "Maintenance" or "Repair". These papers, which are related to reliability and maintenance, account for about 0.8% of all mathematical publications. This shows the importance of the field and, at the same time, the difficulty of providing a complete overview of the subject.

Several extensive surveys can be found in the journal Naval Research Logistics Quarterly, where Pierskalla and Voelker (1976) has 259 references, Sherif and Smith (1981) has an extensive bibliography of 524 references, and Valdez-Flores and Feldman (1989) has 129 references. Clearly, it is getting harder and harder to grasp this huge and growing field, and attempting to summarize it with a few universal optimization models is infeasible.

A more appropriate way to review this field is to study the core mathematical and modeling methodologies, to investigate typical models from each methodology domain, and to develop one's own models and methodologies that meet the needs of the wide range of real-world applications. It turns out that the core mathematical methods in this field are much more compact than the concrete models. In the next section, we focus on those core methodologies.

1.2 Core Methodologies

Generally speaking, there are three major categories of approaches that are widely used in maintenance: the age-based approach, the Markovian approach, and the optimal stopping approach. We review each of them in turn, and it will be clear by the end of this review that proper integration of these approaches is beneficial.

Age-based approach

Age-based maintenance models are the most classical ones, rooted in the origins of this subject. The basic idea is to describe the system's deterioration by a single index, the age. This quantity possesses some nice analytical properties (it is deterministic, one-dimensional and monotone), which make the analysis of this kind of model elementary. A general procedure of this approach is the following:

Step 1. Propose a class of maintenance policies with one or several control variables. Normally, the control variables are the preventive replacement time, the number of repairs before failure replacement, etc.

Step 2. Explicitly derive the objective function, for example the average cost, as a function of the given control variables.

Step 3. Find the optimal solution by using one- or multi-dimensional optimization schemes in the framework of calculus.
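To make Steps 1 to 3 concrete, the sketch below applies them to the classical periodic replacement policy with minimal repair, a standard textbook model rather than one of the models developed in this thesis. It assumes a Weibull failure rate with shape $\beta > 1$ and scale $\eta$, a replacement cost $c_r$ and a deterministic minimal repair cost $c_m$; all parameter values are illustrative. Step 1 fixes the policy class (replace at age $T$, minimally repair failures in between), Step 2 gives the average cost $(c_r + c_m H(T))/T$ with cumulative hazard $H(T) = (T/\eta)^\beta$, and Step 3 minimizes it by golden-section search.

```python
import math


def average_cost(T, c_r, c_m, beta, eta):
    """Step 2: long-run average cost of 'replace at age T, minimally repair
    in between'. H(T) = (T / eta) ** beta is the Weibull cumulative hazard,
    i.e. the expected number of failures (minimal repairs) in [0, T]."""
    return (c_r + c_m * (T / eta) ** beta) / T


def golden_section_min(f, a, b, tol=1e-9):
    """Step 3: minimize a unimodal function f on [a, b]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):          # the minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                    # the minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


# Illustrative (hypothetical) parameters.
c_r, c_m, beta, eta = 10.0, 2.0, 2.0, 1.0
T_num = golden_section_min(lambda T: average_cost(T, c_r, c_m, beta, eta), 1e-3, 100.0)
# For beta > 1 the first-order condition gives a closed form to check against.
T_closed = eta * (c_r / (c_m * (beta - 1))) ** (1 / beta)
print(T_num, T_closed)  # both close to sqrt(5), about 2.236
```

The search only optimizes within the class fixed in Step 1; it says nothing about whether that class contains the optimal policy, which is exactly the drawback discussed next.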

This approach is the most direct way to find a maintenance policy, and it is accessible to researchers and practitioners with various backgrounds. Another advantage of this approach is that it is very intuitive. Many fundamental concepts, such as failure rate, minimal repair and replacement, are defined in this framework, and more sophisticated concepts can be built on top of it, which makes it the most popular framework in the whole maintenance area. Yet this approach has a severe drawback. As Step 1 provides no rigorous justification that the proposed policy class contains an optimal policy, this approach results in a huge number of policies that are neither optimal nor provide much insight into the field.

One direction to extend classical age-based models is to introduce additional random factors, such as the random repair cost. Another possibility is to generalize the concept of age itself to a new one, the virtual age. Originating from Kijima and Sumita (1986) and Kijima (1989), the concept of virtual age, together with the concept of repair degree, is used to describe the effect of maintenance actions. The major difference between age and virtual age is that the virtual age is neither monotone nor deterministic. These two extensions of the model, and the justification of optimality for them, are beyond the scope of the conventional age-based approach: both the Markovian approach and the optimal stopping approach are required to solve the problems with the above extensions.

Markovian approach

Conceptually, the Markovian approach, which includes modeling and optimization with Markov decision processes, is rather simple. The basic idea is the following: the system deterioration is modeled by the state of a Markov process, where the state at the next "time period" depends only on the present state. Consequently, one needs only to take into account the system's present state to make the best maintenance decision.

Nowadays, the Markovian approach is fully mature, and its optimization procedures, such as value iteration and policy iteration, have become well-known routines for all researchers. In the meantime, with all combinations of discrete or continuous time horizons, state spaces and action sets, the Markovian approach has tremendous modeling power. Further extensions of Markov models, including hidden Markov models, semi-Markov models and Markov renewal models, increase the flexibility of the Markovian approach even more. In fact, all of the aforementioned extensions can be transformed equivalently into standard Markov models with larger state spaces.
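As an illustration of value iteration, the sketch below solves a small discounted-cost replacement problem. The model is hypothetical: a unit deteriorates through states 0 (new) to 3 (failed) according to a given transition matrix, incurs a state-dependent operating cost per period, and can be replaced at cost `replace_cost` (plus `failure_penalty` if it has already failed), after which it operates the period as a new unit. The discount factor and all costs are illustrative.

```python
import numpy as np


def q_values(V, P, op_cost, replace_cost, failure_penalty, alpha):
    """Expected discounted cost of 'keep' and 'replace' in every state, given V.
    The last state is the failed state, where the unit must be replaced."""
    keep = op_cost + alpha * P @ V
    # Replacement: pay replace_cost, then run one period as a new unit (state 0).
    replace = np.full(len(V), replace_cost + keep[0])
    replace[-1] += failure_penalty   # extra loss when the unit has already failed
    keep[-1] = np.inf                # operating a failed unit is not allowed
    return keep, replace


def value_iteration(P, op_cost, replace_cost, failure_penalty,
                    alpha=0.9, tol=1e-10, max_iter=100_000):
    V = np.zeros(len(op_cost))
    for _ in range(max_iter):
        keep, replace = q_values(V, P, op_cost, replace_cost, failure_penalty, alpha)
        V_new = np.minimum(keep, replace)   # Bellman update
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the converged value function.
    keep, replace = q_values(V, P, op_cost, replace_cost, failure_penalty, alpha)
    policy = (replace < keep).astype(int)   # 1 = replace, 0 = keep operating
    return V, policy


# Hypothetical 4-state example: 0 = new, 1 = worn, 2 = badly worn, 3 = failed.
P = np.array([[0.7, 0.2, 0.05, 0.05],
              [0.0, 0.6, 0.30, 0.10],
              [0.0, 0.0, 0.60, 0.40],
              [0.0, 0.0, 0.00, 1.00]])
op_cost = np.array([1.0, 2.0, 4.0, 0.0])
V, policy = value_iteration(P, op_cost, replace_cost=10.0, failure_penalty=20.0)
print(policy)  # [0 0 1 1]
```

For these illustrative numbers the computed policy keeps the unit in states 0 and 1 and replaces it in states 2 and 3, i.e. it has a control-limit structure.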

From the application point of view, the modeling power of the Markovian approach is limited only by computational complexity and modeling inefficiency, rather than by theoretical restrictions. Therefore, a successful application of the Markovian approach must involve careful modeling and careful design of the computational procedure, among other steps, including the choice of optimization techniques and the treatment of statistical issues. Several complementary techniques, such as Generalized Stochastic Petri Nets and Dynamic Decision Tree Analysis, have been developed in different application domains, such as computer science and artificial intelligence, to improve the efficiency of the representation of certain Markov models and to attack the computational issues within the Markovian framework.

From the above discussion, we see two major advantages of the Markovian approach: flexibility with respect to its modeling power, and maturity and simplicity with respect to the decision-making procedure. On the other hand, some drawbacks coexist with these advantages.

We have mentioned that the optimization procedure is now very mature, and this fact leads some people to be satisfied with merely building a Markov model and deriving the dynamic equation. This may prevent researchers from gaining more insight into finer structures. An example of this kind can be found in Chapters 2 and 3, where, by incorporating ideas from the optimal stopping approach, we eventually derive a differential equation instead of a dynamic equation in integral-equation form. This improvement not only simplifies the computational task but also uncovers the simple and intuitive interpretation behind the optimal policy. Several attempts based on the standard Markovian approach have been reported in the literature, but they have not achieved a complete solution.

Moreover, as decision-making is based on present information instead of the whole history, it is difficult to compare policies across different models. For more details on comparing policies of different models, which may use different kinds of information, see Jiang and Cheng (1995).

An alternative way to attack these two problems is through the optimal stopping approach, which takes a fundamentally different point of view from the Markovian approach.

Optimal stopping approach

Instead of utilizing only the information about the current system state, as in the Markovian approach, the optimal stopping approach tries to use the full information from the past up to the current decision epoch. This approach is heavily based on the general theory of stochastic processes, especially martingale theory. Many sophisticated mathematical objects and techniques are utilized, by which intuitive concepts are defined and treated in a rigorous and systematic manner.

The first important concept is the "filtration", which is basically a growing database that holds information about the system up to the present time.

The second important concept is the "stopping time", which is a random time that is completely determined by the information available up to the present time. In other words, for any given time epoch, whether this "stopping time" has occurred or not depends only on our knowledge about the history of the system.

It is easy to see the natural connection between stopping times and replacement. The class of stopping times represents the wide range of maintenance policies that are associated with the available information. The optimal stopping problem is therefore to find the optimal policy within the whole class of stopping-time policies for a given objective criterion.
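In the simplest setting the optimal stopping value can be computed by backward induction. The sketch below is a standard textbook construction, not a method from this thesis, and it assumes a finite horizon, a finite state space, a known transition matrix `P`, and a stopping reward that depends only on the current state (all hypothetical simplifications; the problems in this thesis are infinite-horizon). The rule "stop as soon as the reward equals the optimal value" mirrors the stopping rule used later in Chapter 2.

```python
import numpy as np


def snell_envelope(P, reward, horizon):
    """Backward induction for stopping a Markov chain with transition matrix P.
    reward[x] is the reward Y_n = reward(x_n) collected when stopping in state x.
    Returns gamma[n, x], the optimal expected reward from state x at time n,
    and stop[n, x], True where stopping immediately is optimal."""
    reward = np.asarray(reward, dtype=float)
    gamma = np.empty((horizon + 1, len(reward)))
    stop = np.empty((horizon + 1, len(reward)), dtype=bool)
    gamma[horizon] = reward          # at the horizon we must stop
    stop[horizon] = True
    for n in range(horizon - 1, -1, -1):
        continuation = P @ gamma[n + 1]          # E[gamma_{n+1} | x_n = x]
        gamma[n] = np.maximum(reward, continuation)
        stop[n] = reward >= continuation         # Y_n equals gamma_n: stop now
    return gamma, stop


# Toy chain: state 1 is absorbing and pays 1; state 0 pays 0 and moves to 1 w.p. 0.5.
P = np.array([[0.5, 0.5],
              [0.0, 1.0]])
gamma, stop = snell_envelope(P, [0.0, 1.0], horizon=2)
print(gamma[0], stop[0])
```

Here `gamma[0]` equals `[0.75, 1.0]`: in state 0 it pays to wait, while in state 1 stopping immediately is already optimal.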

An attractive advantage of the optimal stopping approach is that, for a given filtration (i.e., information level), the optimality is among all stopping times belonging to this filtration, including both Markovian and non-Markovian policies. Therefore, this optimality is stronger than optimality among Markovian policies only. A further, implicit advantage is that certain policy comparisons can be conducted by comparing models with different information levels: a higher information level means a larger class of stopping times, and therefore an optimal policy that is at least as good.

Regarding modeling power, the optimal stopping approach and the Markovian approach are complementary to each other. On one hand, the formulation of the optimal stopping approach is not restricted by the Markov assumption. On the other hand, its action sets are always binary (stop or not), which is much more restrictive than in general Markov models.

The major limitation of the optimal stopping approach is that, while its formulation is general, the computational procedures still involve some specific assumptions, such as a certain kind of Markov property, or monotonicity, or both. In fact, all computable non-Markovian optimal stopping rules obtained in the literature rely on very strong monotonicity conditions. As these strong monotonicity assumptions are not realistic in many practical situations, the Markov assumption becomes a requirement for numerical computation. Therefore, proper integration of the Markovian approach and the optimal stopping approach has the potential to extract considerable value from both.

1.3 Thesis Outline

As discussed in the previous sections, the diversity and complexity of maintenance problems make it infeasible to fit all practical situations into one universal model. The alternative we take in this thesis is to jointly utilize the core methodologies in order to establish, and gradually expand, our pool of maintenance models in a systematic manner.

From Chapter 2 to Chapter 6, we develop one model in each chapter. Each model solves one particular problem of practical interest and, at the same time, tries to represent a general treatment for solving a range of similar problems. Some repetition in the modeling and treatment is retained to keep each chapter self-contained and to stress the logical connections among the chapters. Instead of pursuing maximal generality in each model, we emphasize the methodology and the procedures that are common to all of the models. Many more variants and extensions can then be developed naturally along the same direction.

We summarize the main results in this thesis as follows.

Chapter 2

In this chapter, we solve the repair/replacement problem for a single-unit system with random repair cost, which was posed as an "important" and "complicated" open problem in Beichelt (1993). When the unit fails, the repair cost is observed and a decision is made whether to replace the unit or repair it. We assume that the repair is minimal, i.e., the unit is restored to its functioning condition just prior to failure, without changing its age. We formulate this age-based model as a discrete time optimal stopping problem, establish the existence of the optimal policy, and show that the optimal policy is a "repair-cost-limit" policy, that is, there is a sequence of repair-cost-limit functions $g_n(t)$, $n = 1, 2, \dots$, such that the unit of age $t$ is replaced at the $n$-th failure if and only if the repair cost $C(n, t) \ge g_n(t)$; otherwise it is minimally repaired. If the repair cost does not depend on $n$, then there is a single repair-cost-limit function $g(t)$, which is uniquely determined by a first-order differential equation with a boundary condition.
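To see what a repair-cost-limit policy does in practice, the following Monte Carlo sketch evaluates the long-run average cost of a given limit function by the renewal-reward ratio (expected cycle cost over expected cycle length). It is only an illustration under stated assumptions: a Weibull cumulative hazard $H(t) = (t/\eta)^\beta$, a hypothetical repair cost distribution (`hypothetical_repair_cost`, failure loss 1 plus an exponential cost whose mean grows with age), and a user-supplied `cost_limit(n, t)`. It does not compute the optimal $g_n(t)$, which requires the analysis of this chapter.

```python
import math
import random


def hypothetical_repair_cost(rng, n, t):
    """Hypothetical C(n, t): failure loss 1 plus an exponential repair cost
    whose mean 1 + t grows with the age t (stochastically increasing)."""
    return 1.0 + rng.expovariate(1.0 / (1.0 + t))


def simulate_cycle(rng, cost_limit, beta, eta, c_r):
    """One replacement cycle under minimal repair. Failure ages follow from the
    Weibull cumulative hazard H(t) = (t / eta) ** beta. Returns (cost, length)."""
    s, cost, n = 0.0, 0.0, 0
    while True:
        s += rng.expovariate(1.0)              # hazard-scale time of the next failure
        t = eta * s ** (1.0 / beta)            # age t at the n-th failure
        n += 1
        c = hypothetical_repair_cost(rng, n, t)
        if c >= cost_limit(n, t):              # repair-cost-limit rule: replace
            return cost + c_r, t
        cost += c                              # otherwise repair minimally, age unchanged


def estimate_average_cost(cost_limit, n_cycles=50_000, beta=2.0, eta=1.0, c_r=10.0, seed=1):
    """Renewal-reward estimate: total cost / total time over many cycles."""
    rng = random.Random(seed)
    total_cost = total_time = 0.0
    for _ in range(n_cycles):
        c, t = simulate_cycle(rng, cost_limit, beta, eta, c_r)
        total_cost += c
        total_time += t
    return total_cost / total_time


# A limit of 0 means "always replace": the cost rate is c_r / E[first failure time].
print(estimate_average_cost(lambda n, t: 0.0), 10.0 / math.gamma(1.5))
print(estimate_average_cost(lambda n, t: 6.0))   # a hypothetical constant limit
```

Comparing such estimates for different candidate limits is the kind of restricted search that the optimality result of this chapter makes unnecessary.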

Chapter 3

We extend the previous model by incorporating preventive replacement optimization into the original repair/replacement problem. The resulting problem becomes a combination of two optimal stopping problems of different natures: one in discrete time and the other in continuous time. We first develop a general result characterizing the stopping times of jump processes. This characterization, together with other mathematical tools such as the semimartingale decomposition and the λ-maximization technique, enables us to solve the two optimal stopping problems sequentially without loss of optimality. Again, we establish the existence of the optimal policy and show that the optimal policy is an age preventive replacement, repair-cost-limit policy. The optimal preventive replacement time and the repair cost limits can be obtained by solving the same system of ordinary differential equations as in Chapter 2, with different boundary conditions. A very intuitive interpretation of the optimal policy is obtained based on the concept of "residual value". Both the average and the discounted cost criteria are treated with this approach. An algorithm for finding the optimal policy is presented for the average cost case, and a numerical example is given to illustrate the algorithm.

Chapter 4

The age-based models considered in the previous two chapters are further extended to a virtual-age-based model by generalizing minimal repair to general repair. We use the Kijima Type I general repair model; the analysis is also valid for the Kijima Type II model, among others. We assume that the repair degree, which affects the virtual age of the system, is a random function of the repair cost and the virtual age at the failure time. The objective is to find the maintenance policy that minimizes the long-run expected average cost per unit time. With a novel formulation of the problem as a continuous time Markov model, we apply the optimization procedure developed in Chapter 3 to solve this problem. Although more arguments on monotonicity issues are involved, we are able to show that a generalized repair-cost-limit policy is optimal, and that the preventive replacement time depends on both the virtual age of the system and the length of the operating time since the last repair. Computational procedures for finding the optimal average cost, the optimal repair cost limit function, and the optimal preventive replacement time function (with respect to the virtual age) are developed. This model includes many well-known models as special cases, and the approach provides a unified treatment for a wide class of (virtual) age-based maintenance models.
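The two virtual-age recursions referred to above can be written in a few lines. In the usual Kijima notation, $X_n$ is the operating time between the $(n-1)$-th and the $n$-th failure and $A_n \in [0, 1]$ is the repair degree; Type I sets $V_n = V_{n-1} + A_n X_n$ and Type II sets $V_n = A_n (V_{n-1} + X_n)$. In this thesis $A_n$ is a random function of the repair cost and the virtual age at failure; in the sketch below it is simply passed in as a number, an assumption made for illustration.

```python
def kijima_virtual_ages(inter_failure_times, repair_degrees, model="I"):
    """Virtual age right after each repair.
      Type I : V_n = V_{n-1} + A_n * X_n    (repair affects only the last interval)
      Type II: V_n = A_n * (V_{n-1} + X_n)  (repair acts on the whole accumulated age)
    A_n = 1 gives minimal repair (virtual age equals real age) in both models;
    under Type II, A_n = 0 restores the unit to as good as new."""
    if model not in ("I", "II"):
        raise ValueError("model must be 'I' or 'II'")
    v, ages = 0.0, []
    for x, a in zip(inter_failure_times, repair_degrees):
        if not 0.0 <= a <= 1.0:
            raise ValueError("repair degree must lie in [0, 1]")
        v = v + a * x if model == "I" else a * (v + x)
        ages.append(v)
    return ages


X = [2.0, 3.0, 1.0]
print(kijima_virtual_ages(X, [1.0, 1.0, 1.0]))        # minimal repair: [2.0, 5.0, 6.0]
print(kijima_virtual_ages(X, [0.5] * 3, model="I"))   # [1.0, 2.5, 3.0]
print(kijima_virtual_ages(X, [0.5] * 3, model="II"))  # [1.0, 2.0, 1.5]
```

The example shows why the virtual age is not monotone: under Type II the third repair lowers the virtual age from 2.0 to 1.5.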

Chapter 5

While various condition monitoring techniques have been widely deployed in practice, there are still relatively few mathematical models capable of fully utilizing the available information for optimal maintenance decision making. A very intuitive scheme, called the "level-crossing" policy, is commonly used in maintenance practice. Under this policy, the system is preventively replaced as soon as a certain performance parameter reaches a prespecified level. While this scheme is practically plausible, its optimality remains to be justified, and the control limit remains to be optimized. In this chapter, we propose a simple condition-based maintenance (CBM) model to address these two problems. The monitored signal process is a one-dimensional Markov process over a discrete time horizon. It represents only partial information, because the occurrence of failure does not correspond exactly to a particular signal level. The objective is to find the preventive replacement policy that maximizes the total expected profit during the lifetime of the system. We formulate this problem as an optimal stopping problem and show that, under weak monotonicity assumptions on the signal process, the optimal policy is a level-crossing policy. We also develop an algorithm for finding the control limit of an ε-optimal policy.
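The level-crossing rule itself is a stopping time: replace at the first epoch where the signal reaches the control limit. The sketch below shows the rule and a crude Monte Carlo comparison of a few control limits under a hypothetical model (exponential signal increments, a per-period failure probability $1 - e^{-0.05 z}$ that grows with the signal level $z$, unit reward per period, and illustrative preventive and failure costs). It is not the ε-optimal algorithm of Chapter 5; it only shows how a control limit can be evaluated once a model is fixed.

```python
import math
import random


def level_crossing_time(signal, level):
    """Level-crossing stopping rule: first epoch n with signal[n] >= level,
    or None if the level is never reached."""
    for n, z in enumerate(signal):
        if z >= level:
            return n
    return None


def simulate_profit(level, rng, reward=1.0, pm_cost=2.0, failure_cost=10.0, max_periods=10_000):
    """Profit over one life cycle under a HYPOTHETICAL model: the signal has
    exponential (hence nonnegative) increments, and the unit fails in a period
    with probability 1 - exp(-0.05 * signal), which grows with the signal."""
    z = 0.0
    for n in range(1, max_periods + 1):
        z += rng.expovariate(1.0)                     # signal observed at epoch n
        if rng.random() < 1.0 - math.exp(-0.05 * z):  # failure before replacement
            return reward * n - failure_cost
        if z >= level:                                # preventive replacement
            return reward * n - pm_cost
    return reward * max_periods


rng = random.Random(0)
for level in (2.0, 5.0, 10.0, 20.0):
    mean = sum(simulate_profit(level, rng) for _ in range(10_000)) / 10_000
    print(level, round(mean, 2))
```

A grid over control limits like this gives no optimality guarantee; the result of Chapter 5 is what justifies restricting attention to level-crossing rules in the first place.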

Chapter 6

In this chapter, we first provide a brief survey of the literature on condition-based maintenance, and then propose a comprehensive CBM model that represents a general framework for CBM optimization. The principal approach is optimal stopping for hidden Markov models. In this model, the system state is driven by an unobservable Markov process defined on a continuous time horizon. The observations are described by another random process, which is defined at the discrete epochs $L, 2L, \dots, nL, \dots$ and depends conditionally on the hidden system state. By applying results from the general theory of stochastic processes, we reduce the original problem significantly without loss of optimality, and derive the dynamic equation for obtaining the optimal value function and the optimal policy. Furthermore, we prove the optimality of the discrete policy when the observation interval $L$ is small enough, i.e., optimal preventive replacement is performed only at observation epochs. While computation is not the major concern of this chapter, we provide a simple algorithm based on the fixed point theorem, and we solve a concrete example as an illustration.
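A standard way to carry the observation history of a hidden Markov model forward is the conditional distribution of the hidden state given the observations. The sketch below shows the generic forward-filter recursion between observation epochs $(n-1)L$ and $nL$; it is not the problem reduction derived in Chapter 6. It assumes a two-state hidden chain whose one-interval transition matrix `P_L` is given directly (for a continuous-time chain with generator $Q$ it would be the matrix exponential $e^{QL}$, computable for example with `scipy.linalg.expm`), a hypothetical Gaussian observation model, and it ignores any information carried by failures.

```python
import numpy as np


def gaussian_likelihood(y, means, sigma):
    """Density of observation y under each hidden state (hypothetical Gaussian model)."""
    means = np.asarray(means, dtype=float)
    return np.exp(-0.5 * ((y - means) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def filter_step(pi, P_L, likelihood):
    """One forward-filter update from epoch (n-1)L to nL.
    pi         : P(hidden state = i | observations up to (n-1)L)
    P_L        : transition matrix of the hidden chain over one interval of length L
    likelihood : likelihood[j] = density of the new observation given state j"""
    predicted = pi @ P_L                 # propagate the state distribution over one interval
    posterior = predicted * likelihood   # Bayes' rule with the new observation
    return posterior / posterior.sum()


# Hypothetical two-state example: 0 = healthy, 1 = warning (absorbing).
P_L = np.array([[0.9, 0.1],
                [0.0, 1.0]])
pi = np.array([1.0, 0.0])
for y in (0.1, 1.8, 2.5):
    pi = filter_step(pi, P_L, gaussian_likelihood(y, means=[0.0, 2.0], sigma=1.0))
    print(y, pi.round(3))
```

As observations drift towards the warning-state mean, the posterior probability of the warning state rises from about 0.02 to above 0.9, which is the kind of statistic a replacement decision could be based on.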

Chapter 7

In this final chapter, we summarize the modeling framework and the optimization procedures that are common to the models presented in the thesis. Some subjective thoughts are then presented, and more research on the hidden Markov model framework for CBM optimization is advocated. At the end of the thesis, we indicate some theoretical problems and practical applications along this direction for future research.

Chapter 2

OPTIMALITY OF REPAIR-COST-LIMIT POLICIES

2.1 Introduction

We consider a maintenance model where the failed unit can be restored by a minimal repair or a replacement. The minimal repair cost $C(n, t)$ is a random variable which depends on $n$ and $t$, where $n$ is the number of failures since the last replacement and $t$ is the age of the unit at the $n$-th failure.

This model was proposed but left unsolved in Beichelt (1993), where it was used as a unified treatment of many well-known models as special cases. For research related to the unified treatment of maintenance models, see also Makis and Jardine (1992) and Jiang and Cheng (1995).

The repair-cost-limit policy has been linked to random repair cost models from the very beginning (see, e.g., Hastings (1969)). Several classes of repair-cost-limit functions have been studied in the literature, e.g., the constant cost limit (e.g., Cléroux et al. (1979)) or a class of parameterized decreasing functions (see, e.g., Berg et al. (1986), Park (1983, 19%)). Within these particular classes, the optimal repair-cost-limit function has been found under additional assumptions on the distribution of the repair cost. Obviously, optimizing within a restricted class does not necessarily lead to the optimal policy. In the discrete case (decisions are made at times $t = 1, 2, \dots$), optimality results have been obtained by Hastings (1969) and later extended by White (1989) for the expected average cost case and for the discounted cost case on the infinite horizon using dynamic programming. The repair-cost-limit policy is easily implementable and is frequently used in practice (see, e.g., Hastings (1969), Drinkwater and Hastings (1967) and Beichelt (1993)).

We formulate this continuous time maintenance problem in the framework of optimal stopping theory in discrete time. First, we establish the existence of the optimal policy, and then we prove the optimality of the repair-cost-limit policy. We will show that in a special case, where the random repair cost $C(n, t)$ does not depend on $n$, the repair-cost-limit function is uniquely determined by a first-order differential equation with a boundary condition. The results obtained for a deterministic cost function $C(n, t)$ agree with the results obtained by Makis and Jardine (1992b).

2.2 Background and Model Description

The following assumptions will be made throughout this chapter.

1. The failure rate $h(t)$ of the unit is a nondecreasing function of $t$, and $h(t) < \infty$ for all $t$.

2. Two kinds of maintenance actions, $A_m$ and $A_r$, are considered, where $A_m$ and $A_r$ denote minimal repair and replacement, respectively. All maintenance actions take negligible time.

3. The costs of the maintenance actions $A_m$ and $A_r$ are denoted by $C(n, t)$ and $C_r$, respectively, where $C_r$ is a constant and $C(n, t)$ is a random variable; $n$ is the number of minimal repairs since the last replacement and $t$ is the age of the unit at the $n$-th failure. We assume that $C_r$ includes a failure loss $C_f$, with $C_r \ge C_f > 0$, and that $C(n, t)$ is the sum of a repair cost $C_m(n, t) \ge 0$ and the failure loss $C_f$. Next, we assume that for each $t \ge s > 0$ and for each $i \ge j \ge 1$, $C_m(i, t) \ge C_m(j, s)$ stochastically. Finally, the repair costs are mutually independent and observable at failure times.

4. The objective is to minimize the expected average cost per unit time over an infinite time horizon.
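Under Assumptions 1 and 2, a unit that is only minimally repaired keeps its age at every failure, so between replacements its failures occur according to a nonhomogeneous Poisson process with intensity $h(t)$. The sketch below simulates these failure ages by the time-transformation method, assuming for illustration only a Weibull failure rate $h(t) = (\beta/\eta)(t/\eta)^{\beta - 1}$ with $\beta \ge 1$, which is nondecreasing as Assumption 1 requires. The expected number of failures in $[0, T]$ is then the cumulative hazard $H(T) = (T/\eta)^\beta$.

```python
import random


def minimal_repair_failure_times(horizon, beta, eta, rng):
    """Failure ages in [0, horizon] of a unit that is only minimally repaired.
    Assumes (hypothetically) a Weibull failure rate, so the cumulative hazard is
    H(t) = (t / eta) ** beta and its inverse is H^{-1}(s) = eta * s ** (1 / beta)."""
    times, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)          # unit-rate Poisson process on the hazard scale
        t = eta * s ** (1.0 / beta)        # map back to the age scale
        if t > horizon:
            return times
        times.append(t)


rng = random.Random(0)
reps = 20_000
mean_count = sum(len(minimal_repair_failure_times(2.0, 2.0, 1.0, rng)) for _ in range(reps)) / reps
print(mean_count)   # close to H(2) = (2 / 1) ** 2 = 4
```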

We will formulate the problem in the framework of optimal stopping theory. First, we summarize the optimal stopping results that will be used in this chapter (see Chow et al. (1971)).

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_n, n \in \mathbb{N}_+\}$ be a right continuous and complete filtration. Let $Y = \{Y_n, n \in \mathbb{N}_+\}$ be a sequence of random variables such that $(Y_n)$ is adapted to $(\mathcal{F}_n)$. A stopping time $\tau$ is a random variable $\tau : \Omega \to \mathbb{N}_+ \cup \{+\infty\}$ such that $\{\tau = n\} \in \mathcal{F}_n$ for all $n \in \mathbb{N}_+$, and we define

Then the optimal stopping problem is formulated as follows: find a stopping time $\sigma^*$, if it exists, such that
$$E Y_{\sigma^*} = \sup_{\sigma \in D'} E Y_\sigma,$$
where $D' = \{\sigma : \sigma \text{ is an } (\mathcal{F}_n)\text{-stopping time and } E Y_\sigma \text{ exists}\}$.

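We will introduce the following notation:
$$D = \{\sigma : \sigma \text{ is an } (\mathcal{F}_n)\text{-stopping time}, \ \sigma < \infty, \ E Y_\sigma^- < \infty\},$$
$$\bar{D} = \{\sigma : \sigma \text{ is an } (\mathcal{F}_n)\text{-stopping time}, \ E Y_\sigma^- < \infty\},$$
$$\sigma = \begin{cases} \text{first } i \ge n \text{ such that } Y_i = \gamma_i, \\ \infty \quad \text{if no such } i \text{ exists.} \end{cases}$$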

Remark 2.1. As a consequence of Theorem 4.7, p. 81 in Chow et al. (1971), $V(D) = V(\bar{D})$. Obviously, also $V(D') = V(\bar{D})$.

Theorem 2.2. [Chow et al., Theorem 4.5', p. 82] If $E(\sup_n Y_n^+) < \infty$, then $\sigma$ is optimal in $\bar{D}$. If $Y_n \to -\infty$, then $\sigma \in D$.

For the Markov case, the optimal stopping time has a more specific form. Assume that $\mathcal{F}_n = \mathcal{B}(Y_1, \ldots, Y_n)$, and for each $n = 1, 2, \ldots$ there is a measurable space $(X_n, \mathcal{X}_n)$ and an $\mathcal{F}_n$-measurable random variable $x_n$ taking values in $X_n$ such that $Y_n = \varphi_n(x_n)$ for some $\mathcal{X}_n$-measurable function $\varphi_n(\cdot)$. We say that $\{(x_n, \mathcal{X}_n)\}$ provides a Markov representation of the sequence $\{Y_n, \mathcal{F}_n\}$ if $P\{x_{n+1} \in B \mid \mathcal{F}_n\} = P\{x_{n+1} \in B \mid x_n\}$ ($B \in \mathcal{X}_{n+1}$, $n = 1, 2, \ldots$). In addition, if $X_1 = \cdots = X_n \equiv X$, $\mathcal{X}_1 = \cdots = \mathcal{X}_n \equiv \mathcal{X}$, and $P\{x_{n+1} \in B \mid x_n\}$ is, for each $n$ and $B \in \mathcal{X}$, a function on $X$ which does not depend on $n$, then we have a stationary Markov representation of $\{Y_n, \mathcal{F}_n\}$.

For $n = 0, 1, 2, \ldots$, denote
$$V_n(x) = \operatorname{ess\,sup}_{D_n} E(Y_\sigma \mid x_n = x).$$
Then we have

Theorem 2.3. [Chow et al., Theorem 5.2, p. 104] In the stationary Markov case, there is a version of $(\gamma_n)$ such that for each $n = 1, 2, \ldots$,
$$V_n(x) = E(\gamma_n \mid x_n = x), \qquad \gamma_n = V_n(x_n).$$

Corollary 2.4. [Chow et al., Remark, p. 105] If $Y_n = \sum_{k=1}^{n-1} \theta_k(x_k) + \varphi_n(x_n)$,

denote

Then

and

Corollary 2.5. Using the same notation as in Corollary 2.4, if $\{\theta_n(x_n)\}$ is a sequence of mutually independent random variables which depend only on the state $x_n$ and on $n$, we have the same result as in Corollary 2.4.

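Proof. Construct a new probability space $(\mathbb{N}_0 \times X \times \mathbb{R}, \mathcal{N}_+ \times \mathcal{X} \times \mathcal{B}, \tilde{P})$ to substitute for $(X, \mathcal{X}, P^x)$. Denote $y_k = (k, x_k, \theta_k(x_k))$. Let $\tilde{\theta}_k(y_k) = (0, 0, 1)\, y_k^T = \theta_k(x_k)$ and $\tilde{\varphi}_n(y_n) = \varphi_n((0, 1, 0)\, y_n^T) = \varphi_n(x_n)$. Thus $\{(y_n, \mathbb{N}_0 \times \mathcal{X} \times \mathcal{B})\}$ forms a Markov representation of $\{Y_n, \mathcal{F}_n\}$, where $Y_n = \sum_{k=1}^{n-1} \tilde{\theta}_k(y_k) + \tilde{\varphi}_n(y_n)$.

Put $y = (n, x, \theta_n(x))$. Then from Corollary 2.4 and the independence of $\{\theta_n(x_n)\}$,

Also, we have

and

$\infty$ if no such $n$ exists.

Q.E.D.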

We will also need the following result.

Lemma 2.6. [Chow et al., Theorem 4.13, p. 92] Assume that $\theta, \theta_1, \theta_2, \ldots$ are independent and identically distributed random variables with $E\theta = 0$. Let $Y_n = \sum_{i=1}^n \theta_i$ and $\beta > 0$. If $E((\theta^+)^{\beta+1}) < \infty$, then for all $b > 0$, $E\big(\sup_n (Y_n - nb)^+\big)^\beta < \infty$.

Lemma 2.7. The following almost trivial result will also be useful later in this chapter. Assume that $\{Y_n^i, \mathcal{F}_n\}$, $i = 1, 2$, are two stochastic sequences on the same probability space and, for each $\omega$ and $n$, $Y_n^1(\omega) \le Y_n^2(\omega)$. Then:

2.3 Repair/Replacement Problem

We will use the $\lambda$-maximization technique (see e.g. Aven and Bergman (1986)) to treat the following minimization problem:

Put

where $\lambda > 0$ is a parameter, $S_i$ is the $i$-th failure time, each failure is removed by minimal repair $A_m$, and $C(n, S_n)$ is the $n$-th repair cost. Denote $V_\lambda(0) = \sup_\sigma E Y_\sigma(\lambda)$. Then,

If there is a $\lambda$ such that $V_\lambda(0) = 0$, then $\lambda = \lambda^*$.
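The idea behind the $\lambda$-maximization technique can be seen on a toy case in which each candidate policy is summarized by its expected cycle cost and expected cycle length (a hypothetical simplification; the thesis works with general stopping times). Taking $V(\lambda)$ as the largest value of $\lambda \times$ (expected cycle length) $-$ (expected cycle cost), $V$ is nondecreasing in $\lambda$ and changes sign exactly at the smallest cost ratio, so bisection on the sign of $V$ recovers the optimal average cost. This is a minimal sketch, not the procedure used in the thesis.

```python
from typing import Sequence, Tuple

def lambda_star(policies: Sequence[Tuple[float, float]], tol: float = 1e-10) -> float:
    """Bisection for lambda* = sup{lambda : V(lambda) < 0}.

    policies: hypothetical (expected cycle cost, expected cycle length) pairs,
    all with positive cost and positive length.
    V(lambda) = max over policies of lambda * length - cost, which is
    continuous and nondecreasing in lambda.
    """
    def V(lam: float) -> float:
        return max(lam * length - cost for cost, length in policies)

    lo = 0.0                                              # V(0) < 0 since costs are positive
    hi = max(cost / length for cost, length in policies)  # V(hi) >= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if V(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Three hypothetical policies with cost ratios 5, 3 and 4.
print(lambda_star([(10.0, 2.0), (9.0, 3.0), (20.0, 5.0)]))  # close to 3.0
```

The returned value is the minimal ratio of expected cycle cost to expected cycle length, i.e. the point where $V(\lambda)$ crosses zero.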

Denote $y_n = (n, S_n, C(n, S_n))$ and $\mathcal{H}_n = \mathcal{B}(y_i, i \le n)$. Then $P\{y_{n+1} \in B \mid \mathcal{H}_n\} = P\{y_{n+1} \in B \mid y_n\}$; thus $(y_n)$ forms a stationary Markov sequence. We also denote $\sigma$ as $\sigma_\lambda$ to indicate its dependence on the parameter $\lambda$. To prove the existence of the optimal stopping time, we only need to check the condition in Theorem 2.2.

Denote $\lambda_0 = EC(\infty, \infty)/\gamma(\infty) = EC(\infty, \infty)\, h(\infty)$, $\lambda_1 = C_r/\gamma(0)$ and $\lambda_2 = \lambda_0 \wedge \lambda_1$, where $\gamma(t)$ is the mean residual time to failure at age $t$. It is easy to see that $\lambda_0$ and $\lambda_1$ are the expected average costs under the minimal repair policy and the failure replacement policy, respectively. Hence, the optimal cost $\lambda^*$ must be less than or equal to $\lambda_2$.
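As a numerical illustration of the bound $\lambda_1 = C_r/\gamma(0)$, the sketch below computes the mean residual life $\gamma(t) = \int_t^\infty \bar F(x)\,dx / \bar F(t)$ by the trapezoidal rule. The Weibull lifetime (shape $\beta$, scale $\eta$), the truncation of the integral at $t + 10\eta$, and the value $C_r = 5$ are assumptions made only for this example. For $\beta > 1$ the failure rate is unbounded and $C(n, t) \ge C_f > 0$, so $\lambda_0 = \infty$ and $\lambda_2 = \lambda_1$.

```python
import math

def weibull_survival(t: float, beta: float, eta: float) -> float:
    """Fbar(t) = P(lifetime > t) for a Weibull(beta, eta) lifetime."""
    return math.exp(-((t / eta) ** beta))

def mean_residual_life(t: float, beta: float, eta: float, n: int = 20_000) -> float:
    """gamma(t) = int_t^inf Fbar(x) dx / Fbar(t) by the trapezoidal rule,
    truncated at t + 10*eta (a hypothetical cutoff, ample for the beta >= 1
    examples used here)."""
    upper = t + 10.0 * eta
    step = (upper - t) / n
    total = 0.5 * (weibull_survival(t, beta, eta) + weibull_survival(upper, beta, eta))
    for k in range(1, n):
        total += weibull_survival(t + k * step, beta, eta)
    return total * step / weibull_survival(t, beta, eta)

c_r = 5.0
gamma0 = mean_residual_life(0.0, beta=2.0, eta=1.0)
lambda_1 = c_r / gamma0          # average cost of the pure failure replacement policy
print(gamma0, lambda_1)          # gamma0 is close to Gamma(1.5) = 0.8862...
```

For $\beta = 2$, $\eta = 1$ the exact value is $\gamma(t) = e^{t^2}\frac{\sqrt{\pi}}{2}\operatorname{erfc}(t)$, which gives an easy check of the quadrature.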

From Lemma 2.7, we can see that if the repair cost $C(n, t)$ is replaced by $C(n, t) \wedge C_r$, then the optimal value for the latter stopping problem is not worse than for the former one. We will prove that in the latter case, the optimal stopping time exists and it never prescribes a repair action when $C(n, t) > C_r$. Hence, these two stopping problems have the same optimal solution and the same optimal value. More details are provided in Lemma A.3 in the Appendix. Thus, we only need to consider the case $C(n, t) \le C_r$. From the definition $C(n, t) = C_f + C_m(n, t)$, we have

It follows from the previous discussion that we can restrict ourselves to the case $\lambda \le \lambda_0$.

Lemma 3.1. If $\lambda < \lambda_0$, then $E(\sup_n Y_n^+) < \infty$ and $Y_n \to -\infty$.

Proof. When $\lambda < \lambda_0$, there are a finite real number $S > 0$ and an integer $N > 0$ such that $\lambda\gamma(S) - EC(N, S) = -b < 0$. Then,

where $w_i = S_i - S_{i-1}$, $\theta_i = \lambda\hat{w}_i - C(N, S) + b$, and $\hat{w}, \hat{w}_i$ are i.i.d. random variables which have the residual lifetime distribution at age $S$, $F_S(\cdot)$. Hence, $E\theta_i = E(\lambda\hat{w}_i - C(N, S) + b) = \lambda\gamma(S) - EC(N, S) + b = 0$. Obviously, $E((\theta_i^+)^2) < \infty$, so from Lemma 2.6 with $\beta = 1$,
$$E\Big(\sup_n \Big(\sum_{i=1}^n \big(\lambda\hat{w}_i - C(N, S) + b\big) - nb\Big)^+\Big) < \infty.$$
Consequently, $E(\sup_n Y_n^+) < \infty$. Obviously, $Y_n \to -\infty$.

Thus, $\sigma_\lambda \in D$ is the optimal stopping time which maximizes $E Y_\sigma(\lambda)$.

Q.E.D.

Furthermore, since we have a Markov representation for (2.3.2), we get from Corollary 2.5, by putting

Denote

Thus,

Obviously, $g_n(t)$ is a deterministic function of $n$ and $t$.

Hence, $\sigma_\lambda$ has the form of a repair-cost-limit policy. From the monotonicity of the failure rate $h(t)$ and the stochastic monotonicity and boundedness of the repair cost $C(n, t)$, it is easy to see that $g_n(t)$ is monotonically decreasing in $n$ and $t$, and that $g_n(t)$ is continuous. A proof is in the Appendix (Lemma A.3).

When the repair cost $C(n, t)$ is a deterministic function of $n$ and $t$, the optimal policy in (2.3.6) has the following form: replace the unit as soon as the $n$-th failure time exceeds $t_n^*$. This model is a special case of the model in Makis and Jardine (1992), where the optimal policy is obtained by applying semi-Markov decision processes.
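A repair-cost-limit policy is easy to evaluate by simulation, which gives an independent check of any computed limit function. The sketch below is a hypothetical illustration, not part of the thesis. It assumes a Weibull failure rate, so that under minimal repair the failure ages form a nonhomogeneous Poisson process whose next point after age $s$ is $\eta((s/\eta)^\beta + E)^{1/\beta}$ with $E$ a unit exponential, and it estimates the long-run average cost by the renewal-reward ratio (total cost)/(total time) over many replacement cycles. The callables `limit(n, t)` and `repair_cost(n, t, rng)` are placeholders for $g_n(t)$ and for a sampler of $C(n, t)$.

```python
import math
import random

def simulate_cost_limit_policy(limit, repair_cost, c_r, beta=2.0, eta=1.0,
                               n_cycles=100_000, seed=0):
    """Estimate the long-run average cost of a repair-cost-limit policy.

    The n-th failure at age t is minimally repaired iff C(n, t) < limit(n, t);
    otherwise the unit is replaced at cost c_r, which ends the cycle.
    """
    rng = random.Random(seed)
    total_cost = 0.0
    total_time = 0.0
    for _ in range(n_cycles):
        age, n = 0.0, 0
        while True:
            # Next failure age under minimal repair (cumulative hazard (t/eta)**beta).
            age = eta * ((age / eta) ** beta + rng.expovariate(1.0)) ** (1.0 / beta)
            n += 1
            c = repair_cost(n, age, rng)
            if c < limit(n, age):
                total_cost += c          # minimal repair: age is unchanged
            else:
                total_cost += c_r        # replacement: a new cycle starts
                break
        total_time += age
    return total_cost / total_time


# Deterministic cost C(t) = min(1 + t, 5) and "repair iff age < 0.9" (hypothetical values).
est = simulate_cost_limit_policy(lambda n, t: math.inf if t < 0.9 else 0.0,
                                 lambda n, t, rng: min(1.0 + t, 5.0), c_r=5.0)
```

In this deterministic example the estimate can be compared with $(C_r + \int_0^{t^*} h(x)C(x)\,dx)/(t^* + \gamma(t^*))$ for $t^* = 0.9$ and $h(t) = 2t$, the expected cycle cost over the expected cycle length of the policy that repairs before $t^*$ and replaces at the first failure after $t^*$.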

Next, we consider the case where the repair cost does not depend on $n$, i.e., $C(n, t) = C(t)$. In this case, a single function $g(t)$ exists which is the optimal cost-limit function. Now, from the definition of $g_n(t)$, (2.3.5), we have
$$g(t) = V_\lambda(t) - \lambda t + C_r,$$
where $V_\lambda(t)$ is given by (2.3.7), i.e.,

and $S_k(t)$ is the $k$-th failure time after $t$. When $t = 0$,
$$g(0) = V_\lambda(0) + C_r. \qquad (2.3.8)$$

Remark 3.2. It is easy to see that the repair-cost-limit $(g(s), s \ge t)$ is also optimal in the period $[t, \infty)$. Consider a repair/replacement problem for a unit which has age $t$, i.e., setting $t$ to be 0 in the new system. Then the failure rate is $h(s)$ and the repair cost is $C(s)$ at time $s - t$ for every $s \ge t$. For this problem, the optimal stopping time exists and it is a repair-cost-limit policy, where $g(s)$ is the repair-cost-limit at time $s - t$. Furthermore, its optimal value is equal to $V_\lambda(t) - \lambda t$, where $V_\lambda(t)$ is given by (2.3.7). Then,

which gives a clear intuitive meaning to $g(t)$.

In order to obtain the form of $g(t)$, define

$T$: the replacement time under the repair-cost-limit policy $g(t)$

$G(t)$: the distribution function of $T$

$F(t)$: the distribution function of the first failure time

$A(t)$: the expected cost of a cycle initiated from age $t$

$B(t)$: the expected length of a cycle initiated from age $t$

Using the well-known results on thinning of nonhomogeneous Poisson processes (see, e.g., Block et al. (1985)), we have

Differentiating the above equation and taking into account (2.3.8), we have:

In this equation, $\lambda$ is a parameter, so in order to determine $g(t)$, $\lambda$ must be uniquely determined first. We will consider the following equation:

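$$\begin{cases} \lambda = -g'(t) + h(t)\displaystyle\int_0^{g(t)} \bar{R}_t(x)\,dx, \\ g(0) = C_r, \\ g(t) \in (0, C_r), \quad \forall t > 0, \end{cases} \qquad (2.3.9)$$
where $\bar{R}_t(x) = 1 - R_t(x)$ and $R_t$ is the distribution function of the repair cost at age $t$.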

Lemma 3.3. There is a unique $\lambda$ for which the solution to (2.3.9) exists. This solution is unique.

A proof is in the Appendix (Lemma A.2).

Denote this unique $\lambda$ as $\lambda_*$. Recall that the optimal expected average cost is $\lambda^*$.

Lemma 3.4. $\lambda_* = \lambda^*$, and $\sigma_{\lambda_*}$ is the optimal stopping time for the original problem (2.3.1).

Proof. Obviously, $\lambda^* \le \lambda_0$. Now we will show that $\lambda_* \le \lambda_0$. We only need to consider the case $\lambda_0 < \infty$.

From (2.3.9),

From Theorem 2.2 and Lemma 3.1, for all $\lambda < \lambda_0$, the optimal stopping time for $\{Y_n, \mathcal{H}_n\}$ exists in problem (2.3.2), and it is $\sigma_\lambda \in D$.

Now we examine the solution of Equation (2.3.9). Only two cases can happen:

Case 1. There exists $t > 0$ such that $R_t(g(t)) < 1$.

Case 2. For every $t > 0$, $R_t(g(t)) = 1$.

In Case 1, it is easy to see that $\sigma_{\lambda_*} < \infty$, $\sigma_{\lambda_*} \in D$ and $\lambda_* < \lambda_0$. In fact, $\lambda_* = h(\infty)\int_0^{g(\infty)} \bar{R}_\infty(u)\,du < h(\infty)EC(\infty) = \lambda_0$. Thus, from Theorem 2.2 and Lemma 3.1, $\sigma_{\lambda_*}$ is the optimal stopping time for $\{Y_n(\lambda_*), \mathcal{H}_n\}$. Therefore, together with $E Y_{\sigma_{\lambda_*}}(\lambda_*) = V_{\lambda_*}(0) = g(0) - C_r = 0$, we obtain $\lambda_* = \lambda^*$, and hence $\sigma_{\lambda_*}$ is the optimal stopping time of the $\lambda$-maximization problem (2.3.2) and, consequently, the optimal stopping time of the original problem (2.3.1).

In Case 2, for all $t \ge 0$, $g_{\lambda_*}(t) \ge g_{\lambda_*}(\infty) \ge \operatorname{ess\,sup}_{s \ge 0} C(s)$. Since $\lambda_0$ is the expected average cost under the minimal repair policy and $\sigma_{\lambda_*} = \infty$ is the minimal repair policy, $\sigma_{\lambda_*}$ is again an optimal policy for problem (2.3.1).

Q.E.D.

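Furthermore, since
$$g_{\lambda_0}'(t) = -\lambda_0 + h(t)\int_0^{g_{\lambda_0}(t)} \bar{R}_t(x)\,dx = h(t)EC(t) - h(\infty)EC(\infty), \qquad g_{\lambda_0}(0) = C_r,$$
then
$$g_{\lambda_0}(t) = C_r + \int_0^t \big(h(x)EC(x) - h(\infty)EC(\infty)\big)\,dx = C_r - \big(h(\infty)EC(\infty) - h(t)EC(t)\big)\,t - \int_0^t x\,d\big(h(x)EC(x)\big).$$
Since $\int_0^t x\,d(h(x)EC(x))$ is monotonically increasing, it has a limit as $t \to \infty$. If this limit is $+\infty$, then $g_{\lambda_0}(\infty) < \operatorname{ess\,sup}_{t \ge 0} C(t)$. Thus, we only need to consider the finite-limit case. Together with the boundedness and the monotonicity of $g_{\lambda_0}(t)$, $\big(h(\infty)EC(\infty) - h(t)EC(t)\big)\,t$ has a limit, and it must be zero. Otherwise, $\int_{t_0}^{\infty} \big(h(\infty)EC(\infty) - h(x)EC(x)\big)\,dx \ge \int_{t_0}^{\infty} (a/x)\,dx = +\infty$ for some $a > 0$ and $t_0 > 0$. Therefore,
$$g_{\lambda_0}(\infty) \ge \operatorname{ess\,sup}_{t \ge 0} C(t) \iff C_r - \int_0^\infty x\,d\big(h(x)EC(x)\big) \ge \operatorname{ess\,sup}_{t \ge 0} C(t) \iff \int_0^\infty x\,d\big(h(x)EC(x)\big) \le C_r - \operatorname{ess\,sup}_{t \ge 0} C(t).$$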

We summarize the results in the following theorem.

Theorem 3.5. The optimal policy for problem (2.3.1) exists. It is a repair-cost-limit policy, and the repair-cost-limit function $g_n(t)$ becomes $g(t)$ when the minimal repair cost $C(n, t)$ does not depend on $n$. This $g(t)$ is uniquely determined by the following first-order differential equation
$$\lambda = -g'(t) + h(t)\int_0^{g(t)} \bar{R}_t(x)\,dx, \qquad g(0) = C_r, \qquad g(t) \in (0, C_r) \ \ \forall t > 0,$$
where $\lambda$ is the optimal expected average cost. The repair-cost-limit policy is the minimal repair policy if and only if
$$\int_0^\infty x\,d\big(h(x)EC(x)\big) \le C_r - \operatorname{ess\,sup}_{t \ge 0} C(t).$$
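The minimal-repair criterion in Theorem 3.5 involves only the Stieltjes integral $\int_0^\infty x\,d(h(x)EC(x))$. The sketch below approximates it by a midpoint Riemann-Stieltjes sum on a truncated grid. The truncation point, the grid size and the example function $h(x)EC(x) = 3 - e^{-x}$ (for which the integral equals $\int_0^\infty x e^{-x}\,dx = 1$) are illustrative assumptions, as are the values of $C_r$ and $\operatorname{ess\,sup} C(t)$ passed in.

```python
import math

def minimal_repair_is_optimal(rate_cost, c_r, ess_sup_c, upper=50.0, n=100_000):
    """Approximate I = int_0^inf x d(phi(x)) with phi(x) = h(x) * EC(x) and test the
    condition I <= C_r - ess sup C(t) from Theorem 3.5.

    rate_cost: callable x -> h(x) * EC(x), assumed nondecreasing; the integral is
    truncated at `upper`. Returns (I, condition_holds).
    """
    step = upper / n
    integral = 0.0
    prev = rate_cost(0.0)
    for k in range(1, n + 1):
        cur = rate_cost(k * step)
        integral += (k - 0.5) * step * (cur - prev)   # midpoint x times increment of phi
        prev = cur
    return integral, integral <= c_r - ess_sup_c


print(minimal_repair_is_optimal(lambda x: 3.0 - math.exp(-x), c_r=5.0, ess_sup_c=3.5))
print(minimal_repair_is_optimal(lambda x: 2.0, c_r=5.0, ess_sup_c=4.0))  # constant: I = 0
```

The second call mirrors the situation of a constant $h(t)EC(t)$, where the integral vanishes and the condition reduces to $C_r \ge \operatorname{ess\,sup} C(t)$.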

Example 3.6. Assume $h(t) = h > 0$ and $EC(t) = EC$ for all $t \ge 0$.

Since $\int_0^\infty x\,d(h(x)EC(x)) = 0 \le C_r - \operatorname{ess\,sup}_{t \ge 0} C(t)$, the minimal repair policy is optimal. In fact, we can also see that $h(t)EC(t) = h(0)EC(0)$ for all $t \ge 0$, and hence $g(t) = C_r$ for all $t$.

If $C(t)$ is a deterministic, nondecreasing and continuous function of $t$, then the optimal policy is the age replacement policy, i.e.,
$$\sigma = \text{first } n \text{ such that } C(S_n) \ge g(S_n).$$
This is equivalent to
$$\sigma = \text{first } n \text{ such that } S_n \ge t^*, \quad \text{where } t^* = \inf\{t : C(t) = g(t)\}.$$
Obviously, $C(t^*) = g(t^*)$. From (2.3.9), when $C(t) \le g(t)$,
$$\lambda = -g'(t) + h(t)C(t),$$
and combined with $g(0) = C_r$, this gives $g(t) = C_r + \int_0^t h(x)C(x)\,dx - \lambda t$ for $0 \le t \le t^*$. When $C(t) \ge g(t)$,
$$\lambda = -g'(t) + h(t)g(t).$$
Since $g(\infty) \in (0, C_r)$, we have
$$g(t) = \lambda \int_t^\infty \bar{F}(x)\,dx \Big/ \bar{F}(t),$$
where $\bar{F} = 1 - F$. Thus,

Therefore, from the above two expressions for $g(t^*)$, (2.3.11) and (2.3.12),

(2.3.13)

Figure 2.1: Optimal age replacement policy.

This coincides with the results obtained by other methods. Therefore, we have proved that, in the model with deterministic repair cost $C(t)$, the optimal age replacement policy is optimal in the class of stopping times.
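Equation (2.3.9) also suggests a direct numerical procedure. Following the existence argument of Lemma A.2, $\lambda^*$ is the boundary of the set of $\lambda$ for which the solution started at $g(0) = C_r$ reaches 0, so one can integrate the equation for trial values of $\lambda$ and bisect. The sketch below does this for a deterministic repair cost, where $\int_0^{g}\bar R_t(x)\,dx = \min(C(t), g)$, and compares the result with a grid minimization of the age-replacement cost ratio $(C_r + \int_0^{t} h(x)C(x)\,dx)/(t + \gamma(t))$. The Weibull rate $h(t) = 2t$, the cost $C(t) = \min(1 + t, C_r)$ with $C_r = 5$, the RK4 step, the integration horizon and the iteration counts are all assumptions chosen for this example.

```python
import math

C_R = 5.0

def h(t):
    return 2.0 * t                      # Weibull(beta=2, eta=1) failure rate

def cost(t):
    return min(1.0 + t, C_R)            # deterministic, nondecreasing, capped at C_r

def reaches_zero(lam, horizon=3.5, dt=2e-3):
    """RK4 for g'(t) = h(t) * min(C(t), g) - lam with g(0) = C_r.
    Returns True if g hits 0 before the horizon (lam is too large)."""
    def rhs(t, g):
        return h(t) * min(cost(t), g) - lam
    t, g = 0.0, C_R
    while t < horizon:
        k1 = rhs(t, g)
        k2 = rhs(t + dt / 2, g + dt * k1 / 2)
        k3 = rhs(t + dt / 2, g + dt * k2 / 2)
        k4 = rhs(t + dt, g + dt * k3)
        g += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        if g <= 0.0:
            return True
    return False

def solve_lambda(lo=0.0, hi=20.0, iters=50):
    """Bisection on the boundary of {lam : g_lam reaches 0}."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if reaches_zero(mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def age_policy_ratio(t):
    """(C_r + int_0^t h C) / (t + gamma(t)) for this example (valid while 1 + t <= C_r)."""
    gamma = math.exp(t * t) * math.sqrt(math.pi) / 2 * math.erfc(t)
    return (C_R + t * t + 2 * t ** 3 / 3) / (t + gamma)

lam_ode = solve_lambda()
lam_grid = min(age_policy_ratio(k / 1000) for k in range(1, 3001))
print(lam_ode, lam_grid)
```

The two numbers should agree to within the discretization error, which is the numerical counterpart of the statement that the differential-equation solution coincides with the optimal age replacement policy.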

2.4 Appendix

In the following lemma, we restrict our attention to functions $g(t)$ which satisfy the first two equations in (2.3.9). We will also use the notation $g_\lambda(t)$ to indicate its dependence on $\lambda$.

Lemma A.1. If $g(t) \in (0, C_r)$ for all $t > 0$, then $g$ must be nonincreasing.

Proof. First, we prove that if there is $t > 0$ such that $g'(t) > 0$, then $g(t) \to \infty$.

Consider the first $t_0 > 0$ such that $g'(t_0) = 0$. Then, from the differential equation and $\lambda > 0$, we have $g(t_0) > 0$. It is easy to see that for all $t > t_0$, $g'(t) \ge 0$ and $g''(t) \ge 0$. Hence, if there is $t > t_0$ such that $g'(t) > 0$, then $g(t) \to \infty$.

Therefore, if $g(t) \in (0, C_r)$ for all $t > 0$, then $g$ must be nonincreasing.

Q.E.D.

Lemma A.2. There exists a unique $\lambda$ for which the solution to (2.3.9) exists. This solution is unique.

Proof.

(i) Uniqueness.

From Lemma A.1, if (2.3.9) is satisfied, then $g(t)$ is nonincreasing. Consider $\lambda_1 > \lambda_2 > 0$ and, for $i = 1, 2$, denote $g_{\lambda_i}(t)$ as $g_i(t)$. From (2.3.9), since
$$\big(g_2(t) - g_1(t)\big)' = h(t)\int_{g_1(t)}^{g_2(t)} \bar{R}_t(x)\,dx + (\lambda_1 - \lambda_2) > 0,$$
$g_2(t) - g_1(t)$ is increasing. Since $\bar{R}_t(x)$ decreases when $x$ increases, we can see that $\int_{g_1(t)}^{g_2(t)} \bar{R}_t(x)\,dx$ increases in $t$, and consequently
$$\big(g_2(t) - g_1(t)\big)' = h(t)\int_{g_1(t)}^{g_2(t)} \bar{R}_t(x)\,dx + (\lambda_1 - \lambda_2)$$
increases. Then $g_2(t) - g_1(t) \to +\infty$. Therefore, at most one $\lambda$ exists such that $g_\lambda(t)$ is a solution to (2.3.9). Finally, the uniqueness of the solution $g_{\lambda_*}(t)$ for this fixed $\lambda_*$ can be easily seen.

(ii) Existence.

Denote

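$$\Lambda = \{\lambda : g_\lambda(t) \text{ reaches } 0 \text{ at some } t > 0\},$$
$$\Lambda' = \{\lambda : g_\lambda(t) \text{ reaches } C_r \text{ at some } t > 0\},$$
$$\lambda(\Lambda) = \inf\{\lambda \mid \lambda \in \Lambda\}, \qquad \lambda(\Lambda') = \sup\{\lambda \mid \lambda \in \Lambda'\}.$$
First we show that neither $\Lambda$ nor $\Lambda'$ is empty; therefore, the above four definitions are well defined.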

Choose $\lambda = C_r + h(1)\int_0^{C_r} \bar{R}_1(x)\,dx$, and denote the solution for this $\lambda$ as $g(t)$. Hence, when $t \in (0, 1)$, $g'(t) \le -C_r$, and $g(t)$ reaches 0 at some point $t_0 \in (0, 1]$. Thus, $\lambda \in \Lambda$.

Regarding $\Lambda'$: if for all $t \ge 0$, $h(t)EC(t) = h(0)EC(0) = \lambda_0$, then according to Example 3.6, $g_{\lambda_0}(t) = C_r$ for all $t \ge 0$, and hence $\lambda_0 \in \Lambda'$; otherwise, there is $0 < t_0 < \infty$ such that $g_{\lambda_0}(t_0) > C_r$. It is easy to see that when $\lambda$ increases to $\lambda_0$, $g_\lambda(t)$ converges to $g_{\lambda_0}(t)$ uniformly on $(0, t_0]$. Hence, there is a $\lambda_1 < \lambda_0$ such that $g_{\lambda_1}(t_0) > C_r$. Since $g_{\lambda_1}(0) = C_r$ and $g'_{\lambda_1}(0) = \lambda_1 - \lambda_0 < 0$, there is $0 < t_1 < t_0$ such that $g_{\lambda_1}(t_1) = C_r$. Therefore, $\lambda_1 \in \Lambda'$.

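We have $\lambda(\Lambda) = \lambda(\Lambda')$. To see this, suppose $\lambda(\Lambda) > \lambda(\Lambda')$; then for every $\lambda \in (\lambda(\Lambda'), \lambda(\Lambda))$, $g_\lambda(t) \in (0, C_r)$ for every $t > 0$. This contradicts (i), the uniqueness of $\lambda$.

Thus, we can construct two sequences $\{\lambda_n\}$ and $\{\lambda'_n\}$ such that $\{\lambda_n\}$ monotonically decreases to $\lambda(\Lambda)$ and $\{\lambda'_n\}$ monotonically increases to $\lambda(\Lambda') = \lambda(\Lambda)$. Obviously, $\{\lambda_n\} \subset \Lambda$ and $\{\lambda'_n\} \subset \Lambda'$.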

Now we prove that $\lambda(\Lambda) = \lambda_*$, i.e., $g_{\lambda(\Lambda)}(t) \in (0, C_r)$ for all $t > 0$.

Since for all $N > 0$, $i > j$, and $t \in [0, N]$: $0 \le g_{\lambda_i}(t) - g_{\lambda_j}(t) \le g_{\lambda_i}(N) - g_{\lambda_j}(N)$, the sequence $\{g_{\lambda_n}(t)\}$ converges to a function, say $g(t)$, uniformly on $[0, N]$.

From (2.3.9), $g'_{\lambda_n}(t)$ converges to a function, say $w(t)$, uniformly. It is easy to see that $w(t) = g'(t)$. Hence, $\lambda(\Lambda) + g'(t) = h(t)\int_0^{g(t)} \bar{R}_t(x)\,dx$. Finally, $g(t) = g_{\lambda(\Lambda)}(t)$. Also, $g_{\lambda(\Lambda)}(t)$ is the limit of $g_{\lambda'_n}(t)$. Obviously, for this $\lambda(\Lambda)$, its solution satisfies $g_{\lambda(\Lambda)}(t) \in (0, C_r)$ for all $t > 0$.

Q.E.D.

Lemma A.3. $g_n(t)$ monotonically decreases in $n$ and $t$, is continuous in $t$, and $0 \le g_n(t) \le C_r$ for all $n \ge 1$, $t \ge 0$.

Proof. Following results in stochastic order theory, we can construct a probability space such that for each $\omega$ and $i \ge j$, $t \ge s$: $C(i, t)(\omega) \ge C(j, s)(\omega)$, and $S_i(t) - t \le S_i(s) - s$, where $S_i(t)$ is the $i$-th failure time after $t$ and $S_0(t) = t$. It is easy to see that for $t > s$, $S_i(t) \ge S_i(s)$. We have

Then the monotonicity follows from

Also, it is easy to see that $0 \le g_i(t) \le g_i(s) \le C_r$ for $t \ge s$. Hence, following the optimal policy, the system is never repaired if $C(n, t) > C_r$. Thus, we can assume that $C(n, t) \le C_r$ without loss of generality.

The continuity of $g_n(t)$ follows from the following inequalities:
$$\lambda\Delta t + g_n(t + \Delta t) \ge g_n(t) \ge g_n(t + \Delta t).$$
This completes the proof.

Summary of Notation

$a^+$: $\max\{0, a\}$

$a \wedge b$: $\min\{a, b\}$

$A(t)$: expected cost of a cycle initiated from age $t$

$B(t)$: expected length of a cycle initiated from age $t$

$C(n, t)$: repair cost of the $n$-th failure at age $t$

$C_f$: failure loss

$C_r$: failure replacement cost

$F(t)$: distribution function of the first failure time

$g_n(t)$: repair-cost-limit function for the $n$-th failure

$g(t)$: repair-cost-limit function if the repair cost is stochastically independent of the number of failures

$G(t)$: distribution function of $T$

$h(t)$: failure rate

$R_t(x)$: distribution function of the repair cost at age $t$

$S(t)$: $\int_0^{g(t)} x\,dR_t(x)$

$S_n$: $n$-th failure time

$T$: replacement time under the repair-cost-limit policy $g(t)$

$V_\lambda(t)$: value function for the $\lambda$-maximization problem with initial age $t$

$Y_n(\lambda)$: the objective function for the $\lambda$-maximization problem

$\lambda^*$: optimal average cost

$\sigma_\lambda$: optimal stopping time for the $\lambda$-maximization problem

$\gamma(t)$: mean residual time to failure at age $t$

$\lambda_0$: $EC(\infty, \infty)/\gamma(\infty)$

Chapter 3

OPTIMAL PREVENTIVE REPLACEMENT UNDER MINIMAL REPAIR AND RANDOM REPAIR COST

3.1 Introduction

This chapter is a natural extension of the previous chapter.

We consider a single-unit repairable system subject to random failure. A new unit is installed at time $t = 0$, and when the unit fails, the repair cost is observed and a decision is made whether to replace (overhaul) the unit or repair it. We assume that the repair is minimal, i.e., the unit is restored to its functioning state just prior to failure without changing its age. The repair cost is a random variable that is a function of the unit's age and may depend on the number of repairs since the last replacement. The unit can be preventively replaced at any time prior to failure. The failure and preventive replacement costs are assumed to be given constants. The objective is to find the repair/replacement policy minimizing the long-run expected average cost per unit time.

Although this model is formulated as a model of a single-unit system, it can also be used as a suitable representation of a complex, multi-unit repairable system. If only a small part of the system is repaired or replaced upon failure, this does not considerably affect the failure rate, and the minimal repair assumption is acceptable. For such systems, the repair cost typically depends on the age (or operating age) and may also depend on the number of failures since the last replacement or overhaul of the system. Usually, the main concern is the proper planning of a major (and costly) maintenance action such as an overhaul or replacement of the system, which is the focus of this work.

A similar model was investigated in L'Ecuyer and Haurie (1987) under more restrictive assumptions: for example, the discount factor in the objective criterion is bounded away from zero (so the average cost criterion is excluded), and the repair cost does not depend on the number of repairs. There, the structural result regarding the optimal policy is obtained by a Markovian approach.

We will pursue our study of this model using the optimal stopping approach. Its major advantages are listed below.

1. Wider policy class. The optimal policy obtained is now optimal within the whole class of stopping times instead of the class of Markovian policies only. Therefore, a range of policy comparison results can be obtained by varying the information level within the general framework.

2. More explicit form. We obtain a differential equation for computing the optimal repair-cost-limit and the optimal preventive replacement time, which is much simpler than solving the integral equation derived from the Markovian approach.

3. More intuitive interpretation. The optimal policy can be expressed in terms of a "residual value", which is naturally introduced by the optimal stopping approach.

Optimal stopping theory has been applied in preventive maintenance mainly to analyze preventive replacement problems with quite general deteriorating processes (e.g. Bergman (1978), Aven (1983), Aven and Bergman (1986) and Jensen (1989)).

The main mathematical difficulty in this work is that we now need to consider jointly two kinds of stopping problems: a repair/replacement problem at failure times, and a preventive replacement problem in continuous time. The former is a discrete-time and the latter a continuous-time stopping problem. In addition, when the repair cost is assumed to depend on $n$, the number of repairs, we no longer have closed forms for the expected cost in a cycle, $A(t)$, and the expected cycle length, $B(t)$, as in Chapter 2.

The chapter is organized as follows. In Section 3.2, we formulate the problem in the optimal stopping framework. By developing a characterization result for the stopping times of general jump processes, we reduce the continuous-time optimization problem to a discrete-time stopping problem.

In Section 3.3, we prove the existence of the optimal policy and find its form by applying the $\lambda$-maximization technique (see e.g. Aven and Bergman (1986)) and the semi-martingale decomposition approach (see e.g. Jensen (1989)). The $\lambda$-maximization technique transforms the original fractional optimization problem into a parameterized (with parameter $\lambda$) optimization problem with an additive objective function, to which optimal stopping theory can be applied. It is also exactly this $\lambda$ that absorbs much of the complexity, resulting in the simplification from an integral equation to a differential equation. The semi-martingale decomposition approach further simplifies the objective function by removing its martingale part without loss of optimality. The optimal policy has the following form: replace the unit at the first failure time at which the repair cost exceeds a certain limit, or at age $T$, whichever occurs first.

In Section 3.4, we address the computational issues and present an algorithm for finding an $\varepsilon$-optimal policy. We will show that an $\varepsilon$-optimal policy can be obtained by solving a system of ordinary differential equations with boundary conditions. A numerical example is given to illustrate the computational procedure. In Section 3.5, the discounted cost case is solved in parallel to the average cost case by applying the same optimization procedure. Interestingly, the result coincides with that for the average cost criterion when the discount factor degenerates to zero.

3.2 Problem Formulation

We will make the following assumptions.

1. The time to failure of a unit is a generally distributed random variable with distribution function $F(t)$, density $f(t)$ and failure rate $h(t) = f(t)/(1 - F(t))$, which is a continuous and nondecreasing function of $t$.

2. At a failure time, the unit can be either minimally repaired at cost $C(n, t)$, where $n$ is the number of repairs since the last replacement and $t$ is the age of the unit, or replaced at cost $C_r$. The unit can be preventively replaced at any time prior to failure at cost $C_p < C_r$. We assume that $C_p$ and $C_r = C_p + C_f$ are given constants and that $\{C(n, t)\}$ are random variables stochastically increasing in $n$ and $t$. Further, we assume that $EC(n, t)$ is continuous in $t$, and that $C_f \le C(n, t) \le C_r$ (without loss of generality).

3. All maintenance actions take negligible time.

The objective is to find the repair/replacement policy minimizing the long-run expected average cost per unit time.

Let $\tau$ be the time of the first replacement if a new system is installed at time zero. We define $TC(t)$, the total cost incurred up to time $t$, as

where $S_i$ is the $i$-th failure time, $S_0 = 0$, $N(t)$ is the number of failures by time $t$, $N(t) = \sum_{i \ge 1} I_{\{S_i \le t\}}$, and $I$ is the set indicator function.

Let $(\mathcal{F}_t)$ be the (completed) natural filtration of the process $\{TC(t), t \ge 0\}$.

For the average cost criterion, the maintenance decision problem can be formulated as follows. Find an $(\mathcal{F}_t)$-stopping time $\tau^*$, if it exists, minimizing the long-run expected average cost per unit time given by
$$\frac{E(TC(\tau))}{E\tau}. \qquad (3.2.2)$$

Using the methodology presented in Lemmas 2.2-2.4, we will show that this continuous-time stopping problem can be reduced to a discrete-time stopping problem. The methodology is quite general and can also be applied to other kinds of continuous-time stopping problems for jump processes.

We will need the following result from the theory of jump processes.

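Lemma 2.1. [Davis 1993, Theorem A2.3, p. 261] Let $\xi_i = (S_i, C(i, S_i))$, $\mathcal{H}_n = \sigma\{\xi_i, i \le n\} = \mathcal{F}_{S_n}$, and let $\tau$ be an $(\mathcal{F}_t)$-stopping time. Then there exist a constant $T_0$ and $\mathcal{H}_n$-measurable functions $T_n$, $n = 1, 2, \ldots$, such that
$$\tau I_{\{\tau < S_1\}} = (T_0 \wedge S_1) I_{\{\tau < S_1\}},$$
and for $n = 1, 2, \ldots$,
$$\tau I_{\{S_n < \tau < S_{n+1}\}} = (T_n \wedge S_{n+1}) I_{\{S_n < \tau < S_{n+1}\}},$$
where $a \wedge b = \min\{a, b\}$. Note that $T_n$ is a function of $(\xi_1, \ldots, \xi_n)$.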

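Lemma 2.2. For any $(\mathcal{F}_t)$-stopping time $\tau$ and $\{T_n\}_0^\infty$ defined in Lemma 2.1, we have for $n \ge 1$,
$$\tau \ge S_n \implies T_i \ge S_{i+1} \ \text{ for } i \le n - 1. \qquad (3.2.5)$$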

Proof. We use mathematical induction.

For $n = 1$, we want to prove that $\tau \ge S_1 \Rightarrow T_0 \ge S_1$. We have from Lemma 2.1
$$\tau I_{\{\tau < S_1\}} = (T_0 \wedge S_1) I_{\{\tau < S_1\}}.$$
Define $\tau' = \tau \wedge S_1$. Then $\tau'$ is also an $(\mathcal{F}_t)$-stopping time, and from Lemma 2.1 there exists a constant $T'_0$ such that $\tau' = T'_0 \wedge S_1$ (because $\tau' \le S_1$). Hence, $\tau' = S_1 \Leftrightarrow T'_0 \ge S_1$. But $\tau' = S_1$ if and only if $\tau \ge S_1$, so that $\tau \ge S_1 \Leftrightarrow T'_0 \ge S_1$. We will prove that $T'_0 = T_0$. For $\tau < S_1$, $T_0 = \tau = \tau' = T'_0$, and (3.2.5) holds for $n = 1$.

Now assume that (3.2.5) is true for some $n \ge 1$. We want to prove that $\tau \ge S_{n+1} \Rightarrow T_i \ge S_{i+1}$ for $i \le n$.

Obviously, $\tau \ge S_{n+1} \Rightarrow \tau \ge S_n$, and we have from the induction assumption that $T_i \ge S_{i+1}$ for $i \le n - 1$. Thus, it suffices to prove that $\tau \ge S_{n+1} \Rightarrow T_n \ge S_{n+1}$. Define $\tau' = \tau \wedge S_{n+1}$. Then from Lemma 2.1, $\tau' = T'_n \wedge S_{n+1}$. From the definition of $\tau'$, $\tau \ge S_{n+1} \Leftrightarrow \tau' = S_{n+1} \Leftrightarrow T'_n \ge S_{n+1}$. But for $S_n < \tau < S_{n+1}$, we have from Lemma 2.1 that $T_n = \tau = \tau' = T'_n$, and the result follows, since $T_n$ and $T'_n$ are $\mathcal{H}_n$-measurable.

Q.E.D.

Lemma 2.3. Any $(\mathcal{F}_t)$-stopping time $\tau$ has a representation

where $\sigma$ is an $(\mathcal{H}_n)$-stopping time, $T_0$ is a constant and $T_n$ is $\mathcal{H}_n$-measurable for $n \ge 1$. Conversely, for any $(\mathcal{H}_n)$-stopping time $\sigma$, constant $T_0 \ge 0$ and $\mathcal{H}_n$-measurable functions $T_n \ge S_n$, $n = 1, 2, 3, \ldots$, $\tau(\sigma, \{T_n\})$ defined by (3.2.6) is an $(\mathcal{F}_t)$-stopping time.

Proof. For a given $(\mathcal{F}_t)$-stopping time $\tau$, choose $\{T_n\}_0^\infty$ as in Lemma 2.1, and define
$$\sigma = \begin{cases} n & \text{if } \tau = S_n, \ n \ge 1, \\ +\infty & \text{otherwise.} \end{cases}$$
Since $\{\sigma = n\} \in \mathcal{H}_n$, $\sigma$ is an $(\mathcal{H}_n)$-stopping time. We will show that the right-hand side of (3.2.6) is equal to $\tau$.

First, assume that $\sigma = +\infty$. This implies that $\tau \ne S_n$ for $n = 1, 2, \ldots$, so that $0 \le \tau < S_1$, or $S_m < \tau < S_{m+1}$ for some $m \ge 1$, or $\tau = +\infty$.

If $S_m < \tau < S_{m+1}$ for some $m \ge 1$, then from Lemma 2.2, $T_i \ge S_{i+1}$ for $i \le m - 1$, and from Lemma 2.1, $\tau = T_m < S_{m+1}$, so that

Similarly for $0 \le \tau < S_1$.

If $\tau = \infty$, then from Lemma 2.2, $T_i \ge S_{i+1}$ for all $i$, and

Now assume that $\sigma = n$ for some $n \ge 1$. Then $\tau = S_n$, and from Lemma 2.2, $T_i \ge S_{i+1}$ for $i \le n - 1$. We have

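Conversely, let $\tau(\sigma, \{T_i\})$ be defined by (3.2.6) for a given $(\mathcal{H}_n)$-stopping time $\sigma$, constant $T_0 \ge 0$ and $\mathcal{H}_n$-measurable functions $T_n \ge S_n$ for $n = 1, 2, \ldots$. Define

First, we will show that

For any $\omega \in \Omega$, there exist $n$, $1 \le n \le +\infty$, such that $\sigma = n$, and $m$, $0 \le m \le +\infty$, such that $T_i \ge S_{i+1}$ for $0 \le i < m$ and $S_{m+1} > T_m$.

Hence,

and $\tau_1 \wedge \tau_2 = T_m \wedge S_n$. On the other hand,

We will show that $\tau_1$ and $\tau_2$ are $(\mathcal{F}_t)$-stopping times.

We have for any $t \ge 0$,

Since $S_n$ is an $(\mathcal{F}_t)$-stopping time and $\{\sigma = n\} \in \mathcal{F}_{S_n}$, by the definition of $\mathcal{F}_{S_n}$,

so that $\tau_1$ is an $(\mathcal{F}_t)$-stopping time.

Next, we will show that $\tau_2$ is an $(\mathcal{F}_t)$-stopping time.

From the definition,

Since

we have that
$$\{T_i \ge S_{i+1} \text{ for } 0 \le i \le m - 1, \ T_m < S_{m+1}\} \in \mathcal{F}_{T_m},$$
and
$$\{T_i \ge S_{i+1} \text{ for } 0 \le i \le m - 1, \ T_m < S_{m+1}\} \cap \{T_m \le t\} \in \mathcal{F}_t,$$
so that $\{\tau_2 \le t\} \in \mathcal{F}_t$.

Since $\tau_1$ and $\tau_2$ are $(\mathcal{F}_t)$-stopping times, $\tau(\sigma, \{T_i\}) = \tau_1 \wedge \tau_2$ is also an $(\mathcal{F}_t)$-stopping time.

Q.E.D.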
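The decomposition $\tau(\sigma, \{T_i\}) = \tau_1 \wedge \tau_2$ used in the proof of Lemma 2.3 can be evaluated pathwise. The sketch below assumes, following that proof, that $\tau_1 = S_\sigma$ (replacement at the $\sigma$-th failure, $+\infty$ if $\sigma = +\infty$) and that $\tau_2 = T_m$ for the first index $m$ with $T_m < S_{m+1}$ (preventive replacement before the next failure, $+\infty$ if there is none). The function name and argument layout are hypothetical; the sketch only illustrates how a replacement plan splits into competing failure and preventive schedules on one sample path.

```python
import math
from typing import Optional, Sequence

def compose_stopping_time(failure_times: Sequence[float],
                          sigma: Optional[int],
                          plan: Sequence[float]) -> float:
    """Pathwise value of tau(sigma, {T_i}) = min(tau_1, tau_2).

    failure_times: observed S_1 < S_2 < ... (S_0 = 0 is implicit).
    sigma: index n of the failure at which the unit is replaced, or None for +infinity.
    plan: T_0, T_1, ... where T_i is the preventive replacement time planned at S_i
          (T_0 a constant, T_i >= S_i for i >= 1).
    """
    s = [0.0] + list(failure_times)
    # tau_1: replacement at the sigma-th failure time.
    tau_1 = s[sigma] if sigma is not None and sigma < len(s) else math.inf
    # tau_2: the first planned preventive time that falls before the next failure.
    tau_2 = math.inf
    for m, t_m in enumerate(plan):
        next_failure = s[m + 1] if m + 1 < len(s) else math.inf
        if t_m < next_failure:
            tau_2 = t_m
            break
    return min(tau_1, tau_2)


S = [1.0, 2.5, 4.0]
print(compose_stopping_time(S, sigma=3, plan=[3.0, 3.0, 3.0]))  # 3.0: preventive before S_3
print(compose_stopping_time(S, sigma=2, plan=[5.0, 5.0, 5.0]))  # 2.5: replaced at S_2
```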

Lemma 2.4. Let $\tau$ be an $(\mathcal{F}_t)$-stopping time, $(\sigma, \{T_i\})$ be a representation of $\tau$, and let $TC(\tau)$ be the total cost incurred up to time $\tau$. Then,

where $C(0, 0) = C_r$.

Proof. For a given $\omega \in \Omega$, we have the following four possibilities: $0 \le \tau < S_1$; $S_m < \tau < S_{m+1}$ for some $m \ge 1$; $\tau = S_n$ for some $n \ge 1$; or $\tau = +\infty$.

First, assume that there exists $m \ge 1$ such that $S_m < \tau < S_{m+1}$. Then $N(\tau) = m$, $\tau \ne S_{N(\tau)}$, and from (3.2.1),

On the other hand, we have from Lemmas 2.1-2.3 that $\sigma \ge m + 1$, $\tau = T_m < S_{m+1}$ and $T_i \ge S_{i+1}$ for $i \le m - 1$, so that

Similarly for $0 \le \tau < S_1$.

If there is $n \ge 1$ such that $\tau = S_n$, then $N(\tau) = n$, $\tau = S_{N(\tau)}$, and from (3.2.1),
$$TC(\tau) = C_p + \sum_{i=1}^{n-1} C(i, S_i) + C_f = \sum_{i=0}^{n-1} C(i, S_i).$$
It follows from the proof of Lemma 2.3 that in this case, either $\sigma = n$ and $T_i \ge S_{i+1}$ for $i \le n - 1$, or $\sigma > n$, $T_i \ge S_{i+1}$ for $i \le n - 1$ and $T_n = S_n$. In both cases, we have
$$TC(\sigma, \{T_i\}) = \sum_{i=0}^{n-1} C(i, S_i) = TC(\tau).$$
Finally, if $\tau = \infty$, we have from (3.2.1) that
$$TC(\tau) = C_p + \sum_{i=1}^{\infty} C(i, S_i) + C_f = \sum_{i=0}^{\infty} C(i, S_i) = \infty.$$
In this case, $\sigma = +\infty$, $T_i \ge S_{i+1}$ for $i = 0, 1, 2, \ldots$, and

Q.E.D.

Lemmas 2.3 and 2.4 provide characterizations of any $(\mathcal{F}_t)$-stopping time $\tau$ and of the corresponding total cost $TC(\tau)$ in terms of $\sigma$ and $\{T_i\}_0^\infty$. This representation of $\tau$ corresponds to a decomposition of a replacement plan into competing failure and preventive replacement schedules. We have seen this explicitly in (3.2.9), where $\tau(\sigma, \{T_i\})$ was written as $\tau_1 \wedge \tau_2$: $\tau_1$ represents replacement at failure times and $\tau_2$ preventive replacement between failures. This representation, together with the result in Lemma 2.4, leads to the following equivalent formulation of the continuous-time stopping problem in (3.2.2).

Find
$$\lambda^* = \inf_\sigma \inf_{\{T_i\}} \frac{E(TC(\sigma, \{T_i\}))}{E(\tau(\sigma, \{T_i\}))} \qquad (3.2.11)$$
and $(\sigma^*, \{T_i^*\})$ minimizing $E(TC(\sigma, \{T_i\}))/E(\tau(\sigma, \{T_i\}))$ (if they exist), where the first infimum in (3.2.11) is over $(\mathcal{H}_n)$-stopping times $\sigma$ and the second infimum is over $\{T_i\}$ with $T_i$ $\mathcal{H}_i$-measurable, $T_i \ge S_i$ for $i \ge 1$, and $T_0 \ge 0$ a constant.

The optimal stopping time $\tau(\sigma^*, \{T_i^*\})$ and the minimum average cost $\lambda^*$ can be found by solving the following $\lambda$-maximization problem. For $\lambda > 0$, find

where

$$\lambda^* = \sup\{\lambda : V(\lambda) < 0\} \qquad (3.2.14)$$

and $(\sigma^*, \{T_i^*\})$ maximize the right-hand side of (3.2.12) for $\lambda = \lambda^*$.

In the next section, we prove the existence of the optimal policy, find its form and derive a system of differential equations that determines the optimal policy.

3.3 Optimal Policy

The maximization problem in (3.2.12) can be simplified by removing the martingale part from $Y_\lambda(\sigma, \{T_i\})$ defined by (3.2.13) through conditioning (semi-martingale decomposition; see e.g. Jensen (1989)), i.e., we can consider the following problem. For $\lambda > 0$, find

where
$$\tilde{Y}_\lambda(\sigma, \{T_i\}) = \sum_{i=0}^{\sigma-1}\Big(-C(i, S_i) + C_f + \int_{S_i}^{T_i} \big(\lambda - h(t)C_f\big)\,\bar{F}_{S_i}(t)\,dt\Big) I_{\{S_i < T_i\}} \prod_{j=0}^{i-1} I_{\{S_{j+1} \le T_j\}}, \qquad (3.3.2)$$
$\bar{F}_s(t) = P(\xi > t \mid \xi > s)$, and $\xi$ is the time to failure of the unit. Then $\lambda^*$ defined by (3.2.11) can be obtained from (3.2.14), and the optimal stopping time $\tau(\sigma^*, \{T_i^*\})$ for the problem in (3.2.11) is the stopping time maximizing $E(\tilde{Y}_\lambda(\sigma, \{T_i\}))$ for $\lambda = \lambda^*$.

Define for $\lambda > 0$,
$$T_\lambda = \inf\{t : \lambda \le C_f h(t)\}.$$
Then we have

The first inequality in (3.3.5) follows from the following result:

which holds for any $T_i \ge S_i$, and the third inequality in (3.3.5) follows from

Hence,
$$\sup_\sigma E\big(\tilde{Y}_\lambda(\sigma, \{T_\lambda \vee S_i\})\big) \ge \sup_\sigma\Big(\sup_{\{T_i\}} E\big(\tilde{Y}_\lambda(\sigma, \{T_i\})\big)\Big) = V(\lambda).$$
On the other hand, $T_\lambda \vee S_i$ is $\mathcal{F}_{S_i}$-measurable for $i \ge 0$; therefore, $E(\tilde{Y}_\lambda(\sigma, \{T_\lambda \vee S_i\})) \le V(\lambda)$ for any $(\mathcal{H}_n)$-stopping time $\sigma$, so that $\sup_\sigma E(\tilde{Y}_\lambda(\sigma, \{T_\lambda \vee S_i\})) = V(\lambda)$, and (3.3.4) holds. To summarize, we have proved that for a given $\lambda > 0$, $T_\lambda$ is the optimal preventive replacement time for problem (3.3.1).
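Because $h$ is nondecreasing, the condition $\lambda \le c\,h(t)$ defining the preventive replacement time holds on an interval $[T_\lambda, \infty)$, so $T_\lambda$ can be located by bisection. The sketch below assumes $h$ is available as a callable and that the constant multiplying $h(t)$ is the failure loss, passed as `c_f`; the search bound `t_max` and the tolerance are illustrative choices, not values from the thesis.

```python
import math

def preventive_time(lam, c_f, h, t_max=1e6, rel_tol=1e-12):
    """T_lambda = inf{t >= 0 : lam <= c_f * h(t)} for a nondecreasing failure rate h.

    Returns 0.0 if the condition already holds at t = 0 and math.inf if it fails
    on the whole search interval [0, t_max].
    """
    if lam <= c_f * h(0.0):
        return 0.0
    if lam > c_f * h(t_max):
        return math.inf
    lo, hi = 0.0, t_max          # invariant: condition fails at lo and holds at hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if lam <= c_f * h(mid):
            hi = mid
        else:
            lo = mid
    return hi


print(preventive_time(4.8, 1.0, lambda t: 2.0 * t))  # about 2.4 for h(t) = 2t
```

For a linear rate $h(t) = 2t$ the answer is simply $\lambda/(2c_f)$; the bisection is only needed for rates without a closed-form inverse.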

To finish solving the maintenance decision problem, it is now sufficient to find the optimal stopping time for the sequence $\{\hat{Y}_n(\lambda)\}$, where
$$\hat{Y}_n(\lambda) \equiv \sum_{i=0}^{n-1}\Big(-C(i, S_i) + C_f + \int_{S_i}^{T_\lambda \vee S_i} \big(\lambda - h(t)C_f\big)\,\bar{F}_{S_i}(t)\,dt\Big) I_{\{S_i < T_\lambda\}}, \qquad (3.3.6)$$
which is obviously a discrete-time stopping problem.

We only need to consider the values $\lambda < \lambda_0 = \lim_{n,t\to\infty} h(t)EC(n,t)$, i.e., $\lambda_0$ is the average cost when the minimal repair policy is applied at failure times and no preventive replacement is planned. It follows from Theorem 4.5 in Chow et al. (1991), p. 82, that for $\lambda < \lambda_0$ the optimal stopping time $\sigma_\lambda$ maximizing $EW_\sigma(\lambda)$ exists. From the Remark on page 105 of the same book, $\sigma_\lambda$ has the following form:

Denote for $n \ge 0$,

In particular, for $n = 0$, $t = 0$ and $\lambda = \lambda^*$, where $\lambda^*$ is defined by (3.2.14),

Then for $S_n \le T_\lambda$, the optimal stopping time $\sigma_\lambda$ has the form

From (3.3.8), we have for $S_n > T_\lambda$,

and from (3.3.9), we get for $S_n > T_\lambda$,

It follows from (3.3.4) and (3.3.3) that for given $\lambda < \lambda_0$,

so that $\tau(\sigma_\lambda, \{T_\lambda \vee S_i\})$ is the optimal stopping time maximizing $E(\tilde{Y}_\lambda(\sigma, \{T_i\}))$.

Hence, if we denote $g_n(t) \equiv g_n(\lambda^*, t)$ and $T \equiv T_{\lambda^*}$, where $\lambda^*$ is defined by (3.2.11), the optimal repair/replacement policy has the following form: the unit is minimally repaired at the $n$th failure time $S_n$ if and only if $S_n \le T$ and $g_n(S_n) > C(n, S_n)$; otherwise it is replaced. If the unit has not been replaced at failure times before $T$, it is preventively replaced at time $T$.
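To make the policy concrete, the sketch below estimates its long-run average cost by simulation and the renewal-reward theorem (average cost = expected cycle cost / expected cycle length). Everything specific here is an assumption for illustration: the hazard is the $h(t) = t$ of the example in Section 3.4, so under minimal repair the next failure after age $a$ is $\sqrt{a^2 + 2E}$ with $E \sim \mathrm{Exp}(1)$; a repair at the $n$th failure costs $C(n, S_n)$ (taken to include the failure loss); a failure replacement adds $C_f$ to the installation cost $C_p$; and the hypothetical callable `limits(n, t)` standing in for $g_n(t)$ must be supplied by the user. The cap `n_max` forces replacement at the `n_max`-th failure, in the spirit of Lemma 4.1, so that every cycle ends.

```python
import math
import random


def simulate_average_cost(limits, T, c_p, c_f, repair_cost, n_max=1000,
                          n_cycles=100_000, seed=1):
    """Renewal-reward estimate of the long-run average cost of the policy:
    repair at the n-th failure at age s iff s <= T and C(n, s) < limits(n, s),
    otherwise replace; replace preventively at age T.

    Illustrative assumptions: hazard h(t) = t, minimal repair, repair cost drawn
    by repair_cost(n, s, rng), installation cost c_p per cycle and an extra loss
    c_f on failure replacement."""
    rng = random.Random(seed)
    total_cost = 0.0
    total_time = 0.0
    for _ in range(n_cycles):
        cost, age, n = c_p, 0.0, 0            # new unit installed: C(0, 0) = C_p
        while True:
            # minimal repair keeps the hazard h(t) = t, so the next failure s
            # solves s**2 / 2 - age**2 / 2 = E with E ~ Exp(1)
            s = math.sqrt(age * age + 2.0 * rng.expovariate(1.0))
            if s > T:                          # preventive replacement at age T
                age = T
                break
            n += 1
            age = s
            c = repair_cost(n, s, rng)         # observed repair cost C(n, S_n)
            if n < n_max and c < limits(n, s):
                cost += c                      # minimal repair
            else:
                cost += c_f                    # failure replacement
                break
        total_cost += cost
        total_time += age
    return total_cost / total_time
```

For the example costs one would pass `repair_cost=lambda n, t, rng: (n + 5) / 6 + rng.uniform(0.0, 2.0)`. With `limits=lambda n, t: 0.0` (never repair) and `T=math.inf`, the estimate should approach $(C_p + C_f)/ES_1$.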

Remark 3.1. The function $g_n(t)$ has the following interpretation. From (3.3.5) and (3.3.9) we have for $\lambda = \lambda^*$,

The last equality in (3.3.14) gives a clear intuitive meaning to $g_n(t)$, which can be explained as follows. Consider a repair/replacement problem for a unit installed at age $t \le T$, i.e., we reset the clock to zero at age $t$ for this system. The failure rate at time $s$ is then $h_t(s) = h(t + s)$, and the repair cost incurred at the $i$th failure, occurring at time $s$, is $C(n + i, t + s)$. Noticing that $C(0, S_0) = C(0, 0) = C_p$, which is the installation cost of a new unit, we can assume that the installation cost of this old unit is equal to zero. The last equality in (3.3.14) says that $g_n(t) - C_f$ is the optimal value of the corresponding $\lambda$-maximization problem for the old unit. Hence, $g_n(t) - C_f$ can be interpreted as the residual value of the unit.

Accordingly, the optimal policy can be described as follows: the unit is replaced at the $n$th failure time $S_n$ if and only if the pure repair cost $C(n, S_n) - C_f$ equals or exceeds its residual value $g_n(S_n) - C_f$, or at the time when its residual value reaches zero (because $g_n(T) = C_f$). In addition, it is obvious from the above analysis that the optimal policy for the operating unit installed at age $t$ has the same form as the optimal policy for the new unit, i.e., $g_i^t(s) \equiv g_{n+i}(t + s)$ is the optimal repair cost limit for the $i$th failure at time $s$, and a preventive replacement is scheduled at time $T - t$.

In the next theorem, we summarize the results concerning the form of the

optimal policy and derive a system of ordinary differential equations for finding

the optimal control limit functions and the optimal preventive replacement

time.

Theorem 3.2. Let $T = T_{\lambda^*}$ and $g_n(t) = g_n(\lambda^*, t)$ for $n \ge 0$, where $\lambda^*$ is the optimal average cost and $T_\lambda$ and $g_n(\lambda, t)$ are defined by (3.3.3) and (3.3.9), respectively. Then the optimal policy has the following form:

The unit is repaired at the $n$th failure time $S_n \le T$ if and only if $g_n(S_n) > C(n, S_n)$; otherwise it is replaced. If the unit has not been replaced at failure times before $T$, it is preventively replaced at time $T$.

The optimal control limit functions $\{g_n(t), n \ge 0\}$ and the optimal average cost $\lambda^*$ satisfy the following system of differential equations with boundary conditions:

where $R_{n,t}(u) = P(C(n,t) \le u)$, and $T$ is the optimal preventive replacement time determined by the equation $C_f h(T) = \lambda^*$. If $C_f h(t) < \lambda^*$ for all $t$, then $T = +\infty$ and the optimal policy is a repair-cost-limit policy.

Proof. From (3.3.4), we have for $t \le T$ by conditioning on $S_{n+1}$ and

By differentiating (3.3.16), we can see that the optimal control functions $\{g_n(t), n \ge 0\}$ and the optimal average cost $\lambda^*$ satisfy the differential equations in (3.3.15). The boundary conditions are obtained from (3.2.14), (3.3.3), (3.3.9) and (3.3.12).

Q.E.D.

In the next theorem, we investigate the case where the repair cost $C(n, t)$ does not depend on $n$.

Theorem 3.3. If $C(n, t) \equiv C(t)$ for all $n$, then $g_n(t) = g(t)$, and the optimal control function $g(t)$ and the optimal average cost $\lambda^*$ are uniquely determined by

where $R_t(u) = P(C(t) > u)$.

Proof. It follows from (3.3.8) and (3.3.9) that if $C(n, t) = C(t)$ for all $n$, $g_n(t)$ does not depend on $n$, and we have from Theorem 3.2 that $(\lambda^*, g(t))$ satisfies (3.3.17) and (3.3.18).

To prove uniqueness, we use the standard arguments from the theory of ordinary differential equations, namely the existence and uniqueness of local and global solutions (e.g. Theorem 1.1 and Theorem 3.1 in Hartman (1964)).

Define $A(\lambda) = \inf\{t < \infty : g_\lambda(t) = C_f \text{ and } g_\lambda'(t) \le 0\}$ and $A = \sup_\lambda\{A(\lambda)\}$. It is easy to see that $\{A(\lambda)\}$ is not empty, and therefore either $A = \infty$ or $A < \infty$. If $A < \infty$, then there is a decreasing sequence of positive reals $\lambda_n$ such that $g_{\lambda_n}(t) = C_f$ and $g_{\lambda_n}'(t) \le 0$, where $\lambda_n \to \lambda$ and $A(\lambda_n) \to A$. We will show that $g_\lambda(A) = C_f$ and $g_\lambda'(A) = 0$. If $g_\lambda'(A) < 0$, then we can find an $\epsilon > 0$ such that $g_\lambda(A + \epsilon) < C_f$. Since $g_\lambda(t)$ is continuous in $\lambda$, we can find a $\lambda' < \lambda$ such that $A(\lambda') > A$, which contradicts the definition of $A$. Similarly, we can prove $g_\lambda(A) = C_f$.

Hence, $(\lambda, g_\lambda(t))$ is a solution to (3.3.17) and (3.3.18). Obviously, there is no other $\lambda$ satisfying (3.3.17) and (3.3.18). From $g(0) = C_p$ and from (3.3.9), we see that $V(\lambda) = 0$, which implies that $g(t) = g_\lambda(t)$ and $T = A$ form the optimal policy and $\lambda$ is the optimal average cost.

If $A = \infty$, then there is a decreasing sequence of positive reals $\lambda_n$ such that $g_{\lambda_n}(t) = C_f$ and $g_{\lambda_n}'(t) \le 0$, where $A(\lambda_n) \to \infty$. Since $\lambda_n \downarrow \lambda$, $g_{\lambda_n}(t) \to g_\lambda(t)$ for all $t > 0$. Thus, $g_\lambda(t) \in (C_f, C_p)$ for all $t > 0$. For the case $P\{g(\infty) < C(\infty)\} > 0$, it is clear from (3.3.17) that $\lambda < \lambda_0$ and the optimal stopping time satisfies $ES_{\sigma_\lambda} < \infty$. Again, $V(\lambda) = 0$ implies that $g(t) = g_\lambda(t)$ and $T = \infty$ form the optimal policy and $\lambda$ is the optimal average cost. Hence, in this case, the repair-cost-limit policy is optimal.

In case $P\{g(\infty) < C(\infty)\} = 0$, we see from (3.3.17) that $\lambda = \lambda_0$, which is the average cost under the minimal repair policy. Hence, although $\sigma_\lambda = \infty$ does not belong to the finite case, (3.3.17) still suggests the form of the optimal policy, which is the minimal repair policy. The optimality of the minimal repair policy can be seen from the fact that $V(\lambda) < 0$ for each $\lambda < \lambda_0$. In fact, in Chapter 2 (see also Jiang et al. (1998)), we proved that this occurs if and only if

$\int_0^\infty r\,d\big(h(r)EC(r)\big) \le C_p - \operatorname{ess\,sup}_t C(t)$,

which is a very rare case. Finally, we should emphasize that in this case $V(\lambda_0)$ is not necessarily equal to zero.

In general, it is difficult to solve (3.3.15) because the number of equations is infinite. In the next section, we develop a computational procedure for finding an $\epsilon$-optimal policy.

3.4 Computational Algorithm

Lemma 4.1. Assume that the unit can be repaired at most $N - 1$ times for some $N \ge 1$. Then the optimal repair/replacement policy has the following form:

the unit is repaired at the $n$th failure time $S_n \le T$, $n < N$, if and only if $g_n(S_n) > C(n, S_n)$; otherwise it is replaced. If the unit has not been replaced at failure times $S_n < T$, $n < N$, it is replaced at the $N$th failure time $S_N$ or at $T$, whichever occurs first. The optimal control functions $g_n(t)$, $0 \le n \le N - 1$, and the optimal average cost $\lambda_N$ are uniquely determined by the following system of equations:

If $C_f h(t) < \lambda$ for all $t$, then $T = +\infty$.

Proof. Using the same approach as in Sections 2 and 3, one can show that the optimal stopping time $\tau$ has the following form:

$$\tau = \begin{cases} S_n, & \text{if } C(n, S_n) \ge g_n(S_n),\ n < N,\\ S_N, & \text{if } C(n, S_n) < g_n(S_n) \text{ for all } n < N,\\ T, & \text{if no replacement occurred before } T, \end{cases}$$

and $(\lambda_N, \{g_n(t), 0 \le n \le N\})$ satisfy (3.4.1). For given $\lambda > 0$, the unique solution to the differential equations is obtained from (3.3.16).

Observe that $g_0^\lambda(0)$ is a continuous, strictly increasing function of $\lambda$, $g_0^0(0) = C_f$ and $\lim_{\lambda\to\infty} g_0^\lambda(0) = \infty$. Hence, there is a unique $\lambda_N$ satisfying $g_0^{\lambda_N}(0) = C_p$, so that (3.4.1) determines the optimal control functions and the optimal average cost uniquely.

Q.E.D.

Theorem 4.2. For any $\epsilon > 0$ there exists $N = N(\epsilon)$ such that the solution to (3.4.1) determines an $\epsilon$-optimal policy, i.e., $0 \le \lambda_N - \lambda^* < \epsilon$, where $\lambda^*$ is the optimal average cost.

Proof. From Theorem 3.2, the optimal stopping time $\tau$ is either equal to $+\infty$ (the minimal repair policy with no planned replacement is optimal), or $E\tau < +\infty$.

i) If $\tau = \infty$, the optimal average cost is $\lambda_0 = \lim_{n,t\to\infty} h(t)EC(n,t)$, and, obviously, the following policy:

repair the unit $N - 1$ times and replace it at the $N$th failure time $S_N$

is $\epsilon$-optimal when $N$ is large enough.

ii) If $E\tau < +\infty$, then

so that

Hence, for any $\epsilon > 0$, there exists $N(\epsilon)$ such that $\lambda_{N(\epsilon)} - \lambda^* < \epsilon$.

If we further assume that

then we can find an easily computable upper bound for $N(\epsilon)$. We have

where $T = h^{-1}(\lambda^*/C_f)$. Put $\lambda' = C_r/ES_1$ and $T' = h^{-1}(\lambda'/C_f) < +\infty$. Then $T' \ge T$ and, since $K_N(t)$ is nondecreasing in $t$, $K_N(T') \ge K_N(T)$. If we choose $N^*$ to be the smallest $N$ such that $\lambda' K_N(T') \le \epsilon$, then $\lambda_{N^*} - \lambda^* \le \epsilon$, and the policy determined uniquely by (3.4.1) for $N = N^*$ is an $\epsilon$-optimal policy.

Q.E.D.

Based on Theorem 4.1 and assumption (3.4.2), which is satisfied in most practical applications, we have the following algorithm for the computation of an $\epsilon$-optimal policy.

The algorithm.

Step 0. Choose $\epsilon > 0$ and put $\lambda_L = 0$, $\lambda_U = C_r/ES_1$, $T' = h^{-1}(C_r/(ES_1 C_f))$, $N^* = \min\{N \ge 1 : \lambda' K_N(T') \le \epsilon/2\}$, $M = \ln(2C_r/(\epsilon ES_1))/\ln 2$, $i = 1$.

Step 1. Put $\lambda = (\lambda_L + \lambda_U)/2$.

Step 2. Put $T = h^{-1}(\lambda/C_f)$, $g_{N^*}^\lambda(t) = C_f$.

Step 3. Find the solution $\{g_0^\lambda(t), \ldots, g_{N^*-1}^\lambda(t)\}$ to (3.4.1).

Step 4. If $i \ge M$ or $g_0^\lambda(0) = C_p$, go to Step 5.

Otherwise, put $i = i + 1$ and compare $g_0^\lambda(0)$ and $C_p$.

If $g_0^\lambda(0) < C_p$, put $\lambda_L = \lambda$ and go to Step 1.

If $g_0^\lambda(0) > C_p$, put $\lambda_U = \lambda$ and go to Step 1.

Step 5. Stop. $\{g_i^\lambda(t) : i \le N^*\}$ and $T$ determine an $\epsilon$-optimal policy, and $\lambda$ is the corresponding average cost.
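Steps 1 to 5 form a plain bisection on $\lambda$. The sketch below mirrors that loop; the expensive part, Steps 2 and 3 (solving (3.4.1) and returning $g_0^\lambda(0)$), is delegated to a hypothetical callable `g0_at_zero` that the reader must supply, assumed continuous and strictly increasing in $\lambda$ as in the proof of Lemma 4.1. The stopping rule uses $M = \log_2(2\lambda_U/\epsilon)$, which equals the $M$ of Step 0 when $\lambda_U = C_r/ES_1$.

```python
import math


def bisect_average_cost(g0_at_zero, c_p, lam_upper, eps):
    """Bisection of Steps 1-5 on [0, lam_upper].

    g0_at_zero(lam) is a hypothetical callable returning g_0^lam(0), i.e. the
    outcome of Steps 2-3; it is assumed continuous and strictly increasing."""
    lam_lo, lam_hi = 0.0, lam_upper                     # Step 0
    m = math.log(2.0 * lam_upper / eps) / math.log(2.0)
    i = 1
    while True:
        lam = 0.5 * (lam_lo + lam_hi)                   # Step 1
        g = g0_at_zero(lam)                             # Steps 2-3 (delegated)
        if i >= m or g == c_p:                          # Step 4: stopping test
            return lam                                  # Step 5
        i += 1
        if g < c_p:
            lam_lo = lam                                # root lies above lam
        else:
            lam_hi = lam                                # root lies below lam
```

After the loop stops, the midpoint is within $\lambda_U/2^M = \epsilon/2$ of $\lambda_{N^*}$, which is the bound used in the next paragraph.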

It follows from Lemma 2.2 that $\lambda_{N^*} - \lambda^* \le \epsilon/2$, and from the definition of $M$ in Step 0, $|\lambda - \lambda_{N^*}| \le \epsilon/2$, so that $|\lambda - \lambda^*| \le \epsilon$.

Remark 4.3. In case the repair cost $C(n, t)$ does not depend on $n$, the algorithm for the computation of an $\epsilon$-optimal policy remains the same except for Step 3, where we need to solve only one differential equation, which is easier.

To illustrate the computational algorithm, we consider the following nu-

merical example.

Example. Put $\epsilon = 0.01$, $C_p = 1$, $C_f = 1$, $h(t) = t$ (Weibull hazard function with shape parameter 2), and let $C(n, t) = (n + 5)/6 + U$, where $U$ is uniformly distributed on the interval $[0, 2]$.
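For this example $\bar F(t) = \exp(-t^2/2)$, so $ES_1 = \int_0^\infty \bar F(t)\,dt = \sqrt{\pi/2} \approx 1.2533$, and with $C_f = 1$ the replacement age is $T = h^{-1}(\lambda/C_f) = \lambda$. The sketch below checks $ES_1$ with the trapezoidal rule (truncating the integral at an assumed horizon `t_max`) and evaluates the bisection budget $M$ of Step 0. The constant $C_r$ is not restated in this section, so it is left as an input.

```python
import math


def expected_first_failure(surv, t_max=20.0, n=20_000):
    """E S_1 = integral of F_bar over [0, inf), truncated at t_max (trapezoidal rule)."""
    dt = t_max / n
    total = 0.5 * (surv(0.0) + surv(t_max))
    for k in range(1, n):
        total += surv(k * dt)
    return total * dt


def bisection_budget(c_r, eps, es1):
    """M = ln(2 C_r / (eps * E S_1)) / ln 2, as in Step 0."""
    return math.log(2.0 * c_r / (eps * es1)) / math.log(2.0)


def surv(t):
    """Survival function for h(t) = t: F_bar(t) = exp(-t^2 / 2)."""
    return math.exp(-t * t / 2.0)


es1 = expected_first_failure(surv)        # approximately sqrt(pi / 2)
```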


Figure 3.1: Optimal repair cost limits.

Using the above algorithm and MATLAB, we obtained $N^* = 15$, $M = 9.3$, $\lambda^* = T = 2.97$ (with $h(t) = t$ and $C_f = 1$, $T = h^{-1}(\lambda^*/C_f) = \lambda^*$) and the optimal repair cost limit functions depicted in Figure 3.1. The optimal repair cost limit functions $\{g_i(t)\}$ determine the optimal policy. When the $i$th failure occurs at time $t < T$, the observed repair cost $C(i, t)$ is compared with $g_i(t)$. If $C(i, t)$ is less than $g_i(t)$, the unit is repaired; otherwise it is replaced by a new unit. If no replacement has been carried out before $T$, the unit is preventively replaced at time $T$. This plot also provides useful information regarding the residual value of the unit. If a unit of age $t$ has been repaired $n$ times, then its residual value is equal to $g_n(t) - C_f$. In particular, a new unit has residual value $g_0(0) - C_f = C_p - C_f$, which is equal to the preventive replacement cost.

3.5 Optimal Policy in the Discounted Cost Case

In this section, we examine the structure of the optimal policy minimizing the total expected discounted cost associated with the maintenance model described in Section 3.3.

Let $\alpha > 0$ be the discount factor and $TC_i$ be the total cost of the $i$th replacement cycle.

Consider the repair/replacement policy determined by a sequence of stopping times $\tau = (\tau_i, i = 1, 2, \ldots)$, where $\tau_i$ is the replacement time of the $i$th unit. Then the total discounted cost over an infinite time horizon has the form

It is not difficult to see that it is sufficient to consider stationary policies, i.e., $\tau_i = \tau$ for all $i$. The original optimization problem can then be reduced to the following optimal stopping problem. Find

$\inf_\tau \dfrac{E(TC_\alpha(\tau))}{E(1 - e^{-\alpha\tau})}$ (3.5.2)

and the optimal stopping time $\tau^*$ (if it exists) minimizing the expression in (3.5.2).
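The ratio in (3.5.2) is the total expected discounted cost of the renewal process generated by repeating a cycle with replacement time $\tau$. As a hedged check, the sketch below estimates it by simulation for the simplest stationary policy, which is not the optimal one: minimal repair at every failure at an assumed constant cost `c_m`, and replacement at a fixed age $T$, with $h(t) = t$. For this policy $E(TC_\alpha(\tau)) = C_p + c_m\int_0^T e^{-\alpha t} h(t)\,dt$ and $E(1 - e^{-\alpha\tau}) = 1 - e^{-\alpha T}$, which gives a closed form to compare against.

```python
import math
import random


def discounted_ratio_mc(T, alpha, c_p, c_m, n_cycles=50_000, seed=7):
    """Monte Carlo estimate of E(TC_alpha(tau)) / E(1 - exp(-alpha * tau)) for the
    illustrative policy: minimal repair at constant cost c_m, replacement at age T,
    hazard h(t) = t."""
    rng = random.Random(seed)
    cost_sum = 0.0
    for _ in range(n_cycles):
        cost = c_p                                     # installation, paid at time 0
        age = 0.0
        while True:
            # next failure under minimal repair with cumulative hazard t**2 / 2
            age = math.sqrt(age * age + 2.0 * rng.expovariate(1.0))
            if age >= T:
                break
            cost += c_m * math.exp(-alpha * age)       # discounted repair cost
        cost_sum += cost
    # tau = T in every cycle, so E(1 - exp(-alpha * tau)) = 1 - exp(-alpha * T)
    return (cost_sum / n_cycles) / (1.0 - math.exp(-alpha * T))


def discounted_ratio_exact(T, alpha, c_p, c_m):
    """Closed form for the same policy: int_0^T t exp(-alpha t) dt is the expected
    discounted number of failures before T when h(t) = t."""
    integral = (1.0 - math.exp(-alpha * T) * (1.0 + alpha * T)) / alpha ** 2
    return (c_p + c_m * integral) / (1.0 - math.exp(-alpha * T))
```

With $T = 2$, $\alpha = 0.1$ and unit costs, the closed form gives about 15.18, and the simulated ratio agrees to within sampling error.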

Using the notation of Section 3.3, the total discounted cost $TC_\alpha(\tau)$ has the form

$TC_\alpha(\tau) = \sum_{i=0}^{\sigma-1}\Big((C(i,S_i) - C_f)e^{-\alpha S_i} + C_f e^{-\alpha S_{i+1}} I_{\{S_{i+1} \le T_i\}}\Big) I_{\{S_i < T_i\}} \prod_{j=0}^{i-1} I_{\{S_{j+1} \le T_j\}}$,

where $C(0, 0) \equiv C_p$, and the stopping time $\tau$ has the representation given by (3.2.13): $\tau = \tau(\sigma, \{T_i\})$.

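Next,

$\dfrac{\lambda}{\alpha}\big(1 - e^{-\alpha\tau(\sigma,\{T_i\})}\big) = \lambda\sum_{i=0}^{\sigma-1}\int_{S_i}^{T_i} e^{-\alpha t} I_{\{S_{i+1} \ge t\}}\,dt \prod_{j=0}^{i-1} I_{\{S_{j+1} \le T_j\}}$.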

Applying the $\lambda$-maximization technique and the semimartingale decomposition, we obtain the formulas for $Y_\lambda^\alpha(\sigma, \{T_i\})$ and $\tilde{Y}_\lambda^\alpha(\sigma, \{T_i\})$, corresponding to $Y_\lambda(\sigma, \{T_i\})$ and $\tilde{Y}_\lambda(\sigma, \{T_i\})$ in Section 3.3. We have

and

$\prod_{j=0}^{i-1} I_{\{S_{j+1} \le T_j\}}$.

As in Section 3.3,

$T_\lambda = \inf\{t : \lambda \le C_f h(t)\}$

is the optimal preventive replacement time.

We have that

where

and

$V^\alpha(\lambda) = \sup_\sigma\big(\sup_{\{T_i\}} E(\tilde{Y}_\lambda^\alpha(\sigma, \{T_i\}))\big)$ (3.5.10)

Finally, the maintenance decision problem reduces to the optimal stopping problem for the sequence $\{W_n^\alpha(\lambda)\}$ defined by

The optimal stopping time $\sigma_\alpha(\lambda)$ has the form

Hence, the optimal repair/replacement policy has the same form as in the average cost case, i.e., the unit is minimally repaired at the $n$th failure time $S_n$ if and only if $S_n \le T$ and $g_n^\alpha(\lambda_\alpha^*, S_n) > C(n, S_n)$; otherwise it is replaced. Here $\lambda_\alpha^*$ is obtained from (3.5.8). If the unit has not been replaced at failure times before $T$, it is preventively replaced at time $T$, where $T \equiv T_{\lambda_\alpha^*}$ is determined by (3.5.6).

Similar to the average cost case, we have the following recursive equation for $g_n^\alpha(\lambda_\alpha^*, t)$. For simplicity, denote $g_n^\alpha(\lambda_\alpha^*, t)$ by $g_n(t)$ and $\lambda_\alpha^*$ by $\lambda$.

The differential form of the above equation is

If the repair cost $C(n, t)$ does not depend on $n$, the optimal control function $g^\alpha(\lambda_\alpha^*, t)$ and $\lambda_\alpha^*$ are obtained as the unique solution to the following differential equation with boundary conditions:

$g(0) = C_p$, $g(T) = C_f$, $C_f h(T) = \lambda$, (3.5.16)

where $K(u) = P(C(t) > u)$.

Note that when $\alpha \to 0$, equation (3.5.15) has the same form as equation (3.4.1) for the average cost case. Therefore, $\lambda_\alpha^*$ decreases to $\lambda^*$ (the optimal expected average cost), and the repair cost limit function for the discounted cost case, $g^\alpha(\lambda_\alpha^*, t)$, increases to $g(\lambda^*, t)$, the repair cost limit function for the average cost case. Hence, the expected average cost case can be viewed as the limiting case $\alpha \to 0$ of the expected discounted cost case.

Summary of Notation

$C(n, t)$: repair cost of the $n$th failure at age $t$

$C_f$: failure loss

$C_p$: preventive replacement cost

$g_n(t)$: repair cost limit function for the $n$th failure

I<*: total discounted cost over an infinite time horizon

$S_n$: $n$th failure time

$T_\lambda$: $\inf\{t : \lambda \le C_f h(t)\}$

$TC(t)$: total cost incurred up to age $t$

$V(\lambda)$: value function for the $\lambda$-maximization problem

$Y_\lambda$: the objective function for the $\lambda$-maximization problem

$\alpha$: discount factor

$\lambda^*$: optimal average cost

$\tau(\sigma^*, \{T_i^*\})$: optimal stopping time

Chapter 4

OPTIMAL MAINTENANCE POLICY FOR A GENERAL REPAIR MODEL

4.1 Introduction

The concept of general repair was introduced by Kijima et al. (1988). They assumed that general repair can improve the condition of the system by decreasing its virtual age. Minimal repair and replacement are special cases of general repair: the former does not change the system's age and the latter reduces the system's age to zero.

The effect of a general repair on the system's condition is described by a repair degree. The repair degree $\theta = 0$ corresponds to replacement, $\theta = 1$ to minimal repair, and $0 < \theta < 1$ to a general repair that improves the system's condition.

The repair degree determines the virtual age that describes the condition of the system. Two variants of the definition were proposed in Kijima (1989): Type I, $V_n = V_{n-1} + \theta_n X_n$, and Type II, $V_n = \theta_n(V_{n-1} + X_n)$, where $V_n$ is the virtual age after the $n$th repair, $X_n$ is the length of the operating time between the $(n-1)$th and the $n$th repair, and $\theta_n$ is the repair degree of the $n$th repair.
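The two recursions can be written down directly. This is a minimal sketch; the operating times $X_n$ and repair degrees $\theta_n$ are passed as plain lists, and the function name is illustrative.

```python
def virtual_ages(x, theta, model="I", v0=0.0):
    """Virtual ages V_1, ..., V_n after successive repairs (Kijima's two models).

    x     -- operating times X_1, ..., X_n between repairs
    theta -- repair degrees theta_1, ..., theta_n
    model -- "I":  V_n = V_{n-1} + theta_n * X_n
             "II": V_n = theta_n * (V_{n-1} + X_n)
    """
    if model not in ("I", "II"):
        raise ValueError("model must be 'I' or 'II'")
    v, out = v0, []
    for x_n, theta_n in zip(x, theta):
        v = v + theta_n * x_n if model == "I" else theta_n * (v + x_n)
        out.append(v)
    return out
```

For $X = (1, 2, 3)$ and $\theta_n = 0.5$, Type I gives $(0.5, 1.5, 3.0)$ and Type II gives $(0.5, 1.25, 2.125)$. With every $\theta_n = 1$ both recursions return the calendar failure times (minimal repair), and with $\theta_n = 0$ Type II resets the virtual age to zero.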

Some generalizations and modifications have also been proposed in the literature. For example, Baxter et al. (1996) considered a model with $V_n = V'$, where $V'$ is a random variable.

Kijima et al. (1988) considered a periodic replacement policy for a repairable system with general repair, Makis and Jardine (1991) proved the optimality of a T-policy for their model, and Stadje and Zuckerman (1991) showed that a bang-bang policy is optimal if the repair degree is a decision variable. Dagpunar (1998) studied properties of a general repair process under Kijima's model II, established monotonicity results for such a process and presented a computational method for evaluating the repair density and the expected number of repairs in a given time interval.

Scarsini and Shaked (2000) obtained bounds on the expected total profit

generated by an item subject to general repair and Zhang and Love (2000)

considered a Markov chain model under general repair and developed a simple

recursion to determine the optimal replacement policy.

Models with minimal repair and random repair cost have been studied under different assumptions by Beichelt (1993), L'Ecuyer and Haurie (1987), Jiang et al. (1998), and Makis et al. (2000), among others. In each case, the optimality of a repair-cost-limit policy was established. The repair-cost-limit policy prescribes replacement of a unit at failure time if the repair cost exceeds a certain limit, and this repair cost limit at age $t$ can be interpreted as the residual value of the unit. The age replacement policy is a special case of the repair-cost-limit policy, i.e., when the residual value of the unit decreases to zero, a preventive replacement should be carried out.

The objective of this chapter is to investigate the structure of the average cost optimal policy for a model with general repair and preventive replacement. This model is a natural extension of the minimal repair models presented in Chapters 2 and 3. Using results from the theory of jump processes, the $\lambda$-maximization technique, semimartingale decomposition and a dynamic programming approach, we will show that the optimal policy is a combination of a repair-cost-limit policy for failure replacement and a virtual-age-based preventive replacement. We consider Kijima's Type I model with general repair; other models can be analyzed using the same approach.

4.2 Model Description and the Main Result

We make the following assumptions.

1. System deterioration.

The time to failure has distribution function $F(t)$, density $f(t)$ and failure rate $h(t) = f(t)/(1 - F(t))$, which is a nondecreasing function of $t$, i.e., $F$ is IFR (increasing failure rate). For simplicity, assume that $F$ has full support on $(0, +\infty)$ and $h(\infty) = \infty$.

2. Maintenance actions considered: preventive replacement, failure replacement, and general repair. All actions take negligible time.

3. Cost structure. The preventive replacement cost $C_p$ and the failure loss $C_f$ are assumed to be given constants. The repair cost $C(v)$ is a random function of the virtual age $v$ with distribution function $G_v(c)$, and $C(v_1) \le C(v_2)$ (stochastically) for any $v_1 < v_2$. The costs incurred at the $n$th repair epoch, a preventive replacement epoch and a failure replacement epoch are $C(V_n^-) + C_f$, $C_p$ and $C_p + C_f$, respectively, where $V_n^- = V_{n-1} + X_n$ is the virtual age just prior to the $n$th repair. The repair cost is observable at failure time.

4. Type of general repair. Kijima's Type I general repair model is considered, i.e.,

$V_n = V_{n-1} + \theta(V_n^-, C(V_n^-))X_n$.

The repair degree $\theta(v, C(v))$ is a random function of the virtual age $v$ and the random repair cost $C(v)$. We assume that $\theta(u, G_u^{-1}(p)) \le \theta(v, G_v^{-1}(p))$ for any $0 \le p \le 1$ and $u < v$, where $G_v^{-1}$ is the inverse function of $G_v$. We will write $\theta(v, C)$ as $\theta(v)$. The repair degree represents the available information about the system's condition after a repair. We assume that $1 \ge \theta(v, C) \ge \epsilon > 0$.

5. Objective.

Find the repair/replacement policy minimizing the long-run expected average cost per unit time.
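A sample path like the one in Figure 4.1 can be generated from assumptions 1 to 4. The sketch below is illustrative only: it takes $h(t) = t$, so that the operating time after a repair at virtual age $v$ satisfies $P(X > x) = \bar F(v + x)/\bar F(v) = \exp(-((v + x)^2 - v^2)/2)$ and can be sampled by inversion, and it repairs at every failure (no replacement rule is applied). The repair cost model `cost_fn` and the repair degree `theta_fn` are hypothetical user-supplied functions.

```python
import math
import random


def simulate_type1_path(n_failures, theta_fn, cost_fn, seed=0):
    """Failure epochs and virtual ages under Kijima Type I repair, h(t) = t.

    Returns a list of tuples (S_n, V_n^-, C_n, V_n). Every failure is repaired;
    theta_fn(v_minus, c) and cost_fn(v_minus, rng) are user-supplied."""
    rng = random.Random(seed)
    s, v = 0.0, 0.0                          # calendar time, virtual age after last repair
    path = []
    for _ in range(n_failures):
        # inversion: P(X > x | v) = exp(-((v + x)**2 - v**2) / 2) for h(t) = t
        x = math.sqrt(v * v + 2.0 * rng.expovariate(1.0)) - v
        s += x
        v_minus = v + x                      # virtual age just prior to the repair
        c = cost_fn(v_minus, rng)            # repair cost C(V_n^-)
        v = v + theta_fn(v_minus, c) * x     # Type I: V_n = V_{n-1} + theta * X_n
        path.append((s, v_minus, c, v))
    return path
```

With `theta_fn` identically 1 the virtual age coincides with calendar time (minimal repair); any repair degree in $[\epsilon, 1]$ keeps $V_n \le S_n$.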

Figure 4.1 illustrates a sample path of the failure rate and the repair cost, where $S_i$ is the $i$th failure time and $C_i$ is the repair cost at time $S_i$. The state of the operating unit is described by $(v, x)$, where $v$ is the virtual age immediately after the last repair and $x$ is the length of the operating time since the last repair. The state of the unit at the $n$th failure epoch is described by $(v, x, C)$, where $(v, x) = (V_{n-1}, X_n)$ and $C = C(V_n^-)$ is the repair cost. Finally, the state immediately after the $n$th repair is $(V_n, 0)$. We write this state as $V_n$.

Figure 4.1: Sample path of the failure rate and the repair cost.

The main result obtained in this chapter is summarized in the following

theorem.

Theorem 2.1. The optimal policy exists, and it is a combination of a generalized repair-cost-limit policy for failure replacement and an age-based policy for preventive replacement, i.e., the optimal replacement time $\tau$ is determined by

$S_{n-1} + (T(V_{n-1}) - V_{n-1})$, if no replacement occurred at or before $S_{n-1}$ and $S_{n-1} + (T(V_{n-1}) - V_{n-1}) < S_n$, (4.2.1)

where $S_n$ is the $n$th failure time and $C_n \equiv C(V_n^-)$ is the repair cost at $S_n$. The repair cost limit $g_1^*(v, x, C)$ is calculated from

$g(t)$ satisfies the following integral equation

$g(t) = \int_t^\infty \bar{F}(u)\big[\lambda - h(u)\big(C_f - g_2(t, u - t)\big)\big]^+\,du \big/ \bar{F}(t)$, (4.2.3)

where

$g_2(v, x) = E_C\big(g_1(v, x, C)\big)$, (4.2.4)

and

$g_1(v, x, C) = \big[E_\theta[g(v + \theta(v + x, C)x)] - C\big]^+$, (4.2.5)

$g(0) = C_p$, and $g(T) = 0$ for $T = \inf\{t : \lambda \le h(t)C_f\}$. Here we use the notation $(x)^+ = \max(x, 0)$.

where

and $\lambda$ is the optimal average cost.

Remark 2.2. The functions $g$, $g_0$, $g_1$, $g_1^*$ and $g_2$ have the following meaning.

$g(v) = g_0(v, 0)$ is the residual value of the repaired unit in state $(v, 0)$.

$g_0(v, x)$ is the residual value of the unit in operating state $(v, x)$.

$g_1(v, x, C)$ is the residual value of the failed unit before repair, given that the repair cost is $C$.

$g_1^*(v, x, C)$ is the residual value of the unit after repair.

$g_2(v, x)$ is the mean residual value of the failed unit in state $(v, x)$ before assessing the repair cost.

The optimal policy can be expressed as follows.

Preventive replacement is carried out at virtual age $T(v)$, when the residual value of the unit $g_0(v, T(v) - v)$ reaches 0.

Failure replacement is carried out at failure time $S_n$ when $g_1(V_{n-1}, X_n, C_n)$ reaches 0 or, equivalently, when the repair cost $C_n = C(V_n^-)$ is greater than or equal to $g_1^*(V_{n-1}, X_n, C_n)$.

To summarize, the unit is replaced when its residual value reaches zero. In addition, it is easy to see that $T$ is the first time such that $g(T) = 0$, i.e., the residual value of a unit starting with virtual age $T$ is equal to zero.

To prove the theorem, we formulate the decision problem in continuous

time and then reduce it to a discrete time optimal stopping problem by using

results obtained in Chapter 3. Finally, the form of the optimal policy is found

by using a dynamic programming approach.

4.3 Problem Formulation and Analysis

Let $\tau$ be the time of the first replacement if a new system is installed at time zero. We define $TC(t)$, the total cost incurred up to time $t$, as

where $S_i$ is the $i$th failure time, $S_0 = 0$, $C_i$ is the repair cost at the $i$th failure time, $N(t)$ is the number of failures before $t$, $N(t) = \sum_{i=1}^\infty I_{\{S_i < t\}}$, and $I$ is the set indicator function. We denote by $\theta_i$ the repair degree of the $i$th repair.

Let $(\mathcal{F}_t)$ be the (completed) natural filtration of the process $\{TC(t), t \ge 0\}$.

For the average cost criterion, the maintenance decision problem can be formulated as follows. Find an $(\mathcal{F}_t)$-stopping time $\tau^*$, if it exists, minimizing the long-run expected average cost per unit time given by

By using Lemmas 2.3 and 2.1 in Chapter 3, we will show that this continuous-time stopping problem can be reduced to a discrete-time stopping problem. To keep this chapter self-contained and the notation consistent, we restate these two lemmas as Lemmas 3.1 and 3.2.

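Lemma 3.1. Any $(\mathcal{F}_t)$-stopping time $\tau$ has a representation

where $\sigma$ is an $(\mathcal{H}_n)$-stopping time, $T_0$ is a constant and $T_n$ is $\mathcal{H}_n$-measurable for $n \ge 1$, $\mathcal{H}_n = \sigma\{(S_i, C_i, \theta_i), i \le n\} = \mathcal{F}_{S_n}$. We define $\prod_{j=0}^{-1}(\cdot) \equiv 1$. Conversely, for any $(\mathcal{H}_n)$-stopping time $\sigma$, constant $T_0 \ge 0$ and $\mathcal{H}_n$-measurable functions $T_n \ge S_n$, $n = 1, 2, 3, \ldots$, $\tau(\sigma, \{T_n\})$ defined by (4.3.3) is an $(\mathcal{F}_t)$-stopping time.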

Lemma 3.2. Let $\tau$ be an $(\mathcal{F}_t)$-stopping time, let $(\sigma, \{T_i\})$ be a representation of $\tau$ and let $TC(\tau)$ be the total cost incurred up to time $\tau$. Then

where $C_0 = C_p + C_f$.

Lemmas 3.1 and 3.2 characterize any $(\mathcal{F}_t)$-stopping time $\tau$ and the corresponding total cost $TC(\tau)$ in terms of $\sigma$ and $\{T_i\}_{i=0}^\infty$. This representation of $\tau$ corresponds to a decomposition of a replacement policy into competing unscheduled failure replacement $\sigma$ and scheduled preventive replacement $\{T_i\}_{i=0}^\infty$, and it leads to the following equivalent formulation of the continuous-time stopping problem in (4.3.2). Find

and $(\sigma^*, \{T_i^*\})$ minimizing $E(TC(\sigma, \{T_i\}))/E(\tau(\sigma, \{T_i\}))$ (if they exist), where the infimum is taken over $(\mathcal{H}_n)$-stopping times $\sigma$ and random variables $\{T_i\}$, $T_i$ is $\mathcal{H}_i$-measurable, $T_i \ge S_i$ for $i \ge 1$, and $T_0 \ge 0$ is a constant.

It is clear that this problem is equivalent to the following sequential minimization problem:

$\lambda' = \inf$

In fact, if $\lambda' < \lambda^*$, then there exists a $\tau' = (\sigma', \{T_i'\})$ whose corresponding average cost is less than $\lambda^*$,

which is a contradiction.

The optimal stopping time $\tau(\sigma^*, \{T_i^*\})$ and the minimum average cost $\lambda^*$ can be found by solving the following $\lambda$-maximization problem (see e.g. Aven and Bergman (1986)). For $\lambda > 0$, find

where

Then,

$\lambda^* = \sup\{\lambda : V(\lambda) < 0\}$

and $(\sigma^*, \{T_i^*\})$ maximizes the right-hand side of (4.3.7) for $\lambda = \lambda^*$.

The maximization problem in (4.3.6) is further simplified by removing the martingale part from $Y_\lambda(\sigma, \{T_i\})$ defined by (4.3.7) through conditioning (semimartingale decomposition, see e.g. Jensen (1989)), i.e., we can consider the following problem. For $\lambda > 0$, find

where

be obtained from (4.3.8), and the optimal stopping time $\tau(\sigma^*, \{T_i^*\})$ for the problem in (4.3.5) is the stopping time maximizing $E(\tilde{Y}_\lambda(\sigma, \{T_i\}))$ for $\lambda = \lambda^*$.

From the last expression in (4.3.10), it is clear that, although $T_i$ is by definition an $\mathcal{H}_i$-measurable function, it can be further restricted to depend only on $V_i$ without loss of optimality because of the Markovian property of the model. $T_n$ represents the virtual age of the system just prior to the $n$th scheduled preventive replacement time.

Thus we have reduced the original decision problem to a discrete-time stopping problem (for $\sigma$), with an embedded one-dimensional optimization problem at each stage (for $T_i$ at stage $i$). We can now use a dynamic programming approach to solve this problem.

4.4 Dynamic Programming Approach

To find the form of the optimal policy, we only need to consider the following

maximization problem:

where $E_t$ is the expectation for the system starting at virtual age $t$. For $\lambda$ equal to the optimal average cost, $g(\lambda, t)$ represents the residual value of the unit with initial (virtual) age $t$. In particular, $g(\lambda, 0) = V(\lambda) + C_p$.

Conditioning on the information at the first failure epoch, we obtain the

following dynamic equation:

where $g_{2,\lambda}(v, x) = E_C[g_{1,\lambda}(v, x, C)]$, $g_{1,\lambda}(v, x, C) = (g_{1,\lambda}^*(v, x, C) - C)^+$, and $g_{1,\lambda}^*(v, x, C) = E_\theta[g(\lambda, v + \theta(v + x, C)x)]$.

The first equality is obtained from the backward optimality property of the optimal policy, and obviously the inequality becomes an equality iff $\sigma$ is of the following form: replace the unit at the first failure time, i.e., $\sigma = 1$, if $g_{1,\lambda}(t, s - t, C) = 0$, or equivalently $g_{1,\lambda}^*(t, s - t, C) \le C$; otherwise repair it, i.e., $\sigma > 1$. Obviously, this $\sigma$ corresponds to the optimal policy for failure replacement, and $g_{2,\lambda}(v, x)$ and $g_{1,\lambda}^*(v, x, C)$ have the interpretation of residual values discussed in Remark 2.2.

Later we prove that $g(\lambda, t)$ decreases in $t$. From this and from the monotonicity of $C(t)$ and $\theta(t)$, one can show that $g_{2,\lambda}(v, x - v)$ decreases in $x$ for $x > v$. Hence,

$[\lambda - h(s)(C_f - g_{2,\lambda}(t, s - t))]\bar{F}(s)$

decreases in $s$ beyond zero. From (4.4.2), we have that

$g(\lambda, t) = \int_t^\infty [\lambda - h(s)(C_f - g_{2,\lambda}(t, s - t))]^+ \bar{F}(s)\,ds / \bar{F}(t) = \int_t^{T(t)} [\lambda - h(s)(C_f - g_{2,\lambda}(t, s - t))]\bar{F}(s)\,ds / \bar{F}(t)$,

where $T(t) = \inf\{s \ge t : \lambda \le h(s)[C_f - g_{2,\lambda}(t, s - t)]\}$.

Thus, if $g(\lambda, t)$ decreases in $t$, $T(t)$ maximizes the right side of (4.4.2). Obviously, $T(t)$ is the virtual age corresponding to the optimal preventive replacement time.

Lemma 4.1. $T(T) = T$, and $T$ is a lower bound of $T(v)$. $T(v)$ is also bounded from above; denote this upper bound by $T_u < \infty$.

Proof. From the definitions of $T(v)$ and $T$, and from $g_{2,\lambda}(v, x) \ge 0$, we have that $T(v) \ge T$. Since $g_{2,\lambda}(T, 0) = 0$, we have that $T(T) = T$. Hence, $T$ is the lower bound of $T(v)$.

On the other hand, $T_u \le T/\epsilon$, where $\epsilon$ is the lower bound for $\theta(v, c)$. To see this, we need to prove that $T(v) \le T/\epsilon$ for any $v < T$.

We first notice that, from (4.4.2), $g(\lambda, s) = 0$ for $s \ge T$. Therefore, since $\theta \ge \epsilon$, $v + \theta(T/\epsilon, c)(T/\epsilon - v) \ge v + \epsilon(T/\epsilon - v) \ge T$ and

$g_{2,\lambda}(v, T/\epsilon - v) = E_C[g_{1,\lambda}(v, T/\epsilon - v, C)] = E_C\big[\big(E_\theta[g(\lambda, v + \theta(T/\epsilon, C)(T/\epsilon - v))] - C\big)^+\big] = 0,$

and from $h(T/\epsilon)C_f \ge \lambda$ (because $T/\epsilon \ge T$ and $h$ is nondecreasing), we have that $T(v) \le T/\epsilon$. Consequently, $T_u = \sup_v T(v) \le T/\epsilon$. This completes the proof.

Q.E.D.

Now, the key step is to show the monotonicity of $g(\lambda, t)$. In the rest of this section, we first prove several lemmas which lead to the monotonicity of $g(\lambda, t)$.

Define the operator $L$ on $R_p$, the space of continuous, decreasing, non-negative functions with support in $[0, T_u]$, as

and write (4.4.2) using (4.4.4) as

$L$ can be represented as

$L(W)(t) = \int_t^{T_u} \bar{F}(u)\big[\lambda - h(u)\big(C_f - W_2(t, u - t)\big)\big]^+\,du \big/ \bar{F}(t)$, (4.4.5)

where $W_2(t, u - t) = E_C\big[(E_\theta[W(t + \theta(u, C)(u - t))] - C)^+\big]$. We prove that $L$ is a contraction operator. Define

as the norm. Clearly, this function space is a complete metric space under this norm (a closed subset of the Banach space of continuous functions on $[0, T_u]$).

Lemma 4.2. If $W(t)$ decreases in $t$, then $W_2(v, x)$ decreases in both $v$ and $x$. In addition, $W_2(u, t - u)$ decreases in $u$ for $u \le t$.

Proof. First, we prove the monotonicity of $W_2(v, x)$ in both arguments. We have for $v_1 \le v_2$, $x_1 \le x_2$:

$W_2(v_1, x_1) \ge \cdots \ge W_2(v_2, x_2)$.

The first inequality follows from the monotonicity of $W(t)$ and $\theta(t, C)$, and the second one from the monotonicity of $W(t)$ and $C(t)$.

Next, we prove that $W_2(u, t - u)$ is decreasing in $u$ for $u \le t$. We have for $u_1 < u_2 < t$,

Therefore,

$W_2(u_1, t - u_1) \ge W_2(u_2, t - u_2)$.

Q.E.D.

Lemma 4.3. $L$ is a contraction operator with contraction factor $\kappa = F(T_u) < 1$.

Proof. We have for any bounded functions $W^1$ and $W^2$,

$$\begin{aligned}
\|L(W^1) - L(W^2)\| &= \sup_t \Big| \sup_{r \le T_u} \int_t^r [\lambda - h(s)(C_f - W_2^1(t, s - t))]\bar{F}(s)\,ds / \bar{F}(t) \\
&\qquad - \sup_{r \le T_u} \int_t^r [\lambda - h(s)(C_f - W_2^2(t, s - t))]\bar{F}(s)\,ds / \bar{F}(t) \Big| \\
&\le \sup_t \Big\{ \sup_{r \le T_u} \int_t^r |W_2^1(t, s - t) - W_2^2(t, s - t)|\, f(s)\,ds / \bar{F}(t) \Big\} \\
&\le \sup_t \Big\{ \sup_{r \le T_u} \int_t^r E_C\big|E_\theta[W^1(t + \theta(s, C)(s - t))] - E_\theta[W^2(t + \theta(s, C)(s - t))]\big|\, f(s)\,ds / \bar{F}(t) \Big\} \qquad (4.4.8) \\
&\le \sup_t \Big\{ \sup_{r \le T_u} \int_t^r \|W^1 - W^2\|\, f(s)\,ds / \bar{F}(t) \Big\} \\
&\le \sup_t \|W^1 - W^2\|\,(\bar{F}(t) - \bar{F}(T_u))/\bar{F}(t) \\
&\le F(T_u)\,\|W^1 - W^2\|,
\end{aligned}$$

which proves the contraction property. Here the second line uses $h(s)\bar{F}(s) = f(s)$, and the third uses $|(a - C)^+ - (b - C)^+| \le |a - b|$.

It follows from the boundedness of $T(v)$ and the uniform continuity of functions in this function space that $L$ maps a continuous function to a continuous function. Also, $L$ maps a positive function to a positive function with support in $[0, T]$. In the next lemma, we prove that $L$ maps a decreasing function to a decreasing function.

Lemma 4.4. $L$ maps a decreasing function to a decreasing function.

Proof. It follows from Lemma 4.2 that if $W(t)$ decreases in $t$, then $W_2(u, x)$ decreases in $u$ and $x$. From this, we have that for any decreasing function $W$,

In order to show that $L(W)(t) \le L(W)(s)$ for $t > s \ge 0$, it is sufficient to prove that for any $t_1 > t$ there exists $s_1 > s$ such that

Since $F(x)$ is continuous by assumption, we can choose $s_1$ satisfying

$$ \bar F(s_1)/\bar F(s) = \bar F(t_1)/\bar F(t). \qquad (4.4.11) $$

First, we prove that

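To see this, notice that from (4.4.11) we have $\exp\big(-\int_s^{s_1} h(u)\,du\big) = \exp\big(-\int_t^{t_1} h(u)\,du\big)$. Therefore, $\int_t^{t_1} h(u)\,du = \int_s^{s_1} h(u)\,du$ and, consequently, $t_1 - t \le s_1 - s$ and $t_1 \ge s_1$.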

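We have for $u > t$, $\int_s^{u-(t-s)} h(v)\,dv \le \int_t^{u} h(v)\,dv$, since both intervals have length $u - t$ and $h$ is increasing. The last inequality is equivalent to (4.4.12).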

Next, we show that

Indeed, for any $0 \le z \le 1 - \bar F(s_1)/\bar F(s) = 1 - \bar F(t_1)/\bar F(t)$, define $x$ and $y$ by

It is easy to see that $y \ge x$, i.e.,

(4.4.14) is equivalent to

Denote $\tilde W(u) = W_2(t, u - t)$. It follows from Lemma 4.2 that $\tilde W(u)$ is a decreasing function of $u$. We also have from Lemma 4.2 that $W_2(s, u - s) \ge W_2(t, u - t)$. Hence,

$$ \int_s^{s_1} W_2(s, u - s)\,dG_s(u) \ge \int_s^{s_1} \tilde W(u)\,dG_s(u). $$

It is now sufficient to prove that

Then, combining (4.4.19) and (4.4.18), we obtain (4.4.17). By using the variable transformations $z = G_s(u)$ and $z = G_t(u)$ on the two sides of (4.4.19), we get

From (4.4.16) and the monotonicity of $W_2(s, u - s)$ in $u$, we see that (4.4.20) holds, and this completes the proof of (4.4.14) and, consequently, the proof of Lemma 4.4.


4.5 Proof of the Theorem

To prove the Theorem, we first need to verify the monotonicity of $g(\lambda, t)$. Since it is difficult to do this directly, we consider the following truncated problem, which not only leads to the monotonicity of $g(\lambda, t)$ but also provides an algorithm for computing $g(\lambda, t)$.

Lemma 5.1. Assume that the unit can be repaired at most $N - 1$ times for some $N \ge 1$. Then there exists a series of functions $g_n(\lambda, t)$, $n \le N$, such that

$$
\tau^* = \begin{cases}
S_n, & \text{the first } S_n \text{ such that } \ldots \\
\ldots, & \text{if no replacement occurred at or before } \ldots \\
S_N, & \text{if none of the above occurs before } S_N
\end{cases} \qquad (4.5.1)
$$

is the optimal replacement time, where $g_{1,n}(\lambda, u, x, C) \equiv E_\theta\big(g_n(\lambda, u + \theta(u + x, C)\,x)\big)$.

The repair cost limit functions $g_n(\lambda, t)$ are the unique solution of the following equations

where $T = \inf\{t : \lambda \le C_f h(t)\} < \infty$, and $\lambda$ is the optimal average cost for this truncated model. In addition, $g_n(\lambda, t)$ increases in $n$ and decreases in $t$.

Proof. This truncated problem can be formulated as a standard finite-stage dynamic programming problem. In fact,

The monotonicity of $g_n(\lambda, t)$ in $t$ follows from Lemma 4.4. The monotonicity in $n$ is obvious from (4.5.3). The other boundary conditions can be verified easily.

Q.E.D.

We can now provide the proof of the main theorem.


Proof of the Theorem. From (4.4.1) and (4.5.3), we have that $\lim_{n\to\infty} g_n(\lambda, t) = g(\lambda, t)$.

On the other hand, from Lemma 4.3, Lemma 4.4, and the comments in between, $L$ is properly defined on the space $R_p$ and is a contraction operator with contraction factor $\alpha = F(T_u) < 1$. Consequently, $\lim_{n\to\infty} g_n(\lambda, \cdot) = \lim_{n\to\infty} L^n(g_0)(\lambda, \cdot) = \hat g(\lambda, \cdot)$. Applying the fixed point theorem to the contraction mapping $L$, we have that $\hat g(\lambda, \cdot)$ is the unique fixed point of the operator $L$, i.e., $\hat g(\lambda, \cdot) = L(\hat g)(\lambda, \cdot)$.

Therefore, $\hat g(\lambda, \cdot) = g(\lambda, \cdot)$, and consequently, $g(\lambda, \cdot) = L(g)(\lambda, \cdot)$.

Since the optimal value function $g(t)$ of the original problem equals $g(\lambda, t)$ for $\lambda = \lambda^*$, where $\lambda^*$ is the optimal average cost, we obtain that $g = L(g)$. The monotonicity of $g$ follows from the monotonicity and convergence of $g_n(\lambda^*, \cdot)$.

To verify the boundary conditions, notice that $g_n(\lambda, T) = 0$ and, consequently, $g(\lambda, T) = 0$ since $g(\lambda, T) = \lim_{n\to\infty} g_n(\lambda, T)$. In particular, $g(\lambda^*, T) = 0$, i.e., $g(T) = 0$. The condition $g(0) = C_p$ is obtained directly from (4.4.1).

Since $g(t)$ is the optimal value of the maximization problem (4.4.1) obtained by applying policy (4.2.1) with parameter $\lambda = \lambda^*$, policy (4.2.1) is the optimal policy. This completes the proof of the Theorem.

Finally, it is easy to show that $\lim_{N\to\infty} \lambda_N^* = \lambda^*$, where $\lambda_N^*$ is the optimal average cost for the truncated problem.

Q.E.D.

Based on Lemma 5.1 and the proof of the main theorem, we propose the following computational algorithm to compute $g(t)$.

The algorithm.

Step 0. Choose $\varepsilon > 0$. Put $\lambda_L = 0$, $\lambda_U = C_p/ES_1$, $T' = h^{-1}\big(C_p/(ES_1 C_f)\big)$.

Step 1. Put $\lambda = (\lambda_L + \lambda_U)/2$.

Step 2. Put $T = h^{-1}(\lambda/C_f)$, $g_0(\lambda, t) = 0$.

Step 3. Compute $g_n(\lambda, t) = L(g_{n-1}(\lambda, t)) = L^n(g_0(\lambda, t))$. By the contraction property of $L$, $g_n(\lambda, t)$ converges to $g(\lambda, t)$. If $\|g_{n+1}(\lambda, \cdot) - g_n(\lambda, \cdot)\| < \varepsilon$, put $g(\lambda, t) = g_{n+1}(\lambda, t)$ and go to Step 4.

Step 4. If $|\lambda_U - \lambda_L| < \varepsilon$, go to Step 5. Otherwise, compare $g(\lambda, 0)$ and $C_p$.

If $g(\lambda, 0) < C_p$, put $\lambda_L = \lambda$ and go to Step 1.

If $g(\lambda, 0) > C_p$, put $\lambda_U = \lambda$ and go to Step 1.

Step 5. Stop. $g(\lambda, t)$ is the optimal repair cost limit function and $\lambda$ is the corresponding average cost.
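The bisection-plus-value-iteration loop of Steps 0 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's MATLAB program: the operator $L$ is passed in as a caller-supplied function `apply_L(lam, g)` acting on a vector of grid values, the sup-norm is taken over grid points, and `g[0]` is assumed to hold $g(\lambda, 0)$. The names `fixed_point` and `optimal_average_cost`, and the toy linear operator in the usage lines, are hypothetical; the toy is used only because its fixed point $2\lambda b$ is known in closed form.

```python
from typing import Callable

import numpy as np

# apply_L(lam, g) -> L(g)(lam, .) evaluated on a fixed grid (caller-supplied).
Operator = Callable[[float, np.ndarray], np.ndarray]


def fixed_point(apply_L: Operator, lam: float, g0: np.ndarray,
                eps: float, max_iter: int = 10_000) -> np.ndarray:
    """Step 3: iterate g_{n+1} = L(g_n) until the sup-norm change is below eps."""
    g = g0.copy()
    for _ in range(max_iter):
        g_next = apply_L(lam, g)
        if np.max(np.abs(g_next - g)) < eps:
            return g_next
        g = g_next
    raise RuntimeError("value iteration did not converge")


def optimal_average_cost(apply_L: Operator, g0: np.ndarray, c_p: float,
                         lam_lo: float, lam_hi: float, eps: float):
    """Steps 1-5: bisect on lambda until the bracket [lam_lo, lam_hi] is shorter than eps."""
    while True:
        lam = 0.5 * (lam_lo + lam_hi)            # Step 1
        g = fixed_point(apply_L, lam, g0, eps)   # Steps 2-3 (g0 is the zero function)
        if abs(lam_hi - lam_lo) < eps:           # Step 4: bracket small enough
            return lam, g                        # Step 5
        if g[0] < c_p:                           # g(lambda, 0) < C_p: raise lambda
            lam_lo = lam
        else:                                    # g(lambda, 0) >= C_p: lower lambda
            lam_hi = lam


# Toy contraction (hypothetical): L(g) = 0.5 g + lam b has fixed point 2 lam b,
# so the condition g(lambda, 0) = C_p = 2 is met at lambda = 1.
b = np.array([1.0, 0.5, 0.0])
lam_star, g_star = optimal_average_cost(lambda lam, g: 0.5 * g + lam * b,
                                        np.zeros(3), c_p=2.0,
                                        lam_lo=0.0, lam_hi=4.0, eps=1e-8)
```

Comparing `g[0]` with $C_p$ presumes, as the bisection in Step 4 implicitly does, that $g(\lambda, 0)$ increases with $\lambda$.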

4.6 Special Cases

Case 1. $\theta(t, C)$ is a deterministic function of $(t, C)$.

The optimal policy is a combination of a repair-cost-limit policy for failure replacement and an age-control-limit policy for preventive replacement, i.e., there exist a repair cost limit function $g(t)$ and a control-limit function $T(v)$ such that

$$
\tau^* = \begin{cases}
S_n, & \text{the first } n \text{ such that } C_n \ge g(V_n), \\
S_{n-1} + \big(T(V_{n-1}) - V_{n-1}\big), & \text{if no replacement occurred before } S_{n-1},
\end{cases} \qquad (4.6.1)
$$

is the optimal replacement time, where $V_n$ is the virtual age after the $n$-th repair, and $\lambda$ is the optimal average cost. Obviously, in this case, $V_n$ is deterministic for given $(V_{n-1}, S_n)$ and $C_n$.

The repair cost limit $g(t)$ satisfies the equation $g = L(g)$, where $L$ is defined as

(4.6.2)

with $g(0) = C_p$, $g(T) = 0$ for $T = \inf\{t : \lambda \le h(t) C_f\}$.

The age-control-limit function is $T(v) = \inf\big\{t : \lambda \le h(t)\big(C_f - E_C\big[\big(g(v + \theta(t, C)(t - v)) - C(t)\big)^+\big]\big)\big\}$.

Remark 6.1. It is obvious that the above policy is optimal for the following generalization of this special case: the repair degree $\theta(t, C)$ can be a random variable depending on $t$ and $C$, and both the repair cost and the repair degree are observable at each failure time before the repair is carried out.

Case 2. $\theta(t) = \theta(t, C)$ is a random function of $t$, independent of $C$.

The optimal policy is a combination of a repair-cost-limit policy for failure replacement and an age-control-limit policy for preventive replacement, i.e.,

$$
\tau^* = \begin{cases}
S_n, & \text{the first } n \text{ such that } C_n \ge g_1(V_{n-1}, S_n), \\
S_{n-1} + \big(T(V_{n-1}) - V_{n-1}\big), & \text{if no replacement occurred before } S_{n-1},
\end{cases} \qquad (4.6.3)
$$

is the optimal replacement time, where $g_1(u, x) \equiv g_1(u, x, C) = E_\theta\, g(u + \theta(u + x)\,x)$.

Case 3. Deterministic repair cost $C(t)$ and constant repair degree $\theta_0$.

Since this is a special case of Case 1, the optimal policy is further simplified and the optimal replacement time has the following form:

the first $n$ such that $X_n \ge T(V_{n-1}) - V_{n-1}$,

$S_{n-1} + \big(T(V_{n-1}) - V_{n-1}\big)$ if no replacement occurred before   (4.6.4)

$$
g(t) = \int_t^{\infty} \bar F(u)\Big\{\lambda - h(u)\Big(C_f - \big[g\big(t + \theta_0(u - t)\big) - C(u)\big]^+\Big)\Big\}^+ du \Big/ \bar F(t). \qquad (4.6.5)
$$
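As a rough numerical illustration of (4.6.5), the sketch below discretizes the right-hand side on a uniform grid and applies value iteration at a fixed $\lambda$. All inputs are hypothetical choices made for illustration, not values from the text: $h(t) = t$ (so $\bar F(t) = e^{-t^2/2}$), $\theta_0 = 0.5$, a constant repair cost $C(u) = 0.2$, $C_f = 2$, $\lambda = 1.5$, and the infinite upper limit truncated at a horizon $U = 6$, beyond which $\bar F$ is negligible. The trapezoidal rule and linear interpolation (`numpy.interp`) are simplifications chosen for this sketch; the example in Section 4.7 used spline interpolation.

```python
import numpy as np


def make_case3_operator(lam, theta0, c_f, repair_cost, hazard, survival, grid):
    """Return g -> right-hand side of (4.6.5), evaluated at every grid point t."""
    t = grid[:, None]                    # rows: current virtual age t
    u = grid[None, :]                    # columns: failure age u
    du = grid[1] - grid[0]
    s = t + theta0 * (u - t)             # virtual age after a repair at age u
    idx = np.arange(grid.size)
    seg_mask = idx[None, :-1] >= idx[:, None]   # keep trapezoid segments with u >= t
    surv_t = survival(grid)
    surv_u, hazard_u, cost_u = survival(u), hazard(u), repair_cost(u)

    def apply(g):
        g_s = np.interp(s, grid, g, right=0.0)          # g(t + theta0 (u - t))
        w = np.maximum(g_s - cost_u, 0.0)               # [g(...) - C(u)]^+
        y = surv_u * np.maximum(lam - hazard_u * (c_f - w), 0.0)
        seg = 0.5 * (y[:, :-1] + y[:, 1:]) * du         # trapezoid pieces
        return (seg * seg_mask).sum(axis=1) / surv_t    # integral over [t, U] / F_bar(t)

    return apply


grid = np.linspace(0.0, 6.0, 601)                       # hypothetical horizon U = 6
L = make_case3_operator(lam=1.5, theta0=0.5, c_f=2.0,
                        repair_cost=lambda u: np.full_like(u, 0.2),
                        hazard=lambda u: u,
                        survival=lambda u: np.exp(-u**2 / 2),
                        grid=grid)

g = np.zeros_like(grid)                                 # g_0 = 0
for _ in range(500):
    g_next = L(g)
    converged = np.max(np.abs(g_next - g)) < 1e-10
    g = g_next
    if converged:
        break
```

Because the integrand is bounded by $\lambda \bar F(u)$ here, every iterate stays below $\lambda$ times the mean residual life, which keeps $C_f - [g - C]^+$ positive for these particular inputs.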

Case 4. Minimal repair model. For $\theta \equiv 1$, the model is a minimal repair model, and we have from (4.4.3)

where $T = \inf\{t : \lambda \le C_f h(t)\}$ and $G_t(x)$ is the distribution function of $C(t)$.

Differentiating (4.6.6), we get

with boundary conditions $g(\lambda, 0) = C_p$, $g(\lambda, T) = 0$. This result coincides with the result in Makis et al. (2000) in the case when the repair cost does not depend on the number of repairs.

The optimal policy has the following form: preventively replace the unit at age $T$ if no replacement occurred before $T$. If a failure occurs at $t < T$, replace the unit if the repair cost $C(t) > g(t)$, and repair it otherwise.

Figure 4.2: Optimal policy for a minimal repair model.
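Under minimal repair, the failures of a minimally repaired unit form a non-homogeneous Poisson process with intensity $h(t)$, so one replacement cycle under the policy above is easy to simulate. The sketch below assumes $h(t) = t$ (cumulative hazard $t^2/2$, as in the example of Section 4.7) and treats the cost sampler, the limit function `g` and the replacement age `age_limit` (playing the role of $T$) as hypothetical inputs; it is an illustration, not part of the thesis.

```python
import numpy as np


def simulate_minimal_repair_cycle(g, age_limit, sample_cost, rng):
    """Simulate one cycle: minimal repairs until C(t) > g(t) or age_limit is reached.

    Assumes h(t) = t, so failures form an NHPP with cumulative intensity t**2 / 2.
    Returns (cycle length, "preventive" or "failure", number of minimal repairs).
    """
    cum_hazard = 0.0
    n_repairs = 0
    while True:
        cum_hazard += rng.exponential(1.0)   # next epoch of a unit-rate Poisson process
        t = np.sqrt(2.0 * cum_hazard)        # time change: invert t**2 / 2
        if t >= age_limit:
            return age_limit, "preventive", n_repairs
        if sample_cost(rng, t) > g(t):       # repair cost exceeds the limit g(t)
            return t, "failure", n_repairs
        n_repairs += 1                       # minimal repair: age is unchanged


cost = lambda r, t: r.uniform(0.0, 0.5)      # hypothetical repair-cost sampler
print(simulate_minimal_repair_cycle(lambda t: 0.25, 1.5, cost, np.random.default_rng(0)))
```

Averaging cycle costs over many such cycles would give a simulation check of the average cost of a candidate pair $(g, T)$.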

4.7 Example

Consider Special Case 1 with the random repair cost $C$ uniformly distributed on $[0, 0.5]$. Put $\varepsilon = 0.01$, $C_p = C_f = 2$, $h(t) = t$ and

$\theta(t, C) = 1/2$ when $C \in [0, 0.3)$,

We programmed the algorithm in Section 4.5 in MATLAB. We discretized the time interval using $\Delta = 0.02$, applied spline interpolation, and iteratively solved the truncated repair problems for $N \le 7$. It took about 5 minutes on a Pentium II 366 PC to obtain the results. The optimal average cost is $\lambda^* = 2.692$. The values of the optimal repair cost limit function $g(t)$ and


the optimal age-preventive replacement limit function $T(t)$ are listed in Table 4.1, and the graphs of these functions are in Figure 4.3.

Table 4.1: Optimal control limits $g(t)$ and $T(t)$ for different values of $t$.

The optimal policy can be described as follows. The unit starts in state $(u, x) = (0, 0)$. If the unit does not fail in the interval $[0, T(0)]$, it is preventively replaced at time $T(0)$.

If the first failure occurs at time $x_1 < T(0)$ and the repair cost is $C_1$, then a replacement is carried out if $C_1 \ge g(v_1)$, where $v_1 = x_1\theta(x_1, C_1)$. The unit is repaired if $C_1 < g(v_1)$.

Figure 4.3: Optimal repair/replacement policy.

After the first repair, if the operating time $x_2$ of the unit is greater than $T(v_1) - v_1$, the unit is preventively replaced at time $x_1 + T(v_1) - v_1$. If the second failure occurs in the interval $[x_1, x_1 + T(v_1) - v_1)$, the repair cost $C_2$ is estimated, and the unit is repaired if $C_2 < g(v_2)$ and replaced if $C_2 \ge g(v_2)$, where $v_2 = v_1 + \theta(v_1 + x_2, C_2)\,x_2$, etc.
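The bookkeeping in this description (virtual-age update after each repair, preventive replacement $T(v) - v$ time units after the last repair, and the cost comparison at each failure) can be traced with the short sketch below. The functions `theta`, `g` and `T` in the usage lines are hypothetical stand-ins, not the optimal functions of Table 4.1, and the input is a list of (operating time, repair cost) pairs $(x_n, C_n)$; the sketch follows the recursion $v_n = v_{n-1} + \theta(v_{n-1} + x_n, C_n)\,x_n$ used above.

```python
def run_policy(events, theta, g, T):
    """Trace the policy of Section 4.7 over observed (x_n, C_n) pairs.

    events: operating times since the last repair and the repair costs at failure.
    Returns (outcome, calendar time of the outcome, virtual age used in the decision).
    """
    v, clock = 0.0, 0.0                        # virtual age and time of last repair
    for x, cost in events:
        if x >= T(v) - v:                      # no failure before the scheduled time
            return "preventive", clock + T(v) - v, v
        clock += x                             # failure epoch
        v_new = v + theta(v + x, cost) * x     # virtual age if the unit is repaired
        if cost >= g(v_new):                   # cost at or above the limit: replace
            return "failure replacement", clock, v_new
        v = v_new                              # otherwise repair and continue
    return "in operation", clock, v


# Hypothetical inputs, for illustration only.
theta = lambda age, cost: 0.5
g = lambda v: 1.0
T = lambda v: 2.0
print(run_policy([(0.5, 0.2), (0.5, 0.2), (2.0, 0.0)], theta, g, T))
# -> ('preventive', 2.5, 0.5)
```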

For minimal repair, the virtual age is the same as the real age, and from the definition of $T(t)$, $T(t) - t = T - t$ (a straight line with angle $-\pi/4$), i.e., the control-limit function $T(t)$ is a constant, and Figures 4.3 and 4.2 coincide.

Summary of Notation

$C(t)$: repair cost at age $t$

$C_f$: failure loss


f ( t ) : density function of the first failure time



$V(\lambda)$: value function for the $\lambda$ maximization problem (with initial age 0)

$g(\lambda, t)$: value function for the $\lambda$ maximization problem with initial age $t$

$g(u)$: residual value of the repaired unit in state $(u, 0)$

$g_0(u, x)$: residual value of the unit in operating state $(u, x)$

g; ( u . x ! C): residual value of the failed unit before repair. given that the repair

cost is C

g:(u, x, C): residual value of the unit after repair

$g_2(u, x)$: mean residual value of the failed unit in state $(u, x)$ before assessing the repair cost

$G_v(c)$: distribution function of repair cost at virtual age $v$

$L$: operator that defines the value iteration

$T(v)$: scheduled preventive replacement time at virtual age $v$

$TC(t)$: total cost incurred up to age $t$

$T$: $\inf\{t : g(t) = 0\}$

: virtual age after the n - th failure

k;: the objective function for the X maximization problem

$\theta$: repair degree

$\lambda^*$: optimal average cost

r(ob , {T; 1): optimal stopping time

Chapter 5

OPTIMALITY OF

LEVEL-CROSSING POLICY

FOR A CBM MODEL

5.1 Introduction

We consider a maintenance model with partial information about the state of the system, obtained through monitoring a signal process at equally spaced inspection times. An example of a signal process is the overall vibration level of a machine, which is considered to be a good indicator of the machine condition (e.g. Mitchell (1981)).

The evolution of the signal process is determined by random factors and minor maintenance actions between inspections. We assume that a major failure, which requires an overhaul or replacement of the unit, occurs when the signal process first exceeds a critical level. To model the situation where the signal process carries only partial information about the machine state, we assume that the critical level is a random variable independent of the signal process. In practical situations, a failure may occur even when the signal level temporarily decreases, which is expressed in this model by the assumption that the critical levels might be different at different inspection times. The profit in the $i$th period (between the $i$th and the $(i+1)$th inspection) is a random function of the signal level at the $i$th inspection, and it includes the cost of minor maintenance in this period. The preventive and failure replacement costs are given constants. The objective is to find the replacement policy that maximizes the total expected profit during the machine lifetime.

A distinguishing feature of this model is that we do not assume monotonicity of the signal process. We consider two kinds of maintenance actions and random critical levels that define major failures. The effect of minor maintenance actions between two subsequent inspections on the signal process is a decrease or increase of the signal level at the next inspection by a random amount that represents an improvement or worsening of the machine condition, respectively. This assumption is similar to the assumption of a general repair, which was introduced by Kijima and Sumita (1986) and has been discussed in detail in Chapter 4. Studies of these kinds of systems have also been conducted by Stadje and Zuckerman (1991) and Makis and Jardine (1993). The model also bears certain similarities with the shock models found in the literature (see e.g. Taylor (1973), Zuckerman (1978) and Stadje (1994)).

Examples of other condition-based maintenance models include a state space model considered by Christer et al. (1997) for furnace erosion prediction and replacement, a counting process model by Aven (1996), and a proportional hazards decision model developed by Makis and Jardine (1992), where the hazard function of the system depends on its operating age and on stochastic covariates that can represent the information obtained through condition monitoring over time, such as spectrometric analysis of engine oil.

The chapter is organized as follows. In Section 5.2, we describe the model, formulate the maintenance decision problem and prove the existence of the optimal policy. In Section 5.3, we show that under weak monotonicity assumptions the optimal policy is of a control-limit type, i.e., replace the unit if and only if the signal level exceeds a certain critical limit. An algorithm for the computation of the optimal control limit is developed, and a numerical example is given to illustrate the computational procedure.

5.2 Problem Formulation and Existence of the

Optimal Policy

We assume that the signal process is observable at equidistant points of time $i\Delta$, $i = 0, 1, \ldots$, and that the signal level at time $i\Delta$ is determined by

$$ X_{i+1} = X_i\,\xi_i, \quad i = 0, 1, \ldots, \qquad (5.2.1) $$

where $X_0 = 1$ is the normalized initial level and $\{\xi_i\}$ is an i.i.d. sequence of random variables, independent of $X_i$. $\xi_i$ represents the effect of random factors and possible minor maintenance actions in the $i$th period (between the $i$th and the $(i+1)$th inspection) on the signal level at time $(i+1)\Delta$, $0 < \xi_i \le D < +\infty$.

$\xi_i$ can have the following interpretation. We can write $\xi_i = d_i m_i$, where $d_i > 0$ represents the effect of random factors, $m_i = 1$ if there was no maintenance in the $i$th period, and $m_i = \theta$ if there was a maintenance, where $\theta > 0$ is a random variable. $\theta < 1$ ($\theta > 1$) can be interpreted as an improvement (or worsening) of the machine condition by the repair, and $\theta = 1$ represents minimal repair, which has no effect on the signal level. We assume that $E(\ln \xi_i) > 0$, i.e., $E(\ln X_{i+1}) > E(\ln X_i)$. The major system failure time $T$ is defined as the first time the signal process exceeds a critical level, i.e.,

$$ T = \inf\{n \ge 1 : X_n > \mathcal H_n\}, \qquad (5.2.2) $$

where $\mathcal H_n$ is the critical level at time $n\Delta$. Since the signal process $\{X_n\}$ carries only partial information about the system and the system can fail even when the signal level decreases, it is reasonable to assume that the critical levels $\mathcal H_n$ are random variables. We further assume that $\{\mathcal H_n\}$ is an i.i.d. sequence with distribution function $F(\cdot)$, and that $1 < \mathcal H_n \le A < +\infty$.
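A direct simulation of (5.2.1) and (5.2.2) may help fix ideas. The sketch below draws one path of the signal and returns the failure time $T$ together with the signal level at failure. The samplers in the usage lines borrow the distributions of the numerical example in Section 5.3 ($d \sim U(1,2)$, $m \in \{1, 0.5\}$, $\mathcal H \sim U(2,4)$) and add an assumption not stated in this part of the text: maintenance occurs with probability 1/2 in each period. The function names are hypothetical.

```python
import numpy as np


def simulate_failure_time(sample_xi, sample_h, rng, x0=1.0, max_steps=100_000):
    """Return (T, X_T): first n >= 1 with X_n > H_n, where X_n = X_{n-1} * xi_{n-1}."""
    x = x0
    for n in range(1, max_steps + 1):
        x *= sample_xi(rng)          # signal at the n-th inspection
        if x > sample_h(rng):        # independent critical level H_n
            return n, x
    raise RuntimeError("no failure within max_steps inspections")


def example_xi(rng):
    d = rng.uniform(1.0, 2.0)                    # random deterioration factor d
    m = 0.5 if rng.random() < 0.5 else 1.0       # assumed P(maintenance) = 1/2
    return d * m


def example_h(rng):
    return rng.uniform(2.0, 4.0)                 # critical level H ~ U(2, 4)


rng = np.random.default_rng(1)
samples = [simulate_failure_time(example_xi, example_h, rng) for _ in range(2000)]
mean_t = np.mean([n for n, _ in samples])        # Monte Carlo estimate of ET
```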

The profit in period $i$ (after subtracting the possible minor maintenance cost in that period) is $S(X_i)$, a random function of the signal level $X_i$, $-B \le S(X_i) \le B < +\infty$; $C_p > 0$ is the preventive replacement cost and $C_p + C_f$ is the failure replacement cost, $C_f > 0$. Both $C_p$ and $C_f$ are assumed to be constants.

The objective is to find the replacement policy that maximizes the total expected profit during the system lifetime.

Define for $n \ge 1$,

The problem can be formulated as follows. Find

$$ \sup_{\tau} E Y_\tau \qquad (5.2.4) $$

in the class of stopping times relative to the process history $\{\mathcal F_n\}$, where $\mathcal F_n = \sigma\{(X_i, I_{\{T > i\}}), i \le n\}$, and a stopping time $\tau^*$ (if it exists) for which the supremum in (5.2.4) is attained.

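Lemma 2.1. $ET < +\infty$.

Proof. From the definition of $T$ in (5.2.2), and because $\ln X_n = \sum_{i=0}^{n-1} \ln \xi_i$ and $\mathcal H_n \le A$,

$$
\begin{aligned}
T &= \inf\{n \ge 1 : X_n > \mathcal H_n\} \\
&= \inf\{n \ge 1 : \ln X_n > \ln \mathcal H_n\} \\
&\le T' \equiv \inf\Big\{n \ge 1 : \sum_{i=0}^{n-1} \ln \xi_i > \ln A\Big\}.
\end{aligned}
$$

Since $\{\xi_i\}$ is an i.i.d. sequence with $E(\ln \xi_i) > 0$, it follows from Theorem 2.4 in Chow et al. (1971), p. 29, that $ET' < +\infty$ and hence $ET < +\infty$. This completes the proof.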

Q.E.D.
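The finiteness of $ET'$ in Lemma 2.1 can be turned into an explicit bound with Wald's identity. This is a supplementary derivation, not part of the original proof; it uses only $X_0 = 1$, $\xi_i \le D$ and $1 < \mathcal H_n \le A$ from the model description, and assumes in addition that $E|\ln \xi_i| < \infty$. Write $S_n = \sum_{i=0}^{n-1} \ln \xi_i$, so that $\ln X_n = S_n$. By the definition of $T'$, $S_{T'-1} \le \ln A$ (for $T' = 1$ this is $S_0 = 0 < \ln A$), and each increment is at most $\ln D$, so $S_{T'} \le \ln A + \ln D$. Since $ET' < \infty$, Wald's identity $E S_{T'} = ET' \cdot E(\ln \xi_1)$ applies and gives

$$
ET \;\le\; ET' \;=\; \frac{E S_{T'}}{E(\ln \xi_1)} \;\le\; \frac{\ln A + \ln D}{E(\ln \xi_1)}.
$$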

Remark 2.2. If we denote

then it follows from Lemma 2.1 that $ET(x) < +\infty$ for any $x > 0$.

In the next lemma, we prove the existence of the optimal stopping time for the sequence $\{Y_n\}$.

Lemma 2.3. The optimal stopping time $\tau^*$ maximizing $EY_\tau$ exists.

Proof. It follows from Theorem 4.5' in Chow et al. (1971), p. 82, that if $E(\sup_n Y_n^+) < +\infty$, the optimal stopping time exists. We have from (5.2.3):


Q.E.D.

The following lemma will be useful for finding the form of the optimal stopping time for our problem.

Lemma 2.4. [Chow et al. (1971), Remark, p. 105] Define

where $\{Z_k\}$ is a homogeneous Markov chain, and $w_n(\cdot)$ and $y_n(\cdot)$ are deterministic functions. Then the optimal stopping time for this sequence has the following form:

and $Z_0 = z$.

The optimal stopping time for our problem is determined by the following theorem.

Theorem 2.5. Denote $Z_n = (X_n, I_{\{T > n\}})$. The optimal stopping time $\tau^*$ for $\{Y_n\}$ is given by

where

if $z = (x, 0)$,

$$
C(Z_k) = \big[E(S(X_k) \mid X_k) - C_f\,P(X_k \xi_k > \mathcal H_{k+1} \mid X_k)\big]\, I_{\{T > k\}} = C(X_k)\, I_{\{T > k\}}, \qquad (5.2.9)
$$

and $Z_0 = z$. If $z = (x, 0)$, the system is in a failure state and the current signal level is $x$.

Proof. We have from (5.2.3),

(5.2.10)

It follows from the definition of $Z_n$ in Theorem 2.5 and from (5.2.1) that $\{Z_n\}$ forms a homogeneous Markov chain. Therefore, the optimal stopping time $\tau^*$ can be found by applying Lemma 2.4, and (5.2.7) is obtained easily from (5.2.5) and (5.2.10). This completes the proof.

Q.E.D.

Remark 2.6. The form of the optimal stopping time in (5.2.7) has been obtained under weak assumptions regarding the signal process. We have assumed that $E(\ln \xi_i) > 0$, which is equivalent to the assumption that $E(\ln X_n)$ is increasing in $n$, but the signal process need not be monotone. This is important for practical applications. For example, the overall vibration level typically does not exhibit monotone behavior, but the process tends to increase on average, which can be expressed by the monotonicity assumption regarding the mean value.

The optimal stopping time in (5.2.7) can be found in the general case by computing the function $C(z)$ in (5.2.8), but the optimal replacement policy is not necessarily of a control-limit type.

In the next section, we show that if the expected profit in a period is a monotone function of the signal level, a control-limit policy is optimal, and we provide an algorithm for finding the optimal control limit.

5.3 Optimal Control-Limit Policy

In this section, we assume that $E(S(X) \mid X = x)$ is a non-increasing function of $x$, that $E(S(X) \mid X = x)$ and $F(x) \equiv P(\mathcal H_n \le x)$ are continuous, and that $C(1) > 0$.

Denote

where $C(X_k)$ is defined by (5.2.9). Then it follows from (5.2.7) and (5.2.5) that the optimal stopping time $\tau^*$ has the form:

Obviously, $V(X_0) - C_p$ is the optimal expected profit for our stopping problem. In the next lemma, we show how to obtain $V(x)$.

Lemma 3.1. Define for $n \ge 1$,

$$ V_n(x) = \sup_{\tau:\, \tau \le n} E(Y_\tau \mid X_0 = x) + C_p. \qquad (5.3.3) $$

Then $\{V_n(x)\}$ satisfies the following system of equations:

where $C(x)$ is defined by (5.2.9) and $G(x) = P(\xi \le x)$. Furthermore, $V_n(x)$ converges to $V(x)$ uniformly on $[a, A)$ for any $a > 0$.

Proof. The equations in (5.3.4) are easily obtained using the definition of $V_n(x)$ in (5.3.3) and dynamic programming. Let $a$ be a positive number. Obviously, for each $x > a$, $\{V_n(x)\}$ is a non-decreasing sequence, so that $\lim_{n\to\infty} V_n(x)$ exists. Denote by $\tau_n^*$ the optimal stopping time for which the supremum in (5.3.3) is attained. Since $V_n(x) \le V(x)$ for all $x$, we have

$$ \cdots \le B \lim_{n\to\infty} E\big((T(a) - n)\, I_{\{T(a) > n\}}\big) = 0, $$

so that $V_n(x) \to V(x)$ and the convergence is uniform. The last equation follows from Remark 2.2. This completes the proof.

Q.E.D.

In the next theorem, we prove the existence of an $\varepsilon$-optimal control-limit policy for any $\varepsilon > 0$.

Theorem 3.2. For any $\varepsilon > 0$, there exists $X_\varepsilon^* \le +\infty$ such that the following is an $\varepsilon$-optimal policy: replace the unit on failure or at the first time the signal level exceeds $X_\varepsilon^*$, whichever occurs first.

Proof. It follows from the continuity and monotonicity of $E(S(X) \mid X = x)$ and $F(x)$ that $V_n(x)$ is continuous and non-increasing in $x$ for $n \ge 1$. Choose any $\varepsilon > 0$ and define

If $X_n^* = +\infty$ for some $n$, i.e., $V_n(x) > 0$ for all $x$, then $V(x) > 0$ for $x > 0$, and the optimal policy is the policy that replaces the unit only on failure. In this case, we can define $X_\varepsilon^* = +\infty$.

Next, we prove that if $X_n^* < +\infty$ for all $n$, there exists $n_0 \ge 1$ such that $0 \le V(X_{n_0}^*) \le \varepsilon$.

We have for $n \ge 1$,

$$
\begin{aligned}
0 &\le V(X_n^*) - V_n(X_n^*) \\
&\le B\,E\big((T(X_n^*) - n)\, I_{\{T(X_n^*) > n\}}\big) \\
&\le B\,E\big((T(X_1^*) - n)\, I_{\{T(X_1^*) > n\}}\big) \xrightarrow[n\to\infty]{} 0. \qquad (5.3.6)
\end{aligned}
$$

The last two inequalities follow from the fact that $\{X_n^*\}$ is non-decreasing and from Remark 2.2.

From (5.3.6), we can see that there is $n_0 \ge 1$ such that $0 \le V(X_n^*) \le \varepsilon$ for all $n \ge n_0$.

Define $X_\varepsilon^* = X_{n_0}^*$ and the stopping time $\tau_\varepsilon$:

We show that $\tau_\varepsilon$ defines an $\varepsilon$-optimal control-limit policy. It follows from the assumptions and from Lemma 3.1 that $\{V_n(x)\}$ are continuous and


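non-increasing in $x$, non-decreasing in $n$, and $V_n(x) \to V(x)$ uniformly, so that $V(x)$ is continuous and non-increasing in $x$. Since $0 \le V(X_\varepsilon^*) \le \varepsilon$, we have from (5.3.2) and (5.3.7) that $\tau_\varepsilon \le \tau^*$ and

$$
\begin{aligned}
EY_{\tau^*} &= \sup_\tau EY_\tau \\
&= E\Big(\sum_{i=0}^{\tau_\varepsilon - 1} C(X_i)\, I_{\{T > i\}} + \sum_{i=\tau_\varepsilon}^{\tau^* - 1} C(X_i)\, I_{\{T > i\}}\Big) - C_p \\
&\le EY_{\tau_\varepsilon} + \varepsilon,
\end{aligned}
$$

so that $EY_{\tau_\varepsilon} \ge \sup_\tau EY_\tau - \varepsilon$, and hence $\tau_\varepsilon$ defines an $\varepsilon$-optimal policy.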

Q.E.D.

Remark 3.3. Since $V(x)$ is continuous and non-increasing, the optimal stopping time $\tau^*$ defined by (5.3.2) has the following form:

where $X^\dagger = \inf\{x : V(x) = 0\}$, i.e., a control-limit policy is optimal.

Denote $X^* = \lim_{n\to\infty} X_n^*$. From (5.3.6), $V(X_n^*) \ge 0$ and $\lim_{n\to\infty} V(X_n^*) = 0$. From this, and from the continuity of $V(x)$, we have that $V(X^*) = 0$, so that $X^* \ge X^\dagger$. Assume that $X^* > X^\dagger$. Then there is $m_0$ such that $X_{m_0}^* > X^\dagger$, and we have from the monotonicity of $V_{m_0}(x)$ that

which contradicts (5.3.5). Hence, the optimal control limit is $X^\dagger = X^*$.


Lemma 3.1 and Theorem 3.2 provide a computational procedure for the control limit $X_\varepsilon^*$ for any $\varepsilon > 0$, which can be summarized as follows.

Step 1. Choose $\varepsilon > 0$, set $n = 1$, calculate $C(x)$ and find $X_1^* = \inf\{x > 0 : C(x) = 0\}$.

Step 2. Calculate (or estimate) $\varepsilon_n = B\,E\big((T(X_n^*) - n)\, I_{\{T(X_n^*) > n\}}\big)$. If $\varepsilon_n \le \varepsilon$, set $X_\varepsilon^* = X_n^*$ and stop.

Step 3. If $\varepsilon_n > \varepsilon$, set $n = n + 1$, calculate

$$ V_n(x) = C(x) + \int V_{n-1}^+(xa)\,\bar F(xa)\,dG(a) $$

and

$$ X_n^* = \inf\{x : V_n(x) = 0\}. $$

Go to Step 2.

We now illustrate the computational procedure with the following numerical example.

Example. Assume that the initial signal level is $X_0 = 1$ and $X_i = X_{i-1} d_{i-1} m_{i-1}$ for $i \ge 1$, where $\{d_i\}$ is an i.i.d. sequence, $d_i \sim U(1, 2)$, and $\{m_i\}$ is an i.i.d. sequence independent of $\{d_i\}$; $m_i$ describes a minor maintenance action in period $i$: $m_i = 1$ if no maintenance was performed and $m_i = 0.5$ if there was a maintenance, i.e., maintenance improves the machine condition. We assume that

and the random critical level $\mathcal H_i \sim U(2, 4)$, where $U(a, b)$ denotes the uniform distribution on $[a, b]$.

The expected profit in a period, given that the signal level $X = x$, is $E(S(X) \mid X = x) = \exp(-x/2)$; the failure penalty cost is $C_f = 2$ and $B = 1$.

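We have from (5.2.9),

Obviously, for $0 < x < 1$, $P(\mathcal H < xdm) = 0$. Assume that $1 \le x \le 2$. By conditioning on $m$ and then on $d$, we get

$$
\begin{aligned}
P(\mathcal H < xdm) &= \tfrac12 P(\mathcal H < xd) + \tfrac12 P(\mathcal H < xd/2) \\
&= \tfrac12 \int_1^2 P(\mathcal H < xz)\,dz + \tfrac12 \int_1^2 P(\mathcal H < xz/2)\,dz \qquad (5.3.10) \\
&= \tfrac14 \int_{2/x}^{2} (xz - 2)\,dz = \tfrac12 x + \tfrac{1}{2x} - 1.
\end{aligned}
$$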
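The closed form in (5.3.10) can be checked by Monte Carlo. The sketch assumes $P(m = 1) = P(m = 0.5) = 1/2$, the weights used when conditioning on $m$ in (5.3.10); the function names are hypothetical.

```python
import numpy as np


def prob_failure_next(x):
    """Closed form (5.3.10) for P(H < x d m), valid for 1 <= x <= 2."""
    if not 1.0 <= x <= 2.0:
        raise ValueError("the closed form is derived only for 1 <= x <= 2")
    return x / 2 + 1 / (2 * x) - 1


def prob_failure_next_mc(x, n=200_000, seed=0):
    """Monte Carlo estimate of P(H < x d m) with d ~ U(1,2), H ~ U(2,4)."""
    rng = np.random.default_rng(seed)
    d = rng.uniform(1.0, 2.0, n)
    m = np.where(rng.random(n) < 0.5, 0.5, 1.0)   # assumed P(m = 0.5) = 1/2
    h = rng.uniform(2.0, 4.0, n)
    return float(np.mean(h < x * d * m))
```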

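Finally, we have the following formula for $C(x)$:

$$
C(x) = \begin{cases}
e^{-x/2}, & x \in (0, 1), \\
e^{-x/2} - 2\big(\tfrac12 x + \tfrac{1}{2x} - 1\big), & x \in [1, 2], \\
\ldots
\end{cases}
$$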
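Step 1 of the procedure can be carried out for this example in a few lines. The sketch below covers only $0 < x \le 2$, uses $C(x) = e^{-x/2} - C_f\,P(\mathcal H < xdm)$ with $C_f = 2$ and the closed form (5.3.10) (hence the same assumption $P(m = 1) = P(m = 0.5) = 1/2$), and finds $X_1^* = \inf\{x > 0 : C(x) = 0\}$ by bisection. Bisection on $[1, 2]$ is valid here because $C > 0$ on $(0, 1)$, $C(1) > 0 > C(2)$, and $C'(x) = -\tfrac12 e^{-x/2} - 1 + 1/x^2 < 0$ on $[1, 2]$. The function names are hypothetical.

```python
import math

C_F = 2.0  # failure penalty C_f from the example


def expected_net_profit(x):
    """C(x) = E(S(X)|X=x) - C_f P(H < x d m | X = x), for 0 < x <= 2."""
    if 0 < x < 1:
        p_fail = 0.0
    elif 1 <= x <= 2:
        p_fail = x / 2 + 1 / (2 * x) - 1      # closed form (5.3.10)
    else:
        raise ValueError("only 0 < x <= 2 is covered by this sketch")
    return math.exp(-x / 2) - C_F * p_fail


def first_control_limit(lo=1.0, hi=2.0, tol=1e-10):
    """Step 1: X_1* = inf{x > 0 : C(x) = 0}, by bisection on [lo, hi]."""
    assert expected_net_profit(lo) > 0 > expected_net_profit(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_net_profit(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi


x1 = first_control_limit()
```

The root lies between 1.85 and 1.86, below the stabilized limit 2.10 reported for this example, which is consistent with $\{X_n^*\}$ being non-decreasing.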

To apply the computational procedure, we first find the expected value and the variance of $\ln \xi$, where $\xi = md$. We have
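Under the same assumption $P(m = 1) = P(m = 0.5) = 1/2$, both moments have closed forms: $E\ln d = \int_1^2 \ln z\,dz = 2\ln 2 - 1$, $\operatorname{Var}\ln d = 1 - 2(\ln 2)^2$, $E\ln m = -\tfrac12\ln 2$ and $\operatorname{Var}\ln m = \tfrac14(\ln 2)^2$, so independence gives $E\ln\xi = \tfrac32\ln 2 - 1 \approx 0.0397 > 0$ and $\operatorname{Var}\ln\xi = 1 - \tfrac74(\ln 2)^2 \approx 0.159$. The sketch below evaluates these expressions and cross-checks them by simulation; it is an illustration, not the computation reported in the thesis.

```python
import math

import numpy as np

LN2 = math.log(2.0)


def log_xi_moments():
    """Closed-form E ln(xi) and Var ln(xi) for xi = m d (assumed P(m = 0.5) = 1/2)."""
    mean = (2 * LN2 - 1) - 0.5 * LN2                # E ln d + E ln m
    var = (1 - 2 * LN2 ** 2) + 0.25 * LN2 ** 2      # Var ln d + Var ln m (independence)
    return mean, var


def log_xi_moments_mc(n=1_000_000, seed=0):
    """Monte Carlo cross-check of log_xi_moments()."""
    rng = np.random.default_rng(seed)
    d = rng.uniform(1.0, 2.0, n)
    m = np.where(rng.random(n) < 0.5, 0.5, 1.0)
    log_xi = np.log(m * d)
    return float(log_xi.mean()), float(log_xi.var())


mean, var = log_xi_moments()
```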

We have considered $\varepsilon = 0.01$ and computed the optimal control limits $\{X_n^*\}$ for $n = 1, \ldots, 500$, which takes about 3 minutes on a 586 PC. After 40 steps, the optimal control limit stabilized at the value 2.10. Then, using (5.3.9), we found an upper bound for $\varepsilon_{500}$, which was 0.00066. Taking into account other possible computational errors in each computation step, we are quite sure that the policy with critical level 2.10 is at least a 0.01-optimal policy. The computational results are in Table 5.1.

Table 5.1: Optimal control limits for different values of n.

5.4 Conclusions

In this chapter, we have proposed a condition-based maintenance model for sit-

uations where the observed process carries only partial information about the

system and does not necessarily exhibit monotone behavior. The optimization

problem has been formulated as an optimal stopping problem and the struc-

ture of the optimal replacement policy has been found in the general case. We

have shown that under weak monotonicity assumptions, the optimal policy is of a control-limit type, and a computational procedure has been developed for finding the optimal control limit. Numerical results indicate fast convergence, and the policy is easily implementable. The model is suitable in situations where the production unit is frequently monitored and the information is used for planning major maintenance activities such as an overhaul or replacement of the unit.

Summary of Notation

$a^+$: $\max(a, 0)$

$C_f$: failure loss

$C(X_n)$: expected net profit in period $n$ with signal level $X_n$


$D$: upper bound of $\xi$

$F$: distribution function of $\mathcal H$

$G$: distribution function of $\xi$

$\mathcal H_n$: random critical level

$S(X_n)$: profit in period $n$

$T$: failure time

$V(x)$: optimal expected net profit with signal level $x$

$V_n(x)$: $n$-th truncated optimal value function

$X_i$: signal level at time $i\Delta$

$X^*$: optimal preventive replacement level

$Y_n$: net profit up to period $n$

$\tau^*$: optimal stopping time

$\xi$: random deterioration factor

Chapter 6

A CBM FRAMEWORK

BASED ON HIDDEN

MARKOV MODELS

6.1 Introduction

In recent years, Condition-Based Maintenance (CBM) has gradually been gaining popularity in the reliability/maintenance area from both the practitioner's and the researcher's perspective (for more details, see two recent survey papers on maintenance, Scarf (1997) and Dekker and Scarf (1998)).

In industrial practice, the engineering aspect of CBM has been undergoing rapid development for decades. Many maintenance information systems have been developed and are commercially available, with emphasis on condition monitoring, fault detection, diagnosis and automation. Typical condition monitoring techniques include vibration monitoring and oil analysis. With these maintenance information systems in place, large amounts of data have become available, which provides great potential for improving maintenance performance.

Yet, at present, most of these systems serve merely as maintenance databases, used only for producing simple statistics for management reporting. Maintenance decisions are in general still based on field experts' experience, which is normally not quantitatively justified. For those systems that incorporate maintenance optimization features, the policies in most cases remain of the age-based or level-crossing type, which are too rudimentary given the rich information available.

In the research community, CBM is not a new concept. Relevant work can be found under the titles of CBM, information-based maintenance, predictive/proactive maintenance, etc. Yet it seems to me that theoretical development in the management aspect of maintenance, especially in maintenance optimization, has lagged behind. As a result, a gap exists between the engineering aspect and the management aspect of maintenance. Fortunately, the common awareness of the CBM concept now serves as an umbrella that enables researchers to share and combine their strengths in this promising area.

It is beneficial to view general maintenance optimization problems, which are the main theme of this thesis, from the CBM perspective. The essence of CBM is simply utilizing available information to support optimal maintenance decision making. In this sense, any maintenance system has to be condition-based.

We now summarize this chapter as follows.

In Section 6.2, we provide a literature survey, focusing on various mathematical methodologies and models that are related to CBM optimization.

In Section 6.3, we propose a CBM model which matches the abstract CBM

framework closely. This model is then transformed to a simpler form in Section

6.4, and finally solved in Section 6.5.

6.2 Literature Survey

The French school of the "general theory of stochastic processes" provided the theoretical foundation for CBM mathematical modeling and optimization. For good survey papers, see Arjas (1989) and Jensen (1996). While major efforts in this category are on statistical analysis and filtering of stochastic processes, there is also a considerable amount of research on maintenance optimization, which is quite often based on optimal stopping theory.

Based on the concept of filtration (as well as subfiltration), different levels of information can be considered. Transformation from high- to low-level information can be carried out by applying the "projection theorem", which corresponds to estimating the system condition based on partial information. Heinrich and Jensen (1992) considered in detail an optimal replacement problem for a two-unit non-repairable system with different information levels; Jiang and Cheng (1995) applied this approach to single-unit repairable systems, and conducted policy optimization and policy comparison for a dozen well-known policies based on the information level each policy can utilize.

The concept of stopping time is defined based on the filtration. The optimal stopping time implies that the full information in the filtration has been utilized. Thus this approach has an advantage over methods in which policies are limited to presumed forms, such as the well-known age replacement, block replacement, and so on. Interestingly, many of those well-known policies derived from intuition are indeed optimal stopping rules with respect to a properly selected filtration. Standard references on optimal stopping theory are Chow et al. (1971) and Shiryayev (1978). For its application to reliability/maintenance, see e.g. Bergman (1978) and Jensen (1989).

Optimal stopping rules often take the control-limit form, which is intuitive, easy to calculate and easy to implement in practice. In particular, when the system has a certain monotonicity property, the control limits are easy to obtain and have the following intuitive meaning: when the loss (rate) is larger than the gain (rate), stop the system; otherwise, repair (or continue) it. This is why almost all computational models using optimal stopping assume some kind of monotonicity property (see e.g. Aven (1983), Aven and Bergman (1986)).
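As an illustration of the control-limit form in the simplest monotone case, suppose the net gain rate (gain rate minus loss rate) is deterministic and non-increasing in time. Then the rule "stop as soon as the loss rate exceeds the gain rate" amounts to finding the first zero of that net rate. The sketch below assumes such a function; the name `control_limit` and the example rates are hypothetical, and in a stochastic model the net rate would be replaced by its conditional expectation given the available information.

```python
from typing import Callable


def control_limit(net_rate: Callable[[float], float], t_max: float,
                  tol: float = 1e-9) -> float:
    """First time in [0, t_max] at which a non-increasing net gain rate
    (gain rate minus loss rate) drops to zero or below.

    In this monotone case the first crossing maximizes the accumulated net
    gain, i.e. it is the optimal stopping time of the deterministic problem.
    """
    if net_rate(0.0) <= 0.0:
        return 0.0              # loss already exceeds gain: stop at once
    if net_rate(t_max) > 0.0:
        return t_max            # gain exceeds loss on the whole horizon
    lo, hi = 0.0, t_max
    while hi - lo > tol:        # bisection on the sign change
        mid = 0.5 * (lo + hi)
        if net_rate(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical numbers: gain rate 10, loss rate = failure cost 5 x hazard 2t.
print(round(control_limit(lambda t: 10.0 - 5.0 * 2.0 * t, t_max=5.0), 6))  # 1.0
```

Bisection is used only because it needs nothing beyond monotonicity; without monotonicity the first zero crossing need not be optimal.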

One interesting paper that incorporates both Markov-modulated process state estimation and optimal stopping is by Jensen and Hsu (1993). In that paper, a weak form of the monotonicity assumption is used, i.e., the monotonicity assumption is replaced by a submartingale property. A stopping rule with weak-sense optimality is derived, i.e., an optimal stopping time with respect to a subfiltration, which means that less information is utilized. This paper provides a good example of constructing a suboptimal policy when the optimal one is not easy to generate. It also provides a good example of combining state estimation and replacement optimization.

The level-crossing approach is well accepted by the industrial community. Even though it may not always provide the optimal solution, it is well understood and easy to implement.

The level-crossing approach refers to the following scenario: a set of variables, selected using technical considerations, is observed, and a repair or replacement is initiated whenever any of them exceeds a preset control limit. Normally, these monitored variables are measurements that reflect the degree of wear or damage of the system. Commonly used techniques include vibration monitoring and oil analysis, among others. Properly selected variables provide an informative indication of the system's condition.
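A minimal sketch of the level-crossing rule just described: each inspection yields readings of the monitored variables, and repair or replacement is triggered at the first inspection where any reading exceeds its preset limit. The variable names (`vibration_rms`, `iron_ppm`) and the limit values are illustrative assumptions, not recommended settings.

```python
from typing import Mapping, Optional, Sequence


def first_crossing(readings: Sequence[Mapping[str, float]],
                   limits: Mapping[str, float]) -> Optional[int]:
    """Index of the first inspection at which any monitored variable exceeds
    its control limit, or None if no limit is ever exceeded."""
    for k, reading in enumerate(readings):
        # Trigger repair/replacement as soon as ANY variable crosses its limit.
        if any(reading[name] > limit for name, limit in limits.items()):
            return k
    return None


# Hypothetical limits for a vibration level and a wear-metal concentration.
limits = {"vibration_rms": 4.5, "iron_ppm": 100.0}
readings = [
    {"vibration_rms": 2.1, "iron_ppm": 20.0},
    {"vibration_rms": 3.0, "iron_ppm": 60.0},
    {"vibration_rms": 3.2, "iron_ppm": 120.0},  # oil analysis crosses first
]
print(first_crossing(readings, limits))  # 2
```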

A considerable number of level-crossing related models exist, such as (observable) diffusion processes with drift, and discrete-time independent-increment models. A random failure limit can be incorporated to emphasize that the observed signals are only partial information (Jiang et al. (1998)). See also Christer and Wang (1997), where the determination of the optimal inspection is the major concern.

Accelerated aging models can also be thought of as special cases of the level-crossing model, in which a transformation (random or deterministic) from age to accumulated stress is used to describe the effects of a random environment; see e.g. Doksum (1991).

Marked point processes are suitable for modeling shock processes, where the underlying counting process represents the number of shocks that have occurred, and the marks represent the damage caused by each shock. For the first work of this kind see Taylor (1975), where the optimal replacement policy is proved to have a level-crossing form. See also Arjas (1989) for a comprehensive theoretical summary.
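The marked-point-process view can be sketched as follows: shock times form a counting process, the marks are damage increments, and a level-crossing rule acts on the cumulative damage. The sketch assumes a compound Poisson model (exponential inter-arrival times and exponential damage marks), which is only one illustrative choice; the two levels `control_level` < `failure_level` are hypothetical parameters.

```python
import random
from typing import List, Optional, Sequence, Tuple


def simulate_shocks(rate: float, mean_damage: float, horizon: float,
                    rng: random.Random) -> Tuple[List[float], List[float]]:
    """Compound Poisson shocks on [0, horizon]: Poisson(rate) arrival times
    and exponentially distributed damage marks with the given mean."""
    times: List[float] = []
    marks: List[float] = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        marks.append(rng.expovariate(1.0 / mean_damage))
        t += rng.expovariate(rate)
    return times, marks


def shock_replacement(times: Sequence[float], marks: Sequence[float],
                      control_level: float, failure_level: float
                      ) -> Optional[Tuple[float, str]]:
    """Level-crossing rule on cumulative damage.

    Returns (time, "failure") if a shock pushes damage to failure_level or
    above, (time, "preventive") if damage first enters
    [control_level, failure_level), or None if neither level is reached."""
    damage = 0.0
    for t, mark in zip(times, marks):
        damage += mark
        if damage >= failure_level:
            return t, "failure"
        if damage >= control_level:
            return t, "preventive"
    return None


rng = random.Random(1)
times, marks = simulate_shocks(rate=2.0, mean_damage=0.3, horizon=10.0, rng=rng)
print(shock_replacement(times, marks, control_level=1.0, failure_level=1.5))
```

Repeating the simulation many times would estimate how often a given control level ends a cycle preventively rather than by failure.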

The Proportional Hazards Model (PHM) is another widely accepted model, which incorporates both age information and condition information (covariates) in a natural manner. A few maintenance policies have been developed for this kind of model; among them, Makis and Jardine (1992) derived the optimal replacement policy for a model with equal inspection intervals, and Kumer and Westberg (1997) combined this model with a TTT-plot approach.
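For concreteness, a PHM with a Weibull baseline is often written $h(t, z) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp(\gamma^{\top} z)$, where $z$ is the covariate vector observed at an inspection. The sketch below evaluates this hazard and applies a control limit on it at an inspection. The Weibull baseline, the parameter values and the threshold `hazard_limit` are illustrative assumptions, not the cost-optimal limit derived in the cited work.

```python
import math
from typing import Sequence


def weibull_phm_hazard(t: float, z: Sequence[float], beta: float, eta: float,
                       gamma: Sequence[float]) -> float:
    """PHM hazard with Weibull baseline, for age t > 0:
    h(t, z) = (beta/eta) * (t/eta)**(beta - 1) * exp(gamma . z)."""
    if t <= 0.0:
        raise ValueError("age t must be positive")
    baseline = (beta / eta) * (t / eta) ** (beta - 1.0)
    return baseline * math.exp(sum(g * zi for g, zi in zip(gamma, z)))


def replace_at_inspection(t: float, z: Sequence[float], beta: float, eta: float,
                          gamma: Sequence[float], hazard_limit: float) -> bool:
    """Control-limit decision: replace if the hazard, which combines age and
    condition information, has reached the limit."""
    return weibull_phm_hazard(t, z, beta, eta, gamma) >= hazard_limit


# Hypothetical parameters: beta = 2, eta = 1000 h, one covariate (a wear index).
print(replace_at_inspection(500.0, [0.0], 2.0, 1000.0, [0.5], 0.002))  # False
print(replace_at_inspection(500.0, [3.0], 2.0, 1000.0, [0.5], 0.002))  # True
```

At the same age the decision differs only through the covariate, which is the sense in which the PHM policy is condition-based.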

Some extensions of age-based models, such as group maintenance models, random repair cost models and general repair models, can also be viewed as CBM models in the sense that additional randomness is introduced in modeling the system deterioration, and consequently the optimal maintenance decision is based on the system condition instead of calendar time. The group replacement model is an extension from single-unit to multi-unit systems, where the components of a system are mutually dependent because any replacement of a component is subject to a fixed installation fee in addition to the cost of replacing each component. For a good survey, see Van der Duyn Schouten (1996). In this class of models, the age information of all components forms an entire information database, and good maintenance should properly utilize it.
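The economic dependence created by the fixed installation fee can be seen from a one-line comparison. Assume, purely for illustration, $n$ identical components, a fixed fee $c_0$ per replacement occasion, and a cost $c$ per component. Replacing them on $n$ separate occasions versus on one joint occasion gives

$$n\,(c_0 + c) - (c_0 + n\,c) = (n - 1)\,c_0 \ge 0,$$

so joint replacement saves $(n-1)c_0$, which is why a good group policy may replace some components earlier than their individually optimal times.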

The random repair cost model is a single-unit system model in which the random repair cost serves as information in addition to the age. As economic considerations are important for maintenance practice, this kind of generalization is appealing in practice. In addition, under certain monotonicity assumptions representing the deterioration of the system, it can be proved that the repair-cost-limit policy is optimal, i.e., when the repair cost exceeds an age-dependent limit, replace the system; otherwise, repair it. This cost limit has a very intuitive meaning: it is the residual value of the system after the repair. The preventive replacement time can also be expressed in terms of residual value: it is the time when the residual value decreases to zero. See Chapters 2 and 3 for more detailed information.
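The decision rule just described can be sketched as a comparison made at each failure. The sketch assumes a hypothetical, non-increasing age-dependent limit `residual_value(t)` and locates the preventive replacement age as the first grid point where that limit reaches zero. It illustrates the form of the policy only, not the optimal limit derived in Chapters 2 and 3.

```python
from typing import Callable


def repair_or_replace(age: float, repair_cost: float,
                      cost_limit: Callable[[float], float]) -> str:
    """Repair-cost-limit rule at a failure: replace if the (random) repair
    cost exceeds the age-dependent limit, otherwise repair."""
    return "replace" if repair_cost > cost_limit(age) else "repair"


def preventive_replacement_age(cost_limit: Callable[[float], float],
                               t_max: float, step: float = 1e-3) -> float:
    """First grid age at which the limit (residual value) has dropped to zero;
    returns t_max if it stays positive on [0, t_max]."""
    n_steps = int(round(t_max / step))
    for k in range(n_steps + 1):
        t = k * step
        if cost_limit(t) <= 0.0:
            return t
    return t_max


def residual_value(t: float) -> float:
    """Hypothetical residual value: 100 at age 0, decreasing to zero at age 5."""
    return max(0.0, 100.0 - 20.0 * t)


print(repair_or_replace(2.0, 50.0, residual_value))  # repair  (limit is 60)
print(repair_or_replace(2.0, 75.0, residual_value))  # replace (75 > 60)
print(preventive_replacement_age(residual_value, 10.0))
```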

The general repair concept was first introduced by Kijima et al. (1988), and several maintenance models have since been proposed; see e.g. Kijima et al. (1985) for a periodic replacement policy, Makis and Jardine (1991) for the optimality of a T-policy, and Stadje and Zuckerman (1991) for the optimality of a bang-bang policy when the repair degree is a decision variable.

This concept generalizes the concepts of minimal repair and replacement, because it assumes that a general repair improves the condition of the repaired system to a certain degree: better than after a minimal repair, but worse than after a replacement. A virtual age can then be defined as a function of the real age and the repair degree, which directly represents the condition of the system. Therefore it is natural to view the virtual age concept in the domain of CBM; see Chapter 5 for a comprehensive optimization model based on general repair, which incorporates features such as random repair cost, preventive/failure replacement and optimal stopping.
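To make the virtual-age idea concrete, the sketch below uses Kijima's two classical virtual-age models. Here $x_n$ is the operating time between the $(n-1)$-th and $n$-th repairs and $a_n \in [0, 1]$ is the repair degree ($a_n = 1$: minimal repair, $a_n = 0$: as good as new). Type I sets $v_n = v_{n-1} + a_n x_n$ and Type II sets $v_n = a_n (v_{n-1} + x_n)$. These standard forms from the general repair literature are used only for illustration and may differ in detail from the model of Chapter 5.

```python
from typing import List, Sequence


def virtual_ages(sojourns: Sequence[float], degrees: Sequence[float],
                 model: str = "II") -> List[float]:
    """Virtual age after each repair under Kijima's Type I or Type II model.

    sojourns[n] : operating time x_n since the previous repair
    degrees[n]  : repair degree a_n in [0, 1] (1 = minimal, 0 = as good as new)
    """
    v = 0.0
    ages: List[float] = []
    for x, a in zip(sojourns, degrees):
        if not 0.0 <= a <= 1.0:
            raise ValueError("repair degree must lie in [0, 1]")
        if model == "I":
            v = v + a * x      # repair offsets only the wear of the last sojourn
        elif model == "II":
            v = a * (v + x)    # repair rescales the whole accumulated age
        else:
            raise ValueError("model must be 'I' or 'II'")
        ages.append(v)
    return ages


print(virtual_ages([1.0, 1.0], [0.5, 0.5], "I"))   # [0.5, 1.0]
print(virtual_ages([1.0, 1.0], [0.5, 0.5], "II"))  # [0.5, 0.75]
```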

The time series and state space approach represents a wide and mature area, including optimal filtering and control, which can be applied to CBM. A typical paper in this direction is by Christer and Wang (1997), where Kalman filtering is used to predict the residual life and a suboptimal replacement policy is derived based on the prediction. The major feature of this model is that two processes are involved, an observation process and a state process, which suggests a more general framework: partially observable process modeling and control.
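As an illustration of the state/observation split, consider a scalar state-space model, assumed here only for the sketch: a hidden wear level evolves as $x_k = x_{k-1} + d + w_k$ and is measured as $y_k = x_k + v_k$, with independent zero-mean Gaussian noises of variances $q$ and $r$. The standard Kalman recursion below filters $x_k$, and a naive residual-life proxy divides the remaining margin to a hypothetical wear limit by the drift $d$. This is not the model of Christer and Wang (1997).

```python
from typing import List, Sequence, Tuple


def kalman_random_walk_drift(ys: Sequence[float], drift: float, q: float,
                             r: float, x0: float, p0: float
                             ) -> List[Tuple[float, float]]:
    """Scalar Kalman filter for x_k = x_{k-1} + drift + w_k, y_k = x_k + v_k,
    Var(w_k) = q, Var(v_k) = r (assumes r > 0). Returns (mean, variance)
    of the filtered state after each observation."""
    x, p = x0, p0
    out: List[Tuple[float, float]] = []
    for y in ys:
        x, p = x + drift, p + q               # prediction step
        k = p / (p + r)                       # Kalman gain
        x, p = x + k * (y - x), (1.0 - k) * p  # correction step
        out.append((x, p))
    return out


def residual_life_proxy(x_hat: float, wear_limit: float, drift: float) -> float:
    """Naive mean residual life: remaining margin divided by drift (> 0)."""
    return max(0.0, (wear_limit - x_hat) / drift)


est = kalman_random_walk_drift([1.1, 1.9, 3.2], drift=1.0, q=0.1, r=0.5,
                               x0=0.0, p0=1.0)
x_hat, p_hat = est[-1]
print(x_hat, p_hat, residual_life_proxy(x_hat, wear_limit=10.0, drift=1.0))
```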

A promising approach is the hidden Markov model (HMM), whose controlled counterpart is the partially observable Markov decision process (POMDP). Developed in the early 1960s, it has found a wide range of applications in engineering. Typical applications include speech recognition, fault diagnosis, demodulation, robotic control, artificial intelligence, etc. For a standard theoretical reference, see Elliott (1995). There has been some research on maintenance optimization based on HMM; see e.g. Fernandez-Gaucherand et al. (1991), Smallwood and Sondik (1973), and Hernandez-Hernandez et al. (1999). However, the maintenance models in these works were mainly used as illustrative examples. More modeling work has to be carried out to transfer the theoretical developments into practically applicable results.

Besides the aforementioned research, which can safely be classified as maintenance modeling, there are also interesting works from other areas, such as survival analysis, sequential analysis, signal processing, diagnosis and experiment design, among many others; see e.g. Saaty and Vargas (1998) and Zhou et al. (1996). A broader survey of this literature would be valuable for CBM research.

In this chapter, we propose a CBM framework based on HMM. The HMM we use combines a continuous time horizon with discrete observations, a form that has not been seen in the literature. We expect that this model will be the starting point of a continuing effort in CBM modeling and optimization within the HMM framework.

6.3 Model Description

We make the following assumptions.

1. System Dynamics: the system operates in one of $N$ unobservable states $\{1, 2, \dots, N\} = S^X$ over a continuous time horizon. Denote the state at time $t$ as $X_t$. Then $(X_t)$ forms a right-continuous homogeneous Markov chain with transition rates

$$q_{ji} = \lim_{h \to 0^+} \frac{P(X_h = j \mid X_0 = i)}{h} < \infty, \quad i \ne j \in S^X.$$

The system failure at random time $\zeta$ is self-announced, and the failure rate in state $i$ is $\mu_i < \infty$, $i \in S^X$.

2. Observations: measurements are taken at discrete times $kL$, $k = 1, 2, \dots$, with values $Y_k \in \{1, 2, \dots, M\} = S^Y$, satisfying

$$P(Y_k = j \mid X_{kL} = i) = D_{ji}, \quad i \in S^X,\ j \in S^Y.$$

3. Cost Structure: the profit rate in state $i$ is $C_i$, and the cost of a system failure from state $i$ is $K_i$, for $i \in S^X$.

4. Maintenance Actions: preventive replacement and failure replacement are

considered.

5. Objective Criterion: find the preventive replacement policy maximizing

the expected net profit over the system's lifetime.


To simplify the presentation, we define the extended state space and observation space

$$\bar{S}^X = S^X \cup \{N'\}, \quad N' = N + 1, \qquad \bar{S}^Y = S^Y \cup \{M'\}, \quad M' = M + 1,$$

where $N'$ and $M'$ represent the failure state and the failure signal, respectively. In addition, we denote for $i \in S^X$: $q_{iN'} = 0$, $q_{N'i} = \mu_i$, $q_{ii} = -\sum_{j=1, j \ne i}^{N} q_{ji} - \mu_i$, and the state transition matrix

$$Q = (q_{ji})_{N' \times N'}.$$

Similarly, denote for $i \in S^X$, $j \in S^Y$: $D_{M'N'} = 1$, $D_{M'i} = D_{jN'} = 0$, and the observation matrix

$$D = (D_{ji})_{M' \times N'}. \qquad (6.3.4)$$

Denote also, for $j \in \bar{S}^Y$, $D_j = (D_{j1}, \dots, D_{jN'})$, and $\mathrm{diag}(D_j)$ as the matrix with diagonal equal to $D_j$ and all other elements equal to 0.

Denote the net profit rate vector $r = (r_1, \dots, r_N, 0)$, where $r_i = C_i - K_i \mu_i$, $i \in S^X$.
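The extended quantities can be assembled mechanically from the model inputs. The sketch below assumes NumPy and 0-based indices (so the failure state $N'$ and the failure signal $M'$ become the last row/column), treats the failure state as absorbing, and gives it net profit rate 0; the function name `extended_model` and the numbers in the example are hypothetical. Each column of $Q$ sums to zero and each column of $D$ sums to one, which serves as a quick consistency check.

```python
from typing import Tuple

import numpy as np


def extended_model(rates: np.ndarray, mu: np.ndarray, obs: np.ndarray,
                   profit: np.ndarray, fail_cost: np.ndarray
                   ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Build the extended (Q, D, r) of the HMM maintenance model.

    rates[j, i] : transition rate q_ji from working state i to j (diagonal ignored)
    mu[i]       : failure rate in working state i
    obs[j, i]   : P(Y = j | X = i) for working states (columns sum to 1)
    profit[i], fail_cost[i] : C_i and K_i
    """
    n, m = len(mu), obs.shape[0]
    off = rates.astype(float)          # astype returns a copy
    np.fill_diagonal(off, 0.0)
    Q = np.zeros((n + 1, n + 1))
    Q[:n, :n] = off
    Q[n, :n] = mu                                          # q_{N'i} = mu_i
    Q[np.arange(n), np.arange(n)] = -off.sum(axis=0) - mu  # q_ii
    # Column N' stays zero: the failure state is absorbing.

    D = np.zeros((m + 1, n + 1))
    D[:m, :n] = obs
    D[m, n] = 1.0                      # failure is self-announced
    r = np.append(profit - fail_cost * mu, 0.0)  # r_i = C_i - K_i mu_i
    return Q, D, r


rates = np.array([[0.0, 0.0], [0.5, 0.0]])   # state 1 -> state 2 at rate 0.5
mu = np.array([0.1, 1.0])
obs = np.array([[0.9, 0.3], [0.1, 0.7]])
Q, D, r = extended_model(rates, mu, obs, np.array([10.0, 6.0]),
                         np.array([20.0, 20.0]))
print(Q, D, r, sep="\n")
```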

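Finally, we define the indicator vector of the system state, $I_t = (I_{\{X_t = 1\}}, \dots, I_{\{X_t = N'\}})^{\top}$.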

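Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, and let $(\mathcal{X}_t)$, $(\mathcal{Y}_t)$ and $(\mathcal{G}_t)$ be the (complete) natural filtrations generated by the stochastic processes $X_t$, $Y_t$, and both $X_t$ and $Y_t$, respectively; in particular, $\mathcal{G}_t = \mathcal{X}_t \vee \mathcal{Y}_t$.

For any filtration $(\mathcal{F}_t)$, an $(\mathcal{F}_t)$-stopping time $\tau$ is a random variable $\tau: \Omega \to \mathbb{R}_+ \cup \{\infty\}$ with $\{\tau \le t\} \in \mathcal{F}_t$ for all $t \in \mathbb{R}_+$.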

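With the above assumptions and notation, we have the following expression for the total net profit over the time interval $(0, t \wedge \zeta]$:

$$Z_t = \int_0^t \sum_{i=1}^{N} C_i I_s(i)\, ds - \sum_{i=1}^{N} K_i\, I_{\{\zeta \le t,\ X_{\zeta-} = i\}}.$$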

The optimization problem can now be formulated as follows.

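Find a $(\mathcal{Y}_t)$-stopping time $\tau^*$, if it exists, maximizing the total expected net profit over the system's lifetime

$$E Z_{\tau^*} = \sup_{\tau \in \mathcal{C}^Y} E Z_\tau, \qquad (6.3.9)$$

where $\mathcal{C}^Y = \{\tau \mid \tau \text{ is a } (\mathcal{Y}_t)\text{-stopping time}\}$.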

6.4 Problem Reduction

In this section, we follow several steps to transform the original optimization problem (6.3.9) into a form that is easy to solve.

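First, as the objective function $Z_t$ of (6.3.9) is not $(\mathcal{Y}_t)$-adapted, we turn to the following maximization problem

$$\sup_{\tau \in \mathcal{C}^Y} E \hat{Z}_\tau, \qquad (6.4.1)$$

where $\hat{Z}_t = E(Z_t \mid \mathcal{Y}_t)$ is the conditional expectation of $Z_t$ with respect to the filtration $(\mathcal{Y}_t)$, and is $(\mathcal{Y}_t)$-adapted.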

Directly from the definition of conditional expectation (see e.g. Elliott (1992), p. 3), we have for $\tau \in \mathcal{C}^Y$

$$E Z_\tau = E\big(E(Z_\tau \mid \mathcal{Y}_\tau)\big) = E \hat{Z}_\tau. \qquad (6.4.2)$$

Therefore, the optimization problems (6.3.9) and (6.4.1) are equivalent.

Now we need the following definition and lemmas to characterize $\hat{Z}_t$.

Definition 4.1 [Jensen (1989)] A process $Z$ is called a smooth semimartingale ($\mathcal{F}$-SSM) if it has a decomposition $Z_t = Z_0 + \int_0^t f_s\, ds + M_t$, where $(f_t)$ is a real progressively measurable process with respect to the filtration $\mathcal{F}$, $E\left(\int_0^t |f_s|\, ds\right) < \infty$ for all $t \in \mathbb{R}_+$, $E|Z_0| < \infty$, and $M = (M_t)$ is a martingale with paths that are right-continuous, have left limits and start with $M_0 = 0$. Short notation: $Z = (f, M)$.

Lemma 4.2 [Projection Theorem (Van Schuppen (1977), Bremaud (1981), and Kallianpur (1980))] Let $Z = (f, M)$ be an $\mathcal{F}$-SSM, and $\mathcal{A} = (\mathcal{A}_t)$ a subfiltration of $\mathcal{F}$. Then $\hat{Z} = (\hat{f}, \hat{M})$ is an $\mathcal{A}$-SSM, where

i). $\hat{Z}$ is $\mathcal{A}$-adapted and $\hat{Z}_t = E(Z_t \mid \mathcal{A}_t)$, $\forall t \in \mathbb{R}_+$;

ii). $\hat{f}$ is $\mathcal{A}$-progressive with $\hat{f}_t = E(f_t \mid \mathcal{A}_t)$ for almost all $t \in \mathbb{R}_+$ (Lebesgue measure);

iii). $\hat{M}$ is an $\mathcal{A}$-martingale.

In addition, if $Z_0$ and $\int_0^t |f_s|\, ds$ are square integrable and $M$ is a square integrable martingale, then the same properties hold true for the corresponding terms $\hat{Z}_0$, $\int_0^t |\hat{f}_s|\, ds$ and $\hat{M}$, respectively.

Lemma 4.3 A $(\mathcal{Y}_t)$-stopping time $\tau$ has the following representation

where $\sigma$ is an $(\mathcal{H}_t)$-stopping time, $\mathcal{R}_n = \sigma\{(Y_k, I_{\{\zeta > kL\}});\ k \le n\}$, $\tau_0$ is a constant, and $\tau_n \ge nL$ is $\mathcal{R}_n$-measurable for $n \ge 1$. Conversely, for any $(\mathcal{H}_t)$-stopping time $\sigma$, constant $\tau_0 \ge 0$ and $\mathcal{R}_n$-measurable functions $\tau_n \ge nL$, $n = 1, 2, 3, \dots$, the time $\tau(\sigma; \{\tau_n\})$ defined by (6.4.3) is a $(\mathcal{Y}_t)$-stopping time.

Lemma 4.3 can be proved in exactly the same way as Lemma 4.4 in Chapter 4. Intuitively, $\sigma$ and $\tau_n$ correspond to preventive replacement between observation epochs and at observation epochs, respectively.

With Lemma 4.3, it is clear that the preventive replacement decision is made only at observation epochs, with $\tau_n$ for immediate replacement and $\sigma$ for planning a preventive replacement before the next observation epoch.

Lemma 4.4 We have the following $(\mathcal{Y}_t)$-SSM representation for $\hat{Z}_t$:

$$\hat{Z}_t = \int_0^t r P_s I_{\{\zeta > s\}}\, ds + \hat{m}(t),$$

where $P_s = E(I_s \mid \mathcal{Y}_s)$ is obtained iteratively as follows:

$$P_s = \exp\big((s - \lfloor s/L \rfloor L)\, Q\big)\, P_{\lfloor s/L \rfloor L}, \quad \forall s \ne iL,$$

and

Proof. First we show that $Z_t$ has a $(\mathcal{G}_t)$-SSM representation. For $i \in S^X$,

$$I_{\{\zeta \le t,\ X_{\zeta-} = i\}} = \int_0^t I_{\{X_s = i\}}\, \mu_i\, ds + m_2(t) = \int_0^t I_s(i)\, \mu_i\, ds + m_2(t).$$

Therefore,

$$Z_t = \int_0^t \sum_{i=1}^{N} C_i I_s(i)\, ds - \sum_{i=1}^{N} K_i\, I_{\{\zeta \le t,\ X_{\zeta-} = i\}} = \int_0^t \sum_{i=1}^{N} (C_i - \mu_i K_i)\, I_s(i)\, ds + m(t) = \int_0^t r I_s\, ds + m(t).$$

Clearly, from the definition of $(\mathcal{Y}_t)$, we have $P_t = e_{N'}$ on $\{\zeta \le t\}$, where $e_{N'} = (0, \dots, 0, 1)^{\top}_{N'}$.

For $kL < t < (k+1)L$, $\forall k \ge 1$,

$$\frac{dP_t}{dt} = \lim_{h \to 0} \frac{E(I_{t+h} - I_t \mid \mathcal{Y}_t)}{h} = Q P_t.$$

Consequently, we have $P_s = \exp\big((s - kL)\, Q\big)\, P_{kL}$.

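For $t = kL$, $P_t$ corresponds to the Bayesian posterior distribution of the system state with prior distribution $P_{t-}$ and observation $Y_k$. In fact, for $Y_k = j$ and $i \in S^X$, we have

$$P_t(i) = \frac{D_{ji}\, P_{t-}(i)}{\sum_{l=1}^{N'} D_{jl}\, P_{t-}(l)}.$$

In matrix form,

$$P_t = \frac{\mathrm{diag}(D_j)\, P_{t-}}{\mathbf{1}^{\top} \mathrm{diag}(D_j)\, P_{t-}}.$$

Q.E.D.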
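The recursion of Lemma 4.4 alternates a deterministic prediction $P_s = \exp((s-kL)Q)P_{kL}$ between epochs with a Bayes correction at each epoch. The sketch below implements both steps under the column-vector convention used here ($q_{ji}$ is the rate from $i$ to $j$), with the correction written as the normalized product $\mathrm{diag}(D_j)P_{kL-}$. It assumes NumPy and computes the matrix exponential by a simple scaling-and-squaring Taylor series, which is adequate for small, well-scaled $Q$; a library routine such as `scipy.linalg.expm` would normally be preferred. The function names and example matrices are hypothetical.

```python
from typing import List, Sequence

import numpy as np


def expm(A: np.ndarray, terms: int = 20) -> np.ndarray:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, ord=1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / (2.0 ** s)
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


def predict(P: np.ndarray, Q: np.ndarray, dt: float) -> np.ndarray:
    """Distribution dt time units after the last epoch: exp(dt Q) P."""
    return expm(dt * Q) @ P


def bayes_update(P_prior: np.ndarray, D: np.ndarray, j: int) -> np.ndarray:
    """Posterior after signal j: diag(D_j) P / (1^T diag(D_j) P)."""
    unnorm = D[j, :] * P_prior         # same as np.diag(D[j]) @ P_prior
    total = unnorm.sum()
    if total <= 0.0:
        raise ValueError("observation has zero probability under the prior")
    return unnorm / total


def filter_path(P0: Sequence[float], Q: np.ndarray, D: np.ndarray, L: float,
                observations: Sequence[int]) -> List[np.ndarray]:
    """Posterior at epochs L, 2L, ... given the observed signals."""
    P, out = np.asarray(P0, float), []
    for j in observations:
        P = bayes_update(predict(P, Q, L), D, j)
        out.append(P)
    return out


Q = np.array([[-0.6, 0.0, 0.0], [0.5, -1.0, 0.0], [0.1, 1.0, 0.0]])
D = np.array([[0.9, 0.3, 0.0], [0.1, 0.7, 0.0], [0.0, 0.0, 1.0]])
for P in filter_path([1.0, 0.0, 0.0], Q, D, L=0.5, observations=[0, 1, 1]):
    print(np.round(P, 3))
```

Because $D_{jN'} = 0$ for every working signal $j$, the correction step automatically removes the failure-state mass that the prediction step accumulates, i.e., it conditions on survival.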

Finally, notice that the $(\mathcal{Y}_t)$-stopping time $\tau$ is dominated by the failure time $\zeta$. From the well-known Optional Stopping Theorem (see e.g. Elliott (1982), p. 36), we have $E\hat{m}_\tau = E\hat{m}_0 = 0$, and consequently the optimization problem (6.4.1) is equivalent to the following one, in which the martingale part $\hat{m}_t$ is removed from $\hat{Z}_t$:

$$\sup_{\tau \in \mathcal{C}^Y} E \int_0^{\tau} r P_s I_{\{\zeta > s\}}\, ds.$$

In the next section, we will see that the above problem is ready to be solved within the Markov framework.

6.5 Optimal Policy

From the proof of Lemma 4.4, we have seen that $E(I_t \mid \mathcal{Y}_t)$ forms a piecewise deterministic process (PDP) with jumps at the discrete times $iL$, $i = 1, 2, \dots$. As a consequence of the strong Markov property of PDPs (see Davis (1993)), we may restrict $\sigma$ and $\tau_n$ to depend only on the posterior distribution at the last observation epoch without loss of optimality. Therefore, we establish the dynamic equation as follows:

where $V(P)$ denotes the value function of distribution $P$.

To further simplify the equation, we extend the domain of the value function $V$ to the space of un-normalized measures, i.e., $\|P\| = \sum_{i=1}^{N'} P(i)$ need not equal 1. Define

Then, we have

$$V(q_0) = \max\big(V_1(q_0), V_2(q_0)\big),$$

with $q_s = \exp(sQ)\, q_0$ for $0 < s \le L$.

Define the operator $T$ on the function space $\mathcal{C}$, formed by the continuous functions defined on $\|q\| \le 1$ and satisfying (6.5.2), as follows:

It is clear that $T$ is a contraction operator. In fact, for $\|q\| = 1$:

From this contraction property, we have

$$V(q) = \lim_{n \to \infty} T^n(V_1)(q) = \lim_{n \to \infty} V_{n+1}(q), \qquad (6.5.7)$$

where $V_1$ is defined in (6.5.3).

Consequently, the value function $V$ can be obtained to within any accuracy $\epsilon$ by applying the following algorithm.

Algorithm.

Step 0. $V_0 = 0$;

Step n. $V_n = T(V_{n-1})$;

Stopping Rule. Stop at $K$ if
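The operator $T$ itself depends on equations not reproduced here, so the following is only a generic illustration of the value-iteration scheme and of a typical sup-norm stopping rule. It uses a hypothetical finite-state discounted optimal stopping problem $V = \max(0,\ g + \beta P V)$: stop and receive 0, or continue, earn $g$ and move according to a row-stochastic matrix $P$ (note the row convention, unlike $Q$ above). For any $\beta$-contraction, stopping once $\|V_n - V_{n-1}\|_\infty \le \epsilon(1-\beta)/\beta$ guarantees $\|V_n - V\|_\infty \le \epsilon$.

```python
from typing import Tuple

import numpy as np


def value_iteration_stopping(g: np.ndarray, P: np.ndarray, beta: float,
                             eps: float = 1e-8, max_iter: int = 10_000
                             ) -> Tuple[np.ndarray, np.ndarray, int]:
    """Solve V = max(0, g + beta * P @ V) by successive approximation.

    g[i]    : reward for continuing one period from state i
    P[i, k] : probability of moving from i to k (rows sum to 1)
    beta    : discount factor in (0, 1), the contraction modulus
    Returns (V, stop_set, iterations); stop_set lists states where stopping
    is optimal.
    """
    V = np.zeros_like(g, dtype=float)              # Step 0: V_0 = 0
    tol = eps * (1.0 - beta) / beta
    n = 0
    for n in range(1, max_iter + 1):
        V_new = np.maximum(0.0, g + beta * P @ V)  # Step n: V_n = T(V_{n-1})
        converged = np.max(np.abs(V_new - V)) <= tol
        V = V_new
        if converged:                              # stopping rule
            break
    continuation = g + beta * P @ V
    stop_set = np.flatnonzero(continuation <= 0.0)
    return V, stop_set, n


V, stop_set, n_iter = value_iteration_stopping(np.array([1.0, -1.0]), np.eye(2), 0.5)
print(np.round(V, 6), stop_set, n_iter)
```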

Now, with the value function $V(q)$, the optimal policy can be described as follows.

At each observation epoch, say time $kL \in \{L, 2L, \dots, nL, \dots\}$:

If $V(q_k) \le 0$, then replace the system immediately.

If $V(q_k) > 0$, then if $V_1(q_k) \ge V_2(q_k)$, run the system till

$$\tau^* = \arg\min_{t} \left( \int_0^t r\, q_s\, ds \right).$$

Otherwise, run the system till the next observation epoch, i.e., time $(k+1)L$.

Now we are ready to show the convexity of $V(q)$, which implies the convexity of the replacement region $O = \{q \mid V(q) \le 0\}$ and of the region $O_2$ in which the system needs to be replaced before the next observation epoch.

Lemma 5.5 $V(q)$ is a convex function.

Proof. We use mathematical induction.

i). Case $K = 1$. We want to show that

is convex. In fact, for any positive measures $P_0, Q_0$ such that $\|P_0\|, \|Q_0\| \le 1$,

ii). Case $K = n$. Assume the convexity holds for $V_n(q)$.

iii). Case $K = n + 1$. We have the recursion

The operator "max" preserves convexity. The functions $V_n$ are convex by i) and ii). The term $\int_0^t r q_s\, ds$ is linear in $q$ and consequently convex. Finally, $q \mapsto \mathrm{diag}(D_j)\, q_L$ is a linear map, and composing the convex function $V_n$ with it preserves convexity. Hence we obtain the convexity of $V_{n+1}$, i.e.,

iv). Case $K \to +\infty$. As we have shown, $V_K \to V$ in (6.5.7); since a pointwise limit of convex functions is convex, (6.5.12) yields the convexity of the value function $V(q)$.

Q.E.D.

Example.

Suppose a system has 3 working states, with $Q$, $D$ and $r$ defined as follows.

Then the value function $V$ and the finite-stage value functions $V_n$ are shown in Figure 6.1.

Figure 6.1: Optimal values $V_n$.

Figure 6.2: Optimal preventive replacement policy.

The optimal replacement region and the finite-stage optimal replacement regions are shown in Figure 6.2.

The last part of this section focuses on the optimality of discretized policies, i.e., policies under which preventive replacement is not carried out between observations.

Theorem 5.6 Assume that

and that the observation is non-trivial, i.e., there exist $i \in S^X$, $j \in S^Y$ such that

Then there exists an $\bar{L} > 0$ such that, for the system with observation interval $L < \bar{L}$, the optimal policy (for $N \ge 2$) is a discretized policy, i.e., at each observation epoch:

replace the system immediately;

or, run the system till the next observation epoch.

Proof. From the structure of the optimal policy, we notice that the optimal preventive replacement between observation times occurs only at $\tau^*$, i.e., the first time such that

$$P_{\tau^*} \in B, \qquad (6.5.16)$$

where

is clearly an $(N-2)$-dimensional simplex.

We need to prove that

Once (6.5.18) holds, then for $L$ small enough the possible loss in one observation interval, $\int_0^L r P_s\, ds$, tends to 0, and the prior distribution at the next observation time is about the same as the present one. Therefore, as long as the value function is uniformly bounded away from zero for $P \in B$, the optimal policy is to run the unit to the next observation time, and to decide at that time whether to replace it based on the new observation $Y$.

By the same token, it is clear that we need only prove that for all $P \in B$ there exists $j$ such that

$$r\,\mathrm{diag}(D_j)\, P > 0. \qquad (6.5.19)$$

We prove (6.5.19) by contradiction. Notice that from (6.5.17), $\sum_{j=1}^{M'} r\,\mathrm{diag}(D_j)\, P = rP = 0$. Therefore, we have that for all $P \in B$, $j \in S^Y$,

because otherwise (6.5.19) is true for some $j \in S^Y$.

But as $B$ is an $(N-2)$-dimensional simplex in an $(N-1)$-dimensional linear space, $r$ and $r\,\mathrm{diag}(D_j)$ are parallel for all $j \in S^Y$, i.e., for all $j \in S^Y$,

Consequently, for all $j \in S^Y$,

which implies that

This contradicts (6.5.15), which completes the proof.

Q.E.D.

In addition, if the observation matrix is trivial, then it is clear that the optimal policy is an age replacement policy with preventive replacement time $T^*$ satisfying

$$T^* = \arg\min_{T} \int_0^T r \exp(tQ)\, P_0\, dt.$$

This gives the classic age-replacement models a new interpretation as a trivial case of CBM models.

Summary of Notation

$C_i$: profit rate in state $i$

$D_{ji}$: conditional probability of observation $j$ given state $i$

$\mathcal{G}$: filtration of the system state and observation

$I_t$: indicator vector of the system state

$K_i$: failure cost in state $i$

$L$: observation interval

$P_t$: estimate of the system state

$q_{ji}$: system state transition rate

$r_i$: net profit rate

$S^X$: system state space

$S^Y$: observation space

$T$: operator that defines the value iteration

$V(P)$: value function of initial distribution $P$

$V_n(P)$: $n$-step (optimal) value function of initial distribution $P$

$X_t$: system state at time $t$

$\mathcal{X}$: filtration of the system state

$Y_n$: observation at time $nL$

$\mathcal{Y}$: filtration of the observation

$Z_t$: total net profit up to time $t$

$\hat{Z}_t$: estimate of the net profit up to time $t$

$\tau^*$: optimal stopping time

$\mu_i$: failure rate in state $i$

$\zeta$: failure time

$\mathcal{C}^Y$: set of $\mathcal{Y}$-stopping times

Chapter 7

SUMMARY AND FUTURE RESEARCH DIRECTIONS

7.1 Summary

It is natural to identify the following relationships between maintenance terminology and mathematical concepts:

Replacement ↔ Stopping Time

Condition Monitoring ↔ Filtration

Information Processing ↔ Filtering

CBM Optimization ↔ Optimal Stopping.

These correspondences lay the foundation of our mathematical modeling

and optimization for maintenance systems.

A common modeling framework recurs in the models presented in this thesis. We expect that this common framework, together with its supporting mathematical techniques, captures the essence of the subject better than a pool of concrete models.

Modeling framework

The maintenance models considered in this thesis can be defined by the following 6 attributes:

1) Time Horizon

2) Deterioration Dynamics

3) Maintenance Actions

4) Cost Structure

5) Information Level

6) Optimization Criterion.

This framework can also be thought of as a framework for CBM without loss of generality, because from the CBM perspective any maintenance optimization has to be condition-based.

Optimization procedure

The generic optimization procedure we developed in this thesis can be

summarized by the following 6 steps.

Step 1. $\lambda$-maximization technique

This technique is applied to average cost and discounted cost criteria for

the following two reasons:

i) To transform the original objective function into an additive function, to which results from optimal stopping theory can be applied.

ii) A large amount of computational complexity is absorbed into the parameter $\lambda$; therefore, more insight is gained into the structural results and less computational effort is needed to obtain numerical results. In addition, the value of $\lambda$ in the dynamic equation is exactly the optimal value of the original objective function.

Step 2. Characterization of stopping times of jump processes

This result is developed for general jump processes in continuous time. The purpose of its application here is to separate two optimal stopping problems, failure replacement and preventive replacement, without loss of optimality.

Step 3. Smooth semimartingale (SSM) decomposition

This technique essentially separates the informative trend of deterioration from the non-informative randomness (the martingale part), and it allows one to consider only the trend part without loss of optimality.

Step 4. Subfiltration SSM projection

This result is used to estimate the true system condition based on the given information level. Consequently, the maintenance policy with respect to that information level is optimized based on this estimate instead of the unobservable system state.

Step 5. Dynamic programming approach

With Steps 1-4, the model is equivalently transformed into a Markov model and can be solved by dynamic programming. Therefore, an optimal Markov policy is guaranteed to exist, and it is optimal within the whole class of stopping times.

Step 6. Approximation with Truncated Problem

This is a standard procedure for solving the dynamic programming equation with a continuous time or infinite discrete time horizon.

While not every model needs to go through all the steps, the integration

of these steps provides a general procedure to solve a wide class of problems

using the aforementioned modeling framework.
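Step 1 can be illustrated on a toy renewal-reward problem. For a long-run profit rate $R(\pi)/\ell(\pi)$, where $R$ and $\ell$ are the expected profit and expected length of a renewal cycle under policy $\pi$, $\lambda^*$ is the optimal rate exactly when $\max_\pi (R(\pi) - \lambda^* \ell(\pi)) = 0$, and the inner problem is additive. The sketch below uses Dinkelbach's parametric iteration over a finite, hypothetical list of candidate policies given as $(R, \ell)$ pairs; it illustrates the idea rather than the specific procedure of the earlier chapters.

```python
from typing import Sequence, Tuple


def lambda_maximization(policies: Sequence[Tuple[float, float]],
                        tol: float = 1e-12, max_iter: int = 100
                        ) -> Tuple[float, int]:
    """Maximize R / l over candidate policies (R = expected cycle profit,
    l = expected cycle length > 0) by Dinkelbach's iteration.

    Returns (lambda*, index of an optimal policy)."""
    lam = policies[0][0] / policies[0][1]
    best = 0
    for _ in range(max_iter):
        # Inner, additive problem: maximize R - lam * l.
        best = max(range(len(policies)),
                   key=lambda k, lam=lam: policies[k][0] - lam * policies[k][1])
        reward, length = policies[best]
        if reward - lam * length <= tol:
            # max (R - lam * l) = 0, so lam equals the optimal ratio.
            return lam, best
        lam = reward / length      # lam strictly increases at each update
    return lam, best


# Hypothetical (expected profit per cycle, expected cycle length) pairs.
candidates = [(10.0, 5.0), (9.0, 3.0), (4.0, 1.0)]
print(lambda_maximization(candidates))  # (4.0, 2)
```

Each update replaces $\lambda$ by the ratio of the current inner maximizer, so $\lambda$ increases monotonically and, over a finite candidate set, the iteration terminates.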


7.2 Future Research Directions

In general, the future reliability/maintenance research will proceed along the

following two mutually-dependent directions:

1. Theoretical development

2. Theory/practice interactions

Theoretical development has been the focus of this thesis. One major issue not addressed here is the statistical one. Certainly, statistical issues are of fundamental importance for maintenance optimization research. The interactions between information processing and control pose additional challenges, which are of theoretical interest. For relevant research on this aspect, see Pena et al. (2000) on weak convergence of recurrent and renewal data, and Jensen and Wiedmann (2000) on the analysis of dependent censoring.

It is natural to think of an adaptive control scheme that merges the statistical and optimization procedures into a recursive one. A recent work by Burnetas and Katehakis (1997) found an adaptive control policy that converges to the optimal policy at the optimal rate for finite-state Markov decision processes. It would be a very interesting and challenging problem to find out whether a similar result can be established for hidden Markov models.

In addition, the computational issues for HMM are far from trivial, and there is a definite need for further investigation. We expect that more specific structural results can be obtained for the HMM optimal stopping problem, which would consequently reduce the computational burden significantly.

HMM is also appealing because of its potential applicability. The following two other practical scenarios, which might also be thought of as CBM in a broad sense, can be modeled by HMM to a certain extent:

1. software reliability/maintenance

2. communication network management.

First, the following duality between software reliability/maintenance and CBM can easily be identified: profit of software release vs. failure loss; release time vs. replacement time; reliability growth vs. condition deterioration; debugging vs. (general) repair; bugs vs. unobservable system defects, etc. It is expected that a similar modeling framework and similar optimization techniques can be applied to this scenario.

In the communication network management scenario, due to the layered, distributed and hierarchical nature of networks, network administrators constantly face information that is great in volume, yet still incomplete, delayed and error-prone. Certain decisions related to controlling and managing the network have to be based on an estimate of the system's current true condition rather than directly on the available information. Again, the generic optimization procedure (condition monitoring, information processing, and decision making) is valid.

Certainly, whether the decision making is an optimal stopping problem depends on the nature of the concrete application at hand.

Ultimately, practical application is the driving force for theoretical development. Unfortunately, transferring theory to practice is more complex than it seems. Intense additional efforts, including model interpretation, implementation, verification and validation, are required to make the optimization models work properly.

Many theoretical developments, such as papers in academic journals or presentations at conferences, are not immediately available for application. From the practitioners' point of view, the most welcome format for these developments is an implementation as a software/hardware system. So a proper balance between abstraction and user-friendliness has to be carefully maintained in every researcher's mind.

Bibliography

[1] Arjas, E. (1989). Survival models and martingale dynamics. Scand. J. Statist. 16, 177-225.

[2] Aven, T. (1983). Optimal replacement under a minimal repair strategy - a general failure model. Advances in Applied Probability 15, 198-211.

[3] Aven, T. (1996). Condition-based replacement policies - a counting process approach. Reliability Engineering and System Safety 51, 275-281.

[4] Aven, T. and B. Bergman (1986). Optimal replacement times - a general set-up. Journal of Applied Probability 23, 432-442.

[5] Bai, D.S. and W.Y. Yun (1986). An age replacement policy with minimal repair cost limit. IEEE Transactions on Reliability R-35, 452-455.

[6] Baxter, L.A., M. Kijima and M. Tortorella (1996). A point process model for reliability of maintained system subject to general repair. Stochastic Models 12, 37-65.

[7] Barlow, R.E., C.A. Clarotti and F. Spizzichino (eds.) (1993). Reliability and Decision Making. Chapman & Hall.

[8] Beichelt, F. (1993). A unifying treatment of replacement policies with minimal repair. Naval Research Logistics 40, 51-67.

[9] Berg, M., M. Bienvenu and R. Cleroux (1986). Age replacement policy with age dependent minimal repair. INFOR 24, 26-32.

[10] Bergman, B. (1978). Optimal replacement under a general failure model. Advances in Applied Probability 10, 431-451.

[11] Block, H.W., W.S. Borges and T.H. Savits (1985). Age-dependent minimal repair. Journal of Applied Probability 22, 370-385.

[12] Bremaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer-Verlag, Berlin.

[13] Burnetas, A.N. and M.N. Katehakis (1997). Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research 22.

[14] Chow, Y.S., H. Robbins and D. Siegmund (1971). Optimal Stopping Theory. Dover Publications Inc., New York.

[15] Christer, A.H., W. Wang and J.Y. Sharp (1997). A state space condition monitoring model for furnace erosion prediction and replacement. European Journal of Operational Research 101, 1-14.

[16] Christer, A.H. and W. Wang (1995). A simple condition monitoring model for a direct monitoring process. European Journal of Operational Research 82, 258-269.

[17] Cleroux, R., S. Dubuc and C. Tilquin (1979). The age replacement problem with minimal repair and random repair costs. Journal of the Operations Research Society of America 27(6), 1158-1167.

[18] Dagpunar, J.S. (1998). Some properties and computational results for a general repair process. Naval Research Logistics 45, 391-405.

[19] Davis, M.H.A. (1993). Markov Models and Optimization. Chapman & Hall, London.

[20] Dekker, R. and P. Scarf (1998). On the impact of optimization models in maintenance decision making: the state of the art. Reliability Engineering and System Safety 60, 111-119.

[21] Doksum, K. (1991). Degradation rate models for failure time and survival data. CWI Quarterly 4, 195-203.

[22] Drinkwater, R.W. and N.A.J. Hastings (1967). An economic replacement model. Operational Research Quarterly 18, 121-138.


[23] Elliott, R. (1995). Hidden Markov Models. Springer-Verlag.

[24] Fernandez-Gaucherand, E., A. Arapostathis and S.I. Marcus (1991). On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes. Annals of Operations Research 29, 439-470.

[25] Hartman, P. (1964). Ordinary Differential Equations. John Wiley & Sons, Inc., New York.

[26] Halasz, M., F. Dube, R. Orchard and R. Ferland (1999). The integrated diagnostic system (IDS): remote monitoring and decision support for commercial aircraft - putting theory into practice. http://ai.iit.nrc.ca/IR-public/ids/papers/aaai99idspaper.pdf.

[27] Hastings, N.A.J. (1969). The repair limit replacement method. Operational Research Quarterly 20(3), 337-349.

[28] Heinrich, G. and U. Jensen (1992). Optimal replacement rules based on different information levels. Naval Research Logistics Quarterly 39, 937-955.

[29] Hernandez-Hernandez, D., S.I. Marcus and P.J. Fard (1999). Analysis of a risk-sensitive control problem for hidden Markov chains. IEEE Transactions on Automatic Control 44, 1093-1100.

[30] U. Jensen and G.H. Hsu (1993). Optimal stopping by means of point

process observation with applications in reliability Mathematics of Op-

eration Research 18, 645-657.

[31] Jensen, U. (1996). Stochastic models of reliability and maintenance. In Reliability and Maintenance of Complex Systems. Ozekici, S. (ed.), Springer.

[32] Jensen, U. (1989). Monotone stopping rules for stochastic processes in a semimartingale representation with applications. Optimization 20, 837-852.

[33] Jensen, U. and d. Wiedmann (2000). Estimation of survival curve under dependent censoring. Abstract book of the Second International Conference on Mathematical Methods in Reliability - Methodology, Practice and Inference. Bordeaux, France, July 2000.

[34] Jiang, S. and K. Cheng (1995). On the optimality and comparison of some standard maintenance policies. Operations Research and Its Applications. World Publishing Corporation, Beijing.

[35] Jiang, X., V. Makis and A.K.S. Jardine (1998). IMA Journal of Mathematics Applied in Business & Industry 9, 201-210.

[36] Jiang, X., K. Cheng and V. Makis (1998). On the optimality of repair-cost-limit policies. Journal of Applied Probability 35, 936-949.

[37] Jiang, X., V. Makis and A.K.S. Jardine (2001). Optimal Repair/Replacement Policy for a General Repair Model. Advances in Applied Probability, to appear.

[38] Kijima, M. (1989). Some results for repairable systems with general repair. Journal of Applied Probability 26, 89-102.

[39] Kijima, M., H. Morimura and Y. Suzuki (1985). Periodical replacement without assuming minimal repair. European Journal of Operational Research 37, 194-203.

[40] Kijima, M. and U. Sumita (1986). A useful generalization of renewal theory: counting processes governed by nonnegative Markovian increments. Journal of Applied Probability 23, 71-88.

[41] Kobbacy, K.A.H., N.C. Proudlove and M.X. Harper (1995). Towards an intelligent maintenance optimization system. Journal of the Operational Research Society 46, 831-853.

[42] Kumar, D. and D. Westberg (1997). Maintenance scheduling under age replacement policy using proportional hazards modeling and total-time-on-test plotting. European Journal of Operational Research 99, 507-515.

[43] L'Ecuyer, P. and A. Haurie (1987). The repair vs. replacement problem: a stochastic approach. Optimal Control Applications and Methods 8, 219-230.

[44] Love, C.E. and R. Guo (1991). Using proportional hazards modeling in plant maintenance. Quality and Reliability Engineering International 7,

[45] Makis, V. and A.K.S. Jardine (1992a). Optimal replacement in the proportional hazards model. INFOR 30, 172-183.

[46] Makis, V. and A.K.S. Jardine (1992b). Optimal replacement policy for a general model with imperfect repair. Journal of the Operational Research Society 43, 111-120.

[47] Makis, V. and A.K.S. Jardine (1993). A note on optimal replacement policy under general repair. European Journal of Operational Research 69, 75-82.

[48] Makis, V., X. Jiang and K. Cheng (2000). Optimal preventive replacement under minimal repair and random repair cost. Mathematics of Operations Research 35, 111-156.

[49] Mitchell, J.S. (1981). An Introduction to Machinery Analysis and Monitoring. PennWell Publishing Company, Tulsa, Oklahoma.

[50] Ozekici, S. (ed.) (1996). Reliability and Maintenance of Complex Systems. Springer.

[51] Park, K.S. (1983). Cost limit replacement policy under minimal repair. Microelectronics and Reliability 23, 347-349.

[52] Park, K.S. (1985). Pseudodynamic cost-limit replacement models under minimal repair. Microelectronics and Reliability 25, 573-579.

[53] Pena, E., R.L. Strawderman and M. Hollander (2000). A weak convergence result in recurrent and renewal models. Recent Advances in Reliability - Methodology, Practice and Inference. Limnios, N. and M. Nikulin (eds.). Birkhäuser.

[54] Pierskalla, W. and J. Voelker (1976). A survey of maintenance models: the control and surveillance of deteriorating systems. Naval Research Logistics Quarterly 23, 353-388.

[55] Saaty, T. and L. Vargas (1998). Diagnosis with dependent symptoms: Bayes theorem and the analytic hierarchy process. Operations Research 46, 491-503.

[56] Scarf, P. (1997). On the application of mathematical models in maintenance. European Journal of Operational Research 99, 493-506.

[57] Scarsini, M. and M. Shaked (2000). On the value of an item subject to general repair or maintenance. European Journal of Operational Research

[58] Sherif, Y. and M. Smith (1981). Optimal maintenance models for systems subject to failure: a review. Naval Research Logistics Quarterly 28, 47-74.

[59] Shiryayev, A.N. (1978). Optimal Stopping Rules. Springer, New York.

[60] Smallwood, S. and E.J. Sondik (1973). The optimal control of partially observable Markov processes over a finite horizon. Journal of Operations Research 21, 1071-1088.

[61] Van der Duyn Schouten, F. (1996). Maintenance policies for multicomponent systems: an overview. In Reliability and Maintenance of Complex Systems. Ozekici, S. (ed.), Springer.

[62] Stadje, W. and D. Zuckerman (1991). Optimal maintenance strategies for repairable systems with general degree of repair. Journal of Applied Probability 28, 384-396.

[63] Stadje, W. (1994). Maximal wearing-out of a deteriorating system: an optimal stopping approach. European Journal of Operational Research 73, 472-479.

[64] Taylor, H.M. (1975). Optimal replacement under additive damage and other failure models. Naval Research Logistics Quarterly 22, 1-18.

[65] Valdez-Flores, C. and R.M. Feldman (1989). A survey of preventive maintenance models for stochastically deteriorating single-unit systems. Naval Research Logistics 36(4), 419-446.

[66] White, D.J. (1989). Repair Limit Replacement. OR Spektrum 11, 143-149.

[67] Zhang, Z.G. and C.E. Love (2000). A simple recursive Markov chain model to determine the optimal replacement policy under general repairs. Computers & Operations Research 27, 321-333.

[68] Zhou, H., L. Qu and A. Li (1996). Test sequencing and diagnosis in electronic system with decision table. Microelectronics Reliability 36, 1167-1175.

[69] Zuckerman, D. (1978). Optimal stopping in a semi-Markov shock model. Journal of Applied Probability 15, 629-634.
