[ieee ieee 34th annual spring reliability symposium, 'reliability - investing in the...
TRANSCRIPT
RELIABILITY and FINANCIAL ASPECTS
of WEAR OUT IN ELECTRONIC SYSTEMS
by JOHN PETER ROONEY
SENIOR MEMBER, I E E E
THE FOXBORO COMPANY FOXBORO, MASSACHUSETTS
ABSTRACT
Today, there is a worldwide emphasis on the bottom line.
Financial interests dominate most corporations. The major
determinant for the approval of any project is its expected
financial success. Despite protestations that quality has
been re-vitalized in American corporations, the success of
many American corporate presidents and executives is measured
principally in financial terms.
This universal emphasis on finance forces the reliability
practitioner to consider more than just the theoretical or
technological aspects of reliability. Financial matters
affect the practice of the reliability discipline. This is
particularly true when a system is in wear out. Financial
matters must be considered.
This paper presents the experiences of a reliability
practitioner in dealing with both the financial aspects and
the reliability requirements of computer control systems used
in the process control industries as these systems reach
end-of-life. It is not intended that this paper be a
reference work on financial matters; rather it is intended to
show how the present emphasis on financial considerations has
affected the reliability discipline in the process control
plants. Sufficient details of financial matters are provided
to illustrate their impact on the tasks normally performed by
a reliability practitioner. This paper examines roles for
reliability analysts in the economic arena forced upon them
by these changing economic conditions.
- 1. FINANCIAL CONCERNS of the 1990s
So far, the last decade of the century and the millennium
has been a time of turmoil and uncertainty, as old and
established firms disappear through consolidations and mergers. Many established older firms have been gathered
under the aegis of a financial holding firm. Holding firms
stress financial performance, with an emphasis on recent
successes. The bottom line dominates today's corporations;
the emphasis is on profits. Particularly in American
corporations, chief executive officers (CEOs) or presidents
are rewarded for reducing corporate expenses while
maintaining or increasing sales, so that the corporation's
profit is increased. Profit is the key measure of success.
While American firms emphasize profits, they pay lip service
to quality. For example, Godfrey and Kolesar state:
The failure of American competitiveness is largely a failure to manage quality, while the key to Japanese success is an almost compulsive and fanatical attention to managing for quality.[2]
Despite a demonstrated need to compete on the quality
front, American firms have assigned higher priority to the
financial aspects of running the firm; their policies show
that financial matters are given higher priorities than the
quality aspects. Numerous corporate financial practices show
the higher priority assigned to financial aspects; three
practices in particular have significant impact on the
reliability of computer systems in end-of-life, or
approaching it. These are:
(1) Expenses Reduction
(2) Outsourcing
(3) Downsizing
Expenses reduction is a brief title summarizing the efforts
of today's corporations to reduce all expenditures by a certain
percent each year. Thus, the corporation's financial manager
may elect to examine each category of expense, assign a
reduction percentage goal, (sometimes rather arbitrarily) and
judge the performance of the department or division by its
success in achieving the percent reduction. If a holding
company has acquired the refinery or mill, the percent
reduction goal is often set arbitrarily in a board room in London or New York.
Certainly, an annual reduction in expenses can be
considered a laudable goal. The human condition being what it
is, the tendency is for budgets to grow year by year. This
annual reduction is of immediate concern to the reliability
analyst working with a computer system which might be
reaching end-of-life.
Maintenance expenses are often the prime candidate for
this annual exercise in reduction of expenses. Maintenance
expenses often appear to be expended arbitrarily, and so to be more controllable: in some years there is a surplus in the
maintenance account, and in some years there is a deficit.
Implicit in an annual percent reduction in maintenance
budget is the assumption that the failure rate of equipment
per unit of time will be constant. For example, the
assumption may be that each month's maintenance cost will
be approximately the same. This assumption means that the
budget analyst has considered the equipment to be in the
chance failure rate area, or useful life area, of the bathtub
curve. This constant failure rate means that an average
number of modules will fail per unit of time (say, monthly) and therefore, an annual percent reduction can be
applied to the maintenance budget.
There are two statistical fallacies in this assumption.
First, this policy does not effectively consider that an
average means that many modules may fail in the first month
of the period, while few modules may fail in the last month
of the period. Over the entire year, modules may therefore
fail at an average rate, but expenses may vary dramatically
from fiscal period to fiscal period. Implicitly, the
financial analyst expects the same number of failures each
month (or time period). Second, the system's modules may actually be in the wear out portion of the bath tub curve.
This means that the failure rate will be increasing as the
modules grow older.
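The first fallacy can be illustrated numerically. The sketch below assumes that failures in the useful life area follow a Poisson process, a common modeling assumption which this paper does not itself state, and it uses a hypothetical average of four module failures per month. Even with a truly constant failure rate, the monthly count scatters widely around the average.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k failures in a period whose expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_at_least(k, mean):
    """Probability of k or more failures in the period."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))


# Hypothetical figure: four module failures expected per month.
MEAN_PER_MONTH = 4.0

std_dev = math.sqrt(MEAN_PER_MONTH)                    # Poisson: variance equals the mean
p_double_or_more = prob_at_least(8, MEAN_PER_MONTH)    # a month with 8 or more failures
p_half_or_less = 1.0 - prob_at_least(3, MEAN_PER_MONTH)  # a month with 2 or fewer failures

print(f"standard deviation: {std_dev:.1f} failures per month")
print(f"P(8 or more failures in a month): {p_double_or_more:.3f}")
print(f"P(2 or fewer failures in a month): {p_half_or_less:.3f}")
```

Under these assumed figures, about 5 percent of months would see at least twice the average number of failures and about 24 percent would see half the average or fewer, so a budget fixed at the monthly average would be overrun or underrun regularly without any change in the underlying failure rate.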
Assumption of Average.
This assumption of an average number of failures per unit of time is similar to the budget process for snow removal in
a New England town. After a record-breaking snow fall in the
Winter of 1993-1994, a relatively small amount of snow fell
in the next season, the Winter of 1994-1995. Boston received
slightly more than 14 inches of snow.
Under the conditions of many corporations' expense
reduction programs, the surplus in the budget for the Winter
of 1994-1995 would mean that the budget for 1995-1996 season
would be cut by the maximum percentage rate. This maximum cut
would be based on the fact that the entire budget was not
consumed in the previous year, so therefore, obviously, the
budgeted amount of money was not needed to perform the
mission. The expense budget for the next year would be
reduced accordingly.
Nature did not cooperate. Snowfall in the Winter of
1995-1996 again broke the record (more than 100 inches in
the City of Boston), and most Massachusetts towns' snow
removal budget was expended before the season was half over.
This is an example of the difficulties in weather prediction
in the United States and how difficult it is to model
seasonal precipitation.[3, 4] This anecdote illustrates the
difficulty of adjusting a budget according to the averages of
a naturally occurring phenomenon.
Constant Failure Rate.
The second fallacy is the implicit assumption that the
failure rate of the modules in the system is constant over
time, i.e. the modules are in their chance failure rate area
or useful life area. In actuality, some of the modules may
already be exhibiting an increasing failure rate, which
indicates that they are reaching end-of-life. This wear out
condition will be masked by the use of the maintenance budget
as the key indicator to the health of the system.
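The contrast between the useful life area and the wear out area can be sketched with the hazard function of the two-parameter Weibull distribution, h(t) = (β/η)(t/η)^(β-1). The shape and scale values below are hypothetical and are chosen only for illustration; they are not measurements from any system discussed in this paper.

```python
def weibull_hazard(age, beta, eta):
    """Instantaneous failure rate of a two-parameter Weibull at a given age.

    beta is the shape parameter, eta the scale parameter (same units as age).
    """
    return (beta / eta) * (age / eta) ** (beta - 1.0)


# Hypothetical parameters: scale of 15 years for both populations.
ETA_YEARS = 15.0

for age in (5.0, 10.0, 15.0):
    constant = weibull_hazard(age, beta=1.0, eta=ETA_YEARS)   # useful life: flat rate
    wearing = weibull_hazard(age, beta=3.0, eta=ETA_YEARS)    # wear out: rising rate
    print(f"age {age:4.1f} y  beta=1: {constant:.4f}/y  beta=3: {wearing:.4f}/y")
```

With a shape parameter of 1 the rate is the same at every age, which is the assumption built into a flat maintenance budget. With a shape parameter above 1 the rate climbs with age, so the same budget falls further behind each year.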
Outsourcing: contracting with a firm outside the
corporation for the performance of certain routine services.
Outsourcing is a common policy in today's corporations.
Corporations make the conscious decision to eliminate
in-house service functions such as janitorial services,
landscaping and snow removal and cafeteria services. In turn,
the corporation purchases these services from an outside
contractor, whose speciality is the efficient delivery of the
services. Outsourcing can remove a substantial number of
workers from the employee rolls. This reduction in force
saves the corporation the cost of employee benefits, such as
health care, for each employee no longer needed.
The corporation gains in two financial measures:
(1) profitability, as the profits per employee will go up
(2) reduction in some corporate costs as benefits for
employees are reduced.
Some sectors of the process control industries, (e.g. oil
refineries), have shown a readiness to out-source maintenance
services. If the maintenance personnel no longer work
directly for the actual customer, the burden on the reliability
analyst is increased. Records of repairs and downtime of the
system are usually no longer kept at the Customer's site, but
may be maintained at a remote location, if the records are
kept at all. If the records are not kept, then a valuable resource for demonstrating the achieved reliability of the
system is lost. If records are kept remotely from the actual
location of the computer system, then generally the records
are not being analyzed by a reliability analyst. If the records are not being analyzed from a reliability
viewpoint, the opportunity to identify wear out trends early
is lost. The reliability effort has become reactive rather
than proactive.
Downsizing is the intentional elimination of positions at a corporation through attrition or layoffs. Downsizing is integrally intertwined with outsourcing, and both may occur
simultaneously. There are usually two common features with
attrition due to downsizing:
(1) early retirement for the older workers, and
(2) not filling open positions when an individual quits.
There is a particularly insidious impact on the work of
the reliability practitioner when attrition means the
elimination of older workers. Typically, these older workers
have been servicing the computer control system for literally
decades. They grew up with the system, knew what problems
were encountered in the infant mortality period of the
system's life and they helped to maintain the system through
its useful life. Therefore, these older workers have become
intimately familiar with the history of incidents and repairs
on the computer control system. This intimate knowledge means
that the older workers are able to diagnose the majority of
difficulties fairly rapidly. This rapid diagnosis is a
function of the amount of experience accumulated by the
older workers.
If, due to a policy of attrition, the older workers leave
a process plant, then the time required to diagnose and
resolve a problem will increase. The plant's management can
interpret this increase in repair time as either a growing
inefficiency in the maintenance department or a decreasing reliability in the process control computer system. If the
increase in repair time can be attributed to the learning curve which new workers must experience, then attrition has
surreptitiously affected the perceived reliability of the
system.
Management's perception will be that the system requires
longer to repair. If system downtime has occurred and
increased due to inexperienced repair personnel, management's
perception will be that the system has achieved lower availability. Financial factors have affected reliability.
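The effect of longer repair times on perceived availability can be sketched with the usual steady-state expression, availability = MTBF / (MTBF + MTTR). This formula is standard but is not stated elsewhere in this paper, and the hours used below are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: the equipment itself is unchanged (same MTBF),
# but repairs take three times as long after experienced staff leave.
MTBF_HOURS = 2000.0
experienced = availability(MTBF_HOURS, mttr_hours=4.0)
inexperienced = availability(MTBF_HOURS, mttr_hours=12.0)

print(f"experienced crew:   {experienced:.4%}")
print(f"inexperienced crew: {inexperienced:.4%}")
```

In this sketch the failure rate of the hardware never changes, yet the computed availability falls, which is the sense in which attrition alters the perceived reliability of the system.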
In the factory or mill, the emphasis on financial
successes has resulted in a reduction in staff, which, in
turn, means a reduction in the emphasis on reliability.
Capital Budget vs. Operating Budget: for the sake of
completeness, one last financial aspect, which affects
systems in wear out, should be considered. For tax purposes and following general accounting practices, budgets are
usually broken down into expenditures of funds for capital
and expenditures of funds for expenses. Capital is mainly
defined as plant and equipment. Capital consists of the
stores, factories, mills and buildings throughout corporate
America. The equipment used to run a mill or factory is also
considered capital. Interest is defined as the return on
investment which is paid to the owners of the capital.
Funds designated for the purchase of capital equipment
are accounted for in the capital budget. The capital budget
accounts for funds used for the purchase of equipment,
buildings and land, while the Operating budget accounts for
funds used to provide material which is normally consumed in
the course of production.
In terms of the family automobile, the money spent to
purchase the car is the capital expense, carried in the
capital budget, while the operating expense includes expenditures on gasoline, oil, tires, etc, which are required
to keep the car running.
- 2. RELIABILITY and WEAR OUT
Wear out is the last portion of the classical bath tub
curve. In the wear out portion of the bath tub curve, or
end-of-life, the equipment exhibits an increasing failure
rate with time. P.D.T. O'Connor describes wear out as the increasing failure rate due to "... a failure mode which
does not occur for a finite time, then exhibits an increasing
rate of occurrence" and is "... characteristic of fatigue
brought about by strength deterioration due to cyclic
loading."[5]
For many reasons, process control plants are notorious
for using computer control systems beyond their expected
design life. Some of these reasons are the remoteness of
site, international trade barriers and contentment with the
status quo. "If it's not broken, don't fix it!" The process
control industries include chemical plants, oil refineries,
textile and paper mills and food and drug manufacturing
plants.
In a similar fashion, there are other diverse factors
which require government agencies to use computer systems
beyond their expected life. Correspondingly, there are many
government programs concerned with the aging of electronic
and other systems. Government agencies must properly address
the maintenance of aging systems. Chockie and Bjorkelo cite
four such programs in their 1992 paper.[6]
U.S. Air Force B-52 Bomber Program
U.S. Navy Ballistic Submarine Program
U.S. Commercial Aviation Industry
Japanese Nuclear Power Industry
These four examples show how diverse wear out is: from
aircraft above in the skies to submarines below in the sea.
Wherever electronic computer systems are used, certain components or modules reach end-of-life before the other
components or modules. In a complex computer control system,
modules or components which are heavily stressed are
generally the first to reach wear out. Further, some
components have an inherent end-of-life failure mechanism which means those components will reach a relatively early
end-of-life. Components which have both an early end-of-life
failure mechanism and a usage in a heavily stressed condition
will be among the first to reach end-of-life. An example of
such a component is an aluminum electrolytic capacitor.
Experience has demonstrated that aluminum electrolytic
capacitors, acting as ripple filtering elements in large
power supplies, have both an early end-of-life mechanism and
exposure to heavy stresses. These aluminum electrolytic
capacitors were the first to reach end-of-life in a process control analog computer. As a function of the local
ambient temperature, the expected age for end-of-life is
about 10 years. In actuality, the aluminum electrolytic
capacitors were achieving a life ranging from 10 to 18
years.
The case of the aluminum electrolytic capacitors
illustrates the general principles to be followed:
(1) Identify critical components or modules.
(2) Track the field performance of these modules
or components.
(3) Use maintenance records as feedback on both the
critical modules and the general range of modules
in the system.
(4) Compare the performance of the general population
of modules/components with the performance of the
modules/components in the Customer's plant.
(5) Deal with modules or components which are not
performing as well as the general population.
(6) Use Weibull plotting to ascertain if the selected
modules or components are in the wear out mode, i.e.
the shape parameter, β, greater than 1.0: Wear Out
Area.
Any good reliability textbook will present the method of
rank ordering the data and plotting on Weibull probability
paper. Interestingly, a good reference on Weibull plotting is a book on burn-in.[8] In a two-parameter Weibull, the shape parameter, β, may range in value from less than unity to
large positive numbers. Any value of β greater than unity
means that the data show a unit in wear out. Other values of
the shape parameter correspond to various mathematical
functions; for example, when β = 3.44, the Weibull function
approximates the Normal function, so if β = 3.44, then the
extensive theory on the Normal curve can be applied to
develop probability of successful operation of the module
over time.
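The rank ordering and plotting procedure can also be carried out numerically. The sketch below is one common way to do it, not necessarily the procedure given in the cited textbooks: it assumes Benard's approximation for the median ranks and an ordinary least-squares line through the linearized points, and it assumes every unit in the sample has failed (no units still in service). The failure ages are hypothetical.

```python
import math


def median_ranks(n):
    """Benard's approximation to the median rank of each of n ordered failures."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def fit_weibull(failure_ages):
    """Fit a two-parameter Weibull by least squares on the linearized plot.

    Returns (beta, eta): the shape and scale parameters.
    """
    ages = sorted(failure_ages)                        # rank order the data
    n = len(ages)
    x = [math.log(t) for t in ages]                    # horizontal axis: ln(age)
    y = [math.log(-math.log(1.0 - f)) for f in median_ranks(n)]  # ln(-ln(1 - F))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx                                   # slope of the fitted line
    eta = math.exp(mean_x - mean_y / beta)             # intercept equals -beta * ln(eta)
    return beta, eta


# Hypothetical ages at failure, in years, for seven modules of one type.
hypothetical_ages = [10.2, 11.5, 12.8, 13.9, 15.1, 16.4, 17.6]
beta, eta = fit_weibull(hypothetical_ages)
in_wear_out = beta > 1.0

print(f"shape beta = {beta:.2f}, scale eta = {eta:.1f} years, wear out: {in_wear_out}")
```

The slope of the fitted line is the shape parameter, so the test for wear out in step (6) reduces to checking whether the fitted slope exceeds 1.0. With so few points the estimate is rough, and a plot should still be inspected for curvature or outliers.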
Experience in the process control industries shows that
electronic control systems are being used beyond their design life. For example, the specified design life for an analog
process control computer system was ten years; at the date of
this writing, some of the older systems have been in use for
some 24 years.
Experience in the process control industries shows that
equipment reaches end-of-life at different times, even though the system start-up date was the same.
All equipment entered service on the same date. Given the
same inception date, some modules will reach wear out sooner
than others due to:
(1) the nature of the components in the module
(2) the kind and extent of stresses imposed upon the
module and
(3) the amount of human intervention with the module.
In general, the greater the amount of human intervention,
the quicker the module will reach end-of-life.[7]
- 3. FINANCIAL ASPECTS of WEAR OUT
In an ideal world, the end user of the control system
would track the fall out of the various modules, determine if
and when the modules are exhibiting an increasing failure
rate, and take remedial action before the overall
availability of the system was affected. This is proactive, not reactive. This means that remedial action would prevent
the increasing failure rate of the system's modules from
becoming excessive. Remedial action could mean the
replacement of the module or refurbishment of the module. In the actual world, financial matters have been given priority over reliability issues, and modern corporations have lost
their reliability edge.
This paper handles the financial aspects of wear out
under three headings:
(1) Expenses Reduction
(2) Outsourcing
(3) Downsizing
A brief anecdotal section on capital budgets versus
expense budgets is included for completeness.
Expenses reduction generally assumes a constant failure
rate for the modules in the process control system.
The financial manager will attempt to reach the percent
reduction goal in each fiscal period. With a system with
modules reaching end-of-life, reliability theory states that
failure rate for certain modules will not be constant with
time. Their failure rate will be increasing. With an
expenses reduction program in place, the first indication of
wear out will be increased expenditures for older equipment.
By its very nature, the category of increased expenditures
is a lagging indicator. Therefore, using the repair expenses
ledger as an indicator of wear out means that the
corporation is taking a reactive rather than a proactive
stance. The overall reliability and availability of the
system can be threatened.
If inflation alone is considered, the price of the
replacement module is often many times greater than the
original equipment price. Remember that a substantial number
of the systems were purchased in the late 1970s. Financial analysts check records and the "high" 1990 price, alone, is
often enough to inhibit the purchase of the replacement
module. Recall that the financial person is measured by
goals of percent reduction in expenses.
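The arithmetic of inflation alone can be sketched as compound growth. The original price and the annual rate below are hypothetical and serve only to show the mechanism; they are not taken from actual price records.

```python
def escalated_price(original_price, annual_rate, years):
    """Price after compounding a constant annual inflation rate for a number of years."""
    return original_price * (1.0 + annual_rate) ** years


# Hypothetical figures: a module bought in 1978 for 1,000 dollars,
# repriced in 1990 under an assumed 6 percent annual inflation rate.
ORIGINAL_PRICE = 1000.0
price_1990 = escalated_price(ORIGINAL_PRICE, annual_rate=0.06, years=1990 - 1978)
ratio = price_1990 / ORIGINAL_PRICE

print(f"1990 price: {price_1990:,.0f} dollars ({ratio:.2f} times the 1978 price)")
```

At the assumed rate the price roughly doubles in twelve years before any change in the product itself is counted; a higher rate or a longer interval compounds further. An analyst comparing the two ledger entries without this adjustment sees only a much higher number.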
Outsourcing: contracting with a firm outside the
corporation for the performance of certain routine services.
This corporate policy, while good for the bottom line, means
that the work for the reliability analyst is typically
increased. This increase in work is due to the necessity of
contacting another group (the outside contractor) and
examining records at another site. Often enough, the process control plant, e.g. the oil refinery, will eliminate the
corporate reliability person (Downsizing) and will rely
entirely on the outside contractor for reliability analyses.
When the original equipment manufacturer is called in,
typically when the maintenance budget is "busted" due to an
increased failure rate, the manufacturer's reliability
analyst will have to deal with two different groups:
(1) the owner of the system suffering wear out and
(2) the outside contractor performing the maintenance.
This divided attention means that two groups have to be
convinced that the equipment is in end-of-life and remedial
action has to be approved by the two different groups.
Further, the outside maintenance contractor typically has
contracts with many different process control plants. This
may mean that trends and early indicators of wear out may be
ignored by the contractor. Reliability analyses will suffer.
Downsizing, the intentional elimination of positions at a
corporation, results in the loss of experienced personnel.
These experienced personnel are more efficient at
troubleshooting, and therefore can reduce the
expenses of such actions. Once they are gone, efficiency is reduced, and
it appears to the financial managers that the reliability of
the system is degrading due to:
(1) increased time required to repair the system and
(2) possible concomitant increase in downtime.
With the first wave of failures of equipment in wear out,
say, the power supplies, the financial analysts will express concern with the budget-busting increase in expenses. Power
supplies are expensive. Without a reliability crew, more
failures of the power supplies may just be considered a
statistical quirk. Reliability analysts would view the
increasing failure as an early warning indicator of wear out.
Due to the reliance on financial records as an indicator of the health of the system, the entire program has become
reactive. Financial records are lagging records; it is
inherent that tracking the costs of repair actions and
replacement modules must lag the actual events by a
substantial amount of time.
Due to outsourcing, downsizing and expense reduction, the
remaining maintenance personnel at the process control plant
may not even be aware that the equipment is exhibiting an
increasing failure rate. End-of-life issues would be
ignored!
Capital Budget vs. Operating Budget: Investment in a
system or module is limited by the corporation's financial
rules. Financial rules, especially if a holding company is
involved, often require an exaggerated high potential rate of
return on investment. Financial managers may require, for
example, a rate of return on investment which is equal to
more than twice the corporation's cost of capital.
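The effect of such a rule can be sketched with a net present value calculation. The project cost, the annual saving, the project life and both rates below are hypothetical; the only element taken from the text is the idea of a required return set at twice the cost of capital.

```python
def net_present_value(rate, initial_cost, annual_saving, years):
    """Net present value of a project paying a level annual saving at each year end."""
    discounted = sum(annual_saving / (1.0 + rate) ** y for y in range(1, years + 1))
    return discounted - initial_cost


# Hypothetical refurbishment project: 100,000 dollars now,
# saving 30,000 dollars per year in repair expenses for five years.
COST_OF_CAPITAL = 0.10
REQUIRED_RETURN = 2.0 * COST_OF_CAPITAL   # the "twice the cost of capital" rule

npv_at_cost_of_capital = net_present_value(COST_OF_CAPITAL, 100_000.0, 30_000.0, 5)
npv_at_required_return = net_present_value(REQUIRED_RETURN, 100_000.0, 30_000.0, 5)

print(f"NPV at cost of capital:  {npv_at_cost_of_capital:,.0f} dollars")
print(f"NPV at required return:  {npv_at_required_return:,.0f} dollars")
```

Under these assumed figures the project more than pays for itself at the cost of capital, yet shows a negative value at the required rate and would be rejected. This is the sense in which an exaggerated rate of return limits investment in a system that is wearing out.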
Refurbishment: if a proactive team approach has
discovered that modules are experiencing an increasing
failure rate, then remedial action is indicated. With the
analog computer systems of my experience, the regulated power
supplies were often the first to reach end-of-life. Rather
than replacing the entire supply, the most cost effective
method of re-vitalizing the supply was to replace the
aluminum electrolytic capacitors which filter the ripple
voltage. Replacement of the capacitors is considered
refurbishment.
Is refurbishment a capital expense or an operating
expense?
If a capital expense, the purchase of the refurbishment
services and the equipment involved may be delayed by the
Byzantine capital investment rules. If an operating budget
expense, the cost of such a large service will generally
overrun the allocated operating expense budget. Neither
situation is easy to handle.
Migration: Many process control computer manufacturers
have plans to migrate from the older, analog process
computers to the versatility of modern, digital process control computers. These migration schemes permit replacement
of the control portion of the analog system while permitting
the input/output modules to remain untouched. Experience has
shown that the new equipment required for migration can be
accounted for in either the capital equipment budget or the
operating expenses budget. It depends upon the creativity of the analyst.
- 4. CONCLUSIONS
This paper has demonstrated that the wear out of older
process control systems must be considered from both the
reliability dimension and the financial viewpoint. In
today's highly competitive world village, the success of any
replacement or refurbishment project will depend upon correctly dealing with both aspects of the situation.
REFERENCES
[1] Ralph Estes, The Tyranny of The Bottom Line, (New York, The Free Press, 1995)
[2] A. Blanton Godfrey and Peter J. Kolesar, "Role of Quality in Achieving World-Class Competitiveness", in Getting the U.S. Back on Track, Edited by Martin K. Starr, (New York: W. W. Norton, 1988) p. 216
[3] David Laskin, Braving the Elements, The Stormy History of American Weather, (New York, Bantam Doubleday Dell Publishing Group, 1996)
[4] Henry Stommel & Elizabeth Stommel, Volcano Weather, The Story of 1816, The Year Without A Summer, (Newport, Rhode Island, Seven Seas Press, Inc. 1983)
[5] Patrick D.T. O'Connor, Practical Reliability Engineering, John Wiley & Sons, New York, 1981, pp. 44-45.
[6] Alan Chockie and Kenneth Bjorkelo, "Effective Maintenance Practices to Manage System Aging", Proc. Ann. Reliability & Maintainability Symposium, 1992, pp. 166-170.
[7] John Peter Rooney, "Wear Out In Electronic Systems", Proc. 33rd IEEE Ann. Spring Reliability Symposium, Boston Section IEEE, April 20, 1995
[8] Finn Jensen & Niels Erik Petersen, Burn-in, An Engineering Approach to the Design and Analysis of Burn-in Procedures, John Wiley & Sons, New York, 1982, pp. 44-45.
BIOGRAPHY
John Peter Rooney is a Consulting Reliability Engineer in the Corporate Quality Assurance Group, at The Foxboro Company, where he began work in June, 1970. The Foxboro Company, 88 years old, was one of the winners of the first Massachusetts Quality Award, (the Armand V. Feigenbaum Award) and is also ISO 9001 Certified.
Mr. Rooney, a Native New Yorker, received a B.E.E. from Manhattan College in 1965. In 1969, he was awarded an M.S. in Electrical Engineering from Newark College of Engineering, now New Jersey Institute of Technology. Mr. Rooney has worked on an MBA at Boston University, and is presently working on his MA in History at Bridgewater State College. John first appeared before IEEE Annual Spring Reliability Symposium in 1981. In 1993 and again in 1996, John served as an Examiner for the Massachusetts Quality Award. He is a member of the American Catholic Historical Association and a Senior Member of the IEEE. He is also a veteran, United States Navy.