icarus: design and deployment of a case-based reasoning system for locomotive diagnostics

10
ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics Anil Varma*, Nicholas Roddy Information Technology Laboratory, General Electric Corporate Research and Development, PO Box 8, Schenectady, Niskayuna, NY 12301, USA Received 1 December 1998; accepted 1 August 1999 Abstract Locomotives, like many complex modern machines, are equipped with the capability to generate on-board fault messages indicating the presence of anomalous conditions. Such messages tend to be generated in large quantities, and are dicult and time consuming to interpret manually. This paper presents the design and development of a case-based reasoning system for diagnosing locomotive faults using such fault messages as input. The process of using historical repair data and expert input for case generation and validation is described. An algorithm for case matching is presented, along with some results on pilot data. # 1999 Elsevier Science Ltd. All rights reserved. Keywords: Case-based reasoning; Locomotives; Diagnosis; Feature extraction; Feature weighting; Fault codes 1. Introduction There is a recent move in industry towards support- ing equipment servicing as a means of augmenting tra- ditional revenue sources such as those generated by equipment sales with limited warranties and sub- sequent parts supply. This is especially applicable in the case of heavy machinery which, due to its design complexity, is often best serviced by the manufacturer. Examples of such cases include gas turbines, aircraft engines and locomotives. With the emergence of long- term service contracts for such equipment, it is essen- tial that the manufacturers minimize the cost of service by proactive on-board and o-board monitoring and diagnosis. Within this context, this paper describes the development of ICARUS (Intelligent Case-based Analysis for Railroad Uptime Support) a case- based reasoning tool for o-board locomotive diagno- sis for use by GE Transportation Systems. Locomotives are complex electromechanical systems, and are equipped with the capability to monitor their state and generate fault messages in response to anom- alous conditions of varying severity. Since removing a locomotive from a track for repair (or powering down a gas turbine or removing an airplane engine from wing) is an extremely expensive and disruptive pro- cedure, it is desirable that 1. problems occurring on the equipment while in oper- ation are accurately identified so the repair can be scheduled in keeping with the severity of the pro- blem, and 2. problems with the equipment are completely ident- ified so the time in the repair shop is utilized not merely in fixing one problem, but in releasing an overall healthy machine. ICARUS was designed to reason with the fault codes generated by locomotives during operation. In ad- dition, it was a requirement that ICARUS should be able to build up its information base quickly for rapid deployment and have the capability to learn as new in- formation became available. It was also required that the tool be applicable across locomotive models and fleets with little modification. This project presented three challenges that were Engineering Applications of Artificial Intelligence 12 (1999) 681–690 0952-1976/99/$ - see front matter # 1999 Elsevier Science Ltd. All rights reserved. PII: S0952-1976(99)00039-1 www.elsevier.com/locate/engappai * Corresponding author. Tel.: +1-518-387-5130; fax: +1-518-387- 6104. E-mail address: [email protected] (A. Varma).

Upload: anil-varma

Post on 05-Jul-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

ICARUS: design and deployment of a case-based reasoningsystem for locomotive diagnostics

Anil Varma*, Nicholas Roddy

Information Technology Laboratory, General Electric Corporate Research and Development, PO Box 8, Schenectady, Niskayuna, NY 12301, USA

Received 1 December 1998; accepted 1 August 1999

Abstract

Locomotives, like many complex modern machines, are equipped with the capability to generate on-board fault messagesindicating the presence of anomalous conditions. Such messages tend to be generated in large quantities, and are di�cult andtime consuming to interpret manually. This paper presents the design and development of a case-based reasoning system for

diagnosing locomotive faults using such fault messages as input. The process of using historical repair data and expert input forcase generation and validation is described. An algorithm for case matching is presented, along with some results on pilotdata. # 1999 Elsevier Science Ltd. All rights reserved.

Keywords: Case-based reasoning; Locomotives; Diagnosis; Feature extraction; Feature weighting; Fault codes

1. Introduction

There is a recent move in industry towards support-

ing equipment servicing as a means of augmenting tra-

ditional revenue sources such as those generated by

equipment sales with limited warranties and sub-

sequent parts supply. This is especially applicable in

the case of heavy machinery which, due to its design

complexity, is often best serviced by the manufacturer.

Examples of such cases include gas turbines, aircraft

engines and locomotives. With the emergence of long-

term service contracts for such equipment, it is essen-

tial that the manufacturers minimize the cost of service

by proactive on-board and o�-board monitoring and

diagnosis. Within this context, this paper describes the

development of ICARUS (Intelligent Case-based

Analysis for Railroad Uptime Support) Ð a case-

based reasoning tool for o�-board locomotive diagno-

sis for use by GE Transportation Systems.

Locomotives are complex electromechanical systems,

and are equipped with the capability to monitor theirstate and generate fault messages in response to anom-alous conditions of varying severity. Since removing alocomotive from a track for repair (or powering downa gas turbine or removing an airplane engine fromwing) is an extremely expensive and disruptive pro-cedure, it is desirable that

1. problems occurring on the equipment while in oper-ation are accurately identi®ed so the repair can bescheduled in keeping with the severity of the pro-blem, and

2. problems with the equipment are completely ident-i®ed so the time in the repair shop is utilized notmerely in ®xing one problem, but in releasing anoverall healthy machine.

ICARUS was designed to reason with the fault codesgenerated by locomotives during operation. In ad-dition, it was a requirement that ICARUS should beable to build up its information base quickly for rapiddeployment and have the capability to learn as new in-formation became available. It was also required thatthe tool be applicable across locomotive models and¯eets with little modi®cation.

This project presented three challenges that were

Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690

0952-1976/99/$ - see front matter # 1999 Elsevier Science Ltd. All rights reserved.

PII: S0952-1976(99 )00039 -1

www.elsevier.com/locate/engappai

* Corresponding author. Tel.: +1-518-387-5130; fax: +1-518-387-

6104.

E-mail address: [email protected] (A. Varma).

Page 2: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

typical of real-world requirements deviating from text-book theory. First, diagnostic cases for the case basewere not readily available, and had to be constructedby mining historical repair records. Their accuracy wasnot guaranteed, and case validation became an essen-tial activity in itself. Very limited expert knowledgeand time were available to fully validate the cases.Association of fault codes to speci®c repairs was di�-cult due to the standard railroad practice of multiplerepairs on the same visit, as well as some uncertaintyabout the accuracy of the date of repair. Finally, thecontinuous nature of the fault logs made casting a caseas a ®nite feature vector di�cult.

This paper ®rst presents the current operating scen-ario and the nature and availability of the data. Itthen presents details of the process for generatingmeaningful cases. A new feature-extraction andweighting algorithm is described as well as the resultsobtained from its implementation. This work is thenrelated to prior activity in the literature. Finally somelessons learned from the project about CBR designand deployment are discussed.

2. Overview of the current process

Locomotive fault logs are accumulated on-board thelocomotive and are periodically uploaded to a centraldatabase for access in case a diagnostic need arises.Highly skilled ®eld engineers at General ElectricTransportation Systems have acquired expert knowl-edge that enables accurate diagnosis of locomotiveproblems from an examination of the fault log. Whilethis provides positive evidence of the diagnostic signi®-cance of fault logs, the volume of logged data makes itimpossible to rely on human examination alone for re-liable and consistent identi®cation of locomotive pro-

blems on many hundreds of locomotives on a dailybasis.

A case-based approach was considered as this wascognitively closest to the procedure used by the expertsduring diagnosis. A rule-based approach was discour-aged for several reasons. Some of these may be out-lined as follows.

1. Accurate rules existed for only a small percentageof the locomotive's failure modes. The other failuremodes happened in a manner too varied to capturein rules.

2. Frequent con®guration changes and upgrades makerule-based approaches hard to maintain.

3. Capturing knowledge as cases appeared to be thebest approach for maintaining knowledge in aremote diagnostics environment where the diagnos-tic personnel were not necessarily all experts.

4. There was a realization that many more patternsmay exist in the fault log data than any one personwas aware of or could create rules for. There was apreference for a learning approach for identifyingthese from case data.

5. It was desirable to deploy a functional system fairlyrapidly, from concept to pilot operation in less thana year. However, experts had severe time constraintswhile there was considerable historical data thatcould potentially be mined for cases.

The objective of the project, thus was to build a toolthat could take the fault log shown in Fig. 1 as inputand output the top N repair codes with associated con-®dence values.

2.1. Historical fault log and repair data

A sample fault log is shown in Fig. 1. While theactual data constituting the log has been changed or

Fig. 1. Sample locomotive fault log.

A. Varma, N. Roddy / Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690682

Page 3: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

masked, the essential features of the log are present.The ®rst two columns identify the company that isoperating the locomotive and the speci®c locomotiveID, respectively. The next column indicates the dateon which the fault occurred. The fault itself is ident-i®ed by a `fault code', a unique alphanumeric label as-sociated with that fault. Next, a time stamp indicatesthe beginning and end of the fault message. The nextfew columns marked with `X' represent a `snapshot' ofsome important operating parameters of the locomo-tive at the time the fault occurred. Representativeexamples of such parameters would be speed, operat-ing temperatures, pressure readings, whether certainswitches are on or o�, etc. Finally, a short textdescription associated with the fault code is displayed.

The fault log represents an arbitrarily long datastream with new data ¯owing in every day.Locomotive experts intimately familiar with the engin-eering speci®cs of the locomotive are able to look atthe log alone and usually identify patterns that may in-dicate problems. It is worthwhile noting that multiple

problems may be occurring simultaneously in the loco-motive, and will register a presence in the fault log.However, not all problems are critical enough to stopthe basic operation of the locomotive, and these ac-cumulate till they are attended to during a regular ser-vice visit.

Since there was no direct association from individualfault codes to actual locomotive problems requiringrepair, this information was sought to be implicitlyacquired. A source of data used for this purpose wasthe repair database maintained by the manufacturer.The nature of the repair log was as shown in Fig. 2.

The basic operation of the tool required use of thefault log as input, and then recommending a repairaction with associated con®dence. For reference, therewere about 600 distinct faults that could be logged and700 repair actions that could be taken. In addition, therepair actions could be logged in the database up to aweek later than the actual physical repair.

An approach was de®ned wherein candidate caseswould be generated to `seed' the case base by mining

Fig. 2. Sample locomotive repair Log.

Fig. 3. Process for case acquisition.

A. Varma, N. Roddy / Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690 683

Page 4: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

historical fault log and repair records. These caseswould then be minimally validated by format checkingand checking for missing data. It was acknowledgedthat many of these cases could be diagnostically incor-rect. This may be due to the fact that either an incor-rect repair was performed that did not actually addressthe problem that was causing activity in the fault logsor that the repair was incorrectly dated, or that therewere multiple problems not all of which wereaddressed.

3. Process of de®ning and acquiring cases

The ®rst task was to build candidate cases from his-torical fault log and repair records. A program waswritten to interleave the repair log with the fault log.A 2-year data window was chosen for prototype casegeneration, with data gathered for over 200 locomo-tives. The process of raw case generation was as fol-lows.

1. A particular repair type (diagnosis) was chosen forcase collection.

2. All locomotives' repair records were sequentiallyscanned for occurrence of that repair.

3. Every time that repair was encountered on a loco-motive, a case was generated that contained thefault log contents for the N days preceeding thatrepair.

This process is shown graphically in Fig. 3.Each case was labeled by the repair code Ð the

intended diagnosis. Multiple cases were collected foreach repair code to capture the di�erent fault codescenarios leading to that repair being the diagnosis.This process was repeated for many repair codes forwhich case-base coverage was required.

Each case so produced was still beset with certainproblems. These included the following ones.

1. Insu�cient or missing fault log data: since the faultlog is fairly continuous in nature, gaps of manydays in the log indicated missing data rather thanabsence of faults.

2. Multiple repairs on the same day: there were manyinstances of three to ®ve repairs performed on thesame day. The process described above resulted inmany cases with identical fault logs but associatedwith di�erent repairs.

3. Overlapping fault logs for repairs: if there were mul-tiple repairs within N days of each other, theyshared a common portion of the fault log for thoseN days.

For the conditions listed above, heuristics were used toweed out cases that could possibly `contaminate' thecase base. Cases with missing fault logs were elimi-

nated. Due to the large quantities of historical dataavailable, it was possible to restrict case selection toonly those instances where there was only one repairon a given day. Nothing was done to correct for situ-ation 3 since it was reasoned that the e�ect of overlap-ping fault logs would be reduced with appropriate casefeature weighting if su�cient cases for each of theoverlapping repairs were collected.

Each case was reviewed for feasibility by an experi-enced ®eld engineer. The task of the ®eld engineer wasto review and eliminate any case in which the fault logwas obviously not related to the repair performed.This task of saying yes or no placed a lower cognitiveburden on the experts as compared to verifying eachcase as a `gold' standard for that kind of repair.About 500 cases were collected by following this pro-cedure.

4. Feature extraction

Much CBR work has implicitly assumed the avail-ability of a ®nite number of indices by which tocharacterize a case. This is not always true, however,as evidenced by the ultrasonic rail inspection systemapplication reported in (Jarmulak et al., 1997). Thiswas certainly not true of the basic case structure here.The N-day fault log constituting each case could con-tain from zero to an indeterminate number of occur-rences of each fault code. The total number of faultcodes occurring in the case could potentially vary fromone to over 700. It was evident that some featureextraction was necessary to identify indexes that wouldunify the case representation and make case matchingpossible. There were a number of options for featurerepresentation. Cases could be matched or distin-guished by taking into account the following factors.

1. Presence/absence of fault codes.2. Fault code frequency.3. Combinations of fault codes.4. Time-based trends in fault code occurrences, i.e. if

fault code frequency increased leading up to therepair.

5. Anomalous indicators in the parameter data, i.e. ifany of the continuous parameters were out of speci-®cation

6. Sequence information in fault code occurrence, i.e.if fault codes repeatedly occurred in a certainsequence.

However, there was another consideration constrainingthe choice of features. This was the fact that the caseswere constructed by assuming that a certain causal re-lationship existed between fault logs and repairs datawhen their timeline was overlaid. There was no initialevidence as to the degree of error associated with the

A. Varma, N. Roddy / Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690684

Page 5: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

assumptions underlying this approach. For this reason,it was decided to keep feature extraction fairly basictill such a determination could be made.

4.1. Fault cluster generation

Fault combinations were selected as the feature ofchoice for case representation. A variety of tests werecarried out on historical fault log data to determinethe maximum number of fault codes that appeared tooccur repeatedly in combination on a given day. Theanalysis appeared to indicate that more than fourfaults seldom occurred repeatedly in combination intest data. As a result, the following approach wasadopted. Each case was polled for a list of distinctfault codes occurring before the repair with which itwas associated. The list of distinct faults was used togenerate combinations (or `fault clusters') as follows.

Distinct faults contained in case 1: A,B,C,D1-Clusters: A, B,C,D2-Clusters: AB, AC,AD,BC,BD,CD3-Clusters: ABC,ABD,ACD,BCD4-Clusters: ABCD

This process was carried out for all the cases. A masterlist of fault clusters of each size was maintained. Eachcase was now indexed in terms of its features Ðnamely Diagnosis+fault clusters. An example isshown in Fig. 4.

The objective of this exercise was to generate a com-plete list of candidate fault clusters of size four or less.Using this list, the next step was to determine which ofthis exhaustive list of clusters represented valuable re-petitive patterns with diagnostic signi®cance. This com-puter-intensive approach was adopted since there wasvirtually no expert opinion available to guide the selec-tion of diagnostically useful fault patterns. Consideringall possibilities of size four and under allowed theweighting algorithm to consider a wide variety of clus-ter candidates in a reasonable time period.

4.2. Fault cluster weighting

Due to the inexact case creation procedure as wellas the knowledge that there was not a one-to-one map-ping between faults in the fault log and repairs, a pro-

cess was created to assign weights to fault clusters,based upon the cases in the case base. As new caseswere continually being added, the system was designedto operate in two modes. In the `learn' mode, it cali-brated the signi®cance of available fault clusters basedupon all the cases in the case base. During the `diag-nose' mode, it used the weighted clusters as indicatorsto match the features of the incoming case with themost appropriate stored case.

The learn mode involves learning a weightvalue $ [0,1] for each fault cluster. The weight isintended to be representative of each cluster's abilityto isolate a speci®c repair code. If a cluster onlyappears in cases of a speci®c repair code, it has aweight of 1. On the other hand, if that cluster occurswith an evenly distributed frequency in cases of mul-tiple repair codes, its weight is appropriately lowered.A cluster is required to repeat a certain number oftimes before it is assigned a non-zero weight. Afterweight assignment, clusters below a certain weightthreshold are assigned a weight of zero. The process isshown in pseudo code below.

Generating [1±4] sized combinations of all faults in acase results in generating a lot of candidate fault clus-ters. In practice, thresholding resulted in only a smallpercentage of fault clusters emerging as signi®cant inthat they had non-zero weights. This was consistentwith the approach of examining a wide variety ofoptions to learn the features that were signi®cant. Ingeneral, the number of single and double fault clustersthat emerged as signi®cant was larger than the numberfor three- and four- sized clusters. The average weightthough followed the inverse relation. Three- and four-sized clusters had higher average weights than one-

Fig. 4. Case representation with fault clusters.

A. Varma, N. Roddy / Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690 685

Page 6: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

and two-sized clusters. When faults appeared repeat-edly in three- and four-sized combination they wereusually strong indicators of diagnostic signi®cance.The weight assignment can approximately be recog-nized as the maximum conditional probability of anyrepair for a given fault cluster over all the repairs itoccurs before.

Weight of Fault Cluster F clusteri

� Max j�P�Repairj=F clusteri��Ricci and Avesani (1995) and Howe and Cardie (1997)propose a similarity metric that varies in a context sen-sitive manner. In the application here, this may be rel-evant to ¯eets with the same basic engine type butdi�erent system con®gurations, and consequently adi�erent fault code pattern. A probabilistic indexingapproach for CBR is suggested in (Faltings,1997).

5. Case matching

Once the weights for fault clusters had beenacquired, case matching was straightforward. A newdiagnosis was requested by identifying the locomotivethat was experiencing problems. The fault log databasewas queried for fault codes occurring in the N dayspreceding the diagnosis request.

The degree of match between a new and stored casewas calculated as

�S Weights of common clusters between stored and new case�2�S Weights of Clusters in stored case� � �S Weights of Clusters in new case�

The repair code associated with the case with the high-est degree of match was the diagnosis returned by thesystem.

6. Case validation

The case base currently contains cases for diagnos-ing over 80 repair codes. The number of cases associ-ated with each repair code varies from three to 70.Leave-one-out testing was performed to test the per-formance of the case base. In this process, one casewas removed from the case base and formed the test-ing set. All other cases, as part of the training set,were used to learn fault cluster weights. The case-basewas then used to match and retrieve the top threerepair codes in response to the left-out case. If therepair code associated with the testing case was in thetop-3 set, the diagnosis was declared a success. Thisprocess was repeated with every case in the case basebeing the testing set once.

The accuracy of the case base was then tabulated assuccess % by repair code. There were a few repaircodes where the case-base was consistently unable tocorrectly diagnose even in the top three predictionsmore than 10±15% of the time. The cases associatedwith these repair codes were referred back to thedomain experts. In most cases it was discovered thatthe repair codes were such that the fault codes couldnot be expected to predict them. In other instances,

Fig. 5. Increase in diagnostic accuracy over repair codes with increasing number of cases.

A. Varma, N. Roddy / Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690686

Page 7: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

the same repair situation was classi®ed under threedi�erent repair codes. Once these were uni®ed, the ac-curacy of diagnosis on that repair code increased. Theproblem of case-base maintenance and dealing withcon¯icting cases is discussed in Racine and Yang(1997).

This process of case validation was a necessary clo-sure to the initial approach of gathering cases thatwere approximately accurate. This avoided requiringthe time of domain experts to verify each case in thebeginning. Now only a focused number of cases wererequired to be examined, that did not appear to beconsistent with each other.

7. Results of experiments

Accuracy was measured only for repair codes thathad over 10 cases associated with them. After remov-ing repair codes deemed undiagnosable through theprocess in the previous section, accuracy on repaircodes ranged from 23 to 94%. Overall accuracy wasaround 80%, assuming that the correct diagnosisappearing in the top three diagnoses given by the sys-tem was regarded as a success. In general, the follow-ing trends were observed.

1. Repair codes associated with a greater number ofcases had a higher diagnostic accuracy rate. This re-lationship is shown in Fig. 5.

2. Accuracy increased as fault clusters of increasingsize were used. The relative increase in accuracy wassmall but signi®cant. Accuracy using fault clustersof size 1 was about 60±65%.

The repair codes for which cases were collected wereprioritized on the basis of their relative frequency ofoccurrence. The data showed the proverbial 80-20behavior where a few repair codes constituted the ma-jority of the repair types. Based upon this information,the diagnostic coverage of ICARUS over the repaircode space is shown in Table 1.

8. Related work

There are quite a few examples of CBR beingapplied to diagnosis. This section discusses a few that

have addressed problems similar to the task at hand,from an application as well as a design viewpoint. Aninteresting comparison of the CBR approach in gen-eral with discriminant analysis is presented in(Nakhaeizadeh, 1993) where an inductive CBR algor-ithm provided a 30% increase in accuracy whenapplied to the task of gearbox quality diagnosis. Ageneral framework for constructing troubleshootingcase-bases is provided by Trott and Leng (1997).Deters (1995) focuses on diagnosing multiple indepen-dent faults and proposes an incremental retrievalapproach to address it. An early system is reported byDattani et al. (1996), where CBR is applied to guidingrepair actions for British Airways aircraft by matchingto fault messages. Jarmulak et al. (1997) present a sys-tem that uses CBR for a rail inspection application.Their system uses image data as input and shares thelimitation that this is not easily expressed as a featurevector. They use a hybrid rule + case-based systemfor image classi®cation with no adaptation. They alsomention the need to periodically identify cases in thecase base that never match well as possibly `bad' casesor noisy images. Acorn and Walden (1992) report onSMART Ð a CBR help-desk system developed forCOMPAQ. In contrast to the semi-automated, miningapproach to creating cases described above, thisdescribes a more conventional case-population process,where senior engineers were designated as case-builderswith a daily review process. Correctness of the cases isnot an issue. However, their observations that theycould go `live' before having a complete and correctknowledge base is in agreement with the authors' ex-perience as well. Hennessy and Hinkle (1992) reporton CLAVIER Ð one of the early commercial CBRapplications. A key aspect motivating its developmentis described as the inability of the operators to articu-late good rules. This, again, is consistent with theauthors' experience. CLAVIER's role as corporatememory that increases in quality and quantity with useis very much in line with the role expected ofICARUS. Kitano et al. (1993) with the SQUAD sys-tem highlight the role of CBR systems as maintainersof corporate knowledge. Since they report dealing withover 20,000 cases, they describe a well-managed,human-intensive process for case collection, ®lteringand quality control. They use a system of abstractionhierarchies to create neighborhood relationships

Table 1

Diagnostic coverage of CBR tool

All cases Repair codes with >10 cases each Repair codes with >30 cases each

Number of repair codes diagnosed 80 16 5

Total % of yearly problems caused by this set of repair codes 75% 50% 32%

A. Varma, N. Roddy / Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690 687

Page 8: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

between attributes. Interestingly, they also appear to

use a system of combination generation. The motiv-

ation, however seems to be to enumerate a sql-type

query for each type of attribute value below in the

hierarchy from where the user has speci®ed the attri-

butes. Bonzano et al. (1997) describe an approach

towards `introspective' learning of feature weights in

CBR, recognizing that standard CBR matching func-

tions can be extremely sensitive to noise and irrelevant

features, and suitable weight vectors are not always

available. While this concept is recognized in the

approach in this paper, we do not make an e�ort to

adjust feature weights in response to incorrect retrie-

val. This arises from the premise that the error could

lie with the case itself, and consequently consistent

incorrect retrieval is used as an indication of a defec-

tive case rather than incorrect feature weighting.

CBR applications in similar domains have been

reported under the INRECA project (Klaus-Dieter et

al., 1995). INRECA uses induction to extract a de-

cision tree to guide the user but uses CBR to handle

unknown values. An application of INRECA to robot

diagnosis uses a combination of causal rules, decision

trees and weighting factors for knowledge represen-

tation. This is consistent with future development

plans for ICARUS, where a hybrid rule/case based

system is eventually envisaged. In reporting a related

application to steam turbines, Georgin et al. (1995)

also use the KATE software, utilizing induction. In

another application to CFM56 engines, use of legacy

data to create initial cases with ongoing integration of

model based knowledge is described. A concurrent

bene®t of the case-building activity mentioned here is

creation of a knowledge management process that

helps highlight high-occurrence failure modes through

a systematic cleaning and analysis of data. This has

been the case with ICARUS as cost-bene®t analysis of

fault generation from a diagnostic point of view is

being revisited with a goal of generating more mean-

ingful faults on the locomotive.

Netten and Vingerhoeds (1995) describe an appli-

cation of CBR to fault diagnosis on board trains. The

problem description appears to be close to the context

of the work here. The authors use a network structure

to represent the symptom-fault relationships, and prior

probability factors to guide the diagnosis. ICARUS,

being an o�-board system estimates conditional prob-

abilities from data and converts them into feature

weights. Stehr (1995) uses CBR in conjunction with de-

cision tree induction and neural networks for extract-

ing domain knowledge from cylinder-head design

cases. The author notes the relative ease with which a

CBR approach can deal with dynamic data, which is

typical of engineering systems.

9. Institutional deployment and cost-bene®t analysis

ICARUS development was started in early 1998. A

®rst prototype was deployed on pilot fault log data

from 35 locomotives in October of the same year. A

team of ®ve diagnostics experts independently examine

the fault logs daily and arrive at a diagnosis. These

conclusions are compared with the output of

ICARUS. This constituted the validation phase of the

tool. An integrated recommendation was delivered to

the railroad, based on this activity. The primary ben-

e®t of identifying problems is that the locomotive can

be better scheduled for repair, and unscheduled fail-

ures leading to a mission loss are avoided. In most

instances, a recommendation is kept open until feed-

back is obtained from the repair shop as to the actual

work done. If this feedback corresponds to the top

three repair recommendations of the CBR tool, the

case is closed and declared as a success. If not, then it

is classi®ed as a failure. One ®eld engineer at GE

Transportation systems has been assigned the primary

responsibility of running and validating the tool each

day.

Both successful and failed attempts at diagnosis are

examined in greater detail by the tool design team,

including the authors. In many cases, experts point

out that certain problems are not well manifested in

fault logs, and cases relating to these repairs are

removed from the case-base. The primary driver for

seeking expert input is when the case base is unable

to predict a repair code at a greater than 50% accu-

racy despite having more than 10 cases for it. This

focused approach helps to minimize expert time

requirements.

Some early successes in diagnosing problems in the

pilot program have helped management to allocate

increasing resources to the project. The biggest bene®t

is that new incoming cases (that are not mined from

historical data, but are based upon daily analysis) are

of much higher quality due to expert validation, and

have contributed to increasing the accuracy of the

tool. It is estimated that a savings of a few thousand

dollars could be realized per locomotive per year just

by optimizing its repair actions based upon an accu-

rate understanding of the failures occurring on board.

Over 600 locomotives are expected to be monitored in

1999±2000; this adds up to a considerable sum. There

is a considerable productivity bene®t as well, as a

limited sta� of up to 10 experts will be required to

monitor the 600 locomotive ¯eet and tools like

ICARUS can considerably reduce the e�ort required

for diagnosis.

A. Varma, N. Roddy / Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690688

Page 9: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

10. Discussion

A number of lessons were learned in the course ofthis project. Some of these may be listed as follows.

1. The availability of high-quality cases cannot betaken for granted. In complex domains especially, itbecomes increasingly di�cult to ®nd expertise thatwill certify cases as fully correct. In this applicationthe tool designers were especially mandated not torely on `preconceived' expert knowledge to guidethe case base development Ð but rather to learnfrom the data. Many experts freely acknowledgedthat there was possibly much more hidden in thedata than they had expertise over.

2. Case representation can be a design issue. In mostcase-based applications, the identi®cation of a casefeature vector arises naturally from the way the caseexists. Out of many possible features that couldcharacterize the cases in this application, fault com-binations were one of the features identi®ed asbeing potentially signi®cant and were chosen forcase representation. Feature weighting was able toconsider and eliminate a majority of fault clustersas being diagnostically insigni®cant.

3. The assumption that each case contained data as-sociated with only one diagnosis was not valid inthis case. The extent to which multiple, simul-taneous problems were being manifested in the faultcodes was not known. Again, this was addressedpartly by choosing candidate cases carefully andsubsequently by feature weighting. Only fault clus-ters that consistently occurred in cases of a particu-lar repair code were assigned high weights by thefeature-weighting algorithm.

4. Commonly recognized advantages of a case-basedreasoning approach like quick deployment, capa-bility for continuous learning, low knowledge-elici-tation needs and a measure of explanation, stoodtrue in this application. The system was developedin about eight months and is currently running onlive pilot data. While the initial case seeding wasbased on historically mined cases, future cases thatare being added are of much higher quality sincethe ®nal resolution of problems will be accuratelytracked and veri®ed from the ®eld. The portabilityof the case-based approach was vividly demon-strated once the system had been developed.Business focus required that ICARUS be applied toa model of locomotives di�erent from the data onwhich the system was developed. In less than amonth, new seeding cases were acquired, featureweights were recomputed and another version ofICARUS was released. This was in comparison to arule-based approach which would have requireddevelopment from scratch. For reference, previous

rule-based approaches for locomotive models hadtaken few years to develop. Incremental learningwas especially important in this application.Locomotives undergo frequent hardware and soft-ware changes. A case-based approach could adaptto this, given su�cient quantity of cases.

5. Learning weights for case features can help makeimplicit knowledge explicit. In many cases, a physi-cal explanation could be attached to particularfaults occurring together. This provided an ad-ditional means to occasionally check the knowledgein the case base.

6. As yet, there is no concept of adaptation in theworking of ICARUS.

7. Finally, the ability to come up with a working,albeit incomplete, system early on was vital in main-taining management and user involvement in theapplication.

11. Conclusions and future work

ICARUS will be deployed for providing monitoringsupport to over 600 locomotives in 1999±2000. Asmore cases are added to the system, the authors areexploring additional features by which to characterizecases to sustain and improve the system's accuracy.Preliminary results incorporating fault occurrencetrends as case features have shown evidence of posi-tively impacting diagnostic accuracy. As can beexpected, this too has a stronger e�ect on certainrepair codes as compared to others.

In conclusion, this paper has presented a case-basedapplication for diagnosis that employs a variety of pre-and post-processing techniques to transform historicaldata into a format suitable for a case-based approach.It presents a `propose and verify' approach towardscase generation, where candidate cases of approximatediagnostic accuracy are generated and case perform-ance metrics are used to isolate cases that may needexpert validation. Feature weights are learned from thedata. The application is such that a complete andaccurate diagnostic formulation with any approach ispractically impossible. Experience has shown that acase-based approach has been able to contribute sig-ni®cantly towards capturing a tractable amount ofknowledge, even if approximately, and consequentlyreducing the load of the diagnostics expert.

Acknowledgements

The authors would like to acknowledge DavidGibson at GE Transportation Systems for businessand data support as well as domain expertise.

A. Varma, N. Roddy / Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690 689

Page 10: ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics

References

Acorn, T., Walden, S., 1992. SMART: Support management auto-

mated reasoning technology for Compaq customer service. In:

Proceedings of AAAI-92. AAAI Press, MIT Press, Cambridge,

MA.

Bonzano, A., Cunningham, P., Smyth, B., 1997. Using introspective

learning to improve retrieval in CBR: a case study in air tra�c

control. In: Leake, D., Plaza, E. (Eds.), Case-Based Reasoning

Research and Development, Lecture Notes in Computer Science,

1266. Springer-Verlag, New York, pp. 291±302.

Dattani, I., Magaldi, R.V., Bramer, M.A., 1996. A review and evalu-

ation of the application of case-based reasoning (CBR) technol-

ogy in aircraft maintenance. In: Macintosh, A., Cooper, C.

(Eds.), Applications and Innovations in Expert Systems IV.

SGES Publications, UK, pp. 189±203.

Deters, R., 1995. Case-based diagnosis of multiple faults. In: Veloso,

M., Aamodt, A. (Eds.), Case-Based Reasoning Research and

Development, Lecture Notes in Computer Science, 1010.

Springer-Verlag, New York, pp. 411±420.

Faltings, B., 1997. Probabilistic indexing for case-based prediction.

In: Leake, D., Plaza, E. (Eds.), Case-Based Reasoning Research

and Development, Lecture Notes in Computer Science, 1266.

Springer-Verlag, New York, pp. 611±622.

Georgin, E., Bordin, F., Loesel, S., McDonald, J.R., 1995. Case-

based reasoning applied to fault diagnosis on steam turbines. In:

Watson, I. (Ed.), Progress in Case-Based Reasoning, Lecture

Notes in Computer Science, 1020. Springer-Verlag, New York,

pp. 175±184.

Hennessy, D., Hinkle, D., 1992. Applying case-based reasoning to

autoclave loading. IEEE Expert 7 (5), 21±26.

Howe, N., Cardie, C., 1997. Examining locally varying weights for

nearest neighbor algorithms. In: Leake, D., Plaza, E. (Eds.),

Case-Based Reasoning Research and Development, Lecture

Notes in Computer Science, 1266. Springer-Verlag, New York,

pp. 455±466.

Jarmulak, J., Kerckho�s, E., Veen, P., 1997. Case-based reasoning in

an ultrasonic rail-inspection system. In: Leake, D., Plaza, E.

(Eds.), Case-Based Reasoning Research and Development,

Lecture Notes in Computer Science, 1266. Springer-Verlag, New

York, pp. 43±52.

Kitano, H., Shimazu, H., Shibata, A., 1993. Case-method: a method-

ology for building large-scale case-based systems. In: Proceedings

of the Eleventh National Conference on Arti®cial Intelligence,

pp. 303±308.

Klaus-Dieter, A., Auriol, E., Bergmann, R., Breen, S., Dittrich, S.,

Johnston, R., et al., 1995. Case-based reasoning for decision sup-

port and diagnostic problem solving: the INRECA approach. In:

Bartsch-Sporl, B., Janetzko, D., Wess, S. (Eds.), Proceedings of

the Third Workshop of the German Special Interest Group on

CBR, pp. 63±72.

Nakhaeizadeh, G., 1993. Learning prediction of time series. A theor-

etical and empirical comparison of CBR with some other

approaches. In: Richter, M., et al. (Eds.), Proceedings of

EWCBR'93. Springer-Verlag, Berlin.

Netten, B.D., Vingerhoeds, R.A., 1995. Large-scale fault diagnosis

for on-board train systems. In: Veloso, M., Aamodt, A. (Eds.),

Case-Based Reasoning Research and Development, Lecture

Notes in Computer Science, 1010. Springer-Verlag, New York,

pp. 67±76.

Racine, K., Yang, Q., 1997. Maintaining unstructured case bases. In:

Leake, D., Plaza, E. (Eds.), Case-Based Reasoning Research and

Development, Lecture Notes in Computer Science, 1266.

Springer-Verlag, New York, pp. 523±564.

Ricci, F., Avesani, P., 1995. Learning a local similarity metric for

case-based reasoning. In: Veloso, M., Aamodt, A. (Eds.), Case-

Based Reasoning Research and Development, Lecture Notes in

Computer Science, 1010. Springer-Verlag, New York, pp. 301±

312.

Stehr, J., 1995. CBR and machine learning for combustion system

design. In: Veloso, M., Aamodt, A. (Eds.), Case-Based Reasoning

Research and Development, Lecture Notes in Computer Science,

1010. Springer-Verlag, New York, pp. 98±108.

Trott, J.R., Leng, B., 1997. An engineering approach for trouble-

shooting case bases. In: Leake, D., Plaza, E. (Eds.), Case-Based

Reasoning Research and Development, Lecture Notes in

Computer Science, 1266. Springer-Verlag, New York, pp. 178±

189.

A. Varma, N. Roddy / Engineering Applications of Arti®cial Intelligence 12 (1999) 681±690690