
Towards Adaptive Scheduling of Maintenance for Cyber-Physical Systems

Alexis Linard and Marcos L. P. Bueno

Institute for Computing and Information Sciences, Radboud University Nijmegen

P.O. Box 9010, 6500 GL Nijmegen, The Netherlands
{A.Linard,M.Bueno}@cs.ru.nl

Abstract. Scheduling and control of Cyber-Physical Systems (CPS) are becoming increasingly complex, requiring the development of new techniques that can effectively lead to their advancement. This is also the case for failure detection and scheduling component replacements. The large number of factors that influence how failures occur during operation of a CPS may result in maintenance policies that are time-monitoring based, which can lead to suboptimal scheduling of maintenance. This paper investigates how to improve maintenance scheduling of such complex embedded systems, by monitoring the critical components in real time and dynamically adjusting the optimal time between maintenance actions. The proposed technique relies on machine learning classification models in order to classify component failure cases vs. non-failure cases, and on real-time updating of the maintenance policy of the sub-system in question. The results obtained from the domain of printers show that a model that is responsive to environmental changes can enable consumable savings, while keeping the same product quality, and thus be relevant for industrial purposes.

Keywords: model-based scheduling, predictive maintenance, machine learning, cyber-physical systems

1 Introduction

Due to the growing complexity of Cyber-Physical Systems [11, 19], many techniques have been proposed to improve failure detection and the scheduling of component replacement [17, 21]. Indeed, new needs in terms of reliability and safety have appeared with the new applications of such systems. That is the reason why leading-edge technology manufacturers seek to design more robust and reliable systems [12]. Currently, a major issue in many industrial settings is how to correlate failure occurrences and the maintenance actions performed in order to prevent breakdowns. Modeling the failure behavior of many components in advance is an intricate task. Indeed, we claim that maintenance actions are frequently scheduled with fixed intervals that are suboptimal and implemented to the detriment of productivity and efficiency.

Exclusively relying on experts to build models that can describe the behavior of machines has recently been recognized as a limitation [14, 23]. Therefore, the use of machine learning techniques to construct such models has been investigated [3, 10, 16, 18]. However, using such techniques in order to update the maintenance scheduling of a CPS in real time has so far not been explored. The main difficulties in this case relate to finding an appropriate predictive model, and then defining a procedure for updating the timing conditions. Predictive models can be used instead of costly sensors intended to provide information about the state of the machine at any moment, which is an additional reason why we introduce machine learning techniques to maintenance scheduling. The ultimate goal would be to develop embedded systems capable of dynamically scheduling their own maintenance. To that end, we aim to define a procedure for updating in real time when maintenance should be performed. Our work is based on an experiment carried out in partnership with industry, specifically in the domain of printers. In addition, we consider the scope of automatic maintenance, where human intervention is no longer needed.

In this study, we investigate to what extent machine learning can help to improve fixed maintenance scheduling of complex embedded systems. The contributions of this paper are as follows. First, we propose using machine learning techniques in which the embedded system learns to distinguish between failure and non-failure cases using data related to critical components of the CPS. This can be done by monitoring critical components in real time. Next, we propose an algorithm to dynamically adjust the timing of maintenance actions. This algorithm uses timed automata [2], the formalism used to model the maintenance policy. Indeed, timed automata can provide an intuitive representation of the maintenance policy, and its real-time update proceeds by using information about the overall printer at any moment. Thus, our main contributions are (a) the use of decisions from a data-driven model to dynamically schedule maintenance, and (b) the use of timed automata to formally describe and analyze the proposed algorithm. Naturally, we only select relevant features to determine whether the printer is working properly – that is to say, whether we can schedule the maintenance actions later – or not. To that end, we consider a set of realistic, industry-based scenarios and simulations to provide evidence that a reduced amount of maintenance can be done while achieving similar product quality [8, 9]. The considered scenarios have been implemented in Uppaal [4], which is another relevant practice-oriented contribution of this paper.

The remainder of this paper is organized as follows. In Section 2 we present the industrial problem that motivated our approach. In Section 3, we define the key concepts associated with model-based scheduling and the classification techniques used to separate the failure from the non-failure cases of a Cyber-Physical System, and we discuss the related literature. In Section 4 we explain our approach to updating when to trigger maintenance and present the experiments, which use a model-checking tool and data about large-scale printers. Finally, we discuss the results.

2 Case Study

Large-scale printers are cyber-physical systems made of a large number of complex components, the interaction of which is often challenging to understand. Among their main components are the printheads, which are composed of thousands of printing nozzles. These are designed to jet ink on paper according to specifications concerning, for example, jetting velocity and direction. During the operation of these industrial printers, nozzles can behave inadequately with respect to the demanded task, e.g. by jetting incorrect amounts of ink or jetting in an incorrect direction. If that is the case, a nozzle is considered to be failing.

Failing nozzles can be repaired by performing one or more maintenance actions, including for example different types of cleaning actions. The maintenance actions are executed automatically, e.g. the printer cleans its own nozzles. In this context, determining the appropriate moment to execute each nozzle-related maintenance action is crucial to achieving a proper balance of conflicting objectives, such as productivity, machine lifetime, and final product quality. However, the number of individual nozzles, their physical architecture, the substantial number of variables that can potentially be correlated to them and the definition of printing quality make it particularly difficult to manually construct models that express all the potentially relevant correlations among these variables and nozzles. Ultimately, this creates a challenge when designing maintenance policies, since they tend to be either too conservative or too tolerant, and in either case one or more of the mentioned requirements can be seriously degraded.

A solution that is sometimes used to construct a policy consists of performing a large set of tests involving different usages of the CPS, aiming to derive time-based maintenance policies. These policies are based on monitoring, hence they do not suffer so much from the drawbacks of fixed-time-based strategies [17]. However, these policies rely on time counters that are not directly related to the state of machine parts, e.g. the time elapsed since the last finished printing task and the period that the printer stands idle. This implies that the only failure behaviors that can be explicitly captured by these rules are those that were previously seen during the tests, usually accounting for only a fraction of all possible machine statuses. Hence, unseen failure behavior cannot be handled properly by such maintenance strategies, which is likely to occur since these machines can be used with a wide range of parameter combinations. In other words, such policies are prone to perform blindly in at least some situations, which can lead to too many or too few maintenance actions.

Nowadays, industrial printers record large amounts of data about the state of their components over time, so a data-driven approach seems feasible. A data-driven approach can provide evidence that helps to decide whether the current time parameters are adequate or not, given the current state of the CPS. In order to represent time parameters and system states, a state machine appears to be an appropriate formalism. In this study we show how to dynamically update the parameters of a maintenance scheduler, which is based on timed state machines. This formalism is used in order to capture the intuition that the CPS moves between different states during its operation, in a time-dependent manner. A significant advantage of combining a statistical approach and state machines is that the real-time updating of timed parameters requires no human intervention, since correlations between potentially relevant variables can be learned algorithmically. In addition, the involvement of a failure behavior model is substantiated in our case, since there is no sensor available to directly provide, at any moment, information on the state of the nozzles. That is the reason why such a model could provide the desired outcome whenever required.

3 Background

In this section, we present the concepts on which our definition of an adaptive scheduler is based. First, we describe how to model and verify maintenance schedulers. Then, we discuss the machine learning techniques used in our paper and why they are relevant for real-time updating of the scheduling. Finally, we discuss related literature.

3.1 Modeling Maintenance Strategies

State machines are abstract machines with a wide range of applications, such as process modeling, software checking and pattern matching. They are composed of a set of states and transitions between states. They can be used in our industrial case study, that is to say for modeling the maintenance action schedules of a CPS, since one can introduce a state for each maintenance action (MA) plus one or more states representing when no maintenance action is being performed in the CPS. Transitions between the states take place when a given maintenance action is started and completed [1]. As shown in Figure 1, the maintenance policy of a system can be represented intuitively by means of a specific type of state machine: a timed automaton (TA).

Fig. 1: Maintenance policy of a component represented by a simplified TA. From the standby state (stby), transitions guarded by the time between two occurrences of ma1 (resp. ma2) lead to the maintenance states ma1 and ma2, and transitions guarded by the duration of ma1 (resp. ma2) lead back to stby.

A timed automaton [2] is a finite-state machine extended with a finite set of real-valued clocks. During the execution of a timed automaton, all clock values increase at the same speed. A timed automaton also has clock guards, which enable or disable transitions and constrain the possible behaviors of the automaton. Furthermore, when a transition occurs, clocks can be reset.
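
To make the formalism concrete, the following minimal sketch (not taken from the paper; state names and timing constants are illustrative) encodes a TA in the spirit of Figure 1, with explicit clocks, guards and resets:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Transition:
    source: str
    target: str
    guard: Callable[[Dict[str, float]], bool]  # clock valuation -> transition enabled?
    resets: Tuple[str, ...] = ()                # clocks set back to 0 when the transition fires

@dataclass
class TimedAutomaton:
    state: str
    clocks: Dict[str, float] = field(default_factory=dict)
    transitions: List[Transition] = field(default_factory=list)

    def elapse(self, dt: float) -> None:
        # All clock values increase at the same speed.
        for name in self.clocks:
            self.clocks[name] += dt

    def step(self) -> None:
        # Fire the first enabled transition leaving the current state, if any.
        for t in self.transitions:
            if t.source == self.state and t.guard(self.clocks):
                self.state = t.target
                for name in t.resets:
                    self.clocks[name] = 0.0
                return

# The policy of Fig. 1 with made-up timing constants (in seconds).
policy = TimedAutomaton(
    state="stby",
    clocks={"x1": 0.0, "x2": 0.0},
    transitions=[
        Transition("stby", "ma1", lambda c: c["x1"] >= 600.0, resets=("x1",)),   # time between two ma1
        Transition("stby", "ma2", lambda c: c["x2"] >= 3600.0, resets=("x2",)),  # time between two ma2
        Transition("ma1", "stby", lambda c: c["x1"] >= 5.0),                     # duration of ma1
        Transition("ma2", "stby", lambda c: c["x2"] >= 30.0),                    # duration of ma2
    ],
)
```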

The particular TA presented in Figure 1 is a way of modeling a given maintenance strategy. In our case, we assume that maintenance actions are executed sequentially. In order to establish that the derived maintenance strategy achieves an optimal trade-off, many techniques exist, including model-checking of real-time systems [9]. We used the tool Uppaal [4] to evaluate time-monitoring-based maintenance strategies. More details about how this tool was used are presented in Section 4.3.

3.2 Classification Techniques

This study relies on the use of classification techniques [13], also known as supervised learning, belonging to the field of machine learning. These techniques consist of learning models from data. They are designed to classify instances into a set of possible classes, according to the values of the attributes of each instance. To that end, we used Weka [29], a suite of implemented learning algorithms. Among the possible classifiers that can be used for the classification task, we considered in particular: Bayesian networks, naive Bayes classifiers, decision trees, random forests and neural networks.

The main reason for the choice of the classifiers listed above is related to our case study. Indeed, most of them can be considered white-box classifiers, in the sense that it is easy to interpret the outcomes they provide. This is particularly important in the context of industrial cases, because such a readable model can be more insightful than a black-box oracle.

In order to learn a classifier from log data, labeled data from each possible class is needed. In the case of learning the failure behavior of the CPS of interest (i.e. nozzles of large-scale printers), we assume that nozzles are either failing or not failing. Thus, each instance is composed of a number of features corresponding to relevant machine components and a class feature indicating the occurrence of a failure or not.
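
As an illustration, the following sketch shows how log records could be turned into labeled instances; the feature names and the failure criterion are hypothetical, since the actual monitored metrics used in the paper are not listed:

```python
import csv

# Purely illustrative feature names; the real nozzle-related factors are not public.
FEATURES = ["ink_temperature", "idle_time_s", "jets_fired", "relative_humidity"]

def to_instance(record: dict) -> dict:
    """One labeled instance: selected feature values plus the class f / !f."""
    instance = {name: float(record[name]) for name in FEATURES}
    # The label is assumed to come from print-quality checks recorded in the logs.
    instance["class"] = "f" if record["quality_check_failed"] == "1" else "!f"
    return instance

def load_dataset(path: str) -> list:
    with open(path, newline="") as fh:
        return [to_instance(row) for row in csv.DictReader(fh)]
```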

                    actual
                 f           !f
  predicted f    A            B
  predicted !f   C            D

  P_f = A / (A + C)      R_f = A / (A + B)
  P_!f = D / (B + D)     R_!f = D / (C + D)

Fig. 2: Confusion matrix and evaluation criteria for the classes failure (denoted by f) and non-failure (denoted by !f).

In machine learning, an important goal when learning classifiers is that of properly generalizing to new data. To meet this goal, overfitted models should be detected, since they tend to perform very well on the data used during learning. This issue is often dealt with by using n-fold cross-validation. Cross-validation with n folds consists of first dividing the dataset into n disjoint sets, then learning the model on the first n−1 folds and evaluating the learned model on the last fold. Then, learning is done on folds 2 to n and evaluation is done on fold 1, and so on. The part of the data used to learn a model is usually called training data, while the part used to evaluate it is usually called test data. On each fold, a classifier is typically evaluated by comparing the predicted class provided by the classifier with the original class, as seen on each instance of the test set. The results are placed in the confusion matrix shown in Figure 2, and from this matrix the metrics precision P and recall R are computed. After processing all the folds, the mean of precision and recall is taken, corresponding to the final result of cross-validation.
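
A minimal sketch of this evaluation procedure, assuming scikit-learn in place of Weka and a binary 0/1 encoding of the non-failure/failure classes, could look as follows:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score

def cross_validate(X: np.ndarray, y: np.ndarray, n: int = 10):
    """n-fold cross-validation, returning mean precision and recall of the failure class."""
    precisions, recalls = [], []
    for train_idx, test_idx in StratifiedKFold(n_splits=n, shuffle=True).split(X, y):
        model = DecisionTreeClassifier(criterion="entropy")
        model.fit(X[train_idx], y[train_idx])        # training data
        y_pred = model.predict(X[test_idx])          # test data
        precisions.append(precision_score(y[test_idx], y_pred, pos_label=1))
        recalls.append(recall_score(y[test_idx], y_pred, pos_label=1))
    # The final cross-validation result is the mean over the folds.
    return float(np.mean(precisions)), float(np.mean(recalls))
```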

Fig. 3: The process of collecting log data for a CPS component, learning classifier models, and using the outcomes as input to reschedule maintenance.

Once a classifier has been trained, it is possible to use it to classify new unlabeled instances. The process is as follows: in real time, new data about the same features used to train the model are recorded and represent the state of the CPS at instant t. Such a state of the system corresponds to an instance to be classified by the model. Once the outcome is provided, a decision is taken by the real-time updating method controlling the schedule of maintenance actions. The overall process of learning and using the classifier in our case is described in Figure 3.
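
A sketch of this querying step is given below; read_sensors() is a hypothetical function standing for whatever mechanism provides the monitored feature values at instant t:

```python
import numpy as np

def classify_current_state(model, read_sensors) -> str:
    """Classify the state of the CPS at the current instant t."""
    x_t = np.array([read_sensors()])      # one unlabeled instance: the monitored features at t
    prediction = model.predict(x_t)[0]    # assumed encoding: 1 -> failure, 0 -> non-failure
    return "failure" if prediction == 1 else "non-failure"
```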

3.3 Related Work

Extensive research has been done in the field of timed automata [2], including algorithmic and computability aspects [5, 7] and model-checking oriented research [9]. To a lesser extent, research has been developed in the context of passive learning of TA from data [28], and learning sub-classes of TA known as event-recording automata [15]. In the context of CPS, real-time online learning of TA has been investigated [22], although not in relation to predictive maintenance. With regard to predictive maintenance, timed automata have been applied to model operating modes and task durations in a manufacturing scenario [27]. Statistical approaches to model the failure behavior of CPS have already been considered [20, 24], but none of them combined the use of TA to model maintenance actions with machine learning models.

4 Proposed Method and Experiments

In this section, we describe our method and the results of our experiments. Our hypothesis is that less maintenance can be performed while causing as many or fewer failures. Hereinafter, we present the protocol followed and the data used in order to dynamically schedule nozzle-related maintenance (experimental data available upon request).

4.1 Learning Nozzle Failure Behavior

The first step towards dynamic scheduling of nozzle maintenance actions is to build a failure behavior model describing how the nozzles of printers are likely to fail. In order to do so, we rely on all the logs of a set of 8 printers for the year 2015. The main advantage of the logs we have at our disposal is that many metrics and nozzle-related factors are monitored continuously. Moreover, the data are labeled: they represent the nozzles at moments when they were considered as failing or not. The failing measure is the conjunction of several metrics related to the print quality. We stress again the key role of a failure behavior model, since labeled data are costly to obtain: the model reproduces the outcome provided by printing test pages, a process that inevitably leads to a loss of productivity for the overall printer. We thus trained the corresponding model by selecting relevant features. To this end, we benefited from the expertise of engineers in the field, who indicated to us the potentially relevant features.

In Table 1, we present the results achieved for failure detection (f). We state the results in terms of precision and recall, as defined in Figure 2: precision (P) represents the proportion of failure cases that are correctly classified, and recall (R) reflects the proportion of cases classified as failures that are actual failures. We also present the results achieved for the other class (non-failure, !f). Both results are important, since both outcomes are used by our scheduler, i.e. to advance or postpone maintenance. The classifiers have been trained on 117k instances with the same features and evaluated with 10-fold cross-validation. The dataset contains 10% of instances belonging to the failure class and 90% to the non-failure class.

In our experiments, we consider the results in Table 1 to be good enough to be regarded as reliable. Moreover, we trained several classifiers (among others, decision trees, naive Bayes classifiers, neural networks, etc.), and the decision tree always performed the best. A decision tree is an interesting way of modeling the failure behavior of a component thanks to its high understandability.

Classification Algorithm        f               !f
                                P       R       P       R
Decision Tree                   0.788   0.631   0.951   0.977
Random Forest                   0.626   0.579   0.943   0.953
Bayesian Network                0.683   0.809   0.973   0.949
Naive Bayes                     0.329   0.424   0.918   0.882
Multilayer Perceptron           0.652   0.224   0.903   0.984

Table 1: Quality of the best classification models trained with the data.

As a consequence, we consider that we can safely schedule nozzle-related maintenance actions using the outcomes of the built decision tree.

In this case, we have used the J48 implementation of the C4.5 algorithm to learn it [25]. This algorithm builds a model from the training set using information entropy. It iteratively builds nodes by choosing the attribute that best splits the current sample into subsets, using information gain. The attribute having the greatest information gain is chosen to make the decision, that is to say, to be used as the next decision node. Once a subset is composed only of instances belonging to the same class, no further node is created, but a leaf instead, standing for the final class of all the instances belonging to the subset.

The fact that a decision tree can easily be implemented by a succession of if conditions is another reason to consider it a prime model for industrial purposes. The lower quality of the results for the failure class is due to the low number of instances belonging to this class.
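
As an illustration, the sketch below trains an entropy-based tree (scikit-learn is assumed here instead of Weka's J48) and prints it as the chain of if-conditions mentioned above; the confidence helper mirrors the leaf-based confidence factor discussed later in Section 4.2:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def train_failure_tree(X, y, feature_names):
    # criterion="entropy" gives information-gain-based splits, as in C4.5/J48.
    tree = DecisionTreeClassifier(criterion="entropy")
    tree.fit(X, y)
    # Each root-to-leaf path reads as a chain of if-conditions ending in a class.
    print(export_text(tree, feature_names=feature_names))
    return tree

def confidence(tree, x) -> float:
    # Leaf-based confidence factor: fraction of training instances of the
    # predicted class in the leaf reached by instance x.
    return float(tree.predict_proba([x])[0].max())
```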

Fig. 4: Nozzle failure rate as a function of the time since the last MA.

4.2 Scheduling Nozzle-Related Maintenance Actions

The idea of dynamic scheduling of maintenance actions is to rely on the outcome of a predictor (a component failure behavior model that classifies the system at instant t into two classes, failure or non-failure) to advance or postpone the moment when maintenance is triggered.

Algorithm 1 Dynamic Scheduling of a Maintenance Action

– q : query made periodically
– y : the reducing factor for the timing of the MA
– z : the increasing factor for the timing of the MA
– tσ : the timing at which the MA is usually triggered
– currentBoundary : the current timing at which the MA will be triggered
– MAXBoundary : the maximum acceptable time to wait until triggering the MA

1: for each q do
2:     prediction ← failureBehaviorModelQuery()
3:     if prediction = failure then                  ⊲ Advance the MA schedule
4:         currentBoundary ← currentBoundary × y%
5:     else                                          ⊲ Postpone the MA schedule
6:         currentBoundary ← currentBoundary × z%
7:     if currentBoundary is reached then
8:         triggerMaintenance()
9:         currentBoundary ← tσ OR (currentBoundary + tσ)/2

As presented in Algorithm 1, we first consider the actual maintenance policy of the printer. Our idea is to query the previously built classification model in order to obtain the desired outcome. If a failure is detected by our classification model, then the timing for triggering the maintenance actions is decreased by a given rate y. Otherwise, if no failure is detected in the moments before maintenance actions are triggered, then the timed boundary that enables a maintenance action to occur is postponed by a given rate z. Assuming the use of a decision tree as classifier, the parameters y and z, which determine how much to increase or decrease the maintenance clock guards, can be a function of the confidence factor provided with each outcome of the classifier. This confidence factor is based on the error rate of each leaf in the tree, and hence is a metric to assess how reliable a provided outcome can be considered.

Finally, we distinguish two possibilities in our algorithm concerning how to reset the timed boundaries once the related maintenance has been performed. The first one consists in resetting the clock guards to their initial values, i.e. those defined in the original TA inferred from the specifications. The second one consists in resetting the future timed boundary by computing the average between the last used limit and the actual moment when the maintenance was launched. In this way, we assume that after several runs of the algorithm, the computed value will tend to the ideal time to wait between two maintenance actions. Both options are presented in Section 4.3 and the benefits of the two variants are compared.
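
The following runnable sketch puts Algorithm 1 and both reset variants together for a single MA; the query period, the factors y and z, the nominal timing tσ and the placeholder classifier are all illustrative assumptions, not the values used in the paper:

```python
import random

def failure_behavior_model_query() -> str:
    # Placeholder for the classifier outcome at the current instant.
    return "failure" if random.random() < 0.1 else "non-failure"

def dynamic_scheduler(t_sigma=600.0, y=0.98, z=1.02, max_boundary=18000.0,
                      period=60.0, horizon=604800.0, reset="average"):
    """Simulate Algorithm 1 for one MA over `horizon` seconds of virtual time."""
    boundary = t_sigma          # current timing at which the MA will be triggered
    elapsed = 0.0               # time since the last maintenance action
    triggered = []
    for step in range(int(horizon // period)):
        elapsed += period
        if failure_behavior_model_query() == "failure":
            boundary = boundary * y                      # advance the MA schedule
        else:
            boundary = min(boundary * z, max_boundary)   # postpone, capped by MAXBoundary
        if elapsed >= boundary:                          # the current boundary is reached
            triggered.append(step * period)              # triggerMaintenance()
            # Variant I: reset to the nominal timing t_sigma.
            # Variant II: average the old boundary with t_sigma.
            boundary = t_sigma if reset == "nominal" else (boundary + t_sigma) / 2
            elapsed = 0.0
    return triggered

print(len(dynamic_scheduler()), "maintenance actions in one simulated week")
```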

4.3 Simulations using Uppaal-SMC

In order to run simulations and evaluate the benefits of our dynamic scheduler, we used the Uppaal-SMC [8] modeling tool. Uppaal is a toolbox for verification of real-time systems represented by one or more timed automata extended with integer variables, data types and channel synchronization. The relevance of using Uppaal-SMC in our case lies in the possibility of running simulations of the system specified as a set of probabilistic timed automata.

One of the challenges of this work and its practical case study is to validate the failure behavior model built using machine learning techniques, and to use it in order to schedule maintenance actions. Of course, we had to find a way of integrating such a model into a tool like Uppaal, in order to benefit from the integration of the representation of the maintenance strategy (gathered from the specifications of the component, and using TA), the decision tree providing insight into the failure behavior of the nozzles (learned as described in Section 4.1 and then implemented in Uppaal), and the function that dynamically updates the triggering of maintenance (presented in Section 4.2).

Fig. 5: Designing the components as state machines in Uppaal.

As shown in Figure 5, the whole system is composed of 3 components, all of them modeled using state machines:

1. The printer usage: this models how the printer is currently being used. It is composed of 3 states: printing, standby (e.g. waiting for a print job) and sleeping (for long periods out of use). It is important to model the printer usage since the way maintenance actions are scheduled depends on the usage of the printer. Indeed, print jobs will never be interrupted to perform maintenance. Similarly, when the printer is in sleeping mode, less maintenance is required since the printer is not in use.

2. The scheduler: this models the maintenance actions and under what conditions they are performed. In our case, we focus especially on 3 nozzle-related maintenance actions. Inside the specifications of the scheduler, an inner function is defined reproducing the outcome of the decision tree as well as the values set to the refreshed timed conditions (see Algorithm 1). The scheduler reflects the specifications of the nozzles under which maintenance is supposed to be performed.

3. The updater: this calls – with a given frequency – the function refreshing the scheduler from the outcome of the classification model. This function consists of a call to the decision tree implemented in Uppaal as an embedded function, itself consisting of a succession of if conditions representing the overall structure of the nozzle failure behavior; a sketch of how such a function could be generated from the learned tree is given below. Moreover, this function also updates the timed conditions for the triggering of maintenance actions (such as limitMA1_upper, limitMA1_lower, limitMA3_upper, limit_usageTimer_lower and limit_usageTimer_upper in Figure 5).
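
The sketch below illustrates how such an embedded function could be generated automatically from a learned scikit-learn tree; it is not the authors' implementation, the emitted C-like syntax is an assumption, and the feature names are assumed to match variables declared in the Uppaal model:

```python
from sklearn.tree import DecisionTreeClassifier

def tree_to_uppaal_function(tree: DecisionTreeClassifier, feature_names,
                            fname="failurePrediction") -> str:
    """Emit a C-like function equivalent to the learned tree (nested if/else)."""
    t = tree.tree_
    lines = [f"int {fname}() {{"]

    def emit(node: int, depth: int) -> None:
        pad = "    " * depth
        if t.children_left[node] == -1:               # leaf: return the majority class
            lines.append(f"{pad}return {int(t.value[node][0].argmax())};")
            return
        name = feature_names[t.feature[node]]
        lines.append(f"{pad}if ({name} <= {t.threshold[node]:.4f}) {{")
        emit(t.children_left[node], depth + 1)
        lines.append(f"{pad}}} else {{")
        emit(t.children_right[node], depth + 1)
        lines.append(f"{pad}}}")

    emit(0, 1)
    lines.append("}")
    return "\n".join(lines)
```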

4.4 Results

Experimental Parameters. In order to measure the benefits of our dynamic scheduler compared to a static scheduler, we relied on several metrics. First, by using our designed models in Uppaal and running simulations, we were able to compute how many times each state has been visited, i.e. how many maintenance actions have been triggered. For this purpose, we made 10 runs with a virtual duration of 600k seconds (approximately 1 week) for each configuration we wanted to test. From those runs, we were able to find out the behavior of our scheduler in the long term, since the average of the runs is computed over 10 weeks of simulation. We were finally able to compare our results with a static scheduler (no call to the classification model) and a dynamic scheduler. In the case of the dynamic scheduler, we distinguish 3 cases: the first (Dynamic Scheduler I) resets, once maintenance is triggered, the values of the timed conditions to their initial values, i.e. those defined in the specifications (currentBoundary ← tσ). The second (Dynamic Scheduler II) aims, after a maintenance action has been performed, to average the timed conditions between the past boundary and the moment when the MA has really been done (currentBoundary ← (currentBoundary + tσ)/2). The last one (Dynamic Scheduler III) computes the parameters y and z as a function of the confidence factor provided by the classification model.

In order to run experiments with Uppaal, several parameters have been set, such as the refreshing frequency (60 seconds), the variation of how much to postpone or advance maintenance actions (from 0.5% to 10%), the usage of the printer (the printer goes into sleeping mode every 8 hours for at least 2 hours; after printing for more than 2 minutes, the printer can stop printing and go into standby mode; after more than 1 minute without printing, the printer can start printing again) and the nozzle behavior (a nozzle is likely to fail every 20 minutes).

Fig. 6: Results concerning Dynamic Scheduler I and II: (a) results achieved for MA1, (b) results achieved for MA3, (c) results achieved for MA2. The value corresponding to a variation of clock guards of 0% stands for the Static Scheduler.

We can also mention the following maintenance-related additional information and settings:

– Maintenance action #1 (MA1): according to the usage of the printer, usually triggered every 40 sec to 3.5 hrs. Maximum acceptable threshold set: 5 hrs.

– Maintenance action #2 (MA2): usually triggered when the system is going to and coming back from sleeping mode.

– Maintenance action #3 (MA3): according to the usage of the printer, usually triggered every 15 min to 3.5 hrs. Maximum acceptable threshold set: 5 hrs.

Fig. 7: Overall MA performed and ratio of decisions classified as failure. The value corresponding to a variation of clock guards of 0% stands for the Static Scheduler. The vertical line shows a possible optimal variation of the clock guards.

Results. In Figures 6a to 6c we give a graphical representation of the number of maintenance actions performed per week as a function of the variation of the degree of clock increase/decrease, as well as the expected nozzle failure rate (for Dynamic Scheduler I and II). The value corresponding to a variation of clock guards of 0% stands for the static scheduler, since parameters y and z equal to 0 mean no change of the clock guards. We also give the results of the Static Scheduler and the Dynamic Scheduler III in Table 2. The results are stated for each type of MA by the number of MAs performed (#MA) and the expected failure rate when the maintenance is performed (FR_MA). The expected failure rate is computed from the distribution shown in Figure 4, since the nozzle failure rate as a function of the time since the last maintenance has been computed from the logs. We also present for each scheduler the proportion of cases classified as failures by our failure behavior model (C_f in Table 2 and the ratio of output failures in Figure 7).
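
For illustration, the expected failure rate of a schedule can be estimated as sketched below; the failure-rate curve used here is made up, whereas the paper derives it from the printer logs (Figure 4):

```python
import numpy as np

# Illustrative curve: probability that a nozzle is failing as a function of the
# minutes elapsed since the last MA (the real curve was estimated from logs).
failure_rate_curve = np.clip(0.02 + 0.0005 * np.arange(0, 301), 0.0, 0.25)

def expected_failure_rate(minutes_since_last_ma) -> float:
    """Average failure rate over the instants at which maintenance was triggered."""
    idx = np.minimum(np.asarray(minutes_since_last_ma, dtype=int),
                     len(failure_rate_curve) - 1)
    return float(failure_rate_curve[idx].mean())

# e.g. maintenance triggered 40, 90 and 210 minutes after the previous MA:
print(expected_failure_rate([40, 90, 210]))
```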

Discussion. From the results achieved, we can see that in some settings the number of maintenance actions performed can decrease. Nonetheless, while the number of maintenance actions is decreasing, the expected nozzle failure rate slightly increases, yet stays within acceptable values. We can refer to the first dynamic scheduler and to MA3, where at the optimal setting three times fewer maintenance actions are performed. The counterpart is an increase of the expected failure rate by 2 percentage points. We can also see that in some cases, especially MA2, our scheduler has no influence on the triggering of this specific type of maintenance.

Scheduler       #MA1     FR_MA1   #MA2    FR_MA2   #MA3    FR_MA3   C_f
Static          1205.6   6.1%     28.5    20.5%    290.3   13.8%    8.7%
Dynamic III     806.6    8.2%     29.1    25.6%    378.1   12.3%    9.5%

Table 2: Number of maintenance actions triggered using Dynamic Scheduler III.

Indeed, MA2 is an example of usage-based maintenance, whereas our method only modifies time-based maintenance. According to the results achieved by Dynamic Scheduler III, i.e. advancing or postponing the clock guards using the confidence factor of the decision tree, we can see that fewer MA1 actions are performed, but the expected failure rate when MA1 is triggered is increased by 2 percentage points, whereas MA3 is done more often. Furthermore, according to the number of cases detected as failure cases by our classifier as well as the number of actions performed (whatever the type), we can see that in some cases less maintenance is done and fewer failures are detected by our model, in particular when using a variation of how much to postpone or advance maintenance actions of 1 or 2%. This is shown in Figure 7 by a red line. This result supports our assumption that we can at the same time save maintenance and improve the print quality. Finally, for some settings, the dynamic schedulers, whichever variant is used, perform more actions than the static scheduler.

5 Conclusion

In this paper, we describe a new method for dynamic maintenance scheduling. We expect that our method can be generalized to other domains. The major novelty we bring, to the best of our knowledge, is a method that uses machine learning techniques, namely the decision of a classifier, in order to advance or postpone when maintenance should be triggered, i.e. to update the scheduling of automatic periodic maintenance defined by a TA.

According to our results, we can conclude that our dynamic scheduler reduces the number of maintenance actions performed in different settings. We also note that the price to pay in order to do less maintenance is a slight increase of the expected failure rate. Thus, depending on how critical the component is, our technique can reduce maintenance costs for a negligible increase in the risk of breakdown. In our case, it entailed an insignificant loss of print quality. Moreover, in some settings, we could reduce the failure rate as well as the number of maintenance actions performed. We also believe that our technique can be particularly interesting when no sensors are available that provide direct information about component failures. The strength of an embedded decision model is its availability at any time. Furthermore, we expect that, applied to other real systems, our technique could achieve results similar to those found in our simulation.

With regard to further work, we think that, within the scope of our case study, we can extend the current experiment not only to nozzles but also to other related components, or at least to components that share related maintenance actions. Furthermore, in the future, we will look into the use of fault trees [26]. Indeed, we believe that a fault tree pattern can be used to model interactions between several components, and how a failure can propagate from one component to others. We also hope that extending such a technique to the use of real-time automata can enhance the schedulability and control of CPS. We can also enhance the scheduler by taking into account the timing of anomaly occurrences [21]. Finally, we could orient the choice of the machine learning techniques used towards stream mining tools and algorithms [6], which would additionally offer the possibility of updating the failure behavior model as more labeled data become available. It would then be possible to deal with unseen events or combinations of parameters, and keep an accurate model throughout the life of the CPS.

Acknowledgments Thanks to Lou Somers and Patrick Vestjens for providing industrial datasets as well as the required expertise related to the case study. This research is supported by the Dutch Technology Foundation STW under the Robust CPS program (project 12693).

References

1. Abdeddaïm, Y., Asarin, E., Maler, O.: Scheduling with timed automata. Theoretical Computer Science 354(2), 272–300 (2006)

2. Alur, R., Dill, D.L.: A theory of timed automata. Theoretical Computer Science 126(2), 183–235 (1994)

3. Arab, A., Ismail, N., Lee, L.S.: Maintenance scheduling incorporating dynamics of production system and real-time information from workstations. Journal of Intelligent Manufacturing 24(4), 695–705 (2013)

4. Bengtsson, J., Larsen, K., Larsson, F., Pettersson, P., Yi, W.: Hybrid Systems III: Verification and Control, chap. UPPAAL — a tool suite for automatic verification of real-time systems, pp. 232–243. Springer Berlin Heidelberg (1996)

5. Bengtsson, J., Yi, W.: Lectures on Concurrency and Petri Nets: Advances in Petri Nets, chap. Timed Automata: Semantics, Algorithms and Tools, pp. 87–124. Springer Berlin Heidelberg (2004)

6. Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: Massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)

7. Bouyer, P., Brihaye, T., Markey, N.: Improved undecidability results on weighted timed automata. Information Processing Letters 98(5), 188–194 (2006)

8. Bulychev, P., David, A., Larsen, K.G., Mikucionis, M., Poulsen, D.B., Legay, A., Wang, Z.: Uppaal-SMC: Statistical model checking for priced timed automata. arXiv preprint arXiv:1207.1272 (2012)

9. Burns, A.: How to verify a safe real-time system: The application of model checking and timed automata to the production cell case study. Real-Time Systems 24(2), 135–151 (2003)

10. Butler, K.L.: An expert system based framework for an incipient failure detection and predictive maintenance system. In: Proceedings of the International Conference on Intelligent Systems Applications to Power Systems. pp. 321–326. Orlando, Florida, USA (1996)

11. Cardenas, A.A., Amin, S., Sastry, S.: Secure control: Towards survivable cyber-physical systems. In: 28th International Conference on Distributed Computing Systems Workshops. pp. 495–500 (2008)

12. Derler, P., Lee, E.A., Vincentelli, A.S.: Modeling cyber-physical systems. Proceedings of the IEEE 100(1), 13–28 (2012)

13. Flach, P.: Machine learning: the art and science of algorithms that make sense of data. Cambridge University Press (2012)

14. Fumeo, E., Oneto, L., Anguita, D.: Condition based maintenance in railway transportation systems based on big data streaming analysis. Procedia Computer Science 53, 437–446 (2015)

15. Grinchtein, O., Jonsson, B., Leucker, M.: Learning of event-recording automata. In: Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems, pp. 379–395. Springer (2004)

16. Gross, P., Boulanger, A., Arias, M., Waltz, D.L., Long, P.M., Lawson, C., Anderson, R., Koenig, M., Mastrocinque, M., Fairechio, W., et al.: Predicting electricity distribution feeder failures using machine learning susceptibility analysis. In: Proceedings of the 21st National Conference on Artificial Intelligence. vol. 21, pp. 1705–1711. Boston, Massachusetts, USA (2006)

17. Hashemian, H.M., Bean, W.C.: State-of-the-art predictive maintenance techniques. IEEE Transactions on Instrumentation and Measurement 60(10), 3480–3492 (2011)

18. Kaiser, K.A., Gebraeel, N.Z.: Predictive maintenance management using sensor-based degradation models. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 39(4), 840–849 (2009)

19. Lee, E.A.: Cyber physical systems: Design challenges. In: Proceedings of the 11th IEEE International Symposium on Object Oriented Real-Time Distributed Computing. pp. 363–369. Orlando, Florida, USA (2008)

20. Li, H., Parikh, D., He, Q., Qian, B., Li, Z., Fang, D., Hampapur, A.: Improving rail network velocity: A machine learning approach to predictive maintenance. Transportation Research Part C: Emerging Technologies 45, 17–26 (2014)

21. Maier, A., Niggemann, O., Eickmeyer, J.: On the learning of timing behavior for anomaly detection in cyber-physical production systems. In: Proceedings of the 26th International Workshop on Principles of Diagnosis. pp. 217–224. Paris, France (2015)

22. Maier, A.: Online passive learning of timed automata for cyber-physical production systems. In: Proceedings of the 12th IEEE International Conference on Industrial Informatics. pp. 60–66. Porto Alegre, Brazil (2014)

23. Niggemann, O., Biswas, G., Kinnebrew, J.S., Khorasgani, H., Volgmann, S., Bunte, A.: Data-driven monitoring of cyber-physical systems leveraging on big data and the internet-of-things for diagnosis and control. In: Proceedings of the 26th International Workshop on Principles of Diagnosis. pp. 185–192. Paris, France (2015)

24. Nowaczyk, S., Prytz, R., Rognvaldsson, T., Byttner, S.: Towards a machine learning algorithm for predicting truck compressor failures using logged vehicle data. In: Proceedings of the 12th Scandinavian Conference on Artificial Intelligence. pp. 205–214. Aalborg, Denmark (2013)

25. Quinlan, J.R.: C4.5: Programs for Machine Learning. Elsevier (2014)

26. Ruijters, E., Stoelinga, M.: Fault tree analysis: A survey of the state-of-the-art in modeling, analysis and tools. Computer Science Review 15, 29–62 (2015)

27. Simeu-Abazi, Z., Bouredji, Z.: Monitoring and predictive maintenance: Modeling and analyse of fault latency. Comput. Ind. 57(6), 504–515 (2006)

28. Verwer, S., Weerdt, M., Witteveen, C.: Efficiently identifying deterministic real-time automata from labeled data. Machine Learning 86(3), 295–333 (2011)

29. Witten, I.H., Frank, E., Trigg, L.E., Hall, M.A., Holmes, G., Cunningham, S.J.: Weka: Practical machine learning tools and techniques with Java implementations (1999)