
Maximum Likelihood Failure Diagnosis in Finite State Machines under Unreliable Observations

Eleftheria Athanasopoulou, Lingxi Li, and Christoforos N. Hadjicostis

Abstract

In this paper we develop a probabilistic methodology for failure diagnosis in finite state machines based on a

sequence of unreliable observations. Given prior knowledge of the input probability distribution but without actual

knowledge of the applied input sequence, the core problem we consider aims to choose from a pool of known,

deterministic finite state machines (FSMs) the one that most likely matches the given sequence of observations.

The problem becomes challenging because of sensor failures which may corrupt the output sequence by inserting,

deleting, and transposing symbols with certain probabilities (that are assumed known). We propose an efficient

recursive algorithm for obtaining the most likely underlying FSM, given the possibly erroneous observed sequence.

The proposed algorithm essentially allows us to perform online maximum likelihood failure diagnosis and is applicable

to more general settings where one is required to choose the most likely underlying hidden Markov model (HMM)

based on a sequence of observations that may get corrupted with known probabilities. The algorithm generalizes

existing recursive algorithms for likelihood calculation in HMMs by allowing loops in the associated trellis diagram.

We illustrate the proposed methodology using an example of diagnosis (classification) of communication protocols.

Index terms: Finite state machines, failure diagnosis, maximum likelihood model classification, insertions,

deletions, transpositions, discrete event systems, probabilistic automata.

I. INTRODUCTION

Failure diagnosis is an important aspect in modern system and network operation, particularly in applications

that are life-critical and require high reliability (e.g., medical, transportation or military systems). In this paper

we focus on diagnosis based on discrete event system (DES) formulations. More specifically, we consider failure

diagnosis in systems that have discrete state spaces and event-driven evolutions (such systems take state transitions

only when a certain set of discrete events occurs). Any large-scale dynamic system, such as a computer system, a

manufacturing system, a chemical process, or a semiconductor manufacturing process can be modeled as a DES

at some level of abstraction [1]. Much work has been done in failure diagnosis of discrete event systems (DESs)

including deterministic diagnosis [2]–[7], probabilistic diagnosis [8], [9] or diagnosis in stochastic finite automata

This material is based upon work supported in part by the National Science Foundation under NSF Career Award No 0092696 and NSF ITR

Award No 0426831, and in part by the Air Force Office of Scientific Research under Award No AFOSR DoD F49620-01-1-0365URI. Any

opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the

views of NSF or AFOSR.

The authors are with the Coordinated Science Laboratory and the Department of Electrical and Computer Engineering, University of Illinois at

Urbana-Champaign, IL 61801–2307, USA. Corresponding author: C. N. Hadjicostis, 357 CSL, 1308 West Main Street, Urbana, IL 61801–2307,

USA (e-mail: [email protected]).


[10], [11]. Related problems of failure diagnosis and conformance testing of communication systems/protocols

(modeled by finite state machines) are explored in [12]–[14].

In this paper, we develop a recursive algorithm for maximum likelihood classification among two (or more) finite state machines (FSMs), based on a sequence of possibly erroneous observations. This algorithm is very general

and can be applied in many settings, such as the evaluation problem in hidden Markov models (HMMs) [16],

the parsing problem in probabilistic automata (PA) [20], [21], and the trellis-based decoding of variable length

codes (VLC) [24], [25]. In this paper, we focus on failure diagnosis applications and motivate our approach by

considering the following problem: given two known deterministic finite state machine models (one corresponding

to the fault-free version of the underlying system and the other corresponding to a faulty version of the system), we

want to determine which of the two competing models has most likely produced a given sequence of observations.

We assume that the input sequence is unknown but the a priori input distribution is known; the diagnoser needs

to make its decision based on the observed outputs which are associated (not necessarily exclusively) with FSM

transitions. The additional challenge in our diagnosis formulation is that the observed sequence may be corrupted

due to sensor malfunctions. For example, the information that the sensors provide may be corrupted due to inaccurate

measurements, limited resolution, degraded sensor performance because of aging, or hardware failures. In this work,

we are interested in unreliable sensors that may cause outputs to be deleted, inserted or transposed with certain

known, possibly time-varying probabilities.

One way to view the above problem is in terms of Figure 1: the system under diagnosis is driven by a randomly generated input sequence $x_1^{L_s} = \langle x[1], x[2], \ldots, x[L_s] \rangle$ (with known a priori input probability distribution¹). The output sequence $y_1^{L_s} = \langle y[1], y[2], \ldots, y[L_s] \rangle$ that is generated can become corrupted due to sensor failures; hence, the observed sequence $z_1^L = \langle z[1], z[2], \ldots, z[L] \rangle$ may be erroneous and its length $L$ will generally not equal the output sequence length $L_s$ ($L \neq L_s$). Given an observed sequence $z_1^L = \langle z[1], z[2], \ldots, z[L] \rangle$, our

goal is to decide which model has most likely generated this observed sequence. The assumption of known input

statistics essentially reduces our problem to classification between two (or more) known hidden Markov models

(HMMs) based on the observation of an output sequence [16], [17]. What makes our task here more challenging

than traditional classification of Markov models are the sensor failures which may corrupt the true output sequence

$y_1^{L_s}$. Note that (although unlikely in a diagnosis scenario) the assumption that the distribution of the input fed to

each FSM is the same can be easily relaxed (we can have different inputs feeding to each FSM model as long as

their a priori probability distributions are known). Thus, the assumption that we are given deterministic FSMs with

known input statistics is not restrictive because, as an alternative, we can start with HMMs and perform model

classification based on our (possibly corrupted) observations.

If the observed sequence was not corrupted, i.e., if it was the same as the output sequence, the likelihood that the

observed sequence comes from a particular model can be calculated as the sum of the probabilities of all possible

¹The input sequence could be white (i.e., each input could be generated with an identical distribution at each time step) or it could be generated

based on some underlying Markov model. The latter case leads to a more complicated probabilistic description (HMM) but can be handled

essentially using the same techniques (applied, however, on the more complex HMM).


Fig. 1. Problem formulation. (Block diagram: an input sequence $x_1^{L_s}$ drives the system under diagnosis, which produces the output sequence $y_1^{L_s}$; sensor-induced uncertainty corrupts it into the observed sequence $z_1^L$, from which the diagnoser decides: fault-free or faulty?)

state sequences that are consistent with the observations. This can be done using a recursive algorithm similar

to the forward algorithm [16], which solves the evaluation problem in HMMs and is used frequently in speech

recognition applications (e.g., in [16]–[19]). Given a model and an observed sequence, the (standard) evaluation

problem consists of computing the probability that the observed sequence was produced by the model. When

there are multiple competing models, these probabilities can be used to choose the model which best matches the

observations. The forward algorithm is also used in pattern recognition applications [20], [21] to solve the syntax

analysis or parsing problem (i.e., to recognize a pattern by classifying it into the appropriate generating grammar),

and in bioinformatics [22], [23] to evaluate whether a DNA or protein sequence belongs to a particular family of

sequences.

When sensor failures are possible, several output sequences may correspond to a given observed sequence and

one would need to identify all possible state sequences and the probabilities with which they agree with both the

underlying FSM model and the observations. If the output sequence is corrupted by deletions, the standard forward

algorithm is insufficient because there are potentially infinitely many output sequences that agree with a given observed

sequence. To address this inability of the standard forward algorithm, we propose in this paper a recursive algorithm

that allows us to efficiently compute the probability that a given FSM model matches the observed sequence: every

time a new observation is made, the algorithm simply has to update the information it keeps track of and can output

on demand the probability that a given model has produced the sequence observed so far. The recursive algorithm

we develop relates to (and generalizes) recursive algorithms for the evaluation problem in HMMs [16], the parsing

problem in probabilistic automata (PA) [20], [21] and the trellis-based decoding of variable length codes (VLC)

[24], [25]; all of these existing techniques can be modified to deal with some types of sensor failures but, unlike our

algorithm, cannot handle deletions or other combinations of sensor failures that lead to loops of silent transitions

in the associated trellis diagram. We elaborate on the relationship of our approach to these earlier approaches in a

discussion in Section VI.C, after we describe our recursive algorithm in detail.

The contribution of this paper is three-fold: (i) We formulate a maximum likelihood diagnosis problem where

systems are modeled by FSMs and the diagnosis decision is based on an observed, possibly corrupted sequence.

(ii) We construct an observation FSM to capture all possible output sequences that are associated with the observed

sequence. (iii) We propose a recursive algorithm that can be used online to compute the probability that a given

FSM model matches the observed sequence. The recursion is in terms of the number of observations in the sequence


and our algorithm extends existing likelihood evaluation algorithms because it can handle loops of silent transitions

that appear in the trellis diagram due to output deletions.

The remainder of the paper is organized as follows. In Section II we introduce the necessary notation for our

development. In Section III we formulate the diagnosis problem and present the sensor failure model that includes

insertions and deletions. In Section IV we propose a way to construct an observation FSM that captures the

observed sequence along with the pairs of compatible output sequence and error patterns (and their corresponding

probabilities). In Section V we propose an algorithm to compute the likelihood of the observations given the model

and in Section VI we develop an efficient recursive version of this algorithm. In Section VII we present an example

of failure diagnosis in a communication protocol. Conclusions and future work are discussed in Section VIII.

II. PRELIMINARIES

A deterministic finite state machine (FSM) model $S$ can be described by a six-tuple $S = (Q, X, Y, \delta, \lambda, q_0)$, where $Q$ is the finite set of states (without loss of generality we will also denote each state by its index, i.e., $Q = \{0, 1, \ldots, |Q|-1\}$, where $|Q|$ denotes the size of $Q$); $X = \{x_1, x_2, \ldots, x_{|X|}\}$ is the finite set of inputs; $Y = \{y_1, y_2, \ldots, y_{|Y|}\}$ is the finite set of outputs; $\delta$ is the state transition function; $\lambda$ is the output function; and $q_0$ is the initial state. The state $q[n+1]$ at time epoch $n+1$ of the FSM is specified by its state $q[n]$ at time epoch $n$ and its input $x[n+1]$ via the state transition function $\delta$ as $q[n+1] = \delta(q[n], x[n+1])$. The output of the FSM is associated with the FSM transition and is specified via the output function $\lambda$ as $y[n+1] = \lambda(q[n], x[n+1])$. Note that the FSMs we consider here are event-driven and we use $n$ to denote the time epoch between the occurrence of the $n$th and $(n+1)$st input.

We assume for simplicity² that the inputs applied to the given FSM are chosen according to a probability

distribution determined by the current FSM state. Thus, the FSM behaves as a homogeneous Markov chain, i.e.,

a Markov chain in which the transition probabilities are not a function of time [35]. This Markov chain can be

obtained from the given FSM by assigning to each transition a probability that depends on the probabilities of the

inputs that cause it. If we denote the state transition probabilities by $a_{jk} = P\{(q[n+1] = j) \mid (q[n] = k)\}$, the state transition matrix of the Markov chain associated with the given system is captured by $A = (a_{jk})_{0 \le j,k \le |Q|-1}$. Note that, in order to keep subsequent notation clean, the rows and columns of all matrices and vectors are indexed starting from 0 (not 1). The state transition matrix $A$ captures how state probabilities evolve in time via the evolution equation

$$\pi[n+1] = A\,\pi[n], \qquad (1)$$

where $\pi[n]$ is a $|Q|$-dimensional vector whose $j$th entry denotes the probability that the Markov chain is in state $j$ at time step $n$. Note that the columns of the $|Q| \times |Q|$ nonnegative matrix $A$ sum to 1; similarly, the $|Q|$-dimensional

²Although not discussed in this paper, more complicated input statistics can also be handled, perhaps by enlarging the state space of the

resulting HMMs. Alternatively, as we discuss in the next section, one could start with a probabilistic description of HMMs as opposed to FSMs

driven by inputs with known statistics.


probability vector $\pi[n]$ has elements that are nonnegative and sum to 1.

To make the connection with HMMs more transparent, we will denote the FSM state at time step n by a |Q|-

dimensional binary indicator vector q[n] which has exactly one nonzero entry with value equal to “1.” This single

nonzero entry denotes the state of the system (i.e., if the jth entry of q[n] equals “1,” then the FSM is in state j

at time step n). If input xi is applied at time step n + 1 (i.e., x[n + 1] = xi), then the state evolution of the system

can be captured by an equation of the form

$$q[n+1] = A_i\, q[n], \qquad (2)$$

where $A_i$ is the $|Q| \times |Q|$ state transition matrix associated with input $x_i$. Specifically, $A_i$ is such that each of its

columns has at most one nonzero entry with value “1” (i.e., matrix Ai has a total of at most |Q| nonzero entries,

all with value “1”). A nonzero entry at the (j, k)th position of Ai denotes a transition from state k to state j

under input xi. (Clearly, the constraint that each column of Ai has at most one nonzero entry simply reflects the

requirement that there can only be at most one transition out of a particular state under a particular input.) Note

that if the inputs applied to a given FSM are statistically independent from one time step to another and their probability distribution is fixed so that, at any given time step $n$, from a given state $q[n]$, input $x[n+1] = x_i$ takes place with probability $p_i$ where $\sum_{i=1}^{|X|} p_i = 1$, then the probability $p_i$ of input $x_i$ does not depend on the current state of the FSM and the corresponding matrix $A$ can be written as

$$A = \sum_{i=1}^{|X|} p_i A_i, \qquad (3)$$

where $A_i$ is the state transition matrix³ corresponding to input $x_i$.

Since the diagnoser has no access to the inputs that drive the FSM but only observes the (possibly corrupted)

outputs it produces, the FSM state sequence is not completely known and the resulting probabilistic system is an

HMM [16]. An HMM is described by a five-tuple $(Q, Y, \Delta, \Lambda, \rho[0])$, where $Q$ is the set of states, $Y$ is the set of outputs, $\Delta$ captures the state transition probabilities, $\Lambda$ captures the output probabilities associated with transitions, and $\rho[0]$ is the initial state probability distribution vector. We define the $|Q| \times |Q|$ matrix $A_\sigma$, associated with output $\sigma \in Y$ of the FSM, as follows: an entry at the $(j, k)$th position of $A_\sigma$ captures the probability of a transition from state $k$ to state $j$ that produces output $\sigma$. Note that

$$\sum_{\sigma \in Y} A_\sigma = A. \qquad (4)$$

The joint probability of the state at time step $n$ and the observations $y[1], \ldots, y[n]$ is captured by the vector $\rho[n]$, where the entry $\rho[n](j)$ denotes the probability that the HMM is in state $j$ at step $n$ and $y[1], \ldots, y[n]$ have been observed. More formally, the vector $\rho[n]$ is defined as $\rho[n](j) = P(Q[n] = j,\, Y_1^n = y_1^n)$, where the capital letters denote random variables and the small letters denote values of these variables. When $y[n+1]$ becomes available

³The assumption that the input distribution is independent of the system state implies that each $A_i$ is a state transition matrix that has exactly one nonzero ("1") entry at each of its columns.


at time epoch n + 1, we can update the joint probability of the state of the HMM and y[1], . . . , y[n + 1] as

"[n + 1] = Ay[n+1]"[n]. (5)

Note that "[n + 1] is not necessarily a probability vector; its jth entry denotes the joint probability of observing

y[1], . . . , y[n], y[n + 1] and being in state j at time step n + 1. If we normalize "[n + 1] to "![n + 1] so that its

entries sum to one, then "![n + 1](j) is the conditional probability that the HMM is in state j at time step n + 1

given the observation of y[1], y[2], . . . , y[n + 1], i.e., "![n + 1](j) = P (Q[n + 1] = j | Y n+11 = yn+1

1 ).
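As a small illustration of the update in Eq. (5) (a sketch with placeholder matrices, not the paper's example):

```python
import numpy as np

def forward_update(rho, A_out, y):
    """One step of Eq. (5): rho[n+1] = A_{y[n+1]} rho[n]."""
    return A_out[y] @ rho

# Placeholder 2-state HMM with outputs 'a', 'b'; A_a + A_b is column-stochastic.
A_out = {'a': np.array([[0.5, 0.2], [0.1, 0.3]]),
         'b': np.array([[0.2, 0.4], [0.2, 0.1]])}
rho = np.array([1.0, 0.0])        # start in state 0 with certainty
for y in ['a', 'b', 'a']:         # observed output sequence
    rho = forward_update(rho, A_out, y)

likelihood = rho.sum()            # P(y[1..3] | model)
rho_cond = rho / likelihood       # normalized vector rho' of conditional probs
```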

The above discussion described how starting from a given deterministic FSM with known input statistics, one

can develop a corresponding HMM and use it to perform failure diagnosis. Although in this work we assume we

are given deterministic FSMs with known input statistics, our development can just as easily be applied to the case

where we start with HMMs (possibly with different numbers of states) and our goal is to perform model classification

based on our observations.

III. PROBLEM FORMULATION

In this section, we describe in detail the problem we consider in this paper, including the system under diagnosis,

the sensor failure model, and the maximum likelihood formulation.

A. System Under Diagnosis

Using the notation introduced in the previous section, the fault-free and faulty FSM models that we are interested

in are given by $S_1 = (Q, X, Y, \delta_1, \lambda_1, q_0^1)$ and $S_2 = (Q, X, Y, \delta_2, \lambda_2, q_0^2)$ respectively. The set of states, the set of

inputs, and the set of outputs are taken to be identical for both FSMs and our focus is on detecting changes in the

state transition functionality of the FSM (this would be the case, for instance, if one aims to detect an incorrectly

implemented version of a given FSM). Note that these assumptions are not essential and can be relaxed in a

straightforward manner (in fact, as mentioned in the introduction, one could even use the techniques we develop

to classify different HMMs, e.g., different FSM models driven by different input distributions). Apart from the

uncertainty in the observed sequence due to sensor failures (which will be discussed next), there is uncertainty due

to the fact that the input sequence is not exactly known.

Example 1: We are given two competing FSM models, S1 and S2 (corresponding to the fault-free and the

faulty model respectively); these models will be used as a running example throughout the paper. The left parts

of Figures 2 and 3 show the state transition diagrams of FSMs S1 and S2 with four states Q = {0, 1, 2, 3}, three

inputs $X = \{x_1, x_2, x_3\}$, and three outputs $Y = \{a, b, c\}$. Each transition is labeled as $x_i \mid \sigma$, where $x_i \in X$ denotes the input that drives the FSM and $\sigma \in Y$ denotes the output produced by the FSM. For simplicity, we assign equal a priori probabilities to the inputs, i.e., each input has probability 1/3 of occurring. The right parts of Figures 2 and 3 show the HMMs that correspond to FSMs S1 and S2 under this input distribution. Each transition in the HMM is labeled as $p \mid \sigma$, where $p$ denotes the probability of the transition and $\sigma \in Y$ denotes the output


Fig. 2. State transition diagram of fault-free FSM S1 (left) and HMM corresponding to S1 (right).

Fig. 3. State transition diagram of faulty FSM S2 (left) and HMM corresponding to S2 (right).

produced. Note that according to the notation of the previous section the matrices $A_{1,\sigma}$, $\sigma \in Y$, for the HMM that corresponds to S1 are the following:

$$A_{1,a} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 0 \end{bmatrix}, \quad
A_{1,b} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2/3 \\ 0 & 1/3 & 0 & 0 \\ 0 & 0 & 1/3 & 0 \end{bmatrix}, \quad
A_{1,c} = \begin{bmatrix} 0 & 1/3 & 1/3 & 0 \\ 0 & 0 & 1/3 & 0 \\ 0 & 0 & 0 & 0 \\ 1/3 & 0 & 0 & 1/3 \end{bmatrix}. \qquad \blacksquare$$
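These matrices carry over directly into code; the following check (an illustrative numpy snippet, not part of the paper) confirms that $A_1 = A_{1,a} + A_{1,b} + A_{1,c}$ is column-stochastic, as Eq. (4) requires:

```python
import numpy as np

# HMM matrices for S1 in Example 1: entry (j, k) is the probability of a
# transition from state k to state j that produces the indicated output.
A1 = {
    'a': np.array([[0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]) / 3,
    'b': np.array([[0, 0, 0, 0], [0, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]) / 3,
    'c': np.array([[0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 1]]) / 3,
}

A = sum(A1.values())                      # Eq. (4): sum over outputs gives A
assert np.allclose(A.sum(axis=0), 1.0)    # every column sums to one
```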

B. Sensor Failure Model

The output sequence $y_1^{L_s} = \langle y[1], y[2], \ldots, y[L_s] \rangle$ produced by the system under diagnosis may become corrupted due to sensor failures; hence, the observed sequence $z_1^L = \langle z[1], z[2], \ldots, z[L] \rangle$ may contain erroneous information and, in general, its length will not be equal to the output sequence length (i.e., $L \neq L_s$). We consider

errors due to transient conditions that occur independently at each observation step with certain (known) probabilities

that could depend on the observation step, i.e., the probability of a particular transient error could vary as a function

of the observation step. (Note that the case of permanent errors is a special case of what we consider here, because

in that case the probability of sensor failures would be equal to one at all observation steps.) We also make the

reasonable assumption that given the output sequence, sensor failures at a particular observation step are statistically


independent of sensor failures at other observation steps and of the system model (in particular, sensor failures are

independent of the inputs that drive the system).

We start by focusing on deletions and insertions; later on, we introduce another type of error, namely transpositions. If an output $\sigma \in Y$ is deleted by the sensors, then we do not observe anything, i.e., the deletion causes $\sigma \to \epsilon$, where $\epsilon$ denotes the empty label. Similarly, if an output $\sigma \in Y$ is inserted by the sensors, then we observe $\sigma$, i.e., the insertion causes⁴ $\epsilon \to \sigma$.

For notational purposes, the sensor failures are captured by the set of failures $F = D \cup I$, where $D = \{d_{\sigma_1}, d_{\sigma_2}, \ldots, d_{\sigma_{|D|}} \mid \sigma_1, \sigma_2, \ldots, \sigma_{|D|} \in Y\}$ is the set of $|D|$ possible deletions and $I = \{i_{\sigma_1}, i_{\sigma_2}, \ldots, i_{\sigma_{|I|}} \mid \sigma_1, \sigma_2, \ldots, \sigma_{|I|} \in Y\}$ is the set of $|I|$ possible insertions. We also define a function $out$ which allows us to recover the corresponding output in the set $Y$ given a deletion or an insertion, i.e., $out(d_\sigma) = \sigma$ and $out(i_\sigma) = \sigma$ for $d_\sigma \in D$ and $i_\sigma \in I$. We use $E$ to denote the error pattern, i.e., the sequence of non-erroneous or erroneous events. The following example is provided to clarify our notation.

Example 1 (continued): For our example, we assume that the set of outputs is given by $Y = \{a, b, c\}$ and that the set of failures is $F = \{d_b, i_a\}$, i.e., sensors may delete $b$ from the output sequence or they may insert $a$ into the output sequence. Suppose that the observed sequence is $z_1^L = \langle a\,b\,c\,b\,c\,a\,c \rangle$. Some possible output sequences $y_1^{L_s}$ are the following: $\langle \epsilon\,b\,c\,b\,c\,a\,c \rangle \equiv \langle b\,c\,b\,c\,a\,c \rangle$, $\langle \epsilon\,b\,c\,b\,b\,b\,c\,a\,c \rangle \equiv \langle b\,c\,b\,b\,b\,c\,a\,c \rangle$, and $\langle b\,a\,b\,b\,c\,b\,b\,c\,b\,a\,b\,c\,b \rangle$. For example, the second possible output sequence corresponds to the following error pattern: $E = \{$insertion of $a$, no error, no error, no error, deletion of $b$, deletion of $b$, no error, no error, no error$\}$. Note that the set of possible output sequences corresponding to the observed sequence has infinite cardinality. $\blacksquare$

Insertions and deletions occur with probabilities that vary as a function of the observation step. We assume that deletion $d_\sigma$ occurs with known probability $p_{d_\sigma}[m]$ at observation step $m$; similarly, an insertion of $\sigma$ ($i_\sigma \in I$) occurs with known probability $p_{i_\sigma}[m]$ at step $m$ when $\sigma$ is observed. The probability of the absence of sensor failures at a particular observation step depends on the observed output at that particular step and is the complement of the total probability of sensor failures (so that the sum of the probabilities of sensor failures and the absence of errors is

one). In Section IV, we elaborate on the probability model we use for sensor failures, and describe explicitly how

to capture concisely the set of all possible output sequences that correspond to the observed sequence under the

sensor failure model, as well as how to calculate the probability of sensor failures that corrupt each possible output

sequence to the observed sequence.

⁴A related model for the case of unreliable observations due to transmission errors is presented in [36] by introducing an unreliable mask

function. The work in [37] considers communication in channels with insertions and a bounded number of deletions. Our failure model here

captures those cases and also allows for an infinite number of deletions. Note, however, that our primary motivation here is to handle errors

caused by sensor failures.


C. Likelihood Calculation

Given the observed sequence $z_1^L = \langle z[1], z[2], \ldots, z[L] \rangle$ and assuming known probability distributions for the input and initial state of the system under diagnosis, and for the sensor failures, our objective is to compare the probability that the system under diagnosis is fault-free against the probability that it is faulty. More specifically, to minimize the probability of incorrect diagnosis,⁵ we need to use the maximum a posteriori probability (MAP) rule, i.e., we need to compare

$$P(S_1 \mid z_1^L) \;\gtrless\; P(S_2 \mid z_1^L). \qquad (6)$$

Clearly, if the probability of S1 given the observed sequence is larger than the probability of S2 given the observed

sequence, we should declare that the system under diagnosis is fault-free, whereas if it is smaller we should declare

that the machine is faulty (see [38] for more details). Our formulation is essentially a model selection task, where

the candidate models correspond to the fault-free and the faulty operation of the system under diagnosis.

Assuming known priors for FSMs S1 and S2 given by P1 and P2 respectively (with P1 + P2 = 1), the above

comparison can be reduced to

$$P(z_1^L \mid S_1) \cdot P_1 \;\gtrless\; P(z_1^L \mid S_2) \cdot P_2. \qquad (7)$$
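In code, the comparison in Eq. (7) is a one-liner once the likelihoods are available (a sketch; `lik1` and `lik2` stand for $P(z_1^L \mid S_1)$ and $P(z_1^L \mid S_2)$ as computed by the algorithm developed later in the paper):

```python
def map_diagnose(lik1, lik2, prior1=0.5, prior2=0.5):
    """MAP rule of Eq. (7): pick the model with the larger posterior score."""
    return 'fault-free (S1)' if lik1 * prior1 >= lik2 * prior2 else 'faulty (S2)'
```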

Therefore, our task reduces to calculating the probability of the observed sequence given that the system under

diagnosis is S, i.e., the likelihood $P(z_1^L \mid S)$ of the observations given S, where S is either S1 or S2. If sensor

failures were not present, the observed sequence would be the same as the output sequence and the likelihood of

the observed sequence given S could be calculated as the sum of the probabilities of all possible state sequences

that are consistent with the observations. To ensure this consistency, we would need to identify the possible state

sequences that agree with both the observations and S. With sensor failures, however, several output sequences

correspond to the observed sequence and, for each one, we would have to identify all consistent state sequences and

their associated probabilities. Note that if we use $E$ to denote a sensor failure pattern, then we can write $P(z_1^L \mid S)$ as follows:

$$P(z_1^L \mid S) = \sum_{E,\, y_1^{L_s}} P(y_1^{L_s} \mid S) \cdot P(z_1^L, E \mid y_1^{L_s}, S) \qquad (8)$$
$$\phantom{P(z_1^L \mid S)} = \sum_{E,\, y_1^{L_s}} P(y_1^{L_s} \mid S) \cdot P(z_1^L, E \mid y_1^{L_s}). \qquad (9)$$

Notice that $P(z_1^L, E \mid y_1^{L_s}, S) = P(z_1^L, E \mid y_1^{L_s})$ because, given the output sequence, the error pattern and the observed sequence are jointly independent of the model $S$. This follows from our reasonable assumption that sensor failures are independent of the underlying system.

In the next section we develop a concise representation of all possible output sequences that may produce the observed sequence along with the probabilities $P(z_1^L, E \mid y_1^{L_s})$ for a given $z_1^L$.

⁵Other criteria (e.g., Neyman-Pearson) can also be used but are not discussed here due to space limitations.


IV. OBSERVATION FSM

Due to sensor failures, the set of possible output sequences that match a given observed sequence may have

infinite cardinality (as shown in our example at the end of Section III.B). In this section we present a compact

way to represent the infinite set of possible output sequences, along with the probability of each possible sequence

leading to the observed sequence. In the next section, we explain how to use this representation to efficiently check

which of the possible output sequences are consistent with the underlying FSM model (S1 or S2).

We will represent the set of all possible output sequences $y_1^{L_s}$ that correspond to the observed sequence $z_1^L$ by the allowable behavior of an FSM that we call the observation FSM. More specifically, the observation FSM, denoted by $S_o = (Q_o, X_o, Y_o, \delta_o, \lambda_o, q_{o0})$, has $L+2$ states, starts from initial state $q_{o0}$, and transitions to a new state every time an observation is made. Notice that, by construction, the observation machine produces no outputs ($Y_o = \emptyset$) and $\lambda_o$ is a mapping that maps to the empty output. The set $Q_o = \{q_{o0}, q_{o1}, \ldots, q_{o,L+1}\} \equiv \{0, 1, \ldots, L+1\}$ represents the set of states, with $q_{o0} \equiv 0$ being the initial state. The set of inputs $X_o$ is the union of the set of outputs $Y$ of the system under diagnosis, the empty label $\epsilon$, and the failures $F$, i.e., $X_o = Y \cup \{\epsilon\} \cup F$. The state transition function $\delta_o$ is defined by the following three steps:

(i) Starting from state 0, $S_o$ transitions to a new state every time an observation occurs, i.e.,

$$\delta_o(m, z[m+1]) = m+1, \quad m = 0, 1, \ldots, L-1. \qquad (10)$$

(ii) From state $L$, $S_o$ transitions to state $L+1$ under input $\epsilon$, i.e., $\delta_o(L, \epsilon) = L+1$, and there is a self-loop under input $\epsilon$ at the last state, i.e., $\delta_o(L+1, \epsilon) = L+1$. (This $(L+1)$st state of the observation machine can be thought of as its only accepting state.)

(iii) We account for sensor failures by adding transitions that correspond to errors as follows:

• For each state $m \in Q_o \setminus \{L+1\}$, add a self-transition under input $d_\sigma$ for all outputs of $S$ that are allowed to be deleted (i.e., for all $d_\sigma \in D$). In other words, for each state $m \in Q_o$ (except $L+1$), let $\delta_o(m, d_\sigma) = m$, $\forall d_\sigma \in D$.

• For each state $m \in Q_o \setminus \{L+1\}$ with a valid input $z[m+1] = \sigma \in Y$ (so that $\delta_o(m, z[m+1]) = m+1$), if $\sigma$ is allowed to be inserted (i.e., if $i_\sigma \in I$ with $out(i_\sigma) = \sigma = z[m+1]$), add a transition under insertion $i_\sigma$ so that $\delta_o(m, i_\sigma) = m+1$.

If there were no sensor failures then FSM So would be constructed following only the first two steps of the

previous procedure. In order to account for sensor failures, we introduce the third step and, as a result, the observation

FSM So may have additional one-step transitions (due to insertions) and self-loops (due to deletions). From the

structure of the observation FSM $S_o$ we can construct the (deterministic) state transition matrix that corresponds to each input of $S_o$, i.e., $A_{o,\sigma}$ for all $\sigma \in Y$, $A_{o,\epsilon}$ for the empty label, $A_{o,d_\sigma}$ for all $d_\sigma \in D$, and $A_{o,i_\sigma}$ for all $i_\sigma \in I$. (The observation machine for the observed sequence in Example 1 can be seen in Figure 4; this figure is discussed in detail in the example that follows.) Equivalently, the transition function $\delta_o$ is defined for each input


$\sigma' \in X_o$ as follows:

$$\text{For } m = 0, 1, \ldots, L-1:\quad \delta_o(m, \sigma') = \begin{cases} m+1, & \text{if } \sigma' = z[m+1], \\ m, & \text{if } \sigma' \in D, \\ m+1, & \text{if } \sigma' \in I \text{ and } out(\sigma') = z[m+1], \\ \text{undefined}, & \text{otherwise;} \end{cases}$$
$$\delta_o(L, \sigma') = L \ \text{ if } \sigma' \in D; \qquad \delta_o(L, \epsilon) = L+1; \qquad \delta_o(L+1, \epsilon) = L+1. \qquad (11)$$

As mentioned earlier, the observation FSM $S_o$ captures all possible output sequences that may get corrupted into the observed sequence, as well as the probability with which a possible output sequence results in the observed

sequence. We assume that sensor failures at a particular observation step occur independently from other observation

steps. This implies that the probability of sensor failures depends on the particular state of So (which is equivalent

to an observation step).

Next, we explain how we assign probabilities to the transitions of the observation FSM $S_o$, which, in effect, results in a Markov chain that captures the probability $P(z_1^L, E \mid y_1^{L_s})$, where $E$ denotes a given error pattern and $y_1^{L_s}$ denotes the matching output sequence. Since deletions can occur at any step, state $m$ of $S_o$ (at observation step $m$) has a self-loop with probability of occurrence $p_{d_\sigma}[m]$ for each $\sigma$ such that $d_\sigma \in D$. An insertion of $\sigma$ may only occur at observation steps where $\sigma$ is actually observed, i.e., when a forward transition of $S_o$ has input $\sigma$. In such a case, $i_\sigma \in I$ is assigned probability $p_{i_\sigma}[m]$, where $m$ corresponds to the observation step at which $\sigma$ may have been inserted (this implies that $z[m+1] = \sigma$). Fault-free (normal) transitions are assigned probabilities so that from each state of $S_o$ the sum of the probabilities of transitions leaving that state is equal to one.

Example 1 (continued): In this example, we assume that the observed sequence is $z_1^7 = \langle a\,b\,c\,b\,c\,a\,c \rangle$ and we construct the observation FSM $S_o$ shown in Figure 4. In fact, we can also use a regular expression [1] to capture the set of all possible output sequences which may have resulted in $z_1^7$ as $\langle b^*(a \cup \epsilon)\, b^*\, b\, b^*\, c\, b^*\, b\, b^*\, c\, b^*\, (a \cup \epsilon)\, b^*\, c\, b^* \rangle$, where $b^*$ denotes the Kleene closure of $b$. For the purposes of this example, we assume that after we have observed $z_1^5 = \langle a\,b\,c\,b\,c \rangle$ the sensors become more susceptible to noise; thus, we assign probability $p_{d_b}[m]$ for deletion $d_b$ and probability $p_{i_a}[m]$ for insertion $i_a$ as

$$p_{d_b}[m] = \begin{cases} p_{d_b}, & \text{for } m = 0, 1, 2, 3, 4, \\ p'_{d_b}, & \text{for } m = 5, 6, 7, \end{cases} \qquad p_{i_a}[m] = \begin{cases} p_{i_a}, & \text{for } m = 1, 2, 3, 4, \\ p'_{i_a}, & \text{for } m = 5, 6, 7, \end{cases}$$

where $p'_{d_b} > p_{d_b}$ and $p'_{i_a} > p_{i_a}$. The resulting Markov chain corresponding to FSM $S_o$ is shown in Figure 5 and


Fig. 4. State transition diagram of the observation FSM $S_o$ in Example 1. (States 0 through 8; forward arcs labeled $a, i_a$ / $b$ / $c$ / $b$ / $c$ / $a, i_a$ / $c$ / $\epsilon$; a self-loop under $d_b$ at states 0–7 and a self-loop under $\epsilon$ at state 8.)

Fig. 5. Markov chain corresponding to FSM $S_o$ in Example 1. (Self-loop probabilities $p_{d_b}$ at states 0–4 and $p'_{d_b}$ at states 5–7; forward probabilities $1-p_{d_b}$ or $1-p'_{d_b}$; state 8 is absorbing with self-loop probability 1.)

its state transition matrix is given by

$$A_o = \begin{bmatrix}
p_{d_b} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1-p_{d_b} & p_{d_b} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1-p_{d_b} & p_{d_b} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1-p_{d_b} & p_{d_b} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1-p_{d_b} & p_{d_b} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1-p_{d_b} & p'_{d_b} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1-p'_{d_b} & p'_{d_b} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1-p'_{d_b} & p'_{d_b} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1-p'_{d_b} & 1
\end{bmatrix}. \qquad \blacksquare$$

In the previous example, if we follow the trajectory $d_b\, d_b\, a\, d_b\, b\, c\, b\, c\, i_a\, c\, d_b\, \epsilon$ we discover a "matching" output sequence $y_1^{L_s} = \langle b\,b\,a\,b\,b\,c\,b\,c\,c\,b \rangle$ that is associated with the error pattern $E = \{$deletion of $b$, deletion of $b$, no error, deletion of $b$, no error, no error, no error, no error, insertion of $a$, no error, deletion of $b\}$. Note that for this particular $E$ and $y_1^{L_s}$ the probability $P(z_1^7, E \mid y_1^{L_s})$ can be read off by simply taking the product of the corresponding probabilities in the trajectory $d_b\, d_b\, a\, d_b\, b\, c\, b\, c\, i_a\, c\, d_b\, \epsilon$.

Notice that the matrix $A_o$ of the example does not include the probability of insertion; for instance, $A_o(1, 0) = p_{i_a} + (1 - p_{d_b} - p_{i_a}) = 1 - p_{d_b}$ and $A_o(6, 5) = p'_{i_a} + (1 - p'_{d_b} - p'_{i_a}) = 1 - p'_{d_b}$. In fact, matrix $A_o$ does not contain all the information that is necessary to assign probabilities $P(z_1^7, E \mid y_1^{L_s})$. To ensure that we have this information, we next construct the (probabilistic) state transition matrices capturing the probabilities of transitions corresponding to each sensor failure separately.


The (probabilistic) state transition matrix $A_o$ of the observation FSM $S_o$ can be written as

$$A_o = \sum_{\sigma' \in X_o} A_{o,\sigma'} = \sum_{\sigma \in Y} A_{o,\sigma} + A_{o,\epsilon} + \sum_{d_\sigma \in D} A_{o,d_\sigma} + \sum_{i_\sigma \in I} A_{o,i_\sigma} = \sum_{\sigma \in Y} A_{o,\sigma} + A_{o,\epsilon} + A_{o,e}, \qquad (12)$$

where $A_{o,\sigma}$ denotes the (probabilistic) state transition matrix corresponding to input $\sigma \in Y$ ($Y$ is the set of outputs of the FSM under diagnosis), $A_{o,\epsilon}$ denotes the (probabilistic) state transition matrix corresponding to the empty label $\epsilon$, $A_{o,d_\sigma}$ for $d_\sigma \in D$ denotes the (probabilistic) state transition matrix corresponding to deletion $d_\sigma \in D$, $A_{o,i_\sigma}$ for $i_\sigma \in I$ denotes the (probabilistic) state transition matrix corresponding to insertion $i_\sigma \in I$, and

$$A_{o,e} \equiv \sum_{d_\sigma \in D} A_{o,d_\sigma} + \sum_{i_\sigma \in I} A_{o,i_\sigma} \qquad (13)$$

is the state transition matrix corresponding to sensor failures. Since sensor failures are assumed to be independent

between observation steps, we can compute the (probabilistic) state transition matrices corresponding to deletions

and insertions by setting

$$A_{o,d_\sigma}(m, m) = p_{d_\sigma}[m] \cdot A_{o,d_\sigma}(m, m) = p_{d_\sigma}[m], \quad m = 0, 1, \ldots, L, \ d_\sigma \in D,$$
$$A_{o,i_\sigma}(m+1, m) = p_{i_\sigma}[m+1] \cdot A_{o,\sigma}(m+1, m), \quad m = 0, 1, \ldots, L-1, \ i_\sigma \in I, \qquad (14)$$

and keeping all other entries of $A_{o,d_\sigma}$ and $A_{o,i_\sigma}$ zero. Note that transitions captured by $A_{o,d_\sigma}$ correspond to self-loops in the observation machine and transitions captured by $A_{o,i_\sigma}$ correspond to one-step forward arcs. The (probabilistic) state transition matrix corresponding to the empty label $\epsilon$ has only the following nonzero elements:

$$A_{o,\epsilon}(L+1, L) = 1 - A_{o,e}(L+1, L) - A_{o,e}(L, L), \qquad A_{o,\epsilon}(L+1, L+1) = 1, \qquad (15)$$

which ensures that the last state of $S_o$ (state $L+1$) is an accepting state (an absorbing state in the Markov chain).

To compute the (probabilistic) state transition matrix $A_{o,n}$ corresponding to normal transitions, we make sure that the only nonzero elements of $A_{o,n}$ correspond to one-step forward arcs. Hence, $A_{o,n}$ satisfies the following equation:

$$A_{o,n}(m+1, m) = 1 - \sum_{j=0}^{L+1} A_{o,e}(j, m), \quad m = 0, 1, \ldots, L-1. \qquad (16)$$

Given the matrix $A_{o,n}$ that represents the probabilities of normal transitions, we can use $A_{o,\sigma}$ (i.e., the deterministic state transition matrix corresponding to input $\sigma$) to derive the matrix that captures the probabilities of normal transitions for each normal input $\sigma \in Y$ by performing element-wise multiplication as follows:

$$A_{o,\sigma}(k, j) = A_{o,n}(k, j) \cdot A_{o,\sigma}(k, j), \quad \forall j, k = 0, 1, \ldots, L. \qquad (17)$$
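Putting Eqs. (12)–(17) together for Example 1 (an illustrative numpy sketch; the constant failure probabilities are placeholders rather than the example's step-dependent values):

```python
import numpy as np

z = list('abcbcac'); L = len(z); n = L + 2       # states 0..L+1 of S_o
p_del = {'b': np.full(L + 1, 0.10)}              # p_{d_b}[m], m = 0..L
p_ins = {'a': np.full(L + 1, 0.05)}              # p_{i_a}[m]

# Eq. (14): deletion self-loops and insertion forward arcs.
A_del = {s: np.zeros((n, n)) for s in p_del}
A_ins = {s: np.zeros((n, n)) for s in p_ins}
for m in range(L + 1):
    for s in p_del:
        A_del[s][m, m] = p_del[s][m]
for m in range(L):
    if z[m] in p_ins:
        A_ins[z[m]][m + 1, m] = p_ins[z[m]][m + 1]
A_err = sum(A_del.values()) + sum(A_ins.values())          # Eq. (13)

# Eqs. (16)-(17): normal forward arcs absorb the remaining column mass.
A_sym = {s: np.zeros((n, n)) for s in set(z)}
for m in range(L):
    A_sym[z[m]][m + 1, m] = 1.0 - A_err[:, m].sum()

# Eq. (15): the empty label closes out the accepting state L+1.
A_eps = np.zeros((n, n))
A_eps[L + 1, L] = 1.0 - A_err[L + 1, L] - A_err[L, L]
A_eps[L + 1, L + 1] = 1.0

A_o = sum(A_sym.values()) + A_eps + A_err                  # Eq. (12)
assert np.allclose(A_o.sum(axis=0), 1.0)                   # column-stochastic
```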

The reasons for establishing the above notation for the transition matrices corresponding to each input of the

observation machine become more obvious with the construction of FSM H which is presented in the next section.


V. LIKELIHOOD CALCULATION UNDER SENSOR FAILURES

We ignore probabilities for now and focus on identifying all possible state sequences of S that can produce a

possible output sequence (that can correspond to the observed sequence, as captured by So). To do that, we use

FSMs $S$ and $S_o$ to construct a new FSM $H$ defined as $H = (Q_H, X_H, \delta_H, q_{H0})$, where $Q_H$ contains subsets of states in $Q_o \times Q$, $X_H = X_o$ (recall that $X_o = Y \cup \{\epsilon\} \cup F$), and $q_{H0} = (q_{o0}, q_0)$. FSM $H$ has no outputs and it is generally non-deterministic, with $\delta_H$ defined for each input $\sigma' \in X_H$ as follows:

$$\delta_H((m, j), \sigma') = \begin{cases} \bigcup_{\forall x_i \in X \text{ s.t. } \lambda(j, x_i) = \sigma'} \big(\delta_o(m, \sigma'),\, \delta(j, x_i)\big), & \text{if } \sigma' \in Y, \\[4pt] \big(\delta_o(m, \sigma'),\, j\big), & \text{if } \sigma' \in I \cup \{\epsilon\}, \\[4pt] \bigcup_{\forall x_i \in X \text{ s.t. } \lambda(j, x_i) = out(\sigma')} \big(m,\, \delta(j, x_i)\big), & \text{if } \sigma' \in D, \\[4pt] \text{undefined}, & \text{otherwise.} \end{cases} \qquad (18)$$

Notice that FSM $H$, obtained in the above construction based on FSMs $S$ and $S_o$, is neither the standard product nor the parallel composition of $S$ and $S_o$. The states of $H$ can be described in terms of pairs of the form $(m, j)$, where $m$ denotes the state of the observation FSM $S_o$ and $j$ denotes the state of $S$. The union of states is used in Equation (18) because FSM $H$ is a non-deterministic machine and its current state may consist of a set of states. For example, at observation step $m$ and state $j$ of FSM $S$, if $\sigma' \in Y$ and if both $x_1$ and $x_2$ satisfy the constraint under the union (i.e., $\lambda(j, x_1) = \lambda(j, x_2) = \sigma'$), then the set of possible states for FSM $H$ is given by $\{(\delta_o(m, \sigma'), \delta(j, x_1)), (\delta_o(m, \sigma'), \delta(j, x_2))\}$. The state transition diagram of FSM $H$ has a special structure that becomes more apparent if we draw it so that states of the form $(m, 0), (m, 1), \ldots, (m, |Q|-1)$, for $m \in Q_o$, are in a column and states of the form $(0, j), (1, j), \ldots, (|Q_o|-1, j)$, for $j \in Q$, are in a row. From now on, since each forward transition corresponds to a new observation, we will call each column of the transition diagram a stage to reflect the notion of the observation step (see Figures 6 and 7, which are also discussed in more detail in an example that follows).

After obtaining the set of sequences that are consistent with both $S_o$ and $S$, the next step is to assign probabilities to the transitions of $H$ and hence construct a probabilistic FSM $H$ with (probabilistic) state transition matrix $A_H$. The state transition matrix $A_H$ has dimension $(L+2)|Q| \times (L+2)|Q|$ and can be obtained as the sum $A_H = \sum_{\sigma' \in X_H} A_{H,\sigma'}$, where each matrix $A_{H,\sigma'}$ captures the probabilities of transitions associated with a particular input $\sigma' \in X_H = Y \cup \{\epsilon\} \cup I \cup D$. If we arrange the states of $H$ as $(0,0), (0,1), \ldots, (0,|Q|-1), (1,0), (1,1), \ldots, (1,|Q|-1), \ldots, (L+1,0), (L+1,1), \ldots, (L+1,|Q|-1)$, the probabilities of the transitions associated with $\sigma'$ can be obtained via the following state transition matrices:

$$A_{H,\sigma'} = \begin{cases} A_{o,\sigma'} \otimes A_{\sigma'} & \text{if } \sigma' \in Y, \\ A_{o,\sigma'} \otimes I & \text{if } \sigma' \in \{\epsilon\} \cup I, \\ A_{o,\sigma'} \otimes A_{out(\sigma')} & \text{if } \sigma' \in D, \end{cases} \qquad (19)$$

where $A_\sigma$ captures the probabilities of transitions in $S$ that output $\sigma$. In the above, $A \otimes B$ represents the Kronecker


Fig. 6. State transition diagram of $H_1$ in Example 1. (States $(m, j)$, $m = 0, \ldots, 8$, $j = 0, \ldots, 3$, arranged in stages; arcs include normal forward transitions such as $a$, insertion arcs such as $i_a$, and deletion self-column arcs such as $d_b$.)

product⁶ of matrices $A$ and $B$, and its use is justified by our choice of ordering for states in $H$ and by the fact that, given the output sequence, the error pattern and the observed sequence are statistically independent of the inputs of $S$. For instance, considering that deletions may occur only when the system under diagnosis is in a state which can produce at least one of the outputs that can be deleted, we take the Kronecker product of $A_{o,d_\sigma}$ and $A_\sigma$, which results in a matrix that captures the probability of each transition in $H$ that is associated with a $d_\sigma$ and $\sigma$ (this probability is obtained by multiplying the corresponding probability of a deletion in the observation machine $S_o$ and an input that outputs $\sigma$ in the underlying machine $S$). Note that since insertions may occur at any time, we take the Kronecker product of $A_{o,i_\sigma}$ with the identity matrix $I$. The overall probabilistic state transition matrix $A_H$ is given by

is given by

AH ="

!#Y

Ao,! *A! + Ao," * I +"

d!#D

Ao,d! *A! +"

i!#I

Ao,i! * I. (20)

Note that for #$ = d! $ D, we sometimes write Ao,d! *A! = Ao,!! *Aout(!!). In particular, if we consider the

transition from state (i, j) to state (k, l), the resulting state corresponds to the (k|Q|+ l, i|Q|+ j)th entry in matrix

AH .
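Eq. (20) maps directly onto `numpy.kron` (a sketch; it assumes matrices shaped like the `A_sym`, `A_eps`, `A_del`, `A_ins` dictionaries of the earlier snippet and per-output model matrices such as the `A1` of Example 1):

```python
import numpy as np

def build_AH(A_sym, A_eps, A_del, A_ins, A_model, Q):
    """Assemble A_H of Eq. (20) from the S_o matrices and the model matrices.

    A_sym / A_eps / A_del / A_ins : probabilistic matrices of S_o by input type
    A_model                       : dict sigma -> A_sigma for the model S
    Q                             : number of states of S
    """
    I_Q = np.eye(Q)
    AH = np.kron(A_eps, I_Q)                   # empty label: state of S frozen
    for s, M in A_sym.items():
        AH += np.kron(M, A_model[s])           # normal output sigma
    for s, M in A_del.items():
        AH += np.kron(M, A_model[s])           # deletion of sigma: S still moves
    for s, M in A_ins.items():
        AH += np.kron(M, I_Q)                  # insertion: S does not move
    return AH

# e.g. AH1 = build_AH(A_sym, A_eps, A_del, A_ins, A1, Q=4)
```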

Example 1 (continued): FSMs $H_1$ and $H_2$ have $4 \cdot 9 = 36$ states each, which we name as pairs $(m, j)$ for $m \in Q_o$ and $j \in Q$. The structures of FSMs $H_1$ and $H_2$ are indicated in Figures 6 and 7 respectively (the dashed

⁶The Kronecker product [39] of an $N_1 \times M_1$ matrix $A$ with an $N_2 \times M_2$ matrix $B$ is denoted by $A \otimes B$ and is defined as the partitioned matrix

$$A \otimes B = \begin{bmatrix} \alpha_{00}B & \alpha_{01}B & \ldots & \alpha_{0(M_1-1)}B \\ \alpha_{10}B & \alpha_{11}B & \ldots & \alpha_{1(M_1-1)}B \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{(N_1-1)0}B & \alpha_{(N_1-1)1}B & \ldots & \alpha_{(N_1-1)(M_1-1)}B \end{bmatrix},$$

where $\alpha_{jk}$ is the entry at the $j$th row, $k$th column position of matrix $A$. Note that $A \otimes B$ is of dimension $N_1 N_2 \times M_1 M_2$.


Fig. 7. State transition diagram of $H_2$ in Example 1. (Same stage-by-stage layout as Figure 6, with states $(m, j)$ for $m = 0, \ldots, 8$ and $j = 0, \ldots, 3$.)

arcs represent transitions due to sensor failures — the majority of the inputs are not indicated in the figures for

clarity). Note that all transitions in FSMs H1 and H2 follow either a forward direction (that spans one column) or

a vertical direction. For example, FSM H1 (Figure 6) takes a transition from state (0,2) to state (1,0) under input

a (i.e., S1 moves from state 2 to state 0 and produces a which is observed). The transition from state (0,2) to

state (1,2) under input ia represents the insertion of a. In this case, the diagnoser observed a although FSM S1

did not move from state 2 (and it did not produce a). Finally, the transition from state (0,2) to (0,3) under input

db represents the deletion of b: FSM S1 took a transition from state 2 to state 3 and produced b, however, the

diagnoser did not observe b because it was deleted by the sensors.

The (probabilistic) state transition matrices for FSMs $H_1$ and $H_2$ are given by

$$A_{H_1} = A_{o,a} \otimes A_{1,a} + A_{o,b} \otimes A_{1,b} + A_{o,c} \otimes A_{1,c} + A_{o,\epsilon} \otimes I + A_{o,d_b} \otimes A_{1,b} + A_{o,i_a} \otimes I,$$
$$A_{H_2} = A_{o,a} \otimes A_{2,a} + A_{o,b} \otimes A_{2,b} + A_{o,c} \otimes A_{2,c} + A_{o,\epsilon} \otimes I + A_{o,d_b} \otimes A_{2,b} + A_{o,i_a} \otimes I.$$

For example, the transition from state (0,2) to state (0,3) under input $d_b$ in FSM $H_1$ (Figure 6) has probability of occurring $p_{d_b} \cdot A_{1,b}(3, 2)$, which is equal to the probability that FSM $S_1$ took a transition from state 2 to state 3 producing $b$ and then output $b$ was deleted by the sensor (i.e., error $d_b$ occurred). Note that this is exactly the entry $(3, 2)$ of $A_{o,d_b} \otimes A_{1,b}$. The transition from state (0,2) to state (1,2) under input $i_a$ in FSM $H_1$ has probability of occurring $p_{i_a}$, which is equal to the probability that FSM $S_1$ did not take any transition but $a$ was inserted by the sensor (i.e., error $i_a$ occurred). This is exactly the entry $(6, 2)$ of $A_{o,i_a} \otimes I$. Finally, the normal transition under input $a$ from state (0,2) to state (1,0) is assigned probability $(1 - p_{d_b} - p_{i_a}) \cdot A_{1,a}(0, 2)$, which is equal to the probability that $S_1$ took a transition from state 2 to state 0 (producing output $a$) and no sensor failure occurred. This corresponds to entry $(4, 2)$ of $A_{o,a} \otimes A_{1,a}$. $\blacksquare$

FSM $H$ captures behavior that is consistent with the system under diagnosis and is also a prefix of one of the possible output sequences. Note that the accepting states of $H$ are of the form $(L+1, j)$, $j \in Q$, and capture the behavior that is consistent with both the system under diagnosis and any of the possible output


sequences. The (probabilistic) state transition matrix $A_H$ will most likely be such that the entries of each column do not sum to one; however, we can easily build a proper Markov chain $H'$ by modifying $H$, so that the assigned transition probabilities of each column sum to one. More specifically, we can append to $H$ a new state $q_{in}$ which represents the inconsistent state (i.e., if $H$ is in state $q_{in}$, then the observations are not consistent with the system under diagnosis and the sensor failures that are allowed). To achieve this, we can add a transition from each state of FSM $H$ to the inconsistent state $q_{in}$, with probability such that the sum of the transition probabilities leaving that particular state is equal to one; we also add a self-loop at state $q_{in}$ with probability one.

The resulting Markov chain $H'$ has $|Q_{H'}| = (L+2) \cdot |Q| + 1$ states. The only self-loops in $H'$ with probability one are those in the consistent states (of the form $(L+1, j)$) and in the inconsistent state ($q_{in}$). In fact, due to the particular structure of $H'$ (and given that there is a nonzero probability to leave the vertical loop at each stage), the consistent and inconsistent states are the only absorbing states, while the rest of the states are transient. Therefore, when the absorbing Markov chain $H'$ reaches its stationary distribution, these absorbing states are the only states with nonzero probabilities (summing up to one). We are interested in the stationary distribution of $H'$ so that we can account for output sequences $y_1^{L_s}$ of any length that correspond to the observed sequence $z_1^L$. (Recall that without sensor failures we have $L_s = L$.)

More formally, we arrange the states of $H'$ in the order $(0,0), (0,1), \ldots, (0,|Q|-1), (1,0), (1,1), \ldots, (1,|Q|-1), \ldots, (L+1,0), (L+1,1), \ldots, (L+1,|Q|-1), q_{in}$. Let $\pi_{H'}[0]$ be a vector with $|Q_{H'}|$ entries, each of which represents the initial probability of each state of $H'$. We are interested in the stationary probability distribution of $H'$ captured by

$$\pi_{H'} = \lim_{n \to \infty} \pi_{H'}[n] = \lim_{n \to \infty} A_{H'}^n \cdot \pi_{H'}[0], \qquad (21)$$

where the state transition matrix $A_{H'}$ of $H'$ is in its canonical form given by

$$A_{H'} = \begin{bmatrix} T & 0 \\ R & I \end{bmatrix}. \qquad (22)$$

Recall that the state transition matrix $A_H$ (without state $q_{in}$) has dimension $(L+2)|Q| \times (L+2)|Q|$. Matrix $T$ consists of the first $(L+1) \cdot |Q|$ rows of $A_H$ and of the first $(L+1) \cdot |Q|$ columns of $A_H$ and captures the behavior of the transient states of $H'$; the $(|Q|+1) \times (L+1) \cdot |Q|$ matrix $R$ captures the transitions from the transient states to the absorbing states; $0$ is a $(L+1) \cdot |Q| \times (|Q|+1)$ matrix with all zero entries; and $I$ is the identity matrix of dimension $(|Q|+1) \times (|Q|+1)$. Note that, since $H'$ is an absorbing Markov chain, the limit $\lim_{n \to \infty} A_{H'}^n$ exists and it is given by

$$\lim_{n \to \infty} A_{H'}^n = \begin{bmatrix} 0 & 0 \\ (I-T)^{-1}R & I \end{bmatrix}, \qquad (23)$$

where $(I-T)^{-1}$ is called the fundamental matrix [15].

The only nonzero entries of $\pi_{H'}$ are those that correspond to the consistent states and the inconsistent state, i.e., the absorbing states. In fact, the probability that $H'$ ends up in a consistent state is equal to the complement of the probability that $H'$ ends up in the inconsistent state, which is also equal to the probability of the observed sequence $z_1^L$ given the FSM model $S$, i.e.,

$$P(z_1^L \mid S) = \sum_{j=(L+1)\cdot|Q|}^{(L+2)\cdot|Q|-1} \pi_{H'}(j) = 1 - \pi_{H'}(q_{in}). \qquad (24)$$

Proposition 1: The likelihood of the observed sequence $z_1^L$ given model $S$ in the presence of sensor failures (as defined earlier) is given by

$$P(z_1^L \mid S) = 1 - \pi_{H'}(q_{in}), \qquad (25)$$

where $\pi_{H'}$ is the stationary distribution of the absorbing Markov chain $H'$ (which can be constructed from model $S$ and the observation machine $S_o$).

To gain some more intuition, let us consider the case of reliable sensors, where the arcs in $H$ capture only normal one-step forward transitions (except for the self-loops of the accepting states). Let $\pi_{H'}[0]$ be a vector with $|Q_{H'}|$ entries, each of which represents the initial probability of the corresponding state of $H'$, and let $\pi_{H'}[L+1]$ represent the state probabilities of $H'$ after $L+1$ steps. Then,

$$\pi_{H'}[L+1] = A_{H'}^{L+1} \cdot \pi_{H'}[0], \qquad (26)$$

where $A_{H'}^{L+1}$ denotes the matrix $A_{H'}$ raised to the $(L+1)$st power. The probability of the observed sequence given FSM $S$ is then given by the sum of the entries of the state probability vector at step $L+1$ that correspond to the accepting (consistent) states, i.e.,

$$P(z_1^L \mid S) = \sum_{j=(L+1)\cdot|Q|}^{(L+2)\cdot|Q|-1} \pi_{H'}[L+1](j) = \sum_{j=(L+1)\cdot|Q|}^{(L+2)\cdot|Q|-1} \big(A_{H'}^{L+1}\cdot\pi_{H'}[0]\big)(j). \qquad (27)$$

Note that, in this case, $A_{H'}^n \cdot \pi_{H'}[0] = A_{H'}^{L+1} \cdot \pi_{H'}[0]$ for $n \geq L+1$, due to the transient nature of the first $L$ stages and the self-loops that occur with probability one in the consistent and inconsistent states.
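A tiny numerical illustration of this stabilization (again with a hypothetical chain of ours): once all probability mass reaches the absorbing states, further powers of the matrix change nothing.

import numpy as np

# Hypothetical chain 0 -> 1 -> 2 with state 2 absorbing
# (columns index the source state, as in the paper).
A = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
pi0 = np.array([1.0, 0.0, 0.0])

for n in (1, 2, 3, 10):
    print(n, np.linalg.matrix_power(A, n) @ pi0)
# The distribution stops changing after n = 2 steps (here playing
# the role of L + 1): [0,1,0], then [0,0,1] for every n >= 2.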

VI. RECURSIVE LIKELIHOOD CALCULATION UNDER SENSOR FAILURES

In this section we exploit the structure of matrix $A_{H'}$ (which captures the transition probabilities of $H'$) to perform the posterior probability calculations in an efficient manner. We first define the following submatrices, which will be used to express $A_{H'}$.

• Matrices $B_{m,m+1}$, $m = 0, 1, \ldots, L$, capture the transitions from any state of $H'$ at stage $m$ to any state of $H'$ at stage $m+1$. They can be obtained from $A_{H'}$ as $B_{m,m+1}(k,j) = A_{H'}((m+1)\cdot|Q|+k,\; m\cdot|Q|+j)$, where $k, j = 0, 1, \ldots, |Q|-1$.

• Matrices $B_m$, $m = 0, 1, \ldots, L$, capture the vertical transitions (i.e., transitions from stage $m$ to the same stage) and account for deletion errors. They can be obtained from $A_{H'}$ as $B_m(k,j) = A_{H'}(m\cdot|Q|+k,\; m\cdot|Q|+j)$, where $k, j = 0, 1, \ldots, |Q|-1$. (Note that if deletions occur at each observation step with the same probability, then $B_m = B$, $m = 0, 1, \ldots, L$, for some constant matrix $B$.)

• $C^T$ is a row vector with entries $C^T(j) = 1 - \sum_{k=0}^{|Q_H|-1} A_H(k,j)$, for $j = 0, 1, \ldots, |Q_H|-1$, i.e., $C^T$ ensures that the sum of each column of $A_{H'}$ is equal to 1.

We should note here that an alternative way to compute the block matrices $B_{m,m+1}$ and $B_m$ directly, without the help of $H'$, is to use the following equations:

$$B_m(k,j) = \sum_{\forall d_\sigma \in D \text{ s.t. } \delta(j,\,\mathrm{out}(d_\sigma)) = k} p_{d_\sigma}[m] \cdot A_\sigma(k,j),$$
$$B_{m,m+1}(k,j) = \Big(1 - \sum_{\forall d_\sigma \in D \text{ s.t. } \delta(j,\,\mathrm{out}(d_\sigma)) \neq \emptyset} p_{d_\sigma}[m]\Big) \cdot A_{z[m+1]}(k,j), \qquad (28)$$

where $k, j = 0, 1, \ldots, |Q|-1$ and $p_{d_\sigma}[m]$, $A_\sigma$ were defined earlier. Notice here that matrix $B_{m,m+1}$ captures the forward transitions, from one stage to the next, which can be due either to insertions or to normal transitions (without sensor failures).
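A rough illustration of Equation 28 in code (a hypothetical two-state machine and deletion model of our own; the sketch keeps only the deletion terms and drops the per-state applicability condition and the insertion bookkeeping of the full construction):

import numpy as np

# Hypothetical FSM with |Q| = 2 and output set {'a', 'b'}:
# A[s][k, j] = probability of moving from state j to state k
# while producing output s.
A = {'a': np.array([[0.3, 0.0],
                    [0.2, 0.4]]),
     'b': np.array([[0.0, 0.6],
                    [0.5, 0.0]])}
p_del = {'b': 0.05}  # only 'b' may be deleted, with probability 0.05

def B_vertical(p_del, A):
    # Vertical block B_m: a transition fires but its output is deleted.
    return sum(p * A[s] for s, p in p_del.items())

def B_forward(p_del, A, z_next):
    # Forward block B_{m,m+1}: the next observed output z_next is
    # produced and survives (insertion terms are omitted here).
    survive = 1.0 - sum(p_del.values())
    return survive * A[z_next]

print(B_vertical(p_del, A))
print(B_forward(p_del, A, 'a'))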

With the above notation at hand, we have the following block decomposition for matrix $A_{H'}$:

$$A_{H'} = \begin{pmatrix}
B_0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0\\
B_{0,1} & B_1 & 0 & \cdots & 0 & 0 & 0 & 0\\
0 & B_{1,2} & B_2 & \cdots & 0 & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & B_{L-1} & 0 & 0 & 0\\
0 & 0 & 0 & \cdots & B_{L-1,L} & B_L & 0 & 0\\
0 & 0 & 0 & \cdots & 0 & I-B_L & I & 0\\
 & & C^T & & & & 0 & 1
\end{pmatrix}. \qquad (29)$$

Notice that the matrix $A_{H'}$ is in its canonical form (see Equation 22) with submatrices $T$ and $R$ given by

$$T = \begin{pmatrix}
B_0 & 0 & 0 & \cdots & 0 & 0\\
B_{0,1} & B_1 & 0 & \cdots & 0 & 0\\
0 & B_{1,2} & B_2 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & B_{L-1} & 0\\
0 & 0 & 0 & \cdots & B_{L-1,L} & B_L
\end{pmatrix}, \quad
R = \begin{pmatrix}
0 & \cdots & 0 & I-B_L\\
 & C^T & &
\end{pmatrix}. \qquad (30)$$

The only nonzero entries in the initial probability distribution vector $\pi_{H'}[0]$ are its first $|Q|$ entries, i.e., $\pi_{H'}[0] = (\rho^T[0]\;\,0\;\ldots\;0)^T$, where $\rho[0]$ denotes the initial probability distribution of $S$ (i.e., it is a $|Q|$-dimensional vector whose $j$th entry denotes the probability that $S$ is initially in state $j$). Recall that $\pi_{H'}$ denotes the stationary probability distribution vector of $H'$ and has nonzero entries only at the absorbing states, i.e., its last $|Q|+1$ entries. Hence, for the observed sequence $z_1^L$, we can express $\pi_{H'}$ as $\pi_{H'} = (0\;\ldots\;0\;\,\rho^T[L+1]\;\,p_{in}[L+1])^T$, where the vector $\rho[L+1]$ captures the joint probabilities of the consistent states and the observed sequence $z_1^L$, and the scalar $p_{in}[L+1]$ denotes the joint probability of the inconsistent state and the observed sequence $z_1^L$. Notice that we get joint probabilities of state occupancies and the observed sequence because FSM $H$ was constructed for the particular observed sequence. The following equations hold:

$$\begin{aligned}
\pi_{H'} &= \lim_{n\to\infty} A_{H'}^n \cdot \pi_{H'}[0],\\
(0\;\ldots\;0\;\,\rho^T[L+1]\;\,p_{in}[L+1])^T &= \lim_{n\to\infty} A_{H'}^n \cdot (\rho^T[0]\;\,0\;\ldots\;0)^T,\\
\rho[L+1] &= \lim_{n\to\infty} A_{H'}^n(L+1,0) \cdot \rho[0]
\end{aligned} \qquad (31)$$

(here, we stretch notation a bit so that $A_{H'}^n(L+1,0)$ denotes the $(L+1,0)$th block of matrix $A_{H'}^n$ as opposed to its $(L+1,0)$th entry). Therefore, in order to calculate the probability of the consistent states (jointly with the observed sequence $z_1^L$), we only need the initial probability distribution of the states of $S$ and the $(L+1,0)$th block of the matrix $\lim_{n\to\infty} A_{H'}^n$.

Next, we argue that, by using induction on the power $n$, we can compute $\lim_{n\to\infty} A_H^n(L+1,0)$ with much lower complexity than the standard computation in Equation 23. Equation 32 shows the state transition matrix $A_H$ for the case when $L = 2$. We suppose that $A_H^k$ is given by Equation 33 below, where the indices $j_1, j_2, j_3, j_4$ in the summations are nonnegative integers; we can prove that $A_H^{k+1}$ satisfies Equation 34 below by performing the multiplication $A_H^{k+1} = A_H A_H^k$. For example, the calculation for the $(1,0)$th block of $A_H^{k+1}$ is performed in Equation 35 below. Also note that when $k = 1$, the base case for $A_H$ is clearly satisfied.

$$A_H = \begin{pmatrix} B_0 & 0 & 0\\ B_{0,1} & B_1 & 0\\ 0 & B_{1,2} & B_2 \end{pmatrix} \qquad (32)$$

$$A_H^k = \begin{pmatrix}
B_0^k & 0 & 0\\[4pt]
\displaystyle\sum_{j_1+j_2=k-1} B_1^{j_2} B_{0,1} B_0^{j_1} & B_1^k & 0\\[8pt]
\displaystyle\sum_{j_1+j_2+j_3=k-2} B_2^{j_3} B_{1,2} B_1^{j_2} B_{0,1} B_0^{j_1} & \displaystyle\sum_{j_1+j_2=k-1} B_2^{j_2} B_{1,2} B_1^{j_1} & B_2^k
\end{pmatrix} \qquad (33)$$

$$A_H^{k+1} = \begin{pmatrix}
B_0^{k+1} & 0 & 0\\[4pt]
\displaystyle\sum_{j_1+j_2=k} B_1^{j_2} B_{0,1} B_0^{j_1} & B_1^{k+1} & 0\\[8pt]
\displaystyle\sum_{j_1+j_2+j_3=k-1} B_2^{j_3} B_{1,2} B_1^{j_2} B_{0,1} B_0^{j_1} & \displaystyle\sum_{j_1+j_2=k} B_2^{j_2} B_{1,2} B_1^{j_1} & B_2^{k+1}
\end{pmatrix} \qquad (34)$$

$$\begin{aligned}
A_H^{k+1}(1,0) &= \Big(\sum_{j_1+j_2=k-1} B_1^{j_2} B_{0,1} B_0^{j_1}\Big) B_0 + B_1^k B_{0,1}\\
&= \sum_{j_1+j_2=k-1} B_1^{j_2} B_{0,1} B_0^{j_1+1} + B_1^k B_{0,1}\\
&= \sum_{j_1+j_2=k} B_1^{j_2} B_{0,1} B_0^{j_1}.
\end{aligned} \qquad (35)$$

Using this approach we can prove by induction that the expression for $A_H^n$ is as in Equation 33 (with $k = n$).

As explained earlier, we are interested in the state probabilities of the consistent states (jointly with the observed sequence $z_1^L$), which will be given by the bottom $|Q|$ entries of the vector $\pi_H$, i.e., the entries of the vector $\rho[3] = \lim_{n\to\infty} A_H^n(3,0)\cdot\rho[0]$ for the case when $L = 2$. If we manipulate further the $(3,0)$th block of $\lim_{m\to\infty} A_H^m$, we have

$$\begin{aligned}
\lim_{m\to\infty} A_H^m(3,0) &= \lim_{m\to\infty} \sum_{j_1+j_2+j_3+j_4=m-3} I^{j_4}\, I\, B_2^{j_3} B_{1,2} B_1^{j_2} B_{0,1} B_0^{j_1}\\
&= (I + B_2 + B_2^2 + \cdots)\, B_{1,2}\, (I + B_1 + B_1^2 + \cdots)\, B_{0,1}\, (I + B_0 + B_0^2 + \cdots)\\
&= \Big(\sum_{j=0}^{\infty} B_2^j\Big) B_{1,2} \Big(\sum_{j=0}^{\infty} B_1^j\Big) B_{0,1} \Big(\sum_{j=0}^{\infty} B_0^j\Big)
\end{aligned} \qquad (36)$$

From the above equation, we get

$$\rho[3] = (I - B_2)^{-1} B_{1,2} (I - B_1)^{-1} B_{0,1} (I - B_0)^{-1}\,\rho[0]. \qquad (37)$$

Note that Equation 37 can also be obtained by directly inverting the matrix $(I - A_H)$ in Equation 21. To simplify notation, let us define $B'_{m,m+1} = (I - B_{m+1})^{-1} B_{m,m+1}$. Hence, $\rho[3] = B'_{1,2} B'_{0,1} (I - B_0)^{-1}\,\rho[0]$.

Generalizing the above result to any number of observations $L$, the vector that describes the probabilities of the consistent states (jointly with the observed sequence $z_1^L$) satisfies

$$\rho[L+1] = B'_{L-1,L}\cdots B'_{1,2}\, B'_{0,1}\, (I - B_0)^{-1}\,\rho[0], \qquad (38)$$

where $B'_{m,m+1} = (I - B_{m+1})^{-1} B_{m,m+1}$.

By inspection of Equation 38, we notice that the computation of $\rho[L+1]$ can be performed recursively as follows:

$$\begin{aligned}
\rho[1] &= B'_{0,1}(I - B_0)^{-1}\,\rho[0],\\
\rho[m+1] &= B'_{m,m+1}\,\rho[m], \quad m = 1, 2, \ldots, L,
\end{aligned} \qquad (39)$$

where $\rho[m+1]$ represents the joint probability of the consistent states and the observed sequence $z_1^m$. The probability that the observed sequence $z_1^L$ was produced by the particular FSM $S$ is equal to the sum of the elements of the state probability distribution vector $\rho[L+1]$, i.e.,

$$P(z_1^L \mid S) = \sum_{j=0}^{|Q|-1} \rho[L+1](j). \qquad (40)$$


Proposition 2: The likelihood of the observed sequence $z_1^L$ given model $S$ and allowing for sensor failures (as defined earlier) is given by

$$P(z_1^L \mid S) = \sum_{j=0}^{|Q|-1} \rho[L+1](j), \qquad (41)$$

where $\rho[m]$ is calculated recursively via the equations

$$\begin{aligned}
\rho[1] &= B'_{0,1}(I - B_0)^{-1}\,\rho[0],\\
\rho[m+1] &= B'_{m,m+1}\,\rho[m], \quad m = 1, 2, \ldots, L,
\end{aligned} \qquad (42)$$

with $\rho[0]$ representing the initial probability distribution vector of $S$ and the matrices $B$ representing the blocks of matrix $A_H$ as defined in Equation 28.
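A minimal sketch of the recursion in Proposition 2, assuming the block matrices have already been built for the observed sequence (all names and input conventions below are hypothetical illustrations of ours):

import numpy as np

def likelihood(rho0, B_vert, B_fwd):
    # rho0   : initial distribution rho[0] of S (length |Q|)
    # B_vert : vertical blocks [B_0, B_1, ...] (one more entry than B_fwd)
    # B_fwd  : forward blocks [B_{0,1}, B_{1,2}, ...]
    n = len(rho0)
    I = np.eye(n)
    # rho = (I - B_0)^{-1} rho[0]; solving avoids forming the inverse.
    rho = np.linalg.solve(I - B_vert[0], rho0)
    for m, F in enumerate(B_fwd):
        rho = F @ rho                                  # B_{m,m+1} rho[m]
        rho = np.linalg.solve(I - B_vert[m + 1], rho)  # vertical loops
    return rho.sum()  # P(z | S): total mass over consistent states

Solving a linear system at each stage, instead of explicitly inverting $(I - B_{m+1})$, is numerically preferable and matches the $O(|Q|^3)$ per-stage cost discussed below.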

To gain intuition regarding the recursion, let us consider for now the case of reliable sensors. This case corresponds to the matrices $B_m$ in $A_{H'}$ being equal to zero, which means that there are no vertical transitions (transitions within the same stage) in the trellis diagram. The recursion (Equation 42) becomes $\rho[m+1] = B_{m,m+1}\cdot\rho[m]$, $m = 0, 1, \ldots, L$. In fact, for the case of reliable sensors we can replace $B_{m,m+1}$ with $A_{y[m+1]}$ and perform the following recursion:

$$\rho[m+1] = A_{y[m+1]}\cdot\rho[m], \quad m = 0, 1, \ldots, L. \qquad (43)$$

Intuitively, every time we get a new observation, we update the current probability vector by multiplying it by the state transition matrix of $S$ that corresponds to the new observation. This corresponds to the standard version of the forward algorithm.

With the above intuition at hand, we now return to the case of sensor failures. Here, we also need to take into consideration the fact that any number of vertical transitions may occur. Therefore, every time we get a new observation $z[m+1]$, we multiply the current probability vector by the state transition matrix of $S$ that corresponds to the new observation (as before) and also by $(I - B_{m+1})^{-1} = \sum_{j=0}^{\infty} B_{m+1}^j$, thereby taking into account the vertical transitions (possibly an infinite number of them) that can take place at stage $m+1$.

The matrices $B_{m,m+1}$ have dimension $|Q| \times |Q|$, while the matrix $A_H$ has dimension $|Q_H| \times |Q_H|$ with $|Q_H| = (L+2)\cdot|Q|$. If we calculate $\pi_H$ without taking advantage of the structure of $A_H$, the computational complexity is proportional to $O(((L+2)\cdot|Q|)^3) = O(L^3\cdot|Q|^3)$. If we use the recursive approach instead, the computational complexity reduces significantly to $O((L+2)\cdot(|Q|^2+|Q|^3)) = O(L\cdot|Q|^3)$ (each stage requires the inversion of a new $|Q| \times |Q|$ matrix, which has complexity $O(|Q|^3)$ and dominates the computational complexity associated with that particular stage). In fact, if the sensor failure probabilities remain invariant across stages, then the matrix $(I - B_m)$ only needs to be inverted once, and the complexity of the recursive approach becomes $O((L+2)\cdot|Q|^2+|Q|^3) = O(L\cdot|Q|^2+|Q|^3)$. In addition to the complexity gains, the recursive nature of the calculations allows us to monitor the system under diagnosis online and to calculate the probability of the observed sequence at each observation step by first updating the state probabilities and then summing them up.

Example 1 (continued): In our example, if we assume that there are no sensor failures, the likelihoods for each model are given by $P(z_1^7 \mid S_1) = 5.7156\times10^{-4}$ and $P(z_1^7 \mid S_2) = 12.5743\times10^{-4}$. For instance, if the priors are $P_1 = P_2 = 1/2$, then we compute the posterior of each model given the observations as $P(S_1 \mid z_1^7) = 0.3125$ and $P(S_2 \mid z_1^7) = 0.6875$. Clearly, it is most likely that the machine that produced the observed sequence $z_1^7 = \langle a\,b\,c\,b\,c\,a\,c\rangle$ conforms to FSM model $S_2$, i.e., $P(S_1 \mid z_1^7) < P(S_2 \mid z_1^7)$.
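For completeness, these posteriors follow directly from Bayes' rule (with equal priors, the common factor $P_1 = P_2 = 1/2$ cancels):

$$P(S_1 \mid z_1^7) = \frac{P_1\,P(z_1^7 \mid S_1)}{P_1\,P(z_1^7 \mid S_1) + P_2\,P(z_1^7 \mid S_2)} = \frac{5.7156}{5.7156 + 12.5743} = 0.3125,$$

and $P(S_2 \mid z_1^7) = 1 - 0.3125 = 0.6875$.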

If sensor failures occur with probabilities $p_{i_a} = 0.1$, $p'_{i_a} = 2\cdot p_{i_a}$, $p_{d_b} = 0.05$, and $p'_{d_b} = 2\cdot p_{d_b}$ (as described in this section), we can follow the procedure described earlier in this section to construct $B_m$, $m = 0, 1, \ldots, 7$, and $B_{m,m+1}$, $m = 0, 1, \ldots, 6$, for each of the two machines. For example, the state transition matrices that capture the vertical transitions at the first five stages under model $S_1$ are given by

$$B_0 = B_1 = B_2 = B_3 = B_4 = \begin{pmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & p_{d_b}\\ 0 & p_{d_b} & 0 & 0\\ 0 & 0 & p_{d_b} & 0 \end{pmatrix}.$$

The probability that the given observed sequence was produced by FSM $S_1$ and the probability that it was produced by FSM $S_2$ are calculated to be

$$P(z_1^7 = \langle a\,b\,c\,b\,c\,a\,c\rangle \mid S_1) = 5.0289\times10^{-4},$$
$$P(z_1^7 = \langle a\,b\,c\,b\,c\,a\,c\rangle \mid S_2) = 12.7681\times10^{-4}.$$

With priors $P_1 = P_2 = 1/2$, we compute the posteriors as $P(S_1 \mid z_1^7) = 0.2826$ and $P(S_2 \mid z_1^7) = 0.7174$. Hence, it is still more probable that the machine that produced the observed sequence $\langle a\,b\,c\,b\,c\,a\,c\rangle$ conforms to FSM model $S_2$, i.e., $P(S_1 \mid z_1^7) < P(S_2 \mid z_1^7)$; we thus conclude that the underlying system is faulty. □

A. Numerical Stability

It is obvious from the recursive equation that the entries of the vector $\rho[m]$ decrease with $m$. One way to avoid numerical errors is to normalize the vector $\rho[m]$ at every step so that its entries sum up to one. Since the likelihood of the observations given a model depends on the sum of the entries of $\rho[m]$ (before normalization), we need to keep track of the normalization factor at each step. Moreover, numerical errors may arise because the likelihood $P(z_1^L \mid S)$ for each model decreases as the number of observations grows. To address this, we keep track of the negative logarithm of the likelihood, i.e., the log-likelihood, which we denote by $\Lambda = -\log P(z_1^L \mid S)$. The following algorithm introduces $\hat{\rho}[m]$ and shows how to compute the log-likelihood of the observations given a model $S$.

Algorithm Input: Matrices $B_m$ and $B_{m,m+1}$ for $m = 0, 1, \ldots, L$ (the matrices correspond to the observed sequence $z_1^L$) and the initial probability distribution $\rho[0]$.

1. Initialization. Let $m = 0$;
compute $B'_{0,1} = (I - B_1)^{-1} B_{0,1}$;
compute $\hat{\rho}[1] = B'_{0,1}(I - B_0)^{-1}\,\rho[0]$;
compute $\lambda = \sum_{j=0}^{|Q|-1} \hat{\rho}[1](j)$;
compute $\rho[1] = \frac{1}{\lambda}\,\hat{\rho}[1]$;
compute $\Lambda = -\log\lambda$.

2. For $m = 1 : L$, do
consider the output $z[m]$;
compute $B'_{m,m+1} = (I - B_{m+1})^{-1} B_{m,m+1}$;
compute $\hat{\rho}[m+1] = B'_{m,m+1}\,\rho[m]$;
compute $\lambda = \sum_{j=0}^{|Q|-1} \hat{\rho}[m+1](j)$;
compute $\rho[m+1] = \frac{1}{\lambda}\,\hat{\rho}[m+1]$;
compute $\Lambda = \Lambda - \log\lambda$.
end.

3. Set $-\log P(z_1^L \mid S) = \Lambda$. □
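A sketch of this normalized recursion, under the same hypothetical input conventions as the earlier sketch for Proposition 2:

import numpy as np

def neg_log_likelihood(rho0, B_vert, B_fwd):
    # Returns Lambda = -log P(z | S), computed with per-step normalization.
    n = len(rho0)
    I = np.eye(n)
    Lambda = 0.0
    rho = np.linalg.solve(I - B_vert[0], rho0)
    for m, F in enumerate(B_fwd):
        rho_hat = np.linalg.solve(I - B_vert[m + 1], F @ rho)
        lam = rho_hat.sum()    # normalization factor lambda of this step
        rho = rho_hat / lam    # renormalized vector, entries sum to one
        Lambda -= np.log(lam)  # accumulate the negative log-likelihood
    return Lambda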

The overhead of the modifications introduced to avoid numerical errors is not significant. More specifically, at each observation step we need to perform two additional operations, as well as keep track of the log-likelihood of the observations so far given the model. Notice that the operation of inverting the matrices of the form $(I - B_m)$ is stable because such matrices are nonsingular. Furthermore, these matrices have diagonal elements close (or equal) to one and off-diagonal elements close (or equal) to zero. In fact, the smaller the probabilities of deletions at observation step $m$, the less likely it is that we run into numerical stability problems when inverting $(I - B_m)$.

B. Transpositions

In addition to deletions and insertions, our approach can be modified in a straightforward manner to handle transpositions. A transposition is denoted by $t_{\sigma_j,\sigma_k}$ and represents the corruption of the subsequence $\langle \sigma_k\,\sigma_j\rangle$ to $\langle \sigma_j\,\sigma_k\rangle$. We allow errors to overlap; however, as we illustrate in Example 2 below, we require that an output involved in a transposition not simultaneously suffer a deletion, an insertion, or a different transposition.

Example 2: In this example we illustrate our assumption that there can be overlapping errors, but each output cannot be involved in two different transpositions. Suppose that the possible sensor failures are $d_b$, $t_{ab}$, $t_{ac}$ and we observe the sequence $\langle a\,b\,c\rangle$. Then the set of possible output sequences is the following: $\langle b^*\,a\,b\,b^*\,c\,b^*\rangle$, $\langle b^*\,b\,a\,b^*\,c\,b^*\rangle$ (where $b^*$ denotes any number of $b$'s that were subsequently deleted). The observation FSM $S_o$ for this example is shown in Figure 8. Notice that $\langle b\,c\,a\rangle$ is not a possible output sequence because this would mean that two transpositions involving the same output $a$ have occurred, namely $t_{ab}$ and $t_{ac}$. □

Fig. 8. State transition diagram of $S_o$ for the observed sequence $\langle a\,b\,c\rangle$ in Example 2.

Due to space limitations, we do not explain in detail how to construct FSM $H$ for the case of transpositions (the entire scheme of construction of the machines $H$ and $H'$ can be found in [40]). Since FSM $H$ is constructed using FSMs $S$ and $S_o$ (which may include transitions from state $m$ to state $m+2$), FSM $H$ may have transitions that span two stages. Hence, the state transition matrix $A_{H'}$ includes submatrices of the form $B_{m,m+2}$, in addition to the $B_{m,m+1}$ and $B_m$ that were defined earlier. More specifically, matrices $B_{m,m+2}$, $m = 0, 1, \ldots, L-1$, include transitions in the state transition diagram that span two stages and account for transpositions. Using the matrices $B$ and the row vector $C^T$, we can express $A_{H'}$ as

$$A_{H'} = \begin{pmatrix}
B_0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & 0\\
B_{0,1} & B_1 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & 0\\
B_{0,2} & B_{1,2} & B_2 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & 0\\
0 & B_{1,3} & B_{2,3} & B_3 & \cdots & 0 & 0 & 0 & 0 & 0 & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & 0 & \cdots & B_{L-3,L-1} & B_{L-2,L-1} & B_{L-1} & 0 & 0 & 0\\
0 & 0 & 0 & 0 & \cdots & 0 & B_{L-2,L} & B_{L-1,L} & B_L & 0 & 0\\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & I-B_L & I & 0\\
 & & & & C^T & & & & & 0 & 1
\end{pmatrix}. \qquad (44)$$

Notice that the matrix $A_{H'}$ remains lower (block) triangular even with transpositions. As in the case of insertions and deletions, we can find a closed-form expression for $A_{H'}^n$ due to the special structure of $A_{H'}$. To simplify notation, let us define $B'_{m-1,m+1} = (I - B_{m+1})^{-1} B_{m-1,m+1}$. Then, we can follow a similar approach as before and compute the probability distribution vector at stage $m+1$ recursively based on the vectors at the previous two stages.

Proposition 3: The likelihood of the observed sequence $z_1^L$ given model $S$ and allowing for sensor failures (including transpositions) is given by

$$P(z_1^L \mid S) = \sum_{j=0}^{|Q|-1} \rho[L+1](j), \qquad (45)$$

where $\rho[0]$ is the initial probability distribution vector of $S$ and $\rho[m]$ is calculated recursively by the following equations

$$\begin{aligned}
\rho[1] &= B'_{0,1}(I - B_0)^{-1}\cdot\rho[0],\\
\rho[m+1] &= B'_{m,m+1}\cdot\rho[m] + B'_{m-1,m+1}\cdot\rho[m-1], \quad m = 1, 2, \ldots, L,
\end{aligned} \qquad (46)$$

with the matrices $B$ and $B'$ as defined earlier, i.e., $B'_{m,m+1} = (I - B_{m+1})^{-1} B_{m,m+1}$ and $B'_{m-1,m+1} = (I - B_{m+1})^{-1} B_{m-1,m+1}$.
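A sketch of the two-term recursion of Proposition 3, extending the earlier sketch with the two-stage transposition blocks (the indexing conventions below are our own assumptions):

import numpy as np

def likelihood_with_transpositions(rho0, B_vert, B1, B2):
    # rho0   : initial distribution rho[0] of S
    # B_vert : vertical blocks, B_vert[m] = B_m
    # B1     : one-stage forward blocks, B1[m] = B_{m,m+1}
    # B2     : two-stage transposition blocks, B2[m] = B_{m-1,m+1}
    #          (B2[0] is unused and may be a zero matrix)
    n = len(rho0)
    I = np.eye(n)
    rho = [rho0,
           np.linalg.solve(I - B_vert[1],
                           B1[0] @ np.linalg.solve(I - B_vert[0], rho0))]
    for m in range(1, len(B1)):
        # rho[m+1] = B'_{m,m+1} rho[m] + B'_{m-1,m+1} rho[m-1]
        step = B1[m] @ rho[m] + B2[m] @ rho[m - 1]
        rho.append(np.linalg.solve(I - B_vert[m + 1], step))
    return rho[-1].sum()  # P(z | S)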


Note that, if needed, we can apply the techniques in Section VI.A to avoid numerical errors in our computation.

C. Connections and Comparisons with Previous Work

Now that we have presented our recursive algorithm, we discuss how it differs from existing algorithms and how it generalizes some of them. The techniques that we use relate to the evaluation problem in HMMs and to the parsing problem in probabilistic automata when vertical loops appear in the resulting trellis diagram. The forward algorithm is used to evaluate the probability that a given set of observations is produced by a certain hidden Markov model (HMM). To do so, the standard forward algorithm uses the HMM to build a trellis diagram based on the given sequence of observations and performs the likelihood calculation online. However, the standard forward algorithm cannot handle the existence of vertical cycles in the trellis diagram. Ways around vertical cycles in the trellis diagram have been suggested in speech recognition applications, where HMMs are used to model speech patterns [16]–[19] and may include null transitions (i.e., the HMM may move from the current state to the next state without producing any output [17], [26]), as well as in the area of pattern recognition, where one may have to deal with null transitions when solving the parsing problem for a given probabilistic finite state automaton [21].

While in most HMM formulations one deals with state observations, several authors have also studied the evaluation problem in HMMs with transition observations, including null transitions (i.e., transitions with no outputs). For instance, the authors of [17], [26], [27] develop HMM models that capture the generation of codewords in speech recognition applications via observations that are associated with transitions rather than states. These HMMs also include null transitions, i.e., transitions that change the state without producing outputs. The authors of [26] eliminate loops in the resulting trellis diagram via an appropriate modification of the underlying HMM before constructing the trellis diagram. In [21], an algorithm is presented that solves the parsing problem in pattern recognition applications for the case where null transitions exist in a probabilistic finite-state automaton (PFSA) model (as pointed out in [28], HMMs are equivalent to PFSAs with no final probabilities). The authors evaluate recursively the probability that a sequence is produced by an ε-PFSA (i.e., a PFSA that includes null transitions), and their approach can be shown, after some manipulation, to be a special case of the algorithm we develop here. In particular, in contrast to our algorithm, the ε-PFSA approach cannot handle the case of time-varying sensor failures.

Also related to our likelihood computation algorithm is the well-known Viterbi algorithm [30], [31], which solves the related problem of maximum-likelihood decoding of convolutional codes by choosing the most likely state sequence based on a given sequence of observations. In fact, the Viterbi algorithm is a dynamic programming algorithm which is amenable to online use and has found applications in various fields; e.g., in HMMs it is used to find the most likely (hidden) state sequence corresponding to the observed output sequence [16]. Note that, in contrast to the Viterbi algorithm, the maximum likelihood approach in this paper requires the total probability of all paths (rather than the probability of the most likely path) that can be generated from the initial state(s) to the final state(s). As a consequence of this requirement, the Viterbi algorithm or variations of it cannot obtain a solution to the problem considered here. However, it is worth pointing out that the Viterbi algorithm has been frequently suggested as a suboptimal alternative for likelihood evaluation in some applications [16]. Also note that a modified Viterbi algorithm was proposed in [32] to identify the correct strings of data given an FSM representation of a possibly erroneous output sequence; in [33] the same authors proposed a channel inversion algorithm for correcting symbol sequences that have been corrupted by errors that can be described in terms of finite state automata (whose transitions are weighted with costs representing the likelihood of different errors). The work in [34] proposes an efficient implementation of the Viterbi algorithm to perform error-correcting parsing using an FSM and an error model. The Viterbi algorithm can handle vertical cycles by unwrapping the cycles so that each state on a cycle is visited at most once (to avoid adding cost or decreasing the probability of the path; recall that the Viterbi algorithm only searches for the most likely path).

Before closing this discussion, it is worth pointing out that the techniques used to solve our problem also relate to maximum a posteriori (MAP) decoding of variable length codes (VLCs). In MAP decoding of VLCs, symbols that are generated by a source may give rise to a different number of output bits and, given an observed bit sequence, one has to recover the symbols that were transmitted according to the source codewords. The authors in [24], [25] constructed a two-dimensional (symbol and bit) trellis diagram representation of the variable length coded data and then applied the BCJR algorithm [29] to perform either symbol or bit decoding. This setup resembles our setup when only a finite number of sensor failures exist in the observed sequence (in such a case, one can appropriately enlarge the underlying model since, unlike our formulation, no vertical cycles can be present). More specifically, if we assume a finite number of sensor failures, we could modify the models $S_1$ and $S_2$ to account for possible sensor failures. However, since the probabilities of sensor failures can change with time (i.e., they can depend on the observation step), the models for $S_1$ and $S_2$ would need to include these time variations. Even if we assume that the probabilities of sensor failures for each observation step are known a priori (so that we are able to modify a priori the models of $S_1$ and $S_2$ to account for sensor failures), the modified models will have an extended state space. The next step would then involve the construction of the trellis diagram of these extended models and the application of the standard forward algorithm. Our approach allows us to use a recursive algorithm and operate on the original models $S_1$ and $S_2$, thereby dramatically reducing computational complexity and storage requirements.

To summarize, our approach is more general than the aforementioned approaches because it can handle different kinds of loops at different stages of the trellis diagram (loops in our setup are not introduced by null transitions in the underlying model but rather by errors in the observed sequence, which can occur with time-varying probabilities). Thus, the associated probabilities in the trellis diagram can change with time, which cannot be handled as effectively using the techniques in [21] or in [26]: the modification of the underlying model to match the requirements of these earlier approaches results in a quite complex HMM (in which the evaluation problem can still benefit from the techniques we propose here). Therefore, the attractive feature of the proposed recursive algorithm for likelihood calculation is that it can handle a time-varying and possibly infinite number of sensor failures (or, equivalently, vertical cycles in the trellis diagram) with reduced complexity.

Fig. 9. FSM model for part of the 802.2 protocol responsible for data link establishment, disconnection, and resetting. (States: ADM, RESET, ERROR, D CONN, SETUP, NORMAL.)

VII. A FAILURE DIAGNOSIS APPLICATION

As an example we consider the logical link control sublayer in the IEEE/Std 802.2 local area network protocol [41], which is a peer protocol for use in a multi-station, multi-access environment. More specifically, we consider the part of the protocol which is responsible for data link establishment, disconnection, and link resetting, and which is modeled as the six-state FSM shown in Figure 9 with state transition functionality as defined in Table I.

The system to be diagnosed in our example is a system that supposedly complies with the 802.2 standard. The model of the protocol and the model of a faulty (bogus) implementation of the protocol are known a priori. In order to formulate our probabilistic framework, we assume that the probability distribution of the inputs is known (e.g., these probabilities have been obtained from empirical measurements). In order to keep the example simple (and without any loss of generality), we associate three outputs with the inputs, i.e., the output set is $Y = \{a, b, c\}$, so that the resulting FSM, denoted by $S_{ff}$, has six states and three outputs, as shown in Figure 10. We assume that input $x_1$ appears with probability $\frac{1}{3}$ and the remaining four inputs (namely $x_2, x_3, x_4, x_5$) appear with probability $\frac{1}{6}$ each. Notice that any input probability distribution could be used instead of the one we assume here.⁷

The particular fault we are interested in is a faulty transition from state “NORMAL” to state “ERROR” instead of state “D CONN” under the transition with output $a$; this could be, for example, the result of a hardware fault, a design error, or a software bug. This, together with the nominal (fault-free) model description in Figure 9, fully describes the faulty model, denoted by $S_f$. Our goal is to determine whether the underlying FSM is executing $S_{ff}$ or $S_f$ when the observed sequence is $z_1^6 = \langle a\,b\,a\,b\,a\,c\rangle$. The additional challenge is that the output sequence can be corrupted due to sensor failures. For this example, we consider that $a$ can be deleted or inserted, $b$ can be inserted, and the sequence $\langle c\,d\rangle$ can be transposed, i.e., the failure set is defined as $F = \{d_a, i_a, i_b, t_{cd}\}$.

⁷We can always apply our algorithm to classify between two HMMs (as opposed to FSMs with i.i.d. inputs); the assumption of i.i.d. inputs in this example is only made for convenience.

Fig. 10. State transition diagram of FSM $S_{ff}$.

The probabilities of the errors are assumed to be time-invariant in this example and are given by $p_{d_a} = 0.15$, $p_{i_a} = 0.2$, $p_{i_b} = 0.2$, and $p_{t_{cd}} = 0.2$.

We follow the recursive approach described in Section VI to compute the probability that the given output sequence is produced by the fault-free FSM and the probability that it is produced by the faulty (bogus) FSM, as shown in the following tables.

m | $\rho_1^T[m]$ | $\sum_{j=1}^{6}\rho_1[m](j)$
0 | [1/6 1/6 1/6 1/6 1/6 1/6] | 1
1 | [0.0861 0.0278 0 0.0015 0.0086 0.0278] | 0.1518
2 | [0.0227 0 0 0.0108 0.0596 0] | 0.0931
3 | [0.0015 0 0 0.0004 0.0200 0.0076] | 0.0295
4 | [0.0001 0 0.0025 0.0001 0 0] | 0.0027

m | $\rho_2^T[m]$ | $\sum_{j=1}^{6}\rho_2[m](j)$
0 | [1/6 1/6 1/6 1/6 1/6 1/6] | 1
1 | [0.2323 0 0 0.1502 0.1343 0] | 0.5168
2 | [0.0794 0 0 0.0812 0.1628 0] | 0.3234
3 | [0.0860 0 0 0.0439 0.0615 0] | 0.1914
4 | [0.0353 0 0 0.0237 0.0609 0] | 0.1199

As long as $\frac{P_{ff}}{P_f} < \frac{0.1199}{0.0027} = 44.4074$, the MAP rule dictates that, in order to minimize the probability of error, we decide that the underlying implementation of the protocol is a faulty one.
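Spelling out the comparison behind this threshold (using only the totals from the tables above), the MAP rule decides “faulty” whenever

$$P_{ff}\cdot P(z \mid S_{ff}) < P_f\cdot P(z \mid S_f) \;\Longleftrightarrow\; P_{ff}\cdot 0.0027 < P_f\cdot 0.1199 \;\Longleftrightarrow\; \frac{P_{ff}}{P_f} < \frac{0.1199}{0.0027} \approx 44.41.$$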

TABLE I
STATE TRANSITION FUNCTIONALITY OF THE FSM SHOWN IN FIGURE 9: EACH ENTRY SHOWS THE INPUTS THAT TAKE THE FSM FROM THE CURRENT STATE (CORRESPONDING TO THE ENTRY'S ROW) TO THE NEXT STATE (CORRESPONDING TO THE ENTRY'S COLUMN).

Next states: ADM, RESET, ERROR, D CONN, SETUP, NORMAL.

ADM: Con Req; R Sabm
RESET: R Disc Cmd; R Sabm; R Ua Rsp
ERROR: R Dm Rsp, TExp>N2; TExp, R Frmr; R Sabm; R Disc Cmd; R Ua Rsp
D CONN: R Dm Rsp, R Disc Cmd; R Sabm, R Ua Rsp, TExp>N2
SETUP: TExp>N2; R Sabm, R Ua Rsp; TExp
NORMAL: R Ua Rsp, R Frmr Rsp, R Sabm; R Invi Cmd, R Disc Req

VIII. CONCLUSIONS

In this paper we propose a probabilistic approach to the problem of failure diagnosis based on the observation of

a possibly corrupted output sequence. The a priori probability distribution of the input sequence of two given FSMs

is assumed known (equivalently, we are given two known hidden Markov models) and our goal is to determine

(e.g., with minimal probability of error) which of the two models has generated an observed sequence of outputs.

We assume that there are three types of errors (insertions, deletions, and transpositions) which can corrupt the

output sequence and that, given the observed sequence, the probabilities of these errors are independent from the

inputs. We construct an observation FSM that includes all possible output sequences that correspond to the given

observed sequence produced by the FSMs under diagnosis. Based on this observation machine, we develop a

recursive algorithm that can efficiently compute the total probability with which each FSM model, together with

a combination of sensor failures, can generate the observed sequence. Our algorithm is able to deal with cycles

which are present in the trellis diagram due to output deletions.

In this work, we considered the fault-free operation of the system and one mode of a fault-prone operation (given

information from unreliable sensors). Multiple fault-prone operation modes can be handled in a straightforward

manner by our proposed algorithm (by evaluating how well the observations match each possible model so that

we can eventually choose the best match). However, following our current approach, we would need to invoke the

algorithm as many times as the number of operation modes of the system. We plan to extend our current approach to

situations where we can take advantage of the system structure to evaluate the likelihood of multiple faulty models


more efficiently. One possible extension is to introduce a more structured fault-free model as well as more structured

fault-prone models. For example, we could consider systems that consist of some independent or loosely coupled

components. Another direction would be to use factorial hidden Markov models by imposing constraints on the state

transition functionality of the system so that each state variable evolves independently of the remaining variables

and is a priori decoupled from the others. Our results can be easily extended to the classification of several hidden

Markov models with applications in various fields such as document or image classification, pattern recognition,

and bioinformatics. An interesting extension of this work would be to modify the algorithm to be able to diagnose

a system based on a partially observable output sequence. It would also be interesting to study the sensitivity of

this approach to the probabilities of the inputs and/or the sensor failures.

REFERENCES

[1] C. G. Cassandras and S. Lafortune, Introduction to Discrete-Event Systems, Kluwer, 1999.

[2] F. Lin, “Diagnosability of discrete event systems and its applications,” Discrete Event Dynamic Systems: Theory and Applications, vol. 4,

no. 2, pp. 197–212, May 1994.

[3] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis, “Diagnosability of discrete-event systems,” IEEE Trans.

Automatic Control, vol. 40, no. 9, pp. 1555–1575, September 1995.

[4] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis, “Failure diagnosis using discrete-event models,” IEEE Trans.

Control Systems Technology, vol. 4, no. 2, pp. 105–124, February 1996.

[5] A. Benveniste, E. Fabre, S. Haar, and C. Jard, “Diagnosis of asynchronous discrete-event systems: a net unfolding approach,” IEEE Trans.

Automatic Control, vol. 48, no. 5, pp. 714–727, May 2003.

[6] S. H. Zad, R. H. Kwong, and W. M. Wonham, “Fault diagnosis in discrete-event systems: framework and model reduction,” IEEE Trans.

Automatic Control, vol. 48, no. 7, pp. 1199–1212, July 2003.

[7] Y. Wu and C. N. Hadjicostis, “Algebraic approaches for fault identification in discrete-event systems,” IEEE Trans. Automatic Control, vol.

50, no. 12, pp. 2048–2055, December 2005.

[8] A. Benveniste, E. Fabre, and S. Haar, “Markov nets: probabilistic models for distributed and concurrent systems,” IEEE Trans. Automatic

Control, vol. 48, no. 11, pp. 1936–1950, November 2003.

[9] C. N. Hadjicostis, “Probabilistic detection of FSM single state-transition faults based on state occupancy measurements,” IEEE Trans.

Automatic Control, vol. 50, no. 12, pp. 2078–2083, December 2005.

[10] M. Blanke, M. Kinnaert, J. Lunze, M. Staroswiecki, Diagnosis and Fault-Tolerant Control. Springer-Verlag, 2003.

[11] D. Thorsley and D. Teneketzis, “Diagnosability of stochastic discrete-event systems,” IEEE Trans. Automatic Control, vol. 50, no. 4, pp.

476–492, April 2005.

[12] A. T. Bouloutas, G. W. Hart, and M. Schwartz, “Simple finite-state fault detectors for communication networks,” IEEE Trans.

Communications, vol. 40, no. 3, pp. 477–479, March 1992.

[13] A. T. Bouloutas, G. W. Hart, and M. Schwartz, “Fault identification using a finite state machine model with unreliable partially observed

data sequences,” IEEE Trans. Communications, vol. 41, no. 7, pp. 1074–1083, July 1993.

[14] S. H. Low, “Probabilistic conformance testing of protocols with unobservable transitions,” in Proc. IEEE Int. Conf. on Network protocols,

pp. 368–375, October 1993.

[15] J. G. Kemeny, J. L. Snell, and A. W. Knapp, Denumerable Markov Chains. 2nd ed., New York: Springer-Verlag, 1976.

[16] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. of the IEEE, vol. 77, no. 2,

pp. 257–286, February 1989.

[17] F. Jelinek, Statistical Methods for Speech Recognition, The MIT Press, 1997.

[18] A. M. Poritz, “Hidden Markov models: A guided tour,” Proc. 1988 IEEE Conf. Acoustics, Speech, and Signal Processing, vol. 1, pp. 7–13,

April 1988.

[19] Y. Ephraim and N. Merhav, “Hidden Markov processes,” IEEE Trans. Information Theory, vol. 48, no. 6, pp. 1518–1569, June 2002.


[20] K. S. Fu, Syntactic Pattern Recognition and Applications. Prentice-Hall, 1982.

[21] E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R. C. Carrasco, “Probabilistic finite-state machines–part I,” IEEE Trans. Pattern

Analysis and Machine Intelligence, vol. 27, no. 7, pp. 1013–1025, July 2005.

[22] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.

[23] T. Koski, Hidden Markov Models of Bioinformatics. Kluwer Academic Publishers, 2001.

[24] R. Bauer and J. Hagenauer, “Symbol-by-symbol MAP decoding of variable length codes,” in Proc. 3rd ITG Conf. on Source and Channel

Coding, pp. 111–116, January 2000.

[25] A. Guyader, E. Fabre, C. Guillemot, and M. Robert, “Joint source-channel turbo decoding of entropy-coded sources,” IEEE Journal on

Sel. Areas in Comm., vol. 19, no. 9, pp. 1680–1696, September 2001.

[26] L. R. Bahl and F. Jelinek,“Decoding for channels with insertions, deletions and substitutions with applications to speech recognition,”

IEEE Trans. Information Theory, vol. IT-21, no. 4, pp. 404–411, July 1975.

[27] L. R. Bahl, F. Jelinek, and R. L. Mercer, “A maximum likelihood approach to continuous speech recognition,” IEEE Trans. Pattern Analysis

and Machine Intelligence, vol. PAMI-5, no. 2, pp. 179–190, March 1983.

[28] P. Dupont, F. Denis, and Y. Esposito,“Links between probabilistic automata and hidden Markov models: probability distributions, learning

models and induction algorithms,” Pattern Recognition, vol. 38, no. 9, pp. 1349–1371, September 2005.

[29] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Information

Theory, vol. IT-20, no. 2, pp. 284–287, March 1974.

[30] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Trans. Information Theory,

vol. IT-13, no. 2, pp. 260–269, April 1967.

[31] G. D. Forney, Jr., “The Viterbi algorithm,” Proc. of the IEEE, vol. 61, no. 3, pp. 268–278, March 1973.

[32] A. Bouloutas, G. W. Hart, and M. Schwartz, “Two extensions of the Viterbi algorithm,” IEEE Trans. Information Theory, vol. 37, no. 2,

pp. 430–436, March 1991.

[33] G. W. Hart and A. T. Bouloutas, “Correcting dependent errors in sequences generated by finite-state processes,” IEEE Trans. Information

Theory, vol. 39, no. 4, pp. 1249–1260, July 1993.

[34] J. C. Amengual and E. Vidal, “Efficient error-correcting Viterbi parsing,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20,

no. 10, pp. 1109–1116, October 1998.

[35] P. Bremaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer-Verlag, 1999.

[36] T. Yoo and H. E. Garcia, “New results on discrete-event counting under reliable and unreliable observation information,” Proc. IEEE Conf.

on Networking, Sensing, and Control, pp. 688–693, March 2005.

[37] M. C. Davey and D. J. C. Mackay, “Reliable communication over channels with insertions, deletions, and substitutions,” IEEE Trans.

Inform. Theory, vol. 47, no. 2, pp. 687–698, February 2001.

[38] E. Athanasopoulou and C. N. Hadjicostis, “Maximum likelihood diagnosis in partially observable finite-state machines,” in Proc. IEEE

Intl. Symp. on Intelligent Control, pp. 896–901, 2005.

[39] A. Graham, Kronecker Products and Matrix Calculus with Applications. Mathematics and its Applications, Chichester, UK: Ellis Horwood

Ltd, 1981.

[40] E. Athanasopoulou, “Diagnosis of finite state models under partial or unreliable observations,” Ph.D. thesis, University of Illinois at

Urbana-Champaign, 2007.

[41] The Institute of Electrical and Electronics Engineers, “Logical link control,” American National Standards Institute, ANSI/IEEE Std.

802.2-1985.