Data Processing for Outliers Detection


  • 8/22/2019 Data Processing for Outliers Detection


Data Processing for Outliers Detection

    Silvia Cateni and Valentina Colla
    PERCRO - Istituto di Tecnologie della Comunicazione, dell'Informazione e della Percezione
    Scuola Superiore S. Anna, Pisa, Italy

    1 Introduction

    Outlier detection is an important branch of data pre-processing and data mining, as this stage is required for the elaboration and mining of data coming from many application fields such as industrial processes, transportation, ecology, public safety and climatology. Outliers are data which can be considered anomalous due to several causes (e.g. erroneous measurements or anomalous process conditions). Outlier detection techniques are used, for instance, to minimize the influence of outliers in the final model to develop, or as a preliminary pre-processing stage before the information conveyed by a signal is elaborated. On the other hand, in many applications, such as network intrusion, medical diagnosis or fraud detection, outliers are more interesting than the common samples, and outlier detection techniques are used to search for them. The traditional outlier detection methods can be classified into four main approaches: distance-based, density-based, clustering-based and distribution-based. Each of these approaches presents advantages and limitations; thus in recent years many contributions have been proposed to overcome them and improve the quality of the data. Classical methods are often not suitable to treat some particular databases; therefore recent studies have been conducted on outlier detection for these kinds of datasets. In particular, a high number of contributions based on artificial intelligence, genetic algorithms and image processing have been proposed in order to develop new efficient outlier detection methods that can be suitable in many different applications.

    This chapter is organized as follows: in Section 2 an introduction on outlier detection definitions and potential applications is proposed. Section 3 presents a review of traditional outlier detection methods, while in Section 4 some outlier detection techniques based on particular data representations are discussed. In Section 5 recent approaches that are capable of outperforming the widely adopted traditional methods are described, and Section 6 introduces the application of outlier detection methods to the image processing area. Section 7 illustrates the results obtained considering a synthetic case-study and, finally, Section 8 provides some concluding remarks.

    2 Outlier Detection: Definitions and Applications

    An outlier in a dataset is defined as a measurement that is different from the other values. The classical definition of outlier is due to Hawkins (Hawkins, 1980), who defines an outlier as "an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism". Another outlier definition is given by Barnett and Lewis (Barnett & Lewis, 1994), who define an outlier as "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data". Aggarwal and Yu (Aggarwal & Yu, 2001) state that "outliers may be considered as noise points lying outside a set of defined clusters, or alternatively outliers may be considered as the points that lie outside of the set of clusters but also are separated from the noise".

    The application fields of outlier detection include fraud detection, weather prediction, fault diagnosis, detecting

    novelties in images (i.e. for robot neotaxis or surveillance systems), motion segmentation, satellite image anal-

    ysis, medical condition monitoring and others (Hodge, 2004).

    3 Classical Methods

    The main traditional approaches to outlier detection can be classified into several categories: distance-based, density-based, clustering-based and distribution-based.

    3.1 Distance-Based Method

    The distance-based outlier method was presented in (Knorr & Ng, 1998), where the definition of outlier becomes: "An object O in a dataset T is a DB(p,D)-outlier if at least fraction p of the objects in T lie at a distance greater than D from O". The parameter p is the minimum fraction of objects that must lie outside an outlier's D-neighborhood. In several approaches the Mahalanobis distance is used as outlying degree (Matsumoto, 2007).

    The Mahalanobis distance (Mahalanobis, 1936) is defined as in equation (1):

    D_M(x) = sqrt( (x − μ)^T C^{−1} (x − μ) )    (1)

    where x is the data vector, μ is the center of mass of the dataset and C is the covariance matrix. The Mahalanobis distance can thus be seen as the distance between each point and the center of mass of the data. If the covariance matrix is the identity matrix, the Mahalanobis distance becomes the Euclidean distance. Data points that are located far away from the center of mass are detected as outliers.
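    As an illustration, the detection rule based on equation (1) can be sketched in a few lines. The restriction to 2-D data, the function names and the threshold value are ours, not from the chapter:

    ```python
    # Illustrative Mahalanobis-distance outlier detection for 2-D data.
    # The threshold is a free parameter; 3.0 below is only an example.
    from math import sqrt

    def mean_vector(data):
        n = len(data)
        return [sum(p[j] for p in data) / n for j in range(2)]

    def covariance_2d(data, mu):
        n = len(data)
        cxx = sum((p[0] - mu[0]) ** 2 for p in data) / n
        cyy = sum((p[1] - mu[1]) ** 2 for p in data) / n
        cxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in data) / n
        return [[cxx, cxy], [cxy, cyy]]

    def mahalanobis(p, mu, C):
        # invert the 2x2 covariance matrix analytically
        det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
        inv = [[C[1][1] / det, -C[0][1] / det],
               [-C[1][0] / det, C[0][0] / det]]
        d = [p[0] - mu[0], p[1] - mu[1]]
        # quadratic form (x - mu)^T C^{-1} (x - mu), then square root
        q = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1]) +
             d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
        return sqrt(q)

    def db_outliers(data, threshold=3.0):
        mu = mean_vector(data)
        C = covariance_2d(data, mu)
        return [p for p in data if mahalanobis(p, mu, C) > threshold]
    ```

    On a tight grid of points plus one far sample, only the far sample exceeds a threshold of 3. Note that a very extreme outlier inflates the covariance matrix itself, so the threshold must be chosen with care.
    
    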

    3.2 Density-Based Method

    The density-based approaches calculate the density of the data and consider as outliers the points that lie in regions with low density. An important contribution was given by Breunig et al. (Breunig et al., 2000), who assigned an index value named Local Outlier Factor (LOF) to each object on the basis of the local density of its neighborhood. In the LOF algorithm a high LOF value indicates that the considered object is an outlier.

    The following definitions are necessary to understand the LOF method:

    - k-distance of an instance x: the distance d(x,y) between two instances x and y belonging to a dataset D such that:

    a) for at least k instances (k is a positive integer) y' ∈ D−{x} it holds that d(x,y') ≤ d(x,y);

    b) for at most k−1 instances y' ∈ D−{x} it holds that d(x,y') < d(x,y).

    - k-distance neighborhood of an instance x: defined in (2), it includes the instances whose distance from x is not greater than the k-distance:

    N_k-distance(x)(x) = { q ∈ D−{x} : d(x,q) ≤ k-distance(x) }    (2)

    where the objects q are called the k-nearest neighbors of x.

    - reachability distance of an instance x with respect to the instance y: if k is a natural number, the reachability distance of object x with respect to object y is defined as:

    reach-dist_k(x,y) = max{ k-distance(y), d(x,y) }    (3)


    - local reachability density of an instance x: the inverse of the average reachability distance based on the MinPts-nearest neighbors of x. MinPts is an important parameter required by the LOF algorithm which represents the number of nearest neighbors used in defining the local neighborhood of the instance:

    lrd_MinPts(x) = [ Σ_{o ∈ N_MinPts(x)} reach-dist_MinPts(x,o) / |N_MinPts(x)| ]^{−1}    (4)

    Finally the LOF is defined as:

    LOF_MinPts(x) = [ Σ_{o ∈ N_MinPts(x)} lrd_MinPts(o) / lrd_MinPts(x) ] / |N_MinPts(x)|    (5)

    i.e. the LOF is the average of the ratios between the local reachability density of each of the MinPts-nearest neighbors of x and the local reachability density of x itself.

    The LOF is an outlier degree and is used to decide whether an object is an outlier or not. When the LOF assumes a value close to 1, x is comparable to its neighbors, the region is quite dense and the considered object is not an outlier; on the other hand, a LOF value significantly greater than 1 indicates that x is an outlier.

    3.3 Clustering-Based Method

    Clustering-based methods consider outliers as objects that do not belong to any cluster after an opportune clustering operation. A variation regarding clustering is the use of a fuzzy model. Fuzzy clustering assigns a membership degree to each sample for each cluster. The most popular fuzzy clustering algorithm is the Fuzzy C-Means (FCM).

    The Fuzzy C-Means is an unsupervised clustering algorithm due to Dunn (1974) and it is based on the minimization of an objective function which is defined as the weighted sum of squared errors within groups, as described in the following equation:

    J_m(U,V;X) = Σ_{k=1}^{n} Σ_{i=1}^{c} u_ik^m ||x_k − v_i||^2    (6)

    where V = (v_1, v_2, ..., v_c) is the vector of the centers of the clusters, u_ik is the grade of membership of datum x_k ∈ X to the cluster i and m > 1 is a fuzziness exponent. When a stable condition is reached the iteration stops, and each point is associated to the cluster for which its value of membership is maximal.
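    A minimal sketch of the FCM iteration minimizing equation (6), for one-dimensional data: the alternating membership/center updates below are the standard ones, while the function signature, the explicit initial centers and the fixed iteration count are simplifications of ours:

    ```python
    # Compact Fuzzy C-Means sketch for 1-D data; m is the fuzziness exponent
    # of equation (6) and the initial centers are passed in explicitly.
    def fcm(points, centers, m=2.0, iters=50):
        c = list(centers)
        for _ in range(iters):
            # membership update: u_ik proportional to d(x_k, v_i)^(-2/(m-1))
            u = []
            for x in points:
                dists = [abs(x - v) for v in c]
                if any(d == 0 for d in dists):        # point sits on a center
                    u.append([1.0 if d == 0 else 0.0 for d in dists])
                    continue
                w = [d ** (-2.0 / (m - 1.0)) for d in dists]
                s = sum(w)
                u.append([wi / s for wi in w])
            # center update: mean of the points weighted by u_ik^m
            c = [sum(u[k][i] ** m * points[k] for k in range(len(points))) /
                 sum(u[k][i] ** m for k in range(len(points)))
                 for i in range(len(c))]
        return c, u
    ```

    On two well-separated 1-D groups the centers converge near the group means, and each point's membership is maximal for its own group.
    
    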

    3.4 Distribution-Based Method

    The distribution-based approaches use standard distributions to fit the dataset, and outliers are detected on the basis of a probability distribution. A fundamental limit of this approach is that it requires an a priori knowledge of the probability distribution of the data.

    For many applications such a priori knowledge is not always available or obtainable; moreover, the computational cost for fitting the data with common distributions (such as, for instance, Gaussian, Log-Normal (Aitchison & Brown, 1957), Gumbel (Castillo, 1988) or Weibull (Canfield et al., 1981) distributions) could be considerable.

    A well known method belonging to this approach was proposed by Grubbs (Grubbs, 1969). The Grubbs test detects outliers if the data distribution can be approximated by a Gaussian function. The Grubbs test computes the following statistic:

    G = max_i |x_i − μ| / σ    (7)

    where μ is the mean value of the data and σ is their standard deviation. If the variable G is greater than a tabulated value, then the sample corresponding to the maximum normalized distance from the mean value is considered an outlier.
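    The Grubbs statistic of equation (7) is straightforward to compute. The critical value depends on the sample size and the significance level and must be taken from tables, so in this sketch it is left as a caller-supplied parameter; the function names are ours:

    ```python
    # Sketch of the Grubbs statistic G of equation (7); the tabulated
    # critical value is a parameter here, not computed.
    from statistics import mean, stdev

    def grubbs_statistic(x):
        mu, sigma = mean(x), stdev(x)
        return max(abs(v - mu) for v in x) / sigma

    def grubbs_outlier(x, critical):
        """Return the suspected outlier if G exceeds the critical value, else None."""
        mu = mean(x)
        if grubbs_statistic(x) > critical:
            return max(x, key=lambda v: abs(v - mu))
        return None
    ```
    
    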

    The Rosner's test (Rosner, 1983) is a generalization of the Grubbs' test and it is used to find multiple outliers. In the Rosner's test the parameter J, corresponding to the maximum number of possible outliers, must be fixed. Then the data are ranked in ascending order. Let μ_0 and σ_0 be, respectively, the mean value and the standard deviation of the initial dataset. The sample x_0 farthest from μ_0 is deleted from the data, and the mean value μ_1 and the standard deviation σ_1 are computed on the remaining data. This process is repeated until J extreme samples have been removed. Finally the following statistic is calculated and compared to a critical tabulated value (Gilbert, 1987):

    R_J = |x_{J−1} − μ_{J−1}| / σ_{J−1}    (8)

    If R_J is higher than or equal to the critical value, then the J selected samples are considered outliers; otherwise the test is repeated. If for some i the statistic R_i = |x_{i−1} − μ_{i−1}| / σ_{i−1} is at least equal to the critical value, then the samples x_k for 0 ≤ k ≤ i are actually outliers; otherwise there are no outliers.
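    The Rosner-style iterative removal can be sketched as follows. Again the critical values come from tables (Gilbert, 1987), so this illustrative helper (with an invented name) only reports the removed samples and their statistics, leaving the comparison to the caller:

    ```python
    # Sketch of the iterative procedure behind equation (8): repeatedly
    # remove the sample farthest from the current mean, recording R_i.
    from statistics import mean, stdev

    def rosner_candidates(x, J):
        """Return [(removed_sample, R_i)] for i = 1..J."""
        data = list(x)
        results = []
        for _ in range(J):
            mu, sigma = mean(data), stdev(data)
            extreme = max(data, key=lambda v: abs(v - mu))
            results.append((extreme, abs(extreme - mu) / sigma))
            data.remove(extreme)
        return results
    ```
    
    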

    4 Outlier Detection Based on Particular Data Representations

    A functional dependency (Ramakrishnan & Gehrke, 2002) is a relationship between attributes of a given dataset, i.e. for each sample the value of an attribute y can be calculated by exploiting the values of some other attributes (x1, x2, ..., xn) in the form y = f(x1, x2, ..., xn), and all the records (x1, x2, ..., xn, y) should respect such relation. When the functional dependency f is unknown, there are algorithms created to discover it (Huhtala et al., 1999; Kivinen & Mannila, 1992). Quasi-functional dependencies (Huhtala et al., 1999; Bruno & Garza, 2007) are relationships that are not satisfied by all the records (x1, x2, ..., xn, y), which are also called tuples. Some methods base outlier detection on the respect of the quasi-functional dependencies, as they label as outliers the few tuples which deviate from the common functional behavior. The use of quasi-functional dependencies to detect anomalies was introduced in (Apiletti et al., 2006) and subsequently this approach was improved (Bruno & Garza, 2007). However, both the above cited methods are limited to databases which do not contain time information.

    On the other hand, temporal databases contain attributes which vary over time and temporal aspects are embedded in them (Date et al., 2002); they also include all database applications that require some aspect of time when organizing their information. The main difference between a non-temporal database and a temporal database is that a non-temporal database considers the data stored at a single time instant, i.e. without considering past and future database states, while a temporal database contains time information, attaching a time period to the data. Temporal databases are widely used in several applications (Pakadakis et al., 2006; Weekes et al., 2002; Chundi et al., 2009; Wua et al., 2009).

    Bruno & Garza (Bruno & Garza, 2010) introduced a new outlier detection method which is suitable for temporal databases. These authors address the outlier detection problem as part of the data mining process, by defining the temporal quasi-functional dependency (i.e. a quasi-functional dependency that varies through time), and present the so-called Temporal Outlier Detection (TOD) algorithm. In practice, the proposed approach extracts the temporal association rules from the database and then combines them to discover temporal quasi-functional dependencies. Association rules represent the pattern knowledge existing in a given dataset, i.e. association rule mining is a technique for discovering data dependencies (Liang et al., 2005). Temporal association rules are an extension of the association rule concept in which the antecedent and the consequent are also associated with temporal information (Bruno & Garza, 2010).

    The algorithm extracts quasi-functional dependencies with a dependency degree value higher than (or equal to) a user-specified threshold. Then, for each temporal quasi-functional dependency, a set of data is selected to be deleted in order to change the temporal quasi-functional dependency into a potential temporal functional dependency. The removed data are defined as outliers.

    Another particular data representation that has been frequently used for outlier detection is the one based on rough sets. The rough set concept is based on the assumption that each observation of the universe is associated to a specified amount of information. Objects with the same information are indiscernible; any set of all indiscernible observations is referred to as a crisp set, otherwise the set is imprecise, or rough. Rough set theory was introduced by Pawlak (Pawlak, 1982; Pawlak, 1991; Pawlak et al., 1995) and it is interesting in the study of intelligent systems characterized by incomplete and insufficient information. Several works demonstrate the importance of the rough set approach, especially in the fields of machine learning and data mining


    (Lin & Gereone, 1996; Pawlak et al., 1995; Skawron & Ranszer, 1992; Yao et al., 2003). In rough sets, data model information is organized in a table called an information system. If there are attributes which derive from a classification operation, the data table is also called a decision system. Each rough set, in contrast to a precise set, cannot be exactly characterized by the available information, and it is therefore described by a lower approximation, an upper approximation and a boundary region. The lower approximation is also called the positive region, while the region outside the upper approximation is called the negative region. The positive region includes all the observations certainly belonging to the considered concept, while the upper approximation includes the observations which possibly belong to the concept. The difference between the two regions is the boundary region. Figure 1 shows an example of rough set.

    Figure 1. An example of rough set
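    The lower and upper approximations described above can be computed directly from the indiscernibility classes. The toy universe, the attribute names and the function names below are invented for illustration:

    ```python
    # Sketch of rough-set approximations: objects are indiscernible when
    # they share the same values on the chosen attributes.
    def partition(universe, attrs):
        """Group objects into indiscernibility classes by their attribute tuple."""
        classes = {}
        for obj, values in universe.items():
            classes.setdefault(tuple(values[a] for a in attrs), set()).add(obj)
        return list(classes.values())

    def approximations(universe, attrs, concept):
        lower, upper = set(), set()
        for cls in partition(universe, attrs):
            if cls <= concept:       # wholly inside: certainly in the concept
                lower |= cls
            if cls & concept:        # overlapping: possibly in the concept
                upper |= cls
        return lower, upper
    ```

    The boundary region is then the set difference between the upper and the lower approximation.
    
    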

    Jang et al. (Jang et al., 2009) propose a method which combines rough set theory and outlier detection methods, suggesting two different approaches: a sequence-based outlier detection in information systems based on rough set theory, and a classical distance-based outlier detection method applied to rough sets.

    The definition of sequence-based outliers in an information system is inspired by Hawkins' definition (Hawkins, 1980), and the basic idea is built on the work by Skowron and Sinak (Skowron & Sinak, 2004), which introduced the basic concepts for approximate information exchanges using information granules.

    The basic idea is as follows. Given an information system, defined by a quadruple IS = (U, A, V, f), where U is a non-empty set of observations, A represents a non-empty set of attributes, V is the union of the attribute domains and f is an information function which links one value of each attribute to each observation included in U, then for each object x belonging to U, if x differs (on the basis of some characteristic) from the other objects in U, it is labeled as an outlier with respect to IS. The second approach applies a traditional distance-based outlier detection method to rough sets in order to calculate the distance between two objects in an information system. To this aim it is


    necessary to use a suitable distance metric for nominal attributes in rough set theory. An appropriate distance

    function for nominal attributes, that is called Value Difference Metric (VDM), was introduced by Stanfill &

    Waltz (Stanfill & Waltz, 1986).

    The value difference metric between two objects x and y is defined as follows:

    VDM(x,y) = Σ_f d_f(x_f, y_f)    (9)

    where f is the feature index, x_f is the value of object x on feature f, y_f is the value of object y on feature f, and d_f is the distance between the two feature values.
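    One common instantiation of equation (9) derives the per-feature distance d_f from the class-conditional frequencies of the two values. The first-power variant used here and the toy data in the example are our assumptions for illustration:

    ```python
    # Sketch of a Value Difference Metric over nominal attributes: the
    # per-feature distance compares class-conditional frequencies of the
    # two attribute values (absolute differences, i.e. exponent q = 1).
    from collections import Counter

    def vdm_feature_distance(column, labels, a, b):
        classes = set(labels)
        ca = Counter(l for v, l in zip(column, labels) if v == a)
        cb = Counter(l for v, l in zip(column, labels) if v == b)
        na, nb = sum(ca.values()), sum(cb.values())
        return sum(abs(ca[c] / na - cb[c] / nb) for c in classes)

    def vdm(rows, labels, x, y):
        """VDM(x, y) = sum over features of d_f(x_f, y_f), as in equation (9)."""
        return sum(
            vdm_feature_distance([r[f] for r in rows], labels, x[f], y[f])
            for f in range(len(x)))
    ```

    Two objects whose attribute values always co-occur with the same class are at distance 0; values tied to different classes maximize the per-feature distance.
    
    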

    5 Recent Artificial Intelligence-based Approaches to Outliers Detection

    Artificial Intelligence (AI) is a branch of computer science aiming at providing machines with a sort of intelli-

    gence, similar to the one characterizing living beings. Actually many definitions of AI can be found in litera-

    ture: in particular Russel & Norvig (Russel & Norvig, 2003) defined an intelligent agent as a system that

    perceives its environment and takes actions that maximize its chances of success. Nowadays, the term AI is

    widely used to indicate a variety of methods and techniques, such as neural networks, fuzzy logic and genetic

    algorithms.

    In the most recent years, the ever increasing application of AI techniques has led many researchers to evaluate the possibility of exploiting some of them for outlier detection. Thus many works have been proposed to improve already existing methods or to introduce new algorithms.

    5.1 Support Vector Machine-based Methods

    The SVM algorithm, introduced by Vapnik (Vapnik, 1995), is essentially a binary classification algorithm, although it has been extended to multi-class problems. The data belonging to the different classes need to be separated by a hyperplane, but they are not always well separable. To overcome this, the data are mapped to a feature space with higher dimensionality, where the data separation through hyperplanes is easier. The SVM classifier is widely used in many disciplines because it has a high accuracy and it is able to deal with high-dimensional data (Ben-Hur & Weston, 2010).

    SVM-based methodologies have been widely used for outlier detection, such as, for instance, in (Tax & Juskczak, 2002; Guo et al., 2008; Peng et al., 2010; Zhang et al., 2008), because they do not require a priori knowledge about any kind of statistical model, can be applied to data with high dimensionality and provide an optimum solution maximizing the margin of the decision boundary.

    A modification of the SVM algorithm that is suitable to detect outliers was proposed by Scholkopf (Scholkopf et al., 2001), who suggested a method of adapting the SVM to one-class classification problems. One-class SVM is an unsupervised algorithm which maps input data into a high-dimensional feature space and, through several iterations, finds the hyperplane which best separates the training samples from the origin. In practice, the one-class SVM is a normal two-class SVM where all training samples belong to the first class and the origin is the only member of the second class.

    The one-class SVM method maps data into a feature space through an appropriate kernel function; the most popular choices of kernel functions used in the SVM method are linear, polynomial, Gaussian and sigmoidal functions. The final aim is to separate the mapped vectors from the origin with maximum margin.

    An advantage of one-class SVM for outlier detection is its high True Positive Rate (TPR) (that is, the probability to correctly detect the outliers), but a disadvantage is its also high False Positive Rate (FPR) (i.e. the probability to misclassify as outliers samples which are not outliers). To solve this problem, Tian & Gu (Tian & Gu, 2010) proposed a novel one-class model which combines one-class SVM and Particle Swarm Optimization (PSO) algorithms (Kennedy & Eberhart, 1995; Shi & Eberhart, 1998). The PSO algorithm is inspired by the social behavior of insects, birds and fish. It is used to optimize a given problem by iteratively trying to improve


    candidate solutions. The candidate solutions are included into an initial population. This algorithm has been successfully applied to a wide variety of problems and has performance comparable to genetic algorithms. In this approach PSO algorithms are used to identify the optimum SVM parameters, obtaining a high detection rate with a low FPR. The combination of the SVM classifier and the PSO algorithm means that outliers are effectively detected through the optimization of the classifier, which is built through a suitable parameter selection and boundary movement strategy. The results show that the proposed approach improves the robustness of the overall decision and the best compromise between TPR and FPR is obtained. Other recent examples of outlier detection as a one-class learning problem are presented in (Schweizer & Moura, 2000; Miller & Brewning, 2003; Scholkopf et al., 2001; Banerjee et al., 2006; Campbell & Bennet, 2001; Ratsch et al., 2002; Markou & Singh, 2003; Han & Cho, 2006; Abe et al., 2006), and several other applications exploit SVM-based techniques to detect outliers with satisfactory results (Davy et al., 2006; Zhang et al., 2009; King et al., 2002; Gardner et al., 2006; Eskin et al., 2002; Lazarevic et al., 2003; Giacinto et al., 2008; Roberts & Tarassenko, 1994; Tax & Duin, 1999; Tax & Duin, 2004).
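    A bare-bones PSO loop of the kind used above to tune classifier parameters can be sketched as follows. The coefficient values, the one-dimensional search space and the quadratic objective in the example are illustrative stand-ins for the actual SVM-parameter search, and the function name is ours:

    ```python
    # Minimal PSO sketch (Kennedy & Eberhart style): particles explore a 1-D
    # search space, tracking personal and global bests.
    import random

    def pso_minimize(f, lo, hi, n_particles=20, iters=60, seed=0):
        rng = random.Random(seed)
        pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
        vel = [0.0] * n_particles
        pbest = list(pos)
        gbest = min(pos, key=f)
        for _ in range(iters):
            for i in range(n_particles):
                # velocity update: inertia + cognitive + social components
                vel[i] = (0.7 * vel[i]
                          + 1.5 * rng.random() * (pbest[i] - pos[i])
                          + 1.5 * rng.random() * (gbest - pos[i]))
                pos[i] = min(hi, max(lo, pos[i] + vel[i]))
                if f(pos[i]) < f(pbest[i]):
                    pbest[i] = pos[i]
                    if f(pos[i]) < f(gbest):
                        gbest = pos[i]
        return gbest
    ```

    In the SVM setting, the objective f would be a validation score of the classifier as a function of its parameters; here a simple quadratic stands in for it.
    
    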

    5.2 Fuzzy Logic-based Methods

    Fuzzy Logic (FL) is connected with the theory of fuzzy sets, a theory which provides classes of objects with unsharp boundaries, where a single object can simultaneously belong to different sets with different degrees of membership. A Fuzzy Inference System (FIS) (Ross, 2004) calculates the mapping from a given input to an output by using fuzzy logic. The input variables are mapped into sets of membership functions called "fuzzy sets", and the process of converting a crisp value to a fuzzy value is named "fuzzification". The process of fuzzy inference involves membership functions (MF), i.e. curves that define how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1, fuzzy logic operators (and, or, not) and if-then rules. The rule results are mapped into membership functions and are combined to give a crisp answer. This last process is called "defuzzification".

    In the last years, a novel interesting approach, called Fuzzy Rough Semi-Supervised Outlier Detection (FRSSOD) (Xue et al., 2010), was proposed. This approach combines the Semi-Supervised Outlier Detection method (SSOD), that was proposed by Gao et al. (Gao et al., 2006), with a clustering method introduced by Hu and Yu (Hu & Yu, 2005), that is named Fuzzy Rough C-Means clustering (FRCM). Naturally this method belongs to the clustering-based approaches. The proposed method integrates the advantages of SSOD and FRCM and decides only if the points on the boundary can be considered as outliers. In order to deeply understand FRSSOD, the SSOD method and the FRCM approach must be known as well. A brief description of these approaches is provided in the following.

    Many outlier detection methods are unsupervised algorithms (Breunig et al., 2000; Jin et al., 2001; Eskin et al., 2002), and often the unsupervised methods have a high FPR and a low TPR. Supervised detection methods have been introduced in order to improve the algorithm performance (Marsland, 2001; Kazarev, 2003; Markou, 2006), but the collection of a large amount of labeled training data can be quite difficult. For these reasons, semi-supervised outlier detection methods (Li et al., 2007; Zhang et al., 2005; Gao et al., 2006; Xu & Liu, 2009) have been recently presented.

    SSOD uses both unlabeled and labeled data, thus improving accuracy without the need for a high number of labeled data. Let X = {x1, x2, . . ., xn} be a set of data points drawn from R^m, whose first l points are labeled. SSOD searches for the cluster assignment and the outlier labels that minimize an objective function composed of three terms (Gao et al., 2006),


    where c_h is the center of cluster C_h, dist represents the Euclidean distance and λ1 and λ2 are adjusting parameters.

    The first term is inherited from k-means clustering objective function. As only normal points are partitioned

    into clusters, outliers are not included in this term. The second term is used to constrain the number of outliers

    not to be too large. The third term is used to maintain consistency of labeling proposed by authors with existing

    labels. The minimization of the above-defined objective function leads to point out outliers that do not belong

    to any clusters.

    FRCM was introduced by Hu & Yu (Hu & Yu, 2005) as a combination of the Fuzzy C-Means method and Rough C-Means. The Fuzzy C-Means method is based on the partition of the dataset points into clusters around cluster centers. A fuzzy membership in the range 0-1 for every cluster is assigned to each point; each object belongs to some or all of the clusters with some fuzzy degree. The results depend on the initialization of the cluster centers (see Subsection 3.3). In the RCM method the concept of C-means clustering is added to the concept of rough set (already treated in Section 4), i.e. each cluster is seen as a rough set which has a lower approximation region, an upper approximation region and a boundary region. The upper approximation region of a rough set includes samples in the cluster which are also members of other clusters, i.e. RCM classifies the object space into three parts: lower approximation, boundary and negative region.

    The main difference between rough clustering and classical clustering lies in the fact that in rough clustering a sample can be a member of more than one cluster, and this allows overlaps between clusters. In particular, Lingras assumes the following properties:

    - A datum can be a member of one lower approximation only.

    - The lower approximation of a given cluster must be a subset of its upper approximation.

    - If a datum is not a member of any lower approximation, then it is a member of two or more upper approximations.

    - Data in the boundary region are uncertain data and are assigned to at least two upper approximations.

    RCM has many advantages and its applicability extends to several fields that involve uncertain information granulation.

    FRCM combines the advantages of fuzzy set theory and rough set theory and integrates the fuzzy membership value of each sample to the lower approximation and boundary area of a cluster. FRCM can be formulated as follows. Let X = {x1, x2, . . ., xn} be a set of data points and let C_k and C̄_k be, respectively, the lower and the upper approximation of a cluster; C_k^B = C̄_k − C_k is the boundary area, c = {c1, c2, ..., cK} is the vector of the K cluster centers and u = {u_ik} are the memberships collected in an n×K matrix. FRCM partitions the data into two classes, a lower approximation region and a boundary region; only the objects belonging to the boundary region are fuzzified. The problem of FRCM lies in the optimization of the following function:

    (11)

    FRSSOD exploits both the above-described methods and combines the two approaches into a novel one. Let X = {x1, x2, . . ., xn} be a set of data points and let Y be a subset of X formed by the l labeled points. FRSSOD minimizes an objective function composed of three competing terms,


    where λ1 and λ2 are adjusting positive parameters applied to make the three terms compete with each other, while m is a fuzziness weighting exponent (m > 1). As only normal points are partitioned into clusters (the idea of the SSOD approach), outliers do not contribute to the first term. The second term avoids the detection of an extremely large number of outliers. The third term preserves consistency of user labeling with existing labels and punishes mislabeled points. FRSSOD not only uses unlabeled and labeled data but also integrates fuzzy and rough set theory; therefore it can be applied to many fields that have fuzzy information granulation or where decisions cannot be taken under certain conditions. The experimental results show that FRSSOD has many advantages over SSOD, as it improves outlier detection accuracy and reduces the false alarm rate under the guidance of labeled points. On the other hand, the performance of FRSSOD depends on the selection of the number of clusters and on the adjusting parameters λ1 and λ2.

    Fuzzy logic has also been applied for outlier detection as a tool to combine different outlier detection methods, in an attempt to exploit the advantages of each of them while overcoming their drawbacks. This approach does not belong to a single category, but rather spans several categories of outlier detection methods. Cateni et al. (Cateni et al., 2009) proposed a novel method based on fuzzy logic theory, which is a substantial improvement of a first attempt previously proposed by the same authors in (Cateni et al., 2007; Cateni et al., 2008) and combines a distance-based method, a density-based method, a clustering-based method and a distribution-based method. This method does not require any a priori assumption on the data, and it is able to detect outliers without the need for preliminary statistical analyses or parameter tuning. Therefore this approach can be adopted even by inexperienced users. For each sample four features are calculated by using the most popular outlier detection techniques (see Section 3). The inputs are represented by the Mahalanobis distance (Mahalanobis, 1936), a membership function evaluated through the fuzzy c-means technique (Bezdek, 1981; Dunn, 1974), the local outlier factor (Breunig et al., 2000) and the result of the Grubbs test (Grubbs, 1969).

    Noticeably the Fuzzy C-means algorithm requires the number of cluster to be known a priori, while in this ca

    such number is automatically calculated. In this algorithm the clustering based method is treated through bo

    fuzzy c-means algorithm and the validity measure based on inter and intra-cluster distance measure and p

    posed by Ray and Turi. (Ray & Turi, 1999). This approach consists in calculating the distance between a po

    and its cluster center to decide if the clusters are compact.

Two measures are defined: the intra-cluster distance, i.e. the average distance between a point and its cluster center, and the inter-cluster distance, i.e. the distance between clusters. To determine the optimal number of clusters, the intra-cluster distance must be minimized while the inter-cluster distance must be maximized. Their ratio, named validity measure, is defined as follows:

validity = intra-distance / inter-distance (13)

and the optimum number of clusters is calculated by minimizing the validity measure (13).
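The cluster-count selection above can be sketched as follows; the plain k-means helper and the toy two-blob dataset are illustrative assumptions, not the fuzzy c-means variant used in the original method:

```python
import numpy as np

def validity(X, labels, centers):
    """Ray & Turi validity = intra / inter (lower is better)."""
    # intra-distance: mean squared distance between each point and its centre
    intra = np.mean(np.sum((X - centers[labels]) ** 2, axis=1))
    # inter-distance: minimum squared distance between any two centres
    inter = min(np.sum((centers[i] - centers[j]) ** 2)
                for i in range(len(centers)) for j in range(i + 1, len(centers)))
    return intra / inter

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means, used here only to produce labels and centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                            else centers[i] for i in range(k)])
    return labels, centers

# two well-separated blobs: the validity measure is minimised at k = 2
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
scores = {k: validity(X, *kmeans(X, k)) for k in (2, 3, 4)}
best_k = min(scores, key=scores.get)
print(best_k)  # → 2
```

Splitting a compact blob shrinks the minimum inter-centre distance far faster than it shrinks the intra-distance, which is why the ratio penalises over-clustering.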

The four features are fed as inputs to the fuzzy inference system (FIS) (Ross, 2004), which provides as output an index in the range (0,1) representing a measure of the probability that the selected sample is an outlier. The adopted FIS is of the Mamdani type (Mamdani, 1974). Figure 2 depicts a scheme of the proposed method. The method has been tested in an industrial context and the results show that this approach outperforms the traditional techniques.
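The original method uses a full Mamdani rule base; as a rough, illustrative stand-in, the fusion step can be sketched by rescaling each detector's score to a [0,1] "outlierness" membership and averaging the memberships. The detector names and toy scores below are invented for the example:

```python
import numpy as np

def high(score, lo, hi):
    """Piecewise-linear membership of the fuzzy set 'score is high' on [lo, hi]."""
    return np.clip((score - lo) / (hi - lo), 0.0, 1.0)

def fuse(score_table):
    """Average per-detector memberships into a single outlierness index in [0, 1]."""
    memberships = [high(np.asarray(s, float), min(s), max(s))
                   for s in score_table.values()]
    return np.mean(memberships, axis=0)

# toy scores from three detectors (higher = more anomalous); sample 4 stands out
score_table = {
    "mahalanobis": [0.5, 0.6, 0.4, 0.5, 4.0],
    "lof":         [1.0, 1.1, 0.9, 1.0, 3.5],
    "grubbs":      [0.2, 0.3, 0.1, 0.2, 2.8],
}
index = fuse(score_table)
print(int(np.argmax(index)))  # → 4
```

A sample that every detector ranks highest receives a fused index of 1, while a sample flagged by only one detector is attenuated by the averaging, which is the intuition behind combining methods with complementary weaknesses.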


Figure 2: Block diagram of the fuzzy logic-based outlier detection method

    5.3 Genetic Algorithm-based Methods

Genetic Algorithms (GA) belong to the wider class of evolutionary optimization methods: their main feature consists in their attempt to mimic the evolution of living organisms through generations: this natural process is simulated in order to progressively build a solution to a certain problem which is optimal under one (or, sometimes, more than one) arbitrary criterion. A set of possible solutions to the considered optimization problem is organized into a population of candidate solutions which is evolved by means of the GA engine. At each generation of the GA the goodness of each candidate solution is evaluated through a performance measure usually named fitness function: the individuals with higher fitness are used to build the population at the subsequent generation. The best candidates not only survive but are also combined in order to generate new (and hopefully better) individuals, as happens in natural evolution. The GA population is evolved, generation by generation, until an arbitrary stop condition is met, which typically involves the attainment of a particularly high fitness value by one of the candidates or by part of the population, the completion of a predetermined number of generations, or the protracted evolution of the population without any improvement in the goodness of the candidates.
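The loop described above (fitness evaluation, selection, crossover, mutation, stop condition) can be sketched on a toy problem; the OneMax fitness function, the population size and the rates below are arbitrary choices for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N_GENES, POP, GENS, P_MUT = 20, 30, 60, 0.02

def fitness(ind):
    # toy criterion (OneMax): number of 1-genes; any problem-specific measure fits
    return int(ind.sum())

pop = rng.integers(0, 2, size=(POP, N_GENES))
for gen in range(GENS):
    fit = np.array([fitness(ind) for ind in pop])
    if fit.max() == N_GENES:                   # stop condition: perfect candidate
        break
    def parent():
        # tournament selection: the fitter of two random candidates reproduces
        i, j = rng.integers(0, POP, 2)
        return pop[i] if fit[i] >= fit[j] else pop[j]
    children = [pop[np.argmax(fit)].copy()]    # elitism: the best survives unchanged
    while len(children) < POP:
        a, b = parent(), parent()
        cut = int(rng.integers(1, N_GENES))    # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child[rng.random(N_GENES) < P_MUT] ^= 1   # mutation
        children.append(child)
    pop = np.array(children)

best_fit = max(fitness(ind) for ind in pop)
print(best_fit)
```

With elitism the best fitness is non-decreasing across generations, so the loop either hits the stop condition or ends after the fixed number of generations with the best candidate found so far.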

Tolvi (Tolvi, 2004) proposed an application of GA for outlier detection based on a statistical approach. The problem of associating the data to the best possible model is faced by firstly finding any outliers in the data; a number of initial candidate models are selected and examined. Tolvi also addressed a nuisance in outlier detection: the possibility of smearing and masking. Smearing means that the presence of an outlier causes other normal observations to be misclassified as outliers, while masking means that an outlier prevents another datum from being correctly classified as an outlier by an outlier detection method. In (Tolvi, 2004) an outlier detection method for linear regression modeling is treated. GAs are used for outlier detection while avoiding the potential problems of smearing and masking; simultaneously, the problem of variable selection is discussed. The motivation for treating two different problems (i.e. outlier detection and variable selection) together lies in the fact that the choice of the variables to select can affect the outlier detection and vice-versa (Chatterjee & Hadi,


1994). Potential outliers can be included into the linear regression model using a dummy variable. A dummy variable is a binary vector which is zero for outlier samples and one for non-outlier samples. The aim of the proposed approach is thus the selection of the best model, where the candidate models contain different combinations of the possible dummy variables. The outlier detection is based on the use of information criteria, in particular the Bayesian Information Criterion (BIC) (Schwarz, 1978).

Schwarz introduced BIC to serve as an approximation to a transformation of the Bayesian posterior probability of a candidate model. The computation of BIC depends on the model complexity, i.e. on the number of parameters of the selected model. Let us suppose that X = {x1, x2, . . ., xN} is the dataset to be modeled and M = {M1, M2, . . ., Mk} are the candidate parametric models. Let L(X,M) be the maximized likelihood function of each model; BIC is then defined as:

BIC = log L(X,M) - (p/2) log(N) (14)

where p is the number of parameters in the model M. The model which maximizes BIC, i.e. a model with few parameters and small residuals, is selected by the GA. The proposed GA starts with a randomly generated population of 40 individuals per generation. Each individual contains genes with value zero with probability 0.9 and genes with value one with probability 0.1. The algorithm becomes faster if preliminary information about which samples are potential outliers is added and, although the paper treats linear regression models, the method is also suitable for other statistical models.
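Under simplifying assumptions (a known simple regression, planted outliers, and arbitrary GA settings), the dummy-variable search driven by BIC can be sketched as follows. This is an illustration of the idea, not Tolvi's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 60
x = np.linspace(0.0, 1.0, N)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, N)
y[[10, 40]] += 4.0                          # two planted outliers

def bic(dummies):
    """Schwarz criterion for a line plus one indicator column per flagged point."""
    cols = [np.ones(N), x] + [(np.arange(N) == i).astype(float)
                              for i in np.flatnonzero(dummies)]
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    loglik = -0.5 * N * (np.log(2.0 * np.pi * rss / N) + 1.0)
    return loglik - 0.5 * A.shape[1] * np.log(N)   # penalise extra parameters

# small GA over the binary dummy vector (population of 40, genes 1 w.p. 0.1)
pop = (rng.random((40, N)) < 0.1).astype(int)
for _ in range(80):
    fit = np.array([bic(ind) for ind in pop])
    order = np.argsort(fit)[::-1]
    parents = pop[order[:20]]                  # keep the fitter half
    children = [pop[order[0]].copy()]          # elitism
    while len(children) < 40:
        a, b = parents[rng.integers(0, 20, 2)]
        cut = int(rng.integers(1, N))          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child[rng.random(N) < 0.01] ^= 1       # mutation
        children.append(child)
    pop = np.array(children)

flagged = np.flatnonzero(pop[np.argmax([bic(ind) for ind in pop])])
print(sorted(flagged.tolist()))   # planted points 10 and 40 appear among the flags
```

Flagging a true outlier removes a huge residual and so raises the likelihood far more than the per-parameter penalty, while a dummy on an ordinary sample usually costs more than it gains.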

Other recent applications of GA for outlier detection (Aggarwal & Yu, 2001; Yan et al., 2004; Bandyopadhyay & Santra, 2008) can be found in the literature.

    6 Outlier Detection in Image Processing

Outlier detection is an important tool in image processing and analysis. In an image, an outlier can appear when the image changes over time, or it can be represented by regions which are anomalous with respect to the rest of a quasi-static image (i.e. one with very small variations through time). Outliers can be due to motion, insertion of anomalous objects or instrumentation errors. The outlier detection process is a fundamental pre-processing tool in many interesting image analysis applications, such as satellite imagery, spectroscopy, mammographic images or video surveillance (Chandola et al., 2009). Often in image processing the data present both spatial and temporal characteristics, and outlier detection is an important task to identify false matches.

Malpica et al. (Malpica et al., 2008) propose an innovative technique for outlier detection in hyperspectral images. A hyperspectral image is a digital image where each element of the image (pixel) has an associated electromagnetic spectrum. It can also be seen as a cube of data (called a hypercube). Due to the high number of bands, the large amount of data can be redundant, and the most interesting information is difficult to extract because of the high dimensionality of the data themselves. After detection, anomalous points can be retained, because they contain interesting information, or can be discarded. The authors propose a method based on Projection Pursuit (PP) (Friedman & Tukey, 1974; Kruskal, 1969) to detect possible anomalies. This technique is based on the use of one or more linear combinations of the original features with the aim of maximizing an index representing an interestingness measure. The results show that the PP technique can detect groups of outliers or isolated outliers; the proposed algorithm was applied to AHS and HYDICE hyperspectral imagery.

The common Principal Component Analysis (PCA) (Jolliffe, 2002) is a special case of PP. In PCA the reduction of the data is made by choosing the linear combination of the considered variables which maximizes the variance of the projected data, i.e. the index is represented by the variance. An important contribution towards a new perspective on PCA-based approaches is suggested by Ding and He (Ding & He, 2004). A method based on PCA to reduce dimensionality and detect outliers in hyperspectral imagery is treated in (Goovaerts et al., 2005), while in (Saha et al., 2009) outlier detection through PCA is also used to automate snake contours for object detection. Snakes are deformable models used to estimate the boundary of an object whose shape is partially unknown; an example of the use of a snake is shown in Figure 3.
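One common way to turn PCA into an outlier score, consistent with the variance-maximizing projection described above, is to measure how badly each sample is reconstructed by the top principal components. The low-rank toy data below is an illustrative assumption:

```python
import numpy as np

def pca_outlier_scores(X, n_components):
    """Residual norm after projecting onto the top principal components.

    Samples poorly explained by the dominant components (e.g. anomalous
    pixels or spectra) receive large scores.
    """
    Xc = X - X.mean(axis=0)
    # principal directions = right singular vectors of the centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    recon = Xc @ V @ V.T
    return np.linalg.norm(Xc - recon, axis=1)

rng = np.random.default_rng(0)
# data close to a 1-D subspace of R^3, plus one off-subspace sample
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t, -t]) + rng.normal(0, 0.05, (100, 3))
X[7] = [0.0, 0.0, 5.0]               # planted anomaly off the main direction
scores = pca_outlier_scores(X, n_components=1)
print(int(np.argmax(scores)))        # → 7
```

In hyperspectral settings the same idea applies with pixels as rows and bands as columns, the anomalies being the pixels with the largest reconstruction residuals.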


    Figure 3: Example of snake contour

    The deformable curve is a sort of elastic curve which is able to approximate the considered image features.

A novel method for active contour models, or snakes, is proposed by Chan (Chan, 2001). It is an interesting approach because the proposed model is able to detect objects whose boundaries are not necessarily defined by the gradient. In this research area outliers are features which do not lie on the object boundary. In (Nascimento, 2005) an algorithm for the detection of object boundaries in the presence of outliers is proposed. A deformable contour (as in snakes) approximates the object boundary through the Maximum A Posteriori (MAP) estimation method (Abrantes & Marques, 1996) using the Expectation Maximization (EM) algorithm (McLachlan & Krishnan, 1997).

Dashti et al. (Dashti et al., 2010) proposed the ET-DRN method to understand the relationships between objects in a given dataset. The hierarchical clustering procedure includes the Euler algorithm to assign objects to clusters, a GA to increase the density between objects within each cluster and, finally, the Kullback-Leibler divergence to calculate the dissimilarity of the clusters. Objects are considered in high dimensionality and are examined as objects of a digital geometry; thus it is possible to build a sensible mathematical structure where outliers are clearly detectable. Silveira et al. (Silveira et al., 2008) proposed a new method which classifies image features as valid or invalid (i.e. outliers) by organizing edge points into connected segments (the so-called strokes). An adaptive stopping force is applied, which allows the contour to bridge the invalid features and stop at the valid ones. A confidence degree is then assigned to each stroke during the evolution process, and the weights are given by the probability that a stroke is valid.

    7 Case Study Using Synthetic Data

In order to show how the different classical methods and a recently proposed method work, an example using synthetic data is presented. The created database includes 100 samples of a random variable whose probability density function derives from the composition of two Gaussian functions, as shown in Figure 4. Ten outliers, indicated with red circles in Figure 4, have been included in such a database.


    Figure 4: The synthetic dataset and, on the left, the distribution of the data that are not outliers.

Four classical outlier detection methods have been applied to this database: a distance-based approach where the Mahalanobis distance is exploited, a density-based approach based on the LOF algorithm, a clustering-based method which exploits fuzzy c-means as clustering algorithm and, finally, a distribution-based approach using Rosner's algorithm. Moreover, on the same database an AI-based technique has also been tested, in particular the one proposed in (Cateni et al., 2009). The results of these tests are reported in Table 1.

Approach                                Outliers detected (%)
Distance-based (Mahalanobis distance)   30% (A - E - I)
Density-based (LOF)                     30% (G - H - L)
Clustering-based (Fuzzy C-means)        30% (B - D - F)
Distribution-based (Rosner's test)      70% (A - C - E - G - H - I - L)
AI-based (Fuzzy approach)               100% (A - B - C - D - E - F - G - H - I - L)

    Table 1: Test results of some outlier detection techniques on the synthetic database of Fig. 4.

The results put into evidence the particular features of the tested algorithms. In particular, the distance-based approach is capable of pointing out the outliers that mostly differ from the mean value, while the density-based approach detects only outliers that are isolated from the data. The clustering-based approach finds isolated outliers after a clustering operation. Finally, the distribution-based approach considers as outliers those points that deviate from the model. In this example the distribution-based method works quite well because the initial dataset is created from two Gaussian distributions. The fuzzy-based approach, which combines the several classical methods, outperforms all the traditional techniques as it exploits all their capabilities while compensating their weaknesses.
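An analogous experiment can be reproduced in a few lines. The mixture parameters, the planted outliers and the simple global-distance detector below are illustrative assumptions, not the paper's actual data or implementations:

```python
import numpy as np

rng = np.random.default_rng(42)
# inliers drawn from a two-component Gaussian mixture, as in the case study
inliers = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(10.0, 1.0, 50)])
# ten planted outliers: seven extreme values and three in the valley between modes
outliers = np.array([-9.0, -8.5, 20.0, 21.0, 22.0, 5.0, 4.9, 5.1, 19.5, -9.5])
data = np.concatenate([inliers, outliers])
true_out = np.arange(len(data)) >= 100

# naive distance-based detector: deviation from the global mean in std units
z = np.abs(data - data.mean()) / data.std()
flagged = z > 2.0

caught = np.flatnonzero(flagged & true_out)
missed = np.flatnonzero(~flagged & true_out)
# the extreme outliers are caught; the mid-valley ones near the global mean are not
print(sorted(data[missed].tolist()))  # → [4.9, 5.0, 5.1]
```

The three missed points sit between the two modes, close to the global mean of the mixture, which is exactly the weakness of a purely distance-based detector that a combined (e.g. fuzzy) approach tries to compensate.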


    8 Conclusion

A survey of outlier detection methods has been proposed. Both traditional approaches and their recent enhancements, as well as some interesting applications, are presented and discussed. Finally, a case study based on a synthetic database is proposed with the purpose of showing how the different methods work. The conclusion is that the potential and efficiency of an outlier detection method strongly depend on the kind and distribution of the data that are processed. For instance, clustering-based methods are very effective if the data are strongly clustered, while distribution-based methods can work quite well if the hypotheses that are required on the data distribution are correct, which means that they can be applied only when some a priori knowledge of the data distribution is available. If no information is available on the data to process and/or if the data features can change through time in a non-predictable way, then probably the best solution is to try different methods and/or to apply a combination of several outlier detection methods based on different principles. Fuzzy logic can provide a powerful tool to automatically perform such a combination, but other combination procedures are also possible.

    References

Abe, N., Zadrozny, B. & Langford, J. Outlier Detection by Active Learning, Proc. ACM SIGKDD 06, 2006, (pp. 504-509).

Abrantes, A. and Marques, J. A class of constrained clustering algorithms for object boundary detection, IEEE Trans. Image Process., vol. 5, no. 11, pp. 1507-1521, Nov. 1996.

Aggarwal, C.C., Yu, P.S. Outlier detection for high dimensional data. Proceedings of ACM SIGMOD Conference, 2001, (pp. 37-47).

Aitchison, J. and Brown, J.A.C. The lognormal distribution, Cambridge University Press, Cambridge UK, 1957.

Apiletti, D., Baralis, E., Bruno, G., Ficarra, E. Data cleaning and semantic improvement in biological databases, Journal of Integrative Bioinformatics 3 (2) (2006).

Bandyopadhyay, S. & Santra, S. A genetic approach for efficient outlier detection in projected space. Pattern Recognition, 41, 2008, (pp. 1338-1349).

Banerjee, A., Burlina, P. & Diehl, C. A Support Vector Method for Anomaly Detection in Hyperspectral Imagery, IEEE Trans. Geoscience and Remote Sensing, vol. 44, no. 8, 2006 (pp. 2282-2291).

    Barnett, V, Lewis, T. Outliers in Statistical Data, 3rd ed., John Wiley & Sons, New York, 1984.

    Ben-Hur, A. and Weston, J. A user's guide to Support Vector Machines, Meth Mol Biol 609, 2010, (pp. 223-239)

Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.

Bishop, C. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.

Breunig, M.M., Kriegel, H.P., Ng, R.T., and Sander, J. LOF: Identifying Density-based Local Outliers, Proc. of the 2000 ACM SIGMOD Int'l Conf. on Management of Data, ACM, New York, NY, USA, June 2000, Vol. 29, Issue 2, pp. 93-104.

Bruno, G. & Garza, P. TOD: Temporal outlier detection by using quasi-functional temporal dependencies. Data & Knowledge Engineering, 69, 2010, (pp. 619-639).

Bruno, G., Garza, P., Quintarelli, E., Rossato, R. Anomaly detection through quasi-functional dependency analysis, Journal of Digital Information Management 5 (4) (2007) 191-200.

Campbell, C. & Bennet, K.P. A Linear Programming Approach to Novelty Detection, Advances in Neural Information Processing Systems 13, 2001 (pp. 395-401).

Canfield, R.V., Taillie, C., Patil, G.P., Baldessari, B.A. Extreme value theory with applications to hydrology. In Statistical distributions in scientific work, Vol. 6, Reidel Publishing Company, Dordrecht, Holland (pp. 35-49), 1981.

    Castillo, E. Extreme Value theory in engineering. New York: Academic, 1988.


Cateni, S., Colla, V., Vannucci, M. A fuzzy logic based method for outlier detection, Proc. 25th IASTED Int. Conf. Artificial Intelligence and Applications, AIA 2007, pp. 561-566, Innsbruck, Austria, 2007.

Cateni, S., Colla, V., Vannucci, M. Outlier detection methods for industrial applications, in Advances in Robotics, Automation and Control, I-Tech Education and Publishing KG, Croatia, October 2008.

Cateni, S., Colla, V., Vannucci, M. A fuzzy system for combining different outliers detection methods, in Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, 16-18 February 2009.

Chan, T.F. Active contours without edges. IEEE Transactions on Image Processing, Vol. 10, N. 2, February 2001.

Chandola, V., Banerjee, A., Kumar, V. Anomaly detection: a survey. ACM Computing Surveys, September 2009.

Chatterjee, S. & Hadi, A.S. Sensitivity Analysis in Linear Regression, Wiley, New York, 1988.

Chaudhuri, P. On a Geometric Notion of Quantiles for Multivariate Data, J. Am. Statistical Assoc., vol. 91, no. 434, 1996 (pp. 862-872).

Chundi, P., Subramaniam, M., Vasireddy, D.K. An approach for temporal analysis of email data based on segmentation, Data and Knowledge Engineering 68 (11) (2009) 1253-1270.

Dashti, H.T., Kloc, M.E., Simas, T., Ribeiro, R.A., Assadi, A.H. Introduction of empirical topology in construction of relationship networks of informative objects, IFIP Advances in Information and Communication Technology, Springer, 2010.

Davy, M., Desobry, F., Gretton, A., Doncarli, C. An online support vector machine for abnormal events detection. Signal Processing 86(8), 2006, (pp. 2009-2025).

Date, C.J., Darwen, H., Lorentzos, N. Temporal Data & the Relational Model, First Edition (The Morgan Kaufmann Series in Data Management Systems); Morgan Kaufmann; 1st edition; 2002, ISBN 1-55860-855-9.

    Ding, C., He, X. Principal component analysis and effective K-means clustering, SDM, 2004 (pp.497-501).

Dunn, J.C. Some recent investigations of a new fuzzy partition algorithm and its application to pattern classification problems, Journal of Cybernetics 4 (1974) 1-15.

Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S. A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data. In: Data Mining for Security Applications, vol. 19, 2002.

Friedman, J.H., Tukey, J.W. A projection pursuit algorithm for exploratory data analysis, IEEE Trans. Comput. C-23 (9) (1974) 881-890.

Gardner, A.B., Krieger, A.M., Vachtsevanos, G., Litt, B. One-class novelty detection for seizure analysis from intracranial EEG, J. Mach. Learn. Res. 7, 2006, (pp. 1025-1044).

Giacinto, G., Perdisci, R., Del Rio, M., Roli, F. Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inf. Fusion 9(1), 2008, (pp. 69-82).

Gao, J., Cheng, H., Tan, P.N. Semi-supervised outlier detection, Proc. of the 2006 ACM Symposium on Applied Computing, ACM Press, 2006, pp. 635-636.

Gilbert, R.O. Statistical methods for environmental pollution monitoring, Van Nostrand Reinhold, New York, 1987.

Goovaerts, P., Jacquez, G.M., Marcus, A. Geostatistical and local cluster analysis of high resolution hyperspectral imagery for detection of anomalies, Remote Sensing Environ. 95 (2005) 351-367.

    Grubbs, F.E., Procedures for detecting outlying observations in samples, Technometrics 11, pp.1-21, 1969.

Guo, S.M., Chen, L.C., Tsai, J.S.H. A boundary method for outlier detection based on support vector domain description. Pattern Recognition 42, 2009, (pp. 77-83).

Han, S.-J. and Cho, S.-B. Evolutionary Neural Networks for Anomaly Detection Based on the Behavior of a Program, IEEE Trans. Systems, Man, and Cybernetics B, vol. 36, no. 3, 2006 (pp. 559-570).

Hawkins, D. Identification of outliers, Chapman and Hall, London, 1980.

Hodge, V.J. A survey of outlier detection methodologies, Kluwer Academic Publishers, Netherlands, January 2004.


Hu, Q., Yu, D. An improved clustering algorithm for information granulation, in: Proceedings of the 2nd International Conference on Fuzzy Systems and Knowledge Discovery (FSKD05), vol. 3613, LNCS, Springer-Verlag, Berlin Heidelberg, Changsha, China, 2005, pp. 494-504.

Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H. TANE: an efficient algorithm for discovering functional and approximate dependencies, The Computer Journal 42 (2) (1999) 100-111.

Jang, F., Sui, Y. & Cao, C. Some issues about outlier detection in rough set theory, Expert Systems with Applications, 36, pp. 4680-4687, 2009.

Jolliffe, I. Principal Component Analysis. Springer, New York, 2002.

Kennedy, J. & Eberhart, R. Particle Swarm Optimization. Proceedings of IEEE International Conference on Neural Networks, IV, pp. 1942-1948.

King, S.P., King, D.M., Astley, K., Tarassenko, L., Hayton, P., Utete, S. The use of novelty detection techniques for monitoring high-integrity plant. In: Proceedings of the 2002 International Conference on Control Applications, Cancun, Mexico, vol. 1, 2002, (pp. 221-226).

Kivinen, J., Mannila, H. Approximate inference of functional dependencies from relations, Theoretical Computer Science 149 (1) (1992) 129-149.

Knorr, E.M., Ng, R. Algorithms for Mining Distance-Based Outliers in Large Datasets. Proceedings VLDB, pp. 392-403.

Kruskal, J.B. Toward a practical method which helps uncover the structure of a set of multivariate observations by finding the linear transformation which optimizes a new index of condensation, in: R.C. Milton, J.A. Nelder (Eds.), Statistical Computation, Academic Press, New York, 1969, pp. 427-440.

Jin, W., Tung, A.K.H., Han, J. Mining Top-n Local Outliers in Large Databases. Proc. of the Seventh ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 2001, pp. 293-298.

Lazarevic, A., Ertoz, L., Kumar, V., Ozgur, A., Srivastava, J. A comparative study of anomaly detection schemes in network intrusion detection. In: Proceedings of the Third SIAM Conference on Data Mining, San Francisco, vol. 3, 2003.

Li, B., Fang, L., Guo. A novel Data Mining Method for Network Anomaly Detection based on Transductive Scheme, in Advances in Neural Networks, LNCS, vol. 4491, Springer, Berlin, 2007, pp. 1286-1292.

Liang, Z., Ximming, T., Lin, L., Wenliang, J. Temporal association rule mining based on a T a-priori algorithm and its typical application. Proceedings of the International Symposium on Spatio-Temporal Modeling, Spatial Reasoning, Analysis, Data Mining and Data Fusion, 2005.

Lin, T.Y., & Cercone, N. (1996). Rough sets and data mining: Analysis of imprecise data. Dordrecht: Kluwer Academic.

Lingras, P., West, C. Interval set clustering of web users with rough k-means, Journal of Intelligent Information Systems, vol. 23, no. 1, July 2004, pp. 5-16.

MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, 1967, (pp. 281-297).

Mahalanobis, P.C. On the generalized distance in statistics, Proc. of the National Institute of Science of India, pp. 49-55, 1936.

Malpica, J.A., Rejas, J.C., Alonso, M.C. A projection pursuit algorithm for anomaly detection in hyperspectral imagery. Pattern Recognition, 41, 2008, (pp. 3313-3327).

Mamdani, E.H. Application of fuzzy algorithms for control of simple dynamic plant, Proc. of the IEEE Control and Science, No. 121, pp. 298-316, 1974.

Marsland, S. On-line Novelty Detection Through Self-organisation, with Application to Inspection Robotics. Ph.D. Thesis, Faculty of Science and Engineering, University of Manchester, UK, 2001.

Markou, M. & Singh, S. A Neural Network-Based Novelty Detection for Image Sequence Analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, 2006, (pp. 1664-1677).

Matsumoto, S., Kamei, Y., Monden, A. Comparison of Outlier Detection Methods in Fault-proneness Models. Proceedings of the 1st International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp. 461-463, September 2007.

    McLachlan, G.J. & Krishnan, T. The EM Algorithm and Extensions. New York: Wiley, 1997.

Miller, D.J. and Browning, J. A Mixture Model and EM-Based Algorithm for Class Discovery, Robust Classification, and Outlier Rejection in Mixed Labeled/Unlabeled Data Sets, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 11, Nov 2003 (pp. 1468-1483).

Nascimento, J.C. Adaptive snakes using the EM algorithm. IEEE Transactions on Image Processing, Vol. 14, 2005, (pp. 1678-1686).

Papadakis, N., Antoniou, G., Plexousakis, D. The ramification problem in temporal databases: changing beliefs about the past. Data and Knowledge Engineering 59, 2, 2006, (pp. 379-434).

Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Sciences, 11, 341-356.

Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. Dordrecht: Kluwer Academic Publishers.

Pawlak, Z., Grzymala-Busse, J.W., Slowinski, R., & Ziarko, W. (1995). Rough sets. Communications of the ACM, 38(11), 88-95.

Peng, X., Chen, J., Shen, H. Outlier Detection Method Based on SVS and its application in Copper-matte Converting, IEEE, ISBN: 978-1-4244-5181-4, 2010.

Ramakrishnan, R., Gehrke, J. Database Management Systems, McGraw-Hill Science Engineering Math, 2002.

Ratsch, G., Mika, S., Scholkopf, B. and Muller, K. Constructing Boosting Algorithms from SVMs: An Application to One-Class Classification, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 9, Sept 2002, (pp. 1184-1199).

Ray, S., Turi, R.H., 1999. Determination of number of clusters in k-means clustering and application in colour image segmentation. Proc. 4th Int. Conf. Advances in Pattern Recognition and Digital Techniques (ICAPRDT'99), Calcutta, India, 27-29 December 1999, pp. 137-143.

Roberts, S. & Tarassenko, L. A Probabilistic Resource Allocating Network for Novelty Detection, Neural Computation, vol. 6, no. 2, 1994, (pp. 270-284).

Rosner, B. Percentage points for a generalized ESD many-outlier procedure. Technometrics, 25, (pp. 165-172), 1983.

Ross, T.J. Fuzzy logic with engineering applications, John Wiley & Sons Ltd, England, 2004.

Russell, S.J., Norvig, P. Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, 2003.

Saha, B.D., Roy, N. & Zhang, H. Snake validation: a PCA-based outlier detection method, IEEE Signal Processing Letters, vol. 16, N. 6, 2009.

Schwarz, G. Estimating the dimension of a model. The Annals of Statistics, 6, 1978, (pp. 461-464).

Schweizer, S.M. and Moura, J.M.F. Hyperspectral Imagery: Clutter Adaptation in Anomaly Detection, IEEE Trans. Information Theory, vol. 46, no. 5, Aug. 2000, (pp. 1855-1871).

Scholkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., and Williamson, R.C. Estimating the Support of a High-Dimensional Distribution, Neural Computation, vol. 13, no. 7, 2001, (pp. 1443-1471).

Shannon, C.E. (1948). The mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.

Shi, Y., Eberhart, R.C. A modified Particle Swarm Optimization. Proceedings of IEEE International Conference on Evolutionary Computation, pp. 69-73.

    Silveira, M., Nascimento, J.C., Marques, J.S., Level set segmentation with outlier rejection. IEEE, ICIP 2008.

Skowron, A., & Rauszer, C. (1992). The discernibility matrices and functions in information systems. Handbook of applications and advances of rough set theory (Vol. 11, pp. 331-362). Dordrecht: Kluwer Academic Publishers.

Skowron, A., & Synak, P. (2004). Reasoning in information maps. Fundamenta Informaticae, 59, 241-259.

Stanfill, C., & Waltz, D. (1986). Towards memory-based reasoning. Communications of the ACM, 29(12), 1213-1228.


Tax, D.M.J. & Duin, R.P.W. Support vector data description. Machine Learning 54, (pp. 45-66), 2004.

Tax, D.M.J. & Duin, R.P.W. Support vector domain description. Pattern Recogn. Lett. 20 (11-13), (pp. 1191-1199), 1999.

Tax, D.M.J. & Juszczak, P. Kernel whitening for one-class classification, Lecture Notes in Computer Science, vol. 2388, Springer, Berlin, 2002, (pp. 40-52).

Theodoridis, S., Koutroumbas, K. Pattern Recognition, 3rd edn. Academic Press, San Diego, 2006.

Thiang, T. & Gu, H. (2010). Anomaly detection combining one-class SVMs and particle swarm optimization algorithms. Nonlinear Dyn, Springer, 61 (pp. 303-310).

Tolvi, J. Genetic algorithms for outlier detection and variable selection in linear regression models, Soft Computing 8, Springer-Verlag, 2004, (pp. 527-533).

Vapnik, V. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.

Weekes, C.D., Vose, J.M., Lynch, J.C., Weisenburger, D.D., Bierman, M.M., Greiner, T., Bociek, G., Enke, C., Bast, M., Chan, W.C., Armitage, J.O. Hodgkin's disease in the elderly: improved treatment outcome with a doxorubicin-containing regimen, Journal of Clinical Oncology 20 (4) (2002) 1087-1093.

Wua, S.Y., Chen, Y.L. Discovering hybrid temporal patterns from sequences consisting of point- and interval-based events, Data and Knowledge Engineering 68 (11) (2009) 1309-1330.

Xue, Z. & Liu, S. Rough based Semi-Supervised Outlier Detection. Sixth International Conference on Fuzzy Systems and Knowledge Discovery, (pp. 520-524).

Xue, Z., Shang, Y., Feng, S. Semi-supervised outlier detection based on fuzzy rough C-means clustering, Mathematics and Computers in Simulation, 80, 2010, (pp. 2011-2021).

Yan, C., Chen, G., Shen, Y. Outlier analysis for gene expression data, J. Comput. Sci. Technol., 19 (1), 2004, (pp. 13-21).

Yao, Y.Y., Zhao, Y., & Maguire, R.B. (2003). Explanation oriented association mining using rough set theory. In Proceedings of the ninth international conference on rough sets, fuzzy sets, data mining, and granular computing (pp. 165-172). China.

Zhang, D., Gatica-Perez, D., Bengio, S. and McCowan, I. Semi-supervised Adapted HMMs for Unusual Event Detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR05), IEEE Press, June 2005, vol. 1, pp. 611-618.

Zhang, Y., Meratnia, N. and Havinga, P.J.M. (2008) Outlier Detection Techniques For Wireless Sensor Networks: A Survey. Technical Report TR-CTIT-08-59, Centre for Telematics and Information Technology, University of Twente, Enschede. ISSN 1381-3625.

Zhang, Y., Liu, X.D., Xie, F.D., Li, K.Q. Fault classifier of rotating machinery based on weighted support vector data description. Expert Syst. Appl. 36(4), 2009, (pp. 7928-7932).