Data Processing for Outliers Detection


  • 8/22/2019 Data Processing for Outliers Detection


Data Processing for Outliers Detection

    Silvia Cateni and Valentina Colla
    PERCRO - Istituto di Tecnologie della Comunicazione, dell'Informazione e della Percezione
    Scuola Superiore S. Anna, Pisa, Italy

    1 Introduction

    Outlier detection is an important branch of data pre-processing and data mining, as this stage is required for the elaboration and mining of data coming from many application fields such as industrial processes, transportation, ecology, public safety and climatology. Outliers are data which can be considered anomalous due to several causes (e.g. erroneous measurements or anomalous process conditions). Outlier detection techniques are used, for instance, to minimize the influence of outliers in the final model to develop, or as a preliminary pre-processing stage before the information conveyed by a signal is elaborated. On the other hand, in many applications, such as network intrusion, medical diagnosis or fraud detection, outliers are more interesting than the common samples, and outlier detection techniques are used to search for them. The traditional outlier detection methods can be classified into four main approaches: distance-based, density-based, clustering-based and distribution-based. Each of these approaches presents advantages and limitations; thus in recent years many contributions have been proposed to overcome them and improve the quality of the data. Classical methods are often not suitable to treat some particular databases; therefore recent studies have been conducted on outlier detection for these kinds of datasets. In particular, a high number of contributions based on artificial intelligence, genetic algorithms and image processing have been proposed in order to develop new efficient outlier detection methods that can be suitable in many different applications.

    This chapter is organized as follows: in Section 2 an introduction on outlier detection definitions and potential applications is proposed. Section 3 presents a review of traditional outlier detection methods, while in Section 4 some outlier detection techniques based on particular data representations are discussed. In Section 5 recent approaches that are capable of outperforming the widely adopted traditional methods are described, and Section 6 introduces the application of outlier detection methods to the image processing area. Section 7 illustrates the results obtained considering a synthetic case-study and, finally, Section 8 provides some concluding remarks.

    2 Outlier Detection: Definitions and Applications

    An outlier in a dataset is defined as a measurement that is different from the other values. The classical definition of outlier is due to Hawkins (Hawkins, 1980), who defines an outlier as "an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism". Another outlier definition is given by Barnett and Lewis (Barnett & Lewis, 1994), who define an outlier as "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data". Aggarwal and Yu (Aggarwal & Yu, 2001) state that "outliers may be considered as noise points lying outside a set of defined clusters, or alternatively outliers may be considered as the points that lie outside of the set of clusters but also are separated from the noise".

    The application fields of outlier detection include fraud detection, weather prediction, fault diagnosis, detecting

    novelties in images (i.e. for robot neotaxis or surveillance systems), motion segmentation, satellite image anal-

    ysis, medical condition monitoring and others (Hodge, 2004).

    3 Classical Methods

    The main traditional approaches to outlier detection can be classified into several categories: distance-based, density-based, clustering-based and distribution-based.

    3.1 Distance-Based Method

    The distance-based outlier method was presented in (Knorr & Ng, 1998), where the definition of outlier becomes: "An object O in a dataset T is a DB(p,D)-outlier if at least fraction p of the objects in T lie at a distance greater than D from O". The parameter p is the minimum fraction of objects that must lie outside an outlier's D-neighborhood. In several approaches the Mahalanobis distance is used as outlying degree (Matsumoto, 2007).

    The Mahalanobis distance (Mahalanobis, 1936) is defined as in equation (1):

    D_M(x) = sqrt( (x − μ)^T C^{−1} (x − μ) )    (1)

    where x is the data vector, μ is the center of mass of the dataset and C is the covariance matrix. The Mahalanobis distance can thus be seen as the distance between each point and the center of mass of the data. If the covariance matrix is the identity matrix, the Mahalanobis distance becomes the Euclidean distance. Data points that are located far away from the center of mass are detected as outliers.
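    As an illustration, the detection rule based on equation (1) can be sketched in a few lines. The restriction to 2-D data, the function names and the threshold value are ours, not from the chapter:

    ```python
    # Illustrative Mahalanobis-distance outlier detection for 2-D data.
    # The threshold is a free parameter; 3.0 below is only an example.
    from math import sqrt

    def mean_vector(data):
        n = len(data)
        return [sum(p[j] for p in data) / n for j in range(2)]

    def covariance_2d(data, mu):
        n = len(data)
        cxx = sum((p[0] - mu[0]) ** 2 for p in data) / n
        cyy = sum((p[1] - mu[1]) ** 2 for p in data) / n
        cxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in data) / n
        return [[cxx, cxy], [cxy, cyy]]

    def mahalanobis(p, mu, C):
        # invert the 2x2 covariance matrix analytically
        det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
        inv = [[C[1][1] / det, -C[0][1] / det],
               [-C[1][0] / det, C[0][0] / det]]
        d = [p[0] - mu[0], p[1] - mu[1]]
        # quadratic form (x - mu)^T C^{-1} (x - mu), then square root
        q = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1]) +
             d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
        return sqrt(q)

    def db_outliers(data, threshold=3.0):
        mu = mean_vector(data)
        C = covariance_2d(data, mu)
        return [p for p in data if mahalanobis(p, mu, C) > threshold]
    ```

    On a tight grid of points plus one far sample, only the far sample exceeds a threshold of 3. Note that a very extreme outlier inflates the covariance matrix itself, so the threshold must be chosen with care.
    
    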

    3.2 Density-Based Method

    The density-based approaches calculate the density of the data and consider as outliers the points that lie in regions with low density. An important contribution was given by Breunig et al. (Breunig et al., 2000), who assigned an index value named Local Outlier Factor (LOF) to each object on the basis of the local density of its neighborhood. In the LOF algorithm a high LOF value indicates that the considered object is an outlier.

    The following definitions are necessary to understand the LOF method:

    - k-distance of an instance x: the distance d(x,y) between two instances x and y belonging to a dataset D such that:

    a) for at least k instances (k is a positive integer) y' ∈ D−{x} it holds that d(x,y') ≤ d(x,y);

    b) for at most k−1 instances y' ∈ D−{x} it holds that d(x,y') < d(x,y).

    - k-distance neighborhood of an instance x: defined in (2), it includes the instances whose distance from x is not greater than the k-distance:

    N_k-distance(x)(x) = { q ∈ D−{x} : d(x,q) ≤ k-distance(x) }    (2)

    where the objects q are called the k-nearest neighbors of x.

    - reachability distance of an instance x with respect to the instance y: if k is a natural number, the reachability distance of object x with respect to object y is defined as:

    reach-dist_k(x,y) = max{ k-distance(y), d(x,y) }    (3)


    - local reachability density of an instance x: the inverse of the average reachability distance based on the MinPts-nearest neighbors of x. MinPts is an important parameter required by the LOF algorithm which represents the number of nearest neighbors used in defining the local neighborhood of the instance:

    lrd_MinPts(x) = [ Σ_{o ∈ N_MinPts(x)} reach-dist_MinPts(x,o) / |N_MinPts(x)| ]^{−1}    (4)

    Finally the LOF is defined as:

    LOF_MinPts(x) = [ Σ_{o ∈ N_MinPts(x)} lrd_MinPts(o) / lrd_MinPts(x) ] / |N_MinPts(x)|    (5)

    i.e. the LOF is the average of the ratios between the local reachability density of each of the MinPts-nearest neighbors of x and the local reachability density of x itself.

    The LOF is an outlier degree and is used to decide whether an object is an outlier or not. When the LOF assumes a value close to 1, x is comparable to its neighbors, the region is quite dense and the considered object is not an outlier; on the other hand, a LOF value significantly greater than 1 indicates that x is an outlier.

    3.3 Clustering-Based Method

    Clustering-based methods consider outliers as objects that do not belong to any cluster after an opportune clustering operation. A variation regarding clustering is the use of a fuzzy model. Fuzzy clustering assigns a membership degree to each sample for each cluster. The most popular fuzzy clustering algorithm is the Fuzzy C-Means (FCM).

    The Fuzzy C-Means is an unsupervised clustering algorithm due to Dunn (1974) and it is based on the minimization of an objective function which is defined as the weighted sum of squared errors within groups, as described in the following equation:

    J_m(U,V;X) = Σ_{k=1}^{n} Σ_{i=1}^{c} u_ik^m ||x_k − v_i||^2    (6)

    where V = (v_1, v_2, ..., v_c) is the vector of the centers of the clusters, u_ik is the grade of membership of datum x_k ∈ X to the cluster i and m > 1 is a fuzziness exponent. When a stable condition is reached the iteration stops, and each point is associated to the cluster for which its value of membership is maximal.
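    A minimal sketch of the FCM iteration minimizing equation (6), for one-dimensional data: the alternating membership/center updates below are the standard ones, while the function signature, the explicit initial centers and the fixed iteration count are simplifications of ours:

    ```python
    # Compact Fuzzy C-Means sketch for 1-D data; m is the fuzziness exponent
    # of equation (6) and the initial centers are passed in explicitly.
    def fcm(points, centers, m=2.0, iters=50):
        c = list(centers)
        for _ in range(iters):
            # membership update: u_ik proportional to d(x_k, v_i)^(-2/(m-1))
            u = []
            for x in points:
                dists = [abs(x - v) for v in c]
                if any(d == 0 for d in dists):        # point sits on a center
                    u.append([1.0 if d == 0 else 0.0 for d in dists])
                    continue
                w = [d ** (-2.0 / (m - 1.0)) for d in dists]
                s = sum(w)
                u.append([wi / s for wi in w])
            # center update: mean of the points weighted by u_ik^m
            c = [sum(u[k][i] ** m * points[k] for k in range(len(points))) /
                 sum(u[k][i] ** m for k in range(len(points)))
                 for i in range(len(c))]
        return c, u
    ```

    On two well-separated 1-D groups the centers converge near the group means, and each point's membership is maximal for its own group.
    
    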

    3.4 Distribution-Based Method

    The distribution-based approaches use standard distributions to fit the dataset, and outliers are detected on the basis of a probability distribution. A fundamental limit of this approach is that it requires an a priori knowledge of the probability distribution of the data.

    For many applications such a priori knowledge is not always available or obtainable; moreover, the computational cost for fitting the data with common distributions (such as, for instance, Gaussian, Log-Normal (Aitchison & Brown, 1957), Gumbel (Castillo, 1988) or Weibull (Canfield et al., 1981) distributions) could be considerable.

    A well known method belonging to this approach was proposed by Grubbs (Grubbs, 1969). The Grubbs test detects outliers if the data distribution can be approximated by a Gaussian function. The Grubbs test computes the following statistic:

    G = max_i |x_i − μ| / σ    (7)

    where μ is the mean value of the data and σ is their standard deviation. If the variable G is greater than a tabulated value, then the sample corresponding to the maximum normalized distance from the mean value is considered an outlier.
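    The Grubbs statistic of equation (7) is straightforward to compute. The critical value depends on the sample size and the significance level and must be taken from tables, so in this sketch it is left as a caller-supplied parameter; the function names are ours:

    ```python
    # Sketch of the Grubbs statistic G of equation (7); the tabulated
    # critical value is a parameter here, not computed.
    from statistics import mean, stdev

    def grubbs_statistic(x):
        mu, sigma = mean(x), stdev(x)
        return max(abs(v - mu) for v in x) / sigma

    def grubbs_outlier(x, critical):
        """Return the suspected outlier if G exceeds the critical value, else None."""
        mu = mean(x)
        if grubbs_statistic(x) > critical:
            return max(x, key=lambda v: abs(v - mu))
        return None
    ```
    
    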

    The Rosner's test (Rosner, 1983) is a generalization of the Grubbs' test and it is used to find multiple outliers. In the Rosner's test the parameter J, corresponding to the maximum number of possible outliers, must be fixed. Then the data are ranked in ascending order. Let μ_0 and σ_0 be, respectively, the mean value and the standard deviation of the initial dataset. The sample x_0 farthest from μ_0 is deleted from the data, and the mean value μ_1 and the standard deviation σ_1 are computed on the remaining data. This process is repeated until J extreme samples have been removed. Finally the following statistic is calculated and compared to a critical tabulated value (Gilbert, 1987):

    R_J = |x_{J−1} − μ_{J−1}| / σ_{J−1}    (8)

    If R_J is higher than or equal to the critical value, then the J selected samples are considered outliers; otherwise the test is repeated. If for some i the statistic R_i = |x_{i−1} − μ_{i−1}| / σ_{i−1} is at least equal to the critical value, then the samples x_k for 0 ≤ k ≤ i are actually outliers; otherwise there are no outliers.
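    The Rosner-style iterative removal can be sketched as follows. Again the critical values come from tables (Gilbert, 1987), so this illustrative helper (with an invented name) only reports the removed samples and their statistics, leaving the comparison to the caller:

    ```python
    # Sketch of the iterative procedure behind equation (8): repeatedly
    # remove the sample farthest from the current mean, recording R_i.
    from statistics import mean, stdev

    def rosner_candidates(x, J):
        """Return [(removed_sample, R_i)] for i = 1..J."""
        data = list(x)
        results = []
        for _ in range(J):
            mu, sigma = mean(data), stdev(data)
            extreme = max(data, key=lambda v: abs(v - mu))
            results.append((extreme, abs(extreme - mu) / sigma))
            data.remove(extreme)
        return results
    ```
    
    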

    4 Outlier Detection Based on Particular Data Representations

    A functional dependency (Ramakrishnan & Gehrke, 2002) is a relationship between attributes of a given dataset, i.e. for each sample the value of an attribute y can be calculated by exploiting the values of some other attributes (x1, x2, ..., xn) in the form y = f(x1, x2, ..., xn), and all the records (x1, x2, ..., xn, y) should respect such relation. When the functional dependency f is unknown, there are algorithms created to discover it (Huhtala et al., 1999; Kivinen & Mannila, 1992). Quasi-functional dependencies (Huhtala et al., 1999; Bruno & Garza, 2007) are relationships that are not satisfied by all the records (x1, x2, ..., xn, y), which are also called tuples. Some methods base outlier detection on the respect of the quasi-functional dependencies, as they label as outliers the few tuples which deviate from the common functional behavior. The use of quasi-functional dependencies to detect anomalies was introduced in (Apiletti et al., 2006) and subsequently this approach was improved (Bruno & Garza, 2007). However, both the above cited methods are limited to databases which do not contain time information.

    On the other hand, temporal databases contain attributes which vary over time and temporal aspects are embedded in them (Date et al., 2002); they also include all database applications that require some aspect of time when organizing their information. The main difference between a non-temporal database and a temporal database is that a non-temporal database considers the data stored at a single time instant, i.e. without considering past and future database states, while a temporal database contains time information, attaching a time period to the data. Temporal databases are widely used in several applications (Pakadakis et al., 2006; Weekes et al., 2002; Chundi et al., 2009; Wua et al., 2009).

    Bruno & Garza (Bruno & Garza, 2010) introduced a new outlier detection method which is suitable for temporal databases. These authors address the outlier detection problem as part of the data mining process, by defining the temporal quasi-functional dependency (i.e. a quasi-functional dependency that varies through time), and present the so-called Temporal Outlier Detection (TOD) algorithm. In practice, the proposed approach extracts the temporal association rules from the database and then combines them to discover temporal quasi-functional dependencies. Association rules represent the pattern knowledge existing in a given dataset, i.e. association rule mining is a technique for discovering data dependencies (Liang et al., 2005). Temporal association rules are an extension of the association rule concept in which the antecedent and the consequent are also associated with temporal information (Bruno & Garza, 2010).

    The algorithm extracts quasi-functional dependencies with a dependency degree value higher than (or equal to) a user-specified threshold. Then, for each temporal quasi-functional dependency, a set of data is selected to be deleted in order to change the temporal quasi-functional dependency into a potential temporal functional dependency. The removed data are defined as outliers.

    Another particular data representation that has been frequently used for outlier detection is the one based on rough sets. The rough set concept is based on the assumption that each observation of the universe is associated to a specified amount of information. Objects with the same information are indiscernible; any set of all indiscernible observations is referred to as a crisp set, otherwise the set is imprecise, or rough. Rough set theory was introduced by Pawlak (Pawlak, 1982; Pawlak, 1991; Pawlak et al., 1995) and it is interesting in the study of intelligent systems characterized by incomplete and insufficient information. Several works demonstrate the importance of the rough set approach, especially in the fields of machine learning and data mining


    (Lin & Gereone, 1996; Pawlak et al., 1995; Skawron & Ranszer, 1992; Yao et al., 2003). In rough sets, data model information is organized in a table called an information system. If there are attributes which derive from a classification operation, the data table is also called a decision system. Each rough set, in contrast to a precise set, cannot be exactly characterized by the available information, and it is therefore described by a lower approximation, an upper approximation and a boundary region. The lower approximation is also called the positive region, while the region outside the upper approximation is called the negative region. The positive region includes all the observations certainly belonging to the considered concept, while the upper approximation includes the observations which possibly belong to the concept. The difference between the two regions is the boundary region. Figure 1 shows an example of rough set.

    Figure 1. An example of rough set
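    The lower and upper approximations described above can be computed directly from the indiscernibility classes. The toy universe, the attribute names and the function names below are invented for illustration:

    ```python
    # Sketch of rough-set approximations: objects are indiscernible when
    # they share the same values on the chosen attributes.
    def partition(universe, attrs):
        """Group objects into indiscernibility classes by their attribute tuple."""
        classes = {}
        for obj, values in universe.items():
            classes.setdefault(tuple(values[a] for a in attrs), set()).add(obj)
        return list(classes.values())

    def approximations(universe, attrs, concept):
        lower, upper = set(), set()
        for cls in partition(universe, attrs):
            if cls <= concept:       # wholly inside: certainly in the concept
                lower |= cls
            if cls & concept:        # overlapping: possibly in the concept
                upper |= cls
        return lower, upper
    ```

    The boundary region is then the set difference between the upper and the lower approximation.
    
    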

    Jang et al. (Jang et al., 2009) propose a method which combines rough set theory and outlier detection methods, suggesting two different approaches: a sequence-based outlier detection in information systems based on rough set theory, and a classical distance-based outlier detection method applied to rough sets.

    The definition of sequence-based outliers in an information system is inspired by Hawkins' definition (Hawkins, 1980), and the basic idea is built on the work by Skowron and Sinak (Skowron & Sinak, 2004), which introduced the basic concepts for approximate information exchanges using information granules.

    The basic idea is as follows. Given an information system, defined by a quadruple IS = (U, A, V, f), where U is a non-empty set of observations, A represents a non-empty set of attributes, V is the union of the attribute domains and f is an information function which links one value of each attribute to each observation included in U, then for each object x belonging to U, if x differs (on the basis of some characteristic) from the other objects in U, it is labeled as an outlier with respect to IS. The second approach applies a traditional distance-based outlier detection method to rough sets in order to calculate the distance between two objects in an information system. To this aim it is


    necessary to use a suitable distance metric for nominal attributes in rough set theory. An appropriate distance

    function for nominal attributes, that is called Value Difference Metric (VDM), was introduced by Stanfill &

    Waltz (Stanfill & Waltz, 1986).

    The value difference metric between two objects x and y is defined as follows:

    VDM(x,y) = Σ_f d_f(x_f, y_f)    (9)

    where f is the feature index, x_f is the value of object x on feature f, y_f is the value of object y on feature f, and d_f is the distance between the two feature values.
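    One common instantiation of equation (9) derives the per-feature distance d_f from the class-conditional frequencies of the two values. The first-power variant used here and the toy data in the example are our assumptions for illustration:

    ```python
    # Sketch of a Value Difference Metric over nominal attributes: the
    # per-feature distance compares class-conditional frequencies of the
    # two attribute values (absolute differences, i.e. exponent q = 1).
    from collections import Counter

    def vdm_feature_distance(column, labels, a, b):
        classes = set(labels)
        ca = Counter(l for v, l in zip(column, labels) if v == a)
        cb = Counter(l for v, l in zip(column, labels) if v == b)
        na, nb = sum(ca.values()), sum(cb.values())
        return sum(abs(ca[c] / na - cb[c] / nb) for c in classes)

    def vdm(rows, labels, x, y):
        """VDM(x, y) = sum over features of d_f(x_f, y_f), as in equation (9)."""
        return sum(
            vdm_feature_distance([r[f] for r in rows], labels, x[f], y[f])
            for f in range(len(x)))
    ```

    Two objects whose attribute values always co-occur with the same class are at distance 0; values tied to different classes maximize the per-feature distance.
    
    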

    5 Recent Artificial Intelligence-based Approaches to Outliers Detection

    Artificial Intelligence (AI) is a branch of computer science aiming at providing machines with a sort of intelli-

    gence, similar to the one characterizing living beings. Actually many definitions of AI can be found in litera-

    ture: in particular Russel & Norvig (Russel & Norvig, 2003) defined an intelligent agent as a system that

    perceives its environment and takes actions that maximize its chances of success. Nowadays, the term AI is

    widely used to indicate a variety of methods and techniques, such as neural networks, fuzzy logic and genetic

    algorithms.

    In the most recent years, the ever increasing application of AI techniques has led many researchers to evaluate the possibility of exploiting some of them for outlier detection. Thus many works have been proposed to improve already existing methods or to introduce new algorithms.

    5.1 Support Vector Machine-based Methods

    The SVM algorithm, introduced by Vapnik (Vapnik, 1995), is essentially a binary classification algorithm, although it has been extended to multi-class problems. The data belonging to the different classes need to be separated by a hyperplane, but they are not always well separable. To overcome this, the data are mapped to a feature space with higher dimensionality, where the data separation through hyperplanes is easier. The SVM classifier is widely used in many disciplines because it has a high accuracy and it is able to deal with high-dimensional data (Ben-Hur & Weston, 2010).

    SVM-based methodologies have been widely used for outlier detection, such as, for instance, in (Tax & Juskczak, 2002; Guo et al., 2008; Peng et al., 2010; Zhang et al., 2008), because they do not require a priori knowledge about any kind of statistical model, can be applied to data with high dimensionality and provide an optimum solution maximizing the margin of the decision boundary.

    A modification of the SVM algorithm that is suitable to detect outliers was proposed by Scholkopf (Scholkopf et al., 2001), who suggested a method of adapting the SVM to one-class classification problems. One-class SVM is an unsupervised algorithm which maps input data into a high-dimensional feature space and, through several iterations, finds the hyperplane which best separates the training samples from the origin. In practice, the one-class SVM is a normal two-class SVM where all training samples belong to the first class and the origin is the only member of the second class.

    The one-class SVM method maps data into a feature space through an appropriate kernel function; the most popular choices of kernel functions used in the SVM method are linear, polynomial, Gaussian and sigmoidal functions. The final aim is to separate the mapped vectors from the origin with maximum margin.

    An advantage of one-class SVM for outlier detection is its high True Positive Rate (TPR) (that is, the probability to correctly detect the outliers), but a disadvantage is its also high False Positive Rate (FPR) (i.e. the probability to misclassify as outliers samples which are not outliers). To solve this problem, Tian & Gu (Tian & Gu, 2010) proposed a novel one-class model which combines one-class SVM and Particle Swarm Optimization (PSO) algorithms (Kennedy & Eberhart, 1995; Shi & Eberhart, 1998). The PSO algorithm is inspired by the social behavior of insects, birds and fish. It is used to optimize a given problem by iteratively trying to improve


    candidate solutions. The candidate solutions are included into an initial population. This algorithm has been successfully applied to a wide variety of problems and has performance comparable to genetic algorithms. In this approach PSO algorithms are used to identify the optimum SVM parameters, obtaining a high detection rate with a low FPR. The combination of the SVM classifier and the PSO algorithm means that outliers are effectively detected through the optimization of the classifier, which is built through a suitable parameter selection and boundary movement strategy. The results show that the proposed approach improves the robustness of the overall decision and the best compromise between TPR and FPR is obtained. Other recent examples of outlier detection as a one-class learning problem are presented in (Schweizer & Moura, 2000; Miller & Brewning, 2003; Scholkopf et al., 2001; Banerjee et al., 2006; Campbell & Bennet, 2001; Ratsch et al., 2002; Markou & Singh, 2003; Han & Cho, 2006; Abe et al., 2006), and several other applications exploit SVM-based techniques to detect outliers with satisfactory results (Davy et al., 2006; Zhang et al., 2009; King et al., 2002; Gardner et al., 2006; Eskin et al., 2002; Lazarevic et al., 2003; Giacinto et al., 2008; Roberts & Tarassenko, 1994; Tax & Duin, 1999; Tax & Duin, 2004).
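    A bare-bones PSO loop of the kind used above to tune classifier parameters can be sketched as follows. The coefficient values, the one-dimensional search space and the quadratic objective in the example are illustrative stand-ins for the actual SVM-parameter search, and the function name is ours:

    ```python
    # Minimal PSO sketch (Kennedy & Eberhart style): particles explore a 1-D
    # search space, tracking personal and global bests.
    import random

    def pso_minimize(f, lo, hi, n_particles=20, iters=60, seed=0):
        rng = random.Random(seed)
        pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
        vel = [0.0] * n_particles
        pbest = list(pos)
        gbest = min(pos, key=f)
        for _ in range(iters):
            for i in range(n_particles):
                # velocity update: inertia + cognitive + social components
                vel[i] = (0.7 * vel[i]
                          + 1.5 * rng.random() * (pbest[i] - pos[i])
                          + 1.5 * rng.random() * (gbest - pos[i]))
                pos[i] = min(hi, max(lo, pos[i] + vel[i]))
                if f(pos[i]) < f(pbest[i]):
                    pbest[i] = pos[i]
                    if f(pos[i]) < f(gbest):
                        gbest = pos[i]
        return gbest
    ```

    In the SVM setting, the objective f would be a validation score of the classifier as a function of its parameters; here a simple quadratic stands in for it.
    
    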

    5.2 Fuzzy Logic-based Methods

    Fuzzy Logic (FL) is connected with the theory of fuzzy sets, a theory which provides classes of objects with unsharp boundaries, where a single object can simultaneously belong to different sets with different degrees of membership. A Fuzzy Inference System (FIS) (Ross, 2004) calculates the mapping from a given input to an output by using fuzzy logic. The input variables are mapped into sets of membership functions called "fuzzy sets", and the process of converting a crisp value to a fuzzy value is named "fuzzification". The process of fuzzy inference involves membership functions (MF), i.e. curves that define how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1, fuzzy logic operators (and, or, not) and if-then rules. The rule results are mapped into membership functions and are combined to give a crisp answer. This last process is called "defuzzification".

    In the last years, a novel interesting approach, called Fuzzy Rough Semi-Supervised Outlier Detection (FRSSOD) (Xue et al., 2010), was proposed. This approach combines the Semi-Supervised Outlier Detection method (SSOD), that was proposed by Gao et al. (Gao et al., 2006), with a clustering method introduced by Hu and Yu (Hu & Yu, 2005), that is named Fuzzy Rough C-Means clustering (FRCM). Naturally this method belongs to the clustering-based approaches. The proposed method integrates the advantages of SSOD and FRCM and decides only if the points on the boundary can be considered as outliers. In order to deeply understand FRSSOD, the SSOD method and the FRCM approach must be known as well. A brief description of these approaches is provided in the following.

    Many outlier detection methods are unsupervised algorithms (Breunig et al., 2000; Jin et al., 2001; Eskin et al., 2002), and often the unsupervised methods have a high FPR and a low TPR. Supervised detection methods have been introduced in order to improve the algorithm performance (Marsland, 2001; Kazarev, 2003; Markou, 2006), but the collection of a large amount of labeled training data can be quite difficult. For these reasons, semi-supervised outlier detection methods (Li et al., 2007; Zhang et al., 2005; Gao et al., 2006; Xu & Liu, 2009) have been recently presented.

    SSOD uses both unlabeled and labeled data, thus improving accuracy without the need for a high number of labeled data. Let X = {x1, x2, . . ., xn} be a set of data points drawn from R^m, whose first l points are labeled. SSOD searches for the cluster assignment and the outlier labels that minimize an objective function composed of three terms (Gao et al., 2006),


    where c_h is the center of cluster C_h, dist represents the Euclidean distance and λ1 and λ2 are adjusting parameters.

    The first term is inherited from k-means clustering objective function. As only normal points are partitioned

    into clusters, outliers are not included in this term. The second term is used to constrain the number of outliers

    not to be too large. The third term is used to maintain consistency of labeling proposed by authors with existing

    labels. The minimization of the above-defined objective function leads to point out outliers that do not belong

    to any clusters.

    FRCM was introduced by Hu & Yu (Hu & Yu, 2005) as a combination of the Fuzzy C-Means method and Rough C-Means. The Fuzzy C-Means method is based on the partition of the dataset points into clusters around cluster centers. A fuzzy membership in the range 0-1 for every cluster is assigned to each point; each object belongs to some or all of the clusters with some fuzzy degree. The results depend on the initialization of the cluster centers (see Subsection 3.3). In the RCM method the concept of C-means clustering is added to the concept of rough set (already treated in Section 4), i.e. each cluster is seen as a rough set which has a lower approximation region, an upper approximation region and a boundary region. The upper approximation region of a rough set includes samples in the cluster which are also members of other clusters, i.e. RCM classifies the object space into three parts: lower approximation, boundary and negative region.

    The main difference between rough clustering and classical clustering lies in the fact that in rough clustering a sample can be a member of more than one cluster, and this allows overlaps between clusters. In particular, Lingras assumes the following properties:

    - A datum can be a member of one lower approximation only.

    - The lower approximation of a given cluster must be a subset of its upper approximation.

    - If a datum is not a member of any lower approximation, then it is a member of two or more upper approximations.

    - Data in the boundary region are uncertain data and are assigned to at least two upper approximations.

    RCM has many advantages and its applicability extends to several fields that involve uncertain information granulation.

    FRCM combines the advantages of fuzzy set theory and rough set theory and integrates the fuzzy membership value of each sample to the lower approximation and boundary area of a cluster. FRCM can be formulated as follows. Let X = {x1, x2, . . ., xn} be a set of data points and let C_k and C̄_k be, respectively, the lower and the upper approximation of a cluster; C_k^B = C̄_k − C_k is the boundary area, c = {c1, c2, ..., cK} is the vector of the K cluster centers and u = {u_ik} are the memberships collected in an n×K matrix. FRCM partitions the data into two classes, a lower approximation region and a boundary region; only the objects belonging to the boundary region are fuzzified. The problem of FRCM lies in the optimization of the following function:

    (11)

    FRSSOD exploits both the above-described methods and combines the two approaches into a novel one. Let X = {x1, x2, . . ., xn} be a set of data points and let Y be a subset of X formed by the l labeled points. FRSSOD minimizes an objective function composed of three competing terms,


    where λ1 and λ2 are adjusting positive parameters applied to make the three terms compete with each other, while m is a fuzziness weighting exponent (m > 1). As only normal points are partitioned into clusters (the idea of the SSOD approach), outliers do not contribute to the first term. The second term avoids the detection of an extremely large number of outliers. The third term preserves consistency of user labeling with existing labels and punishes mislabeled points. FRSSOD not only uses unlabeled and labeled data but also integrates fuzzy and rough set theory; therefore it can be applied to many fields that have fuzzy information granulation or where decisions cannot be taken under certain conditions. The experimental results show that FRSSOD has many advantages over SSOD, as it improves outlier detection accuracy and reduces the false alarm rate under the guidance of labeled points. On the other hand, the performance of FRSSOD depends on the selection of the number of clusters and on the adjusting parameters λ1 and λ2.

    Fuzzy logic has also been applied for outlier detection as a tool to combine different outlier detection methods, in an attempt to exploit the advantages of each of them while overcoming their drawbacks. This approach does not belong to a single category, but rather spans several categories of outlier detection methods. Cateni et al. (Cateni et al., 2009) proposed a novel method based on fuzzy logic theory, which is a substantial improvement of a first attempt previously proposed by the same authors in (Cateni et al., 2007; Cateni et al., 2008) and combines a distance-based method, a density-based method, a clustering-based method and a distribution-based method. This method does not require any a priori assumption on the data, and it is able to detect outliers without the need for preliminary statistical analyses or parameter tuning. Therefore this approach can be adopted even by inexperienced users. For each sample four features are calculated by using the most popular outlier detection techniques (see Section 3). The inputs are represented by the Mahalanobis distance (Mahalanobis, 1936), a membership function evaluated through the fuzzy c-means technique (Bezdek, 1981; Dunn, 1974), the local outlier factor (Breunig et al., 2000) and the result of the Grubbs test (Grubbs, 1969).

    Noticeably the Fuzzy C-means algorithm requires the number of cluster to be known a priori, while in this ca

    such number is automatically calculated. In this algorithm the clustering based method is treated through bo

    fuzzy c-means algorithm and the validity measure based on inter and intra-cluster distance measure and p

    posed by Ray and Turi. (Ray & Turi, 1999). This approach consists in calculating the distance between a po

    and its cluster center to decide if the clusters are compact.

Two measures are defined: the intra-cluster distance, i.e. the average distance between a point and its cluster center, and the inter-cluster distance, i.e. the distance between clusters. To determine the optimal number of clusters, the intra-cluster distance must be minimized while the inter-cluster distance must be maximized. Their ratio, named validity measure, is defined as follows:

validity = intra-distance / inter-distance (13)

and the optimum number of clusters is calculated by minimizing the validity measure (13).
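The cluster-count selection above can be sketched as follows; the plain k-means helper and the toy two-blob dataset are illustrative assumptions, not the fuzzy c-means variant used in the original method:

```python
import numpy as np

def validity(X, labels, centers):
    """Ray & Turi validity = intra / inter (lower is better)."""
    # intra-distance: mean squared distance between each point and its centre
    intra = np.mean(np.sum((X - centers[labels]) ** 2, axis=1))
    # inter-distance: minimum squared distance between any two centres
    inter = min(np.sum((centers[i] - centers[j]) ** 2)
                for i in range(len(centers)) for j in range(i + 1, len(centers)))
    return intra / inter

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means, used here only to produce labels and centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                            else centers[i] for i in range(k)])
    return labels, centers

# two well-separated blobs: the validity measure is minimised at k = 2
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
scores = {k: validity(X, *kmeans(X, k)) for k in (2, 3, 4)}
best_k = min(scores, key=scores.get)
print(best_k)  # → 2
```

Splitting a compact blob shrinks the minimum inter-centre distance far faster than it shrinks the intra-distance, which is why the ratio penalises over-clustering.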

The four features are fed as inputs to the fuzzy inference system (FIS) (Ross, 2004), which provides as output an index in the range (0,1) representing a measure of the probability that the selected sample is an outlier. The adopted FIS is of the Mamdani type (Mamdani, 1974). Figure 2 depicts a scheme of the proposed method. The method has been tested in an industrial context and the results show that this approach outperforms the traditional techniques.
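The original method uses a full Mamdani rule base; as a rough, illustrative stand-in, the fusion step can be sketched by rescaling each detector's score to a [0,1] "outlierness" membership and averaging the memberships. The detector names and toy scores below are invented for the example:

```python
import numpy as np

def high(score, lo, hi):
    """Piecewise-linear membership of the fuzzy set 'score is high' on [lo, hi]."""
    return np.clip((score - lo) / (hi - lo), 0.0, 1.0)

def fuse(score_table):
    """Average per-detector memberships into a single outlierness index in [0, 1]."""
    memberships = [high(np.asarray(s, float), min(s), max(s))
                   for s in score_table.values()]
    return np.mean(memberships, axis=0)

# toy scores from three detectors (higher = more anomalous); sample 4 stands out
score_table = {
    "mahalanobis": [0.5, 0.6, 0.4, 0.5, 4.0],
    "lof":         [1.0, 1.1, 0.9, 1.0, 3.5],
    "grubbs":      [0.2, 0.3, 0.1, 0.2, 2.8],
}
index = fuse(score_table)
print(int(np.argmax(index)))  # → 4
```

A sample that every detector ranks highest receives a fused index of 1, while a sample flagged by only one detector is attenuated by the averaging, which is the intuition behind combining methods with complementary weaknesses.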


Figure 2: Block diagram of the fuzzy logic-based outlier detection method

    5.3 Genetic Algorithm-based Methods

Genetic Algorithms (GA) belong to the wider class of evolutionary optimization methods: their main feature consists in their attempt to mimic the evolution of living organisms through generations: this natural process is simulated in order to progressively build a solution to a certain problem which is optimal under one (or, sometimes, more than one) arbitrary criterion. A set of possible solutions to the considered optimization problem is organized into a population of candidate solutions which is evolved by means of the GA engine. At each generation of the GA the goodness of each candidate solution is evaluated through a performance measure usually named fitness function: the individuals with higher fitness are used to build the population at the subsequent generation. The best candidates not only survive but are also combined in order to generate new (and hopefully better) individuals, as happens in natural evolution. The GA population is evolved, generation by generation, until an arbitrary stop condition is met, which typically involves the attainment of a particularly high fitness value by one of the candidates or by part of the population, the completion of a predetermined number of generations, or the protracted evolution of the population without any improvement in the goodness of the candidates.
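The loop described above (fitness evaluation, selection, crossover, mutation, stop condition) can be sketched on a toy problem; the OneMax fitness function, the population size and the rates below are arbitrary choices for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N_GENES, POP, GENS, P_MUT = 20, 30, 60, 0.02

def fitness(ind):
    # toy criterion (OneMax): number of 1-genes; any problem-specific measure fits
    return int(ind.sum())

pop = rng.integers(0, 2, size=(POP, N_GENES))
for gen in range(GENS):
    fit = np.array([fitness(ind) for ind in pop])
    if fit.max() == N_GENES:                   # stop condition: perfect candidate
        break
    def parent():
        # tournament selection: the fitter of two random candidates reproduces
        i, j = rng.integers(0, POP, 2)
        return pop[i] if fit[i] >= fit[j] else pop[j]
    children = [pop[np.argmax(fit)].copy()]    # elitism: the best survives unchanged
    while len(children) < POP:
        a, b = parent(), parent()
        cut = int(rng.integers(1, N_GENES))    # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child[rng.random(N_GENES) < P_MUT] ^= 1   # mutation
        children.append(child)
    pop = np.array(children)

best_fit = max(fitness(ind) for ind in pop)
print(best_fit)
```

With elitism the best fitness is non-decreasing across generations, so the loop either hits the stop condition or ends after the fixed number of generations with the best candidate found so far.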

Tolvi (Tolvi, 2004) proposed an application of GA for outlier detection based on a statistical approach. The problem of associating the data to the best possible model is faced by firstly finding any outliers in the data; a number of initial candidate models are selected and examined. Tolvi also addressed a nuisance in outlier detection: the possibility of smearing and masking. Smearing means that the presence of an outlier causes other normal observations to be misclassified as outliers, while masking means that an outlier prevents another datum from being correctly classified as an outlier by an outlier detection method. In (Tolvi, 2004) an outlier detection method for linear regression modeling is treated. GAs are used for outlier detection while avoiding the potential problems of smearing and masking; simultaneously, the problem of variable selection is discussed. The motivation for treating two different problems (i.e. outlier detection and variable selection) together lies in the fact that the choice of the variables to select can affect the outlier detection and vice-versa (Chatterjee & Hadi,


1994). Potential outliers can be included into the linear regression model using a dummy variable. A dummy variable is a binary vector which is zero for outlier samples and one for non-outlier samples. The aim of the proposed approach is thus the selection of the best model, where the candidate models contain different combinations of the possible dummy variables. The outlier detection is based on the use of information criteria, in particular the Bayesian Information Criterion (BIC) (Schwarz, 1978).

Schwarz introduced BIC to serve as an approximation to a transformation of the Bayesian posterior probability of a candidate model. The computation of BIC depends on the model complexity, i.e. on the number of parameters of the selected model. Let us suppose that X = {x1, x2, . . ., xN} is the dataset to be modeled and M = {M1, M2, . . ., Mk} are the candidate parametric models. Let L(X,M) be the maximized likelihood function of each model; BIC is then defined as:

BIC = log L(X,M) - (p/2) log(N) (14)

where p is the number of parameters in the model M. The model which maximizes BIC, i.e. a model with few parameters and small residuals, is selected by the GA. The proposed GA starts with a randomly generated population of 40 individuals per generation. Each individual contains genes with value zero with probability 0.9 and genes with value one with probability 0.1. The algorithm becomes faster if preliminary information about which samples are potential outliers is added and, although the paper treats linear regression models, the method is also suitable for other statistical models.
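Under simplifying assumptions (a known simple regression, planted outliers, and arbitrary GA settings), the dummy-variable search driven by BIC can be sketched as follows. This is an illustration of the idea, not Tolvi's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 60
x = np.linspace(0.0, 1.0, N)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, N)
y[[10, 40]] += 4.0                          # two planted outliers

def bic(dummies):
    """Schwarz criterion for a line plus one indicator column per flagged point."""
    cols = [np.ones(N), x] + [(np.arange(N) == i).astype(float)
                              for i in np.flatnonzero(dummies)]
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    loglik = -0.5 * N * (np.log(2.0 * np.pi * rss / N) + 1.0)
    return loglik - 0.5 * A.shape[1] * np.log(N)   # penalise extra parameters

# small GA over the binary dummy vector (population of 40, genes 1 w.p. 0.1)
pop = (rng.random((40, N)) < 0.1).astype(int)
for _ in range(80):
    fit = np.array([bic(ind) for ind in pop])
    order = np.argsort(fit)[::-1]
    parents = pop[order[:20]]                  # keep the fitter half
    children = [pop[order[0]].copy()]          # elitism
    while len(children) < 40:
        a, b = parents[rng.integers(0, 20, 2)]
        cut = int(rng.integers(1, N))          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child[rng.random(N) < 0.01] ^= 1       # mutation
        children.append(child)
    pop = np.array(children)

flagged = np.flatnonzero(pop[np.argmax([bic(ind) for ind in pop])])
print(sorted(flagged.tolist()))   # planted points 10 and 40 appear among the flags
```

Flagging a true outlier removes a huge residual and so raises the likelihood far more than the per-parameter penalty, while a dummy on an ordinary sample usually costs more than it gains.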

Other recent applications of GA for outlier detection (Aggarwal & Yu, 2001; Yan et al., 2004; Bandyopadhyay & Santra, 2008) can be found in the literature.

    6 Outlier Detection in Image Processing

Outlier detection is an important tool in image processing and analysis. In an image, an outlier can appear when the image changes over time, or it can be represented by regions which are anomalous with respect to the rest of a quasi-static image (i.e. one with very small variations through time). Outliers can be due to motion, insertion of anomalous objects or instrumentation errors. The outlier detection process is a fundamental pre-processing tool in many interesting image analysis applications, such as satellite imagery, spectroscopy, mammographic images or video surveillance (Chandola et al., 2009). Often in image processing the data present both spatial and temporal characteristics, and outlier detection is an important task to identify false matches.

Malpica et al. (Malpica et al., 2008) propose an innovative technique for outlier detection in hyperspectral images. A hyperspectral image is a digital image where each element of the image (pixel) has an associated electromagnetic spectrum. It can also be seen as a cube of data (called a hypercube). Due to the high number of bands, the large amount of data can be redundant, and the most interesting information is difficult to extract because of the high dimensionality of the data themselves. After detection, anomalous points can be retained, because they contain interesting information, or can be discarded. The authors propose a method based on Projection Pursuit (PP) (Friedman & Tukey, 1974; Kruskal, 1969) to detect possible anomalies. This technique is based on the use of one or more linear combinations of the original features with the aim of maximizing an index representing an interestingness measure. The results show that the PP technique can detect groups of outliers or isolated outliers; the proposed algorithm was applied to AHS and HYDICE hyperspectral imagery.

The common Principal Component Analysis (PCA) (Jolliffe, 2002) is a special case of PP. In PCA the reduction of the data is made by choosing the linear combination of the considered variables which maximizes the variance of the projected data, i.e. the index is represented by the variance. An important contribution towards a new perspective on PCA-based approaches is suggested by Ding and He (Ding & He, 2004). A method based on PCA to reduce dimensionality and detect outliers in hyperspectral imagery is treated in (Goovaerts et al., 2005), while in (Saha et al., 2009) outlier detection through PCA is also used to automate snake contours for object detection. Snakes are deformable models used to estimate the boundary of an object whose shape is partially unknown; an example of the use of a snake is shown in Figure 3.
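One common way to turn PCA into an outlier score, consistent with the variance-maximizing projection described above, is to measure how badly each sample is reconstructed by the top principal components. The low-rank toy data below is an illustrative assumption:

```python
import numpy as np

def pca_outlier_scores(X, n_components):
    """Residual norm after projecting onto the top principal components.

    Samples poorly explained by the dominant components (e.g. anomalous
    pixels or spectra) receive large scores.
    """
    Xc = X - X.mean(axis=0)
    # principal directions = right singular vectors of the centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    recon = Xc @ V @ V.T
    return np.linalg.norm(Xc - recon, axis=1)

rng = np.random.default_rng(0)
# data close to a 1-D subspace of R^3, plus one off-subspace sample
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t, -t]) + rng.normal(0, 0.05, (100, 3))
X[7] = [0.0, 0.0, 5.0]               # planted anomaly off the main direction
scores = pca_outlier_scores(X, n_components=1)
print(int(np.argmax(scores)))        # → 7
```

In hyperspectral settings the same idea applies with pixels as rows and bands as columns, the anomalies being the pixels with the largest reconstruction residuals.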


    Figure 3: Example of snake contour

    The deformable curve is a sort of elastic curve which is able to approximate the considered image features.

A novel method for active contour models, or snakes, is proposed by Chan (Chan, 2001). It is an interesting approach because the proposed model is able to detect objects whose boundaries are not necessarily defined by the gradient. In this research area outliers are features which do not lie on the object boundary. In (Nascimento, 2005) an algorithm for the detection of object boundaries in the presence of outliers is proposed. A deformable contour (as in snakes) approximates the object boundary through the Maximum A Posteriori (MAP) estimation method (Abrantes & Marques, 1996) using the Expectation Maximization (EM) algorithm (McLachlan & Krishnan, 1997).

Dashti et al. (Dashti et al., 2010) proposed the ET-DRN method to understand the relationships between objects in a given dataset. The hierarchical clustering procedure includes the Euler algorithm to assign objects to clusters, a GA to increase the density between objects within each cluster and, finally, the Kullback-Leibler divergence to calculate the dissimilarity of the clusters. Objects are considered in high dimensionality and are examined as objects of a digital geometry; thus it is possible to build a sensible mathematical structure where outliers are clearly detectable. Silveira et al. (Silveira et al., 2008) proposed a new method which classifies image features as valid or invalid (i.e. outliers) by organizing edge points into connected segments (the so-called strokes). An adaptive stopping force is applied, which allows the contour to bridge the invalid features and stop at the valid ones. A confidence degree is then assigned to each stroke during the evolution process, and the weights are given by the probability that a stroke is valid.

    7 Case Study Using Synthetic Data

In order to show how the different classical methods and a recently proposed method work, an example using synthetic data is presented. The created database includes 100 samples of a random variable whose probability density function derives from the composition of two Gaussian functions, as shown in Figure 4. Ten outliers, indicated with red circles in Figure 4, have been included in such a database.


    Figure 4: The synthetic dataset and, on the left, the distribution of the data that are not outliers.

Four classical outlier detection methods have been applied to this database: a distance-based approach where the Mahalanobis distance is exploited, a density-based approach based on the LOF algorithm, a clustering-based method which exploits fuzzy c-means as clustering algorithm and, finally, a distribution-based approach using Rosner's algorithm. Moreover, on the same database an AI-based technique has also been tested, in particular the one proposed in (Cateni et al., 2009). The results of these tests are reported in Table 1.

Approach                                Outliers detected (%)
Distance-based (Mahalanobis distance)   30% (A - E - I)
Density-based (LOF)                     30% (G - H - L)
Clustering-based (Fuzzy C-means)        30% (B - D - F)
Distribution-based (Rosner's test)      70% (A - C - E - G - H - I - L)
AI-based (Fuzzy approach)               100% (A - B - C - D - E - F - G - H - I - L)

    Table 1: Test results of some outlier detection techniques on the synthetic database of Fig. 4.

The results put into evidence the particular features of the tested algorithms. In particular, the distance-based approach is capable of pointing out the outliers that mostly differ from the mean value, while the density-based approach detects only outliers that are isolated from the data. The clustering-based approach finds isolated outliers after a clustering operation. Finally, the distribution-based approach considers as outliers those points that deviate from the model. In this example the distribution-based method works quite well because the initial dataset is created from two Gaussian distributions. The fuzzy-based approach, which combines the several classical methods, outperforms all the traditional techniques as it exploits all their capabilities while compensating their weaknesses.
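An analogous experiment can be reproduced in a few lines. The mixture parameters, the planted outliers and the simple global-distance detector below are illustrative assumptions, not the paper's actual data or implementations:

```python
import numpy as np

rng = np.random.default_rng(42)
# inliers drawn from a two-component Gaussian mixture, as in the case study
inliers = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(10.0, 1.0, 50)])
# ten planted outliers: seven extreme values and three in the valley between modes
outliers = np.array([-9.0, -8.5, 20.0, 21.0, 22.0, 5.0, 4.9, 5.1, 19.5, -9.5])
data = np.concatenate([inliers, outliers])
true_out = np.arange(len(data)) >= 100

# naive distance-based detector: deviation from the global mean in std units
z = np.abs(data - data.mean()) / data.std()
flagged = z > 2.0

caught = np.flatnonzero(flagged & true_out)
missed = np.flatnonzero(~flagged & true_out)
# the extreme outliers are caught; the mid-valley ones near the global mean are not
print(sorted(data[missed].tolist()))  # → [4.9, 5.0, 5.1]
```

The three missed points sit between the two modes, close to the global mean of the mixture, which is exactly the weakness of a purely distance-based detector that a combined (e.g. fuzzy) approach tries to compensate.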


    8 Conclusion

A survey of outlier detection methods has been proposed. Both traditional approaches and their recent enhancements, as well as some interesting applications, are presented and discussed. Finally, a case study based on a synthetic database is proposed with the purpose of showing how the different methods work. The conclusion is that the potential and efficiency of an outlier detection method strongly depend on the kind and distribution of the data that are processed. For instance, clustering-based methods are very effective if the data are strongly clustered, while distribution-based methods can work quite well if the hypotheses that are required on the data distribution are correct, which means that they can be applied only when some a priori knowledge of the data distribution is available. If no information is available on the data to process and/or if the data features can change through time in a non-predictable way, then probably the best solution is to try different methods and/or to apply a combination of several outlier detection methods based on different principles. Fuzzy logic can provide a powerful tool to automatically perform such a combination, but other combination procedures are also possible.

    References

Abe, N., Zadrozny, B. & Langford, J. Outlier Detection by Active Learning, Proc. ACM SIGKDD 06, 2006, (pp. 504-509).

Abrantes, A. and Marques, J. A class of constrained clustering algorithms for object boundary detection, IEEE Trans. Image Process., vol. 5, no. 11, pp. 1507-1521, Nov. 1996.

Aggarwal, C.C., Yu, P.S. Outlier detection for high dimensional data. Proceedings of ACM SIGMOD Conference, 2001, (pp. 37-47).

Aitchison, J. and Brown, J.A.C. The lognormal distribution, Cambridge University Press, Cambridge UK, 1957.

Apiletti, D., Baralis, E., Bruno, G., Ficarra, E. Data cleaning and semantic improvement in biological databases, Journal of Integrative Bioinformatics 3 (2) (2006).

Bandyopadhyay, S. & Santra, S. A genetic approach for efficient outlier detection in projected space. Pattern Recognition, 41, 2008, (pp. 1338-1349).

Banerjee, A., Burlina, P. & Diehl, C. A Support Vector Method for Anomaly Detection in Hyperspectral Imagery, IEEE Trans. Geoscience and Remote Sensing, vol. 44, no. 8, 2006 (pp. 2282-2291).

    Barnett, V, Lewis, T. Outliers in Statistical Data, 3rd ed., John Wiley & Sons, New York, 1984.

    Ben-Hur, A. and Weston, J. A user's guide to Support Vector Machines, Meth Mol Biol 609, 2010, (pp. 223-239)

Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.

Bishop, C. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.

Breunig, M.M., Kriegel, H.P., Ng, R.T., and Sander, J. LOF: Identifying Density-based Local Outliers, Proc. of the 2000 ACM SIGMOD Int'l Conf. on Management of Data, ACM, New York, NY, USA, June 2000, Vol. 29, Issue 2, pp. 93-104.

Bruno, G. & Garza, P. TOD: Temporal outlier detection by using quasi-functional temporal dependencies. Data & Knowledge Engineering, 69, 2010, (pp. 619-639).

Bruno, G., Garza, P., Quintarelli, E., Rossato, R. Anomaly detection through quasi-functional dependency analysis, Journal of Digital Information Management 5 (4) (2007) 191-200.

Campbell, C. & Bennet, K.P. A Linear Programming Approach to Novelty Detection, Advances in Neural Information Processing Systems 13, 2001 (pp. 395-401).

Canfield, R.V., Taillie, C., Patil, G.P., Baldessari, B.A. Extreme value theory with applications to hydrology. In Statistical distributions in scientific work, Vol. 6, Reidel Publishing Company, Dordrecht, Holland (pp. 35-49), 1981.

    Castillo, E. Extreme Value theory in engineering. New York: Academic, 1988.


Cateni, S., Colla, V., Vannucci, M. A fuzzy logic based method for outlier detection, Proc. 25th IASTED Int. Conf. Artificial Intelligence and Applications, AIA 2007, pp. 561-566, Innsbruck, Austria, 2007.

Cateni, S., Colla, V., Vannucci, M. Outlier detection methods for industrial applications, in Advances in Robotics, Automation and Control, I-Tech Education and Publishing KG, Croatia, October 2008.

Cateni, S., Colla, V., Vannucci, M. A fuzzy system for combining different outliers detection methods, in Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, 16-18 February 2009.

Chan, T.F. Active contours without edges. IEEE Transactions on Image Processing, Vol. 10, N. 2, February 2001.

Chandola, V., Banerjee, A., Kumar, V. Anomaly detection: a survey. ACM Computing Surveys, September 2009.

Chatterjee, S. & Hadi, A.S. Sensitivity Analysis in Linear Regression, Wiley, New York, 1988.

Chaudhuri, P. On a Geometric Notion of Quantiles for Multivariate Data, J. Am. Statistical Assoc., vol. 91, no. 434, 1996 (pp. 862-872).

Chundi, P., Subramaniam, M., Vasireddy, D.K. An approach for temporal analysis of email data based on segmentation, Data and Knowledge Engineering 68 (11) (2009) 1253-1270.

Dashti, H.T., Kloc, M.E., Simas, T., Ribeiro, R.A., Assadi, A.H. Introduction of empirical topology in construction of relationship networks of informative objects, IFIP Advances in Information and Communication Technology, Springer, 2010.

Davy, M., Desobry, F., Gretton, A., Doncarli, C. An online support vector machine for abnormal events detection. Signal Processing 86(8), 2006, (pp. 2009-2025).

Date, C.J., Darwen, H., Lorentzos, N. Temporal Data & the Relational Model, First Edition (The Morgan Kaufmann Series in Data Management Systems); Morgan Kaufmann; 1st edition; 2002, ISBN 1-55860-855-9.

    Ding, C., He, X. Principal component analysis and effective K-means clustering, SDM, 2004 (pp.497-501).

Dunn, J.C. Some recent investigations of a new fuzzy partition algorithm and its application to pattern classification problems, Journal of Cybernetics 4 (1974) 1-15.

Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S. A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data. In: Data Mining for Security Applications, vol. 19, 2002.

Friedman, J.H., Tukey, J.W. A projection pursuit algorithm for exploratory data analysis, IEEE Trans. Comput. C-23 (9) (1974) 881-890.

Gardner, A.B., Krieger, A.M., Vachtsevanos, G., Litt, B. One-class novelty detection for seizure analysis from intracranial EEG, J. Mach. Learn. Res. 7, 2006, (pp. 1025-1044).

Giacinto, G., Perdisci, R., Del Rio, M., Roli, F. Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inf. Fusion 9(1), 2008, (pp. 69-82).

Gao, J., Cheng, H., Tan, P.N. Semi-supervised outlier detection, Proc. of the 2006 ACM Symposium on Applied Computing, ACM Press, 2006, pp. 635-636.

Gilbert, R.O. Statistical methods for environmental pollution monitoring, Van Nostrand Reinhold, New York, 1987.

Goovaerts, P., Jacquez, G.M., Marcus, A. Geostatistical and local cluster analysis of high resolution hyperspectral imagery for detection of anomalies, Remote Sensing Environ. 95 (2005) 351-367.

    Grubbs, F.E., Procedures for detecting outlying observations in samples, Technometrics 11, pp.1-21, 1969.

Guo, S.M., Chen, L.C., Tsai, J.S.H. A boundary method for outlier detection based on support vector domain description. Pattern Recognition 42, 2009, (pp. 77-83).

Han, S.-J. and Cho, S.-B. Evolutionary Neural Networks for Anomaly Detection Based on the Behavior of a Program, IEEE Trans. Systems, Man, and Cybernetics B, vol. 36, no. 3, 2006 (pp. 559-570).

Hawkins, D. Identification of outliers, Chapman and Hall, London, 1980.

Hodge, V.J. A survey of outlier detection methodologies, Kluwer Academic Publishers, Netherlands, January 2004.


Hu, Q., Yu, D. An improved clustering algorithm for information granulation, in: Proceedings of the 2nd International Conference on Fuzzy Systems and Knowledge Discovery (FSKD05), vol. 3613, LNCS, Springer-Verlag, Berlin Heidelberg, Changsha, China, 2005, pp. 494-504.

Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H. TANE: an efficient algorithm for discovering functional and approximate dependencies, The Computer Journal 42 (2) (1999) 100-111.

Jang, F., Sui, Y. & Cao, C. Some issues about outlier detection in rough set theory, Expert Systems with Applications, 36, pp. 4680-4687, 2009.

Jolliffe, I. Principal Component Analysis. Springer, New York, 2002.

Kennedy, J. & Eberhart, R. Particle Swarm Optimization. Proceedings of IEEE International Conference on Neural Networks, IV, pp. 1942-1948.

King, S.P., King, D.M., Astley, K., Tarassenko, L., Hayton, P., Utete, S. The use of novelty detection techniques for monitoring high-integrity plant. In: Proceedings of the 2002 International Conference on Control Applications, Cancun, Mexico, vol. 1, 2002, (pp. 221-226).

Kivinen, J., Mannila, H. Approximate inference of functional dependencies from relations, Theoretical Computer Science 149 (1) (1992) 129-149.

Knorr, E.M., Ng, R. Algorithms for Mining Distance-Based Outliers in Large Datasets. Proceedings VLDB, pp. 392-403.

Kruskal, J.B. Toward a practical method which helps uncover the structure of a set of multivariate observations by finding the linear transformation which optimizes a new index of condensation, in: R.C. Milton, J.A. Nelder (Eds.), Statistical Computation, Academic Press, New York, 1969, pp. 427-440.

Jin, W., Tung, A.K.H., Han, J. Mining Top-n Local Outliers in Large Databases. Proc. of the Seventh ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, ACM, New York, NY, USA, 2001, pp. 293-298.

Lazarevic, A., Ertoz, L., Kumar, V., Ozgur, A., Srivastava, J. A comparative study of anomaly detection schemes in network intrusion detection. In: Proceedings of the Third SIAM Conference on Data Mining, San Francisco, vol. 3, 2003.

Li, B., Fang, L., Guo. A novel Data Mining Method for Network Anomaly Detection based on Transductive Scheme, in Advances in Neural Networks, LNCS, vol. 4491, Springer, Berlin, 2007, pp. 1286-1292.

Liang, Z., Ximming, T., Lin, L., Wenliang, J. Temporal association rule mining based on a T a-priori algorithm and its typical application. Proceedings of the International Symposium on Spatio-Temporal Modeling, Spatial Reasoning, Analysis, Data Mining and Data Fusion, 2005.

Lin, T.Y., & Cercone, N. (1996). Rough sets and data mining: Analysis of imprecise data. Dordrecht: Kluwer Academic.

Lingras, P., West, C. Interval set clustering of web users with rough k-means, Journal of Intelligent Information Systems, vol. 23, no. 1, July 2004, pp. 5-16.

MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, University of California Press, Berkeley, 1967, (pp. 281-297).

Mahalanobis, P.C. On the generalized distance in statistics, Proc. of the National Institute of Science of India, pp. 49-55, 1936.

Malpica, J.A., Rejas, J.C., Alonso, M.C. A projection pursuit algorithm for anomaly detection in hyperspectral imagery. Pattern Recognition, 41, 2008, (pp. 3313-3327).

Mamdani, E.H. Application of fuzzy algorithms for control of simple dynamic plant, Proc. of the IEEE Control and Science, No. 121, pp. 298-316, 1974.

Marsland, S. On-line Novelty Detection Through Self-organisation, with Application to Inspection Robotics. Ph.D. Thesis, Faculty of Science and Engineering, University of Manchester, UK, 2001.

Markou, M. & Singh, S. A Neural Network-Based Novelty Detection for Image Sequence Analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, 2006, (pp. 1664-1677).

Matsumoto, S., Kamei, Y., Monden, A. Comparison of Outlier Detection Methods in Fault-proneness Models. Proceedings of the 1st International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp. 461-463, September 2007.

    McLachlan, G.J. & Krishnan, T. The EM Algorithm and Extensions. New York: Wiley, 1997.

Miller, D.J. and Browning, J. A Mixture Model and EM-Based Algorithm for Class Discovery, Robust Classification, and Outlier Rejection in Mixed Labeled/Unlabeled Data Sets, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 11, Nov 2003 (pp. 1468-1483).

Nascimento, J.C. Adaptive snakes using the EM algorithm. IEEE Transactions on Image Processing, Vol. 14, 2005, (pp. 1678-1686).

Papadakis, N., Antoniou, G., Plexousakis, D. The ramification problem in temporal databases: changing beliefs about the past. Data and Knowledge Engineering 59, 2, 2006, (pp. 379-434).

Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Sciences, 11, 341-356.

Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. Dordrecht: Kluwer Academic Publishers.

Pawlak, Z., Grzymala-Busse, J.W., Slowinski, R., & Ziarko, W. (1995). Rough sets. Communications of the ACM, 38(11), 88-95.

Peng, X., Chen, J., Shen, H. Outlier Detection Method Based on SVS and its application in Copper-matte Converting, IEEE, ISBN: 978-1-4244-5181-4, 2010.

Ramakrishnan, R., Gehrke, J. Database Management Systems, McGraw-Hill Science Engineering Math, 2002.

Ratsch, G., Mika, S., Scholkopf, B. and Muller, K. Constructing Boosting Algorithms from SVMs: An Application to One-Class Classification, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 9, Sept 2002, (pp. 1184-1199).

Ray, S., Turi, R.H., 1999. Determination of number of clusters in k-means clustering and application in colour image segmentation. Proc. 4th Int. Conf. Advances in Pattern Recognition and Digital Techniques (ICAPRDT'99), Calcutta, India, 27-29 December 1999, pp. 137-143.

Roberts, S. & Tarassenko, L. A Probabilistic Resource Allocating Network for Novelty Detection, Neural Computation, vol. 6, no. 2, 1994, (pp. 270-284).

Rosner, B. Percentage points for a generalized ESD many-outlier procedure. Technometrics, 25, (pp. 165-172), 1983.

Ross, T.J. Fuzzy logic with engineering applications, John Wiley & Sons Ltd, England, 2004.

Russell, S.J., Norvig, P. Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, 2003.

Saha, B.D., Roy, N. & Zhang, H. Snake validation: a PCA-based outlier detection method, IEEE Signal Processing Letters, vol. 16, N. 6, 2009.

Schwarz, G. Estimating the dimension of a model. The Annals of Statistics, 6, 1978, (pp. 461-464).

Schweizer, S.M. and Moura, J.M.F. Hyperspectral Imagery: Clutter Adaptation in Anomaly Detection, IEEE Trans. Information Theory, vol. 46, no. 5, Aug. 2000, (pp. 1855-1871).

Scholkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., and Williamson, R.C. Estimating the Support of a High-Dimensional Distribution, Neural Computation, vol. 13, no. 7, 2001, (pp. 1443-1471).

Shannon, C.E. (1948). The mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.

Shi, Y., Eberhart, R.C. A modified Particle Swarm Optimization. Proceedings of IEEE International Conference on Evolutionary Computation, pp. 69-73.

    Silveira, M., Nascimento, J.C., Marques, J.S., Level set segmentation with outlier rejection. IEEE, ICIP 2008.

Skowron, A., & Rauszer, C. (1992). The discernibility matrices and functions in information systems. Handbook of applications and advances of rough set theory (Vol. 11, pp. 331-362). Dordrecht: Kluwer Academic Publishers.

Skowron, A., & Synak, P. (2004). Reasoning in information maps. Fundamenta Informaticae, 59, 241-259.

Stanfill, C., & Waltz, D. (1986). Towards memory-based reasoning. Communications of the ACM, 29(12), 1213-1228.


Tax, D.M.J. & Duin, R.P.W. Support vector data description. Machine Learning 54, (pp. 45-66), 2004.

Tax, D.M.J. & Duin, R.P.W. Support vector domain description. Pattern Recogn. Lett. 20 (11-13), (pp. 1191-1199), 1999.

Tax, D.M.J. & Juszczak, P. Kernel whitening for one-class classification, Lecture Notes in Computer Science, vol. 2388, Springer, Berlin, 2002, (pp. 40-52).

Theodoridis, S., Koutroumbas, K. Pattern Recognition, 3rd edn. Academic Press, San Diego, 2006.

Thiang, T. & Gu, H. (2010). Anomaly detection combining one-class SVMs and particle swarm optimization algorithms. Nonlinear Dyn, Springer, 61 (pp. 303-310).

Tolvi, J. Genetic algorithms for outlier detection and variable selection in linear regression models, Soft Computing 8, Springer-Verlag, 2004, (pp. 527-533).

Vapnik, V. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.

Weekes, C.D., Vose, J.M., Lynch, J.C., Weisenburger, D.D., Bierman, M.M., Greiner, T., Bociek, G., Enke, C., Bast, M., Chan, W.C., Armitage, J.O. Hodgkin's disease in the elderly: improved treatment outcome with a doxorubicin-containing regimen, Journal of Clinical Oncology 20 (4) (2002) 1087-1093.

Wua, S.Y., Chen, Y.L. Discovering hybrid temporal patterns from sequences consisting of point- and interval-based events, Data and Knowledge Engineering 68 (11) (2009) 1309-1330.

Xue, Z. & Liu, S. Rough based Semi-Supervised Outlier Detection. Sixth International Conference on Fuzzy Systems and Knowledge Discovery, (pp. 520-524).

Xue, Z., Shang, Y., Feng, S. Semi-supervised outlier detection based on fuzzy rough C-means clustering, Mathematics and Computers in Simulation, 80, 2010, (pp. 2011-2021).

Yan, C., Chen, G., Shen, Y. Outlier analysis for gene expression data, J. Comput. Sci. Technol., 19 (1), 2004, (pp. 13-21).

Yao, Y.Y., Zhao, Y., & Maguire, R.B. (2003). Explanation oriented association mining using rough set theory. In Proceedings of the ninth international conference on rough sets, fuzzy sets, data mining, and granular computing (pp. 165-172). China.

Zhang, D., Gatica-Perez, D., Bengio, S. and McCowan, I. Semi-supervised Adapted HMMs for Unusual Event Detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR05), IEEE Press, June 2005, vol. 1, pp. 611-618.

Zhang, Y., Meratnia, N. and Havinga, P.J.M. (2008) Outlier Detection Techniques For Wireless Sensor Networks: A Survey. Technical Report TR-CTIT-08-59, Centre for Telematics and Information Technology, University of Twente, Enschede. ISSN 1381-3625.

Zhang, Y., Liu, X.D., Xie, F.D., Li, K.Q. Fault classifier of rotating machinery based on weighted support vector data description. Expert Syst. Appl. 36(4), 2009, (pp. 7928-7932).