Unsupervised Anomaly Detection for High Dimensional Data

Dr. Thayasivam, Umashanger

Department of Mathematics, Rowan University.

July 19th, 2013

International Workshop in Sequential Methodologies (IWSM-2013)


Outline of Talk

I Motivation: Biometrics

I SVM (Supervised Learning) Approach

I Unsupervised L2E Estimation Approach

I Experimental Results

I Concluding Remarks


Introduction

We are drowning in the deluge of data that are being collected world-wide, while starving for knowledge at the same time. Anomalous events occur relatively infrequently. However, when they do occur, their consequences can be quite dramatic and quite often in a negative sense.

* - J. Naisbitt, Megatrends: Ten New Directions Transforming Our Lives. New York: Warner Books, 1982.


Need for Accurate Speaker Recognition

I Method of recognizing a person based on their voice

I One of the forms of biometric identification

I Need for accurate and scalable speaker recognition: VoIP applications

I Applications in diverse areas: telephone, internet banking, online trading, forensics

I Security enforcement in the corporate and government sectors


What is intrusion detection?

I Intrusions are activities that violate the security policy of a system.

I Intrusion detection is the process used to identify malicious behavior that targets a network and its resources.


Intrusion Detection System

I Intrusion Detection Systems (IDSs) play a key role as a defense mechanism against malicious attacks in network security.

I Monitors traffic between users and networks for abnormal activity.

I Analyzes patterns/signatures based on data packets.


Intrusion Detection Techniques

I Misuse intrusion detection: intrusion signatures

I Statistical/anomaly intrusion detection


Misuse intrusion detection

I Catch the intrusions in terms of the characteristics of known attacks or system vulnerabilities

I Built with knowledge of bad behaviors

I Collection of signatures: signature analysis

I Examine event stream for signature match: pattern matching

I Cannot detect novel or unknown attacks
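
The signature-matching idea above can be illustrated with a few lines of Python. This is only a minimal sketch, not a real IDS: the two signature patterns and the event strings are hypothetical, invented for the example. It also shows the limitation in the last bullet, since an attack with no stored signature passes unnoticed.

```python
import re

# Hypothetical signatures of known attacks (illustrative only).
SIGNATURES = {
    "path_traversal": re.compile(r"\.\./"),
    "sql_injection": re.compile(r"(?i)union\s+select"),
}

def match_signatures(event):
    """Return the names of all known signatures found in one event string."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(event)]

events = [
    "GET /index.html",
    "GET /../../etc/passwd",
    "GET /item?id=1 UNION SELECT password FROM users",
    "GET /brand-new-attack",  # novel attack: no stored signature, so it is missed
]
alerts = {event: match_signatures(event) for event in events}
```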


Anomaly detection?

I An anomaly is a pattern in the data that does not conform to the expected behavior

I Also referred to as outliers, exceptions, peculiarities, surprise, etc.

I Detect any action that significantly deviates from the normal behavior

I Built with knowledge of normal behaviors

I Examine event stream for deviations from normal
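
A minimal sketch of "built with knowledge of normal behaviors": a profile is fitted on normal observations only, and each new event is flagged when it deviates too far from that profile. The mean/standard-deviation profile, the three-standard-deviation threshold, and the sample values are all assumptions made for illustration; they are not taken from the talk.

```python
import statistics

def fit_normal_profile(normal_values):
    """Summarize normal behavior by its mean and sample standard deviation."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomaly(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the normal mean."""
    mean, std = profile
    return abs(value - mean) > threshold * std

# Hypothetical measurements of normal behavior (e.g. requests per second).
normal = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 9.7, 10.0]
profile = fit_normal_profile(normal)

# Event stream to examine for deviations from normal.
stream = [10.1, 9.9, 14.5, 10.2]
flags = [is_anomaly(v, profile) for v in stream]
```

Only the third event (14.5) lies far outside the normal range, so it is the only one flagged.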


Applications of Anomaly detection

I Network intrusion detection

I Insurance / Credit card fraud detection

I Healthcare Informatics / Medical diagnostics

I Industrial Damage Detection

I Image Processing / Video surveillance

I Novel Topic Detection in Text Mining


Real-World Anomalies

Figure: Real-world anomalies


Key Challenges

I Defining a representative normal region is challenging

I The boundary between normal and outlying behavior is often not precise

I The exact notion of an outlier is different for different application domains

I Availability of labeled data for training/validation

I Data can be extremely large, noisy, and complex

I Normal behavior keeps evolving

I Fast and accurate real-time detection


Novelty detection

I Identification of new or unknown data or signals that a machine learning system was not aware of during training.

I A fundamental requirement of a good classification or identification system

I Abnormalities are very rare, or there may be no data that describe the faulty conditions


Techniques/approaches to detect anomalies

I Supervised - The data (observations, measurements, etc.) are labeled with pre-defined classes.

I Unsupervised - Class labels of the data are unknown

I Given a set of data, the task is to establish the existence of classes or clusters in the data


Support Vector Machine (SVM)

I A popular supervised anomaly detection technique

I SVMs are linear classifiers that find a hyperplane to separate two classes of data, positive and negative

I The common features in the normal and adversary groups need to be learned and differentiated

I To discover the key characteristics of network traffic patterns, a decision-making boundary is superimposed in the space of feature representations.


SVM for Network Traffic Classification

I Effectively understand the patterns of network traffic and detect measurements deemed untrustworthy from malicious targets

I Eliminates the need for arbitrary assumptions about the underlying network topology and parameters or thresholds in favor of direct training data.

I Discover key characteristics of network traffic patterns by superimposing a boundary in the space of measurements.


SVM Framework

I Cast the problem of detecting malicious nodes in an SVM classification framework

I Labeled Training Examples: $(\vec{x}_i, y_i)$, where $\vec{x}_i$ is the representation of the $i$th example in the feature space and $y_i \in \{1, -1\}$ is the corresponding label

I Decision Boundary Function: $y(\vec{x}) = \vec{w} \cdot \vec{x} + w_0$, where $\vec{w}$ is the weight vector and $w_0$ is the bias.
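
The decision boundary function above can be evaluated directly. The following is a small sketch with hand-picked, hypothetical values for the weight vector and bias; assigning a decision value of exactly zero to the label 1 is also an assumption made here for definiteness.

```python
def decision_value(w, x, w0):
    """Evaluate y(x) = w . x + w0 for a linear decision boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def predict_label(w, x, w0):
    """Map the sign of the decision value to a label in {1, -1}."""
    return 1 if decision_value(w, x, w0) >= 0 else -1

# Hypothetical weights and bias, chosen by hand for illustration.
w, w0 = [1.0, -2.0], 0.5
labels = [predict_label(w, x, w0) for x in ([3.0, 1.0], [0.0, 1.0])]
```

The first point gives 3 - 2 + 0.5 = 1.5 and the second gives 0 - 2 + 0.5 = -1.5, so they fall on opposite sides of the boundary.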


SVM Framework

I Network Traffic Features: $\vec{x}$

I Optimization Function: $\vec{w}$ and $w_0$

I Prediction of Training Set Label: $\vec{w} \cdot \vec{x} + w_0$


SVM Optimization Problem

I The optimization problem is

$$\min \; \frac{1}{2} \|\vec{W}\|^2 + \gamma \sum_{i=1}^{N} \varepsilon_i \quad \text{subject to} \quad y_i \left( \vec{W} \cdot \Phi(\vec{x}_i) + W_0 \right) \geq 1 - \varepsilon_i, \; \forall i$$

I where $N$: number of training examples.

I $\varepsilon_i$: collection of non-negative slack variables that account for possible misclassifications.

I $\gamma$: trade-off factor between the slack variables and the regularization on the norm of the weight vector $\vec{W}$.

I The constraint in this minimization implies that we want our predictions, $\vec{W} \cdot \Phi(\vec{x}_i) + W_0$, to be similar to the labels.
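
To make the roles of the slack variables and of $\gamma$ concrete, the sketch below evaluates the objective at one fixed candidate $(\vec{W}, W_0)$. It does not solve the optimization. The toy data, the candidate weights, the value of `gamma`, and the identity feature map used as the default `phi` are all hypothetical choices made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def soft_margin_objective(W, W0, X, y, gamma, phi=lambda x: x):
    """Return (objective, slacks) of the soft-margin primal at a given (W, W0).

    Each slack is the smallest non-negative eps_i that satisfies
    y_i * (W . phi(x_i) + W0) >= 1 - eps_i.
    """
    slacks = [max(0.0, 1.0 - yi * (dot(W, phi(xi)) + W0)) for xi, yi in zip(X, y)]
    objective = 0.5 * dot(W, W) + gamma * sum(slacks)
    return objective, slacks

# Hypothetical toy data: two features, labels in {1, -1}.
X = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.0]]
y = [1, -1, -1]
objective, slacks = soft_margin_objective([1.0, -1.0], 0.0, X, y, gamma=10.0)
```

The first two points satisfy the margin and need no slack; the third is on the wrong side and needs a slack of 1.5, which `gamma` then weights against the norm term.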


Solution to the SVM Optimization Problem

I Solve the optimization by quadratic programming in the dual

I Parameter estimation by cross-validation on the training set

I Given $\vec{W}^*$ and $W_0^*$, predict whether a node is an adversary or not by looking at the sign of $\vec{W}^* \cdot \Phi(\vec{x}) + W_0^*$.

I The LibSVM package is used to implement the SVM-model-based anomaly detection


Key Challenges in Supervised learning

I Defining a representative normal region is challenging

I The boundary between normal and outlying behavior is often not precise

I The exact notion of an outlier is different for different application domains

I Availability of labeled data for training/validation

I Data can be extremely large, noisy, and complex

I Normal behavior keeps evolving

I Fast and accurate real-time detection


What is a Mixture Model?

I Let $f_{\theta_m}(\vec{x})$ denote the general mixture probability density function with $m$ components:

$$f_{\theta_m}(\vec{x}) = \sum_{i=1}^{m} \pi_i \, f(\vec{x} \mid \vec{\phi}_i).$$

I $\pi_i \geq 0$ for $i = 1, \ldots, m$, with $\sum_{i=1}^{m} \pi_i = 1$; $\theta_m = (\pi_1, \ldots, \pi_{m-1}, \pi_m, \vec{\phi}_1^T, \ldots, \vec{\phi}_m^T)^T$.

I In theory, the $f(\vec{x} \mid \vec{\phi}_i)$'s could be any parametric density, although in practice they are often from the same parametric family (usually Gaussian)
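
As a concrete instance of the definition above, the sketch below evaluates a univariate Gaussian mixture density and checks numerically that it integrates to about one. The restriction to one dimension and the two-component parameter values are assumptions made to keep the example short.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, weights, mus, variances):
    """Evaluate f(x) = sum_i pi_i * phi(x | mu_i, var_i)."""
    return sum(p * normal_pdf(x, m, v) for p, m, v in zip(weights, mus, variances))

# Hypothetical two-component parameters (weights are >= 0 and sum to 1).
weights, mus, variances = [0.8, 0.2], [2.0, 2.5], [0.1, 0.4]

# Riemann-sum check over [-5, 10] that the mixture integrates to about one.
step = 0.001
total = sum(mixture_pdf(-5.0 + k * step, weights, mus, variances)
            for k in range(15000)) * step
```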


Estimation Approach with Built-in Robustness using L2E

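I When $m$ is known, we want to find the $f_{\theta_m}(\vec{x})$ that is closest to the true (unknown) density $g(\vec{x})$ in $L_2$ distance.

I That is, we consider

$$L_2(f_{\theta_m}, g) = \int_{-\infty}^{\infty} \left[ f_{\theta_m}(\vec{x}) - g(\vec{x}) \right]^2 d\vec{x}.$$

I The aim is to derive an estimate of $\theta_m$ that minimizes the $L_2$ distance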


Estimation Approach with Built-in Robustness

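$$L_2(f_{\theta_m}, g) = \int_{-\infty}^{\infty} f_{\theta_m}^2(\vec{x}) \, d\vec{x} \; - \; 2 \int_{-\infty}^{\infty} f_{\theta_m}(\vec{x}) \, g(\vec{x}) \, d\vec{x} \; + \; \int_{-\infty}^{\infty} g(\vec{x})^2 \, d\vec{x}$$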


Estimation Approach with Built-in Robustness

I The last integral is constant with respect to $\theta_m$

I The first integral is often available as a closed-form expression

I The second integral is simply the average height of the density estimate, so the second term may be estimated as $-2n^{-1} \sum_{i=1}^{n} f_{\theta_m}(\vec{X}_i)$, where $\vec{X}_i$ is a sample observation.
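
The estimate of the second term follows from reading the cross integral as an expectation under the density $g$. A brief sketch of that step, assuming $\vec{X}_1, \ldots, \vec{X}_n$ are independent draws from $g$ (an assumption not spelled out on the slide):

```latex
\int_{-\infty}^{\infty} f_{\theta_m}(\vec{x})\, g(\vec{x})\, d\vec{x}
  = \mathbb{E}_{g}\!\left[ f_{\theta_m}(\vec{X}) \right]
  \approx \frac{1}{n} \sum_{i=1}^{n} f_{\theta_m}(\vec{X}_i),
\qquad \text{so} \qquad
-2 \int_{-\infty}^{\infty} f_{\theta_m}(\vec{x})\, g(\vec{x})\, d\vec{x}
  \approx -\frac{2}{n} \sum_{i=1}^{n} f_{\theta_m}(\vec{X}_i).
```

This is why $g$ never has to be known explicitly: it enters the criterion only through the sample.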


Computational Algorithm

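I The L2E estimator of $\theta_m$ is given by

$$\hat{\theta}_m^{L_2E} = \arg\min_{\theta_m} \left[ \int_{-\infty}^{\infty} f_{\theta_m}^2(\vec{x}) \, d\vec{x} \; - \; 2n^{-1} \sum_{i=1}^{n} f_{\theta_m}(\vec{X}_i) \right].$$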
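
A minimal sketch of the estimator in its simplest setting may help: a single univariate normal component ($m = 1$) with known variance, so that only the mean is estimated. The known unit variance, the grid search used as the optimizer, and the small sample with two gross outliers are all assumptions made for illustration; the talk's actual experiments are multivariate. For one normal component the first integral equals $1 / (2\sqrt{\pi \sigma^2})$.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def l2e_criterion(mu, data, var=1.0):
    """L2E criterion for one normal N(mu, var): int f^2 dx - (2/n) * sum f(X_i)."""
    integral_f_squared = 1.0 / (2.0 * math.sqrt(math.pi * var))
    mean_height = sum(normal_pdf(x, mu, var) for x in data) / len(data)
    return integral_f_squared - 2.0 * mean_height

# Hypothetical sample: eight points near 0 plus two gross outliers at 10.
data = [-1.0, -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, 1.0, 10.0, 10.0]

# Grid search over mu in [-2, 12] with step 0.01 (stand-in for a real optimizer).
grid = [k / 100.0 for k in range(-200, 1201)]
mu_l2e = min(grid, key=lambda mu: l2e_criterion(mu, data))
mu_mean = sum(data) / len(data)
```

On this sample the L2E estimate stays at the bulk of the data near 0, while the ordinary sample mean is pulled to 2.0 by the outliers, which illustrates the built-in robustness.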


Computational Algorithm

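I Normal Identity:

$$\int_{-\infty}^{\infty} \phi(x \mid \mu_1, \sigma_1^2) \, \phi(x \mid \mu_2, \sigma_2^2) \, dx = \phi(\mu_1 - \mu_2 \mid 0, \sigma_1^2 + \sigma_2^2),$$

I where $\phi(x \mid \mu, \sigma^2)$ is the normal density function with mean $\mu$ and variance $\sigma^2$.

I For multivariate Gaussian mixtures (GMM), $f(\vec{x} \mid \vec{\phi}_i) = \phi(\vec{x} \mid \vec{\mu}_i, \Sigma_i)$, the use of the above identity reduces the key integral to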


Computational Algorithm

$$\int_{-\infty}^{\infty} f_{\theta_m}^2(\vec{x}) \, d\vec{x} = \sum_{k=1}^{m} \sum_{l=1}^{m} \pi_k \pi_l \, \phi(\vec{\mu}_k - \vec{\mu}_l \mid 0, \Sigma_k + \Sigma_l)$$

I This makes the integral tractable and thereby significantly reduces the computations involved in minimizing the L2E criterion.

I Thus, the L2E estimation may be performed by any standard optimization algorithm.
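
The closed form above can be checked numerically. The sketch below does so in the univariate case, where the covariance matrices reduce to variances, and then assembles the full L2E criterion from it. The one-dimensional restriction and the parameter values are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, weights, mus, variances):
    """Evaluate the mixture density sum_i pi_i * phi(x | mu_i, var_i)."""
    return sum(p * normal_pdf(x, m, v) for p, m, v in zip(weights, mus, variances))

def integral_f_squared(weights, mus, variances):
    """Closed form: sum_k sum_l pi_k pi_l phi(mu_k - mu_l | 0, var_k + var_l)."""
    comps = list(zip(weights, mus, variances))
    return sum(pk * pl * normal_pdf(mk - ml, 0.0, vk + vl)
               for pk, mk, vk in comps for pl, ml, vl in comps)

def l2e_criterion(data, weights, mus, variances):
    """Closed-form integral minus twice the average mixture height at the data."""
    mean_height = sum(mixture_pdf(x, weights, mus, variances) for x in data) / len(data)
    return integral_f_squared(weights, mus, variances) - 2.0 * mean_height

# Hypothetical parameters for a two-component univariate mixture.
weights, mus, variances = [0.8, 0.2], [2.0, 2.5], [0.1, 0.4]
closed_form = integral_f_squared(weights, mus, variances)

# Riemann-sum approximation of the same integral over [-5, 10].
step = 0.001
numeric = sum(mixture_pdf(-5.0 + k * step, weights, mus, variances) ** 2
              for k in range(15000)) * step
```

Because `l2e_criterion` needs no numerical integration, it can be passed as the objective to any general-purpose optimizer.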


Data Analysis

I The effective detection and identification of anomalies in traffic requires the ability to separate them from normal network traffic.

I Network traffic data set from University of New Mexico.

I Trace files contained 13831 sample observations with process IDs and their respective system calls.

I We apply our L2E (unsupervised) approach and compare its performance with SVM (supervised)


Results: Accuracy with increasing dimensions (70%-30% train-test partition of the data)

Dimensions   L2E False Detection Rate   L2E True Detection Rate   SVM True Detection Rate
2            0.774                      1.000                     0.9926
3            0.663                      1.000                     0.9926
4            0.561                      1.000                     0.9924
5            0.390                      0.989                     0.9924
6            0.322                      1.000                     0.9924
7            0.189                      1.000                     0.9924
8            0.000                      0.980                     0.9924
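
The slides do not define the two rates reported in the table. The sketch below assumes the usual reading: the true detection rate is the fraction of anomalous observations that are flagged, and the false detection rate is the fraction of normal observations that are flagged. The label vectors are hypothetical.

```python
def detection_rates(actual, predicted):
    """Return (true_detection_rate, false_detection_rate).

    actual, predicted: sequences of booleans, True meaning "anomalous".
    """
    n_anomalous = sum(1 for a in actual if a)
    n_normal = len(actual) - n_anomalous
    true_hits = sum(1 for a, p in zip(actual, predicted) if a and p)
    false_alarms = sum(1 for a, p in zip(actual, predicted) if not a and p)
    return true_hits / n_anomalous, false_alarms / n_normal

# Hypothetical labels for ten test observations.
actual    = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
tdr, fdr = detection_rates(actual, predicted)
```

Here three of the four anomalies are caught and one of the six normal observations is falsely flagged.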


Results: Accuracy with varied train-test partitions (using 8 dimensions of the data)

Train-Test   L2E False Detection Rate   L2E True Detection Rate   SVM True Detection Rate
50-50        0.0003                     0.9884                    0.9914
60-40        0.0001                     0.9786                    0.9919
70-30        0.0002                     0.9836                    0.9920
80-20        0.0002                     0.9781                    0.9898
90-10        0.0001                     0.9814                    0.9884


Results: Accuracy with increasing testing sample size (70%-30% train-test partition, using 8 dimensions of the data)

Testing Sample Size   L2E False Detection Rate   L2E True Detection Rate   SVM True Detection Rate
500                   0.0000                     0.9792                    0.9960
1000                  0.0000                     0.9744                    0.9960
1500                  0.0000                     0.9686                    0.9920
2000                  0.0000                     0.9814                    0.9935
2500                  0.0004                     0.9844                    0.9912
3000                  0.0000                     0.9876                    0.9907
3500                  0.0003                     0.9840                    0.9925
4000                  0.0003                     0.9836                    0.9927


Observations

I The false detection rate for SVM for all scenarios for this data set is zero.

I Despite the lack of labeled training data, the true detection rate of the L2E algorithm is comparable to the SVM for all scenarios.


Analysis for Simulated data

I Case: 5 dimensions, $n = 10000$, with an 80/20 random split.

I Dataset: $\vec{\mu}_1 = (2, 2, 2, 2)$, $\vec{\mu}_2 = (2.5, 2.5, 2.5, 2.5)$, $\sigma_1 = \mathrm{diag}(0.1)$, $\sigma_2 = \mathrm{diag}(0.4)$, $\pi_1 = 0.8$, $\pi_2 = 0.2$

I We apply our L2E (unsupervised) approach and compare its performance with SVM (supervised) and some other machine learning algorithms.

I Classification accuracy for L2E is better than the alternatives.
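
A data set of this kind could be generated along the following lines. This is only a sketch under stated assumptions: the slide says 5 dimensions but lists four-entry mean vectors, and the code uses the vectors exactly as listed; diag(.1) and diag(.4) are read as per-coordinate variances; the seed and the function name `sample_mixture` are hypothetical.

```python
import random

def sample_mixture(n, weights, means, variances, seed=0):
    """Draw n labeled points from a two-component Gaussian mixture with diagonal covariances."""
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        k = 0 if rng.random() < weights[0] else 1  # pick the component
        points.append([rng.gauss(m, variances[k] ** 0.5) for m in means[k]])
        labels.append(k)
    return points, labels

# Values as listed on the slide; the variance reading of diag(.) is an assumption.
means = [[2.0, 2.0, 2.0, 2.0], [2.5, 2.5, 2.5, 2.5]]
points, labels = sample_mixture(10000, [0.8, 0.2], means, [0.1, 0.4])

# 80/20 split; the draws are independent, so taking the first 80% is a random split.
split = len(points) * 8 // 10
train, test = points[:split], points[split:]
```

The labels are kept only for scoring; an unsupervised method such as L2E would be fitted on `train` without them.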


Results: Comparing Machine Learning Algorithms for the simulated data: testing sample size-

Classifier   Time   False -ve   False +ve
L2E          2.1    0.0345      0.0055
EM           16     0.0315      0.006
Trees        0.31   0.186       0.011
SVM          1.95   0.167       0.007
NN           5.2    0.214       0.01


Conclusion: Significance of our L2E

I Does not require labeled training data or special configuration

I Ease of use

I Efficiency in achieving accuracy without computational overhead

I Results are comparable to SVM and other machine learning algorithms


Current and Future work

I Evaluating the performance using multiple network traffic data sets for speaker recognition.

I Applying real data sets with higher dimensions and a large number of components.

I Estimating the number of components.

I Data mining: Random Forest/Boosting


Some Reference Articles

I L2E Estimation of Mixture Complexity for Count Data, CSDA (Oct. 2009)

I Simultaneous Robust Estimation in Finite Mixture: The Continuous Case, JISA (Special Golden Jubilee, 2012)

I Detection of Anomalies in Network Traffic using L2E for Accurate Speaker Recognition, IEEE Midwest, August 2012.

I Elements of Statistical Learning (book), http://www-stat.stanford.edu/~tibs/ElemStatLearn/
