anomaly detection with extreme value theory

94
Anomaly Detection with Extreme Value Theory A. Siffer, P-A Fouque, A. Termier and C. Largouet April 26, 2017

Upload: others

Post on 11-Sep-2021

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Anomaly Detection with Extreme Value Theory

Anomaly Detection with Extreme Value Theory

A. Siffer, P-A Fouque, A. Termier and C. LargouetApril 26, 2017

Page 2: Anomaly Detection with Extreme Value Theory

Contents

Context

Providing better thresholds

Finding anomalies in streams

Application to intrusion detection

A more general framework

1

Page 3: Anomaly Detection with Extreme Value Theory

Context

Page 4: Anomaly Detection with Extreme Value Theory

General motivations

⊸ Massive usage of the Internet

• More and more vulnerabilities• More and more threats

⊸ Awareness of the sensitive data and infrastructures

2

Page 5: Anomaly Detection with Extreme Value Theory

General motivations

⊸ Massive usage of the Internet• More and more vulnerabilities

• More and more threats

⊸ Awareness of the sensitive data and infrastructures

2

Page 6: Anomaly Detection with Extreme Value Theory

General motivations

⊸ Massive usage of the Internet• More and more vulnerabilities• More and more threats

⊸ Awareness of the sensitive data and infrastructures

2

Page 7: Anomaly Detection with Extreme Value Theory

General motivations

⊸ Massive usage of the Internet• More and more vulnerabilities• More and more threats

⊸ Awareness of the sensitive data and infrastructures

2

Page 8: Anomaly Detection with Extreme Value Theory

General motivations

⊸ Massive usage of the Internet• More and more vulnerabilities• More and more threats

⊸ Awareness of the sensitive data and infrastructures

2

⇒ Network security :a major concern

Page 9: Anomaly Detection with Extreme Value Theory

A Solution

⊸ IDS (Intrusion Detection System)• Monitor traffic• Detect attacks

⊸ Current methods : rule-based

• Work fine on common and well-known attacks• Cannot detect new attacks

⊸ Emerging methods : anomaly-based

• Use the network data to estimate a normal behavior• Apply algorithms to detect abnormal events (→ attacks)

3

Page 10: Anomaly Detection with Extreme Value Theory

A Solution

⊸ IDS (Intrusion Detection System)• Monitor traffic• Detect attacks

⊸ Current methods : rule-based• Work fine on common and well-known attacks• Cannot detect new attacks

⊸ Emerging methods : anomaly-based

• Use the network data to estimate a normal behavior• Apply algorithms to detect abnormal events (→ attacks)

3

Page 11: Anomaly Detection with Extreme Value Theory

A Solution

⊸ IDS (Intrusion Detection System)• Monitor traffic• Detect attacks

⊸ Current methods : rule-based• Work fine on common and well-known attacks• Cannot detect new attacks

⊸ Emerging methods : anomaly-based• Use the network data to estimate a normal behavior• Apply algorithms to detect abnormal events (→ attacks)

3

Page 12: Anomaly Detection with Extreme Value Theory

Overview

⊸ Basic scheme

Algorithmdata alerts

⊸ Many ”standard” algorithms have been tested⊸ Complex pipelines are emerging (ensemble/hybrid techniques)

dataalerts

4

Page 13: Anomaly Detection with Extreme Value Theory

Overview

⊸ Basic scheme

Algorithmdata alerts

⊸ Many ”standard” algorithms have been tested

⊸ Complex pipelines are emerging (ensemble/hybrid techniques)

dataalerts

4

Page 14: Anomaly Detection with Extreme Value Theory

Overview

⊸ Basic scheme

Algorithmdata alerts

⊸ Many ”standard” algorithms have been tested⊸ Complex pipelines are emerging (ensemble/hybrid techniques)

dataalerts

4

Page 15: Anomaly Detection with Extreme Value Theory

Inherent problem

⊸ Algorithms are not magic• They give some information about data (scores)

• But the decision often rely on a human choice

if score>threshold then trigger alert

⊸ The thresholds are often hard-set

• Expertise• Fine-tuning• Distribution assumption

⊸ Our idea: provide dynamic threshold with a probabilisticmeaning

5

Page 16: Anomaly Detection with Extreme Value Theory

Inherent problem

⊸ Algorithms are not magic• They give some information about data (scores)• But the decision often rely on a human choice

if score>threshold then trigger alert

⊸ The thresholds are often hard-set

• Expertise• Fine-tuning• Distribution assumption

⊸ Our idea: provide dynamic threshold with a probabilisticmeaning

5

Page 17: Anomaly Detection with Extreme Value Theory

Inherent problem

⊸ Algorithms are not magic• They give some information about data (scores)• But the decision often rely on a human choice

if score>threshold then trigger alert

⊸ The thresholds are often hard-set• Expertise• Fine-tuning• Distribution assumption

⊸ Our idea: provide dynamic threshold with a probabilisticmeaning

5

Page 18: Anomaly Detection with Extreme Value Theory

Inherent problem

⊸ Algorithms are not magic• They give some information about data (scores)• But the decision often rely on a human choice

if score>threshold then trigger alert

⊸ The thresholds are often hard-set• Expertise• Fine-tuning• Distribution assumption

⊸ Our idea: provide dynamic threshold with a probabilisticmeaning

5

Page 19: Anomaly Detection with Extreme Value Theory

Providing better thresholds

Page 20: Anomaly Detection with Extreme Value Theory

My problem

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

⊸ How to set zq such that P(X€ > zq) < q ?

6

Page 21: Anomaly Detection with Extreme Value Theory

My problem

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

⊸ How to set zq such that P(X€ > zq) < q ?6

Page 22: Anomaly Detection with Extreme Value Theory

Solution 1: empirical approach

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

⊸ Drawbacks: stuck in the interval, poor resolution

7

Page 23: Anomaly Detection with Extreme Value Theory

Solution 1: empirical approach

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

1− q q

zq

⊸ Drawbacks: stuck in the interval, poor resolution

7

Page 24: Anomaly Detection with Extreme Value Theory

Solution 1: empirical approach

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

1− q q

zq

⊸ Drawbacks: stuck in the interval, poor resolution7

Page 25: Anomaly Detection with Extreme Value Theory

Solution 2: Standard Model

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

⊸ Drawbacks: manual step, distribution assumption

8

Page 26: Anomaly Detection with Extreme Value Theory

Solution 2: Standard Model

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

⊸ Drawbacks: manual step, distribution assumption

8

Page 27: Anomaly Detection with Extreme Value Theory

Solution 2: Standard Model

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

zq

⊸ Drawbacks: manual step, distribution assumption

8

Page 28: Anomaly Detection with Extreme Value Theory

Solution 2: Standard Model

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

zq

⊸ Drawbacks: manual step, distribution assumption8

Page 29: Anomaly Detection with Extreme Value Theory

Realities

0 100 200 3000.000

0.005

0.010

0.015

16 18 20 22 240.00

0.05

0.10

0.15

0.20

20 40 60 80 1000.000

0.005

0.010

0.015

0.020

0 25 50 750.00

0.01

0.02

0.03

⊸ Different clients and/or temporal drift

9

Page 30: Anomaly Detection with Extreme Value Theory

Results

Properties Empirical quantile Standard modelstatistical guarantees Yes Yes

easy to adapt Yes Nohigh resolution No Yes

10

Page 31: Anomaly Detection with Extreme Value Theory

Inspection of extreme events

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

Probability estimation ?

11

Page 32: Anomaly Detection with Extreme Value Theory

Inspection of extreme events

0 25 50 75 100 125 150 175 200Daily payment by credit card ( )

0.000

0.005

0.010

0.015

0.020

0.025

Freq

uenc

y

Probability estimation ?

11

Page 33: Anomaly Detection with Extreme Value Theory

Extreme Value Theory

⊸ Main result (Fisher-Tippett-Gnedenko, 1928)

The extreme values of any distribution have nearly the samedistribution (called Extreme Value Distribution)

0 1 2 3 4 5 6x

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(X>

x)

heavy tailexponential tailbounded tail

12

Page 34: Anomaly Detection with Extreme Value Theory

Extreme Value Theory

⊸ Main result (Fisher-Tippett-Gnedenko, 1928)

The extreme values of any distribution have nearly the samedistribution (called Extreme Value Distribution)

0 1 2 3 4 5 6x

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(X>

x)

heavy tailexponential tailbounded tail

12

Page 35: Anomaly Detection with Extreme Value Theory

Extreme Value Theory

⊸ Main result (Fisher-Tippett-Gnedenko, 1928)

The extreme values of any distribution have nearly the samedistribution (called Extreme Value Distribution)

0 1 2 3 4 5 6x

0.0

0.1

0.2

0.3

0.4

0.5

0.6

(X>

x)

heavy tailexponential tailbounded tail

12

Page 36: Anomaly Detection with Extreme Value Theory

An impressive analogy

⊸ Let X1, X2, . . . Xn a sequence of i.i.d. random variables with

Sn =n∑i=1

Xi Mn = max1≤i≤n

(Xi)

⊸ Central Limit TheoremSn − nµ√n

d−→ N (0, σ2)

⊸ FTG Theorem Mn − anbn

d−→ EVD(γ)

13

Page 37: Anomaly Detection with Extreme Value Theory

An impressive analogy

⊸ Let X1, X2, . . . Xn a sequence of i.i.d. random variables with

Sn =n∑i=1

Xi Mn = max1≤i≤n

(Xi)

⊸ Central Limit TheoremSn − nµ√n

d−→ N (0, σ2)

⊸ FTG Theorem Mn − anbn

d−→ EVD(γ)

13

Page 38: Anomaly Detection with Extreme Value Theory

An impressive analogy

⊸ Let X1, X2, . . . Xn a sequence of i.i.d. random variables with

Sn =n∑i=1

Xi Mn = max1≤i≤n

(Xi)

⊸ Central Limit TheoremSn − nµ√n

d−→ N (0, σ2)

⊸ FTG Theorem Mn − anbn

d−→ EVD(γ)

13

Page 39: Anomaly Detection with Extreme Value Theory

A more practical result

⊸ Second theorem of EVT (Pickands-Balkema-de Haan, 1974)

The excesses over a high threshold follow a Generalized ParetoDistribution (with parameters γ, σ)

⊸ What does it imply ?

• we have a model for extreme events• we can compute zq for q as small as desired

14

Page 40: Anomaly Detection with Extreme Value Theory

A more practical result

⊸ Second theorem of EVT (Pickands-Balkema-de Haan, 1974)

The excesses over a high threshold follow a Generalized ParetoDistribution (with parameters γ, σ)

⊸ What does it imply ?• we have a model for extreme events• we can compute zq for q as small as desired

14

Page 41: Anomaly Detection with Extreme Value Theory

How to use EVT

⊸ Get some data X1, X2 . . . Xn⊸ Set a high threshold t and retrieve the excesses Yj = Xkj − t

when Xkj > t

⊸ Fit a GPD to the Yj (→ find parameters γ, σ)⊸ Compute zq such as P(X > zq) < q

EVT

q

X1, X2 . . . Xn zq

15

Page 42: Anomaly Detection with Extreme Value Theory

How to use EVT

⊸ Get some data X1, X2 . . . Xn⊸ Set a high threshold t and retrieve the excesses Yj = Xkj − t

when Xkj > t⊸ Fit a GPD to the Yj (→ find parameters γ, σ)

⊸ Compute zq such as P(X > zq) < q

EVT

q

X1, X2 . . . Xn zq

15

Page 43: Anomaly Detection with Extreme Value Theory

How to use EVT

⊸ Get some data X1, X2 . . . Xn⊸ Set a high threshold t and retrieve the excesses Yj = Xkj − t

when Xkj > t⊸ Fit a GPD to the Yj (→ find parameters γ, σ)⊸ Compute zq such as P(X > zq) < q

EVT

q

X1, X2 . . . Xn zq

15

Page 44: Anomaly Detection with Extreme Value Theory

How to use EVT

⊸ Get some data X1, X2 . . . Xn⊸ Set a high threshold t and retrieve the excesses Yj = Xkj − t

when Xkj > t⊸ Fit a GPD to the Yj (→ find parameters γ, σ)⊸ Compute zq such as P(X > zq) < q

EVT

q

X1, X2 . . . Xn zq

15

Page 45: Anomaly Detection with Extreme Value Theory

Finding anomalies in streams

Page 46: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 47: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 48: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 49: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 50: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 51: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 52: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 53: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 54: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 55: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 56: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 57: Anomaly Detection with Extreme Value Theory

Streaming Peaks-Over-Threshold (SPOT) algorithm

(initial batch)

X1, X2 . . . Xn Calibration

q

0

0.10

0.20

0.30

20 40 60 80 100 120

t zq

(stream)

Xi>n Xi > zq

trigger alarmyes

no Xi > t

yesupdate model

no drop

16

Page 58: Anomaly Detection with Extreme Value Theory

Can we trust that threshold zq ?

⊸ An example with ground truth : a Gaussian White Noise• 40 streams with 200 000 iid variables drawn from N (0, 1)• q = 10−3 ⇒ theoretical threshold zth ≃ 3.09

⊸ Averaged relative error

0 50000 100000 150000 200000Number of observations

0.01

0.02

0.03

0.04

Rela

tive

erro

r

17

Page 59: Anomaly Detection with Extreme Value Theory

Can we trust that threshold zq ?

⊸ An example with ground truth : a Gaussian White Noise• 40 streams with 200 000 iid variables drawn from N (0, 1)• q = 10−3 ⇒ theoretical threshold zth ≃ 3.09

⊸ Averaged relative error

0 50000 100000 150000 200000Number of observations

0.01

0.02

0.03

0.04

Rela

tive

erro

r

17

Page 60: Anomaly Detection with Extreme Value Theory

Application to intrusion detection

Page 61: Anomaly Detection with Extreme Value Theory

About the data

⊸ Lack of relevant public datasets to test the algorithms ...

⊸ KDD99 ? See [McHugh 2000] and [Mahoney & Chan 2003]⊸ We rather use MAWI

• 15 min a day of real traffic (.pcap file)• Anomaly patterns given by the MAWILab [Fontugne et al. 2010]with taxonomy [Mazel et al. 2014]

⊸ Preprocessing step : raw .pcap → NetFlow format (onlymetadata)

18

Page 62: Anomaly Detection with Extreme Value Theory

About the data

⊸ Lack of relevant public datasets to test the algorithms ...⊸ KDD99 ? See [McHugh 2000] and [Mahoney & Chan 2003]

⊸ We rather use MAWI

• 15 min a day of real traffic (.pcap file)• Anomaly patterns given by the MAWILab [Fontugne et al. 2010]with taxonomy [Mazel et al. 2014]

⊸ Preprocessing step : raw .pcap → NetFlow format (onlymetadata)

18

Page 63: Anomaly Detection with Extreme Value Theory

About the data

⊸ Lack of relevant public datasets to test the algorithms ...⊸ KDD99 ? See [McHugh 2000] and [Mahoney & Chan 2003]⊸ We rather use MAWI

• 15 min a day of real traffic (.pcap file)• Anomaly patterns given by the MAWILab [Fontugne et al. 2010]with taxonomy [Mazel et al. 2014]

⊸ Preprocessing step : raw .pcap → NetFlow format (onlymetadata)

18

Page 64: Anomaly Detection with Extreme Value Theory

About the data

⊸ Lack of relevant public datasets to test the algorithms ...⊸ KDD99 ? See [McHugh 2000] and [Mahoney & Chan 2003]⊸ We rather use MAWI

• 15 min a day of real traffic (.pcap file)• Anomaly patterns given by the MAWILab [Fontugne et al. 2010]with taxonomy [Mazel et al. 2014]

⊸ Preprocessing step : raw .pcap → NetFlow format (onlymetadata)

18

Page 65: Anomaly Detection with Extreme Value Theory

An example to detect network syn scan

⊸ The ratio of SYN packets : relevant feature to detect networkscan [Fernandes & Owezarski 2009]

0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)

0.0

0.2

0.4

0.6

0.8

ratio

of S

YN p

acke

ts in

50m

s tim

e wi

ndow

⊸ Goal: find peaks

19

Page 66: Anomaly Detection with Extreme Value Theory

An example to detect network syn scan

⊸ The ratio of SYN packets : relevant feature to detect networkscan [Fernandes & Owezarski 2009]

0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)

0.0

0.2

0.4

0.6

0.8ra

tio o

f SYN

pac

kets

in 5

0ms t

ime

wind

ow

⊸ Goal: find peaks

19

Page 67: Anomaly Detection with Extreme Value Theory

An example to detect network syn scan

⊸ The ratio of SYN packets : relevant feature to detect networkscan [Fernandes & Owezarski 2009]

0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)

0.0

0.2

0.4

0.6

0.8ra

tio o

f SYN

pac

kets

in 5

0ms t

ime

wind

ow

⊸ Goal: find peaks19

Page 68: Anomaly Detection with Extreme Value Theory

SPOT results

⊸ Parameters : q = 10−4,n = 2000 (from the previous day record)

0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)

0.0

0.2

0.4

0.6

0.8

ratio

of S

YN p

acke

ts in

50m

s tim

e wi

ndow

20

Page 69: Anomaly Detection with Extreme Value Theory

SPOT results

⊸ Parameters : q = 10−4,n = 2000 (from the previous day record)

0.0 100.0 200.0 300.0 400.0 500.0 600.0 700.0 800.0Time (s)

0.0

0.2

0.4

0.6

0.8ra

tio o

f SYN

pac

kets

in 5

0ms t

ime

wind

ow

20

Page 70: Anomaly Detection with Extreme Value Theory

Do we really flag scan attacks ?

⊸ The main parameter q: a False Positive regulator

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5

FPr (%)

0.0

20.0

40.0

60.0

80.0

100.0

TPr

(%)

9. 10−6

1. 10−5 5. 10−5

8. 10−5

8. 10−6

6. 10−3

⊸ 86% of scan flows detected with less than 4% of FP

21

Page 71: Anomaly Detection with Extreme Value Theory

Do we really flag scan attacks ?

⊸ The main parameter q: a False Positive regulator

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5

FPr (%)

0.0

20.0

40.0

60.0

80.0

100.0TPr

(%)

9. 10−6

1. 10−5 5. 10−5

8. 10−5

8. 10−6

6. 10−3

⊸ 86% of scan flows detected with less than 4% of FP

21

Page 72: Anomaly Detection with Extreme Value Theory

Do we really flag scan attacks ?

⊸ The main parameter q: a False Positive regulator

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5

FPr (%)

0.0

20.0

40.0

60.0

80.0

100.0TPr

(%)

9. 10−6

1. 10−5 5. 10−5

8. 10−5

8. 10−6

6. 10−3

⊸ 86% of scan flows detected with less than 4% of FP21

Page 73: Anomaly Detection with Extreme Value Theory

A more general framework

Page 74: Anomaly Detection with Extreme Value Theory

SPOT Specifications

⊸ A single main parameter q• With a probabilistic meaning → P(X > zq) < q• False Positive regulator

⊸ Stream capable

• Incremental learning• Fast (∼ 1000 values/s)• Low memory usage (only the excesses)

22

Page 75: Anomaly Detection with Extreme Value Theory

SPOT Specifications

⊸ A single main parameter q• With a probabilistic meaning → P(X > zq) < q• False Positive regulator

⊸ Stream capable• Incremental learning• Fast (∼ 1000 values/s)• Low memory usage (only the excesses)

22

Page 76: Anomaly Detection with Extreme Value Theory

Other things ?

⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies

⊸ But it could be adapted to

• compute upper and lower thresholds• other fields• drifting contexts (with an additional parameter) → DSPOT

23

Page 77: Anomaly Detection with Extreme Value Theory

Other things ?

⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies

⊸ But it could be adapted to

• compute upper and lower thresholds• other fields• drifting contexts (with an additional parameter) → DSPOT

23

Page 78: Anomaly Detection with Extreme Value Theory

Other things ?

⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies

⊸ But it could be adapted to• compute upper and lower thresholds

• other fields• drifting contexts (with an additional parameter) → DSPOT

23

Page 79: Anomaly Detection with Extreme Value Theory

Other things ?

⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies

⊸ But it could be adapted to• compute upper and lower thresholds• other fields

• drifting contexts (with an additional parameter) → DSPOT

23

Page 80: Anomaly Detection with Extreme Value Theory

Other things ?

⊸ SPOT• performs dynamic thresholding without distribution assumption• uses it to detect network anomalies

⊸ But it could be adapted to• compute upper and lower thresholds• other fields• drifting contexts (with an additional parameter) → DSPOT

23

Page 81: Anomaly Detection with Extreme Value Theory

A recent example

⊸ Thursday the 9th of February 2017

• 9h : explosion at Flamanville nuclear plant• 11h : official declaration of the incident by EDF

⊸ What about the EDF stock prices ?

24

Page 82: Anomaly Detection with Extreme Value Theory

A recent example

⊸ Thursday the 9th of February 2017

• 9h : explosion at Flamanville nuclear plant• 11h : official declaration of the incident by EDF

⊸ What about the EDF stock prices ?

24

Page 83: Anomaly Detection with Extreme Value Theory

A recent example

⊸ Thursday the 9th of February 2017• 9h : explosion at Flamanville nuclear plant

• 11h : official declaration of the incident by EDF

⊸ What about the EDF stock prices ?

24

Page 84: Anomaly Detection with Extreme Value Theory

A recent example

⊸ Thursday the 9th of February 2017• 9h : explosion at Flamanville nuclear plant• 11h : official declaration of the incident by EDF

⊸ What about the EDF stock prices ?

24

Page 85: Anomaly Detection with Extreme Value Theory

A recent example

⊸ Thursday the 9th of February 2017• 9h : explosion at Flamanville nuclear plant• 11h : official declaration of the incident by EDF

⊸ What about the EDF stock prices ?

24

Page 86: Anomaly Detection with Extreme Value Theory

EDF stock prices

09:02

09:42

10:34

11:32

12:19

13:30

14:43

15:34

16:21

17:14

Time

9.05

9.10

9.15

9.20

9.25

9.30

9.35

9.40ED

F st

ock

price

()

25

Page 87: Anomaly Detection with Extreme Value Theory

EDF stock prices

09:02

09:42

10:34

11:32

12:19

13:30

14:43

15:34

16:21

17:14

Time

9.05

9.10

9.15

9.20

9.25

9.30

9.35

9.40ED

F st

ock

price

()

25

Page 88: Anomaly Detection with Extreme Value Theory

Conclusion

⊸ Context: A great deal of work has been done to developanomaly detection algorithms

⊸ Problem: Decision thresholds rely on either distributionassumption or expertise

⊸ Our solution: Building dynamic threshold with a probabilisticmeaning

• Application to detect network anomalies• But a general tool to monitor online time series in a blind way

⊸ Future: Adapt the method to higher dimensions

26

Page 89: Anomaly Detection with Extreme Value Theory

Conclusion

⊸ Context: A great deal of work has been done to developanomaly detection algorithms

⊸ Problem: Decision thresholds rely on either distributionassumption or expertise

⊸ Our solution: Building dynamic threshold with a probabilisticmeaning

• Application to detect network anomalies• But a general tool to monitor online time series in a blind way

⊸ Future: Adapt the method to higher dimensions

26

Page 90: Anomaly Detection with Extreme Value Theory

Conclusion

⊸ Context: A great deal of work has been done to developanomaly detection algorithms

⊸ Problem: Decision thresholds rely on either distributionassumption or expertise

⊸ Our solution: Building dynamic threshold with a probabilisticmeaning

• Application to detect network anomalies• But a general tool to monitor online time series in a blind way

⊸ Future: Adapt the method to higher dimensions

26

Page 91: Anomaly Detection with Extreme Value Theory

Conclusion

⊸ Context: A great deal of work has been done to developanomaly detection algorithms

⊸ Problem: Decision thresholds rely on either distributionassumption or expertise

⊸ Our solution: Building dynamic threshold with a probabilisticmeaning

• Application to detect network anomalies• But a general tool to monitor online time series in a blind way

⊸ Future: Adapt the method to higher dimensions

26

Page 92: Anomaly Detection with Extreme Value Theory

Conclusion

⊸ Context: A great deal of work has been done to developanomaly detection algorithms

⊸ Problem: Decision thresholds rely on either distributionassumption or expertise

⊸ Our solution: Building dynamic threshold with a probabilisticmeaning

• Application to detect network anomalies

• But a general tool to monitor online time series in a blind way

⊸ Future: Adapt the method to higher dimensions

26

Page 93: Anomaly Detection with Extreme Value Theory

Conclusion

⊸ Context: A great deal of work has been done to developanomaly detection algorithms

⊸ Problem: Decision thresholds rely on either distributionassumption or expertise

⊸ Our solution: Building dynamic threshold with a probabilisticmeaning

• Application to detect network anomalies• But a general tool to monitor online time series in a blind way

⊸ Future: Adapt the method to higher dimensions

26

Page 94: Anomaly Detection with Extreme Value Theory

Conclusion

⊸ Context: A great deal of work has been done to developanomaly detection algorithms

⊸ Problem: Decision thresholds rely on either distributionassumption or expertise

⊸ Our solution: Building dynamic threshold with a probabilisticmeaning

• Application to detect network anomalies• But a general tool to monitor online time series in a blind way

⊸ Future: Adapt the method to higher dimensions

26