Forecast, Detect, Intervene: Anomaly Detection for Time Series. Deepak Agarwal Yahoo! Research

Post on 04-Jan-2016


Page 1: Forecast, Detect, Intervene: Anomaly Detection for Time Series. Deepak Agarwal Yahoo! Research

Forecast, Detect, Intervene: Anomaly Detection for Time Series

Deepak Agarwal

Yahoo! Research

Page 2

Outline

• Approach
  – Forecast
  – Detect
  – Intervene
• Monitoring multiple series
  – Multiple testing, a Bayesian solution
• Application

Page 3

Issues

• $\{y_t\}$: univariate, regularly spaced time series to be monitored prospectively for anomalies, “novel events”, surprises.
  – E.g. query volume, hang-ups, ER admissions
• Goal: a semi-automated statistical approach
  – Forecast accurately: good baseline model.
  – Detect deviations from baseline:
    • sensitivity/specificity/timeliness
  – Baseline model adaptive: learn changes automatically
    • Important in applications: better forecasts → fewer false positives

Page 4

Approach

• Three components (West and Harrison, 1976):
  – Forecast: Bayesian version of Kalman filter
  – Detection: a new sequential algorithm
  – Intervention: correct baseline model.

Page 5

Forecasting

Page 6

Kalman filter

• Observation Equation
  – Conditional distribution of data given parameters
• State Equation
  – Evolution of parameters (states) through time
• Posterior of states, predictive distribution
  – Estimated online by recursive algorithm

Page 7

OBSERVATION EQUATION

$y_t = \mu_t + v_t, \quad v_t \sim N(0, V_t)$

$y_t$: observed value
$\mu_t$: mean, the "truth" (unknown and estimated from data)
$v_t$: noise with variance $V_t$ (usually unknown and estimated from data)

The $y_t$'s are independent conditional on the $\mu_t$'s, i.e., if the truth is known, the past $y_s$'s ($s \le t-1$) provide no information to predict the future.

This model severely overfits (more unknowns than knowns). Need to make simplifying assumptions to estimate the true mean surface $\{\mu_t\}_{t=1}^{T}$.

Page 8

STATE EQUATION: What assumptions are appropriate for the “Truth”?

Simple: $\mu_t = \mu$ for all $t$.
Too simple, performs poorly unless mean stationary. (Rarely works on real data.)

Determined by system dynamics: e.g., $\mu_t$ is the position of a particle at time $t$, described by some differential equation. (Wikle, 1998) studies ecological processes using reaction-diffusion equations.

Simple Markovian assumptions: work well for empirical data analysis. Assume $\mu_t$ is a function of the true mean at previous time points.
e.g. Simple Random Walk: $\mu_t = \mu_{t-1} + w_t, \quad w_t \sim N(0, W_t)$
NB: $W_t = 0$ gives back the constant model.

Page 9

More general models

Observation equation: $y_t = x_t^T \theta_t + v_t, \quad v_t \sim N(0, V_t)$
State equation: $\theta_t = G_t \theta_{t-1} + w_t, \quad w_t \sim N(0, W_t)$
$\theta_t$ updated using Kalman filter.

One can almost take any static model and make it dynamic.
e.g., $\theta_t$ is a 7-dim vector corresponding to day of week effects.
Covariates whose coefficients evolve dynamically.

[Diagram: states $\theta_{t-1}$ and $\theta_t$ linked by $G_t$, with observations $Y_{t-1}$, $Y_t$ and covariates $X_{t-1}$, $x_t$.]
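For reference, the textbook Kalman recursion for this general model can be written as below. This is a sketch of the standard dynamic linear model update, not taken from the slides; the symbols $m_t$, $C_t$, $a_t$, $R_t$, $f_t$, $Q_t$, $A_t$, $e_t$ are assumed to carry the same meaning as in the scalar update that follows.

```latex
\begin{aligned}
a_t &= G_t m_{t-1}, & R_t &= G_t C_{t-1} G_t^{T} + W_t \\
f_t &= x_t^{T} a_t, & Q_t &= x_t^{T} R_t x_t + V_t \\
A_t &= R_t x_t / Q_t, & e_t &= y_t - f_t \\
m_t &= a_t + A_t e_t, & C_t &= R_t - A_t A_t^{T} Q_t
\end{aligned}
```

Setting $x_t = 1$ and $G_t = 1$ reduces these to the random-walk case.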

Page 10

Kalman Filter update at time t:

(a) Posterior for $\mu_{t-1}$: $(\mu_{t-1} \mid D_{t-1}) \sim N(m_{t-1}, C_{t-1})$
(b) Prior for $\mu_t$: $(\mu_t \mid D_{t-1}) \sim N(a_t, R_t)$
    Prior parameters derived by plugging posterior estimates in state equation:
    $a_t = m_{t-1}; \quad R_t = C_{t-1} + W_t$
(c) Likelihood of $y_t$: $(y_t \mid \mu_t) \sim N(\mu_t, V_t)$
(d) Posterior for $\mu_t$: $(\mu_t \mid D_t) \sim N(m_t, C_t)$
    $m_t = m_{t-1} + A_t (y_t - m_{t-1}); \quad C_t = A_t V_t; \quad A_t = R_t / (R_t + V_t)$
    $m_t = A_t y_t + (1 - A_t) m_{t-1}$

If $A_t \to A$, this gives the well known exponentially weighted moving average (EWMA). Under mild conditions, this is true.
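The update for the random-walk model can be sketched in a few lines of Python. This is a minimal illustration, assuming constant and known variances `V` and `W`; the function names are hypothetical. `limiting_coefficient` solves the fixed point of the $A_t$ recursion, which is the constant EWMA weight the filter settles to.

```python
import math

def local_level_step(m_prev, C_prev, y, V, W):
    """One Kalman update for y_t = mu_t + v_t, mu_t = mu_{t-1} + w_t."""
    a = m_prev                 # prior mean a_t = m_{t-1}
    R = C_prev + W             # prior variance R_t = C_{t-1} + W_t
    A = R / (R + V)            # adaptive coefficient A_t
    m = a + A * (y - a)        # posterior mean m_t
    C = A * V                  # posterior variance C_t
    return m, C, A

def limiting_coefficient(V, W):
    """Fixed point of the A_t recursion when V and W are constant (assumption)."""
    ratio = W / V
    return (-ratio + math.sqrt(ratio * ratio + 4.0 * ratio)) / 2.0
```

Iterating `local_level_step` on any series with fixed `V` and `W` drives `A` toward `limiting_coefficient(V, W)`, at which point the update is an EWMA.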

Page 11

Estimating Variance components

Assume $W_t \propto C_{t-1}$:
$W_t = (1 - \delta_w) C_{t-1} / \delta_w; \quad R_t = C_{t-1} / \delta_w \quad (0 < \delta_w < 1)$

$\delta_w$ is called the "discount" factor; it makes the prior at $t$ vague so that current estimates are influenced more by the recent past.
$\delta_w \to 0$: prior too vague.
$\delta_w \to 1$: prior too tight, close to a static model.
$\delta_w$ plays a role similar to window size in streaming algorithms.

In practice, it is better to be conservative and choose smaller $\delta_w$.
Reason: $Var(e_t \mid D_0) = Q\,(1 + (\delta_w - \delta_0)^2 / (1 - \delta_w^2))$

We select $\delta_w$ using initial data and re-adjust the value later if needed.
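A small sketch of discounting, assuming a known constant observation variance `V` (function names are hypothetical). With $R_t = C_{t-1}/\delta_w$, the adaptive coefficient settles at $1 - \delta_w$, which is why the discount factor behaves like a window size. `variance_inflation` evaluates the multiplier $1 + (\delta_w - \delta_0)^2/(1 - \delta_w^2)$, read here as the limiting forecast-variance penalty for using $\delta_w$ when the true discount is $\delta_0$; that reading is an assumption.

```python
def discount_step(m_prev, C_prev, y, V, delta_w):
    """One local-level update with W_t implied by the discount factor."""
    R = C_prev / delta_w              # R_t = C_{t-1} / delta_w
    A = R / (R + V)                   # adaptive coefficient A_t
    m = m_prev + A * (y - m_prev)     # posterior mean
    C = A * V                         # posterior variance
    return m, C, A

def variance_inflation(delta_w, delta_0):
    """Multiplier 1 + (delta_w - delta_0)^2 / (1 - delta_w^2)."""
    return 1.0 + (delta_w - delta_0) ** 2 / (1.0 - delta_w ** 2)
```

For a true discount of 0.8, overshooting to 0.9 inflates the forecast variance more than undershooting to 0.7, which matches the advice to err toward smaller values.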

Page 12

Detection

Page 13

An existing method

Related work (Gargallo and Salvador, '05)

Residual when predictive distribution Gaussian:
$u_t = (y_t - E(y_t \mid D_{t-1})) / \sqrt{V(y_t \mid D_{t-1})}$

- Under null, $u_t$'s iid $N(0, 1)$.
- A sequential algorithm based on Bayes factors.
- Detects: a) outlier b) mean shifts c) variance changes d) auto-correlated errors.

Page 14

Page 15

Pitfalls of GS

• What if predictive not Gaussian?
  – Mixtures of Gaussians, Poisson etc.
• Bayes factor: specify alternative explicitly
  – Large number of unspecified parameters
  – Require explicit model for each alternative

Page 16

Our approach

• Normal scores derived from p-values
  – Good for continuous, approximately good for discrete, especially for large means.
• A sequential procedure with far fewer tuning parameters.
• Our method has more power; we sacrifice some timeliness.
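As an illustration of a normal score for a discrete predictive distribution, the sketch below assumes a Poisson predictive with a known mean. The mid-p convention and the function names are assumptions for this example, not something the slides specify.

```python
import math
from statistics import NormalDist

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), by direct summation of the pmf."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for j in range(1, k + 1):
        term *= mean / j
        total += term
    return min(total, 1.0)

def normal_score(y, mean):
    """Map an observed count to a normal score via its mid-p value (assumption)."""
    below = poisson_cdf(y - 1, mean)
    at = poisson_cdf(y, mean) - below
    p = min(max(below + 0.5 * at, 1e-12), 1.0 - 1e-12)  # keep inv_cdf in range
    return NormalDist().inv_cdf(p)
```

Counts near the predictive mean map to scores near zero, and counts far in the upper tail map to large positive scores.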

Page 17

Sequential detection procedure

At time t, we are in one of these regions:
• Acceptance region (A): The null model is true, the system is behaving as expected, no anomalies. Start a new run.
• Rejection region (R): The null model is not true; an anomaly is generated, which is reported to the user and/or the forecasting model is reset. Start a new run.
• Continue (C): Don’t have enough data to reach a decision; keep accumulating evidence by taking another sample.

Page 18

Detecting outliers and mean shifts

Outlier: $|u_t| > a$

All other changes are based on the most recent run of $r$ ($> 1$) points in C.

Mean shifts
$M_h$: the $u_s$'s are iid $N(h, 1)$ ($|h| > 0$).
Test based on $\bar{u}_r = \sum_{i=1}^{r} u_{t-i+1} / r \sim N(0, 1/r)$ under null.
Test statistics:
$Pmean_{inc} = P_{h=0}(\bar{u}_r \ge \bar{u}_r^{obs})$ (to detect mean increase)
$Pmean_{dec} = P_{h=0}(\bar{u}_r \le \bar{u}_r^{obs})$ (to detect mean decrease)

Large shifts would be identified as outliers. Moderate shifts, when identified, can provide important information about the system.
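The two mean-shift p-values can be computed directly from the current run of normal scores. A minimal sketch, with a hypothetical function name, using the $N(0, 1/r)$ null distribution of the run mean:

```python
import math
from statistics import NormalDist, fmean

def mean_shift_pvalues(run):
    """Return (Pmean_inc, Pmean_dec) for a run of normal scores."""
    r = len(run)
    z = fmean(run) * math.sqrt(r)      # run mean is N(0, 1/r) under the null
    lower = NormalDist().cdf(z)        # P(u_bar <= observed)
    return 1.0 - lower, lower
```

A run of four scores all equal to 1 has a standardized mean of 2, so the increase p-value is small even though no single point would be flagged as an outlier.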

Page 19

Detecting variance shifts

$M_k$: the $u_s$'s are iid $N(0, k^2)$
Test based on $U_r = \sum_{i=0}^{r-1} u_{t-i}^2 \sim \chi^2_{(df)}$ with $df = r$ under null.
Test statistics:
$Pvar_{inc} = P_{k=1}(U_r \ge U_r^{obs})$ (to detect variance increase)
$Pvar_{dec} = P_{k=1}(U_r \le U_r^{obs})$ (to detect variance decrease)

Large variance increase will be identified as an outlier; moderate increase, when detected, can be important in applications.

Variance decrease indicates the system getting stable or the discount parameters being set too low. Helps in improving the forecasting model, may not be that important to the user.
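A sketch of the variance-shift p-values using only the standard library. The chi-square CDF is computed here from the power series of the regularized incomplete gamma function, which is one implementation choice among several; the function names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)            # next term z^n / (a (a+1) ... (a+n))
        total += term
        n += 1
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))

def variance_shift_pvalues(run):
    """Return (Pvar_inc, Pvar_dec) for a run of normal scores."""
    U = sum(u * u for u in run)        # sum of squares, chi-square with df = r
    lower = chi2_cdf(U, len(run))
    return 1.0 - lower, lower
```

The series converges for any positive argument, though a library routine would be the more usual choice for very long runs.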

Page 20

Gradual changes, auto-correlated errors.

Local patches of auto-correlated errors (Gargallo and Salvador, 2003).
Roughly, the residuals follow an AR(1) process.

$M_g$: $u_i - g\,u_{i-1} \sim N(0, 1)$ ($0 < g < 1$)

Residuals tend to be positive (negative) more often than negative (positive). Gradual ramp up (down) in the residuals, short run of consecutive moderate changes.
Report the change, relax the discount factors to learn the gradual shifts.

Use two test statistics:
$g = \sum_{i=0}^{r-2} u_{t-i} u_{t-i-1} / \sum_{i=0}^{r-2} u_{t-i-1}^2 \sim N(0, 1/r)$ under null for large $r$.
For small $r$, exact distribution not available in closed form.
[The null distribution of a second statistic based on $u_t$ is not recoverable from the transcript.]

Test statistics: $AC_1 = P_{g=0}(g > g^{(obs)})$; $AC_2 = \min(P_{g=0}(g > g^{(obs)}), P_{null}(|u_t| > |u_t^{obs}|))$
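The lag-one statistic can be sketched as the least-squares AR(1) coefficient over the current run. This assumes the denominator uses the lagged squares, and the p-value below uses only the large-$r$ normal approximation, so it is not appropriate for short runs. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def lag_one_statistic(run):
    """g = sum(u_i * u_{i-1}) / sum(u_{i-1}^2) over the run (assumed form)."""
    num = sum(run[i] * run[i - 1] for i in range(1, len(run)))
    den = sum(run[i - 1] ** 2 for i in range(1, len(run)))
    return num / den

def ac1_pvalue_large_r(run):
    """P(g > observed) under g = 0, using g ~ N(0, 1/r); large-r approximation."""
    z = lag_one_statistic(run) * math.sqrt(len(run))
    return 1.0 - NormalDist().cdf(z)
```

A steadily ramping run gives a large positive statistic, while an alternating run gives a negative one and would not trigger this test.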

Page 21

Distribution of g under g = 0

$r = 2$: Cauchy with scale 1 and center 0.
$r > 2$: analytical form not known, well approximated by a mixture of two normals:
$f(\sqrt{r}\,g) = P_r N(0, 1) + (1 - P_r) N(0, 1 + \tau_r)$
$(P_r, \tau_r)$ estimated using simulations from the distribution.
For $r \ge 7$, the normal approximation is good.

 r     P       τ
 3    .84    14.91
 4    .87     4.40
 5    .96     3.77
 6    .98     3.98
 7    .99     3.66
 8    .98     0.10
 9    .98     0.10
10    .85     0.09
11    .85     0.09
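The null distribution of g can be explored by simulation. The sketch below assumes g is the lag-one ratio with lagged squares in the denominator, so that for $r = 2$ it reduces to a ratio of two independent standard normals. Function names and the seed are illustrative.

```python
import random

def simulate_g(r, n_draws, seed=0):
    """Draw g under the null (iid N(0,1) scores) for runs of length r."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        u = [rng.gauss(0.0, 1.0) for _ in range(r)]
        num = sum(u[i] * u[i - 1] for i in range(1, r))
        den = sum(u[i - 1] ** 2 for i in range(1, r))
        draws.append(num / den)
    return draws

def fraction_within(draws, bound):
    """Share of draws with absolute value below the bound."""
    return sum(abs(g) < bound for g in draws) / len(draws)
```

A standard Cauchy puts half of its mass in (-1, 1), so `fraction_within(simulate_g(2, 20000), 1.0)` should land close to 0.5.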

Page 22

The sequential algorithm at time t

$r = 1$: if $|u_t| > a$, outlier; else continue.
$r > 1$: if $|u_t| > a$, outlier; else compute
  $S_{max} = \max_p \Phi^{-1}(1 - p)$ over $p \in \{Pmean_{inc}, Pmean_{dec}, Pvar_{inc}, Pvar_{dec}, AC_2\}$
  $i_{max} = \arg\max_p \Phi^{-1}(1 - p)$ over the same set
  $S_{max} < c_1$: accept null;
  $S_{max} > c_2$: declare anomaly $i_{max}$;
  $c_1 \le S_{max} \le c_2$: continue.

$a = 3.5$, [symbol lost in transcript] $= .25$, $c_1 = 1.2$, $c_2 = 3.0$ for 5% false positive rate.
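One step of the decision rule can be sketched as follows. The conversion of each p-value to a normal score as $\Phi^{-1}(1 - p)$ is an assumption about the garbled formula, and the function name, return shape, and dictionary keys are hypothetical; the default thresholds are the ones quoted on the slide.

```python
from statistics import NormalDist

def sequential_step(u_t, r, pvalues, a=3.5, c1=1.2, c2=3.0):
    """Return (decision, label) for the current run of length r.

    pvalues maps a test name (e.g. "Pmean_inc") to its p-value.
    """
    if abs(u_t) > a:
        return "outlier", None
    if r == 1:
        return "continue", None
    scores = {}
    for name, p in pvalues.items():
        p = min(max(p, 1e-12), 1.0 - 1e-12)         # keep inv_cdf in range
        scores[name] = NormalDist().inv_cdf(1.0 - p)
    i_max = max(scores, key=scores.get)
    s_max = scores[i_max]
    if s_max < c1:
        return "accept", None
    if s_max > c2:
        return "anomaly", i_max
    return "continue", None
```

After "accept", "outlier", or "anomaly" the caller would start a new run; after "continue" it would extend the current run with the next score.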

Page 23

Blue: ours; red: Gargallo and Salvador (GS)

Page 24

Page 25

Intervention

Page 26

Intervention to adjust the baseline.

• Outlier → A tail or rare event has occurred
  – Ignore points → short tail; more false positives
  – Use points → elongated tail; more false negatives
• A robust solution: ignore points but elongate tail
  – retain same prior mean, increase prior variance.
  – system adapts, re-initializing the monitor.
• Use the above for mean shifts and variance increase.
• Variance decrease: System stable, make prior tight.
• Slow changes: System under-adaptive, make prior vague.

Page 27

Intervention strategy

Outlier, mean shift, variance increase:
$(\mu_t \mid D_{t-1}) \sim N(m_{t-1}, m^2 R_t)$ ($m = 2, 3$); $\delta_v^{new} = \max(.9, .95\,\delta_v)$.

Variance decrease:
$\delta_w^{new} = \min(1, 1.025\,\delta_w)$; $\delta_v^{new} = \min(1, 1.025\,\delta_v)$

Slow change, positive autocorrelation, persistent blips:
$\delta_w^{new} = \max(.7, .95\,\delta_w)$; $\delta_v^{new} = \max(.9, .95\,\delta_v)$
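The three intervention rules can be collected into one function. This is a sketch under the reading that the prior variance is multiplied by $m^2$ and that $\delta_w$, $\delta_v$ are the state and observation-variance discount factors; the function name, the `kind` labels, and the parameter name `scale` (standing in for the slide's multiplier m) are hypothetical.

```python
def intervene(kind, m_prev, R, delta_w, delta_v, scale=3):
    """Return (prior_mean, prior_variance, delta_w_new, delta_v_new)."""
    if kind in ("outlier", "mean_shift", "variance_increase"):
        # keep the prior mean, inflate the prior variance, loosen delta_v
        return m_prev, scale ** 2 * R, delta_w, max(0.9, 0.95 * delta_v)
    if kind == "variance_decrease":
        # system is stable: tighten both discount factors, capped at 1
        return m_prev, R, min(1.0, 1.025 * delta_w), min(1.0, 1.025 * delta_v)
    if kind == "slow_change":
        # system is under-adaptive: relax both discount factors
        return m_prev, R, max(0.7, 0.95 * delta_w), max(0.9, 0.95 * delta_v)
    raise ValueError("unknown anomaly kind: " + kind)
```

With `scale=1` the first rule leaves the prior variance untouched, which corresponds to the "no intervention" setting.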

Page 28

No intervention, m=1

Page 29

Strong intervention, m=3

Page 30

Example: Blue is data, yellow is forecast.

Page 31

Multiple testing

Page 32

Multiple testing: A Bayesian Approach.

• Monitoring large number of independent streams
  – testing multiple hypotheses at each time point
  – Need correction for multiple testing.
• Main idea:
  – Derive an empirical null based on observed deviations
  – Present analyst with interesting cases adjusting for global characteristics of the system.
  – We use a Bayesian approach to derive shrinkage estimates of deviations
  – the “shrunk” deviations automatically build in penalty for conducting multiple tests.

Page 33

Bayesian procedure.

$u_{st}$: normal score based on p-value of predictive distribution.

$u_{st} \mid \mu_{st} \sim N(\mu_{st}, \sigma_t^2); \quad \mu_{st} \sim P_t\,1(\mu_{st} = 0) + (1 - P_t)\,N(0, \tau_t^2).$

Detect anomalies by monitoring posterior means of $\mu_{st}$:
$E(\mu_{st} \mid u_{st}; P_t, \sigma_t^2, \tau_t^2) = (1 - q_{st})((1 - B_t)\,u_{st})$
$q_{st}/(1 - q_{st}) = P_t\,N(u_{st}; 0, \sigma_t^2) / \big((1 - P_t)\,N(u_{st}; 0, \sigma_t^2 + \tau_t^2)\big)$
$B_t = \sigma_t^2 / (\sigma_t^2 + \tau_t^2).$

Two component: prevents over-shrinkage of large residuals.
Hyperparameters: estimated using EB (large number of time series).
Fully Bayesian: Gauss-Hermite (moderate number of time series).
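The shrinkage can be illustrated with a two-component prior: a point mass at zero with weight `p_null`, and a $N(0, \tau^2)$ component otherwise. This is a sketch of that standard model with hyperparameters treated as known; the function and parameter names are hypothetical, and estimating the hyperparameters is not shown.

```python
import math
from statistics import NormalDist

def posterior_mean(u, p_null, sigma2, tau2):
    """Posterior mean of the deviation under a point-mass-plus-normal prior."""
    null_density = NormalDist(0.0, math.sqrt(sigma2)).pdf(u)
    alt_density = NormalDist(0.0, math.sqrt(sigma2 + tau2)).pdf(u)
    weighted_null = p_null * null_density
    q = weighted_null / (weighted_null + (1.0 - p_null) * alt_density)
    B = sigma2 / (sigma2 + tau2)       # shrinkage factor of the normal component
    return (1.0 - q) * (1.0 - B) * u
```

Small scores are pulled almost all the way to zero, while a score of 6 is barely shrunk, which is the "prevents over-shrinkage of large residuals" behavior.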

Page 34

Experiment comparing multiple testing versus naïve procedure (threshold raw standardized residuals)

• Simulate K noise points from N(0,1) (K = 500, 1000, ...) and 100 signal points from $[2, 11] \cup [-11, -2]$.
• Adjust threshold of Bayesian residuals to match sensitivity of naive procedure.
• Compute False Discovery Rate (FDR) for both procedures.

Page 35

FDR of naive and Bayesian procedures. The Bayesian method gets better with increase in number of time series.

Calculations based on 100 replications. The differences are statistically significant.

Page 36

Application

Page 37

Motivating Application (bio-surveillance).

• Goal: To find leading indicators of social disruption events in China before they get reported in the mainstream media.
• Approach: Monitor the occurrence of a set of pre-defined patterns on a collection of Chinese websites (mainly news sites, government sites and portals similar to Yahoo! located in eastern China).

Page 38

English translation of some Chinese patterns being monitored

Page 39

Notations and transformation.

$S_{ijt}$: freq of the $i$th pattern on the $j$th website on day $t$.
$n_{ijt}$: number of pages downloaded.
$r_{ijt} = S_{ijt} / n_{ijt}$: occurrence rate.
Under a Poisson model, $E(r_{ijt}) = \lambda_{ijt}$; $Var(r_{ijt}) = \lambda_{ijt} / n_{ijt}$.
Freeman-Tukey transformation to remove mean-variance dependence; symmetrize the distribution of rates:
$Z_{ijt} = .5\,(\sqrt{1000\,S_{ijt}/n_{ijt}} + \sqrt{1000\,(S_{ijt} + 1)/n_{ijt}})$
$Var(Z_{ijt}) \propto 1/n_{ijt}.$
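The transformation is simple to compute, and its variance-stabilizing property can be checked numerically. In the sketch below, `ft_variance_unscaled` (a hypothetical helper) sums the Poisson pmf to get the exact variance of $\sqrt{X} + \sqrt{X+1}$, which stays near 1 for moderate means; the variance of Z is then a constant over the page count, independent of the rate.

```python
import math

def freeman_tukey(S, n):
    """Z = .5 * (sqrt(1000*S/n) + sqrt(1000*(S+1)/n))."""
    return 0.5 * (math.sqrt(1000.0 * S / n) + math.sqrt(1000.0 * (S + 1) / n))

def ft_variance_unscaled(mean):
    """Exact Var(sqrt(X) + sqrt(X+1)) for X ~ Poisson(mean), by pmf summation."""
    pmf = math.exp(-mean)
    first = second = 0.0
    upper = int(mean + 20.0 * math.sqrt(mean) + 50)   # far into the upper tail
    for k in range(upper + 1):
        value = math.sqrt(k) + math.sqrt(k + 1)
        first += pmf * value
        second += pmf * value * value
        pmf *= mean / (k + 1)                          # advance to P(X = k+1)
    return second - first * first
```

For example, a pattern seen 9 times in 1000 downloaded pages gives `freeman_tukey(9, 1000)`, the average of 3 and the square root of 10.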

Page 40

Dotted solid lines: Days when reports appeared in mainstream media

Dotted gray lines: Days when our system found spikes related to the reports that appeared later.

Page 41

Rough validation using actual media reports.

• July 24th: mystery illness kills 17 people in China; we noticed several spikes on July 17th and 18th alerting us to this.
• Sept 29th and Dec 7th: on Sept 29th, news reports of China drawing up emergency plans to fight bird flu and prevent it from spreading to humans. On Dec 7th, a confirmed case of bird flu in humans was reported.
• We reported several spikes on Sept 12th and 14th, and on Nov 2nd, 7th, 11th, and 16th, mostly for the pattern influenza, flu, pneumonia, meningitis. On Nov 21st, four big spikes on bf3.syd.com.cn on influenza, flu, pneumonia, meningitis; emergency, disaster, crisis; prevention and quarantine.

Page 42

Questions?