midas: microcluster-based detector of anomalies in edge ...sbhatia/assets/pdf/midasslides.pdf ·...
TRANSCRIPT
1
MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams
Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos [email protected]
@siddharthb_
2Introduction Method Experiments ConclusionProblem Related Work Future Work
Intrusion Detection
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 3Introduction Method Experiments ConclusionProblem Related Work Future Work
Time Evolving Graphs
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
Computer Networks Twitter / IM Networks
4Introduction Method Experiments ConclusionProblem Related Work Future Work
Edge Streams
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
u1
u2
u3
u4
u5
a1 a2a3
u6
5Introduction Method Experiments ConclusionProblem Related Work
𝑢! 𝑢" 𝑢# 𝑢$ 𝑢$ 𝑢% 𝑢% 𝑎! 𝑎! 𝑎! 𝑎" 𝑎" 𝑎" 𝑎# 𝑎# 𝑎#1 2 2 3 4 4 5 6 6 6 6 6 6 6 6 6
𝑢" 𝑢! 𝑢% 𝑢# 𝑢& 𝑢& 𝑢& 𝑢" 𝑢# 𝑢% 𝑢" 𝑢% 𝑢& 𝑢# 𝑢% 𝑢&
Time
Future Work
Roadmap
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 6Introduction Method Experiments ConclusionProblem Related Work
• Problem• Algorithm•MIDAS•MIDAS-R
• Related Work • Experiments• Future Work
Future Work
Problem
T3.1: Detecting Anomalous Dynamic Subgraphs
Input:• Edge stream 𝑬 from time evolving graph 𝑮• Directed, multigraph, discrete time
Output:• Anomaly Score for each edge
Our Contributions:• Microcluster Detection• Guarantees on False Positive Probability• Constant Memory• Constant Update Time
7Introduction Method Experiments ConclusionProblem Related Work Future Work
Roadmap
T3.1: Detecting Anomalous Dynamic Subgraphs 8Introduction Method Experiments ConclusionProblem Related Work
• Problem• Algorithm•MIDAS•MIDAS-R
• Related Work • Experiments• Future Work
Future Work
T3.1: Detecting Anomalous Dynamic Subgraphs
Background
T3.1: Detecting Anomalous Dynamic Subgraphs
Offline:• Time series methods
Online:• Bounded memory• Set of nodes is not fixed
9Introduction Method Experiments ConclusionProblem Related Work
Detecting Microcluster
Future Work
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
Gaussian distribution:
• Find mean & standard deviation
• Compute Gaussian likelihood(edge occurrences at 𝑡)
• if 𝑙𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 < 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑, declare anomaly
10Introduction Method Experiments ConclusionProblem Related Work Future Work
Naive Solution
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
Past(t<10)
Current(t=10)
Expected 900 100Observed 0 1000
11Introduction Method Experiments ConclusionProblem Related Work
Mean Level at 𝑡 = 10 equals Mean Level for 𝑡 < 10
Our assumption
Future Work
T3.1: Detecting Anomalous Dynamic Subgraphs
𝑠𝑢𝑣 : 𝑢 − 𝑣 edges up to time 𝑡𝑎𝑢𝑣 : 𝑢 − 𝑣 edges at current time 𝑡
�̂�𝑢𝑣 : Approximate total count#𝑎𝑢𝑣 : Approximate current count
12Introduction Method Experiments ConclusionProblem Related Work
Streaming Data Structure: CMS
Future Work
Details
4𝑎𝑢𝑣 ≤ 𝑎𝑢𝑣 + 𝜐𝑁𝑡 with probability at least 1 − 𝜀
𝜐 is the amount of error we can tolerate.1 − 𝜀 is the probability.e.g. with 99% probability only up to 0.5% error
T3.1: Detecting Anomalous Dynamic Subgraphs
expectedobserved
Details
T3.1: Detecting Anomalous Dynamic Subgraphs 13Introduction Method Experiments ConclusionProblem Related Work Future Work
Anomaly Score: Chi-Squared Test
Roadmap
T3.1: Detecting Anomalous Dynamic Subgraphs 14Introduction Method Experiments ConclusionProblem Related Work
• Problem• Algorithm•MIDAS•MIDAS-R
• Related Work• Experiments• Future Work
Future Work
MIDAS-R: Incorporating Relations
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
Details
Temporal Relations:• Allows past edges to count toward the current time tick• Diminishing weight• At end of every time tick 𝑡, reduce counts by 𝛼 𝜖 [0,1]
15Introduction Method Experiments ConclusionProblem Related Work
1
2
3
4
5
6
7
8
9
444
888
Future Work
Spatial Relations:• Catch large groups of spatially nearby edges• 𝑠𝑢 : edges adjacent to node 𝑢 up to time 𝑡• 𝑎𝑢 : edges adjacent to node 𝑢 at current time 𝑡• max(𝑠𝑐𝑜𝑟𝑒 𝑢, 𝑣, 𝑡 , 𝑠𝑐𝑜𝑟𝑒 𝑢, 𝑡 , 𝑠𝑐𝑜𝑟𝑒 𝑣, 𝑡 )
Time and Memory Complexity
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
𝑑: number of hash functions𝑤: number of buckets
Space complexity:• 𝑂(𝑤𝑑)
Time complexity:• 𝑂(𝑑)
16Introduction Method Experiments ConclusionProblem Related Work Future Work
Roadmap
• Problem• Algorithm•MIDAS•MIDAS-R
• Related Work • Experiments• Future Work
T3.1: Detecting Anomalous Dynamic Subgraphs 17Introduction Method Experiments ConclusionProblem Related Work Future Work
Related Work
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
Anomaly Detection in Static Graphs:• Node detection: CATCHSYNC [2016], ODDBALL [2010]• Subgraph detection: [SHIN ET AL., 2018], FRAUDER [2017]• Edge detection: NRMF [2011], AUTOPART [2004]
Anomaly Detection in Graph Streams:• Node detection: DTA [2006] • Subgraph detection: CATCHSYNC [2016], COPYCATCH [2013]• Edge detection: ANOMRANK [2019], SPOTLIGHT [2018]
Anomaly Detection in Edge Streams:• Node detection: [YU ET AL., 2013]• Subgraph detection: DENSEALERT [2017]• Edge detection: SEDANSPOT [2018], RHSS [2016]
18Introduction Method Experiments ConclusionProblem Related Work Future Work
Comparison
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 19Introduction Method Experiments ConclusionProblem Related Work Future Work
Roadmap
20Introduction Method Experiments ConclusionProblem Related Work
• Problem• Algorithm•MIDAS•MIDAS-R
• Related Work • Experiments• Future Work
Future Work
Datasets
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
1. DARPA: 4.5M IP-IP communications, 87.7K minutes 2. TwitterSecurity: 2.6M tweets (May-August, 2014)3. TwitterWorldCup: 1.7M tweets (June-July, 2014)
21Introduction Method Experiments ConclusionProblem Related Work Future Work
Accuracy
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 22Introduction Method Experiments ConclusionProblem Related Work Future Work
Running Times
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 23Introduction Method Experiments ConclusionProblem Related Work Future Work
Accuracy vs Time
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 24Introduction Method Experiments ConclusionProblem Related Work
Accu
racy
(AU
C)
0
0.333
0.667
1
Wall-clock time (s)0.1 1 10 100
0.64
0.91 0.95
Midas-R Midas SedanSpot
Future Work
Scalability
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 25Introduction Method Experiments ConclusionProblem Related Work Future Work
Real-World Effectiveness
26T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
1. 13-05-2014. Turkey Mine Accident2. 24-05-2014. Raid3. 30-05-2014. Attack/Ambush
03-06-14. Suicide bombing4. 09-06-14. Suicide/Truck bombings5. 10-06-2014. Iraqi Militants Seize
11-06-2014. Kidnapping6. 15-06-14. Mpeketoni attack7. 26-06-14. Suicide Bombing8. 03-07-14. Gaza conflicts9. 18-07-14. Airplane shot10. 30-07-14. Ebola Virus Outbreak
Real-World Effectiveness
27T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
1. 13-05-2014. Turkey Mine Accident2. 24-05-2014. Raid3. 30-05-2014. Attack/Ambush
03-06-14. Suicide bombing4. 09-06-14. Suicide/Truck bombings5. 10-06-2014. Iraqi Militants Seize
11-06-2014. Kidnapping6. 15-06-14. Mpeketoni attack7. 26-06-14. Suicide Bombing8. 03-07-14. Gaza conflicts9. 18-07-14. Airplane shot10. 30-07-14. Ebola Virus Outbreak
Real-World Effectiveness
28T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
1. 13-05-2014. Turkey Mine Accident2. 24-05-2014. Raid3. 30-05-2014. Attack/Ambush
03-06-14. Suicide bombing4. 09-06-14. Suicide/Truck bombings5. 10-06-2014. Iraqi Militants Seize
11-06-2014. Kidnapping6. 15-06-14. Mpeketoni attack7. 26-06-14. Suicide Bombing8. 03-07-14. Gaza conflicts9. 18-07-14. Airplane shot10. 30-07-14. Ebola Virus Outbreak
Real-World Effectiveness
29T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
1. 13-05-2014. Turkey Mine Accident2. 24-05-2014. Raid3. 30-05-2014. Attack/Ambush
03-06-14. Suicide bombing4. 09-06-14. Suicide/Truck bombings5. 10-06-2014. Iraqi Militants Seize
11-06-2014. Kidnapping6. 15-06-14. Mpeketoni attack7. 26-06-14. Suicide Bombing8. 03-07-14. Gaza conflicts9. 18-07-14. Airplane shot10. 30-07-14. Ebola Virus Outbreak
Real-World Effectiveness
30T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs
1
2
3
4
5
6
7
8
9
444
888
Roadmap
31Introduction Method Experiments ConclusionProblem Related Work Future Work
• Problem• Algorithm•MIDAS•MIDAS-R
• Related Work • Experiments• Future Work
MSTREAM
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 32Introduction Method Experiments ConclusionProblem Related Work
Challenges:
• High number of dimensions• Real-valued features• Correlation between features• Constant memory & time both w.r.t. stream length
and dimensionality
Future Work
33
How to Contribute?
1. MIDAS-F: Filtering MIDAS
2. Parallel/Distributed/GPU version
3. Hardware implementation over FPGA
4. Knowledge Graphs (NLP)
5. Periodic setting
6. Erlang, F# implementations
Introduction Method Experiments ConclusionProblem Related Work Future Work
Conclusion
T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 34Introduction Method Experiments ConclusionProblem Related Work
1. Streaming Microcluster Detection:• Constant time and memory
2. Theoretical Guarantees:• False Positive Probability
3. Effectiveness:• MIDAS is 42%-48% more accurate• MIDAS processes the data 162×-644× faster
Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin and Christos Faloutsos. “MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams.” AAAI Conference on Artificial Intelligence (AAAI), 2020. https://arxiv.org/abs/1911.04464
Future Work
https://github.com/Stream-AD/MIDAS/