Streaming Models and Algorithms for Communication and Information Networks
Brian Thompson (joint work with James Abello)
Outline
• Introduction and Motivation
• A Streaming Model
• Our Approach
• Algorithms
• Experimental Results
• Conclusions and Future Work
Problem Description
Data: A network (G, T), where G = (V, E) is a graph and T is a set of time-stamped events corresponding to nodes or edges in G
Goals: Identify recent correlated activity; measure influence between entities
Challenges:
• Scalability: networks may be very large, and space is limited
• Efficiency: the data rate is high and the information is time-sensitive
• Variability: entities have different temporal dynamics
Related Work
• Time-evolving graph model: a sequence of "snapshots" (t = 1, 2, 3, 4)
• Time series analysis
[Figure: IP traffic (MB per hour), hourly from 12:00 AM to 11:00 PM]
Related Work
• Cascade model: a set of seed nodes from which information (a product, news, a virus) propagates through the network
Data Model
• G is a graph
• T is a set of time-stamped events corresponding to nodes or edges in G

Source | Recipient | Content | Timestamp
Alice | (public) | "Fire at 2nd & Main!" | Tuesday, 9:25am
Bob | Cheng | (private message) | Tuesday, 9:27am
Cheng | (public) | "RT @Alice Fire ..." | Tuesday, 9:28am

[Figure: communication graph on the nodes Alice, Bob, Cheng, Devika, and Elina]
Data Model (Node-centric)
[Figure: events grouped by node: Alice, Bob, Cheng, Devika, Elina]
Data Model (Edge-centric)
[Figure: events grouped by edge, on the nodes Alice, Bob, Cheng, Devika, Elina]
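The node-centric and edge-centric views are two groupings of the same event stream T. The minimal Python sketch below assumes each event is a (source, recipient, timestamp) record, with recipient `None` for a public post; the `Event` type and both helper functions are hypothetical names, and ignoring public posts in the edge-centric view is our assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One time-stamped communication (hypothetical record layout)."""
    source: str
    recipient: Optional[str]  # None stands for a public broadcast
    timestamp: float          # e.g. minutes after some reference time


def node_centric(events: List[Event]) -> Dict[str, List[float]]:
    """Map each source node to the sorted times of its outgoing events."""
    seqs: Dict[str, List[float]] = defaultdict(list)
    for e in events:
        seqs[e.source].append(e.timestamp)
    return {node: sorted(ts) for node, ts in seqs.items()}


def edge_centric(events: List[Event]) -> Dict[Tuple[str, str], List[float]]:
    """Map each directed edge (source, recipient) to its sorted event times."""
    seqs: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for e in events:
        if e.recipient is not None:  # public posts have no edge (assumption)
            seqs[(e.source, e.recipient)].append(e.timestamp)
    return {edge: sorted(ts) for edge, ts in seqs.items()}


# The three example messages, as minutes after 9:00am on Tuesday.
stream = [
    Event("Alice", None, 25.0),
    Event("Bob", "Cheng", 27.0),
    Event("Cheng", None, 28.0),
]
```

Each resulting list is a discrete-event sequence that can then be modeled as its own renewal process.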
Renewal Theory
A renewal process $\Phi$ is a continuous-time Markov process where state transitions occur with holding times sampled independently from a positive distribution $F_\Phi$.
Let $x_1, x_2, \ldots$ be samples from $F_\Phi$, and consider a sequence of events $t_1 < t_2 < \cdots$ corresponding to those holding times, i.e. $t_k = x_1 + \cdots + x_k$.
We call the $x_i$ inter-arrival times, and refer to the sequence $(t_1, t_2, \ldots)$ as the discrete-event sequence for $\Phi$.
[Figure: a discrete-event sequence t1, ..., t5 on a timeline starting at 0]
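A discrete-event sequence can be generated straight from this definition by drawing i.i.d. inter-arrival times and taking running sums. A minimal Python sketch; the function name `discrete_event_sequence` is hypothetical, and the exponential holding times are only one example of a positive distribution.

```python
import random
from itertools import accumulate
from typing import Callable, List


def discrete_event_sequence(sample_iat: Callable[[], float], n: int) -> List[float]:
    """Return event times t_1 < ... < t_n with t_k = x_1 + ... + x_k.

    The x_i are drawn independently by calling `sample_iat`, which should
    sample a positive inter-arrival time from the chosen distribution.
    """
    iats = [sample_iat() for _ in range(n)]
    if any(x <= 0 for x in iats):
        raise ValueError("inter-arrival times must be positive")
    return list(accumulate(iats))


rng = random.Random(42)
# Exponential holding times with mean 10: an illustrative choice only.
events = discrete_event_sequence(lambda: rng.expovariate(1 / 10), 5)
```

Passing the sampler as a function keeps the sequence construction independent of the inter-arrival time distribution.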
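Renewal Theory
The age of a renewal process $\Phi$ at time $t$ is the amount of time elapsed since the last event:
$$Age_\Phi(t) = \begin{cases} t - \max\{t_i : t_i \le t\} & \text{if } t \ge t_1 \\ \infty & \text{otherwise} \end{cases}$$
[Figure: timeline from 0 through events t1, ..., t5, with Age_Φ(t) marked at a time t]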
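Because the event times are sorted, the age at any query time can be found by binary search. A minimal Python sketch (the function name `age` is hypothetical); it treats an event at exactly time t as the last event, so the age resets to 0 at each t_i.

```python
import math
from bisect import bisect_right
from typing import Sequence


def age(events: Sequence[float], t: float) -> float:
    """Age_Phi(t): time since the last event at or before t, or inf before t_1.

    `events` must be sorted in increasing order.
    """
    k = bisect_right(events, t)  # number of events with t_i <= t
    if k == 0:
        return math.inf
    return t - events[k - 1]


ts = [1.0, 3.0, 4.5, 8.0, 9.0]
```

For example, `age(ts, 6.0)` is 1.5 because the last event before 6.0 occurred at 4.5, and `age(ts, 0.5)` is infinite because no event has occurred yet.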
The REWARDS Model (REneWal theory Approach for Real-time Data Streams)
We model a stream of communication data from a node or across an edge as a renewal process.
[Figure: inter-arrival time distribution on [xmin, xmax] and the resulting discrete-event sequence t1, ..., t5]
The REWARDS Model
Given a stream of time-stamped events, we estimate the parameters of the renewal process for each node or edge from its inter-arrival times.
[Figure: inter-arrival time distribution on [xmin, xmax] and the resulting discrete-event sequence t1, ..., t5]
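The slides do not say which parameters are estimated. Purely as an illustration, the sketch below keeps, for each node or edge, the last timestamp plus a running count, mean, minimum, and maximum of its inter-arrival times, so each new event costs O(1) time and O(1) space per entity. The class and field names are hypothetical, and events are assumed to arrive in time order for each key.

```python
import math
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class IATStats:
    """Running summary of one entity's inter-arrival times (hypothetical)."""
    last_time: float = math.nan
    count: int = 0            # number of inter-arrival times observed
    mean: float = 0.0
    xmin: float = math.inf
    xmax: float = 0.0

    def update(self, t: float) -> None:
        """Fold in an event at time t; the first event only sets last_time."""
        if not math.isnan(self.last_time):
            x = t - self.last_time
            self.count += 1
            self.mean += (x - self.mean) / self.count  # incremental mean
            self.xmin = min(self.xmin, x)
            self.xmax = max(self.xmax, x)
        self.last_time = t


class StreamEstimator:
    """Keeps one IATStats per node or edge key; O(1) work per event."""

    def __init__(self) -> None:
        self.stats: Dict[Hashable, IATStats] = {}

    def observe(self, key: Hashable, t: float) -> IATStats:
        s = self.stats.setdefault(key, IATStats())
        s.update(t)
        return s


est = StreamEstimator()
for t in [0.0, 2.0, 5.0, 6.0]:
    est.observe(("Bob", "Cheng"), t)
```

Here the edge (Bob, Cheng) has inter-arrival times 2, 3, and 1, giving a mean of 2 with observed bounds 1 and 3. A richer model (for example, a fitted Bounded Pareto) would need more state, but the same one-pass pattern applies.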
Recency
Goal: highlight recent activity
Key idea: more recent = more relevant
Challenge: the most frequent communicators will always seem "recent", overshadowing others' behavior. We call this time-scale bias.
[Figure: activity timelines for users alice1337 and bob_iz_kewl, from 8:00 am to 12:00 pm ("NOW!")]
Recency
We can overcome time-scale bias by using the REWARDS Model. We first derive the limit distribution of the function $Age_\Phi(t)$:
$$F^{Age*}_\Phi(\tau) = \lim_{t \to \infty} \Pr\left(Age_\Phi(t) \le \tau\right)$$
We define the recency of $\Phi$ at time $t$ to be:
$$Rec_\Phi(t) = 1 - F^{Age*}_\Phi\left(Age_\Phi(t)\right)$$
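The slides do not spell out the form of this limit. A standard renewal-theory result (for a non-lattice inter-arrival distribution F with finite mean μ) gives the limiting age distribution as (1/μ)∫₀^τ (1 − F(x)) dx, which equals E[min(X, τ)] / E[X]. Assuming that is the form intended, the sketch below estimates both expectations from a sample of observed inter-arrival times and computes recency from an age value; the function names are hypothetical.

```python
import math
from typing import Sequence


def limiting_age_cdf(iats: Sequence[float], tau: float) -> float:
    """Estimate F*_Age(tau) = E[min(X, tau)] / E[X] from observed IATs X."""
    if not iats:
        raise ValueError("need at least one inter-arrival time")
    if tau <= 0:
        return 0.0
    mean = sum(iats) / len(iats)
    truncated = sum(min(x, tau) for x in iats) / len(iats)
    return truncated / mean


def recency(iats: Sequence[float], age_value: float) -> float:
    """Rec_Phi(t) = 1 - F*_Age(Age_Phi(t)); an infinite age gives recency 0."""
    if math.isinf(age_value):
        return 0.0
    return 1.0 - limiting_age_cdf(iats, age_value)
```

With the same age of 5 time units, an entity whose inter-arrival times are about 1 gets `recency([1.0, 1.0], 5.0) == 0.0`, while one whose inter-arrival times are about 100 gets a recency of 0.95: the age is judged against each entity's own time scale.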
Recency
$Rec_\Phi(t)$ is a decreasing function of $t$ on every interval $[t_i, t_{i+1})$. It also satisfies the uniformity property: for any renewal process $\Phi$, the limit distribution of $Rec_\Phi(t)$ is Uniform(0,1).
Recency effectively normalizes the age of a process relative to its own temporal dynamics, making our approach robust to differences in time scale between networks or between entities within the same network.
[Figure: Recency of edge <3,22> in the Bluetooth dataset]
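The uniformity property can be checked empirically. The sketch below simulates one renewal process with Pareto inter-arrival times (an arbitrary choice), estimates the limiting age distribution with the standard renewal-theory form E[min(X, τ)] / E[X] (our assumption about the form the derivation yields), probes recency at uniformly random times, and returns the two-sided Kolmogorov-Smirnov distance to Uniform(0,1). All names and parameter values are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right
from itertools import accumulate


def make_age_cdf(iats):
    """Return tau -> E[min(X, tau)] / E[X], using sorted IATs and prefix sums."""
    xs = sorted(iats)
    prefix = [0.0] + list(accumulate(xs))
    n, total = len(xs), prefix[-1]

    def cdf(tau):
        k = bisect_left(xs, tau)  # xs[:k] are strictly below tau
        return (prefix[k] + tau * (n - k)) / total

    return cdf


def uniformity_gap(n_events=20000, n_probes=2000, seed=1):
    """Two-sided KS distance between sampled recency values and Uniform(0,1)."""
    rng = random.Random(seed)
    # Pareto IATs with shape 3 and minimum 1: an illustrative choice.
    iats = [rng.paretovariate(3.0) for _ in range(n_events)]
    times = list(accumulate(iats))
    cdf = make_age_cdf(iats)
    recs = []
    for _ in range(n_probes):
        t = rng.uniform(times[99], times[-1])     # skip a short burn-in
        last = times[bisect_right(times, t) - 1]  # most recent event <= t
        recs.append(1.0 - cdf(t - last))
    recs.sort()
    n = len(recs)
    return max(max((i + 1) / n - r, r - i / n) for i, r in enumerate(recs))
```

Probing at uniformly random times samples the age in proportion to interval length, which is exactly why the limiting (rather than the raw inter-arrival) distribution is the right reference; the returned distance should be on the order of 1/√n_probes.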
Delay
Goal: measure the influence of entity A on entity B
Key idea: study pairwise (A,B)-gaps
Challenge: more frequent communicators will always tend to have shorter "gaps". This is another example of time-scale bias.
[Figure: activity timelines for users alice1337 and bob_iz_kewl, from 8:00 am to 12:00 pm ("NOW!")]
Delay
Given renewal processes $\Phi$ and $\Psi$, we say the ordered pair of events … are adjacent if … and …. We refer to the elapsed time between them as the pairwise gap. We denote by $Gap_{\Phi,\Psi}(t)$ the most recent such gap at time $t$.
If $\Phi$ and $\Psi$ are independent processes, then we can derive the limit distribution $F^{Gap*}_{\Phi,\Psi}$ of pairwise gaps between consecutive event pairs.
We define the $(\Phi,\Psi)$-delay at time $t$ to be:
$$Del_{\Phi,\Psi}(t) = 1 - F^{Gap*}_{\Phi,\Psi}\left(Gap_{\Phi,\Psi}(t)\right)$$
Delay
$Del_{\Phi,\Psi}(t)$ is a constant function on every interval between consecutive events, and it also satisfies the uniformity property: for any pair of independent renewal processes $\Phi$ and $\Psi$, the limit distribution of $Del_{\Phi,\Psi}(t)$ is Uniform(0,1).
By comparing an observed gap to the theoretical joint distribution of inter-arrival times for $\Phi$ and $\Psi$, delay effectively normalizes the gap relative to the temporal dynamics of $\Phi$ and $\Psi$ individually.
As with the recency function, this makes our approach robust to differences in time scale between networks or between entities within the same network.
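One plausible reading of adjacency (our assumption, not necessarily the authors' exact conditions) is that the Φ-event is the last Φ-event strictly before the Ψ-event, and no other Ψ-event lies between them. Under that reading, the sketch below extracts pairwise gaps, looks up Gap_{Φ,Ψ}(t), and converts a gap into a delay value. The derived limit distribution of gaps is not reproduced here, so the `delay` helper takes a reference sample of gaps and uses its empirical CDF as a stand-in; all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Sequence, Tuple


def adjacent_gaps(phi: Sequence[float], psi: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (b, b - a) for every adjacent pair under an ASSUMED rule:
    a is the last Phi-event strictly before the Psi-event b, and no other
    Psi-event falls strictly between a and b. Both inputs must be sorted."""
    pairs = []
    prev_b = float("-inf")
    for b in psi:
        k = bisect_left(phi, b) - 1        # index of last Phi-event < b
        if k >= 0 and prev_b <= phi[k]:    # no earlier Psi-event in (a, b)
            pairs.append((b, b - phi[k]))
        prev_b = b
    return pairs


def most_recent_gap(pairs: Sequence[Tuple[float, float]], t: float) -> Optional[float]:
    """Gap_{Phi,Psi}(t): the gap of the latest adjacent pair completed by time t."""
    k = bisect_right([b for b, _ in pairs], t)
    return pairs[k - 1][1] if k else None


def delay(gap: Optional[float], reference_gaps: Sequence[float]) -> Optional[float]:
    """Del = 1 - F(gap), where F is the empirical CDF of `reference_gaps`,
    a stand-in for the limit distribution of gaps (not reproduced here)."""
    if gap is None:
        return None
    xs = sorted(reference_gaps)
    return 1.0 - bisect_right(xs, gap) / len(xs)


phi = [1.0, 5.0, 6.0]
psi = [2.0, 3.0, 8.0]
pairs = adjacent_gaps(phi, psi)  # [(2.0, 1.0), (8.0, 2.0)]
```

The Ψ-event at 3.0 is skipped because the Ψ-event at 2.0 already follows the Φ-event at 1.0. Since `most_recent_gap` only changes when a new pair completes, the resulting delay is piecewise constant in t.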
Divergence
Based on the Kolmogorov-Smirnov statistic, which compares an empirical distribution function (EDF) $F_n(x)$ to a hypothetical CDF $F(x)$:
$$KS(F_n \,\|\, F) = \sup_x \left(F_n(x) - F(x)\right)$$
• Recency divergence compares the recency values for a set of nodes or edges to the CDF of Uniform(0,1)
• Delay divergence compares the delay values for a set of edges, or for all (A,B)-gaps, to the CDF of Uniform(0,1)
[Figure: EDF Fn(x) and CDF F(x) on [0, 1]; KS = 0.32]
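Against the Uniform(0,1) reference used by both divergences, the one-sided statistic sup over x of (F_n(x) − F(x)) can be computed exactly from the sorted values, because F_n(x) − x only jumps upward at the data points and decreases in between. A minimal Python sketch (the function name is hypothetical):

```python
from typing import Sequence


def ks_upper_uniform(values: Sequence[float]) -> float:
    """sup_x (F_n(x) - x) on [0, 1], with F_n the EDF of `values` in [0, 1].

    F_n(x) - x jumps upward at each sorted value x_(i) and decreases in
    between, so the supremum is max over i of i/n - x_(i), or 0 if negative.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    return max(0.0, max((i + 1) / n - x for i, x in enumerate(xs)))
```

Values piled up near 0 make F_n rise above the diagonal early and give a large statistic. Because this form is one-sided, which tail signals an anomaly depends on how the recency or delay values are oriented before the comparison.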
Streaming Node-Centric Algorithm
• Goal: Flag times at which a node exhibits anomalous activity, indicated by an unusually high concentration of recent outgoing communication
• Approach: Since the recency function is decreasing between consecutive communications, measure the recency divergence at a node only at times when new activity occurs
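The event-driven idea can be sketched as follows: update the edge that just fired, then re-score only the sending node. To keep each edge update O(1), this sketch swaps in an exponential model for every edge (an assumption, not the REWARDS estimator), for which the limiting age distribution is 1 − exp(−τ/mean). It also applies the one-sided KS statistic to 1 − recency so that a burst of recent out-edges produces a large value. The class name and threshold are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]


class NodeCentricDetector:
    """Re-scores a node only when it sends a new message (hypothetical class).

    Assumptions, not from the original: each edge's inter-arrival times are
    treated as exponential with the running-mean IAT, so F*_Age(tau) equals
    1 - exp(-tau / mean); the one-sided KS statistic is applied to
    1 - recency, so a burst of recent out-edges gives a large value.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # hypothetical alarm level
        self.last: Dict[Edge, float] = {}
        self.count: Dict[Edge, int] = defaultdict(int)
        self.mean: Dict[Edge, float] = defaultdict(float)
        self.out_edges: Dict[Hashable, Set[Edge]] = defaultdict(set)

    def _update_edge(self, e: Edge, t: float) -> None:
        """O(1) update of the edge's last timestamp and running-mean IAT."""
        if e in self.last:
            x = t - self.last[e]
            self.count[e] += 1
            self.mean[e] += (x - self.mean[e]) / self.count[e]
        self.last[e] = t

    def observe(self, u: Hashable, v: Hashable, t: float) -> Optional[float]:
        """Process event u -> v at time t; return the divergence if flagged."""
        e = (u, v)
        self._update_edge(e, t)
        self.out_edges[u].add(e)
        # 1 - recency = F*_Age(age) for each out-edge of u with a fitted mean.
        scores = sorted(
            1.0 - math.exp(-(t - self.last[f]) / self.mean[f])
            for f in self.out_edges[u]
            if self.count[f] > 0 and self.mean[f] > 0
        )
        if not scores:
            return None
        n = len(scores)
        div = max(0.0, max((i + 1) / n - s for i, s in enumerate(scores)))
        return div if div >= self.threshold else None


det = NodeCentricDetector(threshold=0.9)
stream = [("a", "b", 0.0), ("a", "c", 1.0), ("a", "b", 10.0),
          ("a", "c", 11.0), ("a", "b", 200.0)]
flags = [det.observe(u, v, t) for u, v, t in stream]
```

In this run, node a is flagged while both of its edges are active on their usual 10-unit rhythm, and not at time 200, when edge (a, c) has gone quiet. The edge update is O(1), but re-scoring costs time proportional to the sender's out-degree. A single active edge always scores 1.0, which is consistent with the later observation that peak magnitude correlates with degree.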
The MCD Algorithm (Maximal Component Divergence)
• Goal: Identify subgraphs with correlated behavior
• Use recency divergence to find recent anomalous activity
• Use delay divergence to identify spheres of influence
Challenge: How do we overcome the combinatorial explosion?
The MCD Algorithm (Maximal Component Divergence)
[Figure: example graph on V1, ..., V5 with edge weights 0.9, 0.75, 0.7, 0.5, 0.3, and 0.1, and the components formed as the threshold decreases]

θ | Component | Div(C)
0.9 | {V1,V2} | 2.908
0.75 | {V1,V2,V3} | 2.723
0.7 | {V1,V2,V3} | 6.132
0.5 | {V4,V5} | 1.143
0.3 | {V1,V2,V3,V4,V5} | 2.380
0.1 | {V1,V2,V3,V4,V5} | 1.882

1. Calculate edge weights using the recency or delay function
2. Gradually decrease the threshold θ, updating components and divergence values as necessary
3. Output: disjoint components with maximal divergence
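The three steps above can be sketched with a union-find sweep over edges in decreasing weight order. Several details are our assumptions, not the authors' method: the component divergence (√m times the one-sided KS statistic of 1 − weight, chosen because the table's Div(C) values exceed 1), and the final greedy selection of disjoint components by decreasing divergence. All names are hypothetical. This naive version re-sorts a component's weights after every edge, so it does not reach the O(m log m) bound quoted later.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple


def ks_upper_uniform(values: Sequence[float]) -> float:
    """One-sided KS statistic sup_x (F_n(x) - x) against Uniform(0,1)."""
    xs = sorted(values)
    n = len(xs)
    return max(0.0, max((i + 1) / n - x for i, x in enumerate(xs)))


def component_divergence(weights: Sequence[float]) -> float:
    # Hypothetical choice: sqrt(m) * KS of (1 - weight), so edges with
    # weights near 1 (e.g. very recent) raise the score.
    return math.sqrt(len(weights)) * ks_upper_uniform([1.0 - w for w in weights])


def mcd(n_nodes: int, edges: Sequence[Tuple[int, int, float]],
        divergence: Callable[[Sequence[float]], float] = component_divergence):
    """Sweep the threshold down through the edge weights with union-find,
    recording (theta, component, divergence) after each edge is added, then
    greedily keep disjoint components in order of decreasing divergence."""
    parent = list(range(n_nodes))
    members: Dict[int, Set[int]] = {v: {v} for v in range(n_nodes)}
    weights: Dict[int, List[float]] = {v: [] for v in range(n_nodes)}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    history = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):  # theta decreases
        ru, rv = find(u), find(v)
        if ru != rv:
            if len(members[ru]) < len(members[rv]):
                ru, rv = rv, ru  # union by size
            parent[rv] = ru
            members[ru] |= members.pop(rv)
            weights[ru] += weights.pop(rv)
        weights[ru].append(w)
        history.append((w, frozenset(members[ru]), divergence(weights[ru])))

    chosen, covered = [], set()
    for theta, comp, div in sorted(history, key=lambda h: -h[2]):
        if not comp & covered:
            chosen.append((theta, sorted(comp), div))
            covered |= comp
    return chosen


edges = [(0, 1, 0.95), (1, 2, 0.9), (2, 3, 0.1)]
best = mcd(4, edges)
```

On this toy input, the component {0, 1, 2} at θ = 0.9 wins: adding the low-weight edge (2, 3) at θ = 0.1 dilutes the divergence. This mirrors how the table's Div(C) rises and falls as θ decreases.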
Sample Output
MCD | θ | #V(C) | E-frac | %E(C) | %E(G)
14.57 | 0.07 | 54 | 53/212 | 0.25 | 0.08
12.84 | 0.08 | 32 | 31/88 | 0.35 | 0.08
3.70 | 0.10 | 6 | 5/7 | 0.71 | 0.10
2.97 | 0.18 | 5 | 4/4 | 1.00 | 0.14
1.91 | 0.05 | 7 | 6/41 | 0.15 | 0.04
Robustness to Time Scale
• Simulation: R-MAT model, 128 vertices, average degree 16
• Inter-arrival times (IATs) for edge activity sampled from Bounded Pareto distributions, with rate parameter between 10 minutes and 1 week
• Every 5 days, a randomly selected node has anomalous activity at 10x its normal rate
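Edge activity of this kind can be generated by inverse-transform sampling from the Bounded Pareto CDF F(x) = (1 − (L/x)^α) / (1 − (L/H)^α) on [L, H]. The shape α = 1.5, the bounds of 10 minutes and 1 week, and the way the 10x anomaly is injected (dividing IATs by 10 inside a burst window) are our assumptions for illustration, not the experiment's actual settings. The function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def bounded_pareto(rng: random.Random, alpha: float, low: float, high: float) -> float:
    """Inverse-CDF sample from a Bounded Pareto(alpha) on [low, high)."""
    u = rng.random()  # u in [0, 1)
    return low / (1.0 - u * (1.0 - (low / high) ** alpha)) ** (1.0 / alpha)


def edge_events(rng: random.Random, alpha: float, low: float, high: float,
                horizon: float, burst: Optional[Tuple[float, float]] = None,
                speedup: float = 10.0) -> List[float]:
    """Event times on [0, horizon); inside `burst=(start, end)` each IAT is
    divided by `speedup`, i.e. activity runs at `speedup` times the normal rate."""
    t, times = 0.0, []
    while True:
        x = bounded_pareto(rng, alpha, low, high)
        if burst is not None and burst[0] <= t < burst[1]:
            x /= speedup
        t += x
        if t >= horizon:
            return times
        times.append(t)


week = 7 * 24 * 60     # minutes
month = 30 * 24 * 60   # minutes
rng = random.Random(0)
samples = [bounded_pareto(rng, 1.5, 10.0, week) for _ in range(1000)]
normal = edge_events(random.Random(1), 1.5, 10.0, week, horizon=month)
burst = edge_events(random.Random(1), 1.5, 10.0, week, horizon=month,
                    burst=(0.0, month))
```

Setting u = 0 returns exactly `low`, and u approaching 1 approaches `high`, so every sample stays within the stated bounds. Using the same seed for `normal` and `burst` makes the effect of the 10x speedup directly comparable.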
Robustness to Time Scale
• Conclusion: While it takes longer for anomalous activity to be recognized at nodes with lower activity rates, the magnitude of the peak appears to be independent of activity rate but highly correlated with degree
Accuracy and Precision
• Simulation: star network, 100 trials with only normal activity and 100 trials including a period of anomalous activity
• ROC curves compare how well several methods distinguish between the two scenarios
• Conclusion: Especially when variability is introduced, our approach outperforms the WtdDeg and Z-Score metrics
Detection Latency
• Data: Enron corpus, 1k nodes, 2k edges, 4k timestamps
• Compare our approach with the GraphScope algorithm
• Conclusion: The two algorithms appear to identify similar times of anomalous activity, but our approach based on the REWARDS model has a shorter response time
Anomaly Detection in IP Traffic
• Data: LBNL network trace, more than 9 million timestamps during one hour on December 15, 2004
• Compare our approach with total network volume and with "scanning activity" labeled by LBNL analysts
Anomaly Detection in IP Traffic
• Three of the four times of highest divergence correspond to labeled scanning activity
• The peak in scanning activity at 12:07pm is primarily due to an increase in DNS and NBNS lookups
• The peak at 12:26pm was not flagged by the analysts because the sequence of IP addresses was not monotonic
Complexity Analysis
Dataset: Twitter messages, Nov. 2008 to Oct. 2009 (263k nodes, 308k edges, 1.1 million timestamps)
Updates: O(1) per communication
MCD Algorithm: O(m log m), where m = # of edges; can be approximated in effectively O(m) time
[Figure: runtime for the MCD algorithm in milliseconds (0 to 2000) versus number of live edges (0 to 60,000)]
Future Work
Incorporate duration of communication and other node or edge attributes into our model
Make use of geographical and textual content
Use gap divergence to infer links, and compare to the approach of Gomez-Rodriguez et al.
Develop a streaming algorithm to identify emerging trends
Acknowledgements
Part of this work was conducted at Lawrence Livermore National Laboratory, under the guidance of Tina Eliassi-Rad.
This project is partially supported by a DHS Career Development Grant, under the auspices of CCICADA, a DHS Center of Excellence.
Questions?