the latent behavior space - csiro research · the latent behavior space sequential behavior...
TRANSCRIPT
The Latent Behavior SpaceSequential Behavior Information Encoded in a Vector Space
Jan Reubold, Stephan Escher, Thorsten StrufeTU Dresden
15th August 2017
PRIVACYAND SECURITY
2 / 23
• Social Bots = automation software created to control an OSN account and tries to pose as a human
Motivation /
Social Bots in OSNs
Intentions
Distributing Information Harvesting Information
• prestige influencing (hype/denounce products, companies, people, ...)
• political influencing (framing, hype denounce political topics, ... )
• traditional malicious content (spam, phishing, malware)
• ‘legit’ distribution (news, weather, ...)
• ...
• ‘Crawler’
• Identity Theft
• Personalized information distribution (spear phishing, ...)
• ...
3 / 23
• Countermeasures: Prevention, Detection, Awarness, ...
• Graph Based - detect Sybils with social / interaction graph analysis
• Community Based - user based bot detection + experts
• Machine Learning Based• Behavior Based
• Crowd Based - Coordinated activity of multiple ‘users’
Motivation /
Detection MechanismsSocial Bots in OSNs
4 / 23
• Sybil detection using clickstreams
• Two approaches (supervised, unsuperivsed)
• SVM using 12 features: ∅ clicks per session ∅ session length ∅ time between clicks ∅ # sessions per day Clickstreams → 8 Categories → BoW
• Clustering:
Pure clickstreams + clickstreams enriched with timining (e.g. [c₁, t₁, c₂, t₂, ...]) Clickstreams + Timings → Distance Function → Graph
Motivation / Social Bots in OSNs / Detection Mechanisms /
Approaches IBehavior-Based Detection
Wang, G., Konolige, T., Wilson, C., Wang, X., Zheng, H., & Zhao, B. Y. (2013, August). You Are How You Click: Clickstream Analysis for Sybil Detection. In USENIX Security Symposium (Vol. 9, pp. 1-008).
5 / 23
• Model for normal user behavior
• Behavior that does not fit ⇒ anomalous behavior (sybil + cyborg detection)
• Unsupervised Method:
• PCA ⇒ latent subspace S
Principal Component Analysis (Feature space F)
# likes per day
# likes in specific categories (e.g. sports, politics, education)
Evolution of spatial distribution of observed like categories
Motivation / Social Bots in OSNs / Detection Mechanisms /
Approaches IIBehavior-Based Detection
Viswanath, B., Bashir, M. A., Crovella, M., Guha, S., Gummadi, K. P., Krishnamurthy, B., & Mislove, A. (2014, August). Towards Detecting Anomalous User Behavior in Online Social Networks. In USENIX Security Symposium (pp. 223-238).
F S
x
xn
xr
6 / 23
PipelinesTime-Series DataContribution /
Task: Prediction, Classification, ... e.g. Social-Bot Detection
Input: Time-Series e.g. Click-Traces
Time-Series Features:
Categorical-valued observations e.g. login, send msg, share, ...
Temporal order
TiminigsLabelsAdditional Information
7 / 23
Task: Prediction, Classification, ... e.g. Social-Bot Detection
Input: Time-Series e.g. Click-Traces
LabelsAdditional Information
PipelinesTime-Series DataContribution /
Time-Series Features:
Categorical-valued observations e.g. login, send msg, share, ...
Temporal order
Timinigs
Specialized Methods
Solve Task
Time-Series
Vector Space MethodsTransformation
8 / 23
Standardize scheme • Efficient use of vector space methods
for time-series data (SVM, kNN, PCA, ...)
Idea• Abstract & split time-series (in parallel)
Latent Behavior Space• Span vector space by found patterns• Express users by shown behavior pat-
terns
Our ApproachTime-Series DataContribution /
Segmentation
Transformation
Add. Context
LBS
Vector Space Methods
Time-Series
9 / 23
{ }
{ }
[ ],
[ ]xij ∈Y
xi i1 ilx ,...,xi
=
:
{ }
[ ]
Setup and ConceptOur ApproachContribution /
X x x x={ }1 2 M, ,...,
X x x x={ }1 2 M, ,...,
X x x x= { }1 2 M, ,...,
xij ∈Yxi i1 ilx ,...,x
i=
Time-Series Data
zij ∈C
zij ∈C
,
Data
Time-Series
Activity, Super State
Input:
• Set of time-series e.g. user sessions on Facebook
• Time-series of categorical-valued observations
e.g. ‘send message’, ‘newsfeed’, ‘like’
Concept
• Abstraction of time-series data
• Represent by behavior patterns (called super states)
10 / 23
State Importance:
Transition Probability:
• Simple transition graph
• Transition probabilities governed by importance of super states
• Each node represents super state e.g. on click traces of an OSN: manifestation of an intention
Super State GraphOur ApproachContribution /
Super State Graph
β γ γ| ~ Dir ( )
π α β αβ τ ρc cDir| , ~ -+ ( )( )1 1
11 / 23
Super StatesOur ApproachContribution /
Information Update
Waiting
Super States• Initial-State Distribution
• Transition Distribution (+End-State)
• State-Duration Model
G | ~ Dirc c cψ ψ( ) θ λ λcsT
c c| ,G ~ Dir G( )
θ λ λcI | ~ Dir ( ) µ µ µ σ κνcs cs~ | , /t
cs cs cs2( )
σ χ σ ν σcs cs cs2 2 2 2~ | ,
− ( )
12 / 23
The Latent Behavior SpaceOur ApproachContribution /
Transformation
• Count-based
• Time-based
Latent Behavior Space
ν φu u= ( )X
• Segmentation of user behavior into known patterns
• Represent user by her behavior / exhibited patterns
• Use vector space methods
φ��
�zz
zO ijX
1u
z(i, j)
u
u( ) ∈∑�
φzD
uX1
jX
jX
i
i
( ) ≤ ≤≤ ≤
≤ ≤≤ ≤
∑∑∑∑
z ijli
ijli
iju
u
t
t11
11
13 / 23
The Latent Behavior SpaceOur ApproachContribution /
• Segmentation of user behavior into known patterns
• Represent user by her behavior / exhibited patterns
• Use vector space methods
Transformation
• Count-based
• Time-based
Latent Behavior Space
ν φu u= ( )X
φ��
�zz
zO ijX
1u
z(i, j)
u
u( ) ∈∑�
φzD
uX1
jX
jX
i
i
( ) ≤ ≤≤ ≤
≤ ≤≤ ≤
∑∑∑∑
z ijli
ijli
iju
u
t
t11
11
14 / 23
The Latent Behavior SpaceOur ApproachContribution /
• Segmentation of user behavior into known patterns
• Represent user by her behavior / exhibited patterns
• Use vector space methods
Transformation
• Count-based
• Time-based
Latent Behavior Space
ν φu u= ( )X
φ��
�zz
zO ijX
1u
z(i, j)
u
u( ) ∈∑�
φzD
uX1
jX
jX
i
i
( ) ≤ ≤≤ ≤
≤ ≤≤ ≤
∑∑∑∑
z ijli
ijli
iju
u
t
t11
11
15 / 23
The Latent Behavior SpaceOur ApproachContribution /
• Segmentation of user behavior into known patterns
• Represent user by her behavior / exhibited patterns
• Use vector space methods
Transformation
• Count-based
• Time-based
Latent Behavior Space
ν φu u= ( )X
φ��
�zz
zO ijX
1u
z(i, j)
u
u( ) ∈∑�
φzD
uX1
jX
jX
i
i
( ) ≤ ≤≤ ≤
≤ ≤≤ ≤
∑∑∑∑
z ijli
ijli
iju
u
t
t11
11
16 / 23
The Latent Behavior SpaceOur ApproachContribution /
• Segmentation of user behavior into known patterns
• Represent user by her behavior / exhibited patterns
• Use vector space methods
Transformation
• Count-based
• Time-based
Latent Behavior Space
ν φu u= ( )X
φ��
�zz
zO ijX
1u
z(i, j)
u
u( ) ∈∑�
φzD
uX1
jX
jX
i
i
( ) ≤ ≤≤ ≤
≤ ≤≤ ≤
∑∑∑∑
z ijli
ijli
iju
u
t
t11
11
17 / 23
EvidenceControlled SettingEvaluations /
1 2
3 4
5
0.15
0.65 0.15
0.05
0.050.8
0.15
0.5
0.5
0.1
0.7
0.2
1.0
1 2
3 4
5
0.7
0.05 0.2
0.05
0.2
0.05
0.75
0.9
0.1
0.65
0.1
0.25
1.0
6 7
8 9
a
1.0 0.35
0.65
0.95
0.05
0.25
0.3
0.45
0.40.6
6 7
8 9
a
1.0
0.4
0.60.3 0.3
0.40.35
0.6
0.05
1.0
1
Evidence on controlled settings
• Impact of sequential information for process recovery
• 3 scenarios (sets of super states)
• Scenario I → III: Increasing state space overlap
• Recovery of processes (FNR)
18 / 23
Evidence on controlled settings
• Impact of sequential information for process recovery
EvidenceControlled SettingEvaluations /
0.05
0.10
0.00
0.15
0.20
FNR
(%)
Scenario II Scenario IIIScenario I
0.25
0.05
0.10
0.00
0.15
0.20
FNR
(%)
MMC [3] LDAOurs
1 2
3 4
5
0.15
0.65 0.15
0.05
0.050.8
0.15
0.5
0.5
0.1
0.7
0.2
1.0
1 2
3 4
5
0.7
0.05 0.2
0.05
0.2
0.05
0.75
0.9
0.1
0.65
0.1
0.25
1.0
6 7
8 9
a
1.0 0.35
0.65
0.95
0.05
0.25
0.3
0.45
0.40.6
6 7
8 9
a
1.0
0.4
0.60.3 0.3
0.40.35
0.6
0.05
1.0
1
• 3 scenarios (sets of super states)
• Scenario I → III: Increasing state space overlap
• Recovery of processes (FNR)
19 / 23
• Develop attacker models based on characteristics
• Enrich real-world data set with behavior traces of theoretical attackers
Work in ProgressSocial Bot DetectionEvaluations /
• Profile Characteristic (Sybils, Cyborgs, Zombies, ...)• Social Bot-Networks (Union of social bots) • Massattacks vs Targeted Attacks• OSN Structure• Complexity (send every hour, user based behavior)
20 / 23
SummaryOur ApproachContribution /
Feature Designby leveraging behavior patterns
Goal: Vector space + time-series
Problem: Loss of information
Approaches:
Direct:Integrate sequential information into time-series
Abstraction: Learn patterns from time-series → represent time-series by patterns
21 / 23
SummaryOur ApproachContribution /
Feature Designby leveraging behavior patterns
Goal: Vector space + time-series
Problem: Loss of information
Approaches:
Direct:Integrate sequential information into time-series
Abstraction: Learn patterns from time-series → represent time-series by patterns
Time-Series
LBS
Vector Space Methods
Specialized Methods
Solve Task
22 / 23
Future WorkDiscussion /
Current Work
• Evaluate on real-world data (replace individual transformations with LBS)• Evaluate impact of loss of information (segmentation → LBS)• Build theoretical attacker models
General
• Further work on social bot detection (behavior graph)• Investigate influence streams (e.g. by means of frames) and echo cham-
bers
23 / 23
Future WorkDiscussion /
Thank you! Questions?
Current Work
• Evaluate on real-world data (replace individual transformations with LBS)• Evaluate impact of loss of information (segmentation → LBS)• Build theoretical attacker models
General
• Further work on social bot detection (behavior graph)• Investigate influence streams (e.g. by means of frames) and echo cham-
bers