The Latent Behavior Space Sequential Behavior Information Encoded in a Vector Space Jan Reubold, Stephan Escher, Thorsten Strufe TU Dresden 15th August 2017 PRIVACY AND SECURITY


Page 1: The Latent Behavior Space - CSIRO Research · The Latent Behavior Space Sequential Behavior Information Encoded in a Vector Space Jan Reubold, Stephan Escher, Thorsten Strufe TU Dresden

The Latent Behavior Space

Sequential Behavior Information Encoded in a Vector Space

Jan Reubold, Stephan Escher, Thorsten Strufe

TU Dresden

15th August 2017

PRIVACY AND SECURITY


2 / 23

Motivation / Social Bots in OSNs / Intentions

• Social bots = automation software created to control an OSN account while posing as a human

Distributing Information

• prestige influencing (hype/denounce products, companies, people, ...)

• political influencing (framing, hype/denounce political topics, ...)

• traditional malicious content (spam, phishing, malware)

• ‘legit’ distribution (news, weather, ...)

• ...

Harvesting Information

• ‘Crawler’

• Identity Theft

• Personalized information distribution (spear phishing, ...)

• ...


3 / 23

Motivation / Social Bots in OSNs / Detection Mechanisms

• Countermeasures: Prevention, Detection, Awareness, ...

• Graph Based - detect Sybils with social / interaction graph analysis

• Community Based - user based bot detection + experts

• Machine Learning Based

• Behavior Based

• Crowd Based - coordinated activity of multiple ‘users’


4 / 23

Motivation / Social Bots in OSNs / Detection Mechanisms / Behavior-Based Detection / Approaches I

• Sybil detection using clickstreams

• Two approaches (supervised, unsupervised)

• SVM using 12 features: ∅ clicks per session, ∅ session length, ∅ time between clicks, ∅ # sessions per day (∅ = average)

  Clickstreams → 8 categories → BoW

• Clustering: pure clickstreams + clickstreams enriched with timing (e.g. [c₁, t₁, c₂, t₂, ...])

  Clickstreams + timings → distance function → graph

Wang, G., Konolige, T., Wilson, C., Wang, X., Zheng, H., & Zhao, B. Y. (2013, August). You Are How You Click: Clickstream Analysis for Sybil Detection. In USENIX Security Symposium (Vol. 9, pp. 1-008).
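The session-level averages and the bag-of-words encoding listed on this slide can be sketched as follows. This is only an illustration under stated assumptions: the input layout (a list of sessions, each a list of `(timestamp_seconds, category)` clicks), the function names, and the day bucketing are hypothetical and not taken from Wang et al.

```python
from collections import Counter
from statistics import mean


def session_features(sessions):
    """Average session statistics for one user.

    sessions: non-empty list of non-empty sessions, each a time-ordered
    list of (timestamp_seconds, category) clicks (assumed layout).
    """
    clicks_per_session = [len(s) for s in sessions]
    session_lengths = [s[-1][0] - s[0][0] for s in sessions]
    # Gaps between consecutive clicks inside the same session.
    gaps = [b[0] - a[0] for s in sessions for a, b in zip(s, s[1:])]
    # Assumption: only days with at least one session are counted.
    active_days = {int(s[0][0] // 86400) for s in sessions}
    return {
        "avg_clicks_per_session": mean(clicks_per_session),
        "avg_session_length": mean(session_lengths),
        "avg_time_between_clicks": mean(gaps) if gaps else 0.0,
        "avg_sessions_per_day": len(sessions) / len(active_days),
    }


def bag_of_words(sessions, categories):
    """Count how often each click category occurs, ignoring order."""
    counts = Counter(c for s in sessions for _, c in s)
    return [counts[c] for c in categories]
```

The bag-of-words vector discards the temporal order of the clicks, which is exactly the information the later slides try to retain.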


5 / 23

• Model for normal user behavior

• Behavior that does not fit ⇒ anomalous behavior (sybil + cyborg detection)

• Unsupervised Method:

• PCA ⇒ latent subspace S

Principal Component Analysis (Feature space F)

# likes per day

# likes in specific categories (e.g. sports, politics, education)

Evolution of spatial distribution of observed like categories

Motivation / Social Bots in OSNs / Detection Mechanisms / Behavior-Based Detection / Approaches II

Viswanath, B., Bashir, M. A., Crovella, M., Guha, S., Gummadi, K. P., Krishnamurthy, B., & Mislove, A. (2014, August). Towards Detecting Anomalous User Behavior in Online Social Networks. In USENIX Security Symposium (pp. 223-238).
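A minimal sketch of the PCA idea on this slide: fit a low-dimensional subspace S to feature vectors of normal users, then score a user by how much of the vector lies outside S. The function names, the choice of `k`, and any threshold applied to the score are assumptions for illustration, not details from Viswanath et al.

```python
import numpy as np


def fit_normal_subspace(X, k):
    """Fit the top-k principal directions of X (rows = users, cols = features)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T  # basis of S, shape (num_features, k)


def residual_norm(x, mean, basis):
    """Length of the part of x that the subspace S cannot explain."""
    centered = x - mean
    x_n = basis @ (basis.T @ centered)  # projection onto S
    x_r = centered - x_n                # residual component
    return float(np.linalg.norm(x_r))
```

A user whose residual norm exceeds a chosen threshold would be flagged as anomalous; how that threshold is set is left open here.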

[Figure: feature space F and latent subspace S, with a vector x and its components x_n and x_r]


6 / 23

Contribution / Time-Series Data / Pipelines

Task: Prediction, Classification, ... e.g. Social-Bot Detection

Input: Time-Series e.g. Click-Traces

Time-Series Features:

• Categorical-valued observations e.g. login, send msg, share, ...

• Temporal order

• Timings

Labels

Additional Information


7 / 23


Pipelines:

• Time-Series → Specialized Methods → Solve Task

• Time-Series → Transformation → Vector Space Methods → Solve Task


8 / 23

Contribution / Time-Series Data / Our Approach

Standardized scheme

• Efficient use of vector space methods for time-series data (SVM, kNN, PCA, ...)

Idea

• Abstract & split time-series (in parallel)

Latent Behavior Space

• Span vector space by found patterns

• Express users by shown behavior patterns

Pipeline: Time-Series → Segmentation → Transformation (+ Add. Context) → LBS → Vector Space Methods


9 / 23

Contribution / Our Approach / Setup and Concept

Data: $X = \{x_1, x_2, \ldots, x_M\}$

Time-Series: $x_i = [x_{i1}, \ldots, x_{il_i}]$

Activity, Super State: $x_{ij} \in Y$, $z_{ij} \in C$

Input:

• Set of time-series e.g. user sessions on Facebook

• Time-series of categorical-valued observations

e.g. ‘send message’, ‘newsfeed’, ‘like’

Concept

• Abstraction of time-series data

• Represent by behavior patterns (called super states)


10 / 23

Contribution / Our Approach / Super State Graph

• Simple transition graph

• Transition probabilities governed by importance of super states

• Each node represents a super state, e.g. on click traces of an OSN: manifestation of an intention

State Importance: $\beta \mid \gamma \sim \mathrm{Dir}(\gamma)$

Transition Probability: $\pi_c \mid \alpha, \beta \sim \mathrm{Dir}(\ldots)$, with parameters involving $\alpha\beta$, $\tau$, and $\rho$
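To illustrate how a shared importance vector can govern the transitions of every super state, the following sketch draws β from a Dirichlet prior and then one transition row per super state centered on β. It deliberately uses the simplified concentration αβ; the additional terms involving τ and ρ on the slide are left out, and the function name and default values are assumptions.

```python
import numpy as np


def sample_super_state_graph(num_states, gamma=5.0, alpha=10.0, seed=0):
    """Sample state importances and a transition matrix (simplified prior)."""
    rng = np.random.default_rng(seed)
    # State importance: beta | gamma ~ Dir(gamma, ..., gamma)
    beta = rng.dirichlet(np.full(num_states, gamma))
    # Assumption: pi_c ~ Dir(alpha * beta), so each row is centered on beta.
    pi = np.vstack([rng.dirichlet(alpha * beta) for _ in range(num_states)])
    return beta, pi
```

Larger values of `alpha` pull every row of the transition matrix closer to β, so important super states are entered more often from everywhere in the graph.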


11 / 23

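Contribution / Our Approach / Super States

[Figure: example super states ‘Information Update’ and ‘Waiting’]

Super States

• Initial-State Distribution: $\theta^I_c \mid \lambda \sim \mathrm{Dir}(\lambda)$

• Transition Distribution (+End-State): $G_c \mid \psi_c \sim \mathrm{Dir}(\psi_c)$, $\theta^T_{cs} \mid \lambda, G_c \sim \mathrm{Dir}(\lambda G_c)$

• State-Duration Model: $\mu_{cs} \sim t_{\nu_{cs}}(\mu_{cs} \mid \mu, \sigma^2_{cs} / \kappa_{cs})$, $\sigma^2_{cs} \sim \chi^{-2}(\sigma^2_{cs} \mid \nu_{cs}, \sigma^2)$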
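The three ingredients of a super state (initial-state distribution, transition distribution with an end state, and state-duration model) can be sketched as a small generative sampler. This is an illustration under assumptions: the class layout, the Gaussian duration clipped at zero, and the `max_len` guard are hypothetical choices, and the parameters are passed in directly rather than drawn from the priors on the slide.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SuperState:
    initial: np.ndarray     # shape (S,): initial-state probabilities
    transition: np.ndarray  # shape (S, S + 1): last column = end state
    dur_mean: np.ndarray    # shape (S,): mean duration per state
    dur_std: np.ndarray     # shape (S,): duration std. dev. per state

    def sample(self, rng, max_len=100):
        """Return one segment as a list of (state, duration) pairs."""
        num_states = len(self.initial)
        state = rng.choice(num_states, p=self.initial)
        trace = []
        for _ in range(max_len):
            # Assumption: Gaussian duration, clipped so it is never negative.
            duration = max(rng.normal(self.dur_mean[state], self.dur_std[state]), 0.0)
            trace.append((int(state), float(duration)))
            nxt = rng.choice(num_states + 1, p=self.transition[state])
            if nxt == num_states:  # end state reached: leave the super state
                break
            state = nxt
        return trace
```

With two states, `initial = [1, 0]` and transitions `0 → 1 → end`, every sampled segment visits state 0 and then state 1 before the super state is left.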


12 / 23

Contribution / Our Approach / The Latent Behavior Space

• Segmentation of user behavior into known patterns

• Represent user by her behavior / exhibited patterns

• Use vector space methods

Latent Behavior Space

$\nu_u = \phi(X_u)$

Transformation

• Count-based: $\phi^O_z(X_u) = \frac{1}{|X_u|} \sum_{z_{ij} \in X_u} \mathbb{1}[z_{ij} = z]$

• Time-based: $\phi^D_z(X_u) = \frac{\sum_{1 \le i \le |X_u|} \sum_{1 \le j \le l_i} \mathbb{1}[z_{ij} = z]\, t_{ij}}{\sum_{1 \le i \le |X_u|} \sum_{1 \le j \le l_i} t_{ij}}$
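A minimal sketch of the two transformations, assuming each user's sessions have already been segmented into `(super_state, duration)` pairs. The function names and the input layout are hypothetical; the count-based variant divides by the number of sessions, which is one plausible normalization and an assumption here, while the time-based variant gives the share of total time spent in each super state.

```python
from collections import defaultdict


def lbs_count(sessions, super_states):
    """Count-based vector: occurrences of each super state per session."""
    counts = defaultdict(int)
    for session in sessions:
        for z, _ in session:
            counts[z] += 1
    n = len(sessions)  # assumption: normalize by number of sessions
    return [counts[z] / n if n else 0.0 for z in super_states]


def lbs_time(sessions, super_states):
    """Time-based vector: share of total time spent in each super state."""
    time_in = defaultdict(float)
    for session in sessions:
        for z, t in session:
            time_in[z] += t
    total = sum(time_in.values())
    return [time_in[z] / total if total else 0.0 for z in super_states]
```

Each user becomes one fixed-length vector with one dimension per super state, so standard vector space methods such as SVM, kNN, or PCA can be applied directly.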


17 / 23

Evaluations / Controlled Setting / Evidence

[Figure: transition graphs of the example processes over states 1-5 and 6-9, a, with edge labels giving transition probabilities]

Evidence on controlled settings

• Impact of sequential information for process recovery

• 3 scenarios (sets of super states)

• Scenario I → III: Increasing state space overlap

• Recovery of processes (FNR)
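The controlled setting can be mimicked with a small sketch: sample observation sequences from a known transition graph, then measure how many observations of a process are missed by a recovery method. The transition-dictionary layout and function names are hypothetical, and the FNR definition used here (missed observations of a process divided by all observations of that process) is an assumption, since the slides do not spell it out.

```python
import random


def sample_process(transitions, start, length, rng):
    """Sample a state sequence from a dict {state: {next_state: prob}}."""
    state, seq = start, [start]
    for _ in range(length - 1):
        next_states, probs = zip(*transitions[state].items())
        state = rng.choices(next_states, weights=probs)[0]
        seq.append(state)
    return seq


def false_negative_rate(true_labels, predicted_labels, positive):
    """FN / (FN + TP) for the process labelled `positive`."""
    pairs = list(zip(true_labels, predicted_labels))
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    if fn + tp == 0:
        raise ValueError("no observations of the positive process")
    return fn / (fn + tp)
```

When two processes share states, as in the scenarios with increasing state space overlap, order-free counts become ambiguous and the sequential information is what separates them.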


18 / 23

Evaluations / Controlled Setting / Evidence

[Figure: bar charts of FNR (%) for Scenario I, Scenario II, and Scenario III, comparing Ours, MMC [3], and LDA]


19 / 23

Evaluations / Social Bot Detection / Work in Progress

• Develop attacker models based on characteristics

  • Profile characteristic (Sybils, Cyborgs, Zombies, ...)

  • Social bot networks (union of social bots)

  • Mass attacks vs. targeted attacks

  • OSN structure

  • Complexity (send every hour, user-based behavior)

• Enrich real-world data set with behavior traces of theoretical attackers


20 / 23

Contribution / Our Approach / Summary

Feature Design by leveraging behavior patterns

Goal: Vector space + time-series

Problem: Loss of information

Approaches:

• Direct: Integrate sequential information into time-series

• Abstraction: Learn patterns from time-series → represent time-series by patterns


21 / 23


Pipelines:

• Time-Series → LBS → Vector Space Methods → Solve Task

• Time-Series → Specialized Methods → Solve Task


22 / 23

Discussion / Future Work

Current Work

• Evaluate on real-world data (replace individual transformations with LBS)

• Evaluate impact of loss of information (segmentation → LBS)

• Build theoretical attacker models

General

• Further work on social bot detection (behavior graph)

• Investigate influence streams (e.g. by means of frames) and echo chambers


23 / 23

Discussion / Future Work

Thank you! Questions?
