Identifying On-line Fraudsters: Anomaly Detection Using Network Effects

Post on 10-Jan-2016


CMU SCS

Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Christos Faloutsos

CMU


Thanks

• Saman Haqqi

IBM-PBGH June 2013 C. Faloutsos (CMU) 2


Roadmap

• Graph problems:
  – G1: Fraud detection – BP
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
  – C1: spikeM model

• Conclusions


E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU [WWW’07]


E-bay Fraud detection - NetProbe


Compatibility matrix (heterophily) over the states F, A, H:
  row F: 99%
  row A: 99%
  row H: 49%, 49%

Background 1: Belief Propagation Equations

$$m_{ij}(x_j) = \sum_{x_i} \phi_i(x_i) \cdot \psi_{ij}(x_i, x_j) \cdot \prod_{n \in N(i) \setminus j} m_{ni}(x_i)$$

$$b_i(x_i) = \eta \cdot \phi_i(x_i) \cdot \prod_{j \in N(i)} m_{ji}(x_i)$$

[Pearl ’82] [Yedidia+ ’02] … [Pandit+ ’07] [Gonzalez+ ’09] [Chechetka+ ’10]
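The message and belief updates of belief propagation can be sketched on a toy graph. This is a minimal illustration, not the NetProbe code: the state names F/A/H come from the compatibility-matrix slide, while the numeric entries of `PSI`, the toy edges, and the priors are assumptions chosen only for the example.

```python
from math import prod

STATES = ("F", "A", "H")  # fraudster, accomplice, honest

# Hypothetical compatibility matrix PSI[x_i][x_j]; values are illustrative only.
# The large off-diagonal entries encode heterophily (F links to A, A links to F).
PSI = {
    "F": {"F": 0.005, "A": 0.99, "H": 0.005},
    "A": {"F": 0.99, "A": 0.005, "H": 0.005},
    "H": {"F": 0.02, "A": 0.49, "H": 0.49},
}


def normalize(dist):
    """Rescale a state -> weight mapping so that the weights sum to 1."""
    total = sum(dist.values())
    return {s: v / total for s, v in dist.items()}


def belief_propagation(edges, priors, iterations=20):
    """Loopy BP on an undirected graph; priors[node][state] plays the role of phi."""
    neighbors = {node: set() for node in priors}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # messages[(i, j)][x_j]: what node i tells node j about j's state
    messages = {(i, j): {s: 1.0 for s in STATES}
                for i in neighbors for j in neighbors[i]}
    for _ in range(iterations):
        updated = {}
        for (i, j) in messages:
            out = {}
            for xj in STATES:
                out[xj] = sum(
                    priors[i][xi] * PSI[xi][xj]
                    * prod(messages[(n, i)][xi] for n in neighbors[i] if n != j)
                    for xi in STATES)
            updated[(i, j)] = normalize(out)
        messages = updated
    # Belief of node i: prior times the product of all incoming messages.
    return {
        i: normalize({
            xi: priors[i][xi] * prod(messages[(j, i)][xi] for j in neighbors[i])
            for xi in STATES})
        for i in neighbors
    }


# Toy data (assumed): "f" is a suspected fraudster, "a" and "h" have no prior.
uniform = {s: 1 / 3 for s in STATES}
priors = {"f": {"F": 0.9, "A": 0.05, "H": 0.05}, "a": dict(uniform), "h": dict(uniform)}
beliefs = belief_propagation([("f", "a"), ("a", "h")], priors)
most_likely = {node: max(b, key=b.get) for node, b in beliefs.items()}
```

With these assumed numbers the node next to the suspected fraudster ends up most likely an accomplice, which is the heterophily effect the compatibility matrix is meant to capture.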


Popular press

And less desirable attention:
• E-mail from ‘Belgium police’ (‘copy of your code?’)

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
    • Ebay
    • Symantec
    • Unification
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Conclusions

Polo Chau, Machine Learning Dept
Carey Nachenberg, Vice President & Fellow
Jeffrey Wilhelm, Principal Software Engineer
Adam Wright, Software Engineer
Prof. Christos Faloutsos, Computer Science Dept

Polonium: Tera-Scale Graph Mining and Inference for Malware Detection

PATENT PENDING

SDM 2011, Mesa, Arizona

Polonium: The Data

60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program
• 50+ million machines
• 900+ million executable files

Constructed a machine-file bipartite graph (0.2 TB+)
• 1 billion nodes (machines and files)
• 37 billion edges

Polonium: Key Ideas

• Use “guilt-by-association” (i.e., homophily)
  – E.g., files that appear on machines with many bad files are more likely to be bad
• Scalability: handles 37 billion-edge graph
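The guilt-by-association idea on a machine-file bipartite graph can be illustrated with a deliberately simplified score-averaging loop. This is not Polonium's inference procedure; the function name, the 0.5 "unknown" score, and the toy data are assumptions made for the sketch.

```python
def propagate_badness(file_machines, known_bad, known_good, rounds=10):
    """Spread badness scores over a bipartite file/machine graph.

    file_machines maps each file to the set of machines it appears on.
    Scores lie in [0, 1]; labeled files stay clamped at 1 (bad) or 0 (good).
    """
    machine_files = {}
    for f, machines in file_machines.items():
        for m in machines:
            machine_files.setdefault(m, set()).add(f)

    score = {f: 0.5 for f in file_machines}  # 0.5 means "no evidence yet"
    score.update({f: 1.0 for f in known_bad})
    score.update({f: 0.0 for f in known_good})

    for _ in range(rounds):
        # A machine is as suspicious as the average file it hosts.
        machine_score = {m: sum(score[f] for f in fs) / len(fs)
                         for m, fs in machine_files.items()}
        # An unlabeled file is as suspicious as the average machine hosting it.
        for f, machines in file_machines.items():
            if f in known_bad or f in known_good or not machines:
                continue
            score[f] = sum(machine_score[m] for m in machines) / len(machines)
    return score


# Toy data (assumed): "x" sits next to two bad files, "y" next to two good ones.
files = {
    "bad1": {"m1"}, "bad2": {"m1"}, "x": {"m1"},
    "good1": {"m2"}, "good2": {"m2"}, "y": {"m2"},
}
scores = propagate_badness(files, known_bad={"bad1", "bad2"},
                           known_good={"good1", "good2"})
```

Each round touches every file-machine edge a constant number of times, so the cost per round grows linearly with the number of edges.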

Polonium: One-Interaction Results

84.9% True Positive Rate
1% False Positive Rate

• True Positive Rate: % of malware correctly identified
• False Positive Rate: % of non-malware wrongly labeled as malware

[Figure: True Positive Rate vs. False Positive Rate, with the ‘Ideal’ point marked]
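The two rates defined above can be computed from labels and predictions as follows. The helper name and the toy label vectors are assumptions for illustration; the sketch also assumes at least one malware and one non-malware sample, so neither denominator is zero.

```python
def true_and_false_positive_rate(is_malware, flagged):
    """Return (TPR, FPR) for parallel lists of booleans or 0/1 values."""
    pairs = list(zip(is_malware, flagged))
    tp = sum(1 for truth, pred in pairs if truth and pred)
    fn = sum(1 for truth, pred in pairs if truth and not pred)
    fp = sum(1 for truth, pred in pairs if not truth and pred)
    tn = sum(1 for truth, pred in pairs if not truth and not pred)
    tpr = tp / (tp + fn)  # share of malware correctly identified
    fpr = fp / (fp + tn)  # share of non-malware wrongly labeled as malware
    return tpr, fpr


# Toy data (assumed): 4 malware files, 6 clean files.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = true_and_false_positive_rate(truth, preds)
```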


Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms

Danai Koutra

U Kang

Hsing-Kuo Kenneth Pao

Tai-You Ke

Duen Horng (Polo) Chau

Christos Faloutsos

ECML PKDD, 5-9 September 2011, Athens, Greece

Problem Definition: GBA techniques

Given: graph, and a few labeled nodes
Find: labels of the rest (assuming network effects)

Homophily and Heterophily

[Figure: panels ‘Step 1’ and ‘Step 2’, shown for homophily and for heterophily]

All methods handle homophily.
NOT all methods handle heterophily, BUT the proposed method does!

Are they related?

• RWR (Random Walk with Restarts)
  – Google’s PageRank (‘if my friends are important, I’m important, too’)
• SSL (Semi-supervised learning)
  – minimize the differences among neighbors
• BP (Belief propagation)
  – send messages to neighbors, on what you believe about them

YES!


Correspondence of Methods

Each method solves: Matrix × Unknown = Known

Method   Matrix              Unknown   Known
RWR      [I – c A D^-1]      x         (1-c) y
SSL      [I + a(D - A)]      x         y
FaBP     [I + a D - c’A]     b_h       φ_h

x, b_h: final labels/beliefs; y, φ_h: prior labels/beliefs; A: adjacency matrix; D: diagonal matrix with entries d1, d2, d3
[Figure: example with A = [0 1 0; 1 0 1; 0 1 0] and entries ?, 0, 1, 1]

We know when it converges!
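The three linear systems in the correspondence table can be built and solved directly on the 3-node path graph from the slide. This is a minimal sketch: the dense Gauss-Jordan solver stands in for whatever scalable solver is used in practice, and the parameter values (`c = 0.85`, `a = 0.1`, `c_prime = 0.2`) and prior vectors are placeholders, not values from the talk.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in matrix[r]] + [float(rhs[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def degrees(adj):
    return [sum(row) for row in adj]


def rwr_matrix(adj, c):
    """I - c A D^-1 (assumes no isolated nodes, so every degree is nonzero)."""
    d, n = degrees(adj), len(adj)
    return [[(i == j) - c * adj[i][j] / d[j] for j in range(n)] for i in range(n)]


def ssl_matrix(adj, a):
    """I + a (D - A)."""
    d, n = degrees(adj), len(adj)
    return [[(i == j) * (1 + a * d[i]) - a * adj[i][j] for j in range(n)]
            for i in range(n)]


def fabp_matrix(adj, a, c_prime):
    """I + a D - c' A."""
    d, n = degrees(adj), len(adj)
    return [[(i == j) * (1 + a * d[i]) - c_prime * adj[i][j] for j in range(n)]
            for i in range(n)]


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # adjacency matrix of the path 1 - 2 - 3
y = [1.0, 0.0, 0.0]                    # assumed prior: only node 1 is labeled
phi = [0.1, 0.0, 0.0]                  # assumed prior beliefs for FaBP

x_rwr = solve(rwr_matrix(A, 0.85), [(1 - 0.85) * v for v in y])
x_ssl = solve(ssl_matrix(A, 0.1), y)
b_fabp = solve(fabp_matrix(A, 0.1, 0.2), phi)
```

All three methods reduce to the same operation, a linear solve against a matrix built from I, D, and A, which is the sense in which the table says they correspond.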

Results: Scalability

FaBP is linear in the number of edges.

[Figure: run time (min) vs. # of edges (Kronecker graphs)]

Results: Parallelism

FaBP ~2x faster & wins/ties on accuracy.

[Figure: % accuracy vs. runtime (min)]

Conclusions for BP

• ‘NetProbe’, ‘Polonium’, and belief propagation: exploit network effects.

• FaBP: fast & accurate (and -> convergence conditions)


EigenSpokes

B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.


EigenSpokes

• Eigenvectors of adjacency matrix
  – equivalent to singular vectors (symmetric, undirected graph)


EigenSpokes

• EE plot: scatter plot of scores of u1 vs u2
• One would expect
  – Many points @ origin
  – A few scattered ~randomly

[Figure: EE plot; axes u1 (1st principal component) and u2 (2nd principal component)]

[Figure: EE plot of u1 vs u2; annotation: 90°]

EigenSpokes - pervasiveness

• Present in mobile social graph across time and space
• Patent citation graph

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected


So what?
• Extract nodes with high scores
• high connectivity
• Good “communities”

[Figure: spy plot of top 20 nodes]
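The link between near-cliques and axis-aligned spokes can be reproduced on a toy graph. The sketch below uses power iteration with deflation (a textbook technique swapped in here, not necessarily what the EigenSpokes paper uses) and, as a simplifying assumption, takes the limiting case where the cliques are fully disconnected rather than loosely connected. Graph sizes and the 0.1 score threshold are illustrative.

```python
from math import sqrt


def clique_edges(nodes):
    return [(u, v) for idx, u in enumerate(nodes) for v in nodes[idx + 1:]]


def build_adjacency(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(adj, v):
    """Multiply the (symmetric, 0/1) adjacency matrix by v using adjacency lists."""
    return [sum(v[j] for j in adj[i]) for i in range(len(adj))]


def top_eigenpair(apply, n, iterations=300):
    """Power iteration for the dominant eigenpair of a symmetric operator."""
    v = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iterations):
        w = apply(v)
        norm = sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, apply(v)), v


# Toy graph (assumed): a 5-clique, a 4-clique, and a short path as background.
n = 12
edges = clique_edges([0, 1, 2, 3, 4]) + clique_edges([5, 6, 7, 8]) + [(9, 10), (10, 11)]
adj = build_adjacency(n, edges)

lam1, u1 = top_eigenpair(lambda v: matvec(adj, v), n)


def deflated(v):
    """Adjacency operator with the first eigen-direction removed."""
    proj = dot(u1, v)
    return [w - lam1 * proj * x for w, x in zip(matvec(adj, v), u1)]


lam2, u2 = top_eigenpair(deflated, n)

# "Extract nodes with high scores": each spoke picks out one clique.
spoke1 = {i for i, x in enumerate(u1) if abs(x) > 0.1}
spoke2 = {i for i, x in enumerate(u2) if abs(x) > 0.1}
```

In an EE plot of `u1` against `u2`, the nodes of the first clique lie on one axis, those of the second clique on the other, and the background nodes sit at the origin.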

Bipartite Communities!

[Figure: magnified bipartite community]
• patents from same inventor(s)
• ‘cut-and-paste’ bibliography!

(maybe, botnets?)

Victim IPs?

Botnet members?

Exploring it with Dr. Eric Mao (III-Taiwan)

GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries

U Kang
Christos Faloutsos
Evangelos Papalexakis
Abhay Harpale

KDD’12

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Hyperlinks & anchor text [Kolda+,05]

[Figure: 3-mode tensor with modes URL 1, URL 2, and anchor text (Java, C++, C#); nonzero entries are 1]

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Sensor stream (time, location, type)
  – Predicates (subject, verb, object) in knowledge base
    • “Barack Obama is president of U.S.”
    • “Eric Clapton plays guitar”

NELL (Never Ending Language Learner) data: modes of size 26M, 26M, and 48M; nonzeros = 144M

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Sensor stream (time, location, type)
  – Predicates (subject, verb, object) in knowledge base
  – Anomaly detection in computer networks (IP-source, IP-destination, time-stamp)

Problem Definition

• How to decompose a billion-scale tensor?
  – Corresponds to SVD in 2D case

[Figure: tensor decomposed into components, e.g., ‘Politicians’ and ‘Artists’]
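To make the "SVD for tensors" analogy concrete, here is a minimal rank-1 decomposition of a small sparse 3-mode tensor by alternating updates. It is a single-machine teaching sketch, not the GigaTensor algorithm; the function names, the dictionary-of-nonzeros storage, and the toy tensor are assumptions.

```python
from math import sqrt


def unit(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def rank1_cp(entries, shape, sweeps=20):
    """Fit X[i][j][k] ~ weight * a[i] * b[j] * c[k] for a sparse tensor.

    entries maps (i, j, k) to a nonzero value; each sweep visits every
    nonzero three times, so the cost per sweep is linear in the nonzeros.
    """
    size_i, size_j, size_k = shape
    a, b, c = [1.0] * size_i, [1.0] * size_j, [1.0] * size_k
    for _ in range(sweeps):
        new_a = [0.0] * size_i
        for (i, j, k), v in entries.items():
            new_a[i] += v * b[j] * c[k]
        a = unit(new_a)
        new_b = [0.0] * size_j
        for (i, j, k), v in entries.items():
            new_b[j] += v * a[i] * c[k]
        b = unit(new_b)
        new_c = [0.0] * size_k
        for (i, j, k), v in entries.items():
            new_c[k] += v * a[i] * b[j]
        c = unit(new_c)
    weight = sum(v * a[i] * b[j] * c[k] for (i, j, k), v in entries.items())
    return weight, a, b, c


# Toy tensor (assumed): an exact outer product of three small vectors.
true_a, true_b, true_c = [1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0]
entries = {
    (i, j, k): true_a[i] * true_b[j] * true_c[k]
    for i in range(2) for j in range(3) for k in range(2)
    if true_a[i] * true_b[j] * true_c[k] != 0
}
weight, a, b, c = rank1_cp(entries, (2, 3, 2))
```

In the matrix case the same alternating scheme reduces to power iteration for the leading singular vectors, which is one way to read "corresponds to SVD in 2D case".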

Problem Definition

Q1: Dominant concepts/topics?
Q2: Find synonyms to a given noun phrase?
(and how to scale up: |data| > RAM)

NELL (Never Ending Language Learner) data: modes of size 26M, 26M, and 48M; nonzeros = 144M

Experiments

• GigaTensor solves a 100x larger problem

[Figure: tensor with modes (I), (J), (K) and number of nonzeros = I / 50; Tensor Toolbox runs out of memory, GigaTensor scales to 100x larger sizes]

A1: Concept Discovery

• Concept Discovery in Knowledge Base


A2: Synonym Discovery


Rise and Fall Patterns of Information Diffusion: Model and Implications

Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU)

KDD’12, Beijing, China

Rise and fall patterns in social media

• Meme (# of mentions in blogs)
  – short phrases sourced from U.S. politics in 2008
  – “you can put lipstick on a pig”
  – “yes we can”

Rise and fall patterns in social media

• four classes on YouTube [Crane et al. ’08]
• six classes on Meme [Yang et al. ’11]
• Can we find a unifying model, which includes these patterns?

• Answer: YES!
• We can represent all patterns by a single model

Main idea - SpikeM

1. Un-informed bloggers (uninformed about rumor)
2. External shock at time nb (e.g., breaking news)
3. Infection (word-of-mouth)

Infectiveness of a blog-post at age n:
  – Strength of infection (quality of news): β
  – Decay function

[Figure: snapshots at time n=0, time n=nb, and time n=nb+1]


[Figure: Prob(RT > x) vs. response time, log-log scales; slope -1.5]

J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).
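The three ingredients listed for SpikeM (a pool of un-informed bloggers, an external shock at time nb, and word-of-mouth infection whose strength decays with the age of a post) can be turned into a small simulation. The update rule below is an assumption pieced together from those ingredients and the -1.5 slope shown above; it omits periodicity, and the function name, parameter names, and values are all hypothetical rather than taken from the paper.

```python
def simulate_spike(n_steps, population, beta, shock_time, shock_size, noise=0.0):
    """Return new_posts[n], the number of bloggers who become informed at step n.

    Assumed rule: the un-informed pool is infected in proportion to all earlier
    posts (plus the one-off shock), each weighted by beta * age ** -1.5.
    """
    uninformed = float(population)
    new_posts = [0.0] * n_steps
    for n in range(n_steps - 1):
        pressure = 0.0
        for t in range(shock_time, n + 1):
            source = new_posts[t] + (shock_size if t == shock_time else 0.0)
            age = n + 1 - t  # always >= 1 inside this loop
            pressure += source * beta * age ** -1.5
        # Nobody can be informed twice, so cap at the remaining pool.
        infected = min(uninformed, uninformed * pressure + noise)
        new_posts[n + 1] = infected
        uninformed -= infected
    return new_posts


# Hypothetical parameters: 1000 bloggers, shock of size 10 at step 5.
counts = simulate_spike(n_steps=60, population=1000, beta=0.0008,
                        shock_time=5, shock_size=10)
peak_step = max(range(len(counts)), key=counts.__getitem__)
```

Nothing happens before the shock, activity starts one step after it, and the finite pool of un-informed bloggers is what eventually forces the decline.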

SpikeM - with periodicity

• Full equation of SpikeM
• Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly)

[Figure: activity vs. time n, with a peak at noon and a dip at 3am]

Details

• Analysis – exponential rise and power-law fall
  – Rise-part: SI -> exponential; SpikeM -> exponential
  – Fall-part: SI -> exponential; SpikeM -> power law

[Figure: rise part and fall part, each in lin-log and log-log scales]
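The lin-log and log-log views distinguish the two kinds of decay: an exponential is a straight line in lin-log scales, a power law is a straight line in log-log scales. A short least-squares check on synthetic data illustrates this; the series below are made up for the example and are not data from the talk.

```python
from math import exp, log


def least_squares_slope(us, vs):
    """Slope of the best-fit line v = slope * u + intercept."""
    mean_u = sum(us) / len(us)
    mean_v = sum(vs) / len(vs)
    num = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    den = sum((u - mean_u) ** 2 for u in us)
    return num / den


def loglog_slope(xs, ys):
    """Slope in log-log scales; equals the exponent of a pure power law."""
    return least_squares_slope([log(x) for x in xs], [log(y) for y in ys])


def linlog_slope(xs, ys):
    """Slope in lin-log scales; equals the rate of a pure exponential."""
    return least_squares_slope(list(xs), [log(y) for y in ys])


xs = list(range(1, 51))
power_law = [3.0 * x ** -1.5 for x in xs]        # falls as x^-1.5
exponential = [3.0 * exp(-0.3 * x) for x in xs]  # falls as e^(-0.3 x)

pl_exponent = loglog_slope(xs, power_law)
exp_rate = linlog_slope(xs, exponential)
```

On real, noisy counts the fit would only be approximate, and the tail is usually the part that separates the two models most clearly.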

Tail-part forecasts

• SpikeM can capture the tail part

“What-if” forecasting

e.g., given
(1) first spike,
(2) release date of two sequel movies,
(3) access volume before the release date (two weeks before release)

SpikeM can forecast upcoming spikes

Conclusions for spikes

• Exp rise; PL decay
• ‘spikeM’ captures all patterns, with a few parms
  – And can do extrapolation
  – And forecasting

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Future research
• Conclusions

Challenge #1: Time evolving networks / tensors

• Periodicities? Burstiness?
• What is ‘typical’ behavior of a node, over time
• Heterogeneous graphs (= nodes w/ attributes)

Challenge #2: ‘Connectome’ – brain wiring

• Which neurons get activated by ‘bee’
• How wiring evolves
• Modeling epilepsy

N. Sidiropoulos

George Karypis

V. Papalexakis

Tom Mitchell


Thanks


Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab


Project info: PEGASUS

www.cs.cmu.edu/~pegasus
Results on large graphs: with Pegasus + hadoop + M45
Apache license
Code, papers, manual, video

Prof. U Kang
Prof. Polo Chau

Cast

Akoglu, Leman

Chau, Polo

Kang, U

McGlohon, Mary

Tong, Hanghang

Prakash, Aditya

Koutra, Danai

Beutel, Alex

Papalexakis, Vagelis

References

• Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1) (2006)
• Christos Faloutsos, Tamara G. Kolda, Jimeng Sun: Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174
• Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos: Rise and Fall Patterns of Information Diffusion: Model and Implications. KDD’12, pp. 6-14, Beijing, China, August 2012
• Jimeng Sun, Dacheng Tao, Christos Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

Overall Conclusions

• G1: fraud detection
  – BP: powerful method
  – FaBP: faster; equally accurate; known convergence
• G2: botnets -> Eigenspokes
• G3: Subject-Verb-Object -> Tensors/GigaTensor
• Spikes: ‘spikeM’ (exp rise; PL drop)