Identifying On-line Fraudsters: Anomaly Detection Using Network Effects


Page 1: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

CMU SCS

Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Christos Faloutsos

CMU

Page 2: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects


Thanks

• Saman Haqqi

IBM-PBGH June 2013 C. Faloutsos (CMU) 2

Page 3: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
  – C1: spikeM model
• Conclusions

Page 4: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU [WWW’07]


Page 8: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

E-bay Fraud detection - NetProbe

Compatibility matrix (heterophily), over the states F, A, H:

F: 99%
A: 99%
H: 49%, 49%

Page 9: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Background 1: Belief Propagation Equations

$$m_{ij}(x_j) = \sum_{x_i} \phi_i(x_i)\,\psi_{ij}(x_i, x_j) \prod_{n \in N(i) \setminus j} m_{ni}(x_i)$$

$$b_i(x_i) = \eta\,\phi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)$$

[Pearl ‘82][Yedidia+ ‘02]…[Pandit+ ‘07][Gonzalez+ ‘09][Chechetka+ ‘10]

~bi (xi )
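The two update rules above can be sketched in a few lines of Python. This is a minimal illustration, not the NetProbe implementation: the function name `loopy_bp`, the two-state heterophily matrix, and the priors in the usage note are all assumptions made for the example. Messages are renormalized after every round, which does not change the final beliefs.

```python
from math import prod


def loopy_bp(edges, priors, psi, n_iter=20):
    """Sum-product belief propagation on a pairwise model.

    edges  : list of undirected (i, j) pairs
    priors : dict node -> list of prior probabilities (phi_i)
    psi    : compatibility matrix shared by all edges, psi[x_i][x_j]
    """
    n_states = len(psi)
    nbrs = {i: set() for i in priors}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)

    # m[(i, j)][x_j] is the message from node i to node j about j's state.
    m = {(i, j): [1.0] * n_states for i in nbrs for j in nbrs[i]}

    for _ in range(n_iter):
        new = {}
        for (i, j) in m:
            msg = []
            for xj in range(n_states):
                total = 0.0
                for xi in range(n_states):
                    # Product of messages into i from everyone except j.
                    incoming = prod(m[(n, i)][xi] for n in nbrs[i] if n != j)
                    total += priors[i][xi] * psi[xi][xj] * incoming
                msg.append(total)
            z = sum(msg)
            new[(i, j)] = [v / z for v in msg]
        m = new

    beliefs = {}
    for i in nbrs:
        b = [priors[i][x] * prod(m[(j, i)][x] for j in nbrs[i])
             for x in range(n_states)]
        z = sum(b)  # plays the role of the normalizer eta
        beliefs[i] = [v / z for v in b]
    return beliefs
```

For example, on the path 0 - 1 - 2 with `psi = [[0.1, 0.9], [0.9, 0.1]]` (heterophily) and a prior of `[0.9, 0.1]` on node 0 only, node 1 ends up leaning toward state 1 and node 2 leans back toward state 0, which is the alternating pattern heterophily predicts.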


Page 11: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Popular press

And less desirable attention:
• E-mail from ‘Belgium police’ (‘copy of your code?’)

Page 12: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
    • Ebay
    • Symantec
    • Unification
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Conclusions

Page 13: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Polonium: Tera-Scale Graph Mining and Inference for Malware Detection

Polo Chau, Machine Learning Dept
Carey Nachenberg, Vice President & Fellow
Jeffrey Wilhelm, Principal Software Engineer
Adam Wright, Software Engineer
Prof. Christos Faloutsos, Computer Science Dept

PATENT PENDING

SDM 2011, Mesa, Arizona

Page 14: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Polonium: The Data

60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program
• 50+ million machines
• 900+ million executable files

Constructed a machine-file bipartite graph (0.2 TB+)
• 1 billion nodes (machines and files)
• 37 billion edges

Page 15: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Polonium: Key Ideas

• Use “guilt-by-association” (i.e., homophily)
  – E.g., files that appear on machines with many bad files are more likely to be bad
• Scalability: handles 37 billion-edge graph

Page 16: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Polonium: One-Interaction Results

84.9% True Positive Rate
1% False Positive Rate

True Positive Rate: % of malware correctly identified
False Positive Rate: % of non-malware wrongly labeled as malware

[Plot: True Positive Rate vs. False Positive Rate, with the ideal point marked]
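The two rates defined on this slide can be computed as in the sketch below. The function name `tpr_fpr` and the toy labels are hypothetical; this only illustrates the definitions and does not reproduce the Polonium numbers.

```python
def tpr_fpr(is_malware, flagged):
    """Return (true positive rate, false positive rate).

    is_malware : list of bools, ground truth per file
    flagged    : list of bools, detector output per file
    """
    tp = sum(1 for t, p in zip(is_malware, flagged) if t and p)
    fn = sum(1 for t, p in zip(is_malware, flagged) if t and not p)
    fp = sum(1 for t, p in zip(is_malware, flagged) if not t and p)
    tn = sum(1 for t, p in zip(is_malware, flagged) if not t and not p)
    tpr = tp / (tp + fn)  # share of malware correctly identified
    fpr = fp / (fp + tn)  # share of non-malware wrongly labeled as malware
    return tpr, fpr
```

With four malware files of which three are flagged, and four clean files of which one is flagged, `tpr_fpr` returns `(0.75, 0.25)`.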

Page 17: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
    • Ebay
    • Symantec
    • Unification
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Conclusions

Page 18: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms

Danai Koutra

U Kang

Hsing-Kuo Kenneth Pao

Tai-You Ke

Duen Horng (Polo) Chau

Christos Faloutsos

ECML PKDD, 5-9 September 2011, Athens, Greece

Page 19: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Problem Definition: GBA techniques

Given: graph, and a few labeled nodes
Find: labels of the rest (assuming network effects)

Page 20: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Homophily and Heterophily

[Figure: label propagation over Step 1 and Step 2, shown for homophily and for heterophily]

All methods handle homophily.
NOT all methods handle heterophily, BUT the proposed method does!

Page 21: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Are they related?

• RWR (Random Walk with Restarts)
  – Google’s PageRank (‘if my friends are important, I’m important, too’)
• SSL (Semi-supervised learning)
  – minimize the differences among neighbors
• BP (Belief propagation)
  – send messages to neighbors, on what you believe about them

Page 22: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Are they related?

• RWR (Random Walk with Restarts)
  – Google’s PageRank (‘if my friends are important, I’m important, too’)
• SSL (Semi-supervised learning)
  – minimize the differences among neighbors
• BP (Belief propagation)
  – send messages to neighbors, on what you believe about them

YES!

Page 23: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Background 1: Belief Propagation Equations

$$m_{ij}(x_j) = \sum_{x_i} \phi_i(x_i)\,\psi_{ij}(x_i, x_j) \prod_{n \in N(i) \setminus j} m_{ni}(x_i)$$

$$b_i(x_i) = \eta\,\phi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)$$

[Pearl ‘82][Yedidia+ ‘02]…[Pandit+ ‘07][Gonzalez+ ‘09][Chechetka+ ‘10]

Page 24: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Correspondence of Methods

Method   Matrix                  Unknown     Known
RWR      [I - c A D^{-1}]    ×   x        =  (1-c) y
SSL      [I + a(D - A)]      ×   x        =  y
FaBP     [I + a D - c'A]     ×   b_h      =  φ_h

Unknown: final labels/beliefs. Known: prior labels/beliefs. A: adjacency matrix. D: diagonal matrix of degrees (d1, d2, d3).

Example adjacency matrix (3 nodes):

0 1 0
1 0 1
0 1 0
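To make the correspondence concrete, the sketch below builds the three matrices for the 3-node example and solves each linear system with a small Gaussian elimination routine. Several things here are assumptions rather than content of the slide: the helper names, the values of c, alpha and h, and in particular the FaBP coefficients a = 4h²/(1 - 4h²) and c' = 2h/(1 - 4h²), which follow the ECML PKDD 2011 paper as recalled here and should be checked against it. In FaBP the beliefs and priors are centered around zero, so a positive entry leans toward one class and a negative entry toward the other.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def degrees(A):
    return [sum(row) for row in A]


def rwr_matrix(A, c):
    """I - c A D^{-1}: column j of A is divided by the degree of node j."""
    d, n = degrees(A), len(A)
    return [[(1.0 if i == j else 0.0) - c * A[i][j] / d[j]
             for j in range(n)] for i in range(n)]


def ssl_matrix(A, alpha):
    """I + alpha (D - A)."""
    d, n = degrees(A), len(A)
    return [[(1.0 + alpha * d[i] if i == j else 0.0) - alpha * A[i][j]
             for j in range(n)] for i in range(n)]


def fabp_matrix(A, h):
    """I + a D - c' A, with a and c' derived from the homophily factor h
    (coefficients assumed from the FaBP paper, see the note above)."""
    a = 4 * h * h / (1 - 4 * h * h)
    c_prime = 2 * h / (1 - 4 * h * h)
    d, n = degrees(A), len(A)
    return [[(1.0 + a * d[i] if i == j else 0.0) - c_prime * A[i][j]
             for j in range(n)] for i in range(n)]


A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

x_rwr = solve(rwr_matrix(A, 0.85), [0.15 * y for y in (1, 0, 0)])
x_ssl = solve(ssl_matrix(A, 1.0), [1, 0, 0])
b_homophily = solve(fabp_matrix(A, 0.1), [0.1, 0.0, 0.0])
b_heterophily = solve(fabp_matrix(A, -0.1), [0.1, 0.0, 0.0])
```

All three methods reduce to one sparse linear solve; only the matrix changes. With h > 0 the FaBP beliefs decay along the path but keep their sign, while with h < 0 they alternate in sign from node to node.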

Page 25: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Correspondence of Methods

Method   Matrix                  Unknown     Known
RWR      [I - c A D^{-1}]    ×   x        =  (1-c) y
SSL      [I + a(D - A)]      ×   x        =  y
FaBP     [I + a D - c'A]     ×   b_h      =  φ_h

Unknown: final labels/beliefs. Known: prior labels/beliefs. A: adjacency matrix. D: diagonal matrix of degrees (d1, d2, d3).

We know when it converges!

Page 26: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Results: Scalability

FaBP is linear in the number of edges.

[Plot: run time (min) vs. # of edges (Kronecker graphs)]

Page 27: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Results: Parallelism

FaBP ~2x faster & wins/ties on accuracy.

[Plot: % accuracy vs. runtime (min)]

Page 28: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Conclusions for BP

• ‘NetProbe’, ‘Polonium’, and belief propagation: exploit network effects.
• FaBP: fast & accurate (and it comes with convergence conditions)

Page 29: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
    • Ebay
    • Symantec
    • Unification
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Conclusions

Page 30: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

EigenSpokes

B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.

Page 31: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

EigenSpokes

• Eigenvectors of adjacency matrix
  – equivalent to singular vectors (symmetric, undirected graph)


Page 36: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

EigenSpokes

• EE plot: scatter plot of scores of u1 vs u2
• One would expect
  – Many points @ origin
  – A few scattered ~randomly

[Plot axes: u1 (1st principal component) vs. u2 (2nd principal component)]

Page 37: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

EigenSpokes

• EE plot: scatter plot of scores of u1 vs u2
• One would expect
  – Many points @ origin
  – A few scattered ~randomly

[Plot: u1 vs. u2, with a 90° angle marked]

Page 38: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

EigenSpokes - pervasiveness

• Present in mobile social graph across time and space
• Patent citation graph

Page 39: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected


Page 43: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

So what?
• Extract nodes with high scores
• High connectivity
• Good “communities”

[Figure: spy plot of top 20 nodes]
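The explanation can be checked on a toy graph. The sketch below is an illustration under simplifying assumptions, not the EigenSpokes algorithm itself: it uses two fully disconnected cliques (rather than loosely connected near-cliques), plain power iteration with deflation instead of a scalable eigensolver, and hypothetical helper names. Each node then has a nonzero score in at most one of u1 and u2, so every point of the EE plot falls on an axis, which is the spoke pattern.

```python
from math import sqrt


def disjoint_cliques(sizes):
    """Adjacency matrix of a graph made of separate cliques."""
    n = sum(sizes)
    A = [[0.0] * n for _ in range(n)]
    start = 0
    for s in sizes:
        for i in range(start, start + s):
            for j in range(start, start + s):
                if i != j:
                    A[i][j] = 1.0
        start += s
    return A


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def normalize(v):
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_eigvec(A, n_iter=300):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    v = normalize([float(i + 1) for i in range(len(A))])
    for _ in range(n_iter):
        v = normalize(matvec(A, v))
    lam = sum(x * y for x, y in zip(v, matvec(A, v)))  # Rayleigh quotient
    return lam, v


def deflate(A, lam, v):
    """Remove the component lam * v v^T so the next eigenpair dominates."""
    n = len(A)
    return [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]


A = disjoint_cliques([5, 4])
lam1, u1 = top_eigvec(A)
lam2, u2 = top_eigvec(deflate(A, lam1, u1))

# EE-plot coordinates: one (u1 score, u2 score) point per node.
ee_points = list(zip(u1, u2))
```

Nodes with a high score on u1 are exactly the members of the 5-clique, and nodes with a high score on u2 are the members of the 4-clique, so thresholding the scores recovers the two communities.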

Page 44: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Bipartite Communities!

[Figure: magnified bipartite community]

• patents from same inventor(s)
• ‘cut-and-paste’ bibliography!

Page 45: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

(maybe, botnets?)

Victim IPs?

Botnet members?

Exploring it with Dr. Eric Mao (III-Taiwan)

Page 46: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Conclusions

Page 47: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries

U Kang
Christos Faloutsos
Evangelos Papalexakis
Abhay Harpale

KDD’12

Page 48: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Hyperlinks & anchor text [Kolda+,05]

[Figure: 3-mode tensor with modes URL 1, URL 2, and Anchor Text (e.g., Java, C++, C#); the nonzero entries are 1]

Page 49: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Sensor stream (time, location, type)
  – Predicates (subject, verb, object) in knowledge base
    e.g., “Barack Obama is president of U.S.”, “Eric Clapton plays guitar”

NELL (Never Ending Language Learner) data: mode sizes 26M, 26M, and 48M; nonzeros = 144M

Page 50: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Sensor stream (time, location, type)
  – Predicates (subject, verb, object) in knowledge base

[Figure: tensor with modes IP-source, IP-destination, and time-stamp, for anomaly detection in computer networks]

Page 51: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Problem Definition

• How to decompose a billion-scale tensor?
  – Corresponds to SVD in 2D case

Page 52: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Problem Definition

• How to decompose a billion-scale tensor?
  – Corresponds to SVD in 2D case

[Figure: decomposition into components labeled ‘Politicians’ and ‘Artists’]
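As a small-scale illustration of what a tensor decomposition extracts, the sketch below stores (subject, verb, object) triples as a sparse 3-mode tensor and finds one rank-1 component by alternating updates (the higher-order power method). This is a toy stand-in, not GigaTensor: it is single-machine, recovers only one component, and the triples and the function name `rank1_factors` are made up for the example.

```python
from math import sqrt


def unit(v):
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank1_factors(triples, n_iter=50):
    """Dominant rank-1 component of a binary (subject, verb, object) tensor.

    Returns three dicts mapping each subject, verb, and object to its weight.
    """
    names = [sorted({t[mode] for t in triples}) for mode in range(3)]
    index = [{name: k for k, name in enumerate(ns)} for ns in names]
    # Sparse coordinate format: only the nonzero cells are stored.
    X = {tuple(index[mode][t[mode]] for mode in range(3)): 1.0
         for t in triples}

    def update(mode, u, v):
        """Contract X with the two other factors, then renormalize."""
        out = [0.0] * len(names[mode])
        for idx, val in X.items():
            others = [idx[m] for m in range(3) if m != mode]
            out[idx[mode]] += val * u[others[0]] * v[others[1]]
        return unit(out)

    a = [1.0] * len(names[0])
    b = [1.0] * len(names[1])
    c = [1.0] * len(names[2])
    for _ in range(n_iter):
        a = update(0, b, c)
        b = update(1, a, c)
        c = update(2, a, b)
    return (dict(zip(names[0], a)),
            dict(zip(names[1], b)),
            dict(zip(names[2], c)))


# Toy knowledge base: a dense 'musicians' block plus one unrelated fact.
triples = [
    ("clapton", "plays", "guitar"),
    ("clapton", "plays", "blues"),
    ("bb_king", "plays", "guitar"),
    ("bb_king", "plays", "blues"),
    ("obama", "is_president_of", "usa"),
]
subj, verb, obj = rank1_factors(triples)
```

The component concentrates its weight on the dense block (both musicians, the verb "plays", and their two objects), which is the sense in which a component acts as a discovered concept.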

Page 53: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Problem Definition

Q1: Dominant concepts/topics?
Q2: Find synonyms to a given noun phrase?
(and how to scale up: |data| > RAM)

NELL (Never Ending Language Learner) data: mode sizes 26M, 26M, and 48M; nonzeros = 144M

Page 54: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Experiments

• GigaTensor solves 100x larger problem

[Plot: tensor dimensions (I), (J), (K), with number of nonzeros = I / 50; the Tensor Toolbox runs out of memory, and GigaTensor handles 100x larger sizes]

Page 55: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

A1: Concept Discovery

• Concept Discovery in Knowledge Base

Page 56: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

A1: Concept Discovery

Page 57: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

A2: Synonym Discovery

Page 58: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Conclusions

Page 59: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Rise and Fall Patterns of Information Diffusion: Model and Implications

Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU)

KDD’12, Beijing, China

Page 60: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Rise and fall patterns in social media

• Meme (# of mentions in blogs)
  – short phrases sourced from U.S. politics in 2008
  – e.g., “you can put lipstick on a pig”, “yes we can”

Page 61: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Rise and fall patterns in social media

• four classes on YouTube [Crane et al. ’08]
• six classes on Meme [Yang et al. ’11]

Page 62: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Rise and fall patterns in social media

• Can we find a unifying model, which includes these patterns?
  – four classes on YouTube [Crane et al. ’08]
  – six classes on Meme [Yang et al. ’11]

Page 63: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Rise and fall patterns in social media

• Answer: YES!
• We can represent all patterns by a single model

Page 64: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Main idea - SpikeM

1. Un-informed bloggers (uninformed about rumor), at time n=0
2. External shock at time n_b (e.g., breaking news), at time n=n_b
3. Infection (word-of-mouth), from time n=n_b+1

Infectiveness of a blog-post at age n:
- β: strength of infection (quality of news)
- Decay function
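The three steps can be simulated directly. The sketch below follows the base SpikeM recurrence as recalled from the KDD'12 paper, so treat the exact form as an assumption to verify there: newly informed bloggers ΔB(n+1) = U(n) · Σ_{t=n_b..n} (ΔB(t) + S(t)) · f(n+1-t), with decay f(n) = β · n^(-1.5) and an external shock S(t) = S_b at t = n_b. Noise and periodicity are left out, the parameter values are illustrative only, and the cap that keeps U from going negative is an addition of this sketch.

```python
def spikem(N=1000.0, beta=0.001, n_b=5, S_b=10.0, T=100):
    """Simulate the base SpikeM model without noise or periodicity.

    N    : total number of bloggers (all un-informed at time 0)
    beta : strength of infection
    n_b  : time of the external shock
    S_b  : size of the external shock
    T    : number of time ticks to simulate
    Returns (delta_b, U): newly informed bloggers per tick, and the number
    still un-informed at the end.
    """
    U = N
    delta_b = [0.0] * T
    for n in range(T - 1):
        # Influence of every earlier post, discounted by its age with a
        # power-law decay of exponent -1.5.
        influence = sum(
            (delta_b[t] + (S_b if t == n_b else 0.0))
            * beta * (n + 1 - t) ** -1.5
            for t in range(n_b, n + 1)
        )
        newly_informed = min(U, U * influence)  # cap: U cannot go negative
        delta_b[n + 1] = newly_informed
        U -= newly_informed
    return delta_b, U


activity, still_uninformed = spikem()
peak_time = activity.index(max(activity))
```

Nothing happens before the shock, activity then grows while most bloggers are still un-informed, and it falls off once the un-informed pool is depleted and older posts lose their infectiveness.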


Page 66: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

-1.5 slope

J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).

[Plot: Prob(RT > x) (log) vs. response time (log); slope -1.5]

Page 67: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

SpikeM - with periodicity

• Full equation of SpikeM (details)
• Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly)

[Plot: activity vs. time n, with a peak at noon and a dip at 3am]

Page 68: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Details

• Analysis – exponential rise and power-law fall
• Rise-part: SI -> exponential; SpikeM -> exponential

[Plots: lin-log and log-log]

Page 69: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Details

• Analysis – exponential rise and power-law fall
• Fall-part: SI -> exponential; SpikeM -> power law

[Plots: lin-log and log-log]

Page 70: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Tail-part forecasts

• SpikeM can capture tail part

Page 71: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

“What-if” forecasting

e.g., given
(1) first spike,
(2) release date of two sequel movies,
(3) access volume before the release date

[Plot: (1) first spike, (2) release date, (3) two weeks before release; the upcoming spikes are marked “?”]

Page 72: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

“What-if” forecasting

SpikeM can forecast upcoming spikes

[Plot: (1) first spike, (2) release date, (3) two weeks before release]

Page 73: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Conclusions for spikes

• Exp rise; PL decay
• ‘spikeM’ captures all patterns, with a few parameters
  – And can do extrapolation
  – And forecasting

Page 74: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Roadmap

• Graph problems:
  – G1: Fraud detection – BP
  – G2: Botnet detection – spectral
  – G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Future research
• Conclusions

Page 75: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Challenge #1: Time evolving networks / tensors

• Periodicities? Burstiness?
• What is ‘typical’ behavior of a node, over time
• Heterogeneous graphs (= nodes w/ attributes)

Page 76: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Challenge #2: ‘Connectome’ – brain wiring

• Which neurons get activated by ‘bee’
• How wiring evolves
• Modeling epilepsy

N. Sidiropoulos

George Karypis

V. Papalexakis

Tom Mitchell

Page 77: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Thanks

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

Page 78: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Project info: PEGASUS

www.cs.cmu.edu/~pegasus

Results on large graphs: with Pegasus + hadoop + M45

Apache license

Code, papers, manual, video

Prof. U Kang, Prof. Polo Chau

Page 79: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Cast

Akoglu, Leman
Chau, Polo
Kang, U
McGlohon, Mary
Tong, Hanghang
Prakash, Aditya
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis

Page 80: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

References

• Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1): (2006)

Page 81: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

References

• Christos Faloutsos, Tamara G. Kolda, Jimeng Sun: Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174

Page 82: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

References

• Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos: "Rise and Fall Patterns of Information Diffusion: Model and Implications", KDD’12, pp. 6-14, Beijing, China, August 2012

Page 83: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

References

• Jimeng Sun, Dacheng Tao, Christos Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

Page 84: Identifying on-line Fraudsters: Anomaly Detection Using Network Effects

Overall Conclusions

• G1: fraud detection
  – BP: powerful method
  – FaBP: faster; equally accurate; known convergence
• G2: botnets -> EigenSpokes
• G3: Subject-Verb-Object -> Tensors/GigaTensor
• Spikes: ‘spikeM’ (exp rise; PL drop)