CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Upload: bethany-gibbs

Post on 13-Jan-2016

Page 1: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

CMU SCS

Big (graph) data analytics

Christos Faloutsos

CMU

Page 2: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

CMU SCS IC '14 C. Faloutsos 2

CONGRATULATIONS!

Page 3: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Outline

• Q+A

• Problem definition / Motivation

• Graphs, tensors and brains

• Anomaly detection

• Conclusions

Page 4: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Q+A

• Are you recruiting? How many?

• How many do you have?

• How frequently do you meet them?

• What is your advising style?

• How do you feel about summer internships?

Page 5: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Q+A

• Are you recruiting? How many? A: 1 or 2

• How many do you have? A: 6 (+5 postdocs)

• How frequently do you meet them? A: 1/week

• What is your advising style? A: results

• How do you feel about summer internships? A: Yes/Maybe (FB, MSR, IBM, ++)

Page 6: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Outline

• Q+A

• Problem definition / Motivation

• Graphs, tensors and brains

• Anomaly detection

• Conclusions

Page 7: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Motivation

• Data mining: ~ find patterns (rules, outliers)

• What do real graphs look like? Anomalies?

• Time series / Monitoring

Measles @ PA, NY, …

Page 8: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Graphs - why should we care?

Page 9: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

~1B users

$10-$100B revenue

Page 10: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Outline

• Q+A

• Problem definition / Motivation

• Graphs, tensors and brains

• Anomaly detection

• Conclusions

Page 11: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

NELL & concepts (=groups)

• Predicates (subject, verb, object) in knowledge base

  – “Barack Obama is the president of U.S.”

  – “Eric Clapton plays guitar”

• NELL (Never Ending Language Learner) data

  – tensor dimensions: (26M) x (26M) x (48M)

  – Nonzeros = 144M

Tom Mitchell, CMU/CS-MLD

Vagelis Papalexakis, CMU-CS

Page 12: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Answer : tensor factorization

• Recall: (SVD) matrix factorization: finds blocks

[Figure: N users x M products matrix ~ sum of three blocks]

  – ‘meat-eaters’ x ‘steaks’

  – ‘vegetarians’ x ‘plants’

  – ‘kids’ x ‘cookies’
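The "SVD finds blocks" idea can be sketched on a toy matrix. This is only an illustration under stated assumptions: the 9 x 7 matrix, the block sizes, and the threshold 1e-8 are made up for the example and do not come from the talk. The blocks are given different sizes on purpose, so that the three singular values are distinct and each singular vector lines up with exactly one block.

```python
import numpy as np

# Hypothetical toy data: 9 users x 7 products with three disjoint blocks of 1s.
A = np.zeros((9, 7))
A[0:4, 0:3] = 1.0  # e.g. 'meat-eaters' x 'steaks'
A[4:7, 3:5] = 1.0  # e.g. 'vegetarians' x 'plants'
A[7:9, 5:7] = 1.0  # e.g. 'kids' x 'cookies'

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Number of non-negligible singular values = number of blocks here.
rank = int(np.sum(s > 1e-8))

# Sum of the first three rank-1 terms: the "block + block + block" picture.
approx = sum(s[k] * np.outer(U[:, k], Vt[k, :]) for k in range(3))

for k in range(rank):
    users = np.flatnonzero(np.abs(U[:, k]) > 1e-8).tolist()
    products = np.flatnonzero(np.abs(Vt[k, :]) > 1e-8).tolist()
    print(f"component {k}: users {users} x products {products}")
```

On real, noisy data the singular vectors are not exactly zero outside a block, so a threshold or a clustering step would be needed to read off the groups.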

Page 13: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Answer : tensor factorization

• PARAFAC decomposition

[Figure: subject x object x verb tensor = sum of three components, labeled politicians, artists, athletes]
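A PARAFAC decomposition writes a 3-mode tensor as a sum of rank-1 terms, one per concept. The sketch below uses plain alternating least squares (ALS) in NumPy on a tiny dense tensor. It is not the scalable method used for the NELL data; the function names, the toy 6 x 6 x 3 tensor, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Kronecker product: (I x R), (J x R) -> (I*J x R)."""
    R = X.shape[1]
    return np.einsum("ir,jr->ijr", X, Y).reshape(-1, R)

def unfold(T, mode):
    """Matricize T so that rows index `mode` (C-order over the other modes)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def parafac_als(T, rank, n_iter=200, seed=0):
    """Plain ALS for a 3-mode tensor; returns one factor matrix per mode."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in T.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Fix the other two factors and solve a least-squares problem.
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            factors[mode] = unfold(T, mode) @ np.linalg.pinv(kr).T
    return factors

# Hypothetical toy tensor (subject x object x verb) with three disjoint groups.
T = np.zeros((6, 6, 3))
T[0:2, 0:2, 0] = 1.0
T[2:4, 2:4, 1] = 1.0
T[4:6, 4:6, 2] = 1.0

A, B, C = parafac_als(T, rank=3)
T_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
print("relative reconstruction error:", rel_err)
```

Each column of A, B, C then describes one group of subjects, objects, and verbs. ALS can stall in poor local solutions, so in practice several random restarts are common.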

Page 14: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Answer : tensor factorization

• PARAFAC decomposition

• Results for who-calls-whom-when

  – 4M x 15 days

[Figure: caller x callee x time tensor = sum of three components, each labeled ??]

Page 15: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Concept Discovery

• Concept Discovery in Knowledge Base

Page 16: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Concept Discovery

• Concept Discovery in Knowledge Base

  – NP1: Internet, file, data

  – NP2: Protocol, software, suite

Page 17: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Neuro-semantics

• Brain Scan Data*

  – 9 persons

  – 60 nouns

• Questions

  – 218 questions

  – ‘is it alive?’, ‘can you eat it?’

*Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science, 2008. Data @ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html

Page 18: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Neuro-semantics

• Brain Scan Data*

  – 9 persons

  – 60 nouns

• Questions

  – 218 questions

  – ‘is it alive?’, ‘can you eat it?’

Patterns?

Page 19: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Neuro-semantics

• Brain Scan Data*

  – 9 persons

  – 60 nouns

• Questions

  – 218 questions

  – ‘is it alive?’, ‘can you eat it?’

[Figure: brain-scan data with axes labeled persons, nouns (e.g., airplane, dog), questions, voxels]

Patterns?

Page 20: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Neuro-semantics

[Figure: decomposition of the brain-scan data]

Page 21: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Neuro-semantics

[Figure: decomposition of the brain-scan data]

Small items -> Premotor cortex

Page 22: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Neuro-semantics

Small items -> Premotor cortex

Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014

Page 23: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Scalability

• Google: > 450,000 processors in clusters of ~2000 processors each [Barroso+, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003]

• Yahoo: 5Pb of data [Fayyad, KDD’07]

• Google-NY, Aug’14: ‘graph with 1T edges, 300B nodes’

• Problem: machine failures, on a daily basis

• How to parallelize data mining tasks, then?

• A: map/reduce – hadoop (open-source clone) http://hadoop.apache.org/
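To make the map/reduce answer concrete, here is a minimal single-machine sketch of the programming model, applied to a typical graph-mining task: counting node degrees from an edge list. It does not use Hadoop; the function names and the four-edge toy graph are assumptions made up for this example. On a real cluster, the framework runs the map and reduce calls on many machines and reruns them when a machine fails.

```python
from collections import defaultdict

def map_edge(edge):
    """Map step: each edge contributes 1 to the degree of both endpoints."""
    src, dst = edge
    yield src, 1
    yield dst, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key (done by the framework)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_degree(node, counts):
    """Reduce step: sum the partial counts for one node."""
    return node, sum(counts)

def degree_mapreduce(edges):
    mapped = (pair for edge in edges for pair in map_edge(edge))
    grouped = shuffle(mapped)
    return dict(reduce_degree(node, counts) for node, counts in grouped.items())

# Hypothetical toy graph given as an undirected edge list.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = degree_mapreduce(edges)
print(degrees)
```

Because each map call looks at one edge and each reduce call looks at one node, the work splits naturally across machines, which is what makes the model a fit for very large graphs.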

Page 24: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Outline

• Q+A

• Problem definition / Motivation

• Graphs, tensors and brains

• Anomaly/fraud detection

• Conclusions

Page 25: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

App-store fraud

Opinion Fraud Detection in Online Reviews using Network Effects

Leman Akoglu, Rishi Chandy, CF

ICWSM’13

(NSF grant, with Alex Beutel)

Page 26: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Problem

• Given

  – user-product review network

  – review sign (+/-)

• Classify

  – objects into type-specific classes:

    users: ‘honest’ / ‘fraudster’

    products: ‘good’ / ‘bad’

    reviews: ‘genuine’ / ‘fake’

No side data! (e.g., timestamp, review text)

Page 27: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Formulation: BP

User      Product   Review sign
honest    bad       –
honest    good      +

[Figure panels: Before, After]
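A belief propagation (BP) formulation on a signed user-product graph can be sketched as follows. This is a simplified illustration, not the method from the cited paper: the noise level EPS, the compatibility tables in PSI, the uniform priors, and the toy graph are all assumptions. The only idea taken from the slide is that an honest user tends to give + to a good product and – to a bad one.

```python
import numpy as np

EPS = 0.1  # hypothetical noise level, not taken from the slides

# PSI[sign][user_state, product_state]; these tables are assumptions.
# user states: 0 = honest, 1 = fraudster; product states: 0 = good, 1 = bad
PSI = {
    +1: np.array([[1 - EPS, EPS], [2 * EPS, 1 - 2 * EPS]]),
    -1: np.array([[EPS, 1 - EPS], [1 - 2 * EPS, 2 * EPS]]),
}

def run_bp(edges, n_iter=10):
    """Synchronous loopy BP; edges are (user, product, sign), one per pair."""
    sign = {(u, p): s for u, p, s in edges}
    to_product = {e: np.full(2, 0.5) for e in sign}  # user -> product messages
    to_user = {e: np.full(2, 0.5) for e in sign}     # product -> user messages
    for _ in range(n_iter):
        new_to_product, new_to_user = {}, {}
        for (u, p), s in sign.items():
            # What user u believes from its other reviews (uniform prior).
            b_u = np.ones(2)
            for (u2, p2), m in to_user.items():
                if u2 == u and p2 != p:
                    b_u = b_u * m
            out = PSI[s].T @ b_u  # sum over user states
            new_to_product[(u, p)] = out / out.sum()
            # What product p believes from its other reviewers.
            b_p = np.ones(2)
            for (u2, p2), m in to_product.items():
                if p2 == p and u2 != u:
                    b_p = b_p * m
            out = PSI[s] @ b_p  # sum over product states
            new_to_user[(u, p)] = out / out.sum()
        to_product, to_user = new_to_product, new_to_user
    users, products = {}, {}
    for (u, p), m in to_user.items():
        users[u] = users.get(u, np.ones(2)) * m
    for (u, p), m in to_product.items():
        products[p] = products.get(p, np.ones(2)) * m
    return ({u: b / b.sum() for u, b in users.items()},
            {p: b / b.sum() for p, b in products.items()})

# Hypothetical toy network: three + reviews and one - review of the same app.
edges = [("u1", "app", +1), ("u2", "app", +1), ("u3", "app", +1), ("u4", "app", -1)]
user_beliefs, product_beliefs = run_bp(edges)
print(user_beliefs["u4"], product_beliefs["app"])
```

In this toy run, the product leans ‘good’ and the lone negative reviewer leans ‘fraudster’, using only the review signs and no side data. The quadratic message loops are for readability only; a practical version would index messages by node.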

Page 28: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Top scorers

[Figure: top-scoring users vs. products; + positive (4-5) rating, o negative (1-2) rating]

Page 30: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

‘Fraud-bot’ member reviews

• Same developer!

• Duplicated text!

• Same day activity!

Page 31: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Outline

• Q+A

• Problem definition / Motivation

• Graphs, tensors and brains

• Anomaly/fraud detection

• Time series, monitoring / forecasting

• Conclusions

Page 32: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

‘Tycho’ – epidemics analysis

Yasuko Matsubara

50 states x 46 diseases

Page 33: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

‘Tycho’ – epidemics analysis

Prof. Yasuko Matsubara

Page 34: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

‘Tycho’ – epidemics analysis

Prof. Yasuko Matsubara

Flu? Measles? August? No periodicity?

Page 39: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

‘Tycho’ – epidemics analysis

Prof. Yasuko Matsubara

https://www.tycho.pitt.edu/resources.php from U. Pitt (epidemiology dept.)

Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.

Page 40: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Open research questions

• Patterns/anomalies for time-evolving graphs (Call graph, 3M people x 6mo)

• Spot fraudsters in soc-net (e.g., Twitter ‘$10 -> 1000 followers’)

• How is the human brain wired?

Page 41: CMU SCS Big (graph) data analytics Christos Faloutsos CMU

Contact info

• www.cs.cmu.edu/~christos

• GHC 8019

• Ph#: x8.1457

• www.cs.cmu.edu/~christos/TALKS/14-09-ic/

• FYI: Course: 15-826, Tu-Th 3:00-4:20

• and, again, WELCOME!