Big Data Analytics

GraphLab On Practice

Lucas Rego Drumond

Information Systems and Machine Learning Lab (ISMLL)
Institute of Computer Science
University of Hildesheim, Germany

Outline

1. GraphLab Application Deployment

2. Relational Classification Example

3. Factorization Models on GraphLab

1. GraphLab Application Deployment

Steps

1. Install GraphLab into a specific directory
   - Example: cd /home/user/
   - git clone https://github.com/dato-code/PowerGraph.git
   - The remaining steps assume the checkout lives in /home/user/graphlab. A plain clone of this repository creates a directory named PowerGraph, so rename it or adjust the paths below.

2. Create a directory for your application under /home/user/graphlab/apps

3. Create a CMakeLists.txt file in your application directory

4. Add the source files for your program to your application directory

5. Run ./configure in /home/user/graphlab

6. Go to /home/user/graphlab/release/apps/your_application and type make

CMakeLists.txt

project(MyProjectName)
add_graphlab_executable(executable_name implementation.cpp)

Hello World

#include <graphlab.hpp>

int main(int argc, char** argv) {
  // Set up MPI and the distributed control object
  graphlab::mpi_tools::init(argc, argv);
  graphlab::distributed_control dc;

  // dc.cout() prints on one machine only instead of once per process
  dc.cout() << "Hello World!\n";

  graphlab::mpi_tools::finalize();
}

2. Relational Classification Example

Relational Classification

[Figure: example graph with four vertices: v1 with y(v1) = 1, v2 with y(v2) = 3, v3 with y(v3) = ? and v4 with y(v4) = 2. Every edge has weight 1.]

Given a graph $G := (V, E)$ and a set of labels $L$:

- Some nodes have labels $y : V \to L$
- Edges $(v, u)$ have weights $w_{v,u}$
- Task: estimate a function $\hat{y} : V \to L$

Weighted-Vote Relational Neighbor (wvRN)

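Probability that a vertex $v \in V$ has label $c \in L$:

$$P(c \mid v) = \frac{1}{Z_v} \sum_{u \in N_v,\ y(u) = c} w_{u,v}$$

Where:

$$Z_v = \sum_{u \in N_v} w_{u,v}$$

- $N_v$ denotes the neighbors of $v$

The predicted label is

$$\hat{y}(v) := \arg\max_{c \in L} P(c \mid v)$$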
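The voting rule above can be sketched in plain C++ without any GraphLab dependency. This is only an illustration under stated assumptions: the `Neighbor` struct, the `kUnlabeled` marker and the function names are hypothetical and do not come from the slides or from the GraphLab API. Unlabeled neighbors count towards $Z_v$ but cast no vote, which follows the formula as written.

```cpp
#include <cassert>
#include <map>
#include <vector>

const int kUnlabeled = -1;  // hypothetical marker for a neighbor without a label

// One neighbor u of the vertex v being classified (hypothetical helper type).
struct Neighbor {
    double weight;  // edge weight w_{u,v}
    int label;      // y(u), or kUnlabeled
};

// Returns P(c | v) for every label c that occurs among the labeled neighbors.
std::map<int, double> wvrn_probabilities(const std::vector<Neighbor>& neighbors) {
    double z = 0.0;               // Z_v: total weight over all neighbors
    std::map<int, double> probs;  // first the summed weight per label, then P(c | v)
    for (const Neighbor& n : neighbors) {
        z += n.weight;
        if (n.label != kUnlabeled) probs[n.label] += n.weight;
    }
    assert(z > 0.0);  // v needs at least one positively weighted neighbor
    for (auto& kv : probs) kv.second /= z;
    return probs;
}

// Returns arg max_c P(c | v); ties go to the smallest label id.
// Yields kUnlabeled when no neighbor carries a label.
int wvrn_predict(const std::vector<Neighbor>& neighbors) {
    const std::map<int, double> probs = wvrn_probabilities(neighbors);
    int best = kUnlabeled;
    double best_p = 0.0;
    for (const auto& kv : probs) {
        if (kv.second > best_p) {
            best = kv.first;
            best_p = kv.second;
        }
    }
    return best;
}
```

For a vertex whose neighbors carry the (weight, label) pairs (1, 1), (1, 3), (2, 3) plus one unlabeled neighbor of weight 4, label 3 collects weight 3 out of $Z_v = 8$ and wins the vote.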

wvRN Vertex Program:

procedure wvRNGather
    input: vertex $v$, scope $S_v$, ingoing edge $(u \to v)$
    return $(w_{u,v},\ y(u))$
end procedure

procedure wvRNApply
    input: vertex $v$, scope $S_v$, gather result $\left(Z_v,\ \left(\sum_{u \in N_v,\ y(u) = c} w_{u,v}\right)_{c \in L}\right)$
    $\hat{y}(v) := \arg\max_{c \in L} \sum_{u \in N_v,\ y(u) = c} w_{u,v}$
end procedure
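The gather and apply procedures above can be mimicked in standalone C++ to see how per-edge results are merged before the apply step. The sketch below deliberately avoids the GraphLab classes. `LabelVotes`, `wvrn_gather`, `wvrn_apply` and `kUnlabeled` are hypothetical names, and merging partial results with `operator+=` is an assumption about how a gather type is typically combined.

```cpp
#include <cassert>
#include <map>
#include <vector>

const int kUnlabeled = -1;  // hypothetical marker for "no label yet"

// Gather result: Z_v plus the summed edge weight per label c.
struct LabelVotes {
    double z;
    std::map<int, double> weight_per_label;

    LabelVotes() : z(0.0) {}

    // Merges two partial gather results.
    LabelVotes& operator+=(const LabelVotes& other) {
        z += other.z;
        for (const auto& kv : other.weight_per_label)
            weight_per_label[kv.first] += kv.second;
        return *this;
    }
};

// wvRNGather for one ingoing edge (u -> v): contributes (w_{u,v}, y(u)).
LabelVotes wvrn_gather(double w_uv, int y_u) {
    assert(w_uv >= 0.0);
    LabelVotes g;
    g.z = w_uv;
    if (y_u != kUnlabeled) g.weight_per_label[y_u] = w_uv;
    return g;
}

// wvRNApply: picks the label with the largest summed weight.
// Dividing by Z_v would not change the arg max, so it is skipped here.
int wvrn_apply(const LabelVotes& total) {
    int best = kUnlabeled;
    double best_weight = 0.0;
    for (const auto& kv : total.weight_per_label) {
        if (kv.second > best_weight) {
            best = kv.first;
            best_weight = kv.second;
        }
    }
    return best;
}
```

Because the merge is a plain sum, the order in which edges are gathered does not affect the result, which is what allows the gather phase to run in parallel over the edges of a vertex.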

wvRN Code

- Code and toy data:
  - http://www.ismll.uni-hildesheim.de/lehre/bd-14s/script/gl_ex/wvRN_example.zip

3. Factorization Models on GraphLab

Factorization models

- Each item $i \in I$ is associated with a latent feature vector $q_i \in \mathbb{R}^k$
- Each user $u \in U$ is associated with a latent feature vector $p_u \in \mathbb{R}^k$
- Each entry in the original matrix can be estimated by

$$\hat{r}(u, i) = p_u^\top q_i = \sum_{f=1}^{k} p_{u,f}\, q_{i,f}$$
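As a small illustration of the estimate above, the prediction is just a dot product of the two latent vectors. The function name `predict` and the use of `std::vector<double>` for the vectors are choices made for this sketch, not something the slides prescribe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// r_hat(u, i) = p_u^T q_i = sum over f of p_{u,f} * q_{i,f}
double predict(const std::vector<double>& p_u, const std::vector<double>& q_i) {
    assert(p_u.size() == q_i.size());  // both vectors live in R^k
    double r_hat = 0.0;
    for (std::size_t f = 0; f < p_u.size(); ++f) r_hat += p_u[f] * q_i[f];
    return r_hat;
}
```

With $p_u = (1, 2)$ and $q_i = (3, 0.5)$ the estimate is $1 \cdot 3 + 2 \cdot 0.5 = 4$.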

Example

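Movies: Titanic (t), Matrix (m), The Godfather (g), Once (o)

Observed ratings (each user has rated only some of the movies):

- Alice (a): 4, 2, 5
- Bob (b): 4, 3
- John (j): 4, 3

[Figure: the rating matrix $R$, with rows Alice, Bob, John and columns T, M, G, O, is approximated by the product $P \times Q^\top$.]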

Learning a factorization model - Objective Function

Task:

$$\arg\min_{P, Q} \sum_{(u, i, r_{ui}) \in D^{train}} \left(r_{ui} - \hat{r}(u, i)\right)^2 + \lambda\left(\|P\|^2 + \|Q\|^2\right)$$

Where:

- $\hat{r}(u, i) := p_u^\top q_i$
- $D^{train}$ is the training data
- $\lambda$ is a regularization constant
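To make the objective concrete, the following sketch evaluates it for given factor matrices. It assumes that $\|P\|^2$ and $\|Q\|^2$ denote the sum of squared entries (squared Frobenius norm), which the slide does not state explicitly. The types `Vec`, `Matrix`, `Rating` and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Matrix;  // one latent vector per row

// One training triple (u, i, r_ui); users and items are row indices into P and Q.
struct Rating {
    std::size_t user;
    std::size_t item;
    double value;
};

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t f = 0; f < a.size(); ++f) s += a[f] * b[f];
    return s;
}

// Sum of squared entries (assumed meaning of ||.||^2 in the objective).
double squared_norm(const Matrix& m) {
    double s = 0.0;
    for (const Vec& row : m)
        for (double x : row) s += x * x;
    return s;
}

// Squared error over the training data plus the regularization term.
double objective(const std::vector<Rating>& train, const Matrix& P,
                 const Matrix& Q, double lambda) {
    double loss = 0.0;
    for (const Rating& r : train) {
        const double err = r.value - dot(P[r.user], Q[r.item]);
        loss += err * err;
    }
    return loss + lambda * (squared_norm(P) + squared_norm(Q));
}
```

For a single rating $r = 3$ with $p_u = (1, 0)$, $q_i = (2, 0)$ and $\lambda = 0.5$, the error term is $(3 - 2)^2 = 1$ and the regularization term is $0.5 \cdot (1 + 4) = 2.5$, giving 3.5.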

Stochastic Gradient Descent Algorithm

procedure LearnLatentFactors
    input: $D^{train}$, $\lambda$, $\alpha$
    $(p_u)_{u \in U} \sim N(0, \sigma I)$
    $(q_i)_{i \in I} \sim N(0, \sigma I)$
    repeat
        for $(u, i, r_{u,i}) \in D^{train}$ do    (in a random order)
            $p_u \leftarrow p_u - \alpha\left(-2\,(r_{u,i} - \hat{r}(u, i))\, q_i + 2\lambda p_u\right)$
            $q_i \leftarrow q_i - \alpha\left(-2\,(r_{u,i} - \hat{r}(u, i))\, p_u + 2\lambda q_i\right)$
        end for
    until convergence
    return $P, Q$
end procedure
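The procedure above translates fairly directly into single-machine C++. The sketch below makes several assumptions that are not in the slides: "until convergence" is replaced by a fixed number of epochs, `sigma` is used as the standard deviation of the initial values, the error is computed once per rating before either vector is updated, and all type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Matrix;  // one latent vector per row

struct Rating {
    std::size_t user;
    std::size_t item;
    double value;
};

struct Model {
    Matrix P;  // user factors p_u
    Matrix Q;  // item factors q_i
};

double predict(const Vec& p, const Vec& q) {
    assert(p.size() == q.size());
    double r_hat = 0.0;
    for (std::size_t f = 0; f < p.size(); ++f) r_hat += p[f] * q[f];
    return r_hat;
}

// Sum of squared errors on the given ratings (without regularization).
double squared_error(const std::vector<Rating>& data, const Model& m) {
    double total = 0.0;
    for (const Rating& r : data) {
        const double err = r.value - predict(m.P[r.user], m.Q[r.item]);
        total += err * err;
    }
    return total;
}

Model learn_latent_factors(std::vector<Rating> train, std::size_t n_users,
                           std::size_t n_items, std::size_t k, double lambda,
                           double alpha, double sigma, int epochs, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> init(0.0, sigma);
    Model m;
    m.P.assign(n_users, Vec(k));
    m.Q.assign(n_items, Vec(k));
    for (Vec& p : m.P) for (double& x : p) x = init(rng);
    for (Vec& q : m.Q) for (double& x : q) x = init(rng);

    for (int epoch = 0; epoch < epochs; ++epoch) {
        std::shuffle(train.begin(), train.end(), rng);  // random order per pass
        for (const Rating& r : train) {
            Vec& p = m.P[r.user];
            Vec& q = m.Q[r.item];
            const double err = r.value - predict(p, q);
            // p_u step first, then q_i step using the already updated p_u
            for (std::size_t f = 0; f < k; ++f)
                p[f] -= alpha * (-2.0 * err * q[f] + 2.0 * lambda * p[f]);
            for (std::size_t f = 0; f < k; ++f)
                q[f] -= alpha * (-2.0 * err * p[f] + 2.0 * lambda * q[f]);
        }
    }
    return m;
}
```

Calling `learn_latent_factors` with `epochs = 0` returns the random initialization, which makes it easy to compare `squared_error` before and after training on the same seed.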

Recommender System Graph

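[Figure: graph with nodes l, n, o, c and p; edges are labeled with ratings $r_{np} = 4$, $r_{lp} = 2$ and $r_{oc} = 5$.]

- Nodes:
  - Users $U$
  - Items $I$
- Edges:
  - Ratings $r_{ui}$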

Factorization Models on GraphLab

- Node data:
  - user node: $p_u$
  - item node: $q_i$
- Edge data:
  - Rating $r_{ui}$

User nodes:

- Gather
  - Compute the error and accumulate the update on each item
- Apply
  - Update latent feature vectors and compute updates for each neighboring item
- Scatter
  - Send message to items with updates and accumulated error
  - Signal neighboring items to execute
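One possible reading of the user-node phases above, written as standalone C++ rather than against the GraphLab API. The sketch assumes that the user takes a single gradient step over all of its ratings at once and that the message to an item is the error gradient with respect to $q_i$. `ItemEdge` and the three function names are hypothetical, and the per-item update is computed inside the scatter helper for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// One edge (u, i) as seen from user node u: the rating and the item's vector.
struct ItemEdge {
    double rating;  // r_ui
    Vec q;          // q_i
};

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t f = 0; f < a.size(); ++f) s += a[f] * b[f];
    return s;
}

// Gather: error gradient w.r.t. p_u, accumulated over all rated items.
Vec user_gather(const Vec& p_u, const std::vector<ItemEdge>& edges) {
    Vec grad(p_u.size(), 0.0);
    for (const ItemEdge& e : edges) {
        const double err = e.rating - dot(p_u, e.q);
        for (std::size_t f = 0; f < grad.size(); ++f)
            grad[f] += -2.0 * err * e.q[f];
    }
    return grad;
}

// Apply: one gradient step on p_u, including the regularization term.
void user_apply(Vec& p_u, const Vec& grad, double lambda, double alpha) {
    for (std::size_t f = 0; f < p_u.size(); ++f)
        p_u[f] -= alpha * (grad[f] + 2.0 * lambda * p_u[f]);
}

// Scatter: message for the item on one edge, here the error gradient
// w.r.t. q_i evaluated with the updated p_u.
Vec user_scatter_message(const Vec& p_u, const ItemEdge& e) {
    const double err = e.rating - dot(p_u, e.q);
    Vec msg(p_u.size());
    for (std::size_t f = 0; f < msg.size(); ++f)
        msg[f] = -2.0 * err * p_u[f];
    return msg;
}
```

With $p_u = (1, 0)$, one rated item with $r_{ui} = 3$ and $q_i = (2, 0)$, $\lambda = 0$ and $\alpha = 0.25$, the gathered gradient is $(-4, 0)$, the apply step moves $p_u$ to $(2, 0)$, and the message sent to the item is $(4, 0)$.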

Item nodes:

- Gather
  - Gather messages from users
- Apply
  - Update latent feature vectors
- Scatter
  - Signal neighboring users to execute
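The item-node side can be sketched in the same standalone style. This assumes that each user message is a gradient contribution with respect to $q_i$, which is an interpretation rather than something the slides spell out. `item_gather` and `item_apply` are hypothetical names, and no GraphLab classes are used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Gather: sum the messages received from neighboring users.
// Each message is assumed to be a gradient contribution w.r.t. q_i.
Vec item_gather(const std::vector<Vec>& messages, std::size_t k) {
    Vec grad(k, 0.0);
    for (const Vec& msg : messages) {
        assert(msg.size() == k);
        for (std::size_t f = 0; f < k; ++f) grad[f] += msg[f];
    }
    return grad;
}

// Apply: one gradient step on q_i, including the regularization term.
void item_apply(Vec& q_i, const Vec& grad, double lambda, double alpha) {
    assert(q_i.size() == grad.size());
    for (std::size_t f = 0; f < q_i.size(); ++f)
        q_i[f] -= alpha * (grad[f] + 2.0 * lambda * q_i[f]);
}
```

For messages $(4, 0)$ and $(-2, 1)$ the gathered gradient is $(2, 1)$. Starting from $q_i = (1, 1)$ with $\lambda = 0.5$ and $\alpha = 0.5$, the apply step yields $q_i = (-0.5, 0)$. After that, the item would signal its neighboring users so that they run their own gather, apply and scatter phases again.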
