

Distributed Machine Learning and Big Data

Sourangshu Bhattacharya

Dept. of Computer Science and Engineering, IIT Kharagpur.

http://cse.iitkgp.ac.in/~sourangshu/

August 21, 2015

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 1 / 76


Outline

1 Machine Learning and Big Data
    Support Vector Machines
    Stochastic Sub-gradient descent

2 Distributed Optimization
    ADMM
    Convergence
    Distributed Loss Minimization
    Results
    Development of ADMM

3 Applications and extensions
    Weighted Parameter Averaging
    Fully-distributed SVM

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 2 / 76


Machine Learning and Big Data

What is Big Data ?

6 Billion web queries per day.
10 Billion display advertisements per day.
30 Billion text ads per day.
150 Million credit card transactions per day.
100 Billion emails per day.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 3 / 76


Machine Learning and Big Data

Machine Learning on Big Data

Classification - Spam / No Spam - 100B emails.
Multi-label classification - image tagging - 14M images, 10K tags.
Regression - CTR estimation - 10B ad views.
Ranking - web search - 6B queries.
Recommendation - online shopping - 1.7B views in the US.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 4 / 76


Machine Learning and Big Data

Classification example

Email spam classification.
Features (u_i): vector of counts of all words.
No. of features (d): words in the vocabulary (∼ 100,000).
No. of non-zero features: 100.
No. of emails per day: 100 M.
Size of training set using 30 days of data: 6 TB (assuming 20 B per entry).
Time taken to read the data once: 41.67 hrs (at 20 MB per second).
Solution: use multiple computers.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 5 / 76


Machine Learning and Big Data

Big Data Paradigm

3V’s - Volume, Variety, Velocity.
Distributed system.
Chance of failure:
  Computers:                        1      10     100
  Chance of a failure in an hour:   0.01   0.09   0.63
Communication efficiency - data locality.
Many systems: Hadoop, Spark, Graphlab, etc.
Goal: implement Machine Learning algorithms on Big Data systems.
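The failure figures follow from treating machine failures as independent, each with probability 0.01 per hour; a quick back-of-the-envelope check (a sketch, not part of the slides):

# Probability that at least one of n machines fails within an hour,
# assuming independent failures with per-machine probability p = 0.01.
p = 0.01
for n in (1, 10, 100):
    print(n, round(1 - (1 - p) ** n, 3))   # 0.01, 0.096, 0.634 -- matches the table up to rounding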

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 6 / 76


Machine Learning and Big Data

Binary Classification Problem

A set of labeled datapoints S = {(u_i, v_i)}, i = 1, . . . , n, with u_i ∈ R^d and v_i ∈ {+1, −1}.

Linear predictor function: v = sign(x^T u)

Error function: E = Σ_{i=1}^n 1(v_i x^T u_i ≤ 0)

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 7 / 76


Machine Learning and Big Data

Logistic Regression

Probability of v is given by:

P(v | u, x) = σ(v x^T u) = 1 / (1 + e^{−v x^T u})

Learning problem: given dataset S, estimate x.
Maximizing the regularized log likelihood (equivalently, minimizing the regularized negative log likelihood):

x* = argmin_x Σ_{i=1}^n log(1 + e^{−v_i x^T u_i}) + (λ/2) x^T x
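A minimal sketch of this objective and its gradient (assuming NumPy arrays U of shape (n, d) holding the u_i and labels v in {−1, +1}; the names are illustrative, not from the slides):

import numpy as np

def logistic_objective_and_grad(x, U, v, lam):
    """Regularized logistic loss: sum_i log(1 + exp(-v_i x^T u_i)) + (lam/2) x^T x."""
    margins = v * (U @ x)                                        # v_i x^T u_i
    loss = np.logaddexp(0.0, -margins).sum() + 0.5 * lam * (x @ x)
    # d/dx log(1 + exp(-m_i)) = -v_i u_i / (1 + exp(m_i))
    grad = -(U * (v / (1.0 + np.exp(margins)))[:, None]).sum(axis=0) + lam * x
    return loss, grad

Such a gradient oracle is what the gradient-descent and distributed-gradient schemes later in the deck would call.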

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 8 / 76


Machine Learning and Big Data

Convex Function

f is a convex function if f(t x_1 + (1 − t) x_2) ≤ t f(x_1) + (1 − t) f(x_2) for all t ∈ [0, 1].

Objective function for logistic regression is convex.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 9 / 76


Machine Learning and Big Data

Convex Optimization

Convex optimization problem

minimize_x f(x)
subject to: g_i(x) ≤ 0, ∀i = 1, . . . , k

where f and the g_i are convex functions.
For convex optimization problems, local optima are also global optima.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 10 / 76


Machine Learning and Big Data

Optimization Algorithm: Gradient Descent

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 11 / 76


Machine Learning and Big Data Support Vector Machines

Classification Problem

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 12 / 76


Machine Learning and Big Data Support Vector Machines

SVM

Separating hyperplane: x^T u = 0
Parallel hyperplanes (defining the margin): x^T u = ±1
Margin (perpendicular distance between the parallel hyperplanes): 2 / ‖x‖
Correct classification of training datapoints: v_i x^T u_i ≥ 1, ∀i
Allowing error (slack) ξ_i: v_i x^T u_i ≥ 1 − ξ_i, ∀i
Max-margin formulation:

min_{x,ξ} (1/2)‖x‖² + C Σ_{i=1}^n ξ_i

subject to: v_i x^T u_i ≥ 1 − ξ_i, ξ_i ≥ 0, ∀i = 1, . . . , n

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 13 / 76


Machine Learning and Big Data Support Vector Machines

SVM: dual

Lagrangian:

L = (1/2) x^T x + C Σ_{i=1}^n ξ_i + Σ_{i=1}^n α_i (1 − ξ_i − v_i x^T u_i) − Σ_{i=1}^n µ_i ξ_i

Dual problem: (x*, α*, µ*) = max_{α,µ ≥ 0} min_{x,ξ} L(x, ξ, α, µ)

For a strictly convex problem, the primal and dual solutions are the same (strong duality).
KKT conditions:

x = Σ_{i=1}^n α_i v_i u_i

C = α_i + µ_i

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 14 / 76


Machine Learning and Big Data Support Vector Machines

SVM: dual

The dual problem:

max_α Σ_{i=1}^n α_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j v_i v_j u_i^T u_j

subject to: 0 ≤ α_i ≤ C, ∀i

The dual is a quadratic programming problem in n variables.
It can be solved even if only the kernel function values k(u_i, u_j) = u_i^T u_j are given.
Dimension agnostic.
Many efficient algorithms exist for solving it, e.g. SMO (Platt, 1999).
Worst case complexity is O(n³), usually O(n²).

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 15 / 76


Machine Learning and Big Data Support Vector Machines

SVM

A more compact form: min_x Σ_{i=1}^n max(0, 1 − v_i x^T u_i) + λ‖x‖₂²
Or: min_x Σ_{i=1}^n l(x, u_i, v_i) + λ Ω(x)
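A minimal sketch of this compact hinge-loss form and one of its subgradients (NumPy arrays U of shape (n, d) and labels v in {−1, +1} are assumed; the names are illustrative):

import numpy as np

def hinge_objective(x, U, v, lam):
    """sum_i max(0, 1 - v_i x^T u_i) + lam * ||x||_2^2."""
    margins = v * (U @ x)
    return np.maximum(0.0, 1.0 - margins).sum() + lam * (x @ x)

def hinge_subgradient(x, U, v, lam):
    """A subgradient: -v_i u_i for every margin-violating point, plus 2 * lam * x."""
    margins = v * (U @ x)
    active = margins < 1.0                  # points with non-zero hinge loss
    return -(U[active] * v[active, None]).sum(axis=0) + 2.0 * lam * x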

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 16 / 76


Machine Learning and Big Data Support Vector Machines

Multi-class classification

There are m classes, v_i ∈ {1, . . . , m}.
Most popular scheme: v_i = argmax_{v ∈ {1,...,m}} x_v^T u_i
Given example (u_i, v_i), we want x_{v_i}^T u_i ≥ x_j^T u_i ∀j ∈ {1, . . . , m}
Using a margin of at least 1, the loss is l(u_i, v_i) = max_{j ∈ {1,...,m}, j ≠ v_i} {0, 1 − (x_{v_i}^T u_i − x_j^T u_i)} (see the sketch below).
Given dataset D, solve the problem:

min_{x_1,...,x_m} Σ_{i∈D} l(u_i, v_i) + λ Σ_{j=1}^m ‖x_j‖²

This can be extended to many settings, e.g. sequence labeling, learning to rank, etc.
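A sketch of this multi-class hinge loss for a single example (the class parameters are stacked as rows of a matrix X, and 0-based class indices are my convention, not the slides'):

import numpy as np

def multiclass_hinge_loss(X, u_i, v_i):
    """max over j != v_i of max(0, 1 - (x_{v_i}^T u_i - x_j^T u_i))."""
    scores = X @ u_i                               # x_j^T u_i for every class j
    margins = 1.0 - (scores[v_i] - scores)         # 1 - (x_{v_i}^T u_i - x_j^T u_i)
    margins[v_i] = 0.0                             # exclude j = v_i
    return max(0.0, margins.max())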

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 17 / 76


Machine Learning and Big Data Support Vector Machines

General Learning Problems

Support Vector Machines:

min_x Σ_{i=1}^n max{0, 1 − v_i x^T u_i} + λ‖x‖₂²

Logistic Regression:

min_x Σ_{i=1}^n log(1 + exp(−v_i x^T u_i)) + λ‖x‖₂²

General form:

min_x Σ_{i=1}^n l(x, u_i, v_i) + λ Ω(x)

l: loss function, Ω: regularizer.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 18 / 76


Machine Learning and Big Data Stochastic Sub-gradient descent

Sub-gradient Descent

A sub-gradient for a non-differentiable convex function f at a point x₀ is a vector v such that:

f(x) − f(x₀) ≥ v^T (x − x₀) for all x

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 19 / 76


Machine Learning and Big Data Stochastic Sub-gradient descent

Sub-gradient Descent

Randomly initialize x₀.
Iterate x_k = x_{k−1} − t_k g(x_{k−1}), k = 1, 2, 3, . . ., where g is a sub-gradient of f.
Step size: t_k = 1/√k.
Track the best iterate: f(x_best^(k)) = min_{i=1,...,k} f(x_i).
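A minimal sketch of this loop (f is the objective and g a subgradient oracle, e.g. the hinge subgradient sketched earlier; both are assumed to be supplied by the caller):

import numpy as np

def subgradient_descent(f, g, x0, num_iters=1000):
    """Sub-gradient descent with step size t_k = 1/sqrt(k); returns the best iterate seen."""
    x = x0.copy()
    x_best, f_best = x.copy(), f(x)
    for k in range(1, num_iters + 1):
        x = x - (1.0 / np.sqrt(k)) * g(x)      # x_k = x_{k-1} - t_k g(x_{k-1})
        fx = f(x)
        if fx < f_best:                        # f(x_k) need not decrease, so track the best
            x_best, f_best = x.copy(), fx
    return x_best, f_best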

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 20 / 76


Machine Learning and Big Data Stochastic Sub-gradient descent

Sub-gradient Descent

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 21 / 76


Machine Learning and Big Data Stochastic Sub-gradient descent

Stochastic Sub-gradient Descent

Convergence rate is O(1/√k).
Each iteration takes O(n) time.
Reduce the per-iteration time by calculating the sub-gradient using a subset of examples - stochastic sub-gradient.
Inherently serial.
Typical O(1/ε²) behaviour (iterations needed to reach tolerance ε).
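The saving comes from estimating the subgradient on a sampled subset; one step might look like the following sketch (the mini-batch size, the rescaling, and the use of the hinge loss are my assumptions, not from the slides):

import numpy as np

def stochastic_subgradient_step(x, U, v, lam, k, batch_size=64):
    """One stochastic sub-gradient step for the hinge objective, using a random mini-batch."""
    idx = np.random.choice(len(U), size=batch_size, replace=False)
    Ub, vb = U[idx], v[idx]
    margins = vb * (Ub @ x)
    active = margins < 1.0
    # Rescale so the mini-batch subgradient estimates the full-data subgradient.
    g = -(len(U) / batch_size) * (Ub[active] * vb[active, None]).sum(axis=0) + 2.0 * lam * x
    return x - (1.0 / np.sqrt(k)) * g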

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 22 / 76


Machine Learning and Big Data Stochastic Sub-gradient descent

Stochastic Sub-gradient Descent

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 23 / 76


Distributed Optimization

Distributed gradient descent

Divide the dataset into m parts; each part is processed on one computer (m computers in total).
There is one central computer.
All computers can communicate with the central computer via the network.
Define loss(x) = Σ_{j=1}^m Σ_{i∈C_j} l_i(x) + λ Ω(x), where l_i(x) = l(x, u_i, v_i).
The gradient (in case of differentiable loss):

∇loss(x) = Σ_{j=1}^m ∇( Σ_{i∈C_j} l_i(x) ) + λ ∇Ω(x)

Compute ∇l_j(x) = Σ_{i∈C_j} ∇l_i(x) on the j-th computer and communicate it to the central computer.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 24 / 76


Distributed Optimization

Distributed gradient descent

Compute ∇loss(x) = Σ_{j=1}^m ∇l_j(x) + λ ∇Ω(x) at the central computer.
The gradient descent update: x_{k+1} = x_k − α ∇loss(x_k).
α is chosen by a (distributed) line search algorithm.
For non-differentiable loss functions, we can use a distributed sub-gradient descent algorithm.
Slow for most practical problems. For achieving ε tolerance:
  Gradient descent (Logistic regression): O(1/ε) iterations.
  Sub-gradient descent (Stochastic sub-gradient descent): O(1/ε²) iterations.
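A sketch of one step of this scheme; the partition loop stands in for the workers, and grad_fn (e.g. the per-partition logistic gradient from the earlier sketch) and the step size alpha are assumptions of mine:

def distributed_gradient_step(x, partitions, grad_fn, lam, alpha):
    """One distributed gradient-descent step.

    partitions: list of (U_j, v_j) data blocks; each grad_fn call would run on its
    own machine, and only the d-dimensional gradients travel to the central node.
    """
    grad = sum(grad_fn(x, Uj, vj) for Uj, vj in partitions) + lam * x   # + gradient of (lam/2)||x||^2
    return x - alpha * grad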

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 25 / 76


Distributed Optimization ADMM

Alternating Direction Method of Multipliers

Problem:

minimize_{x,z} f(x) + g(z)
subject to: Ax + Bz = c

Algorithm - iterate till convergence:

x^{k+1} = argmin_x { f(x) + (ρ/2)‖Ax + Bz^k − c + u^k‖₂² }
z^{k+1} = argmin_z { g(z) + (ρ/2)‖Ax^{k+1} + Bz − c + u^k‖₂² }
u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} − c

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 26 / 76


Distributed Optimization ADMM

Stopping criteria

Stop when the primal and dual residuals are small:

‖r^k‖₂ ≤ ε_pri and ‖s^k‖₂ ≤ ε_dual

Since ‖r^k‖₂ → 0 and ‖s^k‖₂ → 0 as k → ∞ (see the convergence results below), this criterion is eventually satisfied.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 27 / 76


Distributed Optimization ADMM

Observations

The x-update requires solving an optimization problem:

min_x f(x) + (ρ/2)‖Ax − v‖₂²

with v = c − Bz^k − u^k.
Similarly for the z-update.
Sometimes this has a closed form.
ADMM is a meta optimization algorithm.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 28 / 76


Distributed Optimization Convergence

Convergence of ADMM

Assumption 1: The functions f : R^n → R and g : R^m → R are closed, proper and convex.
  Same as assuming that epi f = {(x, t) ∈ R^n × R | f(x) ≤ t} is closed and convex.
Assumption 2: The unaugmented Lagrangian L₀(x, z, y) has a saddle point (x*, z*, y*):

L₀(x*, z*, y) ≤ L₀(x*, z*, y*) ≤ L₀(x, z, y*)

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 29 / 76


Distributed Optimization Convergence

Convergence of ADMM

Primal residual: r = Ax + Bz − c
Optimal objective: p* = inf_{x,z} { f(x) + g(z) | Ax + Bz = c }
Convergence results:
  Primal residual convergence: r^k → 0 as k → ∞
  Dual residual convergence: s^k → 0 as k → ∞
  Objective convergence: f(x^k) + g(z^k) → p* as k → ∞
  Dual variable convergence: y^k → y* as k → ∞

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 30 / 76


Distributed Optimization Distributed Loss Minimization

Decomposition

If f is separable:

f(x) = f_1(x_1) + · · · + f_N(x_N), x = (x_1, . . . , x_N)

and A is conformably block separable (i.e. A^T A is block diagonal),
then the x-update splits into N parallel updates of the x_i.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 31 / 76


Distributed Optimization Distributed Loss Minimization

Consensus Optimization

Problem:

min_x f(x) = Σ_{i=1}^N f_i(x)

ADMM form:

min_{x_i, z} Σ_{i=1}^N f_i(x_i)
s.t. x_i − z = 0, i = 1, . . . , N

Augmented Lagrangian:

L_ρ(x_1, . . . , x_N, z, y) = Σ_{i=1}^N ( f_i(x_i) + y_i^T (x_i − z) + (ρ/2)‖x_i − z‖₂² )

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 32 / 76


Distributed Optimization Distributed Loss Minimization

Consensus Optimization

ADMM algorithm:

x_i^{k+1} = argmin_{x_i} ( f_i(x_i) + y_i^{k T} (x_i − z^k) + (ρ/2)‖x_i − z^k‖₂² )

z^{k+1} = (1/N) Σ_{i=1}^N ( x_i^{k+1} + (1/ρ) y_i^k )

y_i^{k+1} = y_i^k + ρ (x_i^{k+1} − z^{k+1})

The final solution is z^k.
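A compact single-process sketch of these consensus updates, using scipy's generic minimizer as a stand-in for each node's local solver (each f_i is assumed to be smooth enough for the default method; all names are mine):

import numpy as np
from scipy.optimize import minimize

def consensus_admm(local_losses, d, rho=1.0, num_iters=50):
    """Consensus ADMM sketch: local_losses is a list of callables f_i(x) -> float."""
    N = len(local_losses)
    x, y, z = np.zeros((N, d)), np.zeros((N, d)), np.zeros(d)
    for _ in range(num_iters):
        for i, f_i in enumerate(local_losses):        # the x-updates run in parallel in practice
            aug = lambda xi, f=f_i, yi=y[i]: f(xi) + yi @ (xi - z) + 0.5 * rho * np.sum((xi - z) ** 2)
            x[i] = minimize(aug, x[i]).x
        z = (x + y / rho).mean(axis=0)                # z-update: average of x_i + (1/rho) y_i
        y += rho * (x - z)                            # dual updates
    return z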

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 33 / 76


Distributed Optimization Distributed Loss Minimization

Consensus Optimization

The z-update can be written as:

z^{k+1} = x̄^{k+1} + (1/ρ) ȳ^k

Averaging the y-updates:

ȳ^{k+1} = ȳ^k + ρ (x̄^{k+1} − z^{k+1})

Substituting the first into the second: ȳ^{k+1} = 0. Hence z^k = x̄^k (for k ≥ 1).
Revised algorithm:

x_i^{k+1} = argmin_{x_i} ( f_i(x_i) + y_i^{k T} (x_i − x̄^k) + (ρ/2)‖x_i − x̄^k‖₂² )

y_i^{k+1} = y_i^k + ρ (x_i^{k+1} − x̄^{k+1})

The final solution is z^k = x̄^k.
Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 34 / 76


Distributed Optimization Distributed Loss Minimization

Distributed Loss minimization

Problem:

min_x l(Ax − b) + r(x)

Partition A and b by rows:

A = [A_1; . . . ; A_N], b = [b_1; . . . ; b_N]

where A_i ∈ R^{m_i × m} and b_i ∈ R^{m_i}.

ADMM formulation:

min_{x_i, z} Σ_{i=1}^N l_i(A_i x_i − b_i) + r(z)
s.t.: x_i − z = 0, i = 1, . . . , N

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 35 / 76


Distributed Optimization Distributed Loss Minimization

Distributed Loss minimization

ADMM solution:

x_i^{k+1} = argmin_{x_i} ( l_i(A_i x_i − b_i) + (ρ/2)‖x_i − z^k + u_i^k‖₂² )

z^{k+1} = argmin_z ( r(z) + (Nρ/2)‖z − x̄^{k+1} − ū^k‖₂² )

u_i^{k+1} = u_i^k + x_i^{k+1} − z^{k+1}

where x̄ and ū denote averages over i = 1, . . . , N.
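For the ridge regularizer r(z) = λ‖z‖₂² used in the earlier formulations, the z-update above has a closed form; a quick derivation under that assumption:

0 = ∇_z [ λ‖z‖₂² + (Nρ/2)‖z − x̄^{k+1} − ū^k‖₂² ] = 2λz + Nρ (z − x̄^{k+1} − ū^k)
⇒ z^{k+1} = ( Nρ / (2λ + Nρ) ) (x̄^{k+1} + ū^k)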

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 36 / 76


Distributed Optimization Results

ADMM Results

Logistic Regression using the loss minimization formulation (Boyd et al.):

min_x Σ_{i=1}^n log(1 + exp(−v_i x^T u_i)) + λ‖x‖₂²

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 37 / 76


Distributed Optimization Results

ADMM Results

Logistic Regression using the loss minimization formulation (Boyd et al.):

min_x Σ_{i=1}^n log(1 + exp(−v_i x^T u_i)) + λ‖x‖₂²

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 38 / 76


Distributed Optimization Results

Other Machine Learning Problems

Ridge Regression.
Lasso.
Multi-class SVM.
Ranking.
Structured output prediction.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 39 / 76


Distributed Optimization Results

ADMM Results

Lasso Results (Boyd et al.):

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 40 / 76


Distributed Optimization Results

ADMM Results

SVM primal residual:

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 41 / 76


Distributed Optimization Results

ADMM Results

SVM Accuracy:

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 42 / 76


Distributed Optimization Results

Results

Risk and Hyperplane

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 43 / 76


Distributed Optimization Development of ADMM

Dual Ascent

Convex equality constrained problem:

min_x f(x)
subject to: Ax = b

Lagrangian: L(x, y) = f(x) + y^T (Ax − b)

Dual function: g(y) = inf_x L(x, y)

Dual problem: max_y g(y)

Final solution: x* = argmin_x L(x, y*)

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 44 / 76


Distributed Optimization Development of ADMM

Dual Ascent

Gradient ascent for the dual problem: y^{k+1} = y^k + α_k ∇_y g(y^k)

∇_y g(y^k) = Ax̃ − b, where x̃ = argmin_x L(x, y^k)

Dual ascent algorithm:

x^{k+1} = argmin_x L(x, y^k)
y^{k+1} = y^k + α_k (Ax^{k+1} − b)

Assumptions:
  L(x, y^k) is strictly convex in x (else the first step can have multiple solutions).
  L(x, y^k) is bounded below in x.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 45 / 76


Distributed Optimization Development of ADMM

Dual Decomposition

Suppose f is separable:

f(x) = f_1(x_1) + · · · + f_N(x_N), x = (x_1, . . . , x_N)

Then L is separable in x: L(x, y) = L_1(x_1, y) + · · · + L_N(x_N, y) − y^T b, where L_i(x_i, y) = f_i(x_i) + y^T A_i x_i.
The x-minimization splits into N separate problems:

x_i^{k+1} = argmin_{x_i} L_i(x_i, y^k)

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 46 / 76


Distributed Optimization Development of ADMM

Dual Decomposition

Dual decomposition:

x_i^{k+1} = argmin_{x_i} L_i(x_i, y^k), i = 1, . . . , N

y^{k+1} = y^k + α_k ( Σ_{i=1}^N A_i x_i^{k+1} − b )

Distributed solution:
  Scatter y^k to the individual nodes.
  Compute x_i on the i-th node (distributed step).
  Gather A_i x_i^{k+1} from the i-th node.
All drawbacks of dual ascent remain.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 47 / 76


Distributed Optimization Development of ADMM

Method of Multipliers

Make dual ascent work under more general conditions.
Use the augmented Lagrangian:

L_ρ(x, y) = f(x) + y^T (Ax − b) + (ρ/2)‖Ax − b‖₂²

Method of multipliers:

x^{k+1} = argmin_x L_ρ(x, y^k)
y^{k+1} = y^k + ρ (Ax^{k+1} − b)
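A minimal runnable sketch for a concrete instance, min ½‖x‖² s.t. Ax = b (the choice of f is mine, picked so that the x-update has a closed form):

import numpy as np

def method_of_multipliers_least_norm(A, b, rho=1.0, num_iters=100):
    """Method of multipliers for min (1/2)||x||^2 subject to Ax = b."""
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(num_iters):
        # x-update: gradient of L_rho is x + A^T y + rho A^T (Ax - b) = 0
        x = np.linalg.solve(np.eye(n) + rho * A.T @ A, A.T @ (rho * b - y))
        y = y + rho * (A @ x - b)        # dual update with fixed step size rho
    return x, y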

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 48 / 76


Distributed Optimization Development of ADMM

Method of Multipliers

Optimality conditions (for differentiable f):
  Primal feasibility: Ax* − b = 0
  Dual feasibility: ∇f(x*) + A^T y* = 0
Since x^{k+1} minimizes L_ρ(x, y^k):

0 = ∇_x L_ρ(x^{k+1}, y^k)
  = ∇_x f(x^{k+1}) + A^T ( y^k + ρ(Ax^{k+1} − b) )
  = ∇_x f(x^{k+1}) + A^T y^{k+1}

The dual update y^{k+1} = y^k + ρ(Ax^{k+1} − b) makes (x^{k+1}, y^{k+1}) dual feasible.
Primal feasibility is achieved in the limit: (Ax^{k+1} − b) → 0.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 49 / 76


Distributed Optimization Development of ADMM

Alternating direction method of multipliers

Problem with applying the standard method of multipliers for distributed optimization:
  there is no problem decomposition even if f is separable,
  due to the squared term (ρ/2)‖Ax − b‖₂².

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 50 / 76


Distributed Optimization Development of ADMM

Alternating direction method of multipliers

ADMM problem:

min_{x,z} f(x) + g(z)
subject to: Ax + Bz = c

Augmented Lagrangian:

L_ρ(x, z, y) = f(x) + g(z) + y^T (Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖₂²

ADMM:

x^{k+1} = argmin_x L_ρ(x, z^k, y^k)
z^{k+1} = argmin_z L_ρ(x^{k+1}, z, y^k)
y^{k+1} = y^k + ρ (Ax^{k+1} + Bz^{k+1} − c)

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 51 / 76


Distributed Optimization Development of ADMM

Alternating direction method of multipliers

Problem with applying the standard method of multipliers for distributed optimization:
  there is no problem decomposition even if f is separable,
  due to the squared term (ρ/2)‖Ax − b‖₂².
The above technique reduces to the method of multipliers if we do a joint minimization over x and z.
Since we split the joint (x, z) minimization step, the problem can be decomposed.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 52 / 76


Distributed Optimization Development of ADMM

ADMM Optimality conditions

Optimality conditions (differentiable case):
  Primal feasibility: Ax + Bz − c = 0
  Dual feasibility: ∇f(x) + A^T y = 0 and ∇g(z) + B^T y = 0
Since z^{k+1} minimizes L_ρ(x^{k+1}, z, y^k):

0 = ∇g(z^{k+1}) + B^T y^k + ρ B^T (Ax^{k+1} + Bz^{k+1} − c)
  = ∇g(z^{k+1}) + B^T y^{k+1}

So the dual variable update satisfies the second dual feasibility condition.
Primal feasibility and the first dual feasibility condition are satisfied asymptotically.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 53 / 76


Distributed Optimization Development of ADMM

ADMM Optimality conditions

Primal residual: r^k = Ax^k + Bz^k − c
Since x^{k+1} minimizes L_ρ(x, z^k, y^k):

0 = ∇f(x^{k+1}) + A^T y^k + ρ A^T (Ax^{k+1} + Bz^k − c)
  = ∇f(x^{k+1}) + A^T ( y^k + ρ r^{k+1} + ρ B(z^k − z^{k+1}) )
  = ∇f(x^{k+1}) + A^T y^{k+1} + ρ A^T B (z^k − z^{k+1})

or,

ρ A^T B (z^{k+1} − z^k) = ∇f(x^{k+1}) + A^T y^{k+1}

Hence, s^{k+1} = ρ A^T B (z^{k+1} − z^k) can be thought of as a dual residual.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 54 / 76


Distributed Optimization Development of ADMM

ADMM with scaled dual variables

Combine the linear and quadratic terms: with the scaled dual variable u = (1/ρ) y and residual r = Ax + Bz − c,

y^T r + (ρ/2)‖r‖₂² = (ρ/2)‖r + u‖₂² − (ρ/2)‖u‖₂²

This gives the scaled-form updates stated earlier: x- and z-updates with penalties of the form ‖· + u^k‖₂² and the dual update u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} − c.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 55 / 76


Applications and extensions Weighted Parameter Averaging

Distributed Support Vector Machines

Training dataset partitioned into M partitions (S_m, m = 1, . . . , M).
Each partition has L datapoints: S_m = {(x_ml, y_ml), l = 1, . . . , L}.
Each partition can be processed locally on a single computer.
Distributed SVM training problem [?]:

min_{w_m, z} Σ_{m=1}^M Σ_{l=1}^L loss(w_m; (x_ml, y_ml)) + r(z)

s.t. w_m − z = 0, m = 1, · · · , M

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 56 / 76


Applications and extensions Weighted Parameter Averaging

Parameter Averaging

Parameter averaging, also called “mixture weights”, was proposed in [?] for logistic regression.
The results hold true for SVMs with a suitable sub-derivative.
Locally learn an SVM on S_m:

w_m = argmin_w (1/L) Σ_{l=1}^L loss(w; x_ml, y_ml) + λ‖w‖², m = 1, . . . , M

The final SVM parameter is given by:

w_PA = (1/M) Σ_{m=1}^M w_m
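A sketch of the procedure, using scikit-learn's LinearSVC as a stand-in for the local solver (its C parametrization differs from the λ above, and the function and variable names are mine):

import numpy as np
from sklearn.svm import LinearSVC

def parameter_averaging(partitions, C=1.0):
    """Train one linear SVM per partition and average the weights: w_PA = (1/M) sum_m w_m."""
    weights = []
    for X_m, y_m in partitions:                    # each fit would run on its own machine
        clf = LinearSVC(C=C).fit(X_m, y_m)
        weights.append(clf.coef_.ravel())          # local parameter vector w_m
    return np.mean(weights, axis=0)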

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 57 / 76


Applications and extensions Weighted Parameter Averaging

Problem with Parameter Averaging

PA with varying number of partitions - Toy dataset.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 58 / 76


Applications and extensions Weighted Parameter Averaging

Weighted Parameter Averaging

The final hypothesis is a weighted sum of the parameters w_m:

w = Σ_{m=1}^M β_m w_m

Also proposed in [?]. How to get the β_m?
Notation: β = [β_1, · · · , β_M]^T, W = [w_1, · · · , w_M], so that

w = Wβ

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 59 / 76


Applications and extensions Weighted Parameter Averaging

Weighted Parameter Averaging

Find the optimal set of weights β which attains the lowest regularized hinge loss:

min_{β,ξ} λ‖Wβ‖² + (1/ML) Σ_{m=1}^M Σ_{i=1}^L ξ_mi

subject to: y_mi (β^T W^T x_mi) ≥ 1 − ξ_mi, ∀i, m
ξ_mi ≥ 0, ∀m = 1, . . . , M, i = 1, . . . , L

W is a pre-computed parameter.
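Eliminating the slacks (ξ_mi = max(0, 1 − y_mi β^T W^T x_mi) at the optimum) gives an unconstrained view of the same problem; a sketch of that objective (X stacks all ML datapoints across partitions, and the names are mine):

import numpy as np

def wpa_objective(beta, W, X, y, lam):
    """lam ||W beta||^2 + (1/(ML)) sum_{m,i} max(0, 1 - y_mi beta^T W^T x_mi)."""
    w = W @ beta                                    # combined parameter w = W beta
    margins = y * (X @ w)
    return lam * np.sum(w ** 2) + np.maximum(0.0, 1.0 - margins).mean()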

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 60 / 76


Applications and extensions Weighted Parameter Averaging

Distributed Weighted Parameter Averaging

Distributed version of primal weighted parameter averaging:

min_{γ_m, β} (1/ML) Σ_{m=1}^M Σ_{l=1}^L loss(Wγ_m; x_ml, y_ml) + r(β)

s.t. γ_m − β = 0, m = 1, · · · , M

where r(β) = λ‖Wβ‖², γ_m are the weights for the m-th computer, and β is the consensus weight.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 61 / 76


Applications and extensions Weighted Parameter Averaging

Distributed Weighted Parameter Averaging

Distributed algorithm using ADMM:

γ_m^{k+1} := argmin_γ ( loss(A_m γ) + (ρ/2)‖γ − β^k + u_m^k‖₂² )

β^{k+1} := argmin_β ( r(β) + (Mρ/2)‖β − γ̄^{k+1} − ū^k‖₂² )

u_m^{k+1} = u_m^k + γ_m^{k+1} − β^{k+1}

The u_m are the scaled Lagrange multipliers, γ̄ = (1/M) Σ_{m=1}^M γ_m and ū = (1/M) Σ_{m=1}^M u_m.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 62 / 76


Applications and extensions Weighted Parameter Averaging

Toy Dataset - PA and WPA

PA (left) and WPA (right) with varying number of partitions - Toy dataset.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 63 / 76


Applications and extensions Weighted Parameter Averaging

Toy Dataset - PA and WPA

Accuracy of PA and WPA with varying number of partitions - Toy dataset.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 64 / 76


Applications and extensions Weighted Parameter Averaging

Real World Datasets

Epsilon (2000 features, 6000 datapoints) test set accuracy with varying number of partitions.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 65 / 76


Applications and extensions Weighted Parameter Averaging

Real World Datasets

Gisette (5000 features, 6000 datapoints) test set accuracy with varying number of partitions.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 66 / 76


Applications and extensions Weighted Parameter Averaging

Real World Datasets

Real-sim (20000 features, 3000 datapoints) test set accuracy with varying number of partitions.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 67 / 76


Applications and extensions Weighted Parameter Averaging

Real World Datasets

Convergence of test accuracy with iterations (200 partitions).

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 68 / 76


Applications and extensions Weighted Parameter Averaging

Real World Datasets

Convergence of primal residual with iterations (200 partitions).

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 69 / 76


Applications and extensions Fully-distributed SVM

Distributed SVM on Arbitrary Network

Motivations:
  Sensor networks.
  Corporate networks.
  Privacy.

Assumptions:
  Data is available at the nodes of the network.
  Communication is possible only along the edges of the network.

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 70 / 76


Applications and extensions Fully-distributed SVM

Distributed SVM on Arbitrary Network

SVM optimization problem:

min_{w,b,ξ} (1/2)‖w‖² + C Σ_{j=1}^J Σ_{n=1}^{N_j} ξ_jn

s.t.: y_jn (w^T x_jn + b) ≥ 1 − ξ_jn, ∀j ∈ J, n = 1, . . . , N_j
ξ_jn ≥ 0, ∀j ∈ J, n = 1, . . . , N_j

Node j has a copy of w_j, b_j. Distributed formulation:

min_{w_j, b_j, ξ_jn} (1/2) Σ_{j=1}^J ‖w_j‖² + JC Σ_{j=1}^J Σ_{n=1}^{N_j} ξ_jn

s.t.: y_jn (w_j^T x_jn + b_j) ≥ 1 − ξ_jn, ∀j ∈ J, n = 1, . . . , N_j
ξ_jn ≥ 0, ∀j ∈ J, n = 1, . . . , N_j
w_j = w_i, ∀j ∈ J, i ∈ B_j

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 71 / 76


Applications and extensions Fully-distributed SVM

Algorithm

Using v_j = [w_j^T b_j]^T, X_j = [[x_j1, . . . , x_jN_j]^T 1_j] and Y_j = diag([y_j1, . . . , y_jN_j]):

min_{v_j, ξ_jn, ω_ji} (1/2) Σ_{j=1}^J r(v_j) + JC Σ_{j=1}^J Σ_{n=1}^{N_j} ξ_jn

s.t.: Y_j X_j v_j ≥ 1 − ξ_j, ∀j ∈ J
ξ_j ≥ 0, ∀j ∈ J
v_j = ω_ji, v_i = ω_ji, ∀j ∈ J, i ∈ B_j

Surrogate augmented Lagrangian:

L({v_j}, {ξ_j}, {ω_ji}, {α_jik}) = (1/2) Σ_{j=1}^J r(v_j) + JC Σ_{j=1}^J Σ_{n=1}^{N_j} ξ_jn
  + Σ_{j=1}^J Σ_{i∈B_j} ( α_ji1^T (v_j − ω_ji) + α_ji2^T (ω_ji − v_i) )
  + (η/2) Σ_{j=1}^J Σ_{i∈B_j} ( ‖v_j − ω_ji‖² + ‖ω_ji − v_i‖² )

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 72 / 76


Applications and extensions Fully-distributed SVM

Algorithm

ADMM based algorithm:

v_j^{t+1}, ξ_jn^{t+1} = argmin_{(v_j, ξ_j) ∈ W} L({v_j}, {ξ_j}, {ω_ji^t}, {α_jik^t})

ω_ji^{t+1} = argmin_{ω_ji} L({v_j^{t+1}}, {ξ_j^{t+1}}, {ω_ji}, {α_jik^t})

α_ji1^{t+1} = α_ji1^t + η (v_j^{t+1} − ω_ji^{t+1})

α_ji2^{t+1} = α_ji2^t + η (ω_ji^{t+1} − v_i^{t+1})

From the second equation:

ω_ji^{t+1} = (1/2η)(α_ji1^t − α_ji2^t) + (1/2)(v_j^{t+1} + v_i^{t+1})

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 73 / 76


Applications and extensions Fully-distributed SVM

Algorithm

Hence:

α_ji1^{t+1} = (1/2)(α_ji1^t + α_ji2^t) + (η/2)(v_j^{t+1} − v_i^{t+1})

α_ji2^{t+1} = (1/2)(α_ji1^t + α_ji2^t) + (η/2)(v_j^{t+1} − v_i^{t+1})

So α_ji1^t = α_ji2^t after the first iteration, and hence ω_ji^{t+1} = (1/2)(v_j^{t+1} + v_i^{t+1}).
Substituting this into the surrogate augmented Lagrangian, the third term becomes:

Σ_{j=1}^J Σ_{i∈B_j} α_ji1^T (v_j − v_i) = Σ_{j=1}^J Σ_{i∈B_j} v_j^T (α_ji1^t − α_ij1^t)

Substitute α_j^t = Σ_{i∈B_j} (α_ji1^t − α_ij1^t).

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 74 / 76


Applications and extensions Fully-distributed SVM

Algorithm

The final algorithm:

v_j^{t+1}, ξ_jn^{t+1} = argmin_{(v_j, ξ_j) ∈ W} L({v_j}, {ξ_j}, {α_j^t})

α_j^{t+1} = α_j^t + (η/2) Σ_{i∈B_j} (v_j^{t+1} − v_i^{t+1})

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 75 / 76


Applications and extensions Fully-distributed SVM

Thank you !

Questions ?

Sourangshu Bhattacharya (IITKGP) Distributed ML August 21, 2015 76 / 76