
When Machine Learning Meets the Web

Chao Liu
Internet Services Research Center
Microsoft Research, Redmond

Outline

Motivation & Challenges
Background on Distributed Computing
Standard ML on MapReduce
▪ Classification: Naïve Bayes
▪ Clustering: Nonnegative Matrix Factorization
▪ Modeling: EM Algorithm
Customized ML on MapReduce
▪ Click Modeling
▪ Behavior Targeting
Conclusions

Motivation & Challenges

Data on the Web
Scale: terabyte-to-petabyte data
▪ Around 20 TB of log data per day from Bing
Dynamics: evolving data streams
▪ Click data streams with evolving/emerging topics
Applications: non-traditional ML tasks
▪ Predicting clicks & ads


Parallel vs. Distributed Computing

Parallel computing
▪ All processors have access to a shared memory, which can be used to exchange information between processors

Distributed computing
▪ Each processor has its own private memory (distributed memory); processors communicate over the network
▪ Message passing (MPI)
▪ MapReduce

MPI vs. MapReduce

MPI is for task parallelism
▪ Suitable for CPU-intensive jobs
▪ Fine-grained communication control, powerful computation model

MapReduce is for data parallelism
▪ Suitable for data-intensive jobs
▪ A restricted computation model

Word Counting on MapReduce

[Figure: word counting on MapReduce]
Each mapper reads its documents as (docId, doc) pairs and emits word counts, e.g. (w1,1)(w2,1)(w3,1).
The shuffle aggregates values by key: (w1,⟨1,1,1⟩), (w2,⟨1,1⟩), (w3,⟨1,1,1⟩).
Reducers sum the lists: (w1, 3), (w2, 2), (w3, 3).

The Web corpus is distributed over multiple machines

Mapper: for each word w in a doc, emit (w, 1)

Intermediate (key, value) pairs are aggregated by word

Each reducer runs locally over its share of the intermediate data to produce the final counts
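The job above can be simulated in a few lines of Python. The three toy documents are chosen to reproduce the counts in the figure; `map_doc`, the shuffle, and `reduce_counts` stand in for the Mapper, the framework's group-by-key step, and the Reducer.

```python
from collections import defaultdict

def map_doc(doc_id, doc):
    # Mapper: for each word w in the document, emit (w, 1).
    for word in doc.split():
        yield (word, 1)

def shuffle(pairs):
    # Framework step: aggregate intermediate values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(word, ones):
    # Reducer: sum the list of 1s for each word.
    return (word, sum(ones))

docs = {1: "w1 w2 w3", 2: "w1 w3", 3: "w1 w2 w3"}
intermediate = [kv for d_id, d in docs.items() for kv in map_doc(d_id, d)]
counts = dict(reduce_counts(w, vs) for w, vs in shuffle(intermediate).items())
# counts == {"w1": 3, "w2": 2, "w3": 3}, matching the figure
```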

Machine Learning on MapReduce

The big picture: not omnipotent, but good enough

MapReduce friendly
▪ Standard ML: Classification (Naïve Bayes, logistic regression, MART, etc.), Clustering (k-means, NMF, co-clustering, etc.), Modeling (EM algorithm, Gaussian mixture, Latent Dirichlet Allocation, etc.)
▪ Customized ML: PageRank, click models, behavior targeting

MapReduce unfriendly
▪ Standard ML: Classification (SVM), Clustering (spectral clustering)
▪ Customized ML: Learning-to-rank


Classification: Naïve Bayes

P(C|X) ∝ P(C)·P(X|C) = P(C)·∏j P(Xj|C)

[Figure: Naïve Bayes on MapReduce]
Mapper: for each training example (x(i), y(i)), emit (j, xj(i), y(i)) for every feature j
Reduce on y(i): estimate the class prior P(C)
Reduce on j: estimate the conditional P(Xj|C)
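A minimal in-memory sketch of this layout, on a made-up two-feature spam/ham dataset. The two Counter passes stand in for the two reduce paths: counting labels gives P(C), counting (feature, value, label) triples gives P(Xj|C).

```python
from collections import Counter

# Hypothetical training data: ((x1, x2), label) pairs
examples = [((1, 0), "spam"), ((1, 1), "spam"), ((0, 1), "ham"), ((0, 0), "ham")]

mapped = []
for x, y in examples:                  # Mapper: one record per feature
    for j, xj in enumerate(x):
        mapped.append((j, xj, y))

n = len(examples)
label_counts = Counter(y for _, y in examples)          # Reduce on y
prior = {c: cnt / n for c, cnt in label_counts.items()} # P(C)

feat_counts = Counter(mapped)                           # Reduce on (j, xj, y)
likelihood = {key: cnt / label_counts[key[2]]           # P(Xj = xj | C)
              for key, cnt in feat_counts.items()}
# prior["spam"] == 0.5; likelihood[(0, 1, "spam")] == 1.0
```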

Clustering: Nonnegative Matrix Factorization [Liu et al., WWW2010]

Effective tool to uncover latent relationships in nonnegative matrices, with many applications [Berry et al., 2007; Sra & Dhillon, 2006]
▪ Interpretable dimensionality reduction [Lee & Seung, 1999]
▪ Document clustering [Shahnaz et al., 2006; Xu et al., 2006]
Challenge: can we scale NMF to million-by-million matrices?

A (m×n) ≈ W (m×k) · H (k×n), with A ≥ 0, W ≥ 0, H ≥ 0

NMF Algorithm [Lee & Seung, 2000]

Factorize A (m×n) ≈ W (m×k) · H (k×n), with A ≥ 0, W ≥ 0, H ≥ 0, via the multiplicative updates
H ← H .* (WᵀA) ./ (WᵀWH)
W ← W .* (AHᵀ) ./ (WHHᵀ)
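The multiplicative updates can be sketched on a single machine with numpy. Sizes and data here are synthetic, and the small epsilon guarding the divisions is an implementation detail, not part of the algorithm on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 30, 4
A = rng.random((m, n))                     # nonnegative data matrix
W, H = rng.random((m, k)), rng.random((k, n))

eps = 1e-9                                 # guard against division by zero
errors = []
for _ in range(50):
    # H <- H .* (W^T A) ./ (W^T W H)
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    # W <- W .* (A H^T) ./ (W H H^T)
    W *= (A @ H.T) / (W @ H @ H.T + eps)
    errors.append(np.linalg.norm(A - W @ H))
# the reconstruction error is non-increasing and nonnegativity is preserved
```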

Distributed NMF

Data Partition: A, W and H across machines

A is partitioned as tuples (i, j, A_ij)
W is partitioned by rows: (i, w_i)
H is partitioned by columns: (j, h_j)

Computing DNMF: The Big Picture

One iteration updates H ← H .* X ./ Y, where X = WᵀA and Y = WᵀWH, through a chain of map/reduce passes (the W update is analogous):

Step 1: X = WᵀA
▪ Map-I: emit A's entries as (i, (j, A_ij)) and W's rows as (i, w_i), keyed by row index i
▪ Reduce-I: join on i and emit (j, A_ij · w_i)
▪ Map-II / Reduce-II: group by column index j and sum, giving x_j = Σ_i A_ij w_i, the j-th column of X = WᵀA

Step 2: Y = WᵀWH
▪ Map-III / Reduce-III: emit (0, w_i w_iᵀ) for each row of W and sum into the small k×k matrix C = WᵀW = Σ_{i=1}^{m} w_i w_iᵀ
▪ Map-IV: emit H's columns as (j, h_j); with C available, compute y_j = C h_j, the j-th column of Y = WᵀWH

Step 3: H ← H .* X ./ Y
▪ Map-V: join (j, h_j), (j, x_j), and (j, y_j) into (j, h_j, x_j, y_j)
▪ Reduce-V: emit (j, h_j_new) with h_j_new = h_j .* x_j ./ y_j
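To make Step 1 concrete, here is a small in-memory simulation of computing X = WᵀA from sparse triples. The matrices and sizes are made up for illustration; the join and the per-column sum mirror Reduce-I and Reduce-II.

```python
from collections import defaultdict
import numpy as np

A_triples = [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 3.0)]     # sparse A (m=2, n=2)
W = {0: np.array([1.0, 2.0]), 1: np.array([0.5, 0.5])}  # rows w_i (k=2)

# Reduce-I (after the join on row index i): emit (j, A_ij * w_i)
joined = [(j, a * W[i]) for i, j, a in A_triples]

# Reduce-II: sum the partial vectors per column j -> x_j = sum_i A_ij * w_i
X = defaultdict(lambda: np.zeros(2))
for j, vec in joined:
    X[j] += vec

# Check against the dense computation W^T A
A = np.array([[2.0, 1.0], [3.0, 0.0]])
W_dense = np.array([W[0], W[1]])
assert np.allclose(X[0], (W_dense.T @ A)[:, 0])
assert np.allclose(X[1], (W_dense.T @ A)[:, 1])
```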

Scalability w.r.t. Matrix Size

3 hours per iteration, 20 iterations take around 20*3*0.72 ≈ 43 hours

Less than 7 hours on a 43.9M-by-769M matrix with 4.38 billion nonzero values

General EM on MapReduce

Map: evaluate the E-step, computing per-example posteriors and sufficient statistics

Reduce: aggregate the statistics and update the parameters (M-step)
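As an illustration of this split, a one-dimensional, two-component Gaussian mixture fit by EM, with the E-step playing the map role and the M-step the reduce role. The data and initialization are synthetic; the deck's own formulas are not shown on the slide.

```python
import math

data = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]       # synthetic 1-D data
mu, pi, var = [-1.0, 1.0], [0.5, 0.5], [1.0, 1.0]

def gauss(x, m, v):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

for _ in range(30):
    # Map (E-step): per-point responsibilities -> sufficient statistics
    stats = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # per component: sum_r, sum_rx, sum_rxx
    for x in data:
        p = [pi[c] * gauss(x, mu[c], var[c]) for c in (0, 1)]
        z = sum(p)
        for c in (0, 1):
            r = p[c] / z
            stats[c][0] += r
            stats[c][1] += r * x
            stats[c][2] += r * x * x
    # Reduce (M-step): aggregate statistics, update parameters
    for c in (0, 1):
        nc = stats[c][0]
        pi[c] = nc / len(data)
        mu[c] = stats[c][1] / nc
        var[c] = max(stats[c][2] / nc - mu[c] ** 2, 1e-6)
# mu converges to approximately [-2, 2]
```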


Click Modeling: Motivation

Clicks are good… but are these two clicks equally “good”?

Non-clicks may have excuses:
▪ Not relevant
▪ Not examined

Eye-tracking User Study


Bayesian Browsing Model [Liu et al., KDD2009]

[Figure: BBM as a graphical model]
For a query and its ranked results URL1 … URL4:
▪ Ei: whether the snippet at position i is examined
▪ Si: relevance of the snippet
▪ Ci: whether position i is clicked (the observed clickthroughs)

Dependencies in BBM

[Figure: dependencies among Ei, Si, Ci]
▪ A click Ci depends on examination Ei and snippet relevance Si
▪ Examination Ei depends on the preceding click position r before i and the distance d = i − r
▪ Ultimate goal: the posterior of relevance given the observed clicks
▪ Observation: conditional independence

Model Inference

P(C|S) by the chain rule

Likelihood of a search instance

From S to R

Putting Things Together

Posterior with C1:n

Re-organize by the Rj’s:
▪ how many times dj was clicked
▪ how many times dj was not clicked when it was at position (r + d) and the preceding click was at position r

What p(R|C1:n) Tells Us

Exact inference, with the joint posterior in closed form

The joint posterior factorizes, so the Rj’s are mutually independent

At most M(M+1)/2 + 1 numbers fully characterize each posterior
▪ Count vector: (e0, e1, e2, …, e_{M(M+1)/2})

An Example

[Figure: worked example of the count vector for R4, tallying N4 (clicks at position 4) and N4,r,d (skips at position 4 with the preceding click at position r, at distance d)]

LearnBBM on MapReduce

Map: emit ((q, u), idx)

Reduce: construct the count vector

Example on MapReduce

[Figure: LearnBBM on MapReduce]
Three mappers emit (url, idx) pairs:
▪ Mapper 1: (U1, 0) (U2, 4) (U3, 0)
▪ Mapper 2: (U1, 1) (U3, 0) (U4, 7)
▪ Mapper 3: (U1, 1) (U3, 0) (U4, 0)
Reduce groups the indices per URL: (U1, 0, 1, 1), (U2, 4), (U3, 0, 0, 0), (U4, 0, 7)
The closed-form posteriors then read off the count vectors:
▪ p(R1) ∝ R1(1 − R1)²
▪ p(R2) ∝ 1 − 0.98·R2
▪ p(R3) ∝ R3³
▪ p(R4) ∝ R4(1 − R4)
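The map/reduce pair can be sketched as follows. The flattening idx = 1 + r·M + (d − 1) used to index the (r, d) skip events is a hypothetical layout, not necessarily the paper's exact one; the click/skip split and the per-(query, url) tally follow the slide.

```python
from collections import defaultdict

M = 10  # number of result positions

def map_impression(query, url, position, clicked, prev_click_pos):
    # Mapper: emit ((q, u), idx). Index 0 records a click; otherwise the
    # index encodes (r, d), the preceding click position and distance.
    r = prev_click_pos
    d = position - r
    idx = 0 if clicked else 1 + r * M + (d - 1)   # hypothetical flattening
    return ((query, url), idx)

# Hypothetical sessions: "u1" clicked at position 1 twice; "u2" skipped at
# position 3 with the preceding click at position 1.
emitted = [
    map_impression("q", "u1", 1, True, 0),
    map_impression("q", "u2", 3, False, 1),
    map_impression("q", "u1", 1, True, 0),
]

# Reduce: build one count vector per (query, url)
vectors = defaultdict(lambda: defaultdict(int))
for key, idx in emitted:
    vectors[key][idx] += 1
# vectors[("q", "u1")][0] == 2: u1 was clicked twice
```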

Petabyte-Scale Experiment

Setup: 8 weeks of data, 8 jobs; job k takes the first k weeks of data

Experiment platform: SCOPE, Easy and Efficient Parallel Processing of Massive Data Sets [Chaiken et al., VLDB’08]

Scalability of BBM

Increasing computation load: more queries, more URLs, more impressions

Near-constant elapsed time on SCOPE:
▪ 3 hours
▪ scans 265 terabytes of data
▪ full posteriors for 1.15 billion (query, url) pairs

Large-scale Behavior Targeting [Ye et al., KDD2009]

Behavior targeting
▪ Ad serving based on users’ historical behaviors
▪ Complementary to sponsored ads and content ads

Problem Setting

Goal
▪ Given ads in a certain category, locate qualified users based on their past behaviors

Data
▪ A user is identified by a cookie
▪ Past behavior, profiled as a vector x, includes ad clicks, ad views, page views, search queries, clicks, etc.

Challenges
▪ Scale: e.g., 9 TB of ad data with 500 billion entries in Aug ’08
▪ Sparsity: e.g., the CTR of automotive display ads is 0.05%
▪ Dynamics: user behavior changes over time

Learning: Linear Poisson Model

CTR = ClickCnt / ViewCnt
▪ one model predicts the expected click count
▪ another model predicts the expected view count

Linear Poisson model, with MLE on w
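A single-machine sketch of MLE for the linear Poisson model on synthetic data: the expected count for feature vector x is λ = w·x, and clicks are Poisson(λ). The multiplicative update below is a standard fixed-point scheme for linear-link Poisson regression with nonnegative weights; the slide elides the paper's exact formulas, so treat it as illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_feats = 200, 5
X = rng.random((n_users, n_feats))          # nonnegative behavior features
w_true = np.array([0.2, 0.0, 1.0, 0.5, 0.1])
y = rng.poisson(X @ w_true).astype(float)   # observed click counts

def loglik(w):
    lam = X @ w + 1e-9
    return float(np.sum(y * np.log(lam) - lam))

w = np.ones(n_feats)                        # nonnegative initialization
ll_start = loglik(w)
for _ in range(200):
    lam = X @ w + 1e-9                      # predicted expected counts
    # Multiplicative update: w_j *= (sum_i x_ij * y_i / lam_i) / (sum_i x_ij)
    w *= (X.T @ (y / lam)) / X.sum(axis=0)
ll_end = loglik(w)
# the update keeps w nonnegative and the log-likelihood non-decreasing
```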

Implementation on MapReduce

Learning
▪ Map: compute the per-user sufficient statistics
▪ Reduce: update w

Prediction


Conclusions

Challenges imposed by Web data
▪ Scalability of standard algorithms
▪ Application-driven customized algorithms

The capability to consume huge amounts of data outweighs algorithmic sophistication
▪ Simple counting is no less powerful than sophisticated algorithms when data is abundant or even infinite

MapReduce: a restricted computation model
▪ Not omnipotent but powerful enough
▪ Things we want to do turn out to be things we can do

Q&A

Thank You!

SEWM‘10 Keynote, Chengdu, China
