![Page 1: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/1.jpg)
Graphical Models for the Internet
Amr Ahmed and Alexander Smola
Yahoo Research, Santa Clara, CA
![Page 2: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/2.jpg)
Thus far ...
• Motivation
• Basic tools
  – Clustering
  – Topic models
• Distributed batch inference
  – Local and global states
  – Star synchronization
![Page 3: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/3.jpg)
Up next
• Inference
  – Online distributed sampling
  – Single-machine multi-threaded inference
  – Online EM and submodular selection
• Applications
  – User tracking for behavioral targeting
  – Content understanding
  – User modeling for content recommendation
![Page 4: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/4.jpg)
4. Online Model
![Page 5: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/5.jpg)
Scenarios
• Batch large-scale
  – Covered in part 1
• Mini-batches
  – We already have a model
  – Data arrives in batches
  – We would like to keep the model up to date
• Time-sensitive
  – Data arrives one item at a time
  – Model should be up to date
![Page 6: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/6.jpg)
4.1 Dynamic Clustering
![Page 7: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/7.jpg)
The Chinese Restaurant Process
• Allows the number of mixture components to grow with the data
• Such models are called non-parametric
  – Means the number of effective parameters grows with the data
  – Still have hyper-parameters that control the rate of growth
    • α: how fast a new cluster/mixture component is born
    • G0: prior over mixture component parameters
![Page 8: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/8.jpg)
The Chinese Restaurant Process
Generative Process
- For data point x_i
  - Choose table j ∝ m_j and sample x_i ~ f(φ_j)
  - Choose a new table K+1 ∝ α
    - Sample φ_{K+1} ~ G0 and sample x_i ~ f(φ_{K+1})
φ2 φ1 φ3
The rich-get-richer effect
CANNOT handle sequential data
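The seating rule of the generative process can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `sample_crp_assignments` and its arguments are hypothetical, and the sketch only draws table assignments (it leaves out sampling φ from G0 and x_i from f(φ)).

```python
import random


def sample_crp_assignments(num_points, alpha, seed=0):
    """Seat customers one at a time and return (assignments, table_sizes)."""
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[j] is m_j, the number of customers at table j
    assignments = []
    for _ in range(num_points):
        # Existing table j has weight m_j; the new table K+1 has weight alpha.
        weights = table_sizes + [alpha]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(table_sizes):
            table_sizes.append(1)    # a new table (cluster) is born
        else:
            table_sizes[j] += 1      # popular tables attract more customers
        assignments.append(j)
    return assignments, table_sizes


assignments, table_sizes = sample_crp_assignments(100, alpha=1.0)
```

Because every customer is seated using the counts of all earlier customers regardless of when they arrived, the sketch also shows why the plain process has no notion of time.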
![Page 9: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/9.jpg)
Recurrent CRP (RCRP) [Ahmed and Xing 2008]
• Adapts the number of mixture components over time
  – Mixture components can die out
  – New mixture components can be born at any time
  – Parameters of retained mixture components evolve according to Markovian dynamics
![Page 10: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/10.jpg)
The Recurrent Chinese Restaurant Process
φ2,1 φ1,1 φ3,1 T=1
Dish eaten at table 3 at time epoch 1, OR the parameters of cluster 3 at time epoch 1
Generative Process
- Customers at time T=1 are seated as before:
  - Choose table j ∝ m_{j,1} and sample x_i ~ f(φ_{j,1})
  - Choose a new table K+1 ∝ α
    - Sample φ_{K+1,1} ~ G0 and sample x_i ~ f(φ_{K+1,1})
![Page 11: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/11.jpg)
The Recurrent Chinese Restaurant Process
φ2,1 φ1,1 φ3,1 T=1
T=2
φ2,1 φ1,1 φ3,1
m'1,1=2 m'2,1=3 m'3,1=1
![Page 15: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/15.jpg)
φ2,1 φ1,1 φ3,1 T=1
T=2 φ2,1 φ1,2 φ3,1
Sample φ1,2 ~ P(.| φ1,1)
m'1,1=2 m'2,1=3 m'3,1=1
![Page 16: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/16.jpg)
φ2,1 φ1,1 φ3,1 T=1
T=2 φ2,1 φ1,2 φ3,1
And so on ……
m'1,1=2 m'2,1=3 m'3,1=1
![Page 17: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/17.jpg)
φ2,1 φ1,1 φ3,1 T=1
T=2 φ2,2 φ1,2 φ3,1
At the end of epoch 2
φ4,2
Newly born cluster Died out cluster
m'1,1=2 m'2,1=3 m'3,1=1
![Page 18: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/18.jpg)
φ2,1 φ1,1 φ3,1 T=1
T=2 φ2,2 φ1,2 φ3,1
m'1,1=2 m'2,1=3 m'3,1=1
φ4,2
T=3
φ2,2 φ1,2 φ4,2
m'4,2=1 m'2,2=2 m'1,2=2
![Page 19: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/19.jpg)
Recurrent Chinese Restaurant Process
• Can be extended to model higher-order dependencies
• Can decay dependencies over time
  – The pseudo-count for table k at time t is
    $m'_{k,t} = \sum_{h=1}^{H} e^{-h/\rho}\, m_{k,t-h}$
    where H is the history size, ρ is the decay factor, and $m_{k,t-h}$ is the number of customers sitting at table k at time epoch t-h
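A small sketch of the decayed pseudo-count, assuming it is the sum over the last H epochs of exp(-h/ρ) times the number of customers at table k at epoch t-h. The function names and the dictionary layout are hypothetical. The second helper adds the pseudo-counts to the current counts to form seating weights, which is also an assumption about how the prior is used.

```python
import math


def decayed_pseudo_counts(history, rho, H):
    """history[h-1][k] is the count for table k at epoch t-h; returns m'_{k,t}."""
    pseudo = {}
    for h, counts in enumerate(history[:H], start=1):
        weight = math.exp(-h / rho)          # older epochs count less
        for k, m in counts.items():
            pseudo[k] = pseudo.get(k, 0.0) + weight * m
    return pseudo


def seating_weights(current_counts, pseudo, alpha):
    """Unnormalized weights: pseudo-count plus current count, alpha for a new table."""
    tables = sorted(set(current_counts) | set(pseudo))
    weights = {k: pseudo.get(k, 0.0) + current_counts.get(k, 0) for k in tables}
    weights["new"] = alpha
    return weights


# Epoch t-1 had 2, 3 and 1 customers at tables 1, 2 and 3.
pseudo = decayed_pseudo_counts([{1: 2, 2: 3, 3: 1}], rho=1.0, H=1)
weights = seating_weights({2: 1}, pseudo, alpha=1.0)
```

A table whose weight stays near zero for H epochs effectively dies out, while the `"new"` entry lets a cluster be born at any epoch.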
![Page 20: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/20.jpg)
φ2,1 φ1,1 φ3,1 T=1
T=2 φ2,2 φ1,2 φ3,1
m'1,1=2 m'2,1=3 m'3,1=1
φ4,2
T=3 φ2,2 φ1,2 φ4,2
$m'_{2,3} = \sum_{h=1}^{H} e^{-h/\rho}\, m_{2,3-h}$
![Page 21: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/21.jpg)
4.2 Online Distributed Inference
Tracking Users Interest
![Page 22: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/22.jpg)
Characterizing User Interests
• Short-term vs long-term
Jan April July Oct
Music
Housing
Furniture
Buying a car
Travel plans
![Page 23: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/23.jpg)
Characterizing User Interests
• Short-term vs long-term
• Latent
Jan April July Oct
mileage
fast
mortgage Gaga
seafood
Barcelona
used
![Page 24: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/24.jpg)
Problem formulation
Input
• Queries issued by the user or tags of watched content
• Snippet of page examined by user
• Time stamp of each action (day resolution)
Output
• Users' daily distribution over interests
• Dynamic interest representation
• Online and scalable inference
• Language independent
Flight London Hotel weather
School Supplies Loan semester
classes registration housing rent
![Page 25: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/25.jpg)
Problem formulation
Flight London Hotel weather
Travel
Back to school
finance
School Supplies Loan semester
classes registration housing rent
![Page 26: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/26.jpg)
Problem formulation
Flight London Hotel weather
Travel
Back to school
finance
School Supplies Loan semester
classes registration housing rent
When to show a financing ad?
![Page 29: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/29.jpg)
Problem formulation
Flight London Hotel weather
Travel
Back to school
finance
School Supplies Loan semester
classes registration housing rent
When to show a hotel ad?
![Page 32: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/32.jpg)
Mixed-Membership Formulation
job Career Business Assistant Hiring Part-time Receptionist
Car Blue Book Kelley Prices Small Speed large
Bank Online Credit Card debt portfolio Finance Chase
Recipe Chocolate Pizza Food Chicken Milk Butter Powder
Diet Cars Job Finance
Objects
Mixtures
Degree of membership
Job Hiring speed price
part-time Camry Career opening bonus package
card diet calories loan recipe milk Weight lb kg
![Page 33: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/33.jpg)
In Graphical Nota>on
![Page 34: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/34.jpg)
In Polya-Urn Representation
• Collapse multinomial variables:
• Fixed-dimensional hierarchical Polya-urn representation
  – Chinese restaurant franchise
![Page 35: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/35.jpg)
Food Chicken
job Career Business Assistant Hiring Part-time Receptionist
Car Blue Book Kelley Prices Small Speed large
Bank Online Credit Card debt portfolio Finance Chase
Recipe Chocolate Pizza Food Chicken Milk Butter Powder
Car speed offer camry accord career
Global topic trends
Topic word-distributions
User-specific topic trends (mixing vector)
User interactions: queries, keywords from pages viewed
![Page 36: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/36.jpg)
Food Chicken
Car speed offer camry accord career
Generative Process
• For each user interaction
  • Choose an intent from the local distribution
    • Sample a word from the topic's word-distribution
  • Choose a new intent ∝ λ
    • Sample a new intent from the global distribution
    • Sample a word from the new topic's word-distribution
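The two-level choice in this generative process can be sketched as a pair of urns: a per-user urn of intents and a shared global urn. The sketch below is an illustration under stated assumptions, not the authors' implementation. The function name, the dictionary layout, and the toy topics are hypothetical, and the set of topics is taken to be fixed, in line with the fixed-dimensional representation.

```python
import random


def generate_interaction(local_counts, global_counts, topic_words, lam, rng):
    """Generate one (topic, word) pair for a user and update both urns."""
    local_topics = list(local_counts)
    # An intent the user already has is reused in proportion to its local count;
    # with weight lam the user instead consults the global distribution.
    weights = [local_counts[k] for k in local_topics] + [lam]
    choice = rng.choices(range(len(weights)), weights=weights)[0]
    if choice < len(local_topics):
        topic = local_topics[choice]
    else:
        global_topics = list(global_counts)
        topic = rng.choices(
            global_topics, weights=[global_counts[k] for k in global_topics]
        )[0]
        global_counts[topic] += 1            # the global urn records the visit
    local_counts[topic] = local_counts.get(topic, 0) + 1
    words = topic_words[topic]
    word = rng.choices(list(words), weights=list(words.values()))[0]
    return topic, word


rng = random.Random(0)
topic_words = {
    "cars": {"speed": 1.0, "camry": 1.0, "accord": 1.0},
    "food": {"pizza": 1.0, "chicken": 1.0},
}
global_counts = {"cars": 1.0, "food": 1.0}   # positive starting mass per topic
local_counts = {}
interactions = [
    generate_interaction(local_counts, global_counts, topic_words, 0.5, rng)
    for _ in range(10)
]
```

A user with no history always falls through to the global urn, so the first interaction is drawn from the global trend.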
![Page 37: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/37.jpg)
Food Chicken pizza
Car speed offer camry accord career
![Page 39: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/39.jpg)
Food Chicken pizza hiring
Car speed offer camry accord career
![Page 40: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/40.jpg)
Food Chicken pizza mileage
Car speed offer camry accord career
![Page 41: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/41.jpg)
Food Chicken pizza mileage
Car speed offer camry accord career
Problems
- Static model
- Does not evolve the user's interests
- Does not evolve the global trend of interests
- Does not evolve each interest's distribution over terms
![Page 42: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/42.jpg)
Food Chicken pizza mileage
Car speed offer camry accord career
At time t    At time t+1
Build a dynamic model
Connect each level using an RCRP
![Page 43: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/43.jpg)
At time t    At time t+1    At time t+2    At time t+3
User 1 process
User 2 process
User 3 process
Global process
m m'
n n'
Which time kernel to use at each level?
![Page 44: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/44.jpg)
Pseudo counts = *
Decay factor
Food Chicken pizza mileage
Car speed offer camry accord career
At time t    At time t+1
Observation 1
- Popular topics at time t are likely to be popular at time t+1
- φ_{k,t+1} is likely to evolve smoothly from φ_{k,t}
![Page 45: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/45.jpg)
Food Chicken pizza mileage
Car speed offer camry accord career
At time t    At time t+1
Observation 1
- Popular topics at time t are likely to be popular at time t+1
- φ_{k,t+1} is likely to evolve smoothly from φ_{k,t}
Car Altima Accord Book Kelley Prices Small Speed
$\phi_{k,t+1} \sim \mathrm{Dir}(\tilde{\beta}_{k,t+1})$
φ_{k,t}: Car Blue Book Kelley Prices Small Speed large
Intuition: captures the current trend of the car industry (e.g., new releases)
![Page 46: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/46.jpg)
Food Chicken pizza mileage
Car speed offer camry accord career
At time t    At time t+1
Observation 2
- The user prior at time t+1 is a mixture of the user's short-term and long-term interests
How do we get a prior that captures both long-term and short-term interest?
job Career Business Assistant Hiring Part-time Receptionist
Car Altima Accord Blue Book Kelley Prices Small Speed
Bank Online Credit Card debt portfolio Finance Chase
Recipe Chocolate Pizza Food Chicken Milk Butter Powder
![Page 47: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/47.jpg)
All
week
month
Time t t+1
Food Chicken pizza
recipe job hiring
Part-time Opening salary
food chicken Pizza mileage
Kelly recipe cuisine
Diet Cars Job Finance
Prior for user actions at time t
μ
μ2
μ3
Long-term    short-term
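One way to read this slide is that the prior over topics at time t is a weighted sum of the user's topic counts over three history windows (week, month, all). The sketch below illustrates that reading only. The function name, the pairing of a weight with each window, and the numbers are hypothetical, since the slide shows the weights μ, μ2, μ3 without spelling out how they are applied.

```python
def user_prior(week_counts, month_counts, all_counts, w_week, w_month, w_all):
    """Combine per-window topic counts into one prior weight per topic."""
    topics = sorted(set(week_counts) | set(month_counts) | set(all_counts))
    return {
        k: w_week * week_counts.get(k, 0)
        + w_month * month_counts.get(k, 0)
        + w_all * all_counts.get(k, 0)
        for k in topics
    }


# Toy counts: the user looked at cars this week, jobs this month, diet long ago.
prior = user_prior(
    week_counts={"Cars": 4},
    month_counts={"Cars": 4, "Job": 8},
    all_counts={"Cars": 8, "Job": 8, "Diet": 16},
    w_week=0.5,
    w_month=0.25,
    w_all=0.125,
)
```

With a larger weight on the short window, recent interests dominate the prior while long-standing interests still contribute a floor.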
![Page 48: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/48.jpg)
Food Chicken Pizza mileage
Car speed offer camry accord career
At time t    At time t+1
Generative Process
• For each user interaction
  • Choose an intent from the local distribution
    • Sample a word from the topic's word-distribution
  • Choose a new intent ∝ λ
    • Sample a new intent from the global distribution
    • Sample a word from the new topic's word-distribution
short-term priors
![Page 49: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/49.jpg)
Polya-Urn RCRF Process
?
![Page 50: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/50.jpg)
Simplified Graphical Model
At time t    At time t+1
![Page 54: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/54.jpg)
Simplified Graphical Model
Topics evolve over time?
User's intent evolves over time?
Capture long-term and short-term interests of users?
![Page 55: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/55.jpg)
User 1 process
User 2 process
User 3 process
Global process
At time t    At time t+1    At time t+2    At time t+3
m m'
n n'
![Page 56: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/56.jpg)
4.2 Online Distributed Inference
Work Flow
![Page 57: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/57.jpg)
Work Flow
today
Current users' models
System state
New users' models
User interactions (hundreds of millions)
Daily update (inference)
![Page 58: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/58.jpg)
Online Scalable Inference
• Online algorithm
  – Greedy 1-particle filtering algorithm
  – Works well in practice
  – Collapse all multinomials except Ωt
    • This makes distributed inference easier
  – At each time t:
• Distributed scalable implementation
  – Used the architecture from the first part as a subroutine
  – Added synchronous sampling capabilities
![Page 59: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/59.jpg)
Distributed Inference (at time t)
Ω
φ
θ
z
w
![Page 60: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/60.jpg)
client client
Distributed Inference (at time t)
θ
z
w
θ
z
w
Ω
φ
Collapse all multinomials
![Page 61: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/61.jpg)
After collapsing
Food Chicken Pizza mileage
client client
Car speed offer camry accord career
Use Star-Synchronization
![Page 62: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/62.jpg)
Fully Collapsed
Food Chicken Pizza mileage
Shared memory
client client
Car speed offer camry accord career
![Page 63: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/63.jpg)
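Distributed Inference (at time t)
Food Chicken Pizza mileage
client client
Car speed offer camry accord career
$$
P(z^t_{ij} = k \mid w^t_{ij} = w, \Omega^t, n^t_i) \propto
\left( n^{t,-j}_{ik} + \tilde{n}^t_{ik} + \lambda\, \frac{m^t_k + \tilde{m}^t_k + \frac{\alpha}{K}}{\sum_l \left( m^t_l + \tilde{m}^t_l + \frac{\alpha}{K} \right)} \right)
\frac{n^{t,-j}_{kw} + \tilde{\beta}^t_{kw} + \beta}{\sum_l \left( n^{t,-j}_{kl} + \tilde{\beta}^t_{kl} + \beta \right)}
$$
Local trend: $n^{t,-j}_{ik} + \tilde{n}^t_{ik}$. Global trend: the fraction multiplied by $\lambda$. Topic factor: the final fraction. A tilde marks a pseudo-count carried over from earlier epochs (written m', n' on earlier slides).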
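The three labeled factors (local trend, global trend, topic factor) can be combined into a normalized distribution over topics for a single token. The sketch below is written under assumptions: the function and argument names are hypothetical, all counts are taken to already exclude the token being resampled, and the `*_prior` arguments stand for the decayed pseudo-counts from earlier epochs.

```python
def collapsed_topic_probs(w, n_i, n_i_prior, m, m_prior, n_kw, beta_prior,
                          lam, alpha, beta):
    """Return P(z = k | ...) for k = 0..K-1, for one token with word index w."""
    K = len(m)
    global_norm = sum(m[l] + m_prior[l] + alpha / K for l in range(K))
    scores = []
    for k in range(K):
        local_trend = n_i[k] + n_i_prior[k]
        global_trend = (m[k] + m_prior[k] + alpha / K) / global_norm
        topic_norm = sum(
            n_kw[k][l] + beta_prior[k][l] + beta for l in range(len(n_kw[k]))
        )
        topic_factor = (n_kw[k][w] + beta_prior[k][w] + beta) / topic_norm
        scores.append((local_trend + lam * global_trend) * topic_factor)
    total = sum(scores)
    return [s / total for s in scores]


# Two topics, two words; the user has only used topic 0 so far.
probs = collapsed_topic_probs(
    w=0,
    n_i=[5, 0], n_i_prior=[0.0, 0.0],
    m=[2, 2], m_prior=[1.0, 1.0],
    n_kw=[[3, 1], [3, 1]], beta_prior=[[0.0, 0.0], [0.0, 0.0]],
    lam=1.0, alpha=1.0, beta=0.1,
)
```

Note that the global trend term needs the counts m of every user, which is why this fully collapsed form relies on shared, synchronized statistics.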
![Page 65: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/65.jpg)
Semi-Collapsed
Food Chicken Pizza mileage
Shared memory
client client
Ωt Ωt Ωt
Car speed offer camry accord career
![Page 66: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/66.jpg)
Semi-Collapsed
Food Chicken Pizza mileage
client client
Ωt Ωt
Car speed offer camry accord career
$$
P(z^t_{ij} = k \mid w^t_{ij} = w, \Omega^t, n^t_i) \propto
\left( n^{t,-j}_{ik} + \tilde{n}^t_{ik} + \lambda\, \Omega^t_k \right)
\frac{n^{t,-j}_{kw} + \tilde{\beta}^t_{kw} + \beta}{\sum_l \left( n^{t,-j}_{kl} + \tilde{\beta}^t_{kl} + \beta \right)}
$$
![Page 67: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/67.jpg)
Distributed Sampling Cycle
(four clients shown side by side)
1. Every client: sample Z for its users
2. Barrier
3. Every client: write counts to memcached
4. One client: collect counts and sample Ω; the other clients do nothing
5. Barrier
6. Every client: read Ω from memcached
Sampling Ωt requires a reduction step
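The cycle can be imitated on one machine with threads and `threading.Barrier`. This is a toy sketch under several assumptions: a plain dictionary stands in for memcached, "sample Z" is replaced by random topic counts, Ω is drawn from a Dirichlet over the summed counts, and each client reaches the first barrier only after writing its counts so that the collecting client sees all of them. All names are hypothetical.

```python
import random
import threading


def run_sampling_cycle(num_clients=4, num_topics=3, seed=0):
    """Run one cycle and return the Omega each client read back."""
    store = {}                               # stands in for memcached
    store_lock = threading.Lock()
    barrier = threading.Barrier(num_clients)
    seen_omega = [None] * num_clients

    def client(cid):
        rng = random.Random(seed + cid)
        # Placeholder for "sample Z for users": tally random topic assignments.
        counts = [0] * num_topics
        for _ in range(100):
            counts[rng.randrange(num_topics)] += 1
        with store_lock:
            store[("counts", cid)] = counts  # write counts
        barrier.wait()
        if cid == 0:
            # Reduction step: collect all counts, then sample Omega ~ Dirichlet.
            with store_lock:
                totals = [
                    sum(store[("counts", c)][k] for c in range(num_clients))
                    for k in range(num_topics)
                ]
            draws = [rng.gammavariate(t + 1.0, 1.0) for t in totals]
            norm = sum(draws)
            with store_lock:
                store["omega"] = [d / norm for d in draws]
        # The other clients do nothing until Omega has been written.
        barrier.wait()
        with store_lock:
            seen_omega[cid] = list(store["omega"])  # read Omega

    threads = [threading.Thread(target=client, args=(c,)) for c in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_omega


omegas = run_sampling_cycle()
```

The two barriers are what make the sampling of the shared variable synchronous: no client continues with a stale Ω.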
![Page 69: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/69.jpg)
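Sampling Ω
• Introduce auxiliary variable $m^t_k$
  – How many times the global distribution was visited
  – $m^t_k$ ~ Antoniak
Ω
m
$$
P(z^t_{ij} = k \mid w^t_{ij} = w, \Omega^t, n^t_i) \propto
\left( n^{t,-j}_{ik} + \tilde{n}^t_{ik} + \lambda\, \Omega^t_k \right)
\frac{n^{t,-j}_{kw} + \tilde{\beta}^t_{kw} + \beta}{\sum_l \left( n^{t,-j}_{kl} + \tilde{\beta}^t_{kl} + \beta \right)}
$$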
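A draw from the Antoniak distribution can be simulated as the number of new tables opened when n customers are seated by a Chinese restaurant process with a given concentration: customer i (counting from 0) opens a table with probability concentration / (concentration + i). The sketch below uses that construction. The function name is hypothetical, and reading n as the number of times a user chose topic k and the concentration as λ times the k-th entry of Ω is an assumption about how it would be applied here.

```python
import random


def sample_num_tables(n, concentration, rng):
    """Sample how many of n draws were fresh visits to the parent distribution."""
    m = 0
    for i in range(n):
        # The first draw (i = 0) is always a fresh visit.
        if rng.random() < concentration / (concentration + i):
            m += 1
    return m


rng = random.Random(0)
m_k = sample_num_tables(n=50, concentration=0.5, rng=rng)
```

Summing such draws over users gives per-topic visit counts, from which Ω could then be resampled.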
![Page 71: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/71.jpg)
4.2 Online Distributed Inference
Behavioral Targeting
![Page 72: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/72.jpg)
Experimental Results
• Task is predicting conversion in display advertising
• Use two datasets
  – 6 weeks of user history
  – Last week's responses to ads are used for testing
• Baselines:
  – Users' raw data as features
  – Static topic model
![Page 73: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/73.jpg)
Interpretability User-‐1 User-‐2
![Page 74: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/74.jpg)
Performance in Display Advertising
ROC
Number of conversions
![Page 75: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/75.jpg)
Performance in Display Advertising
Weighted ROC measure
Static batch models
Effect of number of topics
![Page 76: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/76.jpg)
How Does It Scale?
2 billion instances with a 5M vocabulary using 1000 machines
One iteration took ~3.8 minutes
![Page 77: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/77.jpg)
Distributed Inference Revisited
![Page 78: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/78.jpg)
To collapse or not to collapse?
• Not collapsing
  – Keeps conditional independence
    • Good for parallelization
    • Requires synchronous sampling
  – Might mix slowly
• Collapsing
  – Mixes faster
  – Hinders parallelism
  – Use star-synchronization
    • Works well if siblings depend on each other via aggregates
    • Requires asynchronous communication
![Page 79: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/79.jpg)
Inference Primitive
• Collapse a variable
  – Star synchronization for the sufficient statistics
• Sampling a variable
  – Local
    • Sample it locally (possibly using the synchronized statistics)
  – Shared
    • Synchronous sampling using a barrier
• Optimizing a variable
  – Same as in the shared variable case
  – Ex. conditional topic models
![Page 80: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/80.jpg)
Online Models
• Batch large-scale
  – Covered in part 1
• Mini-batches
  – We already have a model
  – Data arrives in batches
  – We would like to keep the model up to date
• Time-sensitive
  – Data arrives one item at a time
  – Model should be up to date
![Page 81: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/81.jpg)
What Is Coming?
• Inference
  – Online Distributed Sampling
  – Single machine multi-threaded inference
  – Online EM and Submodular Selection
• Applications
  – User tracking for behavioral targeting
  – Content understanding
  – User modeling for content recommendation
![Page 82: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/82.jpg)
4.2 Scalable SMC Inference
Storylines
![Page 83: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/83.jpg)
News Stream
![Page 84: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/84.jpg)
News Stream
• Realtime news stream
  – Multiple sources (Reuters, AP, CNN, ...)
  – Same story from multiple sources
  – Stories are related
• Goals
  – Aggregate articles into a storyline
  – Analyze the storyline (topics, entities)
    • How does the story develop over time?
    • Who are the main entities?
    • What topics are addressed?
![Page 85: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/85.jpg)
A Unified Model
• Jointly solves the three main tasks
  – Clustering
  – Classification
  – Analysis
• Building blocks
  – A topic model
    • High-level concepts (unsupervised classification)
  – Dynamic clustering (RCRP)
    • Discover tightly-focused concepts
  – Named entities
  – Story developments
![Page 86: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/86.jpg)
Infinite Dynamic Cluster-Topic Hybrid
• Sports: games, won, team, final, season, league, held
• Politics: government, minister, authorities, opposition, officials, leaders, group
• Unrest: police, attack, run, man, group, arrested, move
• UEFA-soccer
  – Words: champions, goal, coach, striker, midfield, penalty
  – Entities: Juventus, AC Milan, Lazio, Ronaldo, Lyon
• Tax-Bill
  – Words: tax, billion, cut, plan, budget, economy
  – Entities: Bush, Senate, Fleischer, White House, Republican
![Page 87: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/87.jpg)
Infinite Dynamic Cluster-Topic Hybrid
• Sports: games, won, team, final, season, league, held
• Politics: government, minister, authorities, opposition, officials, leaders, group
• Unrest: police, attack, run, man, group, arrested, move
• UEFA-soccer
  – Words: champions, goal, coach, striker, midfield, penalty
  – Entities: Juventus, AC Milan, Lazio, Ronaldo, Lyon
• Tax-Bill
  – Words: tax, billion, cut, plan, budget, economy
  – Entities: Bush, Senate, Fleischer, White House, Republican
• Border-Tension
  – Words: nuclear, border, dialogue, diplomatic, militant, insurgency, missile
  – Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee
![Page 88: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/88.jpg)
The Graphical Model
• Topic model
• Topics per cluster
• RCRP for cluster
• Hierarchical DP for article
• Separate model for named entities
• Story specific correction
![Page 89: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/89.jpg)
4.2 Fast SMC Inference
Inference via SMC
![Page 90: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/90.jpg)
Online Inference Algorithm
• A particle filtering algorithm
• Each particle maintains a hypothesis
  – What are the stories
  – Document-story associations
  – Topic-word distributions
• Collapsed sampling
  – Sample (z_d, s_d) only for each document
![Page 91: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/91.jpg)
Particle Filter Representation
![Page 92: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/92.jpg)
Fold the document into the structure of each filter
- s and z are tightly coupled
- Alternatives
  - Sample s then sample z (high variance)
[Figure: document t_d with its words w and entities, topic assignments z, and story indicator s]
![Page 93: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/93.jpg)
Fold the document into the structure of each filter
- s and z are tightly coupled
- Alternatives
  - Sample s then sample z (high variance)
  - Sample z then sample s (doesn't make sense)
[Figure: document t_d with its words w and entities, topic assignments z, and story indicator s]
![Page 94: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/94.jpg)
Fold the document into the structure of each filter
- s and z are tightly coupled
- Alternatives
  - Sample s then sample z (high variance)
  - Sample z then sample s (doesn't make sense)
- Idea
  - Run a few iterations of MCMC over s and z
  - Take the last sample as the proposed value
![Page 95: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/95.jpg)
How good does each filter look now?
![Page 96: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/96.jpg)
Get rid of bad filters, replicate good ones
![Page 97: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/97.jpg)
Get rid of bad filters, replicate good ones
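The "drop bad filters, replicate good ones" step is the resampling step of a particle filter. The sketch below shows one common variant, multinomial resampling triggered by a low effective sample size; the slides do not say which resampling scheme or threshold was used, so both are assumptions, as are the function names.

```python
import random


def effective_sample_size(weights):
    """1 / sum of squared normalized weights; equals n for uniform weights."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def resample(particles, weights, rng):
    """Draw n particles with probability proportional to weight, reset weights."""
    n = len(particles)
    picks = rng.choices(range(n), weights=weights, k=n)
    return [particles[i] for i in picks], [1.0 / n] * n


rng = random.Random(0)
particles = ["A", "B", "C", "D"]          # stand-ins for full particle states
weights = [0.97, 0.01, 0.01, 0.01]        # one filter explains the data far better

# Assumed rule: resample when fewer than half the particles are effective.
if effective_sample_size(weights) < len(particles) / 2:
    particles, weights = resample(particles, weights, rng)
print(particles, weights)
```

Replicated entries here are just repeated references; with real particle states each copy must be able to diverge, which is why replication cost and storage matter.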
![Page 98: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/98.jpg)
Efficient Computation and Storage
• Particles get replicated
[Figure: particles 1, 2, 3 with separate states State P1, State P2, State P3]
![Page 99: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/99.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree
[Figure: particles 1, 2, 3 with separate states State P1, State P2, State P3]
![Page 100: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/100.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree [extends Canini et al. 2009]
[Figure: inheritance tree with a single particle 1 and its state]
![Page 101: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/101.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree [extends Canini et al. 2009]
[Figure: inheritance tree with a root state and particles 1 and 2, each with its own state]
![Page 102: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/102.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree [extends Canini et al. 2009]
[Figure: inheritance tree with a root state and particles 1, 2, and 3, each with its own state]
![Page 103: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/103.jpg)
Efficient Computation and Storage
• Particles get replicated
  – Use thread-safe inheritance tree [extends Canini et al. 2009]
  – Inverted representation for fast lookup
[Figure: inheritance tree for particles 1, 2, and 3 whose nodes store inverted entity-to-story maps: "India: story 5, Pakistan: story 1"; "(empty)"; "Congress: story 2"; "Bush: story 2, India: story 3"; "India: story 1, US: story 2, story 3"]
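A minimal sketch of the inheritance-tree idea with an inverted entity-to-story index follows. It is a simplification under stated assumptions: the names `Node` and `replicate` are hypothetical, only additions are supported (removals would need extra bookkeeping), and no locking is shown, so it is not the thread-safe structure the slide refers to.

```python
class Node:
    """One node of the inheritance tree; stores only its own changes."""

    def __init__(self, parent=None):
        self.parent = parent
        self.index = {}  # entity -> set of story ids added at this node only

    def add(self, entity, story):
        self.index.setdefault(entity, set()).add(story)

    def lookup(self, entity):
        """Union of the stories recorded here and in every ancestor."""
        stories = set()
        node = self
        while node is not None:
            stories |= node.index.get(entity, set())
            node = node.parent
        return stories


def replicate(node):
    """Freeze `node` as a shared parent and return two fresh children."""
    return Node(parent=node), Node(parent=node)


root = Node()
root.add("India", 1)
p1, p2 = replicate(root)      # both particles inherit "India: story 1"
p1.add("India", 5)            # visible to p1 only
p2.add("Congress", 2)         # visible to p2 only
print(sorted(p1.lookup("India")))
print(sorted(p2.lookup("India")))
```

Replicating a particle then costs a constant amount of work regardless of state size, and the inverted index lets a new document touch only the stories that share an entity with it.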
![Page 104: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/104.jpg)
Efficient Computation and Storage
• Why is this useful?
• Only focus on stories that mention at least one entity
  – Otherwise pre-compute and reuse
• We can use fast samplers for z as well [Yao et al., KDD 09]
![Page 105: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/105.jpg)
Experiments
• Yahoo! News datasets over two months
  – Three sub-sampled sets with different characteristics
• Editorially labeled documents
  – Cannot-link and must-link pairs
• Performance measured using clustering accuracy
• Baseline
  – A strong offline correlation clustering algorithm [WSDM 11]
    • Scaled with LSH to compute the neighborhood graph (similar to Petrovic 2010)
![Page 106: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/106.jpg)
Structured Browsing
• Sports: games, won, team, final, season, league, held
• Politics: government, minister, authorities, opposition, officials, leaders, group
• Unrest: police, attack, run, man, group, arrested, move
• Border-Tension
  – Words: nuclear, border, dialogue, diplomatic, militant, insurgency, missile
  – Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee
• UEFA-soccer
  – Words: champions, goal, leg, coach, striker, midfield, penalty
  – Entities: Juventus, AC Milan, Real Madrid, Milan, Lazio, Ronaldo, Lyon
• Tax-bills
  – Words: tax, billion, cut, plan, budget, economy, lawmakers
  – Entities: Bush, Senate, US, Congress, Fleischer, White House, Republican
![Page 107: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/107.jpg)
Structured Browsing
More like the India-Pakistan story
• Middle-east-conflict
  – Words: peace, roadmap, suicide, violence, settlements, bombing
  – Entities: Israel, Palestinian, West Bank, Sharon, Hamas, Arafat
Based on topics
• Nuclear programs
  – Words: nuclear, summit, warning, policy, missile, program
  – Entities: North Korea, South Korea, U.S., Bush, Pyongyang
Nuclear + topics [politics]
• Border-Tension
  – Words: nuclear, border, dialogue, diplomatic, militant, insurgency, missile
  – Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee
![Page 108: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/108.jpg)
Structured Browsing
• Sports: games, won, team, final, season, league, held
• Politics: government, minister, authorities, opposition, officials, leaders, group
• Unrest: police, attack, run, man, group, arrested, move
• Border-Tension
  – Words: nuclear, border, dialogue, diplomatic, militant, insurgency, missile
  – Entities: Pakistan, India, Kashmir, New Delhi, Islamabad, Musharraf, Vajpayee
• UEFA-soccer
  – Words: champions, goal, leg, coach, striker, midfield, penalty
  – Entities: Juventus, AC Milan, Real Madrid, Milan, Lazio, Ronaldo, Lyon
• Tax-bills
  – Words: tax, billion, cut, plan, budget, economy, lawmakers
  – Entities: Bush, Senate, US, Congress, Fleischer, White House, Republican
More on personalization later in the talk
![Page 109: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/109.jpg)
Quantitative Evaluation
Number of topics = 100
Effect of number of topics
![Page 110: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/110.jpg)
Scalability
![Page 111: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/111.jpg)
Model Contribution
• Named entities are very important
• Removing time increases processing time to up to 2 seconds per document
![Page 112: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/112.jpg)
Putting Things Together
![Page 113: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/113.jpg)
Time vs. Machines
• Data arrives dynamically
• How to keep models up to date?
| | Batch | Mini-batches | Truly online |
|---|---|---|---|
| Single-machine | Gibbs, Variational | Online-LDA | SMC |
| Multi-machine | Star-Synch. | Star-Synch + synchronous step | ? |
![Page 114: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/114.jpg)
4.3 User Preference
Online EM and Submodularity
![Page 115: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/115.jpg)
Storyline Summarization
• How to summarize a storyline with few articles?
[Figure: storyline over time with threads Earthquake & Tsunami, Nuclear Power, Rescue & Relief, Economy]
![Page 116: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/116.jpg)
Storyline Summarization
• How to summarize a storyline with few articles?
• How to personalize the summary?
[Figure: storyline over time with threads Earthquake & Tsunami, Nuclear Power, Rescue & Relief, Economy]
![Page 117: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/117.jpg)
Storyline Summarization
• How to summarize a storyline with few articles?
• How to personalize the summary?
[Figure: storyline over time with threads Earthquake & Tsunami, Nuclear Power, Rescue & Relief, Economy]
![Page 118: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/118.jpg)
User Interaction
• Passive
  – We observe the user-generated content
  – Model the user based on that content using unsupervised techniques
• Explicit
  – We present users with content
  – Users give explicit feedback
    • Like/dislike
  – Learn user preferences using supervised techniques
• Implicit
  – Mixture between the two
  – Present the user with items
  – Observe which items the user interacts with
  – Learn user preferences using semi-supervised models
![Page 119: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/119.jpg)
User Satisfaction
• Modular
  – Present users with items they prefer
    • Regardless of the context
  – Targets relevance
  – Ex: vector space models
• Submodular
  – More of the same thing is not always better
    • Diminishing returns
  – Targets diversity
  – Ex: TDN [El-Arini et al., KDD 09]
![Page 120: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/120.jpg)
Sequential Click-View Model
Modeling views based on position:
$$p(v_i = 1 \mid v_{i-1} = 1, c_{i-1} = 1) = \frac{1}{1 + \exp(-\alpha_i)}$$
$$p(v_i = 1 \mid v_{i-1} = 1, c_{i-1} = 0) = \frac{1}{1 + \exp(-\beta_i)}$$
![Page 121: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/121.jpg)
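Sequential Click-View Model
Modeling clicks using position and information gain:
$$p(c_i = 1 \mid v_i = 1, c_1, \ldots, c_{i-1}) = \frac{1}{1 + \exp\left(-\gamma_i - \sum_{j=1}^{i-1} c_j - \rho(A_i) + \rho(A_{i-1})\right)}$$
• Threshold: $\gamma_i$
• Information gain: $\rho(A_i) - \rho(A_{i-1})$
• #clicks: $\sum_{j=1}^{i-1} c_j$
• The coverage function's weights are learnt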
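The view and click probabilities of the sequential model are both logistic functions, which the sketch below evaluates directly. The parameter names `alpha_i`, `beta_i`, and `gamma_i` are stand-ins for the per-position parameters (some symbols are illegible in the extracted slide), and `rho_now` / `rho_prev` denote the coverage values ρ(A_i) and ρ(A_{i-1}); all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_view(alpha_i, beta_i, prev_clicked):
    """P(v_i = 1 | v_{i-1} = 1, c_{i-1}): one parameter per case of c_{i-1}."""
    return sigmoid(alpha_i if prev_clicked else beta_i)


def p_click(gamma_i, prev_clicks, rho_now, rho_prev):
    """P(c_i = 1 | v_i = 1, earlier clicks): threshold + #clicks + information gain."""
    return sigmoid(gamma_i + sum(prev_clicks) + (rho_now - rho_prev))


# An article that adds new coverage is more likely to be clicked than one
# that adds none, all else equal.
novel = p_click(gamma_i=-1.0, prev_clicks=[1, 0], rho_now=2.0, rho_prev=1.5)
redundant = p_click(gamma_i=-1.0, prev_clicks=[1, 0], rho_now=1.5, rho_prev=1.5)
print(novel > redundant)
```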
![Page 122: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/122.jpg)
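Sequential Click-View Model
$$\rho(D \mid S) := \sum_{s \in S} \sum_{j} [s]_j \Big( a_j \sum_{d \in D} [d]_j + b_j \rho_j(D) \Big)$$
• $D$: selected summary
• $S$: story
• $j$: features
• $a_j \sum_{d \in D} [d]_j$: modular
• $b_j \rho_j(D)$: submodular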
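The coverage function combines a modular term and a submodular term per feature. The sketch below computes it for dense feature vectors and shows the diminishing-returns behaviour of the submodular part. The slide does not define ρ_j, so the concave choice `log(1 + sum of feature j)` used here is purely an assumption for illustration, as are the function names.

```python
import math


def log_coverage(j, docs):
    # Assumed per-feature coverage rho_j: concave in the total mass of feature j.
    return math.log1p(sum(d[j] for d in docs))


def rho(docs, stories, a, b, rho_j=log_coverage):
    """rho(D|S) = sum_s sum_j [s]_j * (a_j * sum_d [d]_j + b_j * rho_j(D))."""
    total = 0.0
    for s in stories:
        for j in range(len(a)):
            modular = a[j] * sum(d[j] for d in docs)
            submodular = b[j] * rho_j(j, docs)
            total += s[j] * (modular + submodular)
    return total


story = [1.0, 1.0]
doc = [1.0, 0.0]

# Submodular weights only: the second copy of the same article adds less.
a, b = [0.0, 0.0], [1.0, 1.0]
gain_first = rho([doc], [story], a, b) - rho([], [story], a, b)
gain_second = rho([doc, doc], [story], a, b) - rho([doc], [story], a, b)

# Modular weights only: every copy adds the same amount.
a_mod, b_mod = [1.0, 1.0], [0.0, 0.0]
mod_first = rho([doc], [story], a_mod, b_mod) - rho([], [story], a_mod, b_mod)
mod_second = rho([doc, doc], [story], a_mod, b_mod) - rho([doc], [story], a_mod, b_mod)
print(gain_first, gain_second, mod_first, mod_second)
```

Learning the weights a_j and b_j then amounts to learning how much a user values relevance versus diversity for each feature.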
![Page 123: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/123.jpg)
Online Inference
• Treat missing views as hidden variables
  – Realistic interaction model
• Use the online EM algorithm
  – Infer the value of hidden variables
• Optimize parameters using SGD
  – Use additive weights
    • Background + story + category + user
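The additive-weights idea can be sketched as a logistic click model whose weight vector is the sum of background, user, story, and category components, updated by stochastic gradient descent. This covers only the SGD step with fully observed clicks; the E-step that infers hidden views is omitted, and the L2 regularizer, learning rate, and class name `AdditiveClickModel` are assumptions rather than details from the slides.

```python
import math
from collections import defaultdict


class AdditiveClickModel:
    def __init__(self, dim, lr=0.1, lam=0.01):
        self.dim, self.lr, self.lam = dim, lr, lam
        self.background = [0.0] * dim
        self.user = defaultdict(lambda: [0.0] * dim)
        self.story = defaultdict(lambda: [0.0] * dim)
        self.category = defaultdict(lambda: [0.0] * dim)

    def _parts(self, u, s, c):
        return [self.background, self.user[u], self.story[s], self.category[c]]

    def predict(self, x, u, s, c):
        parts = self._parts(u, s, c)
        score = sum(x[j] * sum(p[j] for p in parts) for j in range(self.dim))
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, click, u, s, c):
        # Gradient of -log p(click) for a logistic model is (p - click) * x;
        # the same gradient flows into every additive component.
        err = self.predict(x, u, s, c) - click
        for part in self._parts(u, s, c):
            for j in range(self.dim):
                part[j] -= self.lr * (err * x[j] + self.lam * part[j])


model = AdditiveClickModel(dim=2)
for _ in range(20):
    model.update([1.0, 0.0], 1, "alice", "quake", "world")

p_alice = model.predict([1.0, 0.0], "alice", "quake", "world")
p_bob = model.predict([1.0, 0.0], "bob", "quake", "world")
print(p_alice, p_bob)
```

Because the background, story, and category components are shared, a user with no history still gets a sensible prediction, while the user-specific component captures individual preference.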
![Page 124: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/124.jpg)
Online Inference
$$\theta^{*} = \arg\min_{\theta} \sum_{(c,d)} -\log p(c \mid \theta, d) + \lambda \Omega(\theta)$$
$$\theta = \theta_0 + \theta_u + \theta_s + \theta_c$$
![Page 125: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/125.jpg)
How Does it Work?
![Page 126: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/126.jpg)
How Does It Work?
![Page 127: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/127.jpg)
5. Summary and Future Directions
![Page 128: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/128.jpg)
Summary
• Tools
  – Load distribution, balancing and synchronization
  – Clustering, topic models
• Models
  – Dynamic non-parametric models
  – Sequential latent variable models
• Inference algorithms
  – Distributed batch
  – Sequential Monte Carlo
• Applications
  – User profiling
  – News content analysis & recommendation
![Page 129: Graphical Models for the Internet (Part2)](https://reader034.vdocuments.site/reader034/viewer/2022052619/5550b5aab4c90504628b4b9b/html5/thumbnails/129.jpg)
Future Directions
• Theoretical bounds and guarantees
• Network data
  – Graph partitioning
• Non-parametric models
  – Learning structure from data
• Working under communication constraints
• Data distribution for particle filters