Statistical Models for Web/Sponsored Search Click Log Analysis
The Chinese University of Hong Kong
1
Some slides are adapted from Mr Guo Fan’s tutorial at CIKM 2009.
Index
• Background.
• A Simple Click Model.
  – Dependent click model [WSDM09].
• Advanced Design.
  – Five extension directions.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model (BBM) [Liu09].
  – Click chain model (CCM) [Guo09].
• Course Project.
2
Scenario: Web Search
3
User Click Log
4
[Figure: clicks received by the results at positions 1 to 5 for one query: 36, 23, 18, 11 and 36.]
Eye-tracking User Study
• Users are biased toward examining the top results.
5
Position-bias Identification
6
• Higher positions receive more user attention (eye fixation) and clicks than lower positions.
• This holds even in the extreme setting where the order of the results is reversed.
• “Clicks are informative but biased”.
[Figure: two panels, “Normal Position” and “Reversed Impression”, each plotting a percentage per position.] [Joachims07]
Answer to Previous Example
• Result 5 is more relevant than Result 1.
• Both receive 36 clicks, but Result 5 has much less opportunity to be examined.
7
[Figure: the same click counts per position as in the previous example.]
Click Model Motivation
• Model the user’s click behavior in an interpretable way and estimate the pure relevance of a query-document/ad pair free of bias.
  – Position bias is the main problem.
  – Other kinds of bias:
    • Influence among documents/ads.
    • Attractiveness bias.
    • Search intent bias.
    • …
• Intuition for the pure relevance of a query-document/ad pair:
  – If the query were submitted to the search engine and only this single document/ad were shown, what would its click-through rate be?
8
Examination Hypothesis [Richardson07]
• A document must be examined before it can be clicked.
• The probability of a click, conditioned on the document being examined, depends on the pure relevance of the query-document/ad pair.
• The click probability can therefore be decomposed into:
  – A global component: the examination probability, which reflects the position bias.
  – A local component (pure relevance): the click probability of the (query, URL) pair conditioned on being examined.
9
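A minimal sketch of this decomposition, $p(C_i = 1) = p(E_i = 1)\, r_i$. The CTR and the examination probabilities are hypothetical numbers chosen for illustration; in practice the examination probabilities must themselves be estimated by a click model. It shows how two results with the same observed CTR can have very different relevance once the position bias is divided out.

```python
def click_probability(exam_prob: float, relevance: float) -> float:
    """p(C = 1) = p(E = 1) * r: global (examination) times local (relevance)."""
    return exam_prob * relevance


def relevance_from_ctr(ctr: float, exam_prob: float) -> float:
    """Invert the decomposition: r = p(C = 1) / p(E = 1)."""
    if exam_prob <= 0.0:
        raise ValueError("a never-examined position says nothing about relevance")
    return ctr / exam_prob


# Hypothetical numbers: the same observed CTR at a top and at a bottom position.
ctr = 0.36
exam_top, exam_bottom = 0.9, 0.4   # hypothetical examination probabilities
print("top result r =", relevance_from_ctr(ctr, exam_top))
print("bottom result r =", relevance_from_ctr(ctr, exam_bottom))
```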
Click Models
• Key tasks.
  – How should the user’s examination behavior be modeled?
  – How should the relevance of a query-doc/ad pair be estimated?
• Desired properties.
  – Effective: aware of position bias and other biases, and corrects for them properly.
  – Scalable: linear time and space complexity; easy to parallelize.
  – Incremental: the model can easily be updated with new data.
10
From this slide on, “relevance” means “pure relevance”.
Importance of Understanding Logs
• Better matching of queries with documents/ads.
• All participants benefit.
  – Users: more relevant results.
  – Search engines: more revenue from advertisers and more users.
  – Advertisers: a higher return on investment (ROI).
11
[Figure: Advertiser, User and Publisher, linked by a better match.]
Growth of Web Users
12
Growth of Web Revenue
13
Index
• Background.
• A Simple Click Model.
  – Dependent Click Model [WSDM09].
• Advanced Design.
• Advanced Estimation.
• Projects.
14
Notations
– $E_i$: binary random variable for the examination event at position $i$;
– $C_i$: binary random variable for the click event at position $i$;
– $r_i = p(C_i = 1 \mid E_i = 1)$: relevance of the query-document pair at position $i$.
15
Click Model Design
16
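$$
\begin{aligned}
p(E_1 = 1) &= 1 \\
p(C_i = 1 \mid E_i = 0) &= 0 \\
p(C_i = 1 \mid E_i = 1) &= r_i \\
p(E_{i+1} = 1 \mid E_i = 0) &= 0 \\
p(E_{i+1} = 1 \mid C_i = 0, E_i = 1) &= 1 \\
p(E_{i+1} = 1 \mid C_i = 1, E_i = 1) &= \lambda_i
\end{aligned}
$$
Dependent Click Model (DCM) [Guo09]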
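The generative process defined by these equations can be sketched as a small sampler. This is an illustrative Python sketch, not code from the DCM paper; the function name `simulate_dcm_session` and the example parameter values are hypothetical.

```python
import random
from typing import List, Sequence


def simulate_dcm_session(r: Sequence[float], lam: Sequence[float],
                         rng: random.Random) -> List[int]:
    """Sample one click vector from the DCM generative process.

    r[i]   -- p(C_i = 1 | E_i = 1), relevance of the result at position i
    lam[i] -- p(E_{i+1} = 1 | C_i = 1, E_i = 1), continuation after a click
    """
    clicks = [0] * len(r)
    examined = True                          # p(E_1 = 1) = 1
    for i in range(len(r)):
        if not examined:                     # p(E_{i+1} = 1 | E_i = 0) = 0
            break
        if rng.random() < r[i]:              # a click requires examination
            clicks[i] = 1
            examined = rng.random() < lam[i]   # continue with probability lambda_i
        # no click: p(E_{i+1} = 1 | C_i = 0, E_i = 1) = 1, so keep examining
    return clicks


rng = random.Random(0)
for _ in range(3):
    print(simulate_dcm_session([0.5, 0.3, 0.2, 0.1], [0.6, 0.5, 0.4, 0.3], rng))
```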
Parameters in DCM
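• $r = p(C = 1 \mid E = 1)$ is a local parameter.
  – It models the relevance of a query-document/ad pair.
  – The position bias is modeled separately by $p(E = 1)$.
• $\lambda$ is a global parameter (one $\lambda_i$ per position $i$, shared across queries).
  – It models $p(E_{i+1} = 1 \mid C_i = 1, E_i = 1)$, the probability of continuing to examine after a click.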
17
Parameter estimation: maximum log-likelihood.
Estimation of r: Step 1
• Define $l$ as the last clicked position.
• When there is no click, $l$ is the last position.
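Query: cikm

| Pos | URL | Click |
|---|---|---|
| 1 | cikm2008.org | 0 |
| 2 | www.cikm.org | 1 |
| 3 | www.fc.ul.pt/cikm | 0 |
| 4 | cikmconf.org | 0 |
| 5 | www.cikm.com/... | 1 |
| 6 | Ir.iit.edu/cikm2004 | 0 |

Query: cikm

| Pos | URL | Click |
|---|---|---|
| 1 | cikm2008.org | 0 |
| 2 | www.cikm.org | 0 |
| 3 | www.fc.ul.pt/cikm | 0 |
| 4 | cikmconf.org | 0 |
| 5 | www.cikm.com/... | 0 |
| 6 | Ir.iit.edu/cikm2004 | 0 |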
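A small helper that finds $l$ for the two sessions in these tables, using 1-based positions as on the slide. The function name is hypothetical.

```python
from typing import Sequence


def last_click_position(clicks: Sequence[int]) -> int:
    """1-based position l of the last click; the last position if there is no click."""
    for pos in range(len(clicks), 0, -1):
        if clicks[pos - 1]:
            return pos
    return len(clicks)


session_a = [0, 1, 0, 0, 1, 0]   # first table: clicks at positions 2 and 5
session_b = [0, 0, 0, 0, 0, 0]   # second table: no click
print(last_click_position(session_a), last_click_position(session_b))  # 5 6
```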
Estimation of r: Step 2
• Log-likelihood of a query session.
19
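$$
\begin{aligned}
\mathcal{L}_{DCM} &= \sum_{i=1}^{l-1}\Big(C_i(\log r_i + \log \lambda_i) + (1 - C_i)\log(1 - r_i)\Big) \\
&\quad + C_l \log r_l + (1 - C_l)\log(1 - r_l) \\
&\quad + \log\Big(1 - \lambda_l + \lambda_l \prod_{j=l+1}^{M}(1 - r_j)\Big) \\
&\ge \sum_{i=1}^{l}\Big(C_i \log r_i + (1 - C_i)\log(1 - r_i)\Big) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
\end{aligned}
$$
Here $M$ is the number of positions shown. The inequality drops the non-negative term $\lambda_l \prod_{j=l+1}^{M}(1 - r_j)$ inside the last logarithm.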
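The exact per-session log-likelihood and its lower bound can be checked numerically. This sketch uses hypothetical names, 0-based positions, and parameter values chosen only for illustration; it confirms that the exact DCM probabilities over all $2^M$ click vectors sum to 1 and that the bound never exceeds the exact value.

```python
import math
from itertools import product
from typing import Sequence


def dcm_session_log_likelihood(clicks: Sequence[int], r: Sequence[float],
                               lam: Sequence[float], lower_bound: bool = False) -> float:
    """Log-likelihood of one DCM session (exact, or the slide's lower bound).

    Positions are 0-based; l is the last clicked position, or the last
    position when there is no click. r and lam are indexed by position.
    """
    M = len(clicks)
    clicked = [i for i, c in enumerate(clicks) if c]
    l = clicked[-1] if clicked else M - 1
    ll = 0.0
    for i in range(l + 1):                       # positions up to l are examined
        ll += math.log(r[i]) if clicks[i] else math.log(1.0 - r[i])
        if i < l and clicks[i]:
            ll += math.log(lam[i])               # the user continued after this click
    if lower_bound:
        ll += math.log(1.0 - lam[l])             # drop lam_l * prod(1 - r_j) >= 0
    else:
        tail = math.prod(1.0 - r[j] for j in range(l + 1, M))
        ll += math.log(1.0 - lam[l] + lam[l] * tail)
    return ll


r = [0.6, 0.4, 0.3]          # hypothetical relevances
lam = [0.5, 0.7, 0.2]        # hypothetical continuation probabilities
vectors = list(product([0, 1], repeat=len(r)))
total = sum(math.exp(dcm_session_log_likelihood(c, r, lam)) for c in vectors)
print("sum of exact probabilities:", total)
for c in vectors:
    exact = dcm_session_log_likelihood(c, r, lam)
    bound = dcm_session_log_likelihood(c, r, lam, lower_bound=True)
    print(c, round(exact, 4), round(bound, 4))
```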
Estimation of r: Step 3
• By maximizing the lower bound of the log-likelihood, we have
20
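$$
\mathcal{L}_{DCM} \ge \sum_{i=1}^{l}\Big(C_i \log r_i + (1 - C_i)\log(1 - r_i)\Big) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
$$
$$
\frac{\partial \mathcal{L}_{All}}{\partial r} = \frac{M}{r} - \frac{N}{1 - r} = 0
\;\Rightarrow\;
r = \frac{M}{M + N} = \frac{\#\text{click}}{\#\text{impression before or on position } l}
$$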
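Suppose the current pair occurs in many sessions. In M of them it appears before or on position l and is clicked; in N of them it appears before or on position l and is not clicked. $\mathcal{L}_{All}$ sums the lower bound over all sessions, and only these M + N terms depend on this pair’s r. (Here M counts sessions; it is not the number of positions.)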
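Because the estimate is a ratio of counts, it can be computed in one pass over the log. A minimal sketch, assuming each session is a ranked list of `(url, click)` pairs for a single query; the names and the toy sessions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Session = List[Tuple[str, int]]   # (url, click) pairs in ranked order, one query


def estimate_dcm_relevance(sessions: List[Session]) -> Dict[str, float]:
    """r = #click / #impression before or on the last clicked position l."""
    clicks: Dict[str, int] = defaultdict(int)        # M in the derivation
    impressions: Dict[str, int] = defaultdict(int)   # M + N in the derivation
    for session in sessions:
        clicked = [i for i, (_, c) in enumerate(session) if c]
        l = clicked[-1] if clicked else len(session) - 1   # 0-based; last position if no click
        for url, c in session[: l + 1]:              # only positions that were examined
            impressions[url] += 1
            clicks[url] += c
    return {url: clicks[url] / impressions[url] for url in impressions}


sessions = [
    [("a", 0), ("b", 1), ("c", 0)],
    [("b", 0), ("a", 1), ("c", 0)],
    [("a", 0), ("c", 0), ("b", 0)],   # no click: every position counts
    [("c", 1), ("a", 0), ("b", 0)],   # a and b come after l, so they are ignored
]
print(estimate_dcm_relevance(sessions))
```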
Estimation of λ
• For a specific position $i$, maximizing the lower bound of the log-likelihood gives
21
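$$
\mathcal{L}_{DCM} \ge \sum_{i=1}^{l}\Big(C_i \log r_i + (1 - C_i)\log(1 - r_i)\Big) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
$$
$$
\frac{\partial \mathcal{L}_{All}}{\partial \lambda_i} = \frac{B}{\lambda_i} - \frac{C}{1 - \lambda_i} = 0
\;\Rightarrow\;
\lambda_i = \frac{B}{B + C} = 1 - \frac{\#\text{query sessions when last clicked position} = i}{\#\text{query sessions when position } i \text{ is clicked}}
$$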
Suppose there are A sessions in total. In B of them, position i is clicked and the last clicked position l is larger than i. In C of them, the last clicked position l is exactly i. The remaining A - B - C sessions contribute no term involving $\lambda_i$. (Here B and C count sessions; C is not a click variable.)
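The same counting idea gives $\lambda_i$. A sketch, assuming each session is a 0/1 click vector over $M$ positions; positions that are never clicked have no data, so the sketch returns NaN for them. Note that the last position always gets 0, since a click there is necessarily the last click. Names and sessions are hypothetical.

```python
from typing import List, Sequence


def estimate_dcm_lambda(sessions: Sequence[Sequence[int]], M: int) -> List[float]:
    """lambda_i = 1 - #(last click at i) / #(position i clicked), for each position.

    Each session is a 0/1 click vector of length M. A position that is never
    clicked gets float("nan"), since the data say nothing about its lambda.
    """
    clicked_at = [0] * M     # B + C for each position
    last_at = [0] * M        # C for each position
    for clicks in sessions:
        positions = [i for i, c in enumerate(clicks) if c]
        for i in positions:
            clicked_at[i] += 1
        if positions:
            last_at[positions[-1]] += 1
    return [1.0 - last_at[i] / clicked_at[i] if clicked_at[i] else float("nan")
            for i in range(M)]


lam = estimate_dcm_lambda([[1, 0, 1], [1, 0, 0], [0, 1, 1], [1, 1, 0]], M=3)
print(lam)
```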
Property Verification
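• Effective: an impression is counted only when it lies before or on the last clicked position $l$, which DCM guarantees was examined.
• Scalable and incremental: both estimates are ratios of counts, so they need one pass over the log and can be updated by adding the counts from new sessions.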
22
$$
r = \frac{\#\text{click}}{\#\text{impression before or on position } l}
\qquad
\lambda_i = 1 - \frac{\#\text{query sessions when last clicked position} = i}{\#\text{query sessions when position } i \text{ is clicked}}
$$
Evaluation Criteria for DCM
• Log-likelihood.
  – Given the document impressions in the test set.
  – Compute the probability of reproducing the entire click vector.
  – Average the log-likelihood over the query sessions.
23
Experimental Result for DCM
24
Some Other Evaluations
• Log-likelihood.
  – http://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood
• Perplexity.
  – http://en.wikipedia.org/wiki/Perplexity
• Root mean square error (RMSE).
  – http://en.wikipedia.org/wiki/Root-mean-square_deviation
• Area under ROC curve.
  – http://en.wikipedia.org/wiki/Receiver_operating_characteristic
25
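Perplexity is often reported per position in the click-model literature. The sketch below assumes the common definition $2^{-\frac{1}{N}\sum_n \left(C_i^n \log_2 q_i^n + (1 - C_i^n)\log_2(1 - q_i^n)\right)}$, where $q_i^n$ is the model’s predicted click probability at position $i$ in test session $n$; lower is better, and always predicting 0.5 gives 2. The function name and the example predictions are hypothetical.

```python
import math
from typing import List, Sequence


def perplexity_per_position(clicks: Sequence[Sequence[int]],
                            predicted: Sequence[Sequence[float]]) -> List[float]:
    """Click perplexity at each position (predicting 0.5 everywhere gives 2.0).

    clicks[n][i]    -- observed click (0/1) at position i of test session n
    predicted[n][i] -- model's predicted click probability q for that slot
    """
    N, M = len(clicks), len(clicks[0])
    result = []
    for i in range(M):
        total = 0.0
        for n in range(N):
            q = predicted[n][i]
            total += math.log2(q) if clicks[n][i] else math.log2(1.0 - q)
        result.append(2.0 ** (-total / N))
    return result


# Hypothetical predictions for two test sessions with three positions each.
obs = [[1, 0, 0], [0, 1, 0]]
pred = [[0.7, 0.3, 0.1], [0.4, 0.4, 0.2]]
print([round(p, 3) for p in perplexity_per_position(obs, pred)])
```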
Index
• Background.
• A Simple Click Model.
• Advanced Design.
  – Five extension directions.
• Advanced Estimation.
• Project.
26
1 Dependency from Previous Docs/Ads
• For position 4, do the following two cases have the same chance of being examined?
• Intuitively, the left (first) case has a lower chance, since the user may have found the desired URL at position 2 and ended the session.
27
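Query: cikm

| Pos | URL | Click |
|---|---|---|
| 1 | cikm2008.org | 0 |
| 2 | www.cikm.org | 1 |
| 3 | www.fc.ul.pt/cikm | 0 |
| 4 | cikmconf.org | 0 |
| 5 | www.cikm.com/... | 0 |
| 6 | Ir.iit.edu/cikm2004 | 0 |

Query: cikm

| Pos | URL | Click |
|---|---|---|
| 1 | cikm2008.org | 0 |
| 2 | www.cikm.org | 0 |
| 3 | www.fc.ul.pt/cikm | 0 |
| 4 | cikmconf.org | 1 |
| 5 | www.cikm.com/... | 0 |
| 6 | Ir.iit.edu/cikm2004 | 0 |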
Solution: Click Chain Model [Guo09]
• The chance of being examined depends on the relevance of the previous documents/ads.
• Other similar work includes [Dupret08][Liu09].
28
2 Perceived vs. Actual Relevance
• After a doc/ad is clicked, its actual relevance, judged from the landing page, may differ from the user’s perceived relevance.
29
[Figure: query “Pizza” with Ad1 and Ad2, shown before and after examination.]
Solution: Dynamic Bayesian Network [Chapelle09]
• For each ad, two kinds of relevance are defined: perceived relevance $r$ and actual relevance $s$. The actual relevance $s$ influences the examination probability of the subsequent docs/ads.
30
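A sketch of how $s$ affects later positions, assuming the standard DBN formulation in which a user who clicks is satisfied with probability $s$ and stops, and otherwise examines the next result with probability $\gamma$. The continuation parameter $\gamma$ does not appear on this slide; it, the function name and the example values are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple


def dbn_click_probabilities(r: Sequence[float], s: Sequence[float],
                            gamma: float) -> Tuple[List[float], List[float]]:
    """Examination and click probability at each position (DBN-style sketch).

    r[i]  -- perceived relevance: p(C_i = 1 | E_i = 1)
    s[i]  -- actual relevance: p(satisfied | clicked at position i)
    gamma -- assumed probability of examining the next result when not satisfied
    """
    exam: List[float] = []
    click: List[float] = []
    p_exam = 1.0                          # the top result is always examined
    for ri, si in zip(r, s):
        exam.append(p_exam)
        click.append(p_exam * ri)
        # Stop if clicked and satisfied (prob. ri * si); otherwise continue with gamma.
        p_exam *= gamma * (1.0 - ri * si)
    return exam, click


# Same perceived relevance, different actual relevance of the first ad.
print(dbn_click_probabilities([0.5, 0.5, 0.5], [0.9, 0.5, 0.5], gamma=0.9))
print(dbn_click_probabilities([0.5, 0.5, 0.5], [0.1, 0.5, 0.5], gamma=0.9))
```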
3 Aggregate vs. Instance Relevance
• Users may have different intents for the same query.
• Click events can indicate the intent.
31
[Figure: query “Canon” with Ad1 and Ad2 under two intents. Aggregate search, e.g., learning the parameters; instance search, e.g., buying a camera.]
Solution: Joint Relevance Examination Model [Srikant10]
• Add a correction factor for each position $i$, determined by the click events of the other docs/ads.
• Other similar work includes [Hu11].
4 Competing Influence in Docs/Ads
• When it appears alongside a highly relevant doc/ad, the perceived relevance of the current doc/ad decreases.
33
Solution: Temporal Click Model [Xu10]
• The docs/ads compete for priority in being examined.
34
5 Incorporating Features
• Feature example: dwell time.
35
Solution: Post-Click Click Model [Zhong10]
• Incorporate features to determine the relevance.
• Other similar work includes [Zhu10].
36
Index
• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model.
  – Click chain model.
• Project.
37
Limitation of Maximum Log-likelihood
• It does not fit the scalable and incremental properties well.
  – A closed-form formula is hard to obtain when the model is complex.
  – Even for DCM, whose log-likelihood is shown on this slide, we must approximate it with a lower bound to make the calculation easy.
• No prior information can be used, which is a drawback in such a sparse-data environment.
38
Log-likelihood of DCM
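$$
\begin{aligned}
\mathcal{L}_{DCM} &= \sum_{i=1}^{l-1}\Big(C_i(\log r_i + \log \lambda_i) + (1 - C_i)\log(1 - r_i)\Big) \\
&\quad + C_l \log r_l + (1 - C_l)\log(1 - r_l) \\
&\quad + \log\Big(1 - \lambda_l + \lambda_l \prod_{j=l+1}^{M}(1 - r_j)\Big) \\
&\ge \sum_{i=1}^{l}\Big(C_i \log r_i + (1 - C_i)\log(1 - r_i)\Big) + \sum_{i=1}^{l-1} C_i \log \lambda_i + \log(1 - \lambda_l)
\end{aligned}
$$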
A Coin-Toss Example for the Bayesian Framework
• Scenario: estimate the probability that a toss lands heads from the following five training samples.
• The probability is a variable $X = x$.
• Each training sample is denoted by $C_i$, e.g., $C_1 = 1$, $C_4 = 0$.
• According to Bayes’ rule, we have
39
$$
p(X = x \mid C_{1:5}) = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{p(C_{1:5})} = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{\int_x p(C_{1:5} \mid X = x)\, p(X = x)\, dx}
$$
Bayesian Estimation of Coin-tossing
40
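[Figure: graphical model in which $X$ is the parent of $C_1, C_2, C_3, C_4, C_5$.]

Bayesian rule:
$$
p(X = x \mid C_{1:5}) = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{p(C_{1:5})} = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{\int_x p(C_{1:5} \mid X = x)\, p(X = x)\, dx}
$$
Uniform prior:
$$
p(x) = 1
$$
Independent sampling:
$$
p(C_{1:5} \mid X = x) = \prod_{i=1}^{5} p(C_i \mid X = x) = \prod_{i=1}^{5} x^{C_i}(1 - x)^{1 - C_i}
$$
Distribution:
$$
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} x^{C_i}(1 - x)^{1 - C_i}
$$
Estimation:
$$
E(X \mid C_{1:5})
$$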
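The posterior here is a Beta distribution, so the estimate has a closed form. This sketch assumes the five samples are $C = 1, 1, 1, 0, 1$, the sequence implied by the density updates $x^1, x^2, x^3, x^3(1-x), x^4(1-x)$ on the next slide (and consistent with $C_1 = 1$, $C_4 = 0$). With a uniform prior the posterior is $\mathrm{Beta}(5, 2)$, whose mean is $5/7$; a midpoint-rule integral checks it numerically.

```python
from fractions import Fraction

samples = [1, 1, 1, 0, 1]          # C_1..C_5 (assumed; see the lead-in)
heads = sum(samples)
tails = len(samples) - heads

# Uniform prior p(x) = 1, so the posterior is proportional to x^heads (1 - x)^tails,
# i.e. Beta(heads + 1, tails + 1), whose mean is (heads + 1) / (heads + tails + 2).
posterior_mean = Fraction(heads + 1, heads + tails + 2)
print(posterior_mean)              # 5/7

# Numerical check of E(X | C_1:5) with a midpoint rule on (0, 1).
K = 100_000
xs = [(k + 0.5) / K for k in range(K)]
weights = [x**heads * (1 - x)**tails for x in xs]
numeric_mean = sum(x * w for x, w in zip(xs, weights)) / sum(weights)
print(round(numeric_mean, 4))      # 0.7143
```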
Density Function Update of Coin-tossing
41
Prior → Posterior. Density function (not normalized) after each sample:
$x^1(1-x)^0 \;\to\; x^2(1-x)^0 \;\to\; x^3(1-x)^0 \;\to\; x^3(1-x)^1 \;\to\; x^4(1-x)^1$
Click Data Scenario
42
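[Figure: five query sessions for the same query, with results labeled a to g.]

Bayesian rule:
$$
p(X = x \mid C_{1:5}) = \frac{p(C_{1:5} \mid X = x)\, p(X = x)}{\int_x p(C_{1:5} \mid X = x)\, p(X = x)\, dx}
$$
Uniform prior:
$$
p(x) = 1
$$
Independent sampling:
$$
p(C_{1:5} \mid X = x) = \prod_{i=1}^{5} p(C_i \mid X = x)
$$
Distribution:
$$
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} p(C_i \mid X = x)
$$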
Factor Trick
• If the factors $p(C_i \mid X)$ are arbitrary, a separate factor of the posterior must be stored for every training sample, which is space-consuming.
• However, if the factors come from a small discrete set, only the exponent of each distinct factor needs to be stored.
43
Distribution:
$$
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} p(C_i \mid X = x)
$$
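The factor trick can be sketched as a map from each distinct factor to its exponent. This sketch assumes every factor is linear in $x$, written $(a + b x)$, which covers factors such as $x$, $1 - x$, $1 - 0.6x$ and $1 + 0.3x$ in the updating example; the class name and the grid-based posterior mean are illustrative choices, not the papers’ method.

```python
from collections import Counter
from typing import Tuple

Factor = Tuple[float, float]   # (a, b) stands for the linear factor (a + b * x)


class FactorPosterior:
    """Unnormalized density p(x) * prod_k (a_k + b_k x)^(n_k), stored as exponents n_k."""

    def __init__(self) -> None:
        self.exponents: Counter = Counter()   # empty product: uniform prior p(x) = 1

    def update(self, factor: Factor) -> None:
        self.exponents[factor] += 1           # storage grows with *distinct* factors only

    def density(self, x: float) -> float:
        value = 1.0
        for (a, b), n in self.exponents.items():
            value *= (a + b * x) ** n
        return value

    def mean(self, grid: int = 10_000) -> float:
        """Posterior mean of x by a midpoint rule on (0, 1)."""
        xs = [(k + 0.5) / grid for k in range(grid)]
        w = [self.density(x) for x in xs]
        return sum(x * wi for x, wi in zip(xs, w)) / sum(w)


# Replay the updating example; each inner list is what one column adds.
steps = [
    [(0.0, 1.0), (1.0, 0.3)],    # column 1: x * (1 + 0.3x)
    [(1.0, -1.0)],               # column 2: adds (1 - x)
    [(0.0, 1.0), (1.0, 0.3)],    # column 3: adds x and (1 + 0.3x)
    [(0.0, 1.0), (1.0, -0.6)],   # column 4: adds x and (1 - 0.6x)
    [(1.0, -0.5)],               # column 5: adds (1 - 0.5x)
]
post = FactorPosterior()
for factors in steps:
    for f in factors:
        post.update(f)
print(dict(post.exponents))
print(round(post.mean(), 4))
```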
Updating Example
44
Density function (not normalized), from the prior onward: the exponent of each factor after each update.

| Factor | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $x$ | 1 | 1 | 2 | 3 | 3 |
| $1-x$ | 0 | 1 | 1 | 1 | 1 |
| $1-0.6x$ | 0 | 0 | 0 | 1 | 1 |
| $1+0.3x$ | 1 | 1 | 2 | 2 | 2 |
| $1-0.5x$ | 0 | 0 | 0 | 0 | 1 |
| $1-0.2x$ | 0 | 0 | 0 | 0 | 0 |
| … | … | … | … | … | … |
How to realize the factor trick?
• Set a global parameter for all cases.
  – Bayesian browsing model (BBM) [Liu09].
• Assume all other docs/ads follow the same distribution and integrate them out.
  – Click chain model (CCM) [Guo09].
45
In the following two examples, we only consider the estimation of r within the Bayesian framework. The other parameters are all estimated by maximizing the log-likelihood, as in DCM; please refer to the original papers for details.
Index
• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model.
  – Click chain model.
• Project.
46
BBM Variable Definition
47
• For a specific query session, let
  – $r_i$: the relevance variable at position $i$;
  – $E_i$: the binary examination variable at position $i$;
  – $C_i$: the binary click variable at position $i$;
  – $n_i$: the last clicked position before position $i$;
  – $d_i$: the distance between position $i$ and its previous clicked position.
Small Discrete Set of Beta
• Suppose M = 3 for simplicity of illustration.
• Then there are only M(M+1)/2 = 6 values of $\beta$, one for each pair $(n, d)$:
48
(n=0, d=1), (n=0, d=2), (n=0, d=3), (n=1, d=1), (n=1, d=2), (n=2, d=1)
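The index set of $\beta$ for general $M$ can be enumerated directly: there are $M(M+1)/2$ pairs with $n \ge 0$, $d \ge 1$ and $n + d \le M$, where $n = 0$ means no earlier click. The function name is hypothetical.

```python
from typing import List, Tuple


def bbm_beta_indices(M: int) -> List[Tuple[int, int]]:
    """All (n, d) pairs: n = last clicked position before i (0 = none), d = i - n."""
    return [(n, d) for n in range(M) for d in range(1, M - n + 1)]


pairs = bbm_beta_indices(3)
print(pairs)        # [(0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (2, 1)]
print(len(pairs))   # 6
```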
Estimation Algorithms
49
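$$
p(r \mid C) \propto p(r)\, r^{N_{click}} \prod_{n \ge 0,\ d \ge 1,\ n + d \le M} (1 - \beta_{n,d}\, r)^{N_{n,d}}
$$
$N_{click}$: how many times the doc/ad was clicked.
$N_{n,d}$: how many times the doc/ad was not clicked while its examination probability was $\beta_{n,d}$.
This is the general form below with $p(E = 1) = \beta_{n,d}$: a click contributes the factor $\beta_{n,d}\, x$, which is proportional to $x$, and a non-click contributes $1 - \beta_{n,d}\, x$ ($a_i$ denotes the position of the doc/ad in sample $i$).
$$
p(X = x \mid C_{1:5}) \propto p(x) \prod_{i=1}^{5} p(C_i \mid X = x) = p(x) \prod_{i=1}^{5} \big(p(E_{a_i} = 1)\, x\big)^{C_i} \big(1 - p(E_{a_i} = 1)\, x\big)^{1 - C_i}
$$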
Toy Example Step 1
50
• Only the top M = 3 positions are shown; there are 3 query sessions and 4 distinct URLs.
[Figure: the URLs shown at positions 1, 2 and 3 in Query Sessions 1, 2 and 3.]
Toy Example Step 2
51
• Initialize M(M+1)/2+1 counts for each URL.
| URL | Clicks | n=0, d=1 | n=0, d=2 | n=0, d=3 | n=1, d=1 | n=1, d=2 | n=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Toy Example Step 3
52
• Update the counts for URL 4.
  – If it is not impressed, do nothing;
  – If it is clicked, increment “Clicks” by 1;
  – Otherwise, locate the right n and d and increment that count.
| URL | Clicks | n=0, d=1 | n=0, d=2 | n=0, d=3 | n=1, d=1 | n=1, d=2 | n=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Toy Example Step 4
53
• Update the counts for URL 4.
  – If it is not impressed, do nothing;
  – If it is clicked, increment “Clicks” by 1;
  – Otherwise, locate the right n and d and increment that count.
| URL | Clicks | n=0, d=1 | n=0, d=2 | n=0, d=3 | n=1, d=1 | n=1, d=2 | n=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Toy Example Step 5
54
• Update the counts for URL 4.
  – If it is not impressed, do nothing;
  – If it is clicked, increment “Clicks” by 1;
  – Otherwise, locate the right n and d and increment that count.
| URL | Clicks | n=0, d=1 | n=0, d=2 | n=0, d=3 | n=1, d=1 | n=1, d=2 | n=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
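The update rule in this toy example amounts to one pass over the log. A sketch under stated assumptions: each session is a ranked list of `(url, click)` pairs, $n = 0$ means no earlier click, and the three sessions below are made up so that URL 4 ends with the counts shown in the table. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Session = List[Tuple[str, int]]   # (url, click) pairs in ranked order


def bbm_counts(sessions: List[Session]) -> Dict[str, dict]:
    """Per URL: a click count plus one non-click count per (n, d) pair."""
    counts: Dict[str, dict] = defaultdict(
        lambda: {"clicks": 0, "skips": defaultdict(int)})
    for session in sessions:
        last_click = 0                                 # n = 0: no click yet
        for pos, (url, clicked) in enumerate(session, start=1):
            if clicked:
                counts[url]["clicks"] += 1             # contributes a factor r
                last_click = pos
            else:
                n, d = last_click, pos - last_click
                counts[url]["skips"][(n, d)] += 1      # factor (1 - beta_{n,d} r)
    return counts


# Hypothetical sessions (top M = 3 positions), chosen so that URL "4"
# ends with 1 click and 1 non-click at (n=2, d=1), as in the table.
sessions = [
    [("1", 0), ("2", 0), ("3", 1)],   # URL 4 not impressed: nothing to do
    [("1", 0), ("3", 1), ("4", 0)],   # URL 4 skipped at position 3 after a click at 2
    [("4", 1), ("1", 0), ("2", 0)],   # URL 4 clicked
]
counts = bbm_counts(sessions)
for url in sorted(counts):
    print(url, counts[url]["clicks"], dict(counts[url]["skips"]))
```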
Toy Example Step 6
55
• The posterior for URL 4: $p(r \mid C) \propto p(r)\, r\, (1 - \beta_{2,1} r)$.
• Interpretation:
  – The larger the probability of examination, the stronger the penalty for a non-click.
| URL | Clicks | n=0, d=1 | n=0, d=2 | n=0, d=3 | n=1, d=1 | n=1, d=2 | n=2, d=1 |
|---|---|---|---|---|---|---|---|
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
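Given the counts, the posterior mean of $r$ can be computed by one-dimensional numerical integration. This sketch assumes a uniform prior $p(r) = 1$, and the $\beta_{2,1}$ values are hypothetical (in BBM they are estimated separately). It illustrates the interpretation above: a larger examination probability penalizes the non-click more and lowers the estimate.

```python
from typing import Dict, Tuple

NdPair = Tuple[int, int]


def bbm_posterior_mean(clicks: int, skips: Dict[NdPair, int],
                       beta: Dict[NdPair, float], grid: int = 20_000) -> float:
    """E[r | C] with a uniform prior, where
    p(r | C) is proportional to r^clicks * prod_{(n,d)} (1 - beta[n,d] * r)^skips[n,d].
    """
    num = den = 0.0
    for k in range(grid):
        r = (k + 0.5) / grid                  # midpoint rule on (0, 1)
        w = r ** clicks
        for nd, count in skips.items():
            w *= (1.0 - beta[nd] * r) ** count
        num += r * w
        den += w
    return num / den


# URL 4 from the table: 1 click and 1 non-click at (n=2, d=1).
# The beta_{2,1} values are hypothetical.
for b in (0.1, 0.5, 0.9):
    m = bbm_posterior_mean(1, {(2, 1): 1}, {(2, 1): b})
    print(f"beta_21 = {b}: E[r | C] = {m:.4f}")
```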
Algorithm Complexities
56
• Let
• Initializing and updating the counts:
  – Time: linear in the size of the click log.
  – Space: almost constant storage required, a fixed M(M+1)/2 + 1 counts per URL.
Index
• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
  – Bayesian framework and the rationale.
  – Bayesian browsing model.
  – Click chain model.
• Project.
57
User Behavior Description
58
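[Figure: user behavior flowchart. Examine the document, then “Click?”: Yes with probability $R_i$, No with probability $1 - R_i$. After no click, “See next doc?”: Yes, or No and Done. After a click, “See next doc?”: Yes with probability $\alpha_2(1 - R_i) + \alpha_3 R_i$, or No and Done.]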
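Given fixed relevance values, the flowchart implies a simple recursion for the examination probabilities. This sketch uses the CCM paper’s name $\alpha_1$ for the probability of seeing the next doc after no click; that symbol, the function name and the example parameters are assumptions here.

```python
from typing import List, Sequence


def ccm_examination_probabilities(R: Sequence[float], alpha1: float,
                                  alpha2: float, alpha3: float) -> List[float]:
    """p(E_i = 1) at each position, for fixed relevance values R_i.

    After examining position i the user clicks with probability R_i.
    No click -> sees the next doc with probability alpha1.
    Click    -> sees the next doc with probability alpha2 * (1 - R_i) + alpha3 * R_i.
    """
    probs: List[float] = []
    p = 1.0                                     # the first doc is always examined
    for Ri in R:
        probs.append(p)
        continue_no_click = (1.0 - Ri) * alpha1
        continue_click = Ri * (alpha2 * (1.0 - Ri) + alpha3 * Ri)
        p *= continue_no_click + continue_click
    return probs


# Hypothetical parameters: a very relevant doc at the top ends more sessions.
print(ccm_examination_probabilities([0.9, 0.3, 0.3, 0.3], 0.6, 0.8, 0.2))
print(ccm_examination_probabilities([0.1, 0.3, 0.3, 0.3], 0.6, 0.8, 0.2))
```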
Estimation Algorithms
• By assuming that the other docs/ads in a session follow the same distribution and integrating them out, the factors of $p(C \mid R)$ can be drawn from a small discrete set.
59
$$
p(R_j \mid \mathbf{C}^{1:N}) \propto p(R_j) \prod_{n=1}^{N} P(\mathbf{C}^n \mid R_j)
$$
Five Cases
• The current doc/ad may occur in five different cases.
• Each case contributes its own factor to $p(C \mid R_i)$.
60
Case 1
61
$$
P(\mathbf{C} \mid R_i) \propto P(C_i = 0 \mid E_i = 1, R_i) = 1 - R_i
$$
• The doc/ad must have been examined.
• The other R’s can be treated as constants.
Case 2
62
Case 3
63
All Cases
64
• By assuming that the other docs/ads in a session follow the same distribution and integrating them out, the factors of $p(C \mid R)$ can be drawn from a small discrete set.
$$
p(R_j \mid \mathbf{C}^{1:N}) \propto p(R_j) \prod_{n=1}^{N} P(\mathbf{C}^n \mid R_j)
$$
Index
• Background.
• A Simple Click Model.
• Advanced Design.
• Advanced Estimation.
• Project.
65
Description
• Fake dataset.
• Format.
  – queryId
  – ad1Id, click
  – ad2Id, click
  – ad3Id, click
• Evaluation metric: ROC.
• Baseline.
  – Average (Avg).
• Current competitive method.
  – Simplified CCM (SCCM).
• Task.
  – Implement another advanced click model.
  – Compare the results with Avg and SCCM.
  – Analyze the reasons for the improvement.
66
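A possible starting point for the Avg baseline and the ROC evaluation. Everything here is an assumption rather than the course’s specification: a comma-separated line layout `queryId,ad1Id,click,ad2Id,click,ad3Id,click`, an Avg baseline defined as each (query, ad) pair’s empirical click-through rate, and AUC computed by pairwise comparison (quadratic time, fine for small test sets). All names and records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Record = Tuple[str, List[Tuple[str, int]]]   # (queryId, [(adId, click), ...])


def parse_line(line: str) -> Record:
    """Assumed layout: queryId,ad1Id,click,ad2Id,click,ad3Id,click"""
    fields = line.strip().split(",")
    query, rest = fields[0], fields[1:]
    ads = [(rest[k], int(rest[k + 1])) for k in range(0, len(rest), 2)]
    return query, ads


def fit_avg(records: Sequence[Record]) -> Dict[Tuple[str, str], float]:
    """Avg baseline (assumed definition): empirical CTR of each (query, ad) pair."""
    clicks: Dict[Tuple[str, str], int] = defaultdict(int)
    shows: Dict[Tuple[str, str], int] = defaultdict(int)
    for query, ads in records:
        for ad, c in ads:
            shows[(query, ad)] += 1
            clicks[(query, ad)] += c
    return {key: clicks[key] / shows[key] for key in shows}


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve = P(clicked slot outscores non-clicked slot); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one non-clicked slot")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train = [parse_line(s) for s in ["q1,a,1,b,0,c,0", "q1,b,1,a,0,c,0", "q1,a,1,c,0,b,0"]]
model = fit_avg(train)
test = [parse_line("q1,a,1,b,0,c,1")]
labels = [c for _, ads in test for _, c in ads]
scores = [model.get((q, ad), 0.0) for q, ads in test for ad, _ in ads]
print(auc(labels, scores))
```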
End
67