Analysing and Modelling Large-Scale Enterprise Data
Thore Graepel
Online Services and Advertising Group
Microsoft Research Cambridge
Overview
• Complex large-scale data in the enterprise
– What kind of data is available?
– What technologies are used?
– Tasks and enterprise-specific challenges?
• Methodology:
– Bayesian Inference in Factor Graph Models
– PQL: Using SQL to describe probability models
• Applications:
– Gamer Rating and Matchmaking: TrueSkill
– Click-Through Rate Prediction: AdPredictor
– Large-Scale Recommendations: Matchbox
Complex Data
Joint work with Tom Minka & Phillip Trelford
Data Sources at Microsoft (External)
• Online Services Division
– Web index
– Search and Ad click logs (12-15 TB / day)
– Hotmail, Instant messaging, Internet Explorer (100s of millions of users)
– MSN portal and Bing maps
• Xbox Live Gaming Service
– User transaction log data
– Ranking and matchmaking data
– Game instrumentation for user testing
Data Sources at Microsoft (Internal)
• Development and Software Instrumentation
– Watson (customer feedback data)
– Source depot (MS source code, e.g., Office, Windows)
– Multilingual technical documentation
• Business
– Customer databases
– Sales and Marketing
Data-Intensive Tasks at Microsoft
• Prediction of user behaviour and preferences
– Improve web search
– Improve targeting for advertising
– Spam filtering and content prioritisation
• Improve user experience
– Matchmaking for games
– Multi-modal user interfaces (Natal, speech)
• Improve software development process
– Improve productivity of developers
– Analyse software for defects
Technical Infrastructure
• Relational Databases/SQL
– Great agility for analysis and reliability for business
– Limited scalability
– Need to import data into SQL
• Windows HPC
– Complex computations / fine grained parallelism
– Need to move data to HPC cluster
• Cosmos
– Take the computation to the data
– Super efficient stream based computations
Cosmos Architecture
[Figure: Cosmos architecture diagram. Labels: Stream (×4), Cluster Machine (×4), Dryad, SCOPE, DryadLINQ, Sputnik]
Enterprise/Online specific challenges
• Privacy
– Privacy limits the ways in which data can be used
– Interesting trade-offs (differential privacy)
• Incentives
– Data produced by self-interested agents
– Need to design incentive compatible mechanisms
• Exploration/Exploitation
– Results of inference feed back into business process and determine future observations.
– Need to aim at long-term benefits
Factor Graphs
Factor Graphs / Trees
• Definition: Graphical representation of product structure of a function (Wiberg, 1996)
– Nodes: Factors and Variables
– Edges: Dependencies of factors on variables.
• Question:
– What are the marginals of the function (all but one variable are summed out)?
– What is the mode of the function?
[Figure: factor graph over variables s, s1, s2]
Factor Graphs and Bayesian Inference
• Bayes’ law
• Factorising prior
• Factorising likelihood
• Sum out latent variables
[Figure: factor graph with variables t1, t2, d, y]
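As a hedged illustration of these four steps, assume the two-player model suggested by the variable names in the figure: skills s_1, s_2, latent performances t_1, t_2, their difference d, and an observed outcome y. These roles are an assumption made for the sketch, not something the slide states. The steps would then read:

```latex
\begin{align}
p(s \mid y) &= \frac{p(y \mid s)\, p(s)}{p(y)}
  && \text{(Bayes' law)} \\
p(s) &= p(s_1)\, p(s_2)
  && \text{(factorising prior)} \\
p(y, d, t \mid s) &= p(y \mid d)\, p(d \mid t_1, t_2)\, p(t_1 \mid s_1)\, p(t_2 \mid s_2)
  && \text{(factorising likelihood)} \\
p(y \mid s) &= \int\!\!\int\!\!\int p(y, d, t \mid s)\; \mathrm{d}t_1\, \mathrm{d}t_2\, \mathrm{d}d
  && \text{(sum out latent variables)}
\end{align}
```

Each term on the right-hand side corresponds to one factor node, which is what makes the posterior a factor graph.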
Factor Trees: Separation
[Figure: factor tree over variables v, w, x, y, z with factors f1(v,w), f2(w,x), f3(x,y), f4(x,z)]
Observation: Sum of products becomes product of sums of all messages from neighbouring factors to variable!
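The separation property can be checked numerically on the tree above. The sketch below is a minimal illustration, assuming binary variables and made-up factor tables (both are assumptions, chosen only so the code runs). It compares the brute-force sum of products with the product of the messages arriving at x.

```python
from itertools import product

STATES = (0, 1)  # assumption: every variable is binary

# Hypothetical factor tables, chosen only for illustration.
def f1(v, w): return 1.0 + 2.0 * v + w
def f2(w, x): return 2.0 if w == x else 0.5
def f3(x, y): return 1.0 + x * y
def f4(x, z): return 3.0 - x - z

def brute_force_marginal(x):
    """Sum the full product over v, w, y, z (cost grows exponentially)."""
    return sum(f1(v, w) * f2(w, x) * f3(x, y) * f4(x, z)
               for v, w, y, z in product(STATES, repeat=4))

def message_passing_marginal(x):
    """Product of the three messages arriving at x from f2, f3 and f4."""
    m_f1_to_w = {w: sum(f1(v, w) for v in STATES) for w in STATES}
    m_f2_to_x = sum(f2(w, x) * m_f1_to_w[w] for w in STATES)
    m_f3_to_x = sum(f3(x, y) for y in STATES)
    m_f4_to_x = sum(f4(x, z) for z in STATES)
    return m_f2_to_x * m_f3_to_x * m_f4_to_x
```

Both functions return the unnormalised marginal of x; dividing by the sum over x gives a probability.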
Messages: From Factors To Variables
[Figure: factor tree fragment with variables w, x, y, z and factors f2(w,x), f3(x,y), f4(x,z)]
Observation: Factors only need to sum out all their local variables!
Messages: From Variables To Factors
[Figure: factor tree fragment with variables x, y, z and factors f2(w,x), f3(x,y), f4(x,z)]
Observation: Variables pass on the product of all incoming messages!
The Sum-Product Algorithm
• Three update equations (Aji & McEliece, 1997)
• Update equations can be directly derived from the distributive law.
• Efficient for messages in the exponential family.
• Calculate all marginals at the same time.
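In their commonly stated form, the three sum-product updates can be written as below. The notation is an assumption introduced here: F_v denotes the factors neighbouring variable v, V_f the variables neighbouring factor f, and x_f the joint assignment of the variables in V_f.

```latex
\begin{align}
p(v) &\propto \prod_{f \in F_v} m_{f \to v}(v)
  && \text{(marginal)} \\
m_{f \to v}(v) &= \sum_{\mathbf{x}_f \setminus v} f(\mathbf{x}_f)
  \prod_{u \in V_f \setminus \{v\}} m_{u \to f}(u)
  && \text{(factor to variable)} \\
m_{v \to f}(v) &= \prod_{g \in F_v \setminus \{f\}} m_{g \to v}(v)
  && \text{(variable to factor)}
\end{align}
```

The second line is the "sum out local variables" observation and the third is the "pass on the product of incoming messages" observation from the preceding slides.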
Approximate Message Passing
• Problem: The exact messages from factors to variables may not be closed under products.
• Solution: Approximate the marginal as well as possible in the sense of minimal KL divergence.
• Expectation Propagation (Minka, 2001): Approximate the marginal by moment-matching, resulting in
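A common way to write the moment-matched message is sketched below. The operator MM is notation assumed here: it maps a distribution to the member of the chosen exponential family (for example a Gaussian) with the same moments, which is the minimiser of the KL divergence from the exact marginal to the approximation.

```latex
\begin{align}
\hat{p}(x) &= \operatorname{MM}\bigl[\, m_{f \to x}(x)\; m_{x \to f}(x) \,\bigr] \\
\hat{m}_{f \to x}(x) &= \frac{\hat{p}(x)}{m_{x \to f}(x)}
\end{align}
```

The division recovers an approximate factor-to-variable message from the approximate marginal, so the rest of the graph keeps exchanging messages of a single family.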
Distributed Message Passing
• Map-Reduce for IID data
– Map: Nodes compute messages m_{f_i→s} from data y_i and m_{s→f_i}
– Reduce: Combine messages m_{f_i→s} into p(s) by multiplication
• Caveats:
– All approximate data factors need the incoming message m_{s→f_i}!
– All messages m_{f_i→s} need to be stored if the same data point is considered multiple times
[Figure: factor graph with shared variable s connected to data y1, y2, y3]
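The map and reduce steps can be sketched for the simplest case: a scalar s with a Gaussian prior and IID observations y_i = s + Gaussian noise. This is an assumed toy model, not one taken from the talk. Its messages are exact, so the map step does not need the incoming message m_{s→f_i}; the caveats above apply once the data factors are approximate.

```python
from functools import reduce

def map_message(y, noise_var=1.0):
    """Map step: message m_{f_i -> s} as (precision, precision * mean)."""
    return (1.0 / noise_var, y / noise_var)

def multiply(a, b):
    """Product of two Gaussian messages = sum of their natural parameters."""
    return (a[0] + b[0], a[1] + b[1])

def posterior(data, prior_mean=0.0, prior_var=100.0, noise_var=1.0):
    """Combine the prior with all data messages; returns (mean, variance)."""
    prior = (1.0 / prior_var, prior_mean / prior_var)
    messages = map(lambda y: map_message(y, noise_var), data)      # map
    precision, precision_mean = reduce(multiply, messages, prior)  # reduce
    return precision_mean / precision, 1.0 / precision
```

Because multiplication of messages is associative and commutative, the reduce step can run in any order and on any partition of the data.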
PQL
Joint work with Ralf Herbrich & Jurgen Van Gael
PQL as a Platform
PQL I – Augmenting Schemas
People = AUGMENT DB.People ADD weight FLOAT
PQL II – Factor Types
[Figure: People relation]
PQL III – Single Relation Factors
FACTOR Normal(p.weight,75.0,25.0) FROM People p
[Figure: People relation]
PQL IV – Cross Relation Factors
FACTOR Normal(g.weight, p.weight, 1.0)
FROM People p, DrVisit g
WHERE p.PersonID = g.PersonID
[Figure: DrVisit and People relations]
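To make the two FACTOR statements concrete, the following sketch computes what they would imply for each person's latent weight under ordinary Gaussian conjugacy. It is not PQL and not the PQL engine: the row data is hypothetical, and reading the last argument of Normal as a standard deviation is an assumption (it could equally be a variance).

```python
def weight_posterior(measurements, prior_mean=75.0, prior_sd=25.0, noise_sd=1.0):
    """Posterior mean and sd of one person's latent weight.

    Prior factor:      Normal(p.weight, 75.0, 25.0)
    Per-visit factor:  Normal(g.weight, p.weight, 1.0)
    Assumption: the last argument of Normal is a standard deviation.
    """
    precision = 1.0 / prior_sd ** 2
    precision_mean = prior_mean / prior_sd ** 2
    for g_weight in measurements:  # one factor per joined DrVisit row
        precision += 1.0 / noise_sd ** 2
        precision_mean += g_weight / noise_sd ** 2
    return precision_mean / precision, precision ** -0.5

# Hypothetical rows standing in for the People and DrVisit relations.
people = [{"PersonID": 1}, {"PersonID": 2}]
dr_visits = [
    {"PersonID": 1, "weight": 82.0},
    {"PersonID": 1, "weight": 81.0},
]

# The join condition p.PersonID = g.PersonID decides which factors attach.
posteriors = {
    p["PersonID"]: weight_posterior(
        [g["weight"] for g in dr_visits if g["PersonID"] == p["PersonID"]]
    )
    for p in people
}
```

A person with no DrVisit rows keeps the prior, while each joined row adds one measurement factor, mirroring how the WHERE clause instantiates cross-relation factors.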
PQL as a Unifying Platform
TrueSkill™
Joint work with Tom Minka & Phillip Trelford
TrueSkill™
• Given:
– Match outcomes: Orderings among k teams consisting of n_1, n_2, ..., n_k players, respectively
• Questions:
– Skill s_i for each player such that
– Global ranking among all players
– Fair matches between teams of players
Efficient Approximate Inference
[Figure: TrueSkill factor graph with variables s1 to s4, t1 to t3 and pairwise outcome variables y; Gaussian prior factors and ranking likelihood factors]
Fast and efficient approximate message passing using Expectation Propagation
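For the special case of one game between two single players with no draws, the Expectation Propagation update has a closed form. The sketch below implements that special case only; the multi-team graph in the figure needs iterated message passing. The default beta of 25/6 is an assumption, as is representing each skill belief as a (mu, sigma) pair.

```python
import math

def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def update(winner, loser, beta=25.0 / 6.0):
    """One update for a two-player game without draws.

    winner and loser are (mu, sigma) pairs; beta is the performance noise
    (assumed default). Returns the updated (mu, sigma) pairs.
    """
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2.0 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # additive correction to the means
    w = v * (v + t)       # multiplicative correction to the variances
    new_w = (mu_w + sig_w ** 2 / c * v,
             sig_w * math.sqrt(1.0 - sig_w ** 2 / c ** 2 * w))
    new_l = (mu_l - sig_l ** 2 / c * v,
             sig_l * math.sqrt(1.0 - sig_l ** 2 / c ** 2 * w))
    return new_w, new_l
```

An unexpected result (small or negative t) gives a large v and therefore a large shift in the means, while an expected result changes little. Both uncertainties shrink after every game.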
TrueSkill: Superfast convergence to True Skills
[Figure: Level (0 to 40) against games played (0 to 400) for players char and SQLwildman, comparing the Halo 2 Beta ranking with TrueSkill™]
Applications to Online Gaming
• Leaderboard
– Global ranking of all players
• Matchmaking
– For gamers: Most uncertain outcome
– For inference: Most informative
– Both are equivalent!
TrueSkill in Xbox 360 and Halo 3
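The "most uncertain outcome" matchmaking criterion can be sketched as picking the opponent whose predicted win probability is closest to one half. This is a simplified illustration under an assumed Gaussian performance model with an assumed beta; the deployed service may use a different match quality measure.

```python
import math

def win_probability(a, b, beta=25.0 / 6.0):
    """P(a beats b) for (mu, sigma) skill beliefs with Gaussian performance noise."""
    (mu_a, sig_a), (mu_b, sig_b) = a, b
    c = math.sqrt(2.0 * beta ** 2 + sig_a ** 2 + sig_b ** 2)
    return 0.5 * (1.0 + math.erf((mu_a - mu_b) / (c * math.sqrt(2.0))))

def best_opponent(player, candidates):
    """Pick the candidate whose predicted outcome is closest to a coin flip."""
    return min(candidates, key=lambda c: abs(win_probability(player, c) - 0.5))
```

A game whose outcome is close to a coin flip is also the one whose result moves the skill beliefs the most, which is the sense in which the two criteria on the slide coincide.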
AdPredictor
Joint work with Joaquin Quiñonero Candela, Onno Zoeter, Tom Borchert, Phillip Trelford
Why Predict Probability-of-Click?
• Advantages of improved probability estimates:
– Increase user satisfaction by better targeting
– Fairer charges to advertisers
– Increase revenue by showing ads with high click-thru rate
• Display (according to expected revenue)
• Charge (per click)
Bid      p(click)   Expected revenue   Charge per click
$1.00    × 10%      = $0.10            $0.80
$2.00    × 4%       = $0.08            $1.25
$0.10    × 50%      = $0.05            $0.05
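The arithmetic in the table can be reproduced with a short sketch. It assumes a generalised second-price rule, in which each ad pays the smallest bid that would still keep its position; that assumption matches the first two charges in the table. The ad names are hypothetical, and the charge for the last slot is left open because it depends on a reserve price the slide does not give.

```python
ads = [  # (name, bid in dollars, predicted click probability)
    ("A", 1.00, 0.10),
    ("B", 2.00, 0.04),
    ("C", 0.10, 0.50),
]

def rank_and_price(ads):
    """Rank by bid * pClick; charge the minimum bid that keeps the position."""
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    results = []
    for i, (name, bid, p_click) in enumerate(ranked):
        if i + 1 < len(ranked):
            _, next_bid, next_p = ranked[i + 1]
            charge = next_bid * next_p / p_click
        else:
            charge = None  # last slot: depends on a reserve price not given here
        results.append((name, round(bid * p_click, 4), charge))
    return results
```

Both the ranking and the charge depend directly on the predicted click probability, so a miscalibrated estimate would misorder ads and misprice clicks.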
adPredictor Details
[Figure: input features combined (+) into the click prediction p(pClick). Client IP: 102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86. MatchType: Exact Match, Broad Match. Position: ML-1, SB-1, SB-2]
Training Algorithm in Action
[Figure: factor graph with weights w1, w2, a sum factor (+), and variables s and c]
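A single training step can be sketched as an online Bayesian probit regression update, in which every active binary feature carries a Gaussian belief over its weight. This is a minimal sketch under assumptions: the feature keys, the (mean, variance) representation, and the default beta are all chosen here for illustration, and refinements such as forgetting or pruning are left out.

```python
import math

def pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def predict(weights, active, beta=1.0):
    """Click probability for the active binary features."""
    mean = sum(weights[f][0] for f in active)
    var = beta ** 2 + sum(weights[f][1] for f in active)
    return cdf(mean / math.sqrt(var))

def update(weights, active, clicked, beta=1.0):
    """In-place Gaussian update of the active weights after one impression."""
    y = 1.0 if clicked else -1.0
    total_var = beta ** 2 + sum(weights[f][1] for f in active)
    total_sd = math.sqrt(total_var)
    t = y * sum(weights[f][0] for f in active) / total_sd
    v = pdf(t) / cdf(t)
    w = v * (v + t)
    for f in active:
        mean, var = weights[f]
        weights[f] = (mean + y * var / total_sd * v,
                      var * (1.0 - var / total_var * w))
```

Usage, with hypothetical feature keys built from values shown in the figure:

```python
weights = {"ip:15.70.165.9": (0.0, 1.0), "match:exact": (0.0, 1.0)}
update(weights, ["ip:15.70.165.9", "match:exact"], clicked=True)
```

Weights with a larger variance move more, so rarely seen features adapt quickly while well-established ones stay stable.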
Client IP: Mean & Variance
UserAgent: Mean Posterior Effects
AdPredictor in Bing Search Engine
• AdPredictor is now running on 100% of Paid Search traffic in Microsoft’s Bing Search Engine
• Relevance and Click-Through Rate of Ads improved
• Calibrated CTR prediction provides solid foundation for further improvements
• AdPredictor explored for other tasks such as contextual and display advertising
Matchbox
Joint work with David Stern and Ralf Herbrich
Collaborative Filtering
[Figure: ratings matrix with users A to D as rows and items 1 to 6 as columns; unknown entries marked "?"]
Metadata?
Map Sparse Features To ‘Trait’ Space
[Figure: sparse user features (User ID: 234566, 456457, 13456, 654777; Gender: Male, Female; Country: UK, USA; Height: 1.2m) and sparse item features (Item ID: 34, 345, 64, 5474; Movie Genre: Horror, Drama, Documentary, Comedy) mapped into trait space]
Message Passing For Matchbox
[Figure: Matchbox factor graph with rating r, variables s1, s2, t1, t2, weights u11, u21, u12, u22, u01, u02 and v11, v21, v12, v22, connected by product (*) and sum (+) factors]
Message update functions powered by Infer.NET
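The structure of the graph, sums of weights feeding traits and a product of traits feeding the rating, can be illustrated with point estimates. This sketch only shows the bilinear prediction, assuming fixed weight matrices U and V and leaving out bias terms. The talk's model instead keeps Gaussian beliefs over the weights and updates them by message passing, which is not reproduced here.

```python
def project(matrix, features):
    """Trait vector: weighted sum of the columns of `matrix` picked by sparse features."""
    return [sum(row[j] * value for j, value in features.items()) for row in matrix]

def predicted_rating(U, V, user_features, item_features):
    """Inner product of user traits s = U x and item traits t = V y."""
    s = project(U, user_features)
    t = project(V, item_features)
    return sum(s_k * t_k for s_k, t_k in zip(s, t))
```

Usage with hypothetical two-trait weights, where sparse features are given as index-to-value dictionaries:

```python
U = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]  # 2 traits x 3 user features
V = [[0.5, 1.0], [1.0, 1.0]]            # 2 traits x 2 item features
rating = predicted_rating(U, V, {0: 1.0, 2: 1.0}, {1: 1.0})
```

Because only the active features contribute, a new user with known metadata but no ID history still gets a trait vector, which is how metadata helps with sparse rating data.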
User/Item Taste Space
‘Preference Cone’ for user 145035
Applications
Conclusions
• Great variety of data sources and tasks
• Challenges: privacy, incentives, exploration
• Tools: SQL, No-SQL, HPC
• Modelling platform (Factor Graphs & PQL):
– Represent uncertainty
– Composable models
– Distributed, data-centric computation
• Applications: TrueSkill, AdPredictor, Matchbox
• Thanks!