Analysing and Modelling Large-Scale Enterprise Data

Thore Graepel, Online Services and Advertising Group, Microsoft Research Cambridge


Post on 28-Mar-2015


TRANSCRIPT

Page 1: Thore Graepel Online Services and Advertising Group Microsoft Research Cambridge

Analysing and Modelling Large-Scale Enterprise Data

Thore Graepel
Online Services and Advertising Group
Microsoft Research Cambridge

Page 2: Overview

• Complex large-scale data in the enterprise
– What kind of data is available?
– What technologies are used?
– Tasks and enterprise-specific challenges?

• Methodology:
– Bayesian Inference in Factor Graph Models
– PQL: Using SQL to describe probability models

• Applications:
– Gamer Rating and Matchmaking: TrueSkill
– Click-Through Rate Prediction: AdPredictor
– Large-Scale Recommendations: Matchbox

Page 3: Complex Data

Joint work with Tom Minka & Phillip Trelford

Page 4: Data Sources at Microsoft (External)

• Online Services Division
– Web index
– Search and Ad click logs (12-15 TB / day)
– Hotmail, Instant messaging, Internet Explorer (100s of millions of users)
– MSN portal and Bing maps

• Xbox Live Gaming Service
– User transaction log data
– Ranking and matchmaking data
– Game instrumentation for user testing

Page 5: Data Sources at Microsoft (Internal)

• Development and Software Instrumentation
– Watson (customer feedback data)
– Source depot (MS source code, e.g., Office, Windows)
– Multilingual technical documentation

• Business
– Customer databases
– Sales and Marketing

Page 6: Data-Intensive Tasks at Microsoft

• Prediction of user behaviour and preferences
– Improve web search
– Improve targeting for advertising
– Spam filtering and content prioritisation

• Improve user experience
– Matchmaking for games
– Multi-modal user interfaces (Natal, speech)

• Improve software development process
– Improve productivity of developers
– Analyse software for defects

Page 7: Technical Infrastructure

• Relational Databases/SQL
– Great agility for analysis and reliability for business
– Limited scalability
– Need to import data into SQL

• Windows HPC
– Complex computations / fine-grained parallelism
– Need to move data to HPC cluster

• Cosmos
– Take the computation to the data
– Super-efficient stream-based computations

Page 8: Cosmos Architecture

[Architecture diagram. Labels: Cosmos; Stream (×4); Cluster Machine (×4); Dryad; SCOPE, DryadLINQ, Sputnik]

Page 9: Enterprise/Online specific challenges

• Privacy
– Privacy limits the ways in which data can be used
– Interesting trade-offs (differential privacy)

• Incentives
– Data produced by self-interested agents
– Need to design incentive-compatible mechanisms

• Exploration/Exploitation
– Results of inference feed back into business process and determine future observations.
– Need to aim at long-term benefits

Page 10: Factor Graphs

Page 11: Factor Graphs / Trees

• Definition: Graphical representation of product structure of a function (Wiberg, 1996)
– Nodes: Factors and Variables
– Edges: Dependencies of factors on variables.

• Question:
– What are the marginals of the function (all but one variable are summed out)?
– What is the mode of the function?

Page 12: Factor Graphs and Bayesian Inference

• Bayes’ law
• Factorising prior
• Factorising likelihood
• Sum out latent variables

[Factor graph diagram. Variable nodes: s1, s2, t1, t2, d, y]

Page 13: Factor Trees: Separation

[Factor tree diagram. Variables: v, w, x, y, z. Factors: f1(v,w), f2(w,x), f3(x,y), f4(x,z)]

Observation: Sum of products becomes product of sums of all messages from neighbouring factors to variable!

Page 14: Messages: From Factors To Variables

[Factor tree diagram. Variables: w, x, y, z. Factors: f2(w,x), f3(x,y), f4(x,z)]

Observation: Factors only need to sum out all their local variables!

Page 15: Messages: From Variables To Factors

[Factor tree diagram. Variables: x, y, z. Factors: f2(w,x), f3(x,y), f4(x,z)]

Observation: Variables pass on the product of all incoming messages!
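The three observations above can be checked numerically on the factor tree from the slides. The following is a minimal sketch, assuming binary variables and arbitrarily chosen factor tables (the table values are hypothetical and do not come from the talk). It computes the unnormalised marginal of x once by brute-force summation and once by passing messages.

```python
from itertools import product

VALUES = (0, 1)  # assumption: every variable is binary

# Hypothetical factor tables for f1(v,w), f2(w,x), f3(x,y), f4(x,z).
f1 = {(v, w): 1.0 + v + 2 * w for v, w in product(VALUES, repeat=2)}
f2 = {(w, x): 2.0 if w == x else 0.5 for w, x in product(VALUES, repeat=2)}
f3 = {(x, y): 1.0 + 3 * x * y for x, y in product(VALUES, repeat=2)}
f4 = {(x, z): 0.5 + x + z for x, z in product(VALUES, repeat=2)}


def marginal_x_brute_force():
    """Sum the full product of factors over v, w, y, z for each value of x."""
    result = []
    for x in VALUES:
        total = 0.0
        for v, w, y, z in product(VALUES, repeat=4):
            total += f1[v, w] * f2[w, x] * f3[x, y] * f4[x, z]
        result.append(total)
    return result


def marginal_x_message_passing():
    """Same marginal, computed with local sums and products only."""
    # Factor to variable: f1 sums out its other local variable v.
    m_f1_to_w = [sum(f1[v, w] for v in VALUES) for w in VALUES]
    # Variable to factor: w has one other neighbour, so it passes that message on.
    m_w_to_f2 = m_f1_to_w
    # Factor to variable: f2 sums out w, weighted by the incoming message.
    m_f2_to_x = [sum(f2[w, x] * m_w_to_f2[w] for w in VALUES) for x in VALUES]
    # Leaf branches: f3 and f4 sum out y and z respectively.
    m_f3_to_x = [sum(f3[x, y] for y in VALUES) for x in VALUES]
    m_f4_to_x = [sum(f4[x, z] for z in VALUES) for x in VALUES]
    # Marginal: product of all messages arriving at x.
    return [m_f2_to_x[x] * m_f3_to_x[x] * m_f4_to_x[x] for x in VALUES]
```

The brute-force version touches every joint configuration, while the message-passing version only ever sums over one variable at a time, which is the separation property the slides point out.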

Page 16: The Sum-Product Algorithm

• Three update equations (Aji & McEliece, 1997)
• Update equations can be directly derived from the distributive law.
• Efficient for messages in the exponential family.
• Calculate all marginals at the same time.
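The update equations themselves are not preserved in the transcript. For orientation, the standard textbook form of the three sum-product updates is given below. The notation is an assumption: F_x denotes the set of factors neighbouring variable x, V_f the set of variables neighbouring factor f, and m the messages.

```latex
% Marginal: product of all messages arriving at the variable
p(x) \;\propto\; \prod_{f \in F_x} m_{f \to x}(x)

% Factor to variable: sum out all other local variables of the factor
m_{f \to x}(x) \;=\; \sum_{V_f \setminus \{x\}} f(V_f) \prod_{y \in V_f \setminus \{x\}} m_{y \to f}(y)

% Variable to factor: product of all other incoming messages
m_{x \to f}(x) \;=\; \prod_{g \in F_x \setminus \{f\}} m_{g \to x}(x)
```

These correspond one-to-one to the three observations on the preceding slides.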

Page 17: Approximate Message Passing

• Problem: The exact messages from factors to variables may not be closed under products.
• Solution: Approximate the marginal as well as possible in the sense of minimal KL divergence.
• Expectation Propagation (Minka, 2001): Approximate the marginal by moment-matching, resulting in [equation not preserved in transcript]
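To make moment matching concrete, here is a small sketch for one common case: a Gaussian belief N(mu, sigma^2) multiplied by a hard constraint t > 0. The true marginal is a truncated Gaussian, which is not Gaussian, so it is replaced by the Gaussian with the same mean and variance. The choice of this particular factor and all function names are assumptions made for illustration; the closed-form moments are the standard truncated-Gaussian results and are checked against numerical integration.

```python
import math


def pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def moment_match_positive(mu, sigma):
    """Mean and variance of N(mu, sigma^2) truncated to t > 0 (closed form)."""
    z = mu / sigma
    v = pdf(z) / cdf(z)   # additive correction to the mean
    w = v * (v + z)       # multiplicative correction to the variance
    return mu + sigma * v, sigma * sigma * (1.0 - w)


def numeric_moments(mu, sigma, steps=100000, width=12.0):
    """Same moments by midpoint-rule integration over (0, upper)."""
    upper = max(mu, 0.0) + width * sigma
    h = upper / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        p = pdf((t - mu) / sigma) / sigma
        m0 += p
        m1 += p * t
        m2 += p * t * t
    mean = m1 / m0
    return mean, m2 / m0 - mean * mean
```

Within the Gaussian family, the moment-matched Gaussian is the one that minimises the KL divergence from the true marginal to the approximation, which is the sense of "as well as possible" used on the slide.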

Page 18: Distributed Message Passing

• Map-Reduce for IID data
– Map: Nodes compute messages m_{f_i→s} from data y_i and m_{s→f_i}
– Reduce: Combine messages m_{f_i→s} into p(s) by multiplication

• Caveats:
– All approximate data factors need the incoming message m_{s→f_i}!
– All messages m_{f_i→s} need to be stored if the same data point is considered multiple times

[Factor graph diagram. Variables: s; y1, y2, y3]
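The map and reduce steps can be sketched for the simplest case, where every data factor is an exact Gaussian likelihood N(y_i; s, noise_var) and therefore does not need the incoming message (the first caveat above only bites for approximate factors). Gaussian messages are stored in natural parameters, so multiplying messages is just adding pairs. All names and the choice of a Gaussian model are assumptions for illustration.

```python
from functools import reduce


def to_natural(mean, var):
    """Gaussian as (precision, precision * mean)."""
    return (1.0 / var, mean / var)


def from_natural(nat):
    """Back to (mean, variance)."""
    precision, precision_mean = nat
    return (precision_mean / precision, 1.0 / precision)


def multiply(a, b):
    """Product of two Gaussian messages: natural parameters add."""
    return (a[0] + b[0], a[1] + b[1])


def map_message(y, noise_var):
    """Map step: message from the data factor for observation y to s."""
    return to_natural(y, noise_var)


def posterior(prior_mean, prior_var, data, noise_var):
    """Reduce step: multiply the prior with all factor-to-variable messages."""
    messages = (map_message(y, noise_var) for y in data)
    return from_natural(reduce(multiply, messages, to_natural(prior_mean, prior_var)))


def reduce_shards(shards, noise_var):
    """Combine messages shard by shard, as separate cluster nodes might."""
    identity = (0.0, 0.0)  # the uniform message
    partials = [
        reduce(multiply, (map_message(y, noise_var) for y in shard), identity)
        for shard in shards
    ]
    return reduce(multiply, partials, identity)
```

Because multiplication of messages is associative and commutative, the per-shard partial products can be combined in any order and give the same posterior as a single pass over all the data.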

Page 19: PQL

Joint work with Ralf Herbrich & Jurgen Van Gael

Page 20: PQL as a Platform

Page 21: PQL I – Augmenting Schemas

People = AUGMENT DB.People ADD weight FLOAT

Page 22: PQL II – Factor Types

Page 23: PQL III – Single Relation Factors

FACTOR Normal(p.weight, 75.0, 25.0) FROM People p

[Diagram: People table]

Page 24: PQL IV – Cross Relation Factors

FACTOR Normal(g.weight, p.weight, 1.0)
FROM People p, DrVisit g
WHERE p.PersonID = g.PersonID

[Diagram: DrVisit and People tables]
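One way to read the two FACTOR statements is that the query instantiates one factor per matching row: a Normal prior on every person's latent weight, and one Normal observation factor for every DrVisit row joined to its person. The sketch below computes the resulting posterior in plain Python. It is an illustration rather than PQL's actual execution model, and it assumes that the third argument of Normal is a variance (the slides do not say whether it is a variance or a standard deviation). The table contents are hypothetical.

```python
# Hypothetical in-memory stand-ins for the People and DrVisit relations.
people = [{"PersonID": 1}, {"PersonID": 2}]
dr_visit = [
    {"PersonID": 1, "weight": 82.0},
    {"PersonID": 1, "weight": 80.0},
]

PRIOR_MEAN, PRIOR_VAR = 75.0, 25.0  # FACTOR Normal(p.weight, 75.0, 25.0); variance assumed
OBS_VAR = 1.0                       # FACTOR Normal(g.weight, p.weight, 1.0); variance assumed


def infer_weights(people_rows, visit_rows):
    """Posterior (mean, variance) of each person's latent weight."""
    # Single-relation factor: one prior factor per row of People.
    precision = {p["PersonID"]: 1.0 / PRIOR_VAR for p in people_rows}
    precision_mean = {p["PersonID"]: PRIOR_MEAN / PRIOR_VAR for p in people_rows}
    # Cross-relation factor: one observation factor per joined (People, DrVisit) row.
    for g in visit_rows:
        pid = g["PersonID"]
        if pid in precision:  # WHERE p.PersonID = g.PersonID
            precision[pid] += 1.0 / OBS_VAR
            precision_mean[pid] += g["weight"] / OBS_VAR
    return {
        pid: (precision_mean[pid] / precision[pid], 1.0 / precision[pid])
        for pid in precision
    }
```

A person with no visits keeps the prior, while each recorded visit pulls the estimate towards the measured weight and reduces its variance.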

Page 25: PQL as a Unifying Platform

Page 26: TrueSkill™

Joint work with Tom Minka & Phillip Trelford

Page 27: TrueSkill™

• Given:
– Match outcomes: Orderings among k teams consisting of n_1, n_2, ..., n_k players, respectively

• Questions:
– Skill s_i for each player such that [condition not preserved in transcript]
– Global ranking among all players
– Fair matches between teams of players

Page 28: Efficient Approximate Inference

[Factor graph diagram. Variables: s1, s2, s3, s4; t1, t2, t3; y12, y23. Labels: Gaussian Prior Factors, Ranking Likelihood Factors]

Fast and efficient approximate message passing using Expectation Propagation
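For the special case of two single-player teams and no draws, the Expectation Propagation update has a closed form. The sketch below follows the update equations as published for TrueSkill. The default values (mean 25, standard deviation 25/3, performance noise beta = 25/6) are the commonly quoted defaults and are assumptions here, as are the function names; skill dynamics between games are left out.

```python
import math

DEFAULT_MU = 25.0                 # assumed default prior mean
DEFAULT_VAR = (25.0 / 3.0) ** 2   # assumed default prior variance
DEFAULT_BETA = 25.0 / 6.0         # assumed performance noise (standard deviation)


def pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def update_two_players(winner, loser, beta=DEFAULT_BETA):
    """Update (mean, variance) skill beliefs after `winner` beats `loser`."""
    mu_w, var_w = winner
    mu_l, var_l = loser
    c2 = 2.0 * beta * beta + var_w + var_l   # variance of the performance difference
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)    # mean correction from the truncated Gaussian
    w = v * (v + t)        # variance correction from the truncated Gaussian
    new_winner = (mu_w + var_w / c * v, var_w * (1.0 - var_w / c2 * w))
    new_loser = (mu_l - var_l / c * v, var_l * (1.0 - var_l / c2 * w))
    return new_winner, new_loser
```

An expected result (strong player beats weak player) gives a small v and hence a small update, whereas an upset gives a large one; in both cases the variances shrink.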

Page 29: TrueSkill: Superfast convergence to True Skills

[Plot: Level (0 to 40) against Games played (0 to 400). Curves: char (Halo 2 Beta), SQLwildman (Halo 2 Beta), char (TrueSkill™), SQLwildman (TrueSkill™)]

Page 30: Applications to Online Gaming

• Leaderboard
– Global ranking of all players

• Matchmaking
– For gamers: Most uncertain outcome
– For inference: Most informative
– Both are equivalent!
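A simple way to act on the "most uncertain outcome" criterion is to predict the win probability for each candidate opponent and pick the one closest to 50%. The sketch below does this for Gaussian skill beliefs. The selection rule, the candidate names and the value of beta are assumptions for illustration, not the matchmaking rule used in production.

```python
import math


def cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(player, opponent, beta=25.0 / 6.0):
    """P(player beats opponent) for (mean, variance) skill beliefs."""
    mu_p, var_p = player
    mu_o, var_o = opponent
    return cdf((mu_p - mu_o) / math.sqrt(2.0 * beta * beta + var_p + var_o))


def most_uncertain_opponent(player, candidates, beta=25.0 / 6.0):
    """Name of the candidate whose match outcome is closest to a coin flip."""
    return min(
        candidates,
        key=lambda name: abs(win_probability(player, candidates[name], beta) - 0.5),
    )
```

An outcome near 50% is also the one whose observation carries the most information about the players' relative skills, which is why the two criteria on the slide coincide.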

Page 31: TrueSkill in Xbox 360 and Halo 3

Page 32: AdPredictor

Joint work with Joaquin Quiñonero Candela, Onno Zoeter, Tom Borchert, Phillip Trelford

Page 33: Why Predict Probability-of-Click?

• Advantages of improved probability estimates:
– Increase user satisfaction by better targeting
– Fairer charges to advertisers
– Increase revenue by showing ads with high click-thru rate

• Display (according to expected revenue)
– $1.00 * 10% = $0.10
– $2.00 * 4% = $0.08
– $0.10 * 50% = $0.05

• Charge (per click)
– $0.80
– $1.25
– $0.05
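The numbers on this slide are consistent with ranking ads by bid times click probability and charging each ad the smallest per-click price that would keep its position, that is, the next ad's expected revenue divided by its own click probability ($0.08 / 10% = $0.80 and $0.05 / 4% = $1.25). The sketch below reproduces that arithmetic. The pricing rule is inferred from the figures rather than stated on the slide, the ad names are hypothetical, and the rule for the last position (shown as $0.05) is not recoverable, so it is left undefined.

```python
# (name, bid per click in dollars, predicted click probability); names are hypothetical.
ADS = [("A", 1.00, 0.10), ("B", 2.00, 0.04), ("C", 0.10, 0.50)]


def rank_and_charge(ads):
    """Return (name, expected revenue, charge per click) in display order."""
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    rows = []
    for position, (name, bid, p_click) in enumerate(ranked):
        expected_revenue = bid * p_click
        if position + 1 < len(ranked):
            _, next_bid, next_p = ranked[position + 1]
            # Smallest price per click that still outranks the next ad.
            charge = next_bid * next_p / p_click
        else:
            charge = None  # rule for the last slot is not given in the slides
        rows.append((name, expected_revenue, charge))
    return rows
```

Both the ranking and the charges depend directly on the predicted click probabilities, so a miscalibrated estimate changes both what is shown and what advertisers pay.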

Page 34: adPredictor Details

[Diagram: sparse input features grouped by type. Client IP: 102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86. MatchType: Exact Match, Broad Match. Position: ML-1, SB-1, SB-2. Remaining labels: a "+" node and the output P(pClick)]

Page 35: Training Algorithm in Action

[Factor graph diagram. Labels: w1, w2, +, s, c]
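The diagram suggests a model in which each active feature value has a Gaussian weight belief, the active weights are summed into a score s, and the click c depends on that score. The sketch below follows the online update as described in the published adPredictor paper (a Bayesian probit regression with factorised Gaussian beliefs). The feature names, the unit prior and beta = 1 are assumptions for illustration.

```python
import math


def pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predict(weights, active, beta=1.0):
    """Predicted click probability for the set of active feature values."""
    mean = sum(weights[f][0] for f in active)
    variance = beta * beta + sum(weights[f][1] for f in active)
    return cdf(mean / math.sqrt(variance))


def update(weights, active, clicked, beta=1.0):
    """Update the (mean, variance) belief of every active weight in place."""
    y = 1.0 if clicked else -1.0
    mean = sum(weights[f][0] for f in active)
    total_var = beta * beta + sum(weights[f][1] for f in active)
    total_sd = math.sqrt(total_var)
    t = y * mean / total_sd
    v = pdf(t) / cdf(t)   # mean correction
    w = v * (v + t)       # variance correction
    for f in active:
        mu, var = weights[f]
        weights[f] = (mu + y * var / total_sd * v, var * (1.0 - var / total_var * w))


def fresh_weights(features):
    """Assumed prior: every weight starts at mean 0, variance 1."""
    return {f: (0.0, 1.0) for f in features}
```

Weights with large variance move the most after an observation, so rarely seen feature values adapt quickly while well-established ones stay stable.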

Page 36: Client IP: Mean & Variance

Page 37: UserAgent: Mean Posterior Effects

Page 38: AdPredictor in Bing Search Engine

• AdPredictor now serves 100% of Paid Search traffic in Microsoft’s Bing Search Engine
• Relevance and Click-Through Rate of Ads improved
• Calibrated CTR prediction provides solid foundation for further improvements
• AdPredictor explored for other tasks such as contextual and display advertising

Page 39: Matchbox

Joint work with David Stern and Ralf Herbrich

Page 40: Collaborative Filtering

[Ratings matrix diagram. Rows: Users A, B, C, D. Columns: Items 1 to 6. Unknown entries marked "?". Annotation: Metadata?]

Page 41: Map Sparse Features To ‘Trait’ Space

[Diagram: sparse user and item features mapped to trait space. User ID: 234566, 456457, 13456, 654777. Gender: Male, Female. Country: UK, USA. Height: 1.2m. Item ID: 34, 345, 64, 5474. Movie Genre: Horror, Drama, Documentary, Comedy]

Page 42: Message Passing For Matchbox

[Factor graph diagram. Labels: r; s1, s2; t1, t2; u11, u21, u12, u22; v11, v21, v12, v22; u01, u02; product (*) and sum (+) nodes]

Message update functions powered by Infer.NET
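The structure of the diagram (sums feeding trait variables s and t, products combining them, and a final sum giving r) matches a bilinear model: user features are mapped to a user trait vector, item features to an item trait vector, and the predicted rating is their inner product plus a bias. The sketch below evaluates that model with point estimates only. The real system maintains Gaussian beliefs over every weight and updates them by message passing, and all names and numbers here are hypothetical.

```python
def trait_vector(weights, active_features):
    """One trait value per row of `weights`: the sum over the active features."""
    return [sum(row[f] for f in active_features) for row in weights]


def predicted_rating(user_weights, item_weights, user_features, item_features, bias=0.0):
    """Inner product of the user and item trait vectors, plus a bias."""
    s = trait_vector(user_weights, user_features)   # user traits
    t = trait_vector(item_weights, item_features)   # item traits
    return sum(a * b for a, b in zip(s, t)) + bias


# Hypothetical two-dimensional trait space.
U = [
    {"user:234566": 0.5, "gender:male": 0.1},
    {"user:234566": -0.2, "gender:male": 0.3},
]
V = [
    {"item:34": 1.0, "genre:horror": 0.4},
    {"item:34": 0.0, "genre:horror": -1.0},
]
```

Because ID features and metadata features are treated alike, a new user or item with no rating history still gets a trait vector from its metadata alone.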

Page 43: User/Item Taste Space

‘Preference Cone’ for user 145035

Page 44: Applications

Page 45: Conclusions

Page 46: Conclusions

• Great variety of data sources and tasks
• Challenges: privacy, incentives, exploration
• Tools: SQL, NoSQL, HPC
• Modelling platform (Factor Graphs & PQL):
– Represent uncertainty
– Composable models
– Distributed, data-centric computation
• Applications: TrueSkill, AdPredictor, Matchbox
• Thanks!