camcube: exploring workload-specific appliances · 2014-06-20 · microsoft research semantics –...

27
Andy Gordon, Lang.NEXT 2012 Joint work with Mihhail Aizatulin, Johannes Borgström, Michael Greenberg, and James Margetson http://research.microsoft.com/fun PROBABILISTIC PROGRAMMING

Upload: others

Post on 27-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

Andy Gordon, Lang.NEXT 2012 Joint work with Mihhail Aizatulin, Johannes Borgström, Michael Greenberg, and James Margetson http://research.microsoft.com/fun

PROBABILISTIC PROGRAMMING

Page 2: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Machine Learning and Programming

“Data widely available; what is scarce is the ability to extract wisdom from them” , Hal Varian, 2010

“Machine learning!”, Mundie and Schmidt at Davos, 2011

Researchers use Bayesian statistics as unifying principle: Models are conditional probabilities; inference algorithms separate

For the programmer. what’s the problem? Cottage industry of inflexible libraries and algorithms

Implementations are 1000s LOC , mixture of model and algorithm

Probabilistic programming offers a solution Write your model as succinct, adaptable probabilistic program

Run compiler to get efficient inference code

2

Page 3: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

Judea Pearl, Turing Award Winner 2011

For fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning.

He identified uncertainty as a core problem faced by intelligent systems and developed an algorithmic interpretation of probability theory as an effective foundation for the representation and acquisition of knowledge.

3

Page 4: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Probabilistic Graphical Models

Pioneered by Bayes Networks (Pearl 1988) Model of world, both observed and unobserved states

Probabilistic for uncertainty: missing data, noise, how data arises

Graphical notations capture dependence, for scalability

Pearl “invented message-passing algorithms that exploit graphical structure to perform probabilistic reasoning effectively”

Many application areas: “natural language processing, speech processing, computer vision, robotics, computational biology, and error-control coding”

In last few years, large-scale deployments include: TrueSkill – How do we rank Halo players?

AdPredictor – How likely is a user to click on this ad?

4

Page 5: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Infer.NET (since 2006)

A .NET library for probabilistic inference

Multiple inference algorithms on graphs

Far fewer LOC than coding inference directly

Designed for large scale inference

User extensible

Supports rapid prototyping and deployment of Bayesian learning algorithms

Graphs represented by object model for pseudo code, but not as runnable code

Realization: language geeks can do machine learning, without comprehensive understanding of Bayesian stats, message-passing, etc

5

Page 6: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH 6

Page 7: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Infer.NET Fun – New Feature

Bayesian inference by functional programming Write your model in F#

Run forwards to synthesize data

Run backwards to infer parameters

Benefits: Models are simply code in F#’s simple succinct syntax

Higher-level features than C# OM: tuples, records, array comprehensions,functions

Custom graphical notations (“plates”,”gates”) just code

Testing inference by running forwards then backwards

7

Page 8: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH 8

Distributions as Programs

Page 9: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Murder Mystery in Fun

MustardScarlett

0

0.1

0.2

0.3

0.4

0.5

0.6

pipe gun

Page 10: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Murder Mystery in Fun

MustardScarlett

0

0.1

0.2

0.3

0.4

0.5

0.6

pipe gun

MustardScarlett

0

0.2

0.4

0.6

0.8

1

pipe gun

Page 11: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Semantics – Runs

Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r) r true

random (Bernoulli r) 1-r false

P 1 Q if there is deterministic step from P to Q

A run of our example: let s = B(0.3) let g = (if s then B(0.03) else B(0.8)) obs(g=true) s

0.7 let g = (if false then B(0.03) else B(0.8)) obs(g=true) false

1 let g = B(0.8) obs(g=true) false

0.8 obs(true=true) false

1 false

The probability of this run is 0.7 1 0.8 1 = 0.56

The value of this run is false

This run is valid, meaning each observation succeeds

11

Page 12: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Semantics – Distributions

The semantics of P is a probability distribution: a function from each possible value v to Pr(Value=v | Valid), the conditional probability that valid runs return v

The conditional probability is: Pr(Value=v Valid) / Pr(Valid)

In our example, we have 4 runs, and the semantics is: Pr(Value=true | Valid) = (0.009) / (0.56 + 0.009) = 0.016

Pr(Value=false | Valid) = (0.56) / (0.56 + 0.009) = 0.984

12

not gun gun not Scarlett 0.14 0.56 Scarlett 0.291 0.009

Each row consists of runs for result (Scarlett) Valid here means gun=true

Page 13: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH 13

Programming in Infer.NET Fun

Page 14: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Run backwards by transforms to intermediate forms via Infer.NET

Edit, run forwards with F# in Visual Studio to debug distributions and to test inference by sampling synthetic data

Probabilistic Programming Platform

14

-40

-30

-20

-10

0

10

0

-40

-30

-20

-10

0

10

0

-30

-20

-10

0

10

0

Page 15: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Demo: Linear Regression

Linear regression: Forwards, compute yi = axi + b + noise from a and b Backwards, given yi infer a and b

15

-35

-30

-25

-20

-15

-10

-5

0

5

10

0 5 10 15 20 25

true a: -1.422354626 true b: 7.171306243 true prec: 0.1829893437

Page 16: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

TrueSkill in Fun

0

0.05

0.1

0 10 20

Alice

Bob

Cyd

Page 17: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

TrueSkill in Fun

0

0.05

0.1

0 10 20

Alice

Bob

Cyd

-0.05

0

0.05

0.1

0.15

0.2

0 10 20

Alice

Bob

Cyd

Page 18: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

TrueSkill

18

0

5

10

15

20

25

30

35

40

0 20 40 60 80 100

A ANDERSSEN

L PAULSEN

W STEINITZ

CAPT MACKENZIE

I KOLISCH

P MORPHY

S WINAWER

J BLACKBURNE

J ZUKERTORT

A BURN

J MASON

M CHIGORIN

I GUNSBERG

S TARRASCH

D JANOWSKI

R TEICHMAN

E LASKER

G MAROCZY

H PILLSBURY

R CHAROUSEK

C SCHLECHTER

F MARSHALL

Page 19: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

TrueSkill

19

0

5

10

15

20

25

30

35

40

0 20 40 60 80 100

A ANDERSSEN

L PAULSEN

W STEINITZ

CAPT MACKENZIE

I KOLISCH

P MORPHY

S WINAWER

J BLACKBURNE

J ZUKERTORT

A BURN

J MASON

M CHIGORIN

I GUNSBERG

S TARRASCH

D JANOWSKI

R TEICHMAN

E LASKER

G MAROCZY

H PILLSBURY

R CHAROUSEK

C SCHLECHTER

F MARSHALL

Page 20: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Topic Modeling

Discover main themes in collection of documents Organise and present collection according to themes

Latent Dirichlet Allocation: Intuitively, each document exhibits several topics

A topic is a distribution over a vocabulary

20

Document Bag of words Cats cat, milk, meow, kitten Dogs dog, puppy, bark, bone Cats and Dogs cat, dog, kitten, random word

Page 21: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Topic Modeling

Discover main themes in collection of documents Organise and present collection according to themes

Latent Dirichlet Allocation: Intuitively, each document exhibits several topics

A topic is a distribution over a vocabulary

21

Document Bag of words Cats cat, milk, meow, kitten Dogs dog, puppy, bark, bone Cats and Dogs cat, dog, kitten, random word

Page 22: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Some Related Probabilistic Languages

BUGS (Spiegelhalter et al 1994, CU) – no conditionals

BLOG (Milch et al 2005, UCB/MIT) – Gibbs sampling

Alchemy (Domingos et al 2005, UW) – probabilistic logic programming

CHURCH (Goodman et al 2008, MIT) – recursive probabilistic functional programming

HANSEI (Kiselyov and Shan, 2009) – discrete distributions from OCaml

22

Page 23: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Infer.NET Fun Samples

23

Basic Examples of Discrete Distributions

Coins: couple of Bernoulli distributions Murder mystery: another couple of Bernoulli distributions

Basic Examples of Continuous Distributions

LearningGaussian: Gaussian distribution MixtureOfGaussians: arrays of Gaussian distributions, illustrates symmetry breaking, and customising the default inference engine. TruncatedGaussian: RegisterArray and InferModule

Applications Linear Regression: fit line to data ClinicalTrial: answering whether drug effective BayesPointMachine: the BPM classifier ClickGraph inferring relevance of documents from clicks Conference: generative model of conference reviews; illustrates arrays of records; and zipping arrays TrueSkill: inference of players' skills, driven by chess data; illustrates arrays of tuples Latent Dirichlet Allocation: topic modeling for documents

Extensibility AddFun: extending the Infer.NET Fun API with new operators

Page 24: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Call to Action: Share the Love

We’re after your feedback and contributions

If you like F#, you’ll enjoy learning about Bayesian reasoning by coding with Infer.NET Fun

Download our samples and code: http://research.microsoft.com/fun

Try running them

Try coding new models – we’ll cherish your help!

24

Page 25: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

The Challenge and Opportunity

25

We ran a brief analysis on the tools Kagglers used and wanted to share the results. The open source package R was a clear favorite, with 543 of the 1714 users listing their tools including it. Matlab came in second with 218 users. The graph shows the tools that at least 20 users listed in their profile. Pasted from <http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools/>

Page 26: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Summary

Probabilistic programming enables succinct, declarative expressions of Bayesian reasoning

Compiler takes care of efficient implementation

Many varieties – eg of skill or topic inference – obtained by tweaking program – advantage over fixed libraries

An opportunity for PL technology to democratize machine learning – Lang.NEXT may well be probabilistic!

http://research.microsoft.com/fun

26

Page 27: CamCube: Exploring Workload-specific appliances · 2014-06-20 · MICROSOFT RESEARCH Semantics – Runs Let P p Q mean that P evolves to Q with probability p. random (Bernoulli r)

MICROSOFT RESEARCH

Resources

Infer.NET http://research.microsoft.com/infernet

Infer.NET Fun http://research.microsoft.com/fun

Link collection http://probabilistic-programming.org

ACM A.M. Turing Award Winner 2011, Judea Pearl http://amturing.acm.org/award_winners/pearl_2658896.cfm

Daphne Koller online course on Probabilistic Graphical Models http://www.pgm-class.org/

TrueSkill http://research.microsoft.com/en-us/projects/trueskill/

27