Applied Bayesian Inference with PyMC
@MrSantoni
Which color will sell more?
[Figure: mockups of Page A and Page B, each showing the product "A Tea Pot", placeholder text and a BUY button]
Page A: #buy / N
Page B: #buy / N
• What if N is small?
• What N do we need for 90% confidence?
• What if N is different on A and B?
Bayesian Inference
Probability:
Claim: we think Bayesian
Frequentist: frequency
Bayesian: belief
test 1 test 2 test 3
Claim: we think Bayesian
no-bugs confidence
Bayesian Inference =
update your beliefs
new evidence
prior belief
The Developer View
Statistical Problem
def frequentist(): return 80%
def bayesian(): return <a distribution over 0% to 100%>
How to?
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Closed-form solution:
Realistic Cases
Toy Examples
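One toy example where a closed form does exist is a Uniform(0, 1) prior on a probability combined with Bernoulli observations: for C successes in N trials the posterior is Beta(1 + C, 1 + N - C). The sketch below is not from the slides. It uses only the Python standard library, the helper name `beta_posterior` is made up for illustration, and the data (7 clicks out of 100 visitors) are hypothetical.

```python
import math


def beta_posterior(clicks, visitors):
    """Closed-form posterior for a Uniform(0, 1) prior and Bernoulli data.

    Returns the Beta parameters plus the analytic mean and standard deviation.
    """
    alpha = 1 + clicks              # 1 from the uniform prior + observed successes
    beta = 1 + visitors - clicks    # 1 from the uniform prior + observed failures
    mean = alpha / (alpha + beta)
    variance = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return alpha, beta, mean, math.sqrt(variance)


# Hypothetical data: 7 clicks out of 100 visitors.
alpha, beta, mean, std = beta_posterior(clicks=7, visitors=100)
print("Posterior: Beta(%d, %d), mean %.4f, std %.4f" % (alpha, beta, mean, std))
```

As soon as the model gets more parameters or a less convenient prior, a formula like this is usually not available, which is where sampling comes in.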
PyMC
• Perform Bayesian Inference
• Markov Chain Monte Carlo techniques
• A.k.a. Probabilistic Programming
Show me the code!
Example A/B test
Only one difference between A and B
Assume there is:
• p_a: probability of clicking BUY when landing on A
• p_b: probability of clicking BUY when landing on B
How to compute p_a and p_b?
Page A
– N_a visitors
– C_a BUY-clicks on page A
Page B
– N_b visitors
– C_b BUY-clicks on page B
Frequentist: C_a / N_a
BUT: Observed frequency does not necessarily equal p_a
Bayesian: Infer true frequency from observed data
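To see why the observed frequency C_a / N_a need not equal p_a, one can repeat the same experiment many times with a fixed, known probability and look at how much the observed frequency moves. This is a small simulation sketch using only the standard library; the value 0.05, the number of repetitions and the function name `observed_frequency` are assumptions chosen for illustration.

```python
import random


def observed_frequency(p_true, n_visitors, rng):
    """Simulate n_visitors Bernoulli(p_true) visits and return C / N."""
    clicks = sum(1 for _ in range(n_visitors) if rng.random() < p_true)
    return clicks / n_visitors


rng = random.Random(42)
p_a_true = 0.05  # assumed "true" click probability (unknown in a real test)

spread = {}
for n in (50, 1500):
    # Repeat the same experiment 200 times and record the observed frequencies.
    freqs = [observed_frequency(p_a_true, n, rng) for _ in range(200)]
    spread[n] = max(freqs) - min(freqs)
    print("N = %d: observed frequency ranges from %.3f to %.3f"
          % (n, min(freqs), max(freqs)))
```

With few visitors the observed frequency scatters widely around the true value, so a single C / N says little about p_a on its own.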
Bayesian Workflow
1. Define prior
2. Fit to observations
3. Get posteriors
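The three steps can be sketched without any library by evaluating everything on a grid of candidate values for p. This is a grid approximation, not the MCMC approach PyMC uses, and the data (3 clicks out of 40 visitors) and the name `grid_posterior` are hypothetical.

```python
import math


def grid_posterior(clicks, visitors, grid_size=1000):
    """Approximate the posterior of p on a grid, assuming a Uniform(0, 1) prior."""
    # Midpoints of equal-width bins, so p is never exactly 0 or 1.
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]

    # 1. Define prior: uniform, every grid point gets the same weight.
    prior = [1.0 / grid_size] * grid_size

    # 2. Fit to observations: Bernoulli log-likelihood of the observed clicks.
    log_like = [clicks * math.log(p) + (visitors - clicks) * math.log(1.0 - p)
                for p in grid]
    max_ll = max(log_like)  # subtract the maximum for numerical stability

    # 3. Get posterior: prior times likelihood, normalised to sum to 1.
    unnormalised = [w * math.exp(ll - max_ll) for w, ll in zip(prior, log_like)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]


grid, posterior = grid_posterior(clicks=3, visitors=40)
posterior_mean = sum(p * w for p, w in zip(grid, posterior))
print("Posterior mean of p: %.4f" % posterior_mean)
```

A grid works for one parameter but scales badly with more of them, which is one reason to sample instead.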
from pymc import Uniform, rbernoulli, Bernoulli, MCMC
from matplotlib import pyplot as plt

p_A_true = 0.05
N = 1500
occurrences = rbernoulli(p_A_true, N)

print 'Click-BUY:'
print occurrences.sum()
print 'Observed frequency:'
print occurrences.sum() / float(N)

Click-BUY:
68
Observed frequency:
0.0453333333333
Clicking BUY
Bernoulli distribution
$$P(\text{click}) = \begin{cases} p & \text{click} = 1 \\ 1 - p & \text{click} = 0 \end{cases}$$
[Figure: bar chart of P(click) for click=1 and click=0]
p_A = Uniform('p_A', lower=0, upper=1)
[Figure: uniform prior density of p_A between 0 and 1]

print p_A.random()
print p_A.value

array(0.906086144982998)
array(0.906086144982998)

print p_A.random()
print p_A.value

array(0.285313846133313)
array(0.285313846133313)
p_A = Uniform('p_A', lower=0, upper=1)
obs = Bernoulli('obs', p_A, value=occurrences, observed=True)

mcmc = MCMC([p_A, obs])
mcmc.sample(20000, 1000)
print mcmc.trace('p_A')[:]

[------- 20% ] 4053 of 20000 complete in 0.5 sec
[------------- 36% ] 7315 of 20000 complete in 1.0 sec
[-----------------53% ] 10627 of 20000 complete in 1.5 sec
[-----------------69%------ ] 13939 of 20000 complete in 2.0 sec
[-----------------81%----------- ] 16376 of 20000 complete in 2.5 sec
[-----------------96%---------------- ] 19342 of 20000 complete in 3.0 sec
[-----------------100%-----------------] 20000 of 20000 complete in 3.1 sec
[ 0.04656576 0.04656576 0.04656576 ..., 0.03803667 0.03803667 0.03803667]
plt.figure(figsize=(8, 7))
plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', normed=True)
plt.xlabel('Probability of clicking BUY')
plt.ylabel('Density')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
plt.legend()
plt.savefig('p_A_hist_N_%s.png' % N)
plt.show()
90% confidence that p_A is between X and Y?

import numpy as np

p_A_samples = mcmc.trace('p_A')[:]
lower_bound = np.percentile(p_A_samples, 5)
upper_bound = np.percentile(p_A_samples, 95)
print 'There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound)

There is 90% probability that p_A is between 0.0373019596856 and 0.0548052806892
What if N_a is lower?
from pymc import Uniform, rbernoulli, Bernoulli, MCMC
from matplotlib import pyplot as plt

p_A_true = 0.05
N = 50
occurrences = rbernoulli(p_A_true, N)

print 'Click-BUY:'
print occurrences.sum()
print 'Observed frequency:'
print occurrences.sum() / float(N)

Click-BUY:
2
Observed frequency:
0.04
p_A = Uniform('p_A', lower=0, upper=1)
obs = Bernoulli('obs', p_A, value=occurrences, observed=True)

mcmc = MCMC([p_A, obs])
mcmc.sample(20000, 1000)
print mcmc.trace('p_A')[:]

[----- 14% ] 2874 of 20000 complete in 0.5 sec
[----------- 30% ] 6035 of 20000 complete in 1.0 sec
[-----------------47% ] 9440 of 20000 complete in 1.5 sec
[-----------------63%---- ] 12775 of 20000 complete in 2.0 sec
[-----------------81%---------- ] 16203 of 20000 complete in 2.5 sec
[-----------------100%-----------------] 20000 of 20000 complete in 3.0 sec
[ 0.06240723 0.06240723 0.06240723 ..., 0.01864419 0.01864419 0.01864419]
plt.figure(figsize=(8, 7))
plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', normed=True)
plt.xlabel('Probability of clicking BUY')
plt.ylabel('Density')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
plt.legend()
plt.savefig('p_A_hist_N_%s.png' % N)
plt.show()
90% confidence that p_A is between X and Y?

import numpy as np

p_A_samples = mcmc.trace('p_A')[:]
lower_bound = np.percentile(p_A_samples, 5)
upper_bound = np.percentile(p_A_samples, 95)
print 'There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound)

There is 90% probability that p_A is between 0.0160966147705 and 0.114655284797
[Figure: posterior of p_A for N_a = 1500 and for N_a = 50]
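The effect of N_a on the width of the 90% interval can be cross-checked without MCMC, because a Uniform prior with Bernoulli data has the closed-form posterior Beta(1 + C, 1 + N - C). The sketch below draws from that Beta distribution with the standard library instead of running PyMC. The click counts (68 of 1500 and 2 of 50) are the ones printed in the two runs above, and the function name `credible_interval` is made up.

```python
import random


def credible_interval(clicks, visitors, mass=0.90, draws=20000, seed=1):
    """Central credible interval from samples of Beta(1 + C, 1 + N - C)."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(1 + clicks, 1 + visitors - clicks)
                     for _ in range(draws))
    tail = (1.0 - mass) / 2.0
    lower = samples[int(tail * draws)]              # approx. 5th percentile
    upper = samples[int((1.0 - tail) * draws) - 1]  # approx. 95th percentile
    return lower, upper


lo_large, hi_large = credible_interval(clicks=68, visitors=1500)
lo_small, hi_small = credible_interval(clicks=2, visitors=50)
print("N_a = 1500: %.4f to %.4f" % (lo_large, hi_large))
print("N_a = 50:   %.4f to %.4f" % (lo_small, hi_small))
```

The interval for N_a = 50 should come out several times wider than the one for N_a = 1500, in line with the two MCMC results.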
Does the red have a larger probability of being clicked?
from pymc import Uniform, rbernoulli, Bernoulli, MCMC, deterministic
from matplotlib import pyplot as plt

p_A_true = 0.05
p_B_true = 0.04
N_A = 1500
N_B = 750

occurrences_A = rbernoulli(p_A_true, N_A)
occurrences_B = rbernoulli(p_B_true, N_B)

print 'Observed frequency:'
print 'A'
print occurrences_A.sum() / float(N_A)
print 'B'
print occurrences_B.sum() / float(N_B)

Observed frequency:
A
0.0533333333333
B
0.0413333333333
p_A = Uniform('p_A', lower=0, upper=1)
p_B = Uniform('p_B', lower=0, upper=1)

@deterministic
def delta(p_A=p_A, p_B=p_B):
    return p_A - p_B

obs_A = Bernoulli('obs_A', p_A, value=occurrences_A, observed=True)
obs_B = Bernoulli('obs_B', p_B, value=occurrences_B, observed=True)
mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)

[----- 14% ] 3561 of 25000 complete in 0.5 sec
[--------- 25% ] 6332 of 25000 complete in 1.0 sec
[------------ 33% ] 8454 of 25000 complete in 1.5 sec
[--------------- 41% ] 10499 of 25000 complete in 2.0 sec
[-----------------50% ] 12602 of 25000 complete in 2.5 sec
[-----------------59%-- ] 14780 of 25000 complete in 3.0 sec
[-----------------67%----- ] 16883 of 25000 complete in 3.5 sec
[-----------------75%-------- ] 18954 of 25000 complete in 4.0 sec
[-----------------83%----------- ] 20877 of 25000 complete in 4.5 sec
[-----------------91%-------------- ] 22924 of 25000 complete in 5.0 sec
[-----------------100%-----------------] 25000 of 25000 complete in 5.5 sec

p_A_samples = mcmc.trace('p_A')[:]
p_B_samples = mcmc.trace('p_B')[:]
delta_samples = mcmc.trace('delta')[:]
plt.subplot(3,1,1)
plt.xlim(0, 0.1)
plt.hist(p_A_samples, bins=35, histtype='stepfilled', normed=True, color='blue', label='Posterior of p_A')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A (unknown)')
plt.xlabel('Probability of clicking BUY via A')
plt.legend()

plt.subplot(3,1,2)
plt.xlim(0, 0.1)
plt.hist(p_B_samples, bins=35, histtype='stepfilled', normed=True, color='green', label='Posterior of p_B')
plt.vlines(p_B_true, 0, 90, linestyle='--', label='True p_B (unknown)')
plt.xlabel('Probability of clicking BUY via B')
plt.legend()

plt.subplot(3,1,3)
plt.xlim(0, 0.1)
plt.hist(delta_samples, bins=35, histtype='stepfilled', normed=True, color='red', label='Posterior of delta')
plt.vlines(p_A_true - p_B_true, 0, 90, linestyle='--', label='True delta (unknown)')
plt.xlabel('p_A - p_B')
plt.legend()

plt.savefig('A_and_B.png')
plt.show()
p_A > p_B
How confident are we?
print 'Probability that p_A > p_B:'
print (delta_samples > 0).mean()

Probability that p_A > p_B:
0.8919
N_A = 1500, N_B = 750
N_A = 1500, N_B = 200

print 'Probability that p_A > p_B:'
print (delta_samples > 0).mean()

Probability that p_A > p_B:
0.73455
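The same quantity can be estimated without MCMC by drawing p_A and p_B from their closed-form Beta posteriors and counting how often p_A comes out larger. This sketch uses only the standard library. The click counts for the N_B = 750 run (80 of 1500 and 31 of 750) are back-computed from the observed frequencies printed earlier, and `prob_a_beats_b` is a made-up name.

```python
import random


def prob_a_beats_b(clicks_a, n_a, clicks_b, n_b, draws=50000, seed=2):
    """Monte Carlo estimate of P(p_A > p_B) under independent uniform priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Closed-form posteriors: Beta(1 + C, 1 + N - C) for each page.
        p_a = rng.betavariate(1 + clicks_a, 1 + n_a - clicks_a)
        p_b = rng.betavariate(1 + clicks_b, 1 + n_b - clicks_b)
        if p_a > p_b:
            wins += 1
    return wins / draws


prob = prob_a_beats_b(clicks_a=80, n_a=1500, clicks_b=31, n_b=750)
print("Probability that p_A > p_B: %.3f" % prob)
```

The estimate should land close to the MCMC figure for N_B = 750, up to sampling noise.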
MCMC
mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)
Posterior P(p_A, p_B, delta | obs_A, obs_B) as samples
25000 iterations
5000 burn-in
Metropolis-Hastings algorithm
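To give a feel for what the sampler does, here is a minimal random-walk Metropolis sketch for a single click probability with a Uniform(0, 1) prior. It is not PyMC's implementation: it uses a symmetric Gaussian proposal (the plain Metropolis special case of Metropolis-Hastings), a fixed step size, and only the standard library. The step size, starting point and function names are assumptions.

```python
import math
import random


def log_posterior(p, clicks, visitors):
    """Unnormalised log posterior: Uniform(0, 1) prior times Bernoulli likelihood."""
    if p <= 0.0 or p >= 1.0:
        return float("-inf")  # outside the support of the prior
    return clicks * math.log(p) + (visitors - clicks) * math.log(1.0 - p)


def metropolis(clicks, visitors, iterations=25000, burn_in=5000, step=0.01, seed=3):
    """Return the post-burn-in trace of a random-walk Metropolis chain."""
    rng = random.Random(seed)
    current = 0.5  # arbitrary starting point
    current_lp = log_posterior(current, clicks, visitors)
    trace = []
    for i in range(iterations):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_posterior(proposal, clicks, visitors)
        # Accept with probability min(1, posterior(proposal) / posterior(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            trace.append(current)  # rejected proposals repeat the current value
    return trace


trace = metropolis(clicks=80, visitors=1500)
trace_mean = sum(trace) / len(trace)
print("Kept %d samples, posterior mean of p is about %.4f" % (len(trace), trace_mean))
```

The burn-in discards the early part of the chain, where it is still travelling from the starting point towards the region of high posterior probability.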
Open the black box
mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)
from pymc.Matplot import plot as mcplot
mcplot(mcmc)
PyMC
• Easy to interpret results
  – confidence, no p-values!
• No crazy math
• Computationally expensive
Thank you
Back
Serie A 13/14
Date        HomeTeam    AwayTeam    FTHG  FTAG  FTR  HTHG  HTAG  HTR
24/08/2013  Sampdoria   Juventus    0     1     A    0     0     D
24/08/2013  Verona      Milan       2     1     H    1     1     D
25/08/2013  Cagliari    Atalanta    2     1     H    1     1     D
25/08/2013  Inter       Genoa       2     0     H    0     0     D
25/08/2013  Lazio       Udinese     2     1     H    2     0     H
25/08/2013  Livorno     Roma        0     2     A    0     0     D
25/08/2013  Napoli      Bologna     3     0     H    2     0     H
25/08/2013  Parma       Chievo      0     0     D    0     0     D
25/08/2013  Torino      Sassuolo    2     0     H    1     0     H
26/08/2013  Fiorentina  Catania     2     1     H    2     1     H
31/08/2013  Chievo      Napoli      2     4     A    2     2     D
31/08/2013  Juventus    Lazio       4     1     H    2     1     H
01/09/2013  Atalanta    Torino      2     0     H    0     0     D
01/09/2013  Bologna     Sampdoria   2     2     D    1     1     D
01/09/2013  Catania     Inter       0     3     A    0     1     A
01/09/2013  Genoa       Fiorentina  2     5     A    0     3     A
01/09/2013  Milan       Cagliari    3     1     H    2     1     H
01/09/2013  Roma        Verona      3     0     H    0     0     D
01/09/2013  Sassuolo    Livorno     1     4     A    0     1     A
01/09/2013  Udinese     Parma       3     1     H    1     0     H
14/09/2013  Inter       Juventus    1     1     D    0     0     D
14/09/2013  Napoli      Atalanta    2     0     H    0     0     D
14/09/2013  Torino      Milan       2     2     D    0     0     D
15/09/2013  Fiorentina  Cagliari    1     1     D    0     0     D
https://datahub.io/dataset/italian-football-data-serie-a-b
Win-rate
Did it change?
Bayesian Workflow
1. Define Prior
2. Fit to observations
3. Get Posteriors
Winning a Match
Bernoulli distribution
$$P(w) = \begin{cases} p & w = 1 \\ 1 - p & w = 0 \end{cases}$$
[Figure: bar chart of P(w) for Win (w=1) and Lose (w=0)]
𝑝 : switchpoint?
Model the switchpoint
$$p = \begin{cases} p_1 & t < \tau \\ p_2 & t \geq \tau \end{cases}$$
Goal -> infer p_1, p_2 and τ
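To make the switchpoint model concrete, the sketch below builds the per-match win probability (p_1 before the switchpoint, p_2 from the switchpoint on) and simulates one season of wins and losses from it. It uses only the standard library. The values 0.3, 0.6 and a switchpoint at match 20 are invented for illustration, and both function names are hypothetical.

```python
import random


def switchpoint_probabilities(p_1, p_2, tau, num_matches):
    """Per-match win probability: p_1 for t < tau, p_2 for t >= tau."""
    return [p_1 if t < tau else p_2 for t in range(num_matches)]


def simulate_season(p_1, p_2, tau, num_matches=38, seed=4):
    """Draw one Bernoulli win/lose outcome per match from the switchpoint model."""
    rng = random.Random(seed)
    probs = switchpoint_probabilities(p_1, p_2, tau, num_matches)
    return [1 if rng.random() < p else 0 for p in probs]


# Invented parameters: the win probability jumps from 0.3 to 0.6 at match 20.
probs = switchpoint_probabilities(0.3, 0.6, tau=20, num_matches=38)
wins = simulate_season(0.3, 0.6, tau=20)
print("Wins before the switchpoint:", sum(wins[:20]))
print("Wins after the switchpoint: ", sum(wins[20:]))
```

Inference runs this story backwards: given only the sequence of wins and losses, recover plausible values for the two probabilities and the switchpoint.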
Bayesian Workflow
1. Define Prior
2. Fit to observations
3. Get Posteriors
Let’s model this
• Goal: infer the unknown p1, p2, TAU
• First step of Bayesian inference: assign a prior probability to the different possible values of p
• What would be a good prior for p1, p2? Use uniform:
  – p1 ~ Uniform(0,1)
  – p2 ~ Uniform(0,1)
  – TAU ~ DiscreteUniform(1, 38)
• P(TAU=k) = 1/38 for all k
import numpy as np
from pymc import Uniform, DiscreteUniform, deterministic, Bernoulli, Model, MCMC

p_1 = Uniform('p_1', lower=0, upper=1)
p_2 = Uniform('p_2', lower=0, upper=1)
tau = DiscreteUniform('tau', lower=1, upper=38)

print 'Random output: ', tau.random(), tau.random(), tau.random()

Random output: 14 24 33

@deterministic
def p_(tau=tau, p_1=p_1, p_2=p_2, num_matches=38):
    # concatenate p_1 and p_2 based on tau
    out = np.empty(num_matches)
    out[:tau] = p_1
    out[tau:] = p_2
    return out
Load Data
import pandas as pd

df = pd.read_csv('serie_a.csv', parse_dates=['Date'], date_parser=parse_date)

matches = df[(df.HomeTeam == 'Milan') | (df.AwayTeam == 'Milan')]
matches = matches.set_index(['Date'])
matches = compute_extra_columns(matches, team)
# some pandas manipulations occur here
matches['Win'] = …  # 1 if Milan won, 0 otherwise
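The slides leave out how the Win column is computed. One plausible way, sketched here in plain Python rather than pandas, is to read FTR as the full-time result (H for a home win, A for an away win, D for a draw, which matches the scores in the table above) and mark a match as won when Milan is on the winning side. This is an assumption about the elided step, not the author's code. The function name `milan_win_flags` is made up, and the sample rows are copied from the table.

```python
def milan_win_flags(rows, team="Milan"):
    """Return 1 for each match the team won and 0 otherwise (draws count as 0)."""
    flags = []
    for home, away, ftr in rows:
        if team not in (home, away):
            continue  # skip matches the team did not play
        won = (home == team and ftr == "H") or (away == team and ftr == "A")
        flags.append(1 if won else 0)
    return flags


# (HomeTeam, AwayTeam, FTR) taken from the table above.
rows = [
    ("Verona", "Milan", "H"),
    ("Inter", "Genoa", "H"),
    ("Milan", "Cagliari", "H"),
    ("Torino", "Milan", "D"),
]
flags = milan_win_flags(rows)
print(flags)  # prints [0, 1, 0]
```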
Fit the Model
observed_matches = Bernoulli('obs', p=p_, value=matches[['Win']], observed=True)

model = Model([observed_matches, p_1, p_2, tau])
mcmc = MCMC(model)
mcmc.sample(40000, 10000)

p_1_samples = mcmc.trace('p_1')[:]
p_2_samples = mcmc.trace('p_2')[:]
tau_samples = mcmc.trace('tau')[:]

print p_1_samples[:10]
print p_2_samples[:10]
print tau_samples[:10]

[ 0.42067236 0.42067236 0.42067236 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391]
[ 0.49213381 0.49213381 0.49213381 0.56072562 0.79863176 0.79863176 0.67416932 0.68382528 0.6069458 0.60062698]
[10 10 24 35 35 35 35 27 27 27]

plt.figure(figsize=(14.5, 10))

ax = plt.subplot(311)
ax.set_autoscaley_on(False)
plt.hist(p_1_samples, histtype='stepfilled', alpha=0.85, label='posterior of p_1', color='#A60628', normed=True, bins=30)
plt.legend(loc='upper left')

ax = plt.subplot(312)
plt.hist(p_2_samples, histtype='stepfilled', alpha=0.85, label='posterior of p_2', color='#7A68A6', normed=True, bins=30)
plt.legend(loc='upper left')

ax = plt.subplot(313)
plt.hist(tau_samples, histtype='stepfilled', alpha=0.85, label='posterior of tau', color='#467821', normed=True, bins=30)
plt.legend(loc='upper left')

plt.show()
Expected Win Probability
num_matches = 38
N = tau_samples.shape[0]
expected_p_per_match = np.zeros(num_matches)
for match in range(num_matches):
    ix = match < tau_samples
    p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
    expected_p_per_match[match] = np.percentile(p_samples_match, 50)
Compute Confidence Bounds
lower_p_per_match = np.zeros(num_matches)
upper_p_per_match = np.zeros(num_matches)
for match in range(num_matches):
    ix = match < tau_samples
    p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
    lower_p_per_match[match] = np.percentile(p_samples_match, 5)
    upper_p_per_match[match] = np.percentile(p_samples_match, 95)
Bayesian inference returns a distribution, not a single number. What have we gained? We can see the uncertainty in our estimates: the wider the posterior distribution, the less certain we should be.