Christopher M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING

Upload: ngodiep

Post on 31-Mar-2018


Page 1: Christopher M. Bishop PATTERN RECOGNITION AND MACHINE LEARNINGskirshne/teaching/STAT598L_F09/prlm-slides... · PATTERN RECOGNITION . AND. MACHINE LEARNING. Polynomial Curve Fitting

Christopher M. Bishop

PATTERN RECOGNITION AND MACHINE LEARNING

Page 2:

Polynomial Curve Fitting

Page 3:

Sum-of-Squares Error Function
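The slide's equation did not survive extraction. For reference, the standard form for an order-M polynomial fitted to N training pairs (x_n, t_n) is sketched below; the symbol names follow the usual textbook convention and are assumed rather than recovered from the slide.

```latex
y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j,
\qquad
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^2
```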

Page 4:

0th Order Polynomial

Page 5:

1st Order Polynomial

Page 6:

3rd Order Polynomial

Page 7:

9th Order Polynomial

Page 8:

Over-fitting

Root-Mean-Square (RMS) Error:
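A minimal sketch of how over-fitting shows up in the RMS error, assuming the usual definition E_RMS = sqrt(2 E(w*) / N). The noisy sine data, the noise level, the seed, and the helper names `fit_polynomial` and `rms_error` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def fit_polynomial(x, t, order):
    """Least-squares coefficients w_0..w_M minimizing the sum-of-squares error."""
    design = np.vander(x, order + 1, increasing=True)  # columns 1, x, ..., x^M
    w, *_ = np.linalg.lstsq(design, t, rcond=None)
    return w

def rms_error(x, t, w):
    """E_RMS = sqrt(2 E(w) / N), i.e. the root of the mean squared residual."""
    residual = np.vander(x, len(w), increasing=True) @ w - t
    return float(np.sqrt(np.mean(residual ** 2)))

# Assumed toy data: a sine curve with Gaussian noise.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=x_test.size)

train_rms = {}
test_rms = {}
for order in (0, 1, 3, 9):
    w = fit_polynomial(x_train, t_train, order)
    train_rms[order] = rms_error(x_train, t_train, w)
    test_rms[order] = rms_error(x_test, t_test, w)
    print(order, round(train_rms[order], 3), round(test_rms[order], 3))
```

With ten training points, the order-9 polynomial has enough freedom to pass through every point, so its training error is essentially zero; the held-out error is the quantity that reveals over-fitting.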

Page 9:

Polynomial Coefficients

Page 10:

Data Set Size:

9th Order Polynomial

Page 11:

Data Set Size:

9th Order Polynomial

Page 12:

Regularization

Penalize large coefficient values
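One common way to penalize large coefficients is a quadratic penalty (ridge regression), which has a closed-form solution. The sketch below assumes the penalty is (lam / 2) * ||w||^2 applied to all coefficients; the data, seed, and the function name `fit_regularized` are illustrative assumptions.

```python
import numpy as np

def fit_regularized(x, t, order, lam):
    """Minimize 0.5 * sum (y - t)^2 + 0.5 * lam * ||w||^2 in closed form."""
    design = np.vander(x, order + 1, increasing=True)
    gram = design.T @ design + lam * np.eye(order + 1)
    return np.linalg.solve(gram, design.T @ t)

# Assumed toy data: a sine curve with Gaussian noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

norms = {}
for lam in (1e-8, 1e-3, 1.0):
    w = fit_regularized(x, t, order=9, lam=lam)
    norms[lam] = float(np.linalg.norm(w))
    print(lam, norms[lam])
```

The coefficient norm shrinks as `lam` grows, which is the effect the penalty is designed to have.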

Page 13:

Regularization:

Page 14:

Regularization:

Page 15:

Regularization: E_RMS vs. ln λ

Page 16:

Polynomial Coefficients

Page 17:

The Gaussian Distribution

Page 18:

Gaussian Parameter Estimation

Likelihood function

Page 19:

Maximum (Log) Likelihood

Page 20:

Properties of μ_ML and σ²_ML
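The key property of the maximum likelihood estimates for a Gaussian is that the mean is unbiased while the variance is biased low by a factor (N - 1) / N. The simulation below is a small sketch of that fact; the true parameters, sample size, trial count, and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, true_var, n_points, n_trials = 0.0, 1.0, 5, 20000

# Each row is one data set of n_points draws from the true Gaussian.
samples = rng.normal(true_mu, np.sqrt(true_var), size=(n_trials, n_points))
mu_ml = samples.mean(axis=1)
var_ml = samples.var(axis=1)                # divides by N (maximum likelihood)
var_unbiased = samples.var(axis=1, ddof=1)  # divides by N - 1

print(mu_ml.mean(), var_ml.mean(), var_unbiased.mean())
```

Averaged over many data sets, `var_ml` settles near (N - 1) / N times the true variance (0.8 here), whereas the N - 1 version settles near the true variance.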

Page 21:

Curve Fitting Re-visited

Page 22:

Maximum Likelihood

Determine w_ML by minimizing the sum-of-squares error, E(w).

Page 23:

Predictive Distribution

Page 24:

MAP: A Step towards Bayes

Determine w_MAP by minimizing the regularized sum-of-squares error, Ẽ(w).
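For reference, a sketch of the objective this refers to, assuming Gaussian noise with precision β and a zero-mean isotropic Gaussian prior over w with precision α (both symbols are the standard textbook choice, assumed here). Maximizing the posterior is then equivalent to minimizing

```latex
\frac{\beta}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^2
+ \frac{\alpha}{2} \mathbf{w}^{\mathsf T} \mathbf{w}
```

which matches the regularized sum-of-squares error with regularization coefficient λ = α / β.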

Page 25:

Bayesian Curve Fitting

Page 26:

Bayesian Predictive Distribution

Page 27:

Model Selection

Cross-Validation
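A minimal sketch of S-fold cross-validation used to choose the polynomial order. The data, the fold count, the seed, and the helper name `cross_validated_rms` are illustrative assumptions; the inputs are drawn at random, so contiguous index blocks act as random folds.

```python
import numpy as np

def cross_validated_rms(x, t, order, n_folds):
    """Average held-out RMS error of a polynomial fit over n_folds folds."""
    folds = np.array_split(np.arange(x.size), n_folds)
    scores = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(x.size), held_out)
        design = np.vander(x[train], order + 1, increasing=True)
        w, *_ = np.linalg.lstsq(design, t[train], rcond=None)
        pred = np.vander(x[held_out], order + 1, increasing=True) @ w
        scores.append(np.sqrt(np.mean((pred - t[held_out]) ** 2)))
    return float(np.mean(scores))

# Assumed toy data: a sine curve with Gaussian noise at random inputs.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=40)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

scores = {order: cross_validated_rms(x, t, order, n_folds=4) for order in range(10)}
best_order = min(scores, key=scores.get)
print(best_order, scores[best_order])
```

Each candidate order is trained S times on S - 1 folds and scored on the remaining fold, so every data point is used for both training and validation.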

Page 28:

Parametric Distributions

Basic building blocks: p(x|θ)

Need to determine θ given {x_1, …, x_N}

Representation: θ* or p(θ)?

Recall Curve Fitting

Page 29:

Binary Variables (1)

Coin flipping: heads=1, tails=0

Bernoulli Distribution

Page 30:

Binary Variables (2)

N coin flips:

Binomial Distribution
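As a small worked check, the binomial probability of m heads in N flips is C(N, m) μ^m (1 - μ)^(N - m), with mean Nμ and variance Nμ(1 - μ). The values N = 10 and μ = 0.25 below are assumed for illustration.

```python
from math import comb

def binomial_pmf(m, n, mu):
    """Probability of m heads in n independent flips with P(heads) = mu."""
    return comb(n, m) * mu ** m * (1.0 - mu) ** (n - m)

n, mu = 10, 0.25
pmf = [binomial_pmf(m, n, mu) for m in range(n + 1)]
mean = sum(m * p for m, p in enumerate(pmf))
variance = sum((m - mean) ** 2 * p for m, p in enumerate(pmf))
print(sum(pmf), mean, variance)
```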

Page 31:

Binomial Distribution

Page 32:

Parameter Estimation (1)

ML for Bernoulli

Given:

Page 33:

Parameter Estimation (2)

Example:

Prediction: all future tosses will land heads up

Overfitting to D
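The slide's example data was lost in extraction. As a stand-in, assume a data set of three tosses that all landed heads; the maximum likelihood estimate is the fraction of heads, so it is driven to 1.

```python
def bernoulli_ml(observations):
    """Maximum likelihood estimate of mu: the fraction of ones (heads)."""
    return sum(observations) / len(observations)

data = [1, 1, 1]  # assumed example: three tosses, all heads
mu_ml = bernoulli_ml(data)
print(mu_ml)
```

An estimate of exactly 1 assigns zero probability to tails, which is the over-fitting the slide warns about.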

Page 34:

Beta Distribution

Distribution over μ ∈ [0, 1].

Page 35:

Bayesian Bernoulli

The Beta distribution provides the conjugate prior for the Bernoulli distribution.

Page 36:

Beta Distribution

Page 37:

Prior ∙ Likelihood = Posterior

Page 38:

Properties of the Posterior

As the size of the data set, N, increases

Page 39:

Prediction under the Posterior

What is the probability that the next coin toss will land heads up?
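With a Beta(a, b) prior, the answer is the posterior mean: the observed heads and tails are added to a and b, and the predictive probability of heads is a_N / (a_N + b_N). The sketch below assumes a Beta(2, 2) prior and three observed heads; both are illustrative choices, and the function names are hypothetical.

```python
def beta_posterior(a, b, observations):
    """Update Beta(a, b) with binary observations; returns the posterior (a, b)."""
    heads = sum(observations)
    tails = len(observations) - heads
    return a + heads, b + tails

def predict_heads(a, b):
    """Posterior predictive p(x = 1 | D): the mean of Beta(a, b)."""
    return a / (a + b)

a_n, b_n = beta_posterior(2.0, 2.0, [1, 1, 1])  # prior Beta(2, 2) is an assumption
p_heads = predict_heads(a_n, b_n)
print(a_n, b_n, p_heads)
```

Unlike the maximum likelihood estimate, the prediction stays strictly below 1 even when every observed toss is heads.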

Page 40:

Multinomial Variables

1-of-K coding scheme:

Page 41:

ML Parameter estimation

Given:

To ensure Σ_k μ_k = 1, use a Lagrange multiplier, λ.
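The derivation lost from the slide can be sketched as follows, writing m_k = Σ_n x_nk for the number of observations in state k (notation assumed). Maximize the constrained log likelihood, set the derivative with respect to μ_k to zero, and then apply the constraint:

```latex
\sum_{k=1}^{K} m_k \ln \mu_k + \lambda \Bigl( \sum_{k=1}^{K} \mu_k - 1 \Bigr)
\;\Rightarrow\;
\mu_k = -\frac{m_k}{\lambda},
\qquad
\sum_{k=1}^{K} \mu_k = 1 \;\Rightarrow\; \lambda = -N,
\qquad
\mu_k^{\mathrm{ML}} = \frac{m_k}{N}
```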

Page 42:

The Multinomial Distribution

Page 43:

The Dirichlet Distribution

Conjugate prior for the multinomial distribution.

Page 44:

Bayesian Multinomial (1)

Page 45:

Bayesian Multinomial (2)

Page 46:

The Gaussian Distribution

Page 47:

Maximum Likelihood for the Gaussian (1)

Given i.i.d. data X = (x_1, …, x_N)^T, the log likelihood function is given by

Sufficient statistics

Page 48:

Maximum Likelihood for the Gaussian (2)

Set the derivative of the log likelihood function to zero,

and solve to obtain

Similarly
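The resulting estimates are the sample mean and the sample covariance with a 1/N normalizer. A short sketch, assuming a two-dimensional Gaussian with made-up parameters and a hypothetical helper `gaussian_ml`:

```python
import numpy as np

def gaussian_ml(data):
    """ML mean and covariance (divides by N) for samples stored as rows."""
    mean = data.mean(axis=0)
    centered = data - mean
    cov = centered.T @ centered / data.shape[0]
    return mean, cov

# Assumed true parameters, used only to generate test data.
rng = np.random.default_rng(0)
data = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.5], [0.5, 1.0]], size=5000)
mean_ml, cov_ml = gaussian_ml(data)
print(mean_ml)
print(cov_ml)
```

Only the sums of x_n and of x_n x_n^T are needed to compute both estimates, which is why they are called sufficient statistics.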

Page 49:

Maximum Likelihood for the Gaussian (3)

Under the true distribution

Hence define

Page 50:

Bayesian Inference for the Gaussian (1)

Assume σ² is known. Given i.i.d. data, the likelihood function for μ is given by

This has a Gaussian shape as a function of μ (but it is not a distribution over μ).

Page 51:

Bayesian Inference for the Gaussian (2)

Combined with a Gaussian prior over μ,

this gives the posterior

Completing the square over μ, we see that

Page 52:

Bayesian Inference for the Gaussian (3)

… where

Note:

Page 53:

Bayesian Inference for the Gaussian (4)

Example: for N = 0, 1, 2 and 10.

Page 54:

Bayesian Inference for the Gaussian (5)

Sequential Estimation

The posterior obtained after observing N − 1 data points becomes the prior when we observe the Nth data point.
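A minimal sketch of this equivalence for a Gaussian mean with known noise variance: updating one point at a time gives the same posterior as a single batch update. The prior, the noise variance, the observations, and the function names are illustrative assumptions.

```python
def update(mu0, var0, x, noise_var):
    """Posterior over the mean after one observation x with known noise variance."""
    var_n = 1.0 / (1.0 / var0 + 1.0 / noise_var)
    mu_n = var_n * (mu0 / var0 + x / noise_var)
    return mu_n, var_n

def batch(mu0, var0, xs, noise_var):
    """Posterior over the mean after seeing all observations at once."""
    n = len(xs)
    var_n = 1.0 / (1.0 / var0 + n / noise_var)
    mu_n = var_n * (mu0 / var0 + sum(xs) / noise_var)
    return mu_n, var_n

xs = [0.9, 1.3, 0.4, 1.1]  # assumed observations
mu, var = 0.0, 1.0         # assumed prior mean and variance
for x in xs:
    # The previous posterior serves as the prior for the next point.
    mu, var = update(mu, var, x, noise_var=0.25)

mu_b, var_b = batch(0.0, 1.0, xs, noise_var=0.25)
print(mu, var, mu_b, var_b)
```

Precisions add with each observation, so the posterior variance shrinks monotonically as data arrives.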

Page 55:

Bayesian Inference for the Gaussian (6)

Now assume μ is known. The likelihood function for λ = 1/σ² is given by

This has a Gamma shape as a function of λ.

Page 56:

Bayesian Inference for the Gaussian (7)

The Gamma distribution

Page 57:

Bayesian Inference for the Gaussian (8)

Now we combine a Gamma prior, Gam(λ|a_0, b_0), with the likelihood function for λ to obtain

which we recognize as Gam(λ|a_N, b_N) with
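The updated hyperparameters were lost in extraction. In the standard treatment, with prior Gam(λ | a_0, b_0) and known mean μ (symbol names assumed), they are

```latex
a_N = a_0 + \frac{N}{2},
\qquad
b_N = b_0 + \frac{1}{2} \sum_{n=1}^{N} (x_n - \mu)^2
    = b_0 + \frac{N}{2} \sigma^2_{\mathrm{ML}}
```

so each observation contributes 1/2 to a and half its squared deviation from μ to b.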

Page 58:

Bayesian Inference for the Gaussian (9)

If both μ and λ are unknown, the joint likelihood function is given by

We need a prior with the same functional dependence on μ and λ.

Page 59:

Bayesian Inference for the Gaussian (10)

The Gaussian-gamma distribution

• Quadratic in μ.
• Linear in λ.

• Gamma distribution over λ.
• Independent of μ.

Page 60:

Bayesian Inference for the Gaussian (11)

The Gaussian-gamma distribution

Page 61:

Bayesian Inference for the Gaussian (12)

Multivariate conjugate priors

• μ unknown, Λ known: p(μ) Gaussian.

• Λ unknown, μ known: p(Λ) Wishart.

• Λ and μ unknown: p(μ, Λ) Gaussian-Wishart.

Page 62:

Student’s t-Distribution

where

Infinite mixture of Gaussians.

Page 63:

Student’s t-Distribution

Page 64:

Student’s t-Distribution

Robustness to outliers: Gaussian vs t-distribution.

Page 65:

Student’s t-Distribution

The D-variate case:

where Δ² = (x − μ)^T Λ (x − μ).

Properties:

Page 66:

The Exponential Family (1)

where η is the natural parameter and

so g(η) can be interpreted as a normalization coefficient.
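For reference, the general form this slide refers to, in the standard notation (h and u are assumed names for the base measure and the sufficient-statistic function):

```latex
p(\mathbf{x} \mid \boldsymbol{\eta})
= h(\mathbf{x})\, g(\boldsymbol{\eta}) \exp\bigl\{ \boldsymbol{\eta}^{\mathsf T} \mathbf{u}(\mathbf{x}) \bigr\},
\qquad
g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\bigl\{ \boldsymbol{\eta}^{\mathsf T} \mathbf{u}(\mathbf{x}) \bigr\}\, \mathrm{d}\mathbf{x} = 1
```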

Page 67:

The Exponential Family (2.1)

The Bernoulli Distribution

Comparing with the general form we see that

and so

Logistic sigmoid
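A small sketch of the mapping between the Bernoulli mean μ and its natural parameter η = ln(μ / (1 − μ)); the logistic sigmoid inverts it. The function names and the value μ = 0.8 are illustrative assumptions.

```python
import math

def natural_parameter(mu):
    """eta = ln(mu / (1 - mu)) for a Bernoulli with mean mu in (0, 1)."""
    return math.log(mu / (1.0 - mu))

def logistic_sigmoid(eta):
    """sigma(eta) = 1 / (1 + exp(-eta)), the inverse of natural_parameter."""
    return 1.0 / (1.0 + math.exp(-eta))

mu = 0.8
eta = natural_parameter(mu)
recovered = logistic_sigmoid(eta)
print(eta, recovered)
```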

Page 68:

The Exponential Family (2.2)

The Bernoulli distribution can hence be written as

where

Page 69:

The Exponential Family (3.1)

The Multinomial Distribution

where, , and

NOTE: The η_k parameters are not independent since the corresponding μ_k must satisfy

Page 70:

The Exponential Family (3.2)

Let μ_M = 1 − Σ_{k=1}^{M−1} μ_k. This leads to

and

Here the η_k parameters are independent. Note that

and

Softmax
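A sketch of the softmax in the reduced parameterization, where only M − 1 natural parameters are free: η_k = ln(μ_k / μ_M) and μ_k = exp(η_k) / (1 + Σ_j exp(η_j)). The function name `softmax_reduced` and the example probabilities are assumptions.

```python
import math

def softmax_reduced(etas):
    """Map M-1 free natural parameters to M probabilities (last eta fixed at 0)."""
    exps = [math.exp(e) for e in etas]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]

mus = [0.2, 0.5, 0.3]
etas = [math.log(m / mus[-1]) for m in mus[:-1]]  # eta_k = ln(mu_k / mu_M)
recovered = softmax_reduced(etas)
print(etas, recovered)
```

The 1 in the denominator is the contribution of the M-th state, whose natural parameter is implicitly zero.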

Page 71:

The Exponential Family (3.3)

The Multinomial distribution can then be written as

where

Page 72:

The Exponential Family (4)

The Gaussian Distribution

where

Page 73:

ML for the Exponential Family (1)

From the definition of g(η) we get

Thus

Page 74:

ML for the Exponential Family (2)

Given a data set, X = {x_1, …, x_N}, the likelihood function is given by

Thus we have

Sufficient statistic
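The condition lost from the slide is, in the standard treatment, obtained by setting the gradient of the log likelihood with respect to η to zero (u denotes the sufficient-statistic function, an assumed name):

```latex
-\nabla \ln g(\boldsymbol{\eta}_{\mathrm{ML}})
= \frac{1}{N} \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n)
```

The data enter only through Σ_n u(x_n), which is why that sum is the sufficient statistic.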

Page 75:

Conjugate priors

For any member of the exponential family, there exists a prior

Combining with the likelihood function, we get

Prior corresponds to ν pseudo-observations with value χ.
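For reference, the prior and posterior lost from this slide take the following standard forms, where f is the prior's normalizer and u the sufficient-statistic function (both assumed names):

```latex
p(\boldsymbol{\eta} \mid \boldsymbol{\chi}, \nu)
= f(\boldsymbol{\chi}, \nu)\, g(\boldsymbol{\eta})^{\nu}
  \exp\bigl\{ \nu\, \boldsymbol{\eta}^{\mathsf T} \boldsymbol{\chi} \bigr\},
\qquad
p(\boldsymbol{\eta} \mid \mathbf{X}, \boldsymbol{\chi}, \nu)
\propto g(\boldsymbol{\eta})^{\nu + N}
  \exp\Bigl\{ \boldsymbol{\eta}^{\mathsf T}
  \Bigl( \sum_{n=1}^{N} \mathbf{u}(\mathbf{x}_n) + \nu \boldsymbol{\chi} \Bigr) \Bigr\}
```

The posterior has the same functional form as the prior, with ν replaced by ν + N, which is what makes the prior conjugate.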