Introduction to Bayesian Statistics
James Swain
University of Alabama in Huntsville
ISEEM Department
Author Introduction
• James J. Swain is Professor of Industrial and Systems Engineering Management at UAH. His BA, BS, and MS are from Notre Dame, and his PhD in Industrial Engineering is from Purdue University.
• Research interests include statistical estimation, Monte Carlo and discrete-event simulation, and analysis.
• Celebrating 25 years at UAH, with prior experience at Georgia Tech, Purdue, the University of Miami, and Air Products and Chemicals.
Outline
• Views of probability
• Likelihood principle
• Classical maximum likelihood
• Bayes theorem
• Bayesian formulation
• Applications
  • Proportions
  • Means
• Approach to hypothesis tests
• Conclusion
• References
It is unanimously agreed that statistics depends somehow on probability. But, as to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel.
Savage, 1954
Bayesian statistics are based on "subjective" rather than "objective" probability.
Classical View of Probability
• Physical or objective probability
  • Also known as "frequentist probability"
• Description of physical random processes
  • Dice
  • Roulette wheels
  • Radioactive processes
• Probability as the limiting ratio of repeated occurrences
• Philosophically --- Popper, von Mises, etc.
Subjective Probability
• Subjective or evidential probability
  • Also called "Bayesian" probability
• Assessment of uncertainty
• Plausibility of possible outcomes
  • May be applied to uncertain but fixed quantities
• Bayes analysis treats unknown parameters as random
• Sometimes elucidated in terms of wagers on outcomes
• Philosophically --- de Finetti, Savage, Carnap, etc.
Note on Notation
• Samples: X for either single or multiple samples
• Distributions:
  • p(X) for either discrete or continuous
  • N(µ, σ²) for normal with mean and variance parameters (sometimes σ)
• Parameters:
  • For binomial examples, π is the population proportion of success
  • For the continuous case, the generic parameter is written θ
Likelihood Principle
[Figure: Binomial(n = 20) probability functions for p = 0.2 and p = 0.4; probability vs. X]
If X = 8 is observed, which binomial is more likely to have produced it?
• Example: X = 8 --- source?
  • b(8; 0.4, 20) = 0.18
  • b(8; 0.2, 20) = 0.02
• Either is possible
• The first is more likely (see the sketch below)
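As a quick check of these likelihood values, the following sketch evaluates both binomial probability mass functions at X = 8; the use of SciPy is an assumption, any binomial pmf routine would do.

```python
# Sketch: compare binomial likelihoods for the observed count X = 8 (n = 20).
from scipy.stats import binom

n, x = 20, 8
for p in (0.4, 0.2):
    # b(x; p, n): probability of exactly x successes in n trials
    print(f"b({x}; {p}, {n}) = {binom.pmf(x, n, p):.3f}")

# Roughly 0.180 for p = 0.4 and 0.022 for p = 0.2,
# so p = 0.4 is the more likely source of X = 8.
```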
Likelihood (Discrete Sample)
• Likelihood function L(π; X)
  • Note: π is the unknown proportion of success
  • Here, the sample X is "given"
  • The parameter π is the variable to solve for
• The solution is the MLE (maximum likelihood estimator)
$$x_i = \begin{cases} 1 & \text{Success} \\ 0 & \text{Failure} \end{cases}$$

$$L(\pi; X) = \prod_{i=1}^{n} \pi^{x_i}(1-\pi)^{1-x_i} = \pi^{x}(1-\pi)^{n-x}, \qquad x = \sum_{i=1}^{n} x_i$$

$$\ell(\pi; X) = \ln L(\pi; X) = x \ln \pi + (n-x)\ln(1-\pi)$$

$$\frac{\partial \ell}{\partial \pi} = 0 \quad \text{yields} \quad \hat{\pi} = \hat{p} = \frac{x}{n}$$
Likelihood for continuous
• For sample x = 13
• Normal(12.5, 4)
  • f(13; 12.5, 4) = 0.19
• Normal(10, 4)
  • f(13; 10, 4) = 0.065 (see the numerical check below)
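As with the binomial case, a quick numerical sketch can confirm the two density values; note that a variance of 4 implies a standard deviation of 2.

```python
# Sketch: evaluate both candidate normal densities at the observed x = 13.
from scipy.stats import norm

x = 13.0
sigma = 2.0                          # variance 4 => standard deviation 2
for mu in (12.5, 10.0):
    print(f"f({x}; {mu}, 4) = {norm.pdf(x, loc=mu, scale=sigma):.3f}")

# Roughly 0.193 for N(12.5, 4) and 0.065 for N(10, 4),
# so the N(12.5, 4) model is the more likely source of x = 13.
```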
MLE for Continuous (Normal)
• Multiple samples (n)
• Likelihood L
• Parameters chosen to maximize L using ln L
• Estimation (MLE)
$$L(\mu, \sigma^2; X) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_i - \mu)^2 / 2\sigma^2}$$

$$\ell(\mu, \sigma^2; X) = \ln L(\mu, \sigma^2; X) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$

$$\frac{\partial \ell}{\partial \mu} = 0 \quad \Rightarrow \quad \hat{\mu} = \bar{x}$$

$$\frac{\partial \ell}{\partial \sigma^2} = 0 \quad \Rightarrow \quad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n} = \frac{n-1}{n}\, s^2$$
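A minimal numeric sketch of these estimators follows; the sample values are made up purely for illustration.

```python
# Sketch: normal MLEs on a small made-up sample.
import numpy as np

x = np.array([11.2, 13.1, 12.4, 14.0, 12.8, 11.9])
n = len(x)

mu_hat = x.mean()                            # MLE of the mean: the sample mean
sigma2_hat = ((x - mu_hat) ** 2).sum() / n   # MLE of the variance: divisor n
s2 = ((x - mu_hat) ** 2).sum() / (n - 1)     # usual unbiased estimator: divisor n - 1

print(mu_hat, sigma2_hat, s2)
print(np.isclose(sigma2_hat, (n - 1) / n * s2))   # True: MLE = ((n-1)/n) * s^2
```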
Classical Statistics: Summary
• Theory is well developed
  • E.g., in most cases estimators are asymptotically unbiased and normal
  • Variances of the estimators based on log-likelihood derivatives
• Confidence intervals
  • What does the confidence level mean?
• Hypothesis tests
  • The pesky p-value
There's no theorem like Bayes' theorem,
Like no theorem we know.
Everything about it is appealing,
Everything about it is a wow.
Let out all that a priori feeling
You've been concealing right up to now!
George E. P. Box
Bayes theorem is useful, controversial (in its day), and the bane of introductory probability students. It is sometimes known as "inverse probability."
Bayes Theorem I: Sets and Partition of S
• Consider a sample space S
• Event B ⊆ S
• Partition of the sample space A₁, A₂, …, Aₙ
  • Exhaustive
  • Mutually exclusive
• Decomposition of B into subsets B ∩ Aᵢ
• By the axioms of probability:

$$S = A_1 \cup A_2 \cup \dots \cup A_n, \qquad A_i \cap A_j = \emptyset \ (i \ne j)$$

$$B = B \cap S = (B \cap A_1) \cup (B \cap A_2) \cup \dots \cup (B \cap A_n)$$

$$P(B) = \sum_{i=1}^{n} P(B \cap A_i)$$
Bayes Theorem II: Conditional Probability
• Conditional probability
• Multiplication rule
• Total probability

$$P(B \mid A_i) = \frac{P(A_i \cap B)}{P(A_i)}$$

$$P(A_i \cap B) = P(A_i)\, P(B \mid A_i)$$

$$P(B) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(A_i)\, P(B \mid A_i)$$
Bayes III: The Theorem
• Direct result of the previous definitions
  • Multiplication rule
  • Total probability
• Conditionals are "reversed" or "inverted"

$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\, P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\, P(B \mid A_j)}$$
Quick Bayes Theorem Example
• Let C = "conforming"
• A part is sampled, tested, and found to be conforming: P(A₃ | C) = ?

Vendor   P(C | Vendor) (%)   Vendor Proportion (%)
A1       88                  30
A2       85                  20
A3       95                  50
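A minimal sketch of the computation with the table values above; it simply applies the total-probability denominator and then the theorem.

```python
# Sketch: P(A3 | C) for the vendor example via Bayes theorem.
priors = {"A1": 0.30, "A2": 0.20, "A3": 0.50}   # vendor proportions P(A_i)
cond   = {"A1": 0.88, "A2": 0.85, "A3": 0.95}   # P(C | A_i)

p_C = sum(priors[v] * cond[v] for v in priors)  # total probability P(C)
post_A3 = priors["A3"] * cond["A3"] / p_C       # Bayes theorem

print(f"P(C) = {p_C:.3f}, P(A3 | C) = {post_A3:.3f}")   # about 0.909 and 0.522
```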
Bayes Statistical Formulation
• Parameter θ uncertainty
• Prior distribution p(θ)
• Conditional on the sample: likelihood L(θ | X)
• Posterior distribution p(θ | X)
• Posterior proportional to the numerator

$$p(\theta \mid X) = \frac{L(\theta \mid X)\, p(\theta)}{p(X)}$$

$$p(X) = \begin{cases} \sum_i L(\theta_i \mid X)\, p(\theta_i) & \text{discrete case} \\[4pt] \displaystyle\int L(\theta \mid X)\, p(\theta)\, d\theta & \text{continuous case} \end{cases}$$

$$p(\theta \mid X) \propto L(\theta \mid X)\, p(\theta)$$
Proportionality
• Denominator
  • Complex
  • A constant
• Key information is in the numerator
• Standardize numerically if needed
• Bayes methods are frequently computer intensive

$$p(\theta \mid X) \propto L(\theta \mid X)\, p(\theta), \qquad p(\theta \mid X) = \frac{L(\theta \mid X)\, p(\theta)}{p(X)}$$
Updating Posterior
• Two independent samples X, Y
• Posterior from X is the prior for Y
• Updated posterior (see the sketch below)

$$p(\theta \mid X) = \frac{L(\theta \mid X)\, p(\theta)}{p(X)}$$

$$p(\theta \mid X, Y) = \frac{L(\theta \mid Y)\, p(\theta \mid X)}{p(Y)} = \frac{L(\theta \mid Y)\, L(\theta \mid X)\, p(\theta)}{p(Y)\, p(X)}$$
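The sketch below illustrates sequential updating on a discrete grid of candidate θ values (the grid and the binomial counts are illustrative assumptions, not from the slides): updating on X and then on Y gives the same posterior as a single update on the pooled sample.

```python
# Sketch: sequential Bayesian updating on a grid of candidate proportions.
import numpy as np
from scipy.stats import binom

theta = np.linspace(0.01, 0.99, 99)        # grid of candidate success proportions
prior = np.ones_like(theta) / theta.size   # flat prior over the grid

def update(prior, x, n):
    """Multiply the prior by the binomial likelihood and renormalize."""
    post = prior * binom.pmf(x, n, theta)
    return post / post.sum()

# Two independent samples X and Y (made-up counts).
post_X  = update(prior, x=8, n=20)         # posterior after X
post_XY = update(post_X, x=5, n=10)        # posterior from X used as the prior for Y

# A single update on the pooled data gives the same answer.
post_all = update(prior, x=13, n=30)
print(np.allclose(post_XY, post_all))      # True
```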
High Density Intervals (HDI)
• Summarizing the posterior distribution
• High density intervals
  • Probability content (e.g., 95%)
  • Region with the largest density values
• Similar to equal-tailed intervals for symmetric distributions (e.g., normal)
• May differ for skewed distributions (e.g., chi-squared)
• Special tables for the HDI may be required (e.g., inverse log F); source: Kruschke, p. 41 (see the numerical sketch below)
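In practice an HDI can also be found numerically from posterior samples rather than from tables. A minimal sketch follows; the skewed Beta posterior used here is only an example.

```python
# Sketch: shortest 95% interval (HDI) from posterior samples of a skewed Beta.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
samples = np.sort(beta.rvs(a=2, b=8, size=100_000, random_state=rng))

mass = 0.95
k = int(np.floor(mass * len(samples)))    # number of samples inside each candidate interval
widths = samples[k:] - samples[:-k]       # width of every interval containing 95% of samples
i = np.argmin(widths)                     # the shortest one is the highest-density interval
print("95% HDI:", samples[i], samples[i + k])

# Equal-tailed interval for comparison; it is wider for this skewed posterior.
print("95% equal-tailed:", np.quantile(samples, [0.025, 0.975]))
```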
Bayes Analysis: Binomial Sample (Beta prior)
• Estimating the proportion of success π
• Likelihood as noted previously
• What prior? A Beta distribution is typical

$$x_i = \begin{cases} 1 & \text{Success} \\ 0 & \text{Failure} \end{cases}$$

$$L(\pi; X) = \prod_{i=1}^{n} \pi^{x_i}(1-\pi)^{1-x_i} = \pi^{x}(1-\pi)^{n-x}$$
Beta Prior for Proportion Success
• Uncertainty about the proportion of success (π)
• Similarity to the likelihood
• "Uninformative" prior

$$p(\pi) = \frac{\pi^{\alpha-1}(1-\pi)^{\beta-1}}{B(\alpha, \beta)}, \qquad B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$
Beta Prior for Proportion Success
• Mean and variance formulas

$$E[\Pi] = \frac{\alpha}{\alpha + \beta}, \qquad \operatorname{Var}(\Pi) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$
Example: Small Poll
• Prior
  • "Worth" a sample of 20
  • Estimated mean 25%
• Sample
  • Sample size 100
  • Votes favorable: 40 (40%) (a numerical sketch follows below)

$$\alpha_0 = p_0 n_0, \qquad \beta_0 = (1 - p_0)\, n_0, \qquad p_0 = \frac{\alpha_0}{\alpha_0 + \beta_0}$$

$$p_1 = \frac{x + \alpha_0}{n + \alpha_0 + \beta_0} = \frac{x}{n}\cdot\frac{n}{n + \alpha_0 + \beta_0} + \frac{\alpha_0}{\alpha_0 + \beta_0}\cdot\frac{\alpha_0 + \beta_0}{n + \alpha_0 + \beta_0}$$
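A minimal sketch with the numbers from this slide: a prior "worth" n₀ = 20 with mean 0.25, then 40 favorable votes out of 100.

```python
# Sketch: Beta-binomial update for the small poll example.
n0, p0 = 20, 0.25                            # prior "worth" and prior mean
alpha0, beta0 = p0 * n0, (1 - p0) * n0       # Beta(5, 15) prior

n, x = 100, 40                               # sample size and favorable votes
alpha1, beta1 = alpha0 + x, beta0 + (n - x)  # conjugate posterior Beta(45, 75)

p1 = alpha1 / (alpha1 + beta1)               # posterior mean: weighted average of 0.40 and 0.25
print(alpha1, beta1, round(p1, 3))           # 45.0 75.0 0.375
```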
Conjugate Priors
• Prior and posterior distributions that are similar
• Conjugate: posterior and prior have the same form
• Identification of the distributional constants is simplified

$$p(\pi) \propto \pi^{\alpha-1}(1-\pi)^{\beta-1}$$

$$L(\pi; X) = \pi^{x}(1-\pi)^{n-x}$$

$$p(\pi \mid X) \propto \pi^{\alpha+x-1}(1-\pi)^{\beta+n-x-1}$$
Simple Normal-Normal Case for Mean
• Simplified case
  • Fixed variance
  • Mean θ has prior parameters θ₀, φ₀
• Derivation steps omitted (Ref: Lee)

$$\theta \sim N(\theta_0, \phi_0), \qquad x \sim N(\theta, \phi)$$

$$p(\theta \mid x) \propto \exp\!\left( -\frac{\theta^2}{2}\left(\phi_0^{-1} + \phi^{-1}\right) + \theta\left(\frac{\theta_0}{\phi_0} + \frac{x}{\phi}\right) \right)$$
Results: Normal-Normal
• Summarized results
• Posterior mean is a weighted value

$$\theta \mid x \sim N(\theta_1, \phi_1), \qquad x \sim N(\theta, \phi)$$

$$\phi_1 = \frac{1}{\phi_0^{-1} + \phi^{-1}}, \qquad \theta_1 = \phi_1\left(\frac{\theta_0}{\phi_0} + \frac{x}{\phi}\right)$$

$$\theta_1 = \theta_0\,\frac{\phi_0^{-1}}{\phi_0^{-1} + \phi^{-1}} + x\,\frac{\phi^{-1}}{\phi_0^{-1} + \phi^{-1}}$$
Example: Carbon Dating I
• Ennerdale granophyre
• Earliest dating: K/Ar in the '60s and '70s
• Prior: 370 M-yr, ±20 M-yr
• Later Rb/Sr: 421 ±8 M-yr
• Posterior estimate
Carbon Dating II
• Substitute an alternative expert prior
• 400 M-yr, ±50 M-yr
• Revised posterior
• Note the posterior is roughly the same (see the sketch below)
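A minimal sketch applying the normal-normal formulas above to both priors; treating the ± values as standard deviations is an assumption about the slide's notation.

```python
# Sketch: normal-normal posterior for the Ennerdale dating example, under two priors.
def normal_posterior(theta0, sd0, x, sd):
    """Posterior mean and sd for a normal mean with a known-variance likelihood."""
    phi0, phi = sd0 ** 2, sd ** 2
    phi1 = 1.0 / (1.0 / phi0 + 1.0 / phi)
    theta1 = phi1 * (theta0 / phi0 + x / phi)
    return theta1, phi1 ** 0.5

x, sd = 421.0, 8.0                           # Rb/Sr measurement (M-yr)
print(normal_posterior(370.0, 20.0, x, sd))  # K/Ar prior: about (414.0, 7.4)
print(normal_posterior(400.0, 50.0, x, sd))  # expert prior: about (420.5, 7.9)
# Both posteriors sit close to the data value 421: the precise likelihood dominates.
```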
Normal-Normal (Multiple Samples)
• Similar arrangement
• Multiple samples
• Variances are constant

$$\theta \sim N(\theta_0, \phi_0), \qquad x_i \sim N(\theta, \phi)$$

$$p(\theta \mid x) \propto \exp\!\left( -\frac{\theta^2}{2}\left(\phi_0^{-1} + n\phi^{-1}\right) + \theta\left(\frac{\theta_0}{\phi_0} + \frac{\sum x_i}{\phi}\right) \right)$$
Results: Multiple Samples
• Similar analysis
• As n increases, the posterior converges to x̄

$$\theta \mid x \sim N(\theta_1, \phi_1), \qquad \bar{x} \sim N(\theta, \phi/n)$$

$$\phi_1 = \frac{1}{\phi_0^{-1} + n\phi^{-1}}, \qquad \theta_1 = \phi_1\left(\frac{\theta_0}{\phi_0} + \frac{\bar{x}}{\phi/n}\right)$$

$$\theta_1 = \theta_0\,\frac{\phi}{\phi + n\phi_0} + \bar{x}\,\frac{n\phi_0}{\phi + n\phi_0}$$
Example: Menโs Fittings
• Modified from Lee (p. 46)
• Experience: chest measurement N(38, 9)
• Shop data: 39.8 ±2 on n = 890 samples (see the sketch below)
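A minimal sketch using the multiple-sample formulas, assuming the ±2 is the standard deviation of an individual measurement (σ² = 4); that interpretation is an assumption, not stated on the slide.

```python
# Sketch: posterior for the mean chest measurement (men's fittings example).
theta0, phi0 = 38.0, 9.0          # prior N(38, 9) from experience
xbar, phi, n = 39.8, 4.0, 890     # shop data: mean 39.8, assumed variance 2^2, n = 890

phi1 = 1.0 / (1.0 / phi0 + n / phi)
theta1 = phi1 * (theta0 / phi0 + n * xbar / phi)
print(round(theta1, 3), round(phi1, 6))
# Roughly 39.799 and 0.0045: with n = 890 the data overwhelm the prior.
```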
Normal: Prior in Variance
• Default prior for the mean is uniform
  • Improper over an infinite range
• Variance φ
  • Uniform in value (like the mean)?
  • Transform: uniform in log φ
• Conjugate prior for variances
  • Inverse chi-squared distribution

$$p(\phi) = \frac{1}{\phi}$$

$$p(\phi) \propto \phi^{-\nu/2 - 1} \exp\!\left(-\frac{S_0}{2\phi}\right)$$

$$p(\phi \mid X) \propto \phi^{-(\nu + n)/2 - 1} \exp\!\left(-\frac{S_0 + S}{2\phi}\right)$$

where ν and S₀ are prior constants and S is the sum of squared deviations in the sample.
Conjugate Prior for Normal Variance
• Inverse chi-squared distribution
• An exact match is not critical (the likelihood will dominate)

$$\phi \sim S_0\, \chi_\nu^{-2}$$

$$E[\phi] = \frac{S_0}{\nu - 2}, \qquad \operatorname{Var}(\phi) = \frac{2 S_0^2}{(\nu - 2)^2 (\nu - 4)}$$
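A minimal sketch of this conjugate update using SciPy's inverse-gamma distribution (a scaled inverse chi-squared with ν degrees of freedom and scale S₀ is an InvGamma(ν/2, S₀/2)); the prior constants, known mean, and data are made-up illustrations.

```python
# Sketch: conjugate inverse-chi-squared update for a normal variance (known mean).
import numpy as np
from scipy.stats import invgamma

nu0, S0 = 5.0, 20.0                  # prior constants: phi ~ S0 * chi^{-2}_{nu0}
mu = 10.0                            # known mean of the sampling model

x = np.array([9.1, 11.3, 10.4, 8.7, 12.0, 9.8])   # made-up sample
n = len(x)
S = ((x - mu) ** 2).sum()            # sum of squared deviations about the known mean

nu1, S1 = nu0 + n, S0 + S            # posterior: phi ~ S1 * chi^{-2}_{nu1}

# Scaled inverse chi-squared(nu, S) is InvGamma(a = nu/2, scale = S/2).
posterior = invgamma(a=nu1 / 2, scale=S1 / 2)
print("posterior mean of phi:", posterior.mean())   # equals S1 / (nu1 - 2)
print("check:", S1 / (nu1 - 2))
```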
General Approach to Hypothesis Tests
• Hypothesis tests: a choice between two alternatives
• The alternatives are disjoint and exhaustive, of course
• Priors
• Posteriors

$$H_0: \theta \in \Theta_0 \qquad H_1: \theta \in \Theta_1$$

$$\pi_0 = P(\theta \in \Theta_0), \qquad \pi_1 = P(\theta \in \Theta_1), \qquad \pi_0 + \pi_1 = 1$$

$$p_0 = P(\theta \in \Theta_0 \mid X), \qquad p_1 = P(\theta \in \Theta_1 \mid X), \qquad p_0 + p_1 = 1$$
Hypothesis Tests --- General
• Prior "odds"
• Posterior "odds"
• Bayes factor B
  • Odds favoring H₀ against H₁
• For two simple alternatives, B is a likelihood ratio (see the sketch below)
  • Larger B denotes favor for H₀

$$\frac{\pi_0}{\pi_1} = \frac{\pi_0}{1 - \pi_0}, \qquad \frac{p_0}{p_1} = \frac{p_0}{1 - p_0}$$

$$B = \frac{p_0 / p_1}{\pi_0 / \pi_1}$$

$$B = \frac{p(X \mid \theta_0)}{p(X \mid \theta_1)} \quad \text{(two simple hypotheses)}$$
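A minimal sketch of the two-simple-hypotheses case, reusing the earlier binomial example (X = 8 out of n = 20); taking H₀: π = 0.4 and H₁: π = 0.2 is an illustrative choice, not from the slides.

```python
# Sketch: Bayes factor for two simple binomial hypotheses.
from scipy.stats import binom

n, x = 20, 8
B = binom.pmf(x, n, 0.4) / binom.pmf(x, n, 0.2)   # p(X | H0) / p(X | H1)
print(round(B, 1))   # about 8.1: the data favor H0 (pi = 0.4) over H1 (pi = 0.2)

# Posterior odds = B * prior odds; with equal prior odds, the posterior odds equal B.
prior_odds = 1.0
print(round(B * prior_odds, 1))
```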
Summary and Conclusion
• Bayesian estimation
  • Since parameters are treated as random, it provides probability statements about their values
  • Provides a mechanism for incorporating non-sample information
  • Avoids technical difficulties with the likelihood alone
• More advanced analyses are computationally intensive
  • Tied to computer software
  • Open-source R with the BUGS application
• Machine learning is heavily dependent on Bayesian methods
References
• Lee, P. M. (2012). Bayesian Statistics: An Introduction, 4th ed. Wiley.
• Lindley, D. V. (1972). Bayesian Statistics, A Review. SIAM.
• Kruschke, J. K. (2011). Doing Bayesian Data Analysis: A Tutorial with R and BUGS. Academic Press/Elsevier.