Testing hypotheses using model selection

Eric D. Stolen
InoMedic Health Applications, Ecological Program, Kennedy Space Center, Florida

NASA Environmental Management Branch

We have invested a lot of time and effort in creating R, please cite it when using it for data analysis.

“The human understanding, once it has adopted an opinion, collects any instances that confirm it, and though the contrary instances may be more numerous and more weighty, it either does not notice them or else rejects them, in order that this opinion will remain unshaken.”

- Francis Bacon (1620)

Outline
• Science issues
• The method of multiple working hypotheses
• Statistical models as science tools
• Making inference in science
• Information-theoretic model selection
• Multi-model inference

Science
What is it?

Science is the organized process of creating testable explanations of how the natural world works.

Theory

Hypothesis

Understanding

Hypothetico-deductive model
• Generate hypothesis (from theory)
• Make a prediction from the hypothesis
• Conduct experiment to test prediction
• Decide whether or not the theory is supported

Hypothetico-deductive model
• Taught in primary through graduate-school education
• Not the way science is done in many fields
• Modern science is largely inductive

Null hypothesis testing
H0: No effect
HA: Effect of interest

Probability{ data | H0 }

Is this what we want to know?

• Known as the frequentist approach
• Not what Fisher, Neyman nor Pearson intended!

R. A. Fisher (1890 – 1962)

Jerzy Neyman (1894 – 1981)

Karl Pearson (1857 – 1936)

Null hypothesis testing

http://en.wikipedia.org/wiki/File:R._A._Fischer.jpg

http://en.wikipedia.org/wiki/File:Karl_Pearson_2.jpg

Oops

(c) Ian Britton - FreeFoto.com

NHT problems
Some problems:
• Silly nulls
• Slow progress
• Many systems not amenable
• Inference dependent upon the sample space
• Fosters unthinking approaches

An alternative

Probability{ HA | data }

Multiple working hypotheses

Thomas C. Chamberlin (1843-1928)
- Geologist
- President, University of Wisconsin
- Director, Walker Museum, and Chair, Dept. of Geology at the University of Chicago
- President of the American Association for the Advancement of Science

Chamberlin, T. C. 1890. The method of multiple working hypotheses. Science 15:92-96 (reprinted 1965, Science 148:754-759).

Alternative Hypotheses

Reality

Theory Data

Wading bird group foraging behavior


Wading bird group foraging
H1: No effect
H2: Group effect same for all species
H3: Group effect differs by species
H4: (Group by species) + prey density
H5: Group + prey density
H6: (Group by species) + prey + habitat

Mathematical models in science

“Nature's great book is written in mathematics.”

- Galileo Galilei

Mathematical models in science

Empirical Models
• Ecology
• Chemistry in 19th Century
• Climatology

Mechanistic Models
• Physics
• Modern Chemistry
• Molecular biology

Generalized Linear Model
Three parts
• Probability distribution (error): $Y_i \sim N(\mu_i, \sigma^2)$
• Link function: $E(Y_i) = \mu_i$
• Linear equation: $\mu_i = \eta(x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iq})$

Generalized Linear Model
Linear regression and ANOVA
• Link function: Identity link
• Linear equation
• Error distribution: Normal Distribution (Gaussian)

Y = b0 + b1X1 + b2X2 + e

Generalized Linear Model
Logistic Regression
• Link function: Logit link, ln(p / (1-p))
• Linear equation
• Error distribution: Binomial Distribution

Logit(p) = b0 + b1X1 + b2X2
(the random variation enters through the Binomial distribution, not through an additive error term)
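The logit link and its inverse can be sketched in a few lines of Python. This is only an illustration: the coefficient values b0, b1, b2 and the predictor values x1, x2 are hypothetical and do not come from any analysis in these slides.

```python
import math

def logit(p):
    """Logit link: the log-odds ln(p / (1 - p)) for a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit: maps a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and predictor values, chosen for illustration only.
b0, b1, b2 = -1.0, 0.5, 2.0
x1, x2 = 2.0, 0.0

eta = b0 + b1 * x1 + b2 * x2  # linear equation, on the logit scale
p = inv_logit(eta)            # back-transformed to a probability
print(eta, p)
```

Working on the logit scale keeps the linear equation unbounded while the back-transformed probability always stays between 0 and 1.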

Maximum likelihood estimation
R. A. Fisher (1890-1962)
The parameter estimates that are most likely, given the data and the model

Example
• Receive a cookie from the cafeteria on each of 11 days
• Observe 7 chocolate chip and 4 oatmeal raisin
• What is the best estimate of p = proportion chocolate chip (given the observed data)?

Maximum likelihood estimation

“CC” “CC” “OR” “CC” “CC” “OR” “OR” “CC” “OR” “CC” “CC”
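The cookie example can be worked through numerically. The sketch below assumes a binomial model (independent days, constant p), which the slides imply but do not state; the grid resolution is an arbitrary choice.

```python
import math

# Observed sequence from the example: 7 chocolate chip (CC), 4 oatmeal raisin (OR).
cookies = ["CC", "CC", "OR", "CC", "CC", "OR", "OR", "CC", "OR", "CC", "CC"]
n_cc = cookies.count("CC")
n_or = cookies.count("OR")
n = n_cc + n_or

def likelihood(p):
    """Binomial likelihood of the observed counts for a candidate p (assumed model)."""
    return math.comb(n, n_cc) * p ** n_cc * (1.0 - p) ** n_or

def log_likelihood(p):
    """Log of the binomial likelihood; defined only for 0 < p < 1."""
    return (math.log(math.comb(n, n_cc))
            + n_cc * math.log(p) + n_or * math.log(1.0 - p))

# Evaluate the log-likelihood on a grid and keep the value of p that maximizes it.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=log_likelihood)

# For a binomial proportion the maximum likelihood estimate has a closed form.
p_hat = n_cc / n
print(p_grid, p_hat, likelihood(p_hat))
```

The grid search and the closed form agree to within the grid spacing, and maximizing the log-likelihood gives the same answer as maximizing the likelihood itself.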

[Figure: Likelihood plotted against the proportion (x-axis labeled "Proportion heads", 0 to 1)]

[Figure: "Proportion Chocolate Chip": Likelihood plotted against the proportion (0 to 1)]

[Figure: Log-Likelihood plotted against the proportion (0 to 1)]



Wading bird group foraging
H1: No effect: Foraging rate = b0 + e
H2: Group effect same for all species: FR = b0 + Group * b1 + e
H3: Group effect differs by species
H4: (Group by species) + prey density
H5: Group + prey density
H6: (Group by species) + prey + habitat

Approaches to science

Observational Study

Experimental Study

Strength of Inference

Experimental study
What is the effect of a particular treatment (or series of treatments) on a particular aspect of the system?

Experimental study

[Figure: groups labeled C, D, control, B, A, with numbered units:]
7, 22, 21, 54, 67, 81
6, 29, 33, 61, 77, 79
11, 12, 69, 74, 91, 92
10, 15, 41, 44, 88
1, 4, 5, 38, 62, 99

Treatments: A, B, C, D

Replicates: 1, 2, 3, …, n

Experimental study




Randomization
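Random assignment of units to treatments can be sketched as follows. The function name `randomize`, the 30 numbered units, and the equal group sizes are assumptions made for illustration, not details of any study described here.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly assign units to treatments in equal-sized groups (hypothetical helper)."""
    units = list(units)
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    rng.shuffle(units)
    per_group = len(units) // len(treatments)
    return {
        name: sorted(units[i * per_group:(i + 1) * per_group])
        for i, name in enumerate(treatments)
    }

# Hypothetical design: 30 numbered units, four treatments plus a control.
treatments = ["A", "B", "C", "D", "control"]
assignment = randomize(range(1, 31), treatments, seed=1)
for name, group in assignment.items():
    print(name, group)
```

Because chance alone decides which unit receives which treatment, unmeasured differences among units cannot line up systematically with the treatments.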

Observational study




Bias

Approaches to science

Observational Study

Experimental Study

Strength of Inference

Confirmatory Study

Confirmatory study
• Make predictions a priori
• Design collection of observational data including as much replication and control as possible
• Weakness is still lack of randomization (not assigning treatment)

Summary so far
• Science is a process to postulate and refine reliable descriptions (explanations) of reality
• The method of multiple working hypotheses is a particularly useful science tool
• Mathematics is the language of science
• Experiments are golden, confirmatory studies are helpful

Next…
• Statistical model selection theory
• Information-theoretic tools
• R
• Model selection in practice
• Multi-model inference

Precision-Bias Trade-off

[Figure: Bias² and variance plotted against Model Complexity (increasing number of Parameters)]

Y = b0 + b1X1 + b2X2 + e

Kullback-Leibler information

Kullback, S., and R. A. Leibler. 1951. On information and sufficiency. The Annals of Mathematical Statistics 22:79-86.

Solomon Kullback (1907-1994), Richard A. Leibler (1914-2003)

Kullback-Leibler information divergence

[Figure: Full Truth and models G1 (best model in set), G2, and G3]

The relative difference between models is constant

I(f,g) = information lost when model g is used to approximate f (full reality)
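For discrete distributions, I(f, g) is the sum over outcomes of f(x) ln(f(x) / g(x)). The sketch below uses that standard form; the "truth" f and the candidate models g1 and g2 are made-up numbers for illustration, since in practice f is unknown.

```python
import math

def kl_information(f, g):
    """I(f, g) = sum over x of f(x) * ln(f(x) / g(x)) for discrete distributions."""
    return sum(fx * math.log(fx / gx) for fx, gx in zip(f, g) if fx > 0)

# Hypothetical "full truth" f and two candidate models over three outcomes.
f = [0.5, 0.3, 0.2]
g1 = [0.4, 0.4, 0.2]  # close to f, so little information is lost
g2 = [0.1, 0.1, 0.8]  # far from f, so much more information is lost

print(kl_information(f, g1), kl_information(f, g2))
```

The quantity is zero only when the model matches f exactly, and it is not symmetric: I(f, g) generally differs from I(g, f).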

Kullback-Leibler information

Hirotugu Akaike (1927-2009)

Figured out how to estimate the relative Kullback-Leibler distance between models in a set of models

Figured out how to link maximum likelihood estimation theory with expected K-L information

An Information Criterion

Akaike Information Criterion

$\mathrm{AIC} = -2 \ln\left(L\{\text{model}_i \mid \text{data}\}\right) + 2K$

• $\ln\left(L\{\text{model}_i \mid \text{data}\}\right)$: log-likelihood (from software)
• $K$: number of parameters estimated

Akaike, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6):716–723.

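Information Criteria
• $\mathrm{AIC} = -2 \ln\left(L\{\text{model}_i \mid \text{data}\}\right) + 2K$
• $\mathrm{AIC_c} = \mathrm{AIC} + \dfrac{2K(K+1)}{n-K-1}$
• $\mathrm{QAIC_c} = \dfrac{-2 \ln L}{c} + 2K + \dfrac{2K(K+1)}{n-K-1}$
• $\mathrm{BIC} = -2 \ln L + K \ln(n)$
• $\mathrm{DIC} = -2 \ln L$ (for nested models)
• Etc…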
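The AIC, AICc, QAICc, and BIC formulas translate directly into code. In this sketch the function names and the example values of the log-likelihood, K, and n are hypothetical; `c_hat` stands for the quantity written as c in the QAICc formula.

```python
import math

def aic(log_lik, k):
    """AIC = -2 ln L + 2K."""
    return -2.0 * log_lik + 2.0 * k

def small_sample_term(k, n):
    """Second-order correction 2K(K+1)/(n-K-1); requires n > K + 1."""
    if n - k - 1 <= 0:
        raise ValueError("need n > K + 1")
    return 2.0 * k * (k + 1) / (n - k - 1)

def aicc(log_lik, k, n):
    """AICc = AIC + 2K(K+1)/(n-K-1)."""
    return aic(log_lik, k) + small_sample_term(k, n)

def qaicc(log_lik, k, n, c_hat):
    """QAICc = -2 ln L / c + 2K + 2K(K+1)/(n-K-1)."""
    return -2.0 * log_lik / c_hat + 2.0 * k + small_sample_term(k, n)

def bic(log_lik, k, n):
    """BIC = -2 ln L + K ln(n)."""
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical values: log-likelihood -50, 3 estimated parameters, 40 observations.
log_lik, k, n = -50.0, 3, 40
print(aic(log_lik, k), aicc(log_lik, k, n), bic(log_lik, k, n))
```

The correction term shrinks toward zero as n grows relative to K, so AICc converges to AIC in large samples.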

What is R?
• Open source version of S (Bell Labs)
• Developed by Ross Ihaka and Robert Gentleman
• A true data analysis environment
• Object-oriented and data-centric programming language
• Maintained by “The R Foundation”
• http://www.r-project.org/

Model selection table

model                                          k   sumlogL   sumaic    AICc     Δi     wi    wi/wbest
Sex + landcocver + Sex * landcocver            4    -45.34   100.69   101.69    0.00   0.29       1.00
Sex + landcocver                               3    -46.70   101.40   101.98    0.29   0.25       1.16
Sex + landcocver + weeks + Sex * landcocver    5    -44.62   101.24   102.78    1.09   0.17       1.73
Sex + landcocver + weeks                       4    -45.94   101.88   102.88    1.19   0.16       1.82
Sex + weeks                                    3    -48.06   104.12   104.71    3.02   0.06       4.53
Sex                                            2    -49.30   104.60   104.88    3.20   0.06       4.94
landcocver                                     2    -54.42   114.83   115.12   13.43   0.00     824.28
landcocver + weeks                             3    -54.33   116.67   117.25   15.56   0.00    2398.06
weeks                                          2    -58.94   123.88   124.17   22.48   0.00   76100.46

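Model weights

$w_i = \dfrac{\exp\left(-\tfrac{1}{2}\Delta_i\right)}{\sum_{r=1}^{R} \exp\left(-\tfrac{1}{2}\Delta_r\right)}$

Model Probability

$w_i = \mathrm{Prob}\{g_i \mid \text{data}\}$

Evidence ratio of model i to model j = $w_i / w_j$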
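The weights and evidence ratios can be recomputed from the AICc column of the model selection table. This is a minimal sketch: the function name `akaike_weights` is hypothetical, and because the tabulated AICc values are rounded to two decimals, results may differ slightly from the table in the last digit.

```python
import math

def akaike_weights(ic_values):
    """Return (deltas, weights) for a list of information-criterion values."""
    best = min(ic_values)
    deltas = [value - best for value in ic_values]         # difference from best model
    relative = [math.exp(-0.5 * d) for d in deltas]        # exp(-1/2 * delta_i)
    total = sum(relative)                                  # sum over all R models
    return deltas, [r / total for r in relative]

# AICc values copied from the model selection table (rounded to two decimals).
aicc = [101.69, 101.98, 102.78, 102.88, 104.71, 104.88, 115.12, 117.25, 124.17]

deltas, weights = akaike_weights(aicc)
w_best = max(weights)
evidence_ratios = [w_best / w for w in weights]  # best model versus each model

for d, w, er in zip(deltas, weights, evidence_ratios):
    print(f"{d:6.2f} {w:5.2f} {er:10.2f}")
```

Only differences between criterion values matter here: adding the same constant to every AICc leaves the weights unchanged.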



Multi-model inference

Sometimes there is a clearly best model.

If not, why choose one?

Model selection uncertainty

Problems arise when we use the same data to both select a model and to estimate parameters.
• Chatfield, C. 1995. Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society. Series A (Statistics in Society) 158:419-466.

We need to account for the information used in weighting models in our estimates of the model parameter uncertainty

Model averaging

$\hat{\bar{Y}} = \sum_{i=1}^{R} w_i \hat{Y}_i$

• $\hat{\bar{Y}}$: model-averaged prediction
• $w_i$: model i weight
• $\hat{Y}_i$: model i prediction

$\hat{\bar{\theta}} = \sum_{i=1}^{R} w_i \hat{\theta}_i$

• $\hat{\bar{\theta}}$: model-averaged parameter estimate
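A model-averaged prediction is a weighted sum of the per-model predictions. In the sketch below the function name `model_average`, the three weights, and the three predictions are all hypothetical values chosen for illustration.

```python
def model_average(weights, estimates, tol=1e-6):
    """Weighted sum of per-model estimates; the weights must sum to 1."""
    if len(weights) != len(estimates):
        raise ValueError("need one estimate per model weight")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("model weights must sum to 1")
    return sum(w * est for w, est in zip(weights, estimates))

# Hypothetical model weights and the prediction made by each of R = 3 models.
weights = [0.5, 0.3, 0.2]
predictions = [10.0, 12.0, 15.0]

averaged = model_average(weights, predictions)
print(averaged)
```

The same function applies to a parameter estimate, provided the parameter appears in every model and has the same interpretation in each.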



[Figure: Home Range (ha) for Male-Core, Female-Core, Male-Dist, and Female-Dist, from the best model (Best)]

[Figure: Home Range (ha) for Male-Core, Female-Core, Male-Dist, and Female-Dist, model-averaged (MA) alongside the best model (Best)]

Conclusions
• Science is a process (we never arrive at the destination)
• Multiple hypotheses approach superior
• What we’re after is evidence for alternative hypotheses ( Pr{ Ha|data } )
• Information-theoretic model selection is a powerful new tool in this approach to inference
• Multi-model averaging acknowledges model-selection uncertainty

Thanks!
• Dan Hunt, IHA
• David R. Anderson, Colorado State University
• Model-based Inference Working Group (MBIG)
  • Dave Breininger, Geoff Carter, John Drese, Brean Duncan, Carlton Hall, Dan Hunt, Tim Kozusko, Eric Stolen

[email protected]
