Effect of the Reference Set on Frequency Inference

Donald A. Pierce, Radiation Effects Research Foundation, Japan

Ruggero Bellio, Udine University, Italy

Paper, this talk, other things at http://home.att.ne.jp/apple/pierce/

Frequency inferences depend to first order only on the likelihood function, and to higher order on other aspects of the probability model or reference set

That is, “other aspects” not affecting the likelihood function: e.g. censoring models, stopping rules

Here we study to what extent, and in what manner, second-order inferences depend on the reference set

There are various reasons for interest in this, e.g.:

Foundational: to what extent do frequency inferences violate the Likelihood Principle?

Unattractiveness of specifying censoring models

Practical effects of stopping rules

Example: Sequential Clinical Trials

Patients arrive and are randomized to treatments, with outcomes $y_1, y_2, \ldots$ Stop at $n$ patients, based on the outcomes to that point.

Then the data have probability model $p(y, n; \theta)$, and the likelihood function $L(\theta; y, n)$ is this as a function of $\theta$, defined only up to proportionality.

The likelihood function does not depend on the stopping rule, including the rule with fixed $n$.

First-order inference based only on the likelihood function does not depend on the stopping rule, but higher-order inference does depend on this.

How does inference allowing for the stopping rule differ from that for fixed sample size?
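The claim that the likelihood function does not depend on the stopping rule can be checked numerically in a toy case. The sketch below is only an illustration under assumptions that are not in the talk: binary outcomes with success probability `theta`, and a hypothetical rule that stops at the third success, compared with a fixed sample size of 12. The two log-likelihoods should differ by a constant that is free of `theta`.

```python
import math

def binomial_loglik(theta, successes, n):
    """Log-likelihood when n was fixed in advance (binomial sampling)."""
    return (math.log(math.comb(n, successes))
            + successes * math.log(theta)
            + (n - successes) * math.log(1.0 - theta))

def negative_binomial_loglik(theta, successes, n):
    """Log-likelihood when sampling stopped at the `successes`-th success."""
    return (math.log(math.comb(n - 1, successes - 1))
            + successes * math.log(theta)
            + (n - successes) * math.log(1.0 - theta))

def loglik_difference(theta, successes=3, n=12):
    """Difference between the two log-likelihoods for the same observed data."""
    return (binomial_loglik(theta, successes, n)
            - negative_binomial_loglik(theta, successes, n))

# The difference is the same for every theta, so the two likelihood
# functions are proportional and carry the same first-order inference.
thetas = [0.1, 0.25, 0.5, 0.8]
differences = [loglik_difference(t) for t in thetas]
```

Only the combinatorial constants differ between the two models, which is why any inference that uses the likelihood alone cannot see the stopping rule.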

Example: Censored Survival Data

Patients arrive and are given treatments. The outcome is response time, and when it is time for analysis some patients have yet to respond.

The likelihood function based on data $(t_1, f_1), \ldots, (t_n, f_n)$, where $f_i$ indicates whether the response of patient $i$ has been observed, is

$$L(\theta; t) = \prod_{i=1}^{n} p(t_i; \theta)^{f_i}\, \mathrm{pr}(T_i > t_i; \theta)^{1 - f_i},$$

but the full probability model involves matters such as the probability distribution of patient arrival times.

This involves what is called the censoring model. First-order inferences depend only on the likelihood function and not on the censoring model. In what way, and how much, do higher-order inferences depend on the censoring model? It is unattractive that they should depend on it at all.
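As a concrete instance of the censored-data likelihood, the following sketch assumes an exponential response-time model with rate `rate`; the model, the function name and the small data set are illustrative assumptions, not taken from the talk. It shows that the likelihood can be evaluated, and maximized, without any reference to how the censoring arose.

```python
import math

def censored_exponential_loglik(rate, times, observed):
    """Log-likelihood sum_i [ f_i*log p(t_i) + (1 - f_i)*log pr(T_i > t_i) ]."""
    total = 0.0
    for t, f in zip(times, observed):
        if f:
            # Response observed: density rate * exp(-rate * t).
            total += math.log(rate) - rate * t
        else:
            # No response yet at analysis time: survivor function exp(-rate * t).
            total += -rate * t
    return total

# Hypothetical data: follow-up times and response indicators f_i.
times = [2.0, 5.0, 1.5, 7.0, 3.5]
observed = [1, 0, 1, 0, 1]

# For this model the MLE is (number of responses) / (total follow-up time).
rate_mle = sum(observed) / sum(times)
```

Nothing in the function needs the arrival-time distribution or any other part of the censoring model, which is the sense in which first-order inference ignores it.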

Typical second-order effects

Binomial regression: test for trend with 15 observations, estimate towards the boundary, P-values that should be 5%

First-order: 7.3%, 0.8%

Second-order: 5.3%, 4.7%

Generally, in settings with substantial numbers of nuisance parameters, and even for large samples, adjustments may be much larger than this --- or they may not be

Sequential experiment: underlying data $N(\theta, 1)$, $n \le 60$; stop when $\sqrt{n}\,\bar{y}_n > c_n$ (with $c_1 = 3$, $c_{59} = 2$) or at $n = 60$

Testing $\theta = 0.25$, P-values when stopping at the following $n$:

n           10       20       30
P exact     5.6%     12.5%    16.9%
P first     2.1%     6.1%     9.1%
P second    5.3%     11.6%    15.8%

Some general notation and concepts

Model for data $p(y; \theta)$, parametric function of interest $\psi(\theta)$

MLE $\hat\theta$, constrained MLE $\hat\theta_\psi$, profile likelihood $L_P(\psi; y) = L(\hat\theta_\psi; y)$

Starting point: signed LR statistic, first order $N(0,1)$

$$r(y) = \mathrm{sgn}(\hat\psi - \psi)\,\left[\,2\{l(\hat\theta; y) - l(\hat\theta_\psi; y)\}\,\right]^{1/2}$$

so to first order P-value $= \Phi\{r(y_{obs})\}$

To second order, modern likelihood asymptotics yield that

$$\mathrm{pr}\{r(Y) \le r(y_{obs})\} = \Phi\{r(y_{obs}) + ADJ(y_{obs})\}$$

Only the adjustment $ADJ$ depends on the reference set, so this is what we aim to study
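To make the first-order and second-order P-values concrete, here is a minimal sketch for a case where the exact answer is available. Everything specific is an assumption outside the talk: a fixed sample of `n` exponential observations with rate to be tested against `rate0`, no nuisance parameter, and the standard closed form $ADJ = r^{-1}\log(u/r)$ for a one-parameter exponential family, where $u$ is the Wald statistic in the canonical parameterization. The reference set here is fixed sample size, so the example shows the size of the adjustment, not its dependence on the reference set.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def signed_lr(rate0, n, total_time):
    """r = sgn(rate_hat - rate0) * sqrt(2 * {l(rate_hat) - l(rate0)})."""
    rate_hat = n / total_time
    loglik = lambda rate: n * math.log(rate) - rate * total_time
    r = math.sqrt(2.0 * (loglik(rate_hat) - loglik(rate0)))
    return math.copysign(r, rate_hat - rate0)

def adjustment(rate0, n, total_time):
    """ADJ = log(u / r) / r (assumed closed form; undefined when r = 0)."""
    rate_hat = n / total_time
    r = signed_lr(rate0, n, total_time)
    u = math.sqrt(n) * (1.0 - rate0 / rate_hat)
    return math.log(u / r) / r

def exact_lower_tail(rate0, n, total_time):
    """pr{r(Y) <= r_obs} = pr(S >= s_obs), S ~ Gamma(n, rate0), integer n."""
    x = rate0 * total_time
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))

# Hypothetical data: 5 observations with total 10.0, testing rate0 = 1.
n, total_time, rate0 = 5, 10.0, 1.0
r_obs = signed_lr(rate0, n, total_time)
p_first = normal_cdf(r_obs)
p_second = normal_cdf(r_obs + adjustment(rate0, n, total_time))
p_exact = exact_lower_tail(rate0, n, total_time)
```

With these numbers the first-order value is roughly one percentage point away from the exact tail probability, while the adjusted value agrees with it closely, which is the typical pattern the talk describes.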

Consider integrated likelihoods of the form

$$L_B(\psi; y) = \int L(\psi, \nu; y)\, \pi(\nu \mid \psi)\, d\nu$$

where $\pi(\nu \mid \psi)$ is any smooth prior on the nuisance parameter

Computing a P-value for testing a hypothesis on $\psi(\theta)$ requires only an ordering of datasets for evidence against the hypothesis

Then, regardless of the prior, the signed LR statistic based on $L_B(\psi; y)$ provides to second order the same ordering of datasets, for evidence against a hypothesis, as does $r(y)$

Even to higher order, ideal inference should be based on the distribution of $r(y)$. Modifications of this pertain to its distribution, not to its inferential relevance

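Now return to the main point of higher-order likelihood asymptotics, namely

$$\mathrm{pr}\{r(Y) \le r(y_{obs})\} = \Phi\{r(y_{obs}) + ADJ(y_{obs})\}$$

The theory for this is due to Barndorff-Nielsen (Biometrika, 1986), and he refers to $r(y_{obs}) + ADJ(y_{obs})$ as $r^*$

Thinking of the data as $(\hat\theta, a)$, $ADJ$ depends on notorious sample space derivatives

$$\partial^2 l(\theta; \hat\theta, a)/\partial\theta\,\partial\hat\theta, \qquad \partial\{l(\hat\theta; \hat\theta, a) - l(\theta; \hat\theta, a)\}/\partial\hat\theta$$

Very difficult to compute, but Skovgaard (Bernoulli, 1996) showed they can be approximated to second order as

$$\mathrm{cov}_{\hat\theta}\{l'(\hat\theta), l'(\theta)\}\,\hat{i}^{-1}\hat{j}, \qquad \mathrm{cov}_{\hat\theta}\{l(\hat\theta) - l(\theta), l'(\hat\theta)\}\,\hat{i}^{-1}\hat{j}$$

where primes denote derivatives with respect to $\theta$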

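It turns out that each of these approximations has a leading term depending on only the likelihood function, with a next term of one order smaller depending on the reference set

For example, since $\mathrm{cov}_{\hat\theta}\{l'(\hat\theta), l'(\hat\theta)\} = \hat{i}$,

$$\mathrm{cov}_{\hat\theta}\{l'(\hat\theta), l'(\theta)\}\,\hat{i}^{-1}\hat{j} \approx \left[\mathrm{cov}_{\hat\theta}\{l'(\hat\theta), l'(\hat\theta)\} + (\theta - \hat\theta)\,\mathrm{cov}_{\hat\theta}\{l'(\hat\theta), l''(\hat\theta)\}\right]\hat{i}^{-1}\hat{j}$$

$$= \hat{j} + (\theta - \hat\theta)\,\mathrm{cov}_{\hat\theta}\{l'(\hat\theta), l''(\hat\theta)\}\,\hat{i}^{-1}\hat{j}$$

Thus we need the quantity $\mathrm{cov}_{\hat\theta}\{l'(\hat\theta), l''(\hat\theta)\}$ to only first order to obtain second-order final results

A similar expansion gives the same result for the other sample-space derivative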

This provides our first main result: If within some class of reference sets (models) we can write, without regard to the reference set,

$$l(\theta; y) = \sum_i l_i(\theta; y)$$

where the $l_i$ are stochastically independent, then second-order inference is the same for all of the reference sets

The reason is that when the contributions are independent, the value of $\mathrm{cov}_{\hat\theta}\{l'(\hat\theta), l''(\hat\theta)\}$ must agree to first order with the empirical mean of the contributions $l_i'(\hat\theta)\, l_i''(\hat\theta)$, and this mean does not depend on the reference set

Thus, in this “independence” case, second-order inference, although not determined by the likelihood function, is determined by the contributions to it

A main application of this pertains to censoring models, if censoring and response times for individuals are stochastically independent

Then the usual contributions to the likelihood, namely

$$l_i = \log\{p(t_i; \theta)^{f_i}\, \mathrm{pr}(T_i > t_i; \theta)^{1 - f_i}\}$$

do not depend on the censoring model, and are stochastically independent

So to second order, frequency inference is the same for any censoring model, even though some higher-order adjustment should be made

Probably should either assume some convenient censoring model, or approximate the covariances from the empirical covariances of contributions to the loglikelihood
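The second option, empirical covariances of the loglikelihood contributions, can be sketched as follows. The exponential response-time model, the data and the helper names are assumptions made for illustration; the estimator shown is simply a sum of centred products over subjects, which is one plausible reading of "empirical covariances" and not necessarily the exact form used in the paper.

```python
def contribution_derivatives(rate, times, observed):
    """First and second derivatives in `rate` of l_i = f_i*log(rate) - rate*t_i."""
    first = [f / rate - t for t, f in zip(times, observed)]
    second = [-f / rate ** 2 for f in observed]
    return first, second

def empirical_covariance(a, b):
    """Sum over subjects of centred products, estimating cov(sum a_i, sum b_i)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))

# Hypothetical censored data: follow-up times and response indicators f_i.
times = [2.0, 5.0, 1.5, 7.0, 3.5]
observed = [1, 0, 1, 0, 1]
rate_mle = sum(observed) / sum(times)

score_i, hessian_i = contribution_derivatives(rate_mle, times, observed)

# Empirical stand-ins for cov{l'(theta_hat), l'(theta_hat)} and
# cov{l'(theta_hat), l''(theta_hat)}; no censoring model is used anywhere.
cov_score_score = empirical_covariance(score_i, score_i)
cov_score_hessian = empirical_covariance(score_i, hessian_i)
```

Because the score contributions sum to zero at the MLE, the centring of `score_i` has no effect here, and the second quantity reduces to the sum of the products $l_i'(\hat\theta)\, l_i''(\hat\theta)$.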

Things are quite different for comparing sequential and fixed sample size experiments --- usually cannot have “contributions” that are independent in both reference sets

But first we need to consider under what conditions second-order likelihood asymptotics applies to sequential settings

We argue in our paper that it does whenever usual first-order asymptotics applies

These conditions are given by Anscombe’s Theorem: A statistic asymptotically standard normal for fixed n remains so when: (a) the CV of n approaches zero, and (b) the statistic is asymptotically suitably continuous. Discrete n in itself does not invalidate (b)

In the key relation

$$\mathrm{pr}\{r(Y) \le r(y_{obs})\} = \Phi\{r(y_{obs}) + ADJ(y_{obs})\}$$

we need to consider, following Pierce & Peters (JRSSB 1992), the decomposition

$$ADJ(y) = NP(y) + INF(y)$$

$NP$ is related to Barndorff-Nielsen’s modified profile likelihood $L_{MP}(\psi; y) = M(y)\, L_P(\psi; y)$ by

$$NP = \partial \log(M) / \partial r$$

$NP$ pertains to the effect of fitting nuisance parameters, and $INF$ pertains to moving from likelihood to frequency inference. $INF$ is small when the adjusted information is large

When $\psi$ and $\nu$ are chosen as orthogonal, we have that to second order

$$M(y) = |j_{\nu\nu}(\psi, \hat\nu_\psi)|^{-1/2}$$

depending only on the likelihood function
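A familiar special case may help fix ideas about the factor $M(y)$. The sketch below assumes normal data with the variance as the parameter of interest and the mean as an orthogonal nuisance parameter; this example and its function names are illustrative assumptions, not taken from the talk. The observed information for the mean is $n/\sigma^2$, so $\log M = -\tfrac{1}{2}\log(n/\sigma^2)$.

```python
import math

def profile_loglik(var, data):
    """Profile log-likelihood for the variance, with the mean set to its MLE."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((y - mean) ** 2 for y in data)
    return -0.5 * n * math.log(var) - ss / (2.0 * var)

def modified_profile_loglik(var, data):
    """Adds log M = -0.5 * log(j_mean_mean), where j_mean_mean = n / var."""
    n = len(data)
    return profile_loglik(var, data) - 0.5 * math.log(n / var)

def argmax_on_grid(loglik, data, grid):
    """Crude maximizer: the grid value with the largest log-likelihood."""
    return max(grid, key=lambda v: loglik(v, data))

# Hypothetical data and a grid of candidate variances 0.01, 0.02, ..., 10.00.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.4]
grid = [0.01 * k for k in range(1, 1001)]

var_profile = argmax_on_grid(profile_loglik, data, grid)
var_modified = argmax_on_grid(modified_profile_loglik, data, grid)
```

The profile likelihood is maximized near the sum of squares divided by $n$, while the modified version moves the maximum to near the sum of squares divided by $n - 1$. The correction uses only the likelihood function, with no reference to how the sample size was determined.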

Thus, in sequential experiments the NP adjustment and MPL do not depend on the stopping rule, but the INF adjustment does

Except for Gaussian experiments with regression parameter $\psi$, there is an INF adjustment both for fixed $n$ and sequential, but they are different

Parameters orthogonal for fixed-size experiments remain orthogonal for any stopping rule, since (for underlying i.i.d. observations) we have from the Wald Identity that

$$i_n(\theta) = E(n)\, i_1(\theta)$$
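The Wald identity for the expected information can be verified exactly in a small case by enumerating all sample paths. The sketch assumes Bernoulli outcomes with success probability `p` and a hypothetical rule that stops at the first success or after three patients; neither is from the talk. It has a single parameter, but the same proportionality applies to each element of an information matrix, so zero off-diagonal elements remain zero.

```python
from itertools import product

def stopped_length(path):
    """Hypothetical rule: stop at the first success, or after all trials."""
    for k, outcome in enumerate(path, start=1):
        if outcome == 1:
            return k
    return len(path)

def expected_n_and_information(p, max_n=3):
    """Exact E(n) and expected information, enumerating all outcome paths."""
    expected_n = 0.0
    information = 0.0
    for path in product([0, 1], repeat=max_n):
        prob = 1.0
        for outcome in path:
            prob *= p if outcome == 1 else 1.0 - p
        n = stopped_length(path)
        successes = sum(path[:n])
        # Minus the second derivative of the log-likelihood of the stopped data.
        minus_second = successes / p ** 2 + (n - successes) / (1.0 - p) ** 2
        expected_n += prob * n
        information += prob * minus_second
    return expected_n, information

p = 0.3
expected_n, information = expected_n_and_information(p)
unit_information = 1.0 / (p * (1.0 - p))  # i_1(p) for one Bernoulli outcome
```

Here `information` equals `expected_n * unit_information`, so the stopping rule changes the expected information only through the scalar factor $E(n)$.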

SUMMARY

When there are contributions to the likelihood that are independent under each of two reference sets, then second-order ideal frequency inference is the same for these.

In sequential settings we need to consider the nuisance parameter and information adjustments. To second order, the former and the modified profile likelihood do not depend on the stopping rule, but the latter does.

This is all as one might hope, or expect. Inference should not, for example, depend on the censoring model but it should depend on the stopping rule

Appendix: Basis for higher-order likelihood asymptotics

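$$p(\hat\theta \mid a; \theta) = \frac{p(\hat\theta \mid a; \theta)}{p(\hat\theta \mid a; \hat\theta)}\, p(\hat\theta \mid a; \hat\theta)$$

$$= \frac{p(\hat\theta, a; \theta)}{p(\hat\theta, a; \hat\theta)}\, p(\hat\theta \mid a; \hat\theta)$$

$$= \frac{L(\theta; y)}{L(\hat\theta; y)}\, p(\hat\theta \mid a; \hat\theta)$$

$$\propto \frac{L(\theta; y)}{L(\hat\theta; y)}\, |j(\hat\theta)|^{1/2}\, \{1 + O(n^{-1})\}$$

Transform from $\hat\theta$ to $\{r(\hat\theta, a), \hat\nu\}$, integrate out $\hat\nu$

Provides a second-order approximation to the distribution of $r(y)$. The Jacobian and resultant from the integration are what comprise $ADJ(y)$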
