
Page 1: Variances are Not Always Nuisance Parameters


Variances are Not Always Nuisance Parameters

Raymond J. Carroll, Department of Statistics, Texas A&M University

http://stat.tamu.edu/~carroll

Page 2: Variances are Not Always Nuisance Parameters


Dedicated to the Memory of Shanti S. Gupta

• Head of the Purdue Statistics Department for 20 years

• I was student #11 (1974)

Page 3: Variances are Not Always Nuisance Parameters

[Figure: map of Texas. Labels: College Station, home of Texas A&M University; I-35; I-45; Big Bend National Park; Wichita Falls, my hometown; West Texas; Palo Duro Canyon, the Grand Canyon of Texas; Guadalupe Mountains National Park; East Texas]

Page 4: Variances are Not Always Nuisance Parameters

Overview

Main point: there are problems/methods where variance structure essentially determines the answer

• Assay validation

• Measurement error

Other examples, mentioned briefly

• Logistic mixed models

• Quality technology

• DNA microarrays for gene expression (Fisher!)

Page 5: Variances are Not Always Nuisance Parameters

Variance Structure

My definition: variance structure encompasses

• Systematic dependence of variability on known factors

• Random effects: their inclusion, exclusion or dependence on covariates

My point:

• Variance structure can be important in itself

• Variance structure can have a major impact on downstream analyses

Page 6: Variances are Not Always Nuisance Parameters

Collaborators on This Talk

• Statistics: David Ruppert

• Assays: Marie Davidian, Devan Devanarayan, Wendell Smith

• Measurement error: Larry Freedman, Victor Kipnis, Len Stefanski

David Ruppert also works with me outside the office

Page 7: Variances are Not Always Nuisance Parameters

Acknowledgments

Matt Wand, Alan Welsh, Naisyin Wang, Mitchell Gail, Xihong Lin (who nominated me!), Peter Hall

Page 8: Variances are Not Always Nuisance Parameters

Assay Validation

• Immunoassays: used to estimate concentrations in plasma samples from outcomes

  • Intensities

  • Counts

• Calibration problem: predict X from Y

• My Goal: to show you that cavalier handling of variances leads to wrong answers in real life

• David Finney: anticipates just this point

Page 9: Variances are Not Always Nuisance Parameters


Assay Validation

• “Here the weighted analysis has also disclosed evidence of invalidity”

• “This needs to be known and ought not to be concealed by imperfect analysis”

David Finney is the author of a classic text

Page 10: Variances are Not Always Nuisance Parameters

Assay Validation

• Assay validation is an important facet of the drug development process

• One goal: find a working range of concentrations for which the assay has

  • small bias (< 30% say)

  • small coefficient of variation (< 20% say)

Wendell Smith motivated this work

Page 11: Variances are Not Always Nuisance Parameters


Assay Validation

The Data

These data are from a paper by M. O'Connell, B. Belanger and P. Haaland

Chemometrics and Intelligent Laboratory Systems (1993)

Page 12: Variances are Not Always Nuisance Parameters

Assay Validation

• Main trends: any method will do

• Typical to fit a 4-parameter logistic model

$$E(Y \mid X) = f(X, \beta) = \beta_2 + \frac{\beta_1 - \beta_2}{1 + (X/\beta_3)^{\beta_4}}$$

[Figure: Unweighted and Weighted Fits]
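
The 4-parameter logistic mean function and the calibration step (predict X from Y) can be sketched as below. This is a minimal illustration, not the fitting procedure used in the talk: the function names and the parameter values in `beta` are hypothetical, and the inverse simply solves f(x, β) = y algebraically for x.

```python
def four_pl(x, b1, b2, b3, b4):
    """Mean response f(x, beta) of the 4-parameter logistic model."""
    return b2 + (b1 - b2) / (1.0 + (x / b3) ** b4)


def inverse_four_pl(y, b1, b2, b3, b4):
    """Calibration: solve f(x, beta) = y for the concentration x."""
    if y == b2:
        raise ValueError("response equals the asymptote b2")
    ratio = (b1 - b2) / (y - b2) - 1.0
    if ratio <= 0:
        raise ValueError("response is outside the range of the curve")
    return b3 * ratio ** (1.0 / b4)


# Hypothetical parameter values, chosen only for illustration.
beta = (0.1, 1.0, 1000.0, 1.0)
y = four_pl(500.0, *beta)          # mean response at concentration 500
x_hat = inverse_four_pl(y, *beta)  # recovers the concentration from y
```

Because the inverse is exact for a noise-free response, any error in a predicted concentration comes from the variability of Y and from the estimated parameters, which is where the variance model enters.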

Page 13: Variances are Not Always Nuisance Parameters


Assay Validation: Unweighted Prediction Intervals

Page 14: Variances are Not Always Nuisance Parameters

Assay Validation

• The data exhibit heteroscedasticity

• Typical to model variance as a power of the mean

$$\mathrm{var}(Y \mid X) \propto \{E(Y \mid X)\}^{\theta}$$

• Most often: $1 \le \theta \le 2$

David Rodbard (L) and Peter Munson (R) in 1978 proposed the 4-parameter logistic for assays
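
One rough way to gauge a power-of-the-mean variance model from replicated responses is to regress the log sample variance on the log sample mean across replicate groups; the slope estimates the power. The sketch below assumes the power is written as theta (a symbol chosen here), uses made-up duplicate data constructed so that the variance is exactly proportional to the mean to the power 1.5, and is a crude diagnostic rather than the estimation method used in the talk.

```python
import math
from statistics import mean, variance


def estimate_variance_power(replicate_groups):
    """Slope of log(sample variance) on log(sample mean) across groups."""
    pts = [(math.log(mean(g)), math.log(variance(g))) for g in replicate_groups]
    xbar = mean([p[0] for p in pts])
    ybar = mean([p[1] for p in pts])
    sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
    sxx = sum((x - xbar) ** 2 for x, _ in pts)
    return sxy / sxx


def power_of_mean_weights(fitted_means, theta):
    """Weights proportional to 1 / mean**theta for weighted least squares."""
    return [m ** (-theta) for m in fitted_means]


# Hypothetical duplicates built so that variance = 0.01 * mean**1.5.
groups = []
for m in (1.0, 4.0, 16.0, 64.0):
    d = math.sqrt(0.005 * m ** 1.5)  # duplicate (m - d, m + d) has variance 2 d^2
    groups.append((m - d, m + d))

theta_hat = estimate_variance_power(groups)
w = power_of_mean_weights([1.0, 4.0], theta_hat)
```

With real assay data the sample variances from duplicates are very noisy, so this slope is only a starting value for a proper weighted fit.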

Page 15: Variances are Not Always Nuisance Parameters

Assay Validation: Weighted Prediction Intervals

Marie Davidian and David Giltinan have written extensively on this topic

Page 16: Variances are Not Always Nuisance Parameters

Assay Validation: Working Range

• Goal: predict X from observed Y

• Working Range (WR): the range where the cv < 20%

• Validation experiments (accuracy and precision): done on working range

• If WR is shifted away from small concentrations: never validate assay for those small concentrations

• No success, even if you try (see %-recovery plots)
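
To see why the variance model moves the working range, the sketch below approximates the CV of a predicted concentration by the delta method, sd(Y | X) / (|f'(X)| X), for a 4-parameter logistic curve, and reports the concentrations where that CV is below 20%. Everything numeric here is hypothetical (curve parameters, the two standard-deviation functions, the grid), and the approximation ignores uncertainty in the estimated curve, so it illustrates the mechanism rather than reproducing the limits in the talk.

```python
def cv_of_prediction(x, beta, sd_fn):
    """Delta-method CV of the predicted concentration at true concentration x."""
    b1, b2, b3, b4 = beta
    t = (x / b3) ** b4
    mean_y = b2 + (b1 - b2) / (1.0 + t)                   # 4PL mean response
    slope = abs(b1 - b2) * b4 * t / (x * (1.0 + t) ** 2)  # |f'(x)|
    return sd_fn(mean_y) / (slope * x)


def working_range(beta, sd_fn, grid, cv_max=0.20):
    """Smallest and largest grid concentrations with CV below cv_max."""
    ok = [x for x in grid if cv_of_prediction(x, beta, sd_fn) < cv_max]
    return (min(ok), max(ok)) if ok else None


beta = (0.1, 1.0, 1000.0, 1.0)               # hypothetical curve
grid = [10 ** (k / 50) for k in range(251)]  # 1 to 100,000 on a log scale


def constant_sd(mean_y):
    return 0.02           # assumed homoscedastic model


def power_sd(mean_y):
    return 0.03 * mean_y  # assumed: variance proportional to mean squared


wr_constant = working_range(beta, constant_sd, grid)
wr_power = working_range(beta, power_sd, grid)
```

In this made-up setting the power-of-the-mean model gives lower limits at both ends than the constant-variance model, the same direction of shift as the LQL and UQL values reported in the talk.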

Page 17: Variances are Not Always Nuisance Parameters

Assay Validation: Variances Matter

[Figure: two panels plotting CV against concentration (log scale), each with LQL and UQL marked. No weighting: LQL = 1,057, UQL = 9,505. Weighting: LQL = 84, UQL = 3,866.]

Page 18: Variances are Not Always Nuisance Parameters

Working Ranges for Different Variance Functions

[Figure: OD against concentration (log scale), with series labeled Unweighted and Weighted and annotations LQL = 84 and UQL = 3,866.]

Page 19: Variances are Not Always Nuisance Parameters

Assay Validation: % Recovery

• Goal: predict X from observed Y

• Measure: $\hat{X}/X$ = % recovered

• Want confidence interval to be within 30% of actual concentration

Devan Devanarayan, my statistical grandson, organized this example

Page 20: Variances are Not Always Nuisance Parameters

Assay Validation: % Recovery

• Note: acceptable ranges (IL-10 validation experiment) depend on accounting for variability

[Figure: % recovery with 90% C.I. against true concentration (log scale, 10 to 1000), in two panels: Unweighted and Weighted.]

Page 21: Variances are Not Always Nuisance Parameters

Assay Validation: Summary

• Accounting for changing variability is pointless if the interest is merely in fitting the curve

• In other contexts, standard errors actually matter (power is important after all!)

• The gains in precision from a weighted analysis can change conclusions about statistical significance

• Accounting for changing variability is crucial if you want to solve the problem

• Concentrations for which the assay can be used depend strongly on a model for variability

Page 22: Variances are Not Always Nuisance Parameters

The Structure of Measurement Error

Measurement error has an enormous literature

Hundreds of papers on the structure for covariates

$$W = X + \varepsilon$$

Here X = “truth”, W = “observed”

X is a latent variable

See Wayne Fuller’s 1987 text

Page 23: Variances are Not Always Nuisance Parameters

The Structure of Measurement Error

For most regressions, if X is the only predictor and

$$W = X + \varepsilon$$

then

• parameter estimates are biased when error is ignored

• power is lost (my focus today)

Page 24: Variances are Not Always Nuisance Parameters

The Structure of Measurement Error

My point: the simple measurement error model is too simple

$$W = X + \varepsilon$$

A different variance structure suggests different conclusions

Page 25: Variances are Not Always Nuisance Parameters

The Structure of Measurement Error

Nutritional epidemiology: dietary intake measured via food frequency questionnaires (FFQ)

Prospective studies: none have found a statistically significant fat intake effect on breast cancer

Controversy in post-hoc power calculations: what is the power to detect such an effect?

Ross Prentice has written extensively on this topic

Page 26: Variances are Not Always Nuisance Parameters

Dietary Intake Data

The essential quantity controlling power is the attenuation

Let Q = FFQ, X = “long-term dietary intake”

Attenuation = % of variation that is due to true intake

• 100% is good

• 0% is bad

Attenuation = slope of regression of X on Q

Sample size needed for fixed power can be thought of as proportional to

Page 27: Variances are Not Always Nuisance Parameters

Post hoc Power Calculation

FFQ: known to be biased

F: “reference instrument” thought to be unbiased (but much more expensive than Q)

$$F = X + \varepsilon$$

F = 24-hour recall or some type of diary

Then attenuation = slope of regression of F on Q

Larry Freedman has done fundamental work on dietary instrument validation
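
A small simulation can illustrate why an unbiased reference instrument lets one estimate the attenuation without observing true intake: if F equals X plus independent noise, the slope of F on Q matches the slope of X on Q up to sampling error. The data-generating values below (a biased FFQ with intercept 0.5 and slope 0.8, unit variances, sample size, seed) are assumptions made for this sketch only.

```python
import random


def slope(y, x):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
n = 20000
X = [rng.gauss(0, 1) for _ in range(n)]              # true long-term intake
Q = [0.5 + 0.8 * x + rng.gauss(0, 1) for x in X]     # assumed biased FFQ
F = [x + rng.gauss(0, 1) for x in X]                 # unbiased reference

attenuation_true = slope(X, Q)       # needs X, unavailable in practice
attenuation_reference = slope(F, Q)  # uses only observable F and Q
theory = 0.8 / (0.8 ** 2 + 1.0)      # cov(X, Q) / var(Q) under these settings
```

The two slopes agree here only because the noise in F is independent of everything else, which is exactly the assumption the talk goes on to question.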

Page 28: Variances are Not Always Nuisance Parameters


Post hoc Power Calculation

If “reference instrument” is unbiased then

Can estimate attenuation

Can estimate mean of X

Can estimate variance of X

Can estimate power in the study at hand

Many, many papers assume that the reference instrument is unbiased in this way

Plenty of power

Walt Willett: a leader in nutritional epidemiology

Page 29: Variances are Not Always Nuisance Parameters

Dietary Intake Data

The attenuation ≈ 0.30 for absolute amounts, ≈ 0.50 for food composition

Remember, attenuation is the % of variability that is not noise

All based on the validity of the reference instrument

$$F = X + \varepsilon$$

Pearson and Cochran now weigh in

Page 30: Variances are Not Always Nuisance Parameters

The Structure of Measurement Error

1902: “On the mathematical theory of errors of judgment”

Interested in nature of errors of measurement when the quantity is fixed and definite, while the measuring instrument is a human being

Individuals bisected lines of unequal length freehand, errors recorded

Karl Pearson

Page 31: Variances are Not Always Nuisance Parameters

The Structure of Measurement Error

• FFQs are also self-report

• Findings have relevance today

• Individuals were biased

• Biases varied from individual to individual

Karl Pearson

Page 32: Variances are Not Always Nuisance Parameters


Measurement Error Structure

• Classic 1968 Technometrics paper

• Used Pearson’s paper

• Suggested an error model that had systematic and random biases

• This structure seems to fit dietary self-report instruments

William G. Cochran

Page 33: Variances are Not Always Nuisance Parameters

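Measurement Error Structure: Cochran

$$F_{ij} = \alpha_F + \beta_F X_{ij} + r_{Fi} + \varepsilon_{Fij}, \qquad r_{Fi} \sim \mathrm{Normal}(0, \sigma_{Fr}^2)$$

• We call $r_{Fi}$ the “person-specific bias”

• We call $\beta_F$ the “group-level bias”

• Similarly, for FFQ,

$$Q_{ij} = \alpha_Q + \beta_Q X_{ij} + r_{Qi} + \varepsilon_{Qij}, \qquad r_{Qi} \sim \mathrm{Normal}(0, \sigma_{Qr}^2)$$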
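
The consequence of person-specific biases can be sketched with a simulation in which the biases of the reference instrument and the FFQ are positively correlated. That correlation (`rho`), along with every numeric value below, is a hypothetical choice for this illustration and is not stated on the slide. Under these assumptions the slope of F on Q overstates the true attenuation, so power computed from it would be too optimistic.

```python
import random


def slope(y, x):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
n = 20000
rho = 0.5  # hypothetical correlation between the two person-specific biases
X, F, Q = [], [], []
for _ in range(n):
    x = rng.gauss(0, 1)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    r_q = z1                                     # person-specific bias in Q
    r_f = rho * z1 + (1 - rho ** 2) ** 0.5 * z2  # correlated bias in F
    X.append(x)
    F.append(x + r_f + rng.gauss(0, 1))              # no group-level bias in F
    Q.append(0.5 + 0.8 * x + r_q + rng.gauss(0, 1))  # assumed FFQ coefficients

true_attenuation = slope(X, Q)   # theory: 0.8 / 2.64, about 0.30
naive_attenuation = slope(F, Q)  # theory: (0.8 + 0.5) / 2.64, about 0.49
```

With a single measurement per person only the naive slope is estimable from F and Q, which is one way to see why unbiased biomarkers are needed to pin the model down.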

Page 34: Variances are Not Always Nuisance Parameters

Measurement Error Structure

The horror: the model is unidentified

Sensitivity analyses suggest potential that measurement error causes much greater loss of power than previously suggested

Needed: unbiased measures of intake

Biomarkers

• Protein via urinary nitrogen

• Calories via doubly-labeled water

Page 35: Variances are Not Always Nuisance Parameters

Biomarker Data

Protein: available from a number of European studies

Calories and protein: available from NCI’s OPEN study

Results are stunning

Victor Kipnis was the driving force behind OPEN

Page 36: Variances are Not Always Nuisance Parameters

Biomarker Data: Attenuations

[Figure: attenuations (vertical axis, 0 to 0.6) for protein (and calories and protein density for OPEN), with series labeled Biomarker and Standard, by study: OPEN-%P, OPEN-C, OPEN-P, UK: Diary, UK: WFR, EPIC #1, EPIC #2, EPIC #3, EPIC #4, EPIC #5.]

Page 37: Variances are Not Always Nuisance Parameters

Biomarker Data: Sample Size Inflation

[Figure: sample size inflation (vertical axis, 0 to 12, labeled Sample Size) for protein (and calories and protein density for OPEN), by study: OPEN-%P, OPEN-C, OPEN-P, UK: Diary, UK: WFR, EPIC #1, EPIC #2, EPIC #3, EPIC #4, EPIC #5.]

Page 38: Variances are Not Always Nuisance Parameters

Measurement Error Structure

The variance structure of the FFQ and other self-report instruments appears to have individual-level biases (Pearson and Cochran model)

Ignoring this:

• Overestimation of power

• Underestimation of sample size

It may not be possible to understand the effect of total intakes

• Food composition more hopeful

Page 39: Variances are Not Always Nuisance Parameters


Other Examples of Variance Structure

Nonlinear and generalized linear mixed models (NLMIX and GLIMMIX)

Quality Technology: Robust parameter design

Microarrays

Page 40: Variances are Not Always Nuisance Parameters

Nonlinear Mixed Models

Mixed models have random effects

Typical to assume normality

Robustness to normality has been a major concern

Many now conclude that this is not that major an issue

There are exceptions!!

Page 41: Variances are Not Always Nuisance Parameters

Logistic Mixed Models

Heagerty & Kurland (2001): “Estimated regression coefficients for cluster-level covariates can be highly sensitive to assumptions about whether the variance of a random intercept depends on a cluster-level covariate”, i.e., heteroscedastic random effects or variance structure

Patrick Heagerty

Page 42: Variances are Not Always Nuisance Parameters


Logistic Mixed Models

Heagerty (Biometrics, 1999, Statistical Science 2000, Biometrika 2001)

See also Zeger, Liang & Albert (1988), Neuhaus & Kalbfleisch (1991) and Breslow & Clayton (1993)

Gender is a cluster-level variable

Allowing cluster-level variability to depend on gender results in a large change in the estimated gender regression coefficient and p-value.

Marginal contrasts can be derived and are less sensitive

In the presence of variance structure, regression coefficients alone cannot be interpreted marginally

Page 43: Variances are Not Always Nuisance Parameters


Robust Parameter Design

“The Taguchi Method”

From Wu and Hamada: “aims to reduce the variation of a system by choosing the setting of control factors to make it less sensitive to noise variation”

Set target, optimize variance

Jeff Wu and Mike Hamada’s text is an excellent introduction

Page 44: Variances are Not Always Nuisance Parameters

Robust Parameter Design

Modeling variability is an intrinsic part of the method

• Maximizing the signal-to-noise ratio (Taguchi)

• Modeling location and dispersion separately

• Modeling location and then minimizing the transmitted variance

Ideas are used in optimizing assays, among many other problems

Page 45: Variances are Not Always Nuisance Parameters

Robust Parameter Design: Microarrays for Gene Expression

cDNA and oligo-microarrays have attracted immense interest

Multiple steps (sample preparation, imaging, etc.) affect the quality of the results

Processes could clearly benefit from robust parameter design (Kerr & Churchill)

R. A. Fisher

Page 46: Variances are Not Always Nuisance Parameters

Robust Parameter Design: Microarrays

Experiment (oligo-arrays): 28 rats given different diets (corn oil, fish oil and olive oil enhanced)

15 rats have duplicated arrays

How much of the variability in gene expression is due to the array?

We have consistently found that 2/3 of the variability is noise within animal rather than between animal
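
The split between within-animal and between-animal variability is usually summarized by the intraclass correlation (ICC); two thirds of the variability being within-animal noise corresponds to an ICC near one third. The sketch below computes the standard one-way ANOVA estimator for one gene from duplicated arrays. The expression values are made up, and the talk's own estimates come from mixed models, so this is a simplified stand-in.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for groups (animals) of equal size k."""
    n = len(groups)
    k = len(groups[0])
    means = [sum(g) / k for g in groups]
    grand = sum(means) / n
    # Between-animal and within-animal mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical duplicated-array log expressions for one gene, five animals.
duplicates = [(7.1, 7.9), (6.4, 7.0), (8.2, 7.3), (6.9, 7.6), (7.8, 8.1)]
icc = icc_oneway(duplicates)
```

The estimator can be negative when the duplicates disagree more than the animals do, which is one reason per-gene ICC estimates from a handful of duplicated arrays are spread so widely.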

Page 47: Variances are Not Always Nuisance Parameters

Intraclass Correlations in the Nutrition Data Set

Simulated ICC for 8,000 independent genes with common ICC = 0.35

Estimated ICC for 8,000 genes from mixed models

Clearly, more control of noise via robust parameter design has the potential to impact power for analyses

Page 48: Variances are Not Always Nuisance Parameters

Conclusion

My definition: variance structure encompasses

• Systematic dependence of variability on known factors

• Random effects: their inclusion or exclusion

My point:

Variance structure can be important in itself

Variance structure can have a major impact on downstream analyses

Page 49: Variances are Not Always Nuisance Parameters


And Finally

I’m really happy to be on the faculty at A&M (and to be the Fisher Lecturer!)

At the Falls on the Wichita River, West Texas