model- vs. design-based sampling and variance estimation

Model- vs. design-based sampling and variance estimation on continuous domains Cynthia Cooper OSU Statistics September 11, 2004 R82-9096-01

Page 1: Model- vs. design-based sampling and variance estimation

Model- vs. design-based sampling and variance estimation on continuous domains

Cynthia Cooper, OSU Statistics

September 11, 2004

R82-9096-01

Page 2: Model- vs. design-based sampling and variance estimation

Introduction

• Research on model- and design-based sampling and estimation on continuous domains

Compare ...
• Basis of inference of each
• Sampling concepts
• Interpretation of variance
• Variance estimation

Page 3: Model- vs. design-based sampling and variance estimation

Duality in Environmental Monitoring

• Design-based estimates
– Status and trend
– No model of underlying stochastic process
• Defensible
– Probability sample
• Avoid selection bias
• Control sample process variance
• Model-based predictions
– Stochastic behavior of response
– Forecasting/prediction conditional on the observed data

Page 4: Model- vs. design-based sampling and variance estimation

General Outline

• Introduction
• Summary comparison of approaches
• Summary characterization of variance estimators
• Proposed model-assisted variance estimator
• Simulation methods
• Design-based context results
• Model-based (kriging) results
• Conclusion

Page 5: Model- vs. design-based sampling and variance estimation

Comparison of approaches - Design-based

• Probability samples – unbiased estimates
• Basis for long-run frequency properties
– Design-induced randomness – sample process variance
• Basic linear estimator scales up sample responses to extrapolate to population
– Inclusion probabilities
• Examples
– EPA EMAP
– ODFW Monitoring Plan Augmented Rotating Panel
– USFS Forest Inventory and Analysis
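The "scale up sample responses" idea can be illustrated with a small finite-population sketch. Everything in it is hypothetical (the four population values, the simple-random-sampling design, the function name `ht_total`); it is not taken from the presentation and only shows that weighting each sampled response by the inverse of its inclusion probability gives an estimate of the total that is unbiased over the design.

```python
from itertools import combinations


def ht_total(sample_values, inclusion_probs):
    """Linear estimator of a total: each response is scaled by 1/pi_i."""
    return sum(z / p for z, p in zip(sample_values, inclusion_probs))


# Hypothetical population of N = 4 responses, sampled n = 2 at a time
# by simple random sampling without replacement (assumed design).
population = [3.0, 5.0, 8.0, 12.0]
n = 2
pi = n / len(population)  # every element has inclusion probability n/N

# Enumerate all equally likely samples and average the estimates:
# the design-based expectation of the estimator.
estimates = [ht_total(s, [pi] * n) for s in combinations(population, n)]
mean_estimate = sum(estimates) / len(estimates)
true_total = sum(population)
```

Averaging over every possible sample is what "long-run frequency properties" refers to: the randomness comes from the design, not from a model of the response.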

Page 6: Model- vs. design-based sampling and variance estimation

Comparison of approaches - Design-based

• Inclusion probability
– Element-wise (πi): sum of probabilities of all samples which include the ith element
– Pair-wise (πij): sum of probabilities of all samples which include both the ith and jth elements
• For continuous domains
– Inclusion probability densities (IPD) (Cordy (1993))
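For a finite population the two definitions above can be computed directly by enumerating the samples of a design. The sketch below assumes a hypothetical design (simple random sampling of n = 2 from N = 4) and a hypothetical helper name; on a continuous domain the sums become densities, which this example does not attempt to reproduce.

```python
from fractions import Fraction
from itertools import combinations


def inclusion_probabilities(design):
    """design maps each sample (tuple of element ids) to its probability.

    Returns (first_order, second_order): pi_i is the summed probability of
    all samples containing i; pi_ij likewise for samples containing i and j.
    """
    first_order, second_order = {}, {}
    for sample, prob in design.items():
        for i in sample:
            first_order[i] = first_order.get(i, 0) + prob
        for i, j in combinations(sorted(sample), 2):
            second_order[(i, j)] = second_order.get((i, j), 0) + prob
    return first_order, second_order


# Hypothetical design: all samples of size n = 2 from N = 4, equally likely.
N, n = 4, 2
samples = list(combinations(range(N), n))
design = {s: Fraction(1, len(samples)) for s in samples}
pi_i, pi_ij = inclusion_probabilities(design)
```

For this design every element has πi = n/N and every pair has πij = n(n-1)/(N(N-1)), which the enumeration reproduces exactly because `Fraction` avoids rounding.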

Page 7: Model- vs. design-based sampling and variance estimation

Comparison of approaches - Model-based

• Response generated by a stochastic process
• Likelihood-based approaches to estimating parameters of model
• BLUP – Conditional on values observed in sample
• Examples
– Mining surveys
– Soil and hydrology surveys

Page 8: Model- vs. design-based sampling and variance estimation

Variance estimators - Design-based

• Quantifies variability induced by sampling process
• Variance of linear estimators
– Scale up square and cross-product terms with inverse marginal and pair-wise inclusion probability densities (IPDs)
• For continuous domains
– Congruent tessellation stratified samples w/ one observation per stratum
• Require randomized grid origin to achieve non-zero cross-product terms (πij − πiπj) (Stevens (1997))

Page 9: Model- vs. design-based sampling and variance estimation

Variance estimators - Design-based

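• Horvitz-Thompson (HT)

$$\hat{V}_{HT} = \hat{V}_{HT}(w'z \mid S) = \sum_i \frac{z_i^2}{\pi_i^2} + \sum_i \sum_{j \ne i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}\,\pi_i \pi_j}\, z_i z_j \qquad \text{(Cordy (1993))}$$

• Can be negative
– Especially samples with a point pair in close proximity
• Requires randomly-located tessellation grid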

Page 10: Model- vs. design-based sampling and variance estimation

Variance estimators - Design-based

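• Yates-Grundy (YG)

$$\hat{V}_{YG} = \hat{V}_{YG}(w'z \mid S) = \sum_i \sum_{j > i} \left( \frac{\pi_i \pi_j}{\pi_{ij}} - 1 \right) \left( \frac{z_i}{\pi_i} - \frac{z_j}{\pi_j} \right)^2 \qquad \text{(Cordy (1993))}$$

• Assumes fixed effective sample size
• Point pairs with close proximity can destabilize (Stevens (2003))
• Requires randomly-located tessellation grid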
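To make the two direct estimators concrete, here is a minimal sketch of their textbook finite-population forms. This is an assumption-laden stand-in, not the continuous-domain versions of Cordy (1993): with inclusion probability densities the (1 − πi) factor in the HT squared term drops out. The sample values, N, and function names are hypothetical. Under simple random sampling without replacement both estimators should agree with the familiar N²(1 − n/N)s²/n, which gives a simple check.

```python
def ht_variance(z, pi, pij):
    """Finite-population Horvitz-Thompson variance estimator."""
    n = len(z)
    v = sum((1 - pi[i]) / pi[i] ** 2 * z[i] ** 2 for i in range(n))
    for i in range(n):
        for j in range(n):
            if i != j:
                coef = (pij[i][j] - pi[i] * pi[j]) / (pij[i][j] * pi[i] * pi[j])
                v += coef * z[i] * z[j]
    return v


def yg_variance(z, pi, pij):
    """Yates-Grundy variance estimator (fixed sample size), summed over pairs i < j."""
    n = len(z)
    v = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = z[i] / pi[i] - z[j] / pi[j]
            v += (pi[i] * pi[j] / pij[i][j] - 1) * diff ** 2
    return v


# Hypothetical SRS-without-replacement sample of n = 3 from N = 10.
N = 10
z = [2.0, 5.0, 11.0]
n = len(z)
pi = [n / N] * n
pair = n * (n - 1) / (N * (N - 1))
pij = [[pair] * n for _ in range(n)]  # diagonal entries are never used

v_ht = ht_variance(z, pi, pij)
v_yg = yg_variance(z, pi, pij)

# Textbook SRS variance estimate of the total, for comparison.
mean = sum(z) / n
s2 = sum((x - mean) ** 2 for x in z) / (n - 1)
v_srs = N ** 2 * (1 - n / N) * s2 / n
```

The HT form mixes positive squared terms with cross-product terms whose coefficients can be negative, which is one way to see why it can return a negative estimate; the YG form is a weighted sum of squares and is non-negative whenever πiπj ≥ πij for every pair.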

Page 11: Model- vs. design-based sampling and variance estimation

Variance estimators - Model-based

• Estimating MSPE of BLUP
– Involves variances and covariances associated with square and cross-product terms of error
• Assume form of covariance that describes rate of decay of covariance
• Exponential
• Spherical
• Must result in positive-definite covariance matrix
• Incremental stationarity
– E[(z(si) − z(so))²] = g(||si − so||) = g(h)
– Typically, E[…] increases as h increases
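The two covariance forms named above can be sketched as follows. The exponential form uses the b*exp(-h/r) parameterization that appears in the Methods slides of this deck; the spherical form is the standard textbook one, and the function and variable names are illustrative only. For a second-order stationary field the mean squared increment equals 2(C(0) − C(h)), so it grows with h as the covariance decays.

```python
import math


def exponential_cov(h, sill, rng):
    """Exponential covariance b*exp(-h/r) with sill b and range parameter r."""
    return sill * math.exp(-h / rng)


def spherical_cov(h, sill, rng):
    """Spherical covariance: reaches exactly zero at h >= range."""
    if h >= rng:
        return 0.0
    r = h / rng
    return sill * (1 - 1.5 * r + 0.5 * r ** 3)


def mean_squared_increment(h, cov, sill, rng):
    """E[(z(s_i) - z(s_o))^2] = 2*(C(0) - C(h)) under second-order stationarity."""
    return 2 * (cov(0.0, sill, rng) - cov(h, sill, rng))


lags = (0.0, 0.5, 1.0, 2.0, 4.0)
increments = [mean_squared_increment(h, exponential_cov, 4.0, 2.0) for h in lags]
```

With sill 4 and range 2 the increments start at 0 for h = 0 and rise toward 2 × sill = 8 as h grows.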

Page 12: Model- vs. design-based sampling and variance estimation

Variance estimators - Model-based

• Variance
– Quantifies stochastic variability of expected value of response
– Vanishes as ||si − so|| → 0
• Mean-square prediction error (MSPE)
– a.k.a. MSE
– Variance + bias²
• Sample process variability of BLUP
– Weighted averages vary less
– Varies more as sample range increases relative to resolution

Page 13: Model- vs. design-based sampling and variance estimation

Proposed model-assisted variance (VMA)

• Predict variance within a stratum
• Variance is reduced by mean covariance (assuming positively correlated elements)
– Similar to error variance computations (Ripley (1981))
• Within-stratum estimated as
– Sill reduced by within-stratum average covariance
• Linear estimator variance estimated as sum of squared coefficients times within-stratum variance

Use covariance structure of response to model variability due to sampling process
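One possible reading of the bullets above, written as a sketch rather than as the author's implementation: the within-stratum variance is the sill minus the average covariance between two points in the stratum, and the variance of a linear estimator is the sum of squared coefficients times that quantity. The midpoint-grid approximation of the average covariance, the choice of weights equal to the stratum area (appropriate for a total with one uniformly placed observation per stratum), and all names are assumptions.

```python
import math


def exponential_cov(h, sill, rng):
    return sill * math.exp(-h / rng)


def within_stratum_variance(side, sill, rng, m=8):
    """Sill minus the average covariance between point pairs in a side x side stratum.

    The average is approximated on an m x m grid of cell midpoints (assumption).
    """
    pts = [((i + 0.5) * side / m, (j + 0.5) * side / m)
           for i in range(m) for j in range(m)]
    total = 0.0
    for x1, y1 in pts:
        for x2, y2 in pts:
            total += exponential_cov(math.hypot(x1 - x2, y1 - y2), sill, rng)
    return sill - total / len(pts) ** 2


def model_assisted_variance(weights, var_within):
    """Sum of squared coefficients times the within-stratum variance."""
    return sum(w * w for w in weights) * var_within


# Settings echoing the simulation slides: 2x2 strata, n = 100, sill 4, range 2.
side, sill, rng, n = 2.0, 4.0, 2.0, 100
var_within = within_stratum_variance(side, sill, rng)
weights = [side * side] * n  # assumed: each response is scaled by the stratum area
v_ma = model_assisted_variance(weights, var_within)
```

In practice the sill and range would be replaced by estimates (the deck uses REML); here they are fixed so the arithmetic can be followed.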

Page 14: Model- vs. design-based sampling and variance estimation


Precursors of and precedents for modeling covariance

• Cochran (1946)
– Finite population
– Serial correlation w/ discrete lags
• Bellhouse (1977)
– Continued extension of Cochran’s work to finite populations ordered on two dimensions
• Small-area estimation model-assisted approaches
– J.N.K. Rao (2003)

Page 15: Model- vs. design-based sampling and variance estimation

Methods – part 1

Random field (background) generated in R
• M. Schlather's GaussRF() of R package RandomFields
• Exponential covariance structure b*exp(-h/r)
– (e.g. 4*exp(-h/2))
• h is distance; b and r are "sill" and "range" parameters
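The presentation generated its fields with GaussRF() in R. As a rough Python stand-in (not that function, and not necessarily the same simulation algorithm), a Gaussian field with covariance b*exp(-h/r) can be drawn on a small set of points by factoring the covariance matrix with a Cholesky decomposition. The coarse 5x5 grid, the seed, and the function names are assumptions chosen to keep the example short.

```python
import math
import random


def exponential_cov(h, sill, rng):
    return sill * math.exp(-h / rng)


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_field(points, sill, rng, seed=0):
    """Return (field, cov, low): a zero-mean Gaussian draw and its covariance factors."""
    cov = [[exponential_cov(math.hypot(p[0] - q[0], p[1] - q[1]), sill, rng)
            for q in points] for p in points]
    low = cholesky(cov)
    gen = random.Random(seed)
    noise = [gen.gauss(0.0, 1.0) for _ in points]
    field = [sum(low[i][k] * noise[k] for k in range(i + 1))
             for i in range(len(points))]
    return field, cov, low


# Coarse illustrative grid (25 points); sill b = 4 and range r = 2 as in the slide's example.
points = [(float(x), float(y)) for x in range(0, 10, 2) for y in range(0, 10, 2)]
field, cov, low = simulate_field(points, sill=4.0, rng=2.0)
```

Dense fields over a 20x20 domain would need a faster method than a full Cholesky factorization; this only shows how the sill and range enter the covariance matrix.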

Page 16: Model- vs. design-based sampling and variance estimation

Methods – part 1a

Repeat 1000 times per realization
• Stratified sample
– n=100; one observation per stratum; stratum size 2x2
– Simple square-grid tessellation
• Randomized origin
• Constant origin
• REML estimate of covariance parameters (b,r)
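The sampling step can be sketched as below: a 10x10 square-grid tessellation of 2x2 strata with one uniformly placed observation per stratum, using either a constant or a randomized grid origin. How the presentation treated strata that a shifted grid pushes past the domain edge is not stated; wrapping them around the 20x20 domain is an assumption made here, as are the function and parameter names.

```python
import random


def stratified_sample(n_side=10, side=2.0, randomize_origin=True, seed=None):
    """One uniform random point per square stratum of a square-grid tessellation."""
    gen = random.Random(seed)
    extent = n_side * side
    if randomize_origin:
        # Shift the whole grid by a uniform offset within one stratum width.
        ox, oy = gen.uniform(0.0, side), gen.uniform(0.0, side)
    else:
        ox, oy = 0.0, 0.0
    points = []
    for i in range(n_side):
        for j in range(n_side):
            x = ox + (i + gen.random()) * side
            y = oy + (j + gen.random()) * side
            # ASSUMPTION: wrap points that fall outside the domain (torus edges).
            points.append((x % extent, y % extent))
    return points


pts_constant = stratified_sample(randomize_origin=False, seed=1)
pts_random = stratified_sample(randomize_origin=True, seed=1)
```

With a constant origin two points can never share a stratum, so some point pairs have zero joint inclusion density; randomizing the origin is what makes every pair possible.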

Page 17: Model- vs. design-based sampling and variance estimation

Methods – part 2

Repeat 1000 times per realization (continued)
• For the design-based context
– Estimate total (zhat)
• HT estimator for continuous domain
– Compute VHT, VYG and VMA
– Compare estimated variances with empirical variance (V[zhat])
• For the model-based context example (Kriging)
– Randomly selected zo at fixed location over 1000 trials
– Obtain zhat, VOK, VMA
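For the model-based step, a minimal ordinary-kriging sketch shows where zhat and VOK come from. It assumes a known exponential covariance and solves the standard ordinary-kriging system (covariances bordered by the unbiasedness constraint) with a small Gaussian-elimination helper. The four data points, their values, and all function names are hypothetical.

```python
import math


def exponential_cov(h, sill, rng):
    return sill * math.exp(-h / rng)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target, sill, rng):
    """Return (zhat, kriging variance, weights) for one target location."""
    n = len(points)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    a = [[exponential_cov(dist(p, q), sill, rng) for q in points] + [1.0]
         for p in points]
    a.append([1.0] * n + [0.0])          # weights must sum to one
    c0 = [exponential_cov(dist(p, target), sill, rng) for p in points]
    sol = solve(a, c0 + [1.0])
    w, mu = sol[:n], sol[n]              # mu is the Lagrange multiplier
    zhat = sum(wi * zi for wi, zi in zip(w, values))
    v_ok = sill - sum(wi * ci for wi, ci in zip(w, c0)) - mu
    return zhat, v_ok, w


points = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0)]
values = [2.0, 3.5, 1.0, 4.0]
zhat, v_ok, w = ordinary_kriging(points, values, (2.0, 2.0), sill=1.0, rng=1.0)
```

Because the target sits at the center of a symmetric configuration, the four weights are equal and zhat is the plain average of the values; predicting at a sampled location instead returns the observed value with zero kriging variance.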

Page 18: Model- vs. design-based sampling and variance estimation

[Figure: Random field overlaid by stratified sample w/ constant origin. Exponential covariance with range = 2 and sill = 1. Both axes run from 0 to 20.]

Page 19: Model- vs. design-based sampling and variance estimation

Results – Design-based application

Empirical median relative error (EMRE)

Compares estimated variances with empirical variance of estimate of total (V[zhat])
(Stratified sample with randomized origin)

$$\mathrm{EMRE} = \frac{V_{\ldots(50)} - V[\hat{z}]}{V[\hat{z}]}$$

Sill    4                                   1
Range   0.5      1        2        4        0.5      1        2        4
VHT     0.068    0.063    0.189    0.161    *0.044   0.185    *0.070   0.372
VYG     *-0.045  -0.122   -0.117   -0.176   -0.052   *-0.032  -0.228   -0.133
VMA     0.055    *-0.024  *-0.001  *-0.035  0.049    0.084    -0.138   *0.004
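The EMRE summary can be computed from simulation output along these lines: take the median of the estimated variances across trials, subtract the empirical variance of the total estimates, and divide by that empirical variance. The numbers below are made up for illustration, and using the population (divide-by-n) variance for the empirical variance is an assumption; the presentation does not say which divisor was used.

```python
from statistics import median, pvariance


def emre(variance_estimates, total_estimates):
    """(median estimated variance - empirical variance) / empirical variance."""
    empirical = pvariance(total_estimates)  # ASSUMPTION: divisor n, not n - 1
    return (median(variance_estimates) - empirical) / empirical


# Hypothetical output of five trials (the presentation used 1000 per realization).
total_estimates = [98.0, 102.0, 100.0, 104.0, 96.0]
variance_estimates = [7.0, 9.0, 8.0, 12.0, 10.0]
result = emre(variance_estimates, total_estimates)
```

A positive value means the estimator's median overstates the observed variability of zhat, a negative value that it understates it.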

Page 20: Model- vs. design-based sampling and variance estimation

Results – Design-based application

Exponential covariance with range = 2 and sill = 4

[Figure: three panels titled Model-assisted Variance, Yates-Grundy Variance, and Horvitz-Thompson Variance. Each panel marks Avg, Med, and Observed V[zhat].]

Page 21: Model- vs. design-based sampling and variance estimation

Results – Design-based application

Ratios of empirical standard deviations

(Stratified sample with randomized origin)

$$\frac{sd(V_{\ldots})}{sd(V_{HT})}$$

Sill    4                                   1
Range   0.5      1        2        4        0.5      1        2        4
MA/HT   0.56     0.43     0.27     0.24     0.66     0.36     0.20     0.14
YG/HT   0.77     0.62     0.35     0.28     0.84     0.43     0.27     0.17

Page 22: Model- vs. design-based sampling and variance estimation

Results – Model-based application

Exponential covariance with range = 1 and sill = 1 (stratified sample with randomized origin)

[Figure: two panels titled Kriging variance (MSPE) and Model-assisted variance, with x-axes from 0.0 to 1.2. Each panel marks Avg and Observed V[zhat].]

Page 23: Model- vs. design-based sampling and variance estimation

Concluding - Model-assisted approach

• Small-area precedent
• Application to systematic and one-observation-per-stratum samples
• Effective alternative to direct estimators of continuous-domain randomized-origin tessellation stratified samples
– Empirical results – less bias, better efficiency
• Doesn’t require randomly-located tessellation grid on continuous domain for non-zero πij

Page 24: Model- vs. design-based sampling and variance estimation

Acknowledgements

Thanks to Don Stevens

Committee members

OSU Statistics Faculty

UW QERM Faculty

Page 25: Model- vs. design-based sampling and variance estimation

The research described in this presentation has been funded by the U.S. Environmental Protection Agency through the STAR Cooperative Agreement CR82-9096-01 National Research Program on Design-Based/Model-Assisted Survey Methodology for Aquatic Resources at Oregon State University. It has not been subjected to the Agency's review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.
