here’s a data analysis problem: for the 2002 forest...

32
Here’s a data analysis problem: For the 2002 forest inventory data (Finley et al 2008; BEF.dat, spBayes). Problem: Replace a laborious outcome measurement with a function of predictors measured by satellites. I Outcome: red maple total basal area (metric tons biomass). I Predictors: Elevation, slope, brightness (TC1), greenness (TC2), wetness (TC3). I 437 observations. This problem could have been on the MS exam I wrote in 1982 . . .

Upload: others

Post on 17-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Here’s a data analysis problem:

For the 2002 forest inventory data (Finley et al 2008; BEF.dat, spBayes).

Problem: Replace a laborious outcome measurement with a function ofpredictors measured by satellites.

I Outcome: red maple total basal area (metric tons biomass).

I Predictors: Elevation, slope, brightness (TC1), greenness (TC2),wetness (TC3).

I 437 observations.

This problem could have been on the MS exam I wrote in 1982 . . .

Page 2: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

EXCEPT these observations are spatially referenced

Red maple basal area Elevation

Page 3: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Here’s a standard model for analyzing data like this

At spatial locations indexed by s, model outcome y(s) as

y(s) = x(s)� + w(s) + ✏(s)

Ix(s) are covariates, including an intercept

Iw(s) is a stationary GP, mean 0, covariance function �2

s K (⇢)

I ✏(s) is iid N with mean 0, variance �2e

I Unknown parameters: �, �2s, ⇢, and �2

e .

Page 4: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

. . . giving this mixed linear model and restricted likelihood

For observations at {s1, . . . , sn}, the mixed linear model is

y = X� + Inw+ ✏

IX ’s rows are the x(si )

Iw = (w(s1), . . . ,w(sn))0 ⇠ N(0,G ) for G = �2

s K ({si}; ⇢)I ✏ ⇠ N(0,R) for R = �2

e I

�2s, ⇢, �

2e can be estimated by maximizing the log restricted likelihood

� log |V |� log |X 0V

�1X |� y

0[V�1 � V

�1X (X 0

V

�1X )�1

X

0V

�1]y

where V = G + R , a dense matrix.

Page 5: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

For models like this, we don’t have LM-quality tools

The RL doesn’t have a closed form; e↵ects of data features are obscure.

Variograms are useful but

I don’t give specific info about how the data determine �2s, ⇢, �

2e

I aren’t much help in assessing non-stationarity.

The usual residuals are problematic:

I This model can fit any dataset arbitrarily well.

I If the model smooths much, residuals are biased.

I Residuals don’t tell us how the data determine �2s, ⇢, �

2e .

Page 6: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Bose, Hodges, Banerjee Biometrics 2018

BHB Biometrics (2018) is Step 1 (maybe) in filling this gap.

This talk emphasizes ideas using a simplified problem:

data collected on a 1-D regular grid with no fixed e↵ects.

I’ll mention how we’ve addressed these simplifications.

Page 7: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

The three ideas that make this tractable

1. Approximate the GP; transform the data.

2. The resulting (approximate) restricted likelihood has a simple form;use that form to understand how the data determine the fit.

3. Extend tools from linear models.

Page 8: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Idea #1: Spectral approximation for a stationary GP

Data taken at locations sj 2 {0, 1, ...,M � 1}, M a multiple of 2.

Frequencies !m 2 {0, 1M , ..., 1

2 ,�12 + 1

M , ...,� 1M }, m = 0, 1, ...,M � 1.

Approximate the GP w(sj) by

g(sj) = a0 +2

M2 �1X

m=1

(am cos(2⇡!msj)� bm sin(2⇡!msj)) + aM2cos(2⇡!M

2sj)

am, bm have independent mean zero Gaussian priors with variances

proportional to �2s �(!m; ⇢), the spectral density of the GP covariance.

Based on Paciorek (2007), Wikle (2002).

Page 9: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

With the approximation g(sj

), the model becomes

y = X� + Zu + ✏

I Assuming no fixed e↵ects: X is a vector of 1’s.

I Omit a0 to avoid an identification problem.

IZ ’s columns are sin/cos functions and do not depend on unknowns.

IZ

0Z = Diag(1/2M, 1/2M, ..., 1/M); Z

01 = 0.

Iu ⇠ N(0,G ), G = �2

s Diag( 12M�(!m(j); ⇢),

1M�(!M/2; ⇢) )

I ✏ ⇠ N(0,R), R = �2e I

Page 10: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Columns 1 to 4 of the random e↵ects design matrix Z

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 50 100 150 200

−2−1

01

2

Index

t(z)[1

, ]

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 50 100 150 200

−2−1

01

2

Index

t(z)[2

, ]

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 50 100 150 200

−2−1

01

2

Index

t(z)[3

, ]

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 50 100 150 200

−2−1

01

2

Index

t(z)[4

, ]

Page 11: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Idea #1: Transform the data so the log RL is simple

Using the spectral approximation to the GP, the model is

y = X� + Zu + ✏

Pre-multiply this equation by (Z 0Z )�0.5

Z

0 to give:

v = (Z 0Z )

�0.5Z

0y = (Z 0

Z )0.5

u + (Z 0Z )

�0.5Z

0✏

Then

E(v) = 0

Cov(v) = �2sDiag(aj(⇢)) + �2

e I

aj(⇢) = �(!m(j); ⇢)

Page 12: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Idea #2: The (approximate) RL has a simple form.

v ⇠ N( 0, �2sDiag(aj(⇢)) + �2

e I )

v = (Z 0Z )

�0.5Z

0y

The (approximate) log RL for (�2s, ⇢,�

2e ) is the likelihood arising from v :

�1

2

M�1X

j=1

[ log(�2saj(⇢) + �2

e ) + v

2j / (�2

saj(⇢) + �2e ) ].

The keys to understanding this (approximate) RL as a function of

the data are the transformed data vj and the aj(⇢).

Page 13: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Given ⇢, the v

2j

follow a gamma-errors GLM

The log RL has the form of the likelihood from a gamma-errors GLMwith the identity link:

�1

2

M�1X

j=1

[ log(�2saj(⇢) + �2

e ) + v

2j / (�2

saj(⇢) + �2e ) ]

I The v

2j are the gamma-distributed “data”.

I 1/2 is the gamma’s shape parameter.

IE (v2

j ) = �2saj(⇢) + �2

e

I Var(v2j ) = 2 (�2

saj(⇢) + �2e )

2

Page 14: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

How does aj

(⇢) change with ⇢?

Exponential covariance function K : aj(⇢) for ⇢ = 5 and 16.

The horizontal axis is j .

0 50 100 150 200

05

1015

aj165

For larger ⇢, the aj(⇢) start higher and decline to zero more sharply.

Page 15: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Idea #2: Use this (approximate) RL to understand the fit

The approximate RL uses the data only through the vj ,

projections of y onto sin/cos functions of di↵erent frequencies.

The projections vj don’t depend on any unknowns.

The projections vj are the same for all GP covariance functions.

Using di↵erent GP models for the random e↵ect, fitting di↵erent gamma regressions to the same transformed data.

The model for the v

2j is a GLM with 3 parameters ) no overfitting.

Page 16: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

How do the data determine parameter estimates?

The “observations” are the v

2j ; parameters are fit such that:

E( v

2j | �2

s, ⇢,�2e ) = �2

s aj(⇢) + �2e .

The aj(⇢) are non-increasing in j and for large j ,

E( v

2j | �2

s, ⇢,�2e ) ⇡ �2

e .

Loosely,

I �2e is “in the middle” of the v

2j for large j .

I ⇢ fits the rate at which the v

2j decline for “small” j .

I �2s makes �2

saj(⇢)+ �2e go through the middle of the v2

j for “small” j .

Page 17: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Examples of v 2j

vs. j , with fits

Simulated: �2s = 2, ⇢ = 5,�2

e = 5 Red maple basal area

�2s smaller than �2

e Whopping non-stationarity

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●

●●

●●

●●

●●●

●●●●

●●●

●●

●●

●●

●●●

●●●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

0 50 100 150 200

010

2030

4050

Index

v^2

0 100 200 300 400 500

01000

2000

3000

4000

5000

6000 1

2

3

45678

91011121314

15

16171819

20

2122232425262728293031323334353637383940414243444546474849505152535455565758596061626364

6566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208

209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288

289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344

345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440

441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559

Page 18: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Some conjectures about how the data determine estimates

An outlier inflates the v

2j for large j ’s (high frequencies) ) inflated �2

e .

Little e↵ect on v

2j for “small” j (low frequencies) and thus on �2

s and ⇢.

●●

●●●

●●

●●

●●

●●

●●●●●●

●●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

0 50 100 150 200

−50

510

15

Index

y

●●

●●●

●●

●●

●●

●●

●●●●●●

●●●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

Page 19: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Data contaminated with shift

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

0 50 100 150 200

−50

510

Index

y

(a) data simulated from GP with�2s=2, �2

e

=5 and ⇢=5 with meanshift from 0 to 5 midway

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 50 100 150 200

−2−1

01

2

Index

t(z)[1

, ]

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 50 100 150 200

−2−1

01

2

Index

t(z)[2

, ]

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 50 100 150 200

−2−1

01

2

Indext(z

)[3, ]

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 50 100 150 200

−2−1

01

2

Index

t(z)[4

, ]

(b) first four columns of Z , thespectral basis matrix, on thedomain [1,2,...,199,200]

22 / 44

Page 20: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Hypothesize: how does this shift in mean a↵ect theestimates of the GP parameters ?

1v

22 will be inflated, this will lead to an inflated value of the estimateof �2

s .

2 So to capture the sharp decline in v

2j

’s the estimate of ⇢ will beinflated too.

3v

2j

’s for larger j ’s broadly una↵ected, hence the estimate of �2

e

willremain almost the same.

23 / 44

Page 21: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Parameter estimates (SE): data with mean shift

exact RL�2s �2

e

⇢actual values 2 5 5

uncontaminated 2.29 (0.11) 4.75 (0.11) 6.89 (0.82)contaminated 11.50 (0.36) 5.61 (0.06) 106.48 (6.06)actual values 10 0.1 16.67

uncontaminated 9.99 (0.34) 0.11 (0.01) 17.29 (0.67)contaminated 17.96 (0.77) 0.16 (0.01) 29.99 (1.32)

apprx RL�2s �2

e

⇢actual values 2 5 5

uncontaminated 2.39 (0.15) 4.90 (0.09) 13.51 (1.74)contaminated 11.60 (0.41) 5.33 (0.07) 122.0 (6.49)actual values 10 0.1 16.67

uncontaminated 10.02 (0.37) 0.32 (0.01) 32.70 (1.37)contaminated 19.70 (1.13) 0.37 (0.01) 61.49 (5.08)

25 / 44

Page 22: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

The data contain little information about ⌫

0 10 20 30 40 50 60

0.000

0.010

0.020

0.030

Index

a

nu=inf,rho=0.1nu=inf,rho=0.025nu=0.5,rho=0.1nu=0.5,rho=0.025

Figure: aj

’s for Matern (⌫=0.5) and Matern (⌫=1) for the spectralapproximation on the domain [0,1,...,62,63] on the horizontal axis.

30 / 46

Page 23: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Idea #3: Extend tools from linear models and GLMs

Plot the v

2j and fitted values �2

saj(⇢) + �2e vs. j .

This shows the data and model fit corresponding to the RL.

It’s a direct look at the “signal” for non-stationarity.

Added variable plots show how the data produce a fixed e↵ect’s estimate.

An AVP can be done in both the

Observation domain (y) and

Spectral domain (the vj)

Page 24: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Aim andModel

Spectral-transformingthe data

Fit of thetransformeddata

Identifyingnonstationarity

Investigatingmissingpredictors

Summary

How to applyto real data

Real dataanalysis

Conclusion18/42

Added variable plot in observation domain

Adding predictor C to the model y = X� + C↵+ u + ✏,X contains predictor already in the model.

Multiply both sides of the model equation by V

�0.5, ˆ denotesestimates from fitting a model with X .

Then multiply both sides by

P = I � V

�0.5X (X 0

V

�1X )

�1X

0V

�0.5

• Plot PV�0.5y vs PV�0.5

C .

Page 25: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Aim andModel

Spectral-transformingthe data

Fit of thetransformeddata

Identifyingnonstationarity

Investigatingmissingpredictors

Summary

How to applyto real data

Real dataanalysis

Conclusion16/42

Added variable plot in the spectral domain

Adding predictor C to the model y = X� + C↵+ u + ✏,X contains predictors already in the model.

Multiply both sides by (I � PX ),then multiply by (Z 0

Z )�0.5Z

0, to get

v

⇤ = (Z 0Z )�0.5

Z

0(I � PX )y : the vj from the residual y and

v

⇤C = (Z 0

Z )�0.5Z

0(I � PX )C : the vj from the residual C .

• Plot Dv

⇤ vs Dv

⇤C ,

where D = Diag(1/p�2s a(⇢) + �2

e ), ˆ denotes estimatesobtained from fitting a model with only X , no C .

Page 26: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Aim andModel

Spectral-transformingthe data

Fit of thetransformeddata

Identifyingnonstationarity

Investigatingmissingpredictors

Summary

How to applyto real data

Real dataanalysis

Conclusion22/42

Added variable plots (AVPs) in the two domains

The AVPs in the spectral domain and in the observationdomain estimate the same coe�cient for a particularpredictor.

The spectral domain AVP highlights particular large scaletrends in the data. The observation domain AVPhighlights particular localized details.

The spectral domain AVP involves the spectralapproximation which the observation domain AVP doesnot.

Page 27: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Back to the forest inventory data

Outcome: Red maple basal area Predictor: Elevation

Page 28: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Undoing the simplifying assumptions

The paper and supplements discuss these at length.

• Spectral approximation in 2-D: Each design-matrix column correspondsto a pair of frequencies, one in each dimension (Paciorek 2007).

• Data not on a grid: Pre-smooth to a grid. (9 a better way?)

• Fixed e↵ects: Regress them out and apply the spectral approximationto the residuals.

Page 29: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Plots of v 2j

vs. j for outcome and predictor

Outcome: Red maple basal area Predictor: Elevation

0 100 200 300 400 500

01000

2000

3000

4000

5000

6000 1

2

3

45678

91011121314

15

16171819

20

2122232425262728293031323334353637383940414243444546474849505152535455565758596061626364

6566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208

209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288

289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344

345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440

441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559

0 100 200 300 400 500

0200

400

600

800

1000

1

2

3

4

56

7

8

9

10

11

12

13

1415

1617

1819

20

21

22

2324

25

26

27282930

31323334

35

363738

3940

41

4243444546474849505152535455

565758596061626364

65

6667

6869

70

71

727374

75

767778798081828384858687888990919293949596979899100101102103

104105106107

108109110111

112113114115

116117118119

120121122123124125126

127128129130131132

133

134135136

137138139

140

141142143144145146147148

149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178

179

180

181182183184185186187188189190191192193194195196197198199200201202203204205206207208209

210211212213214215216217218219220221222223224225226227228229230231232233234235

236237238239240

241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292

293294295296

297

298299300301302303304305306307308309310311

312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348

349350351352353354355356

357358359360361362363364365366367368369370371372373374375376377

378379380

381382383

384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422

423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462

463

464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493

494495496497498499

500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559

The horizonal axis is j ) low frequencies at left, high frequencies at right

Page 30: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

AVPs for elevation

Spectral domain Observation domain

−0.4 −0.2 0.0 0.2

−3−2

−10

12

3 1

2

3

4

5

6

78

9

10

11

12

13

14

15

16 1718

19

20

21

22

2324

25

26

27

28

29

3031

32

3334

35

36

37

38

39

40

41

42

43

44 45

46

4748

49

5051

52

53

54

55

56

57

58

59

60

616263

64

65

666768

69

70

71

7273

74

75

7677

78

79

8081

8283

84

85

8687

88

89

90

91

92

939495

96

97

98 99100101

102

103

104

105

106

107

108

109110

111

112

113

114

115

116 117

118

119

120

121

122123

124

125

126

127

128

129

130

131

132

133

134

135136137

138139

140

141142

143

144

145

146

147

148

149

150

151

152153

154

155

156

157

158

159160161162163

164165

166

167

168

169170

171

172

173

174

175

176

177

178

179

180

181

182

183

184185186

187188

189190

191

192

193

194195

196

197198

199

200201

202

203

204205

206

207

208

209

210

211

212

213

214215

216

217218

219

220221

222

223

224

225226

227228

229

230231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251252253

254255256

257

258

259

260261

262

263

264265

266

267

268269

270

271

272

273 274

275276

277

278

279

280

281

282

283

284

285

286

287288

289

290

291292

293

294

295

296

297

298

299

300

301

302303

304305306

307

308

309

310

311

312313

314

315

316

317

318

319320

321

322

323

324325

326

327

328

329

330

331

332

333

334

335

336

337

338

339

340341342343344

345

346347

348

349

350351

352

353

354

355

356

357358

359360

361

362

363

364365

366

367

368

369

370

371372

373

374

375

376

377

378

379380

381

382

383

384

385

386

387

388

389390

391

392

393

394

395

396

397

398

399

400

401

402

403

404

405

406407

408

409

410

411

412

413

414415

416

417

418

419

420

421

422

423

424

425

426

427

428

429

430

431

432433

434

435436

437

438

439440

441

442

443

444

445446

447448

449

450451

452

453

454455

456

457

458

459

460

461462

463

464465

466

467

468

469

470

471

472

473

474

475

476

477

478

479

480481

482

483

484

485486

487

488

489490

491

492

493

494

495

496497

498

499

500501

502

503

504

505

506

507

508

509

510

511

512

513

514515

516

517

518

519520521

522

523

524

525526

527

528

529

530

531

532

533

534

535

536

537

538

539

540

541542

543

544

545

546

547548

549

550551

552

553

554555

556557558

559

●●

●●

●●

●●

● ●●

●●●

●● ●●

●●

●●

●● ● ●●

● ●●

●●●

●● ●

●● ●

●●●

●●

●●● ●

●●

●●

●● ●

●●●

● ●

●●●

●●●

●●●

● ●

●●

●●

●●

●●

●●

● ●●●

●●

●●

●●

●●

●●●

●●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

● ●●●

●●

●●

● ●

●●

● ●

●●

●●●

● ●

●●

●●

●● ●

●●

● ●●

● ●

●●● ●

●●

●●

−0.1 0.0 0.1 0.2

−20

24

Page 31: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Info from the AVPs vs. the actual fits

Elevation From the RLEstimate SE P-value �2

s ⇢ �2e

Intercept-only – – – 29.6 6.0 16.2AVP, Spectral -3.17 0.51 10�10 – – –

AVP, Observation -2.02 1.09 0.07 – – –Real fit -2.52 0.29 tiny 22.0 2.9 13.8

Page 32: Here’s a data analysis problem: For the 2002 forest ...Hodges/PubH8492/Lectures_17_MB_GP.pdfHere’s a standard model for analyzing data like this At spatial locations indexed by

Focus on the ideas, not on our specific choices

The important thing is the 3 ideas:

1. Approximate the GP; transform the data.

2. ) simpler forms, which makes the fit understandable.

3. Extend tools from linear models and GLMs.

All the specific choices we’ve made could be replaced (I think).