some uses of r at usda nasssep 11, 2017  · a bayesian approach to estimating agricultural yield...

24
Some Uses of R at USDA NASS Nathan B. Cruze [email protected] United States Department of Agriculture National Agricultural Statistics Service (NASS) EPA R Users Group Workshop Washington, DC September 11, 2017 ... providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1

Upload: others

Post on 16-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Some Uses of R at USDA NASS

Nathan B. [email protected]

United States Department of AgricultureNational Agricultural Statistics Service (NASS)

EPA R Users Group WorkshopWashington, DC

September 11, 2017

“. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1

Page 2: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Me and My Agency

About me

I Mathematical statistician

I Research and Development Division, Sampling and EstimationResearch Section

I Day-to-day applied research activities often begin with R

About USDA NASS

I Conducts Census of Agriculture every 5 years

I Conducts over 400 surveys annually

I Primarily a SAS shop

Some Uses of R at USDA NASS 2

Page 3: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project I: Iterative Sequential Regression (ISR)

NASS conducts Agricultural Resource Management Survey(ARMS) in collaboration with USDA’s Economic Research Service

Main Idea: Proposed ISR imputation methodology improvesdistributional characteristics and helps preserve relationships in thedata

I Data augmentation and fully conditional specification

I Bayesian technique implemented through Gibbs sampling

Development involved multiple stakeholders and extended paralleltesting

Some Uses of R at USDA NASS 3

Page 4: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project I: Iterative Sequential Regression (ISR)

Page 5: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project I: Iterative Sequential Regression (ISR)

Development and testing resulted in the R scripts used in practice

Conditional mean imputation ‘module’ replaced with ISR moduleI ISR machine imputation performed on a separate Linux box

I Interaction with existing infrastructureI Written in R and C codeI File input-output with SAS interface

I ISR output subsequently loaded for editing

R has been used to execute ISR imputation in production ofARMS Phase III estimates for the past three years

Some Uses of R at USDA NASS 5

Page 6: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Basic Mapping: Corn for Grain

Non−Spec

Speculative

USDA NASS Corn for Grain Estimation Program

I Annual program: 41 states (production of grain)

I Speculative region: 10 states

I Yield forecasts: August-November, January (final)

Page 7: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Basic Mapping: Soybeans

Non−Spec

Speculative

USDA NASS Soybean Estimation Program

I Annual program: 31 states

I Speculative region: 11 states

I Yield forecasts: August-November, January (final)

Page 8: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Basic Mapping: Upland Cotton

Non−Spec

Speculative

USDA NASS Upland Cotton Estimation Program

I Annual program: 17 states

I Speculative region: 6 states

I Yield forecasts: August-January, May (final)

Page 9: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Basic Mapping: Winter Wheat

Non−Spec

Speculative

USDA NASS Winter Wheat Estimation Program

I Annual program: 42 states

I Speculative region: 10 states

I Yield forecasts: May-September (final)

Page 10: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Basic Mapping: Winter Wheat Harvest Progress

25

30

35

40

45

50

−120 −100 −80long

lat

0

25

50

75

100Percent Harvested

Winter Wheat−−Percent Harvested as of Week 22

Some Uses of R at USDA NASS 10

Page 11: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Basic Mapping: Winter Wheat Harvest Progress

25

30

35

40

45

50

−120 −100 −80long

lat

0

25

50

75

100Percent Harvested

Winter Wheat−−Percent Harvested as of Week 24

Some Uses of R at USDA NASS 11

Page 12: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Basic Mapping: Winter Wheat Harvest Progress

25

30

35

40

45

50

−120 −100 −80long

lat

0

25

50

75

100Percent Harvested

Winter Wheat−−Percent Harvested as of Week 26

Some Uses of R at USDA NASS 12

Page 13: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Basic Mapping: Winter Wheat Harvest Progress

25

30

35

40

45

50

−120 −100 −80long

lat

0

25

50

75

100Percent Harvested

Winter Wheat−−Percent Harvested as of Week 28

Some Uses of R at USDA NASS 13

Page 14: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Basic Mapping: Winter Wheat Harvest Progress

25

30

35

40

45

50

−120 −100 −80long

lat

0

25

50

75

100Percent Harvested

Winter Wheat−−Percent Harvested as of Week 30

Some Uses of R at USDA NASS 14

Page 15: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project II: Crop Yield ForecastingOfficial NASS crop yield forecasts in Crop Production Reportrepresent consensus of NASS Agricultural Statistics Board (ASB)

Main Idea: Develop Bayesian hierarchical models to produceone-number forecasts

I Synthesize several survey estimatesI Produce measures of uncertaintyI Gibbs sampler used to obtain Monte Carlo estimatesI Currently we support speculative regions and member states

0.00

0.05

0.10

0.15

0.20

Posterior Density of Yield Forecast for State 2 (June 2012 Forecast)

Yield (bushels/acre)

Den

sity

20 80

● ●●

OYSAYSOYS.adjAYS.adjCovariatesPosterior Mean

0.00

0.05

0.10

0.15

0.20

Posterior Density of Yield Forecast for State 4 (June 2012 Forecast)

Yield (bushels/acre)

Den

sity

20

● ● ●

OYSAYSOYS.adjAYS.adjCovariatesPosterior Mean

0.00

0.05

0.10

0.15

0.20

Posterior Density of Yield Forecast for State 9 (June 2012 Forecast)

Yield (bushels/acre)

Den

sity

20 80

● ● ●

OYSAYSOYS.adjAYS.adjCovariatesPosterior Mean

JSM 2016–A Bayesian Hierarchical Model for Combining Several Crop Yield Indications 15

Some Uses of R at USDA NASS 15

Page 16: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project II: Crop Yield ForecastingUse in Production: Research staff members have used R as a‘conduit’ to execute a Gibbs sampler

I R calls C code for corn and soybean yield models (2011) andwinter wheat yield models (2015)

I R calls JAGS for new upland cotton yield models (2017)I Modeled estimates are provided to Agricultural Statistics

Board for their deliberations

APS

AYSAYSAYSAYS

OYSOYSOYSOYSOYS

● ● ● ● ●

Crop

Production

Report

Crop

Production

Report

Crop

Production

Report

Crop

Production

Report

Small

Grains

Summary

APS

AYS

OYS

May Jun Jul Aug Sep OctMonth

Survey and Publication Timeline for Winter Wheat

Some Uses of R at USDA NASS 16

Page 17: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project II: Crop Yield ForecastingVisualizing the sequence of model-based forecasts versus NASSofficial statistics

State 1 State 2 State 3 State 4 State 5 State 6 State 7 State 8 State 9 State 10 Region

●●●●●●

●●●

●●●

●●

●●●●

● ●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●

● ● ● ●●● ●

● ●●

● ● ●●

● ●

● ● ●

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

May

June

July

Aug

Sep

t

Month

Yie

ld

Source●

ASBModel95% CI.u95% CI.l

Year 2012 Comparisons: Published Yield and Model−based Yield Indications

Some Uses of R at USDA NASS 17

Page 18: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project II: Crop Yield Forecasting

Visualizing a model-based ‘rule-of-thumb’ for the combination ofseveral disparate estimates

May June July August September

0.00

0.25

0.50

0.75

1.00

CO IL KS

MO

MT

NE

OH

OK TX

WA

SP

EC

CO IL KS

MO

MT

NE

OH

OK TX

WA

SP

EC

CO IL KS

MO

MT

NE

OH

OK TX

WA

SP

EC

CO IL KS

MO

MT

NE

OH

OK TX

WA

SP

EC

CO IL KS

MO

MT

NE

OH

OK TX

WA

SP

EC

Unit

Wei

ght Source

AYS Covariates OYS Sept. APS

Some Uses of R at USDA NASS 18

Page 19: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project III: Integer Calibration for Dual System Estimation

NASS adopted dual system estimation (DSE) approach in 2012Census of Agriculture–applied DSE in other surveys

Main Ideas:

I DSE weights help adjust for undercoverage, nonresponse,misclassification

I Desire integer weights–consistent totals

I Desire estimates that satisfy many calibration targets

I Integer Calibration (INCA) improves rounding of weights andsatisfaction of multiple targets

Some Uses of R at USDA NASS 19

Page 20: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project III: Integer Calibration for Dual System EstimationPrevious approach:

INCA:

INCA was developed in R and the ‘INCA’ package exists onCRAN. DSE with INCA was executed in R for the the 2015Local Foods Survey.

Some Uses of R at USDA NASS 20

Page 21: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project III: Integer Calibration for Dual System EstimationINCA weights are ‘closer’ to (more highly correlated with) originalDSE weights

Some Uses of R at USDA NASS 21

Page 22: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Project III: Integer Calibration for Dual System EstimationINCA weights ensure that estimated totals miss fewer calibrationtargets

Some Uses of R at USDA NASS 22

Page 23: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

Additional Production and Research Projects in R

R in Production

1. Small area models for cash rental rates (2013)

2. Quarterly model-based estimates of cattle inventory

3. Simulated annealing for substratification of primary samplingunits in June Area Survey

R in Research and Development

1. Quarterly model-based estimates of hog inventory

2. Small area crop acreage, yield, and production estimates

Some Uses of R at USDA NASS 23

Page 24: Some Uses of R at USDA NASSSep 11, 2017  · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental

References

Cruze, N. B. (2016). A Bayesian hierarchical model for combining several cropyield indications. In JSM Proceedings, Survey Research Methods Section.Alexandria, VA: American Statistical Association.

Miller, D., Dau, A., and Lisic, J. (2015). Comparison of modern imputationmethodologies on complex data from agricultural operations. In Proceedingsof the 2015 FCSM Research Conference [online, accessed 10-sept-2017].https://www.nass.usda.gov/Education_and_Outreach/Reports,

_Presentations_and_Conferences/reports/conferences/FCSM/D1_

Miller_2015FCSM.pdf.

Nandram, B., Berg, E., and Barboza, W. (2014). A hierarchical Bayesianmodel for forecasting state-level corn yield. Environmental and EcologicalStatistics, 21(3):507–530.

Sartore, L., Toppin, K., and Spiegelman, C. (2016). Integer programming forcalibration. In Presentations of 2016 Federal CASIC Workshops [online,accessed 10-sept-2017].https://www.census.gov/fedcasic/fc2016/ppt/1_5_Integer.pdf.

Wang, J. C., Holan, S. H., Nandram, B., Barboza, W., Toto, C., andAnderson, E. (2012). A Bayesian approach to estimating agricultural yieldbased on multiple repeated surveys. Journal of Agricultural, Biological, andEnvironmental Statistics, 17(1):84–106.