some uses of r at usda nasssep 11, 2017 · a bayesian approach to estimating agricultural yield...
TRANSCRIPT
![Page 1: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/1.jpg)
Some Uses of R at USDA NASS
Nathan B. [email protected]
United States Department of AgricultureNational Agricultural Statistics Service (NASS)
EPA R Users Group WorkshopWashington, DC
September 11, 2017
“. . . providing timely, accurate, and useful statistics in service to U.S. agriculture.” 1
![Page 2: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/2.jpg)
Me and My Agency
About me
I Mathematical statistician
I Research and Development Division, Sampling and EstimationResearch Section
I Day-to-day applied research activities often begin with R
About USDA NASS
I Conducts Census of Agriculture every 5 years
I Conducts over 400 surveys annually
I Primarily a SAS shop
Some Uses of R at USDA NASS 2
![Page 3: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/3.jpg)
Project I: Iterative Sequential Regression (ISR)
NASS conducts Agricultural Resource Management Survey(ARMS) in collaboration with USDA’s Economic Research Service
Main Idea: Proposed ISR imputation methodology improvesdistributional characteristics and helps preserve relationships in thedata
I Data augmentation and fully conditional specification
I Bayesian technique implemented through Gibbs sampling
Development involved multiple stakeholders and extended paralleltesting
Some Uses of R at USDA NASS 3
![Page 4: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/4.jpg)
Project I: Iterative Sequential Regression (ISR)
![Page 5: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/5.jpg)
Project I: Iterative Sequential Regression (ISR)
Development and testing resulted in the R scripts used in practice
Conditional mean imputation ‘module’ replaced with ISR moduleI ISR machine imputation performed on a separate Linux box
I Interaction with existing infrastructureI Written in R and C codeI File input-output with SAS interface
I ISR output subsequently loaded for editing
R has been used to execute ISR imputation in production ofARMS Phase III estimates for the past three years
Some Uses of R at USDA NASS 5
![Page 6: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/6.jpg)
Basic Mapping: Corn for Grain
Non−Spec
Speculative
USDA NASS Corn for Grain Estimation Program
I Annual program: 41 states (production of grain)
I Speculative region: 10 states
I Yield forecasts: August-November, January (final)
![Page 7: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/7.jpg)
Basic Mapping: Soybeans
Non−Spec
Speculative
USDA NASS Soybean Estimation Program
I Annual program: 31 states
I Speculative region: 11 states
I Yield forecasts: August-November, January (final)
![Page 8: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/8.jpg)
Basic Mapping: Upland Cotton
Non−Spec
Speculative
USDA NASS Upland Cotton Estimation Program
I Annual program: 17 states
I Speculative region: 6 states
I Yield forecasts: August-January, May (final)
![Page 9: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/9.jpg)
Basic Mapping: Winter Wheat
Non−Spec
Speculative
USDA NASS Winter Wheat Estimation Program
I Annual program: 42 states
I Speculative region: 10 states
I Yield forecasts: May-September (final)
![Page 10: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/10.jpg)
Basic Mapping: Winter Wheat Harvest Progress
25
30
35
40
45
50
−120 −100 −80long
lat
0
25
50
75
100Percent Harvested
Winter Wheat−−Percent Harvested as of Week 22
Some Uses of R at USDA NASS 10
![Page 11: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/11.jpg)
Basic Mapping: Winter Wheat Harvest Progress
25
30
35
40
45
50
−120 −100 −80long
lat
0
25
50
75
100Percent Harvested
Winter Wheat−−Percent Harvested as of Week 24
Some Uses of R at USDA NASS 11
![Page 12: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/12.jpg)
Basic Mapping: Winter Wheat Harvest Progress
25
30
35
40
45
50
−120 −100 −80long
lat
0
25
50
75
100Percent Harvested
Winter Wheat−−Percent Harvested as of Week 26
Some Uses of R at USDA NASS 12
![Page 13: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/13.jpg)
Basic Mapping: Winter Wheat Harvest Progress
25
30
35
40
45
50
−120 −100 −80long
lat
0
25
50
75
100Percent Harvested
Winter Wheat−−Percent Harvested as of Week 28
Some Uses of R at USDA NASS 13
![Page 14: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/14.jpg)
Basic Mapping: Winter Wheat Harvest Progress
25
30
35
40
45
50
−120 −100 −80long
lat
0
25
50
75
100Percent Harvested
Winter Wheat−−Percent Harvested as of Week 30
Some Uses of R at USDA NASS 14
![Page 15: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/15.jpg)
Project II: Crop Yield ForecastingOfficial NASS crop yield forecasts in Crop Production Reportrepresent consensus of NASS Agricultural Statistics Board (ASB)
Main Idea: Develop Bayesian hierarchical models to produceone-number forecasts
I Synthesize several survey estimatesI Produce measures of uncertaintyI Gibbs sampler used to obtain Monte Carlo estimatesI Currently we support speculative regions and member states
0.00
0.05
0.10
0.15
0.20
Posterior Density of Yield Forecast for State 2 (June 2012 Forecast)
Yield (bushels/acre)
Den
sity
20 80
● ●●
●
●
●
OYSAYSOYS.adjAYS.adjCovariatesPosterior Mean
0.00
0.05
0.10
0.15
0.20
Posterior Density of Yield Forecast for State 4 (June 2012 Forecast)
Yield (bushels/acre)
Den
sity
20
● ● ●
●
●
●
OYSAYSOYS.adjAYS.adjCovariatesPosterior Mean
0.00
0.05
0.10
0.15
0.20
Posterior Density of Yield Forecast for State 9 (June 2012 Forecast)
Yield (bushels/acre)
Den
sity
20 80
● ● ●
●
●
●
OYSAYSOYS.adjAYS.adjCovariatesPosterior Mean
JSM 2016–A Bayesian Hierarchical Model for Combining Several Crop Yield Indications 15
Some Uses of R at USDA NASS 15
![Page 16: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/16.jpg)
Project II: Crop Yield ForecastingUse in Production: Research staff members have used R as a‘conduit’ to execute a Gibbs sampler
I R calls C code for corn and soybean yield models (2011) andwinter wheat yield models (2015)
I R calls JAGS for new upland cotton yield models (2017)I Modeled estimates are provided to Agricultural Statistics
Board for their deliberations
APS
AYSAYSAYSAYS
OYSOYSOYSOYSOYS
● ● ● ● ●
Crop
Production
Report
Crop
Production
Report
Crop
Production
Report
Crop
Production
Report
Small
Grains
Summary
APS
AYS
OYS
May Jun Jul Aug Sep OctMonth
Survey and Publication Timeline for Winter Wheat
Some Uses of R at USDA NASS 16
![Page 17: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/17.jpg)
Project II: Crop Yield ForecastingVisualizing the sequence of model-based forecasts versus NASSofficial statistics
State 1 State 2 State 3 State 4 State 5 State 6 State 7 State 8 State 9 State 10 Region
●
●●●●●●
●●●
●●●
●●
●
●
●
●
●
●●●●
● ●●●●
●●
●
●
●●
●●
●
●
●
●●
●●
●
●●
●●
●
●
●
●
●
●
●●
●●
●
●●
●
●●
●
●●
●
●
●
●●
●
●
●●
●
●●
●●
●●● ●
●●●●
●
●●
●●
●●
●
●●
●
●●
●
●
●●
●
●●●
●
●●
●
●●
●
●
●
●●●
●
●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●●●
●
●
●●
●
●●●
●
●
●
●
●●
●
●●
●●
●●●●●
●●
●
●●
●
●
●
●●
●●
●
●●●
●●●
●
●●
●
●●
●
●
●
●
●
●●●
● ● ● ●●● ●
● ●●
● ● ●●
●
● ●
● ● ●
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
May
June
July
Aug
Sep
t
Month
Yie
ld
Source●
●
●
●
ASBModel95% CI.u95% CI.l
Year 2012 Comparisons: Published Yield and Model−based Yield Indications
Some Uses of R at USDA NASS 17
![Page 18: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/18.jpg)
Project II: Crop Yield Forecasting
Visualizing a model-based ‘rule-of-thumb’ for the combination ofseveral disparate estimates
May June July August September
0.00
0.25
0.50
0.75
1.00
CO IL KS
MO
MT
NE
OH
OK TX
WA
SP
EC
CO IL KS
MO
MT
NE
OH
OK TX
WA
SP
EC
CO IL KS
MO
MT
NE
OH
OK TX
WA
SP
EC
CO IL KS
MO
MT
NE
OH
OK TX
WA
SP
EC
CO IL KS
MO
MT
NE
OH
OK TX
WA
SP
EC
Unit
Wei
ght Source
AYS Covariates OYS Sept. APS
Some Uses of R at USDA NASS 18
![Page 19: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/19.jpg)
Project III: Integer Calibration for Dual System Estimation
NASS adopted dual system estimation (DSE) approach in 2012Census of Agriculture–applied DSE in other surveys
Main Ideas:
I DSE weights help adjust for undercoverage, nonresponse,misclassification
I Desire integer weights–consistent totals
I Desire estimates that satisfy many calibration targets
I Integer Calibration (INCA) improves rounding of weights andsatisfaction of multiple targets
Some Uses of R at USDA NASS 19
![Page 20: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/20.jpg)
Project III: Integer Calibration for Dual System EstimationPrevious approach:
INCA:
INCA was developed in R and the ‘INCA’ package exists onCRAN. DSE with INCA was executed in R for the the 2015Local Foods Survey.
Some Uses of R at USDA NASS 20
![Page 21: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/21.jpg)
Project III: Integer Calibration for Dual System EstimationINCA weights are ‘closer’ to (more highly correlated with) originalDSE weights
Some Uses of R at USDA NASS 21
![Page 22: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/22.jpg)
Project III: Integer Calibration for Dual System EstimationINCA weights ensure that estimated totals miss fewer calibrationtargets
Some Uses of R at USDA NASS 22
![Page 23: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/23.jpg)
Additional Production and Research Projects in R
R in Production
1. Small area models for cash rental rates (2013)
2. Quarterly model-based estimates of cattle inventory
3. Simulated annealing for substratification of primary samplingunits in June Area Survey
R in Research and Development
1. Quarterly model-based estimates of hog inventory
2. Small area crop acreage, yield, and production estimates
Some Uses of R at USDA NASS 23
![Page 24: Some Uses of R at USDA NASSSep 11, 2017 · A Bayesian approach to estimating agricultural yield based on multiple repeated surveys. Journal of Agricultural, Biological, and Environmental](https://reader033.vdocuments.site/reader033/viewer/2022050118/5f4ef9c24fcfe171c6180780/html5/thumbnails/24.jpg)
References
Cruze, N. B. (2016). A Bayesian hierarchical model for combining several cropyield indications. In JSM Proceedings, Survey Research Methods Section.Alexandria, VA: American Statistical Association.
Miller, D., Dau, A., and Lisic, J. (2015). Comparison of modern imputationmethodologies on complex data from agricultural operations. In Proceedingsof the 2015 FCSM Research Conference [online, accessed 10-sept-2017].https://www.nass.usda.gov/Education_and_Outreach/Reports,
_Presentations_and_Conferences/reports/conferences/FCSM/D1_
Miller_2015FCSM.pdf.
Nandram, B., Berg, E., and Barboza, W. (2014). A hierarchical Bayesianmodel for forecasting state-level corn yield. Environmental and EcologicalStatistics, 21(3):507–530.
Sartore, L., Toppin, K., and Spiegelman, C. (2016). Integer programming forcalibration. In Presentations of 2016 Federal CASIC Workshops [online,accessed 10-sept-2017].https://www.census.gov/fedcasic/fc2016/ppt/1_5_Integer.pdf.
Wang, J. C., Holan, S. H., Nandram, B., Barboza, W., Toto, C., andAnderson, E. (2012). A Bayesian approach to estimating agricultural yieldbased on multiple repeated surveys. Journal of Agricultural, Biological, andEnvironmental Statistics, 17(1):84–106.