ag-analytics data platform - cornell university...currently building a gis based mapping platform...

Joshua D. Woodard, Assistant Professor and Zaitz Faculty Fellow in Agribusiness and Finance, Dyson School of Applied Economics and Management, Cornell University. NY State Precision Ag Workshop. Ag-Analytics Data Platform.


Post on 29-May-2020


TRANSCRIPT

Page 1: Ag-Analytics Data Platform - Cornell University...Currently building a GIS based mapping platform for data exploration Recently received a Microsoft Azure Research Grant for use of

Joshua D. Woodard

Assistant Professor and Zaitz Faculty Fellow in Agribusiness and Finance

Dyson School of Applied Economics and Management

Cornell University

NY State Precision Ag Workshop

Ag-Analytics Data Platform


The Data Integration Problem

- Analysts typically source data from many different government and non-government sources, at different temporal and spatial resolutions
- Relevant data are spread over a wide variety of operational/transaction-based databases, data marts, unstructured text files, etc.
- Sources all have different data storage and formatting protocols, APIs, different levels of temporal and spatial aggregation, etc.
- Existing infrastructures cannot be queried jointly, or in some cases at all
- Not processed to scales appropriate for most uses
- Typical approach is a “one-off” for every study, doing the following:
  - At a point in time, download “slices” of data from several different sources
  - Then format (often by hand or copy/paste) individual data sets and mash together (may take days or weeks; not automated/replicable/documented)
  - Perform one-off analysis
  - To expand or update the analysis, the entire process must be recreated by a human
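
The manual download, format, and mash-together routine described above can instead be scripted so that it is repeatable. The following is a minimal sketch in Python (standard library only), not the platform's actual code: it rolls hypothetical daily precipitation records up to monthly totals and joins them to a hypothetical monthly index on a shared (county FIPS, year, month) key. All record values, field layouts, and function names are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily precipitation records: (county FIPS, date, inches).
daily_precip = [
    ("36109", date(2015, 6, 1), 0.25),
    ("36109", date(2015, 6, 2), 0.50),
    ("36109", date(2015, 7, 1), 1.00),
    ("36011", date(2015, 6, 1), 0.75),
]

# Hypothetical monthly index values keyed by (FIPS, year, month).
monthly_index = {
    ("36109", 2015, 6): -1.2,
    ("36109", 2015, 7): 0.4,
    ("36011", 2015, 6): -0.8,
}


def aggregate_to_monthly(records):
    """Sum daily values into (fips, year, month) totals."""
    totals = defaultdict(float)
    for fips, day, value in records:
        totals[(fips, day.year, day.month)] += value
    return dict(totals)


def join_on_key(left, right):
    """Inner-join two dicts that share the same key structure."""
    return {key: (left[key], right[key]) for key in left if key in right}


monthly_precip = aggregate_to_monthly(daily_precip)
joined = join_on_key(monthly_precip, monthly_index)

for key in sorted(joined):
    precip, index = joined[key]
    print(key, precip, index)
```

Because both steps are plain functions, expanding the analysis to new counties or dates means rerunning the script on new input rather than repeating the work by hand.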


A Fairly Small Sampling…


AgDB Data Warehousing Overview

[Diagram. Labels recovered from the slide:]

- External Databases, Datastores, Datastreams: RMA, USGS, NRCS, AMS, ERS, PRISM, CME, NASA, NASS, FSA, FAS, etc.
- Scheduled Jobs to Download and Extract from Source Over Web
- Data Chunks
- Filter/Clean
- Preprocessing/Aggregation/Interpolation/Transformation
- Data
- Data Auditing & Validation
- Load
- Integration Services
- OLTP
- AgDB Data Warehouse
- Web Data Services, OLAP, Data Marts
- Web Decision Tools
- External Clients
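
The stages named in the diagram (extract from source, filter/clean, transform/aggregate, audit and validate, load) can be pictured as a chain of small functions. This is only a sketch under stated assumptions, not the AgDB implementation: the raw text, column names, validation range, and table name are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract, standing in for a file a scheduled job downloaded.
RAW = """fips,year,yield_bu
36109,2014,150.5
36109,2015,
36011,2014,142.0
36011,2014,138.0
bad_row,2014,100
"""


def extract(text):
    """Parse the raw text into dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Drop rows with a missing value or a malformed 5-digit FIPS code."""
    kept = []
    for row in rows:
        fips = row["fips"]
        if row["yield_bu"] and len(fips) == 5 and fips.isdigit():
            kept.append((fips, int(row["year"]), float(row["yield_bu"])))
    return kept


def transform(rows):
    """Average duplicate (fips, year) observations into one value."""
    groups = {}
    for fips, year, value in rows:
        groups.setdefault((fips, year), []).append(value)
    return [(f, y, sum(v) / len(v)) for (f, y), v in sorted(groups.items())]


def validate(rows):
    """Audit step: fail loudly if any value is outside an assumed range."""
    for fips, year, value in rows:
        if not 0 <= value <= 500:
            raise ValueError(f"implausible value for {fips}/{year}: {value}")
    return rows


def load(rows, conn):
    """Load validated rows into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS county_yield "
        "(fips TEXT, year INTEGER, yield_bu REAL, PRIMARY KEY (fips, year))"
    )
    conn.executemany("INSERT OR REPLACE INTO county_yield VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(validate(transform(clean(extract(RAW)))), conn)
result = conn.execute(
    "SELECT fips, year, yield_bu FROM county_yield ORDER BY fips, year"
).fetchall()
print(result)
```

Using `INSERT OR REPLACE` against a keyed table is one way to make a scheduled job safe to rerun: loading the same source slice twice leaves a single row per county and year.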


Ag-Analytics.org


Ag-Analytics.org

- An open source, open-data portal for ag and enviro data and models
- Open data: get and use data for free, direct from platform
- Open source: see exact code for how data are sourced, processed, transformed and stored; and contribute code


Abridged/Partial Summaries of Major Datasets/Sources Currently in AgDB

Data Source: Item Description

IPCC Climate Change Projections: Future temperature and precipitation projections across different emission scenarios and percentiles of the 16 General Circulation Models (GCMs).

National Climatic Data Center Drought Data: Monthly PDSI drought index data available at the climate district level aggregation. Data is available from 1895 to present, by NCDC District.

PRISM Climate Group: Monthly and daily historical temperature and precipitation data, as well as GDD/HDD processed data. Monthly data is available from 1895 to present. Daily weather data is available from 1981 to present. 800 meter resolution (raw) and processed by FIPS, Township, and in certain cases CLU (pre-2008) available.

Chicago Mercantile Exchange: Daily historical futures and options data for agricultural commodities from the Chicago Mercantile Exchange (CME), Chicago Board of Trade (CBOT), and Kansas City Board of Trade (KCBOT). Data is available from 1959 to present, updated daily.

Risk Management Agency (RMA): Agricultural insurance price and participation data available at the county level aggregation. Data is available from 1989 to present from Summary of Business. Other data also loaded from various unstructured text files (including historical discovery prices, GRIP yields, etc.).

US Census Bureau: County-level and township-level geographical coordinates, land area size, water area size, and population data.

USDA Economic Research Service (ERS): Annual farm structural and financial data available at state-level aggregation for the 15 Agricultural Resource Management Survey (ARMS) states. Data is available from 1996 to present. Various other datasets are also sourced from the ad hoc ERS tools and APIs.

USDA Agricultural Marketing Service (AMS): Monthly data on the volume, pricing, and utilization of raw milk received by handlers regulated under Federal milk orders from dairy farmers. All tables in the Public MMO database.

USDA National Agricultural Statistics Service (NASS): Census and survey data available at regional, state, and county level aggregation. The broad categories of data available are crops, animals and products, economics, demographics, and environmental. Data is available from 1926 to present. Obtained via FTP bulk download from QuickStats. CDL data processed against ready-to-map gSSURGO NRCS data by crop also available (raw and county processed).

USDA Foreign Agricultural Service: Data on production, supply and distribution of agricultural commodities for the U.S. and key producing and consuming countries.

USDA Natural Resources Conservation Service (NRCS): Soil data for the continental US from gSSURGO, raw and processed, available at various levels of aggregation.
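
As one illustration of the kind of processed weather field listed above (GDD), growing degree days are commonly computed from daily minimum and maximum temperatures. The sketch below uses the simple averaging method with a base of 50 °F and an upper cap of 86 °F. Those thresholds are common conventions for corn and are assumptions here, since the slides do not state which method or thresholds AgDB uses. The temperature values are made up.

```python
def growing_degree_days(tmin_f, tmax_f, base_f=50.0, cap_f=86.0):
    """Daily GDD by the averaging method, with temperatures clamped to [base, cap]."""
    tmin = min(max(tmin_f, base_f), cap_f)
    tmax = min(max(tmax_f, base_f), cap_f)
    return (tmin + tmax) / 2.0 - base_f


# Hypothetical (tmin, tmax) pairs in degrees Fahrenheit for three days.
days = [(55.0, 75.0), (45.0, 65.0), (70.0, 95.0)]

daily_gdd = [growing_degree_days(lo, hi) for lo, hi in days]
season_total = sum(daily_gdd)
print(daily_gdd, season_total)
```

Clamping both temperatures before averaging means a cold night below the base or a hot afternoon above the cap does not pull the daily value outside the range in which the crop is assumed to grow.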


Applications & Accessing Data

- Applications: virtually anything in the broader ag and environmental domain for policy, risk, economics and finance
  - Insurance
  - Conservation and Climate Change
  - Policy Analysis and oversight
  - Farm Bill Program Analysis
  - Product Development


Tools

- Data access tools and APIs (industrial, for developers, data analysts)
  - Facilitates end user tool development
  - Automates processes for getting data
  - Improves reliability of research
  - Makes possible the previously not possible (or only possible for a few at high cost)
- End user and visualization tools
  - RMA Premium calculator
  - Yield and weather visualization tools
  - Mapping applications
  - Dairy margin protection tool
  - SO2 wine


CKAN Open Data Portal Software
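
CKAN exposes catalog search through its Action API, so a portal built on it can be queried programmatically. The sketch below builds a `package_search` request URL and reads dataset names and resource URLs out of a response. The base URL and the sample response content are placeholders invented for illustration (the slides do not give an API endpoint), and no network call is made; the response is a hand-written dictionary in the general shape CKAN returns.

```python
import json
from urllib.parse import urlencode

# Placeholder portal address; an assumption, not taken from the slides.
BASE_URL = "https://example.org"


def package_search_url(query, rows=5):
    """Build a CKAN Action API package_search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{BASE_URL}/api/3/action/package_search?{params}"


def list_resources(response):
    """Return (dataset name, resource URL) pairs from a package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    pairs = []
    for dataset in response["result"]["results"]:
        for resource in dataset.get("resources", []):
            pairs.append((dataset["name"], resource["url"]))
    return pairs


# Hand-written sample in the general shape of a package_search response.
sample = json.loads("""
{
  "success": true,
  "result": {
    "count": 1,
    "results": [
      {
        "name": "example-soil-dataset",
        "resources": [
          {"format": "CSV", "url": "https://example.org/files/soil.csv"}
        ]
      }
    ]
  }
}
""")

url = package_search_url("soil")
resources = list_resources(sample)
print(url)
print(resources)
```

In practice the URL would be fetched with an HTTP client and the decoded JSON body passed to `list_resources`.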


Ongoing Efforts and Priorities

- Currently building a GIS based mapping platform for data exploration
- Recently received a Microsoft Azure Research Grant for use of Azure cloud platform, currently converting
- Additional datasets, APIs, tools (ongoing)
- Open Source launch
- Upgrading data portal interface, more extensive metadata, flexible cataloging/access (CKAN and other)
- Incorporation of NoSQL platforms
- Identify various user needs, partners, and collaborators


Challenges, Policy Considerations, and Opportunities

- Technical and training
  - Some degree of learning curve, but frankly minimal (we teach this to undergrads)
  - Technical limitations (networking, processing, etc.) are eroding quickly
- Inherently a public good, without intervention will be under-provisioned
  - Marginal cost curse leads to lack of action
- Coordination within the community
  - How can we work together?
  - Goal: Open source eco-systems for data curation and systems development
  - Government AND Universities and others must be involved
- Improving access to government data (incentives and bandwidth vs reasonable delivery formats)
  - Not only what but HOW data are made available is of utmost importance, otherwise not usable
  - Not a “FOIA-able” solution, must work together!


Challenges, Policy Considerations, and Opportunities

- Next Horizons: Secure Data Warehouses for Integrating Agency and other data (RMA, FSA, etc.)
  - Some work simply can’t be done without linking these together
  - Example: integrating soil information into insurance rates and programs, modifications to properly treat or incentivize conservation, soil health, etc.
- Privacy concerns
  - Many precedents, well within allowable law
- Field is at an interesting vantage point compared to many others given mix of market, business, environmental and other natural systems data = LOTS OF OPPORTUNITY!


Thank you

Questions?