![Page 1: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/1.jpg)
Big data in agriculture
Andreas Drakos, Project Manager, Agro-Know
![Page 2: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/2.jpg)
EDBT Special Track Big Data, Athens, March 2014
Presentation Outline
• The importance of Big Data in Agriculture
• Major challenges
• The agINFRA and SemaGrow solutions
• Supporting Global Initiatives
![Page 3: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/3.jpg)
INTRO TO OPEN DATA IN AGRICULTURE
Source: http://www.agricorner.com/shareholder-demands-to-shape-modern-agriculture/
![Page 4: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/4.jpg)
Agriculture data to solve major societal challenges
• All demographic and food demand projections suggest that, by 2050, the planet will face severe food crises due to our inability to meet agricultural demand. By 2050:
– 9.3 billion global population, 34% higher than today
– 70% of the world’s population will be urban, compared to 49% today
– food production (net of food used for biofuels) must increase by 70%
• According to these projections, and in order to achieve the forecasted food levels by 2050, a total investment of USD 83 billion per annum will be required
![Page 5: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/5.jpg)
Open Data in Agriculture
• In an era of Big Data, one of the most promising routes to bootstrap innovation in agriculture is the use of Open Data:
– e.g. provisioning, maintaining, enriching with relevant metadata, and making openly available a vast amount of information
• The use and wide dissemination of these data sets is strongly advocated by a number of global and national policy makers such as:
– The New Alliance for Food Security and Nutrition G-8 initiative
– Food & Agriculture Organization of the UN
– DEFRA & DFID in the UK
– USDA & USAID in the US
![Page 6: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/6.jpg)
Open Data in agriculture: a political priority
“How Open Data can be harnessed to help meet the challenge of sustainably feeding nine billion people by 2050”
April, 2013, Washington, D.C. USA
![Page 7: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/7.jpg)
A huge market, globally
Food & Agricultural commodities production, http://faostat.fao.org
![Page 8: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/8.jpg)
Some figures
• Food - Gross Production Value globally in 2011: $2,318,966,621
• Agriculture - Gross Production Value globally in 2011: $2,405,001,443
• Investment in agriculture - Gross Capital Stock globally: $5,356,830,000
… they are big
![Page 9: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/9.jpg)
Open data for businesses
![Page 10: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/10.jpg)
Farmers starting to capitalize on Big Data technology
• Freeing farmers from the constraints of uncertain factors
– Dairy farm in UK with ‘connected’ herd
  • anticipating the risks of epidemics and spotting random factors in milk production
– Monsanto’s new acquisition protects farmers from weather issues
• The spread of smart sensors
– Wine-growers in Spain reduced application of fertilizers and fungicides by 20%, accompanied by a 15% improvement in overall productivity using humidity sensors
![Page 11: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/11.jpg)
![Page 12: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/12.jpg)
BIG DATA IN AGRICULTURE
![Page 13: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/13.jpg)
Agricultural data types I
• Publications, theses, reports, other grey literature
• Educational material and content, courseware
• Research data
– Primary data, such as measurements & observations: structured (e.g. datasets as tables) or digitized (e.g. images, videos)
– Secondary data, such as processed elaborations (e.g. dendrograms, pie charts, models)
• Sensor data
![Page 14: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/14.jpg)
Agricultural data types II
• Provenance information, incl. authors, their organizations and projects
• Experimental protocols & methods
• Social data, tags, ratings, etc.
• Germplasm data
• Soil maps
• Statistical data
• Financial data
![Page 15: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/15.jpg)
Big Data demand…
• Storage
– High volume storage
– Impractical or impossible to use centralized storage
• Distribution
• Federation
• Computational power
– For efficient discovering / querying
– For aggregating and processing
– For joining
![Page 16: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/16.jpg)
Rationale: Problem statement
Enable the inclusion of:
• Large, live, constantly updated datasets and streams
• Heterogeneous data
Involve publishers that
• cannot or will not directly and immediately make the transition to standards and best practices
Open Agricultural Data Liaison Meeting 30-31/10/2013
![Page 17: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/17.jpg)
Use Cases (DLO): Heterogeneous Data Collections & Streams
Big data:
– Sensor data: soil data, weather
– GIS data: land usage, forest and natural resources management data
– Historical data: crop yield, economic data
– Forecasts: climate change models
Problem:
– Combine heterogeneous sources to analyze past food production and forecast future trends
– Cannot clone and translate: large scale, live data streams
– Cannot immediately and directly effect a radical re-design of all sensing and processing currently in place
3rd Plenary & ESG Meeting 21/10/2013
![Page 18: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/18.jpg)
Use Cases (FAO): Reactive Data Analysis
Big data:
– Document collections: past experiences, analysis and research results
– Databases: climate conditions and crop yield observations, economic data (land and food prices)
Problem:
– Retrieving complete and accurate information to compile reports
  • Raw data and reports, scientific publications, etc.
– Wastes human resources that could analyze data and synthesize useful knowledge and advice for food production
  • Too much time spent cross-relating responses from different sources
– Too many different organizations and processes rely on the different schemas to make re-design viable
– Cloning is inefficient: large and constantly updated stores
![Page 19: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/19.jpg)
Use Cases (AK): Reactive Resource Discovery
Big data:
– Multimedia content about agriculture and biodiversity
Problem:
– Real-time retrieval of relevant content
– Used to compile educational activities
– Schema heterogeneity:
  • Different providers (Organic.Edunet, Europeana, VOA3R, etc.)
– Too many different organizations and processes rely on the different schemas to make re-design viable
– Cloning is inefficient: large and constantly updated stores
![Page 20: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/20.jpg)
THE AGINFRA & SEMAGROW SOLUTIONS
![Page 21: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/21.jpg)
The agINFRA project
• e-infrastructure for agricultural research resources (content/data) and services
• Higher interoperability between agricultural and other data resources (linked data)
• Improved research data services and tools using Grid and Cloud resources
![Page 22: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/22.jpg)
agINFRA Grid & Cloud resources
• PARADOX cluster: 704 CPUs; 50 TB
• Roma Tre cluster: 350 CPUs; 100 TB
• Catania cluster: 800 CPUs; 700 TB
• SZTAKI cluster: 8 CPUs
• PARADOX upgrade: 1696 CPUs; 100 TB
• Total: 3.5 kCPU; 0.9 PB
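The total on this slide can be checked with a few lines of Python. This is only a sanity check under two assumptions: the PARADOX upgrade is counted in addition to the original PARADOX cluster, and the SZTAKI cluster, which lists no storage, contributes 0 TB.

```python
# Per-cluster figures as listed on the slide: (CPUs, storage in TB).
# Assumption: SZTAKI lists no storage figure, so it is counted as 0 TB.
clusters = {
    "PARADOX": (704, 50),
    "Roma Tre": (350, 100),
    "Catania": (800, 700),
    "SZTAKI": (8, 0),
    "PARADOX upgrade": (1696, 100),
}

# Assumption: the slide total is a plain sum over all listed entries.
total_cpus = sum(cpus for cpus, _ in clusters.values())
total_tb = sum(tb for _, tb in clusters.values())

print(f"Total CPUs: {total_cpus}")
print(f"Total storage: {total_tb} TB")
```

Under these assumptions the sums come to 3558 CPUs and 950 TB, which is consistent with the rounded-down total of about 3.5 kCPU and 0.9 petabytes.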
![Page 23: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/23.jpg)
The SemaGrow project
• Develop novel algorithms and methods for querying distributed triple stores
• Overcome problems stemming from heterogeneity and unbalanced distribution of data
• Develop scalable and robust semantic indexing algorithms that can serve detailed and accurate data summaries and other data source annotations about extremely large datasets
![Page 24: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/24.jpg)
The SemaGrow Stack
• Integrates the components in order to offer a single SPARQL endpoint that federates a number of heterogeneous data sources
• Targets the federation of independently provided data sources
• Uses POWDER to mass-annotate large subspaces
– W3C recommendation, exploits natural groupings of URIs to annotate all resources in a subset of the URI space
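The idea of annotating whole groups of URIs at once can be illustrated with a small Python sketch. It is not POWDER syntax and not the SemaGrow implementation: the `GroupAnnotation` class, the `describe` function, the example.org URIs and the property names are all hypothetical, chosen only to show how one rule per URI grouping can describe every resource in that part of the URI space.

```python
import re
from dataclasses import dataclass


@dataclass
class GroupAnnotation:
    """One annotation that applies to every URI matching `pattern`."""
    pattern: "re.Pattern"
    properties: dict


# Hypothetical URI groupings and properties, for illustration only.
ANNOTATIONS = [
    GroupAnnotation(re.compile(r"^http://example\.org/soil/"),
                    {"topic": "soil data", "source": "endpoint-A"}),
    GroupAnnotation(re.compile(r"^http://example\.org/crops/\d{4}/"),
                    {"topic": "crop yield", "source": "endpoint-B"}),
]


def describe(uri: str) -> dict:
    """Merge the properties of every URI group the given URI falls into."""
    merged: dict = {}
    for annotation in ANNOTATIONS:
        if annotation.pattern.match(uri):
            merged.update(annotation.properties)
    return merged


print(describe("http://example.org/crops/2011/wheat"))
print(describe("http://example.org/unknown/item"))
```

Two rules are enough to describe any number of resources under the two prefixes, which is the property that makes this style of annotation attractive for very large datasets. A URI outside every group simply gets an empty description.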
![Page 25: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/25.jpg)
Moving Forward
Figure: case of agricultural infrastructures, now (2012) versus 2015 (agINFRA)
NOW (2012): Harvester; OAI-PMH Service Providers #1 to #n with Schemas #1 to #n (AGRIS AP Schema, IEEE LOM Schema, DC Schema, ...); Aggregated XML Repository; Indexer; Web Portals (Open AGRIS (FAO), AgLR/GLN (ARIADNE), Organic.Edunet (UAH), VOA3R (UAH), ...)
2015 (agINFRA): SPARQL endpoints (Data Source #1 to #n); RDF Triple Store; Common Schema; Indexer; SPARQL endpoint; Web Portals
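In the harvesting setup, records arriving in different provider schemas have to be brought into one layout before they can be indexed together. A minimal Python sketch of that mapping step follows. The source field names, the target fields and the `to_common` helper are illustrative assumptions, not the actual AGRIS AP, IEEE LOM or DC bindings used by the providers.

```python
# Hypothetical per-schema field mappings onto one common record layout.
# Source field names are illustrative, not the providers' real bindings.
FIELD_MAPPINGS = {
    "DC": {
        "title": "title",
        "creator": "author",
        "subject": "keywords",
    },
    "IEEE LOM": {
        "general.title": "title",
        "lifecycle.contribute": "author",
        "general.keyword": "keywords",
    },
    "AGRIS AP": {
        "dc:title": "title",
        "ags:creatorPersonal": "author",
        "dc:subject": "keywords",
    },
}


def to_common(record: dict, schema: str) -> dict:
    """Rename the fields of one harvested record; drop unmapped fields."""
    mapping = FIELD_MAPPINGS[schema]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


lom_record = {"general.title": "Soil basics", "technical.format": "video/mp4"}
print(to_common(lom_record, "IEEE LOM"))
```

Every new provider schema needs its own mapping table in this approach, which is the alignment cost that querying the sources in place is meant to avoid.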
![Page 26: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/26.jpg)
Figure: SemaGrow Stack architecture
Components: Client; SemaGrow SPARQL endpoint; Query Manager; Query Decomposition (Query Decomposer, Query Pattern Discovery Service); Resource Discovery (Resource Selector, Data Source(s) Selector); POWDER Inference Layer with P-Store; Query Transformation Service (Schema Mappings); Federated endpoint Wrapper; Query Results Merger; SPARQL endpoints (Data Source #1 to #n)
Labelled flows: query; reactivity parameters; set of query patterns; query pattern and equivalent patterns; candidate source(s) list (instance statistics, load info, semantic proximity); data summaries; query fragment with target source; transformed query; query requests #1 to #n; query results schema and transformed schema; query results
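The decompose, select, query and merge flow named in the architecture figure can be sketched in plain Python. This is a toy model, not SemaGrow code: the in-memory `SOURCES` stand in for remote SPARQL endpoints, `STATS` is a crude stand-in for instance statistics, and all triples and function names are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Hypothetical in-memory stand-ins for remote SPARQL endpoints (toy data).
SOURCES: Dict[str, List[Triple]] = {
    "source-1": [("wheat", "yield2011", "3.2"), ("maize", "yield2011", "5.1")],
    "source-n": [("wheat", "grownIn", "Greece"), ("rice", "grownIn", "Spain")],
}

# Stand-in for instance statistics: the predicates each source holds.
STATS = {name: {p for _, p, _ in triples} for name, triples in SOURCES.items()}


def select_sources(pattern: Triple) -> List[str]:
    """Source selection: keep sources that hold the pattern's predicate."""
    predicate = pattern[1]
    if predicate.startswith("?"):  # unbound predicate: ask every source
        return list(SOURCES)
    return [name for name, preds in STATS.items() if predicate in preds]


def match(pattern: Triple, triple: Triple) -> Optional[Bindings]:
    """Bind the '?var' terms of a pattern against one triple, or None."""
    bindings: Bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return bindings


def evaluate(pattern: Triple) -> List[Bindings]:
    """Send one query fragment to each selected source; collect results."""
    results = []
    for name in select_sources(pattern):
        for triple in SOURCES[name]:
            found = match(pattern, triple)
            if found is not None:
                results.append(found)
    return results


def merge(left: List[Bindings], right: List[Bindings]) -> List[Bindings]:
    """Results merger: join bindings that agree on shared variables."""
    return [{**a, **b} for a in left for b in right
            if all(a[k] == b[k] for k in a.keys() & b.keys())]


def federated_query(patterns: List[Triple]) -> List[Bindings]:
    """Treat each pattern as a fragment, evaluate it, merge the results."""
    results: List[Bindings] = [{}]
    for pattern in patterns:
        results = merge(results, evaluate(pattern))
    return results


answer = federated_query([("?crop", "yield2011", "?yield"),
                          ("?crop", "grownIn", "?country")])
print(answer)
```

Each pattern is sent only to the source whose statistics list its predicate, and the partial results are joined on the shared `?crop` variable, so the client sees one answer as if it had queried a single endpoint. Query transformation between schemas and load-aware source ranking, both shown in the figure, are left out of this sketch.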
What Semantic Web can bring into the picture
• One Data Access Point for the entire Data Cloud
– Enabling Service-Data level agreements with Data providers
• Application-level Vocabularies / Thesauri / Ontologies
– Enabling different application facets for different communities of users over the SAME data pool
• Going beyond existing Distributed Triple Store Implementations
– Link Heterogeneous but Semantically Connected Data
– Index Extremely Large Information Volumes (Peta Sizes)
– Improve Information Retrieval response
• Data (+Metadata) physically stored in Data Provider
– No need for harvesting
• Vocabularies / Thesauri / Ontologies of Data Provider choice
– No need for aligning according to common schemas
![Page 27: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/27.jpg)
SUPPORTING GLOBAL INITIATIVES
![Page 28: Big Data in Agriculture, the SemaGrow and agINFRA experience](https://reader036.vdocuments.site/reader036/viewer/2022062617/54c690d54a7959bc708b459c/html5/thumbnails/28.jpg)
Global Open Data for Agriculture and Nutrition (GODAN) godan.info
Research Data Alliance (RDA) rd-alliance.org
– Agricultural Data Interoperability Interest Group
– Wheat Data Interoperability Working Group
CIARD - global movement dedicated to open agricultural knowledge www.ciard.net
e-Conference on Germplasm Data Interoperability