![Page 1: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/1.jpg)
Global Agricultural Concept Scheme
The collaborative integration of three thesauri
Prof Dr Thomas Baker [1]
Dr Osma Suominen [2]
DINI Jahrestagung (annual conference), “Linked Data – Vision und Wirklichkeit” (Linked Data: Vision and Reality)
Frankfurt, 28 October 2015
[1] Sungkyunkwan University (Korea) and Dublin Core Metadata Initiative
[2] National Library of Finland
![Page 2: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/2.jpg)
http://www.iskouk.org/content/great-debate
![Page 3: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/3.jpg)
Concept-based thesauri, published with modern tools and available as Linked Data, are useful!
![Page 4: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/4.jpg)
Three big thesauri in agriculture
Three thesauri of terms and concepts related to agriculture: concepts like rice, ricefield aquaculture, and plant pests.
● FAO – Food and Agriculture Organization of the United Nations
● CABI – CAB International (UK)
● NAL – National Agricultural Library (US)
![Page 5: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/5.jpg)
Separate thesauri
Separate databases
![Page 6: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/6.jpg)
Create GACS as glue linking them together
![Page 7: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/7.jpg)
Global Agricultural Concept Scheme (GACS)
1. Improve semantic interoperability of the thesauri.
2. Provide core concepts.
3. Achieve efficiencies through cooperative maintenance.
![Page 8: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/8.jpg)
Requirements
1. Integrated view
2. Reuse of work, such as translations
3. Compatibility with existing databases
4. Based on RDF technologies: URIs, SKOS...
5. Available as Linked Open Data
Based on, mapped to, but independent of, its three source thesauri.
![Page 9: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/9.jpg)
AGROVOC CAB Thesaurus NAL Thesaurus
140,000 concepts, >1.4M labels
32,000 concepts, >1.2M labels
53,000 concepts, >200k labels
English, Spanish, Portuguese, German, Czech, Persian, Polish, Hindi, French, Italian, Russian, Japanese, Hungarian, Chinese, Slovak, Thai, Lao, Turkish, Korean, Arabic, Telugu ...
English, Spanish, Portuguese, Dutch, plus many languages with lower coverage
English, Spanish
First step: represent all three in SKOS
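As a rough illustration of what "represent in SKOS" means for a single entry, the sketch below serializes one concept with its preferred and alternative labels as Turtle. The `ex:` prefix, the `to_turtle` helper, and the dictionary layout are assumptions made for this example; the labels are taken from the rice example later in the slides. It is not the conversion pipeline the partners actually used.

```python
# Minimal sketch: serialize one thesaurus entry as SKOS in Turtle syntax.
# The ex: namespace and this helper are hypothetical, for illustration only.

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/concept/> .\n"
)


def to_turtle(concept_id, pref_labels, alt_labels):
    """Return a Turtle description of a single skos:Concept."""
    lines = [f"{concept_id} a skos:Concept"]
    # One preferred label per language.
    for lang, label in sorted(pref_labels.items()):
        lines.append(f'    skos:prefLabel "{label}"@{lang}')
    # Any number of alternative labels per language (the old "UF" entries).
    for lang, labels in sorted(alt_labels.items()):
        for label in labels:
            lines.append(f'    skos:altLabel "{label}"@{lang}')
    return " ;\n".join(lines) + " ."


rice = to_turtle("ex:rice", {"en": "rice"}, {"en": ["paddy", "paddy rice"]})
print(PREFIXES)
print(rice)
```

In this form a traditional "UF" (used for) entry becomes a `skos:altLabel` on the concept rather than a separate term record.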
![Page 10: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/10.jpg)
Obtained via automatic mappings created using AgreementMakerLight
First rough estimate
![Page 11: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/11.jpg)
Long tail distribution (in AGRIS)
10,000 concepts cover nearly 99% of occurrences in metadata
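The coverage figure above is the kind of number one gets by counting how often each concept is used in indexing and summing the share of the most frequent ones. A minimal sketch, assuming the indexing data is available as a flat list of concept occurrences; the function name and the toy data are invented for illustration and are not AGRIS figures:

```python
from collections import Counter


def coverage_of_top_n(occurrences, n):
    """Share of all indexing occurrences covered by the n most frequent concepts."""
    counts = Counter(occurrences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Toy data, invented for illustration only.
sample = ["rice"] * 6 + ["cereals"] * 3 + ["Oryza"] * 1
print(coverage_of_top_n(sample, 2))  # 0.9
```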
![Page 12: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/12.jpg)
Top 10,000 concepts from each
Each partner organization provided the 10,000 concepts most frequently used in their respective databases.
Added:
● all countries
● all higher-level organisms
![Page 13: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/13.jpg)
Automated mappings
Created using AgreementMakerLight software between the full thesauri, for completeness
![Page 14: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/14.jpg)
Human evaluation of mappings
Created Google Docs spreadsheets using the lists of selected concepts and the auto-generated mappings. Three sheets with circa 10,700 rows each.
Mappings manually evaluated by staff of partner organizations.
Evaluated 60 to 150 rows/hour. Evaluation took 500 to 600 hours for GACS Beta.
![Page 15: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/15.jpg)
Starting point October 2014
![Page 16: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/16.jpg)
30,000 mappings later... January 2015
![Page 17: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/17.jpg)
4,689 mappings later... February 2015
![Page 18: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/18.jpg)
5,522 mappings later... March 2015
![Page 19: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/19.jpg)
Forming GACS concepts
by merging the source concepts and aggregating their information
rice
UF paddy
UF paddy rice
cereals
UF feed cereals
UF small grain cereals (grain)
Oryza sativa
UF Oryza glutinosa
UF Oryza indica
UF Oryza japonica
UF Oryza sativa … (subsp, var etc.)
Oryza
UF Padia
UF rice (plant)
agrovoc:c_5435, cabt:82917, nalt:56271 (exactMatch)
agrovoc:c_5438, cabt:82935, nalt:56277 (exactMatch)
agrovoc:c_1474, cabt:26247 (exactMatch)
agrovoc:c_6599, cabt:101613, nalt:56293 (exactMatch)
(Note: GACS uses SKOS, not traditional thesaurus tags)
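One way to picture the merging step: treat each vetted exactMatch as an edge between two source concepts and take every connected group as one candidate GACS concept. The sketch below does this with a small union-find. The function name is hypothetical, and the specific pairs are an assumption (the slide shows which identifiers belong together, not which pairwise links were asserted). It is not the GACS build script.

```python
def merge_exact_matches(mappings):
    """Group source concept IDs into clusters connected by exactMatch links."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in mappings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Pairwise links assumed for illustration, using identifiers from the slide.
links = [
    ("agrovoc:c_5435", "cabt:82917"),
    ("cabt:82917", "nalt:56271"),
    ("agrovoc:c_1474", "cabt:26247"),
]
for cluster in merge_exact_matches(links):
    print(cluster)
```

Each resulting cluster would then receive its own identifier, with the labels of its member concepts pooled under it.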
![Page 20: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/20.jpg)
Lumps
clusters of concepts mapped one-to-several, several-to-one, or in spirals
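Read this way, a lump can be detected mechanically: it is a mapped cluster that contains more than one concept from the same source thesaurus. That reading is an interpretation of the slide, not a definition given by the authors. A minimal sketch under that assumption, with invented identifiers and a hypothetical function name:

```python
from collections import Counter


def find_lumps(clusters):
    """Return clusters holding more than one concept from the same source."""
    lumps = []
    for cluster in clusters:
        # The source thesaurus is taken from the prefix before the colon.
        per_source = Counter(cid.split(":", 1)[0] for cid in cluster)
        if any(n > 1 for n in per_source.values()):
            lumps.append(cluster)
    return lumps


# Invented identifiers, for illustration only.
clusters = [
    ["agrovoc:a1", "cabt:b1", "nalt:c1"],    # clean one-to-one-to-one cluster
    ["agrovoc:a2", "agrovoc:a3", "cabt:b2"], # several-to-one: a lump
]
print(find_lumps(clusters))
```

Clusters flagged this way are the ones that need a human decision about which mappings to keep.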
![Page 21: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/21.jpg)
15,090 concepts; 972 lumps
Lumps
March 2015
![Page 22: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/22.jpg)
15,278 concepts; 339 lumps
Lumps
April 2015
![Page 23: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/23.jpg)
15,411 concepts; 84 lumps
Lumps
October 2015
![Page 24: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/24.jpg)
15,406 concepts; no lumps
Last week
![Page 25: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/25.jpg)
Polyhierarchy?
countries
developing countries
Argentina
Buenos Aires
development
socioeconomic development
economic development
sciences
social sciences
economics
![Page 26: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/26.jpg)
Concept types?
![Page 27: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/27.jpg)
Concept types!
Plus Product
![Page 28: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/28.jpg)
Towards GACS roll-out (2016)
• Concept scheme as Linked Data. Own publication and editorial platform.
• Quality improvements. Inconsistencies in hierarchy, choice of labels, scope notes and definitions.
• Own semantic structure. Common vs scientific names, custom relationships, concept types.
![Page 29: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/29.jpg)
VocBench for editing
![Page 30: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/30.jpg)
Skosmos for display and browsing
![Page 31: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/31.jpg)
Size of GACS
GACS Beta 1.1
• 15,406 concepts
• 398,216 labels in 28 languages
![Page 32: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/32.jpg)
Extension module for what remains?
![Page 33: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/33.jpg)
AGROVOC and NALT may be phased out
Extension module?
GACS
CABT
GACS
![Page 34: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/34.jpg)
Agrisemantics
http://aims.fao.org/sites/default/files/Report_workshop_Agrisemantics.pdf
![Page 35: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/35.jpg)
Global food security and climate change
• GACS as hub for agricultural code lists, taxonomies, statistical indicators...
• Simplify data normalization and integration
• More coherent datasets and research results
• Help farmers become more efficient
![Page 36: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/36.jpg)
Reports available on the FAO AIMS site
http://aims.fao.org/community/agrovoc/blogs/phase-one-gacs-approved-read-reports
http://aims.fao.org/sites/default/files/Report_workshop_Agrisemantics.pdf
![Page 37: Global Agricultural Concept SchemeThe collaborative integration of three thesauri](https://reader034.vdocuments.site/reader034/viewer/2022042723/587ae4cc1a28ab542b8b70bf/html5/thumbnails/37.jpg)
Abstract
The Food and Agriculture Organization of the United Nations (FAO), CAB International (CABI), and the USDA National Agricultural Library (NAL), maintainers of three large thesauri of agricultural terminology that largely overlap in scope, have partnered to create a shared Global Agricultural Concept Scheme (GACS). Duplication of effort has proven to be both inefficient and a barrier to users wishing to search across databases indexed with their terms. Expressing AGROVOC, CAB Thesaurus, and NAL Thesaurus in RDF and SKOS, as Linked Data, facilitates mappings, but mappings among three large, continually moving targets are difficult to maintain.
Starting with algorithmically generated mappings among three sets of the terms most frequently used to index the AGRICOLA, CAB Abstracts, and AGRIS databases, thesaurus managers in the GACS Working Group have manually vetted the mappings for quality and are currently correcting logical inconsistencies. In a final iteration, these mappings will be used to generate a Global Agricultural Concept Scheme with its own identifiers, and GACS will be moved into its own distributed editorial environment and jointly maintained by the three partners.
Targeted for beta release in early 2016, GACS aggregates the complementary strengths of its sources, such as expertise in particular areas and labels in twenty languages. Formulating consistent policies for GACS on issues such as scientific versus common names for organisms requires balancing scientific, commercial, educational, and mass-market perspectives. The challenge of global food security under conditions of climate change will require the integration of data at all levels. GACS can serve as a focal point in the broader ecosystem of vocabularies, code lists, database schemas, ontologies, statistical indicators, and taxonomies required to drive agricultural research and innovation.