caracciolo et al_2012_aos_agrovoc_multilinguality
AIMS
Is ISO 639 enough for a multilingual thesaurus? The AGROVOC case
Caterina Caracciolo, Gudrun Johannsen, Lavanya Kiran, Johannes Keizer
Food and Agriculture Organization of the UN
AOS 2012
Sept. 4, 2012 - Kuching (MY)
Background
• AGROVOC is published in 21 languages, with others under development
• Multilinguality has always been an issue
• Since the beginning, multilinguality was interpreted as “translation”:
– One hierarchy of terms (one structure), with translations in various languages
• This organization remained when AGROVOC moved from a term-centered to a concept-centered resource
04/13/2023
AGROVOC as an object-centered resource…
• Being mainly a resource for indexing documents in the area of agriculture, AGROVOC contains a large number of words referring to plants, animals and food in general
[Chart: number of concepts below each top concept. Top concepts shown: strategies, site, events, time, factors, processes, technology, stages, state, measures, groups, locations, systems, subjects, resources, objects, features, properties, methods, products, activities, phenomena, entities, substances, organism. Horizontal axis: number of concepts, 0 to 25,000.]
Differentiating languages
• Salmon (en)
• Salmón (es)
• лососи (ru)
But the geographic distribution of a language may be wide…
… and names of food tend to vary…
Palta
Aguacate
… and names of food tend to vary…
Coime, coimi, cuimi, millmi
Achis, Coyos (Cajamarca), Achita (Ayacucho), Kiwicha (Cusco)
Ataco morado, sangorache, sergorache, hawarcha
Not only food names vary
Requirements for rendering multilinguality in AGROVOC
1. Unambiguously express the geographic area where a given word is used
– Specifying the area of use of a given word should be optional.
2. No limitations on the type of area allowed
– Countries, groups of countries, and geographical or administrative regions should all be available for specification.
KISAF, Rome
AGROVOC as a SKOS resource
• skos:Concept is used to denote a group of words in various languages that are considered translations of one another
• URIs are kept “abstract” to emphasize the independence of the concept from language
– E.g. http://aims.fao.org/aos/agrovoc/c_12332
• The words grouped are then labels of the given concept
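As an illustration of one abstract concept grouping translations as labels, the sketch below writes the salmon example from the earlier slide as N-Triples with plain skos:prefLabel literals. The concept URI is hypothetical and deliberately opaque. It is not claimed to be the AGROVOC identifier for salmon. Only the W3C SKOS and RDF vocabulary URIs are standard.

```python
import json

# Standard W3C vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def nt_literal(text: str, lang: str) -> str:
    """Language-tagged N-Triples literal; json.dumps escapes quotes, backslashes and control characters."""
    return f"{json.dumps(text, ensure_ascii=False)}@{lang}"


def concept_to_ntriples(concept_uri: str, pref_labels: dict) -> str:
    """Serialize one concept with one skos:prefLabel per language tag."""
    lines = [f"<{concept_uri}> <{RDF_TYPE}> <{SKOS_CONCEPT}> ."]
    for lang, text in sorted(pref_labels.items()):
        lines.append(f"<{concept_uri}> <{SKOS_PREF_LABEL}> {nt_literal(text, lang)} .")
    return "\n".join(lines)


# Hypothetical, language-neutral concept URI (an assumption for this sketch).
SALMON = "http://example.org/agrovoc-sketch/c_0001"
print(concept_to_ntriples(SALMON, {"en": "Salmon", "es": "Salmón", "ru": "лососи"}))
```

Because the URI does not spell out a word in any language, no label is privileged. All three words hang off the same node.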
SKOS properties to express terms
• skos:prefLabel, skos:altLabel
– Take plain literals as values
– With an optional language tag, expressed by the XML attribute xml:lang
• skosxl:prefLabel, skosxl:altLabel
– Take resources with URIs as values, so that extra information can be attached to labels
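The difference between the two families of properties can be made concrete with a short sketch. With SKOS-XL, the preferred label is a skosxl:Label resource that carries its text in skosxl:literalForm. Because that resource has its own URI, it can be the subject of further statements. The concept and label URIs below are hypothetical.

```python
import json

SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def xl_pref_label(concept_uri: str, label_uri: str, text: str, lang: str) -> list:
    """N-Triples for a concept whose preferred label is a skosxl:Label resource with its own URI."""
    literal = f"{json.dumps(text, ensure_ascii=False)}@{lang}"  # escaped, language-tagged literal
    return [
        f"<{concept_uri}> <{SKOSXL}prefLabel> <{label_uri}> .",
        f"<{label_uri}> <{RDF_TYPE}> <{SKOSXL}Label> .",
        f"<{label_uri}> <{SKOSXL}literalForm> {literal} .",
    ]


# Hypothetical URIs for illustration only.
for line in xl_pref_label(
    "http://example.org/agrovoc-sketch/c_0001",
    "http://example.org/agrovoc-sketch/xl_es_0001",
    "Salmón",
    "es",
):
    print(line)
```

With skos:prefLabel, the literal is the end of the chain. Here the label URI can also appear as the subject of additional triples, for example provenance or area of use.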
AGROVOC uses ISO 639 two-letter codes to tag languages in xml:lang
• ISO 639 provides codes for languages independently of
– the country where they are spoken:
• Spanish, Basque (same country, both official languages)
• Dutch, Flemish (different countries, similar enough languages…)
– and their status: French and Breton (same country, Breton has no official status)
• Only one code for English, Spanish…
• Hence the limitations shown by the previous examples
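One concrete consequence of having a single "es" tag can be checked mechanically. The SKOS Reference (integrity condition S14) allows at most one skos:prefLabel per language tag. Two regionally preferred Spanish names for the same concept, such as Palta and Aguacate, therefore cannot both be preferred labels, and one has to be demoted to skos:altLabel. The helper name and the data pairing below are illustrative assumptions.

```python
from collections import Counter


def conflicting_pref_labels(pref_labels: list) -> dict:
    """Return the language tags that carry more than one prefLabel (disallowed by SKOS S14)."""
    counts = Counter(tag for _, tag in pref_labels)
    return {
        tag: [text for text, t in pref_labels if t == tag]
        for tag, n in counts.items()
        if n > 1
    }


# Hypothetical data: the two Spanish avocado names from the earlier slide, both tagged "es".
avocado = [("Palta", "es"), ("Aguacate", "es")]
print(conflicting_pref_labels(avocado))  # {'es': ['Palta', 'Aguacate']}
```

The tag alone records no information that could tell the two words apart.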
[Diagram: “Multilinguality” vs. “ISO 639 language codes”]
Is ISO 639 with three-letter codes an option?
• More languages are included
– More contemporary languages
• Bemba language
– “Old” languages (no longer spoken)
• Old French (842–ca. 1400)
– Groups of languages
• Caucasian languages
– Artificial languages
• Same approach as the two-letter version: codes identify languages, not where they are used
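The sketch below illustrates why moving to three-letter codes changes coverage but not granularity. The mapping is a small, hand-picked excerpt of ISO 639-1 to ISO 639-3 equivalents, not the full standard, and `to_three_letter` is a hypothetical helper. Codes such as "bem" (Bemba) and "fro" (Old French) have no two-letter form, yet "es" still becomes the single code "spa" for every variety of Spanish. Collective codes such as "cau" (Caucasian languages) belong to ISO 639-2 and ISO 639-5 rather than ISO 639-3.

```python
# Hand-picked excerpt, not the full standard: ISO 639-1 code -> ISO 639-3 code.
ISO639_1_TO_3 = {
    "en": "eng",
    "es": "spa",
    "ru": "rus",
    "fr": "fra",
    "br": "bre",
    "eu": "eus",
    "nl": "nld",  # ISO 639 has one code covering "Dutch; Flemish"
}


def to_three_letter(code: str) -> str:
    """Hypothetical helper: normalise a language code to its three-letter form."""
    code = code.lower()
    if len(code) == 3:
        return code  # already three letters, e.g. "bem" (Bemba) or "fro" (Old French)
    if code in ISO639_1_TO_3:
        return ISO639_1_TO_3[code]
    raise ValueError(f"no mapping for {code!r} in this excerpt")


print(to_three_letter("es"), to_three_letter("bem"))  # spa bem
```

Palta and Aguacate would both end up tagged "spa", so the limitation from the previous slide is unchanged.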
Is IETF an option?
• Internet Engineering Task Force (IETF)
• IETF RFC 5646, Tags for Identifying Languages
– Based on ISO 639 for languages
– Subtags from ISO 3166 for countries and ISO 15924 for scripts
• Examples:
– tr-CY = Turkish from Cyprus
– zh-Hant-HK = Chinese in traditional Chinese script, as used in Hong Kong
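To see what the IETF tags can and cannot express, here is a minimal parser for a simplified subset of RFC 5646: a language, an optional script, and an optional region. Variants, extensions and private-use subtags are deliberately not handled. In RFC 5646 the region slot accepts ISO 3166-1 alpha-2 country codes or UN M.49 three-digit area codes, such as "419" for Latin America and the Caribbean. Countries and some groups of countries therefore fit, but administrative regions such as Cusco do not.

```python
import re
from typing import NamedTuple, Optional

# Simplified subset of the RFC 5646 langtag production (no extlang, variants or extensions).
_LANGTAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


class LangTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> LangTag:
    """Split a simple tag into language, script and region, normalising case as RFC 5646 recommends."""
    m = _LANGTAG.match(tag)
    if m is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    script, region = m.group("script"), m.group("region")
    return LangTag(
        m.group("language").lower(),
        script.title() if script else None,  # scripts in title case, e.g. "Hant"
        region.upper() if region else None,  # regions in upper case, e.g. "HK"
    )


print(parse_tag("tr-CY"))
print(parse_tag("zh-Hant-HK"))
```

A subnational suffix after the region, as in es-PE-CUS, is rejected here and is not well-formed under RFC 5646 either. That gap leaves the area-of-use requirement only partly covered by tagging.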
Is a relational approach an option?
• Keep the tagging approach to mark the language
– Use ISO 639 or IETF
• And introduce a relational notion of “where a given word is used”
• Link a concept representing a geographic area to the object to be named
– E.g., Kiwicha isNameUsedInRegion Cusco
• Aim at “standard” relations…
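A minimal in-memory sketch of the relational option follows. It assumes that labels are SKOS-XL-style resources with their own URIs, and that "isNameUsedInRegion" links a label to a region concept. The relation name comes from the slide, but its URI, the namespace, the label identifiers and the "es" language tags are all hypothetical. The link is optional (requirement 1), and the region can be any area concept (requirement 2).

```python
from dataclasses import dataclass, field

SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
EX = "http://example.org/agrovoc-sketch/"  # hypothetical namespace
IS_NAME_USED_IN_REGION = EX + "isNameUsedInRegion"  # relation named on the slide; URI is hypothetical


@dataclass
class LabelGraph:
    triples: set = field(default_factory=set)  # (subject, predicate, object) tuples

    def add_label(self, label_id: str, text: str, lang: str, regions: tuple = ()) -> str:
        """Add a label resource; linking it to regions is optional."""
        label = EX + label_id
        self.triples.add((label, SKOSXL + "literalForm", (text, lang)))
        for region in regions:  # any kind of area concept: country, group, administrative region
            self.triples.add((label, IS_NAME_USED_IN_REGION, EX + region))
        return label

    def names_used_in(self, region: str) -> list:
        """Sorted literal forms of the labels linked to the given region concept."""
        target = EX + region
        labels = {s for s, p, o in self.triples if p == IS_NAME_USED_IN_REGION and o == target}
        return sorted(o[0] for s, p, o in self.triples if s in labels and p == SKOSXL + "literalForm")


g = LabelGraph()
g.add_label("kiwicha", "Kiwicha", "es", ("Cusco",))  # "es" tag is an assumption
g.add_label("achita", "Achita", "es", ("Ayacucho",))
g.add_label("coyos", "Coyos", "es", ("Cajamarca",))
g.add_label("sangorache", "sangorache", "es")  # no area of use recorded
print(g.names_used_in("Cusco"))  # ['Kiwicha']
```

Containment between areas (e.g. Cusco within Peru) would need a further relation, which this sketch does not model.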
Conclusions?
• This is work in progress
• We continue working out use cases, especially from Spanish and Portuguese
• Assess alternatives