Harmonization of vocabularies for water data
Jonathan Yu | Research engineer
HIC 2014, 17 August 2014
LAND AND WATER FLAGSHIP | OCEANS AND ATMOSPHERE FLAGSHIP
Outline
• Context and problem space – need formal mechanisms for publishing vocabularies
• Use of semantic web tech to publish and harmonise vocabularies
• Challenges still exist
• conceptualisation as both classes and individuals – pragmatic but problematic
• URI patterns
• Versioning and keeping track
• Suggested paths forward?
Issues
• Formalization
• RDF SKOS OWL
• Collections
• Re-use/clone/leave alone
• URI Patterns
• Distribution
• UIs/APIs
• Versioning
• Mappings
• Search and discovery
Formalization: classic glossary – term+definition
CABI - http://www.cabi.org/ashc/uploads/file/ASHC/8_Glossary__acronyms__index_revised.pdf
AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
Columns: cas_rnnumber | ANGDTS Code | ANGDTS Description | Units_used | WDTF Parameter | chemical name | ADWG name | IUPAC name | Group | Ion
EC | EC | ease at which conduction current can be caused to flow through material in microSiemens/centimetre | us/cm ms/cm mg/L | ElectricalConductivityAt25C_uScm | Electrical Conductivity | Conductivity
PH | pH | negative logarithm of hydrogen ion concentration in ph units | pH units | WaterpH_pH | pH | pH | pH, alkalinity, acidity
16887-00-6 | 16887-00-6 | concentration of chloride as Cl in milligrams/litre | mg/L mg/kg | Chloride | Chloride | Chloride | Anion
TDS | TDS | the portion of total solids that passes through filter and deemed to have been dissolved in sample in milligrams/litre | mg/L | Total Dissolved Solids | Total Dissolved Solids | Salinity
TOTALALKALINITY | ALKT | concentration in milligrams/litre CaCO3 of titratable bases using a methyl-orange endpoint of about pH 4.3 | mg/L | Total Alkalinity (as CaCO3) | pH, alkalinity, acidity
HARDNESS_CACO3 | HARD | the ability of water to precipitate soap and is sum of calcium and magnesium concentrations as milligrams/litre CaCO3 | mg/L | Hardness (as CaCO3) | Hardness (as calcium carbonate) | Hardness (as calcium carbonate)
SAR | SAR | ratio of sodium to magnesium and calcium and used to assess risk of excess sodium in irrigation water | Ratio | Sodium Adsorption Ratio | Salinity
3812-32-6 | ALKC | alkalinity ascribed to carbonate in milligrams/litre CO3 | mg/L %MOL | Carbonate Alkalinity (as CaCO3) | Carbonate | pH, alkalinity, acidity
NITRATE | 14797-55-8 | concentration of nitrate as N in milligrams/litre | mg/L mg/kg | Nitrate | Nitrate and Nitrite | Nitrate and Nitrite | Anion
7439-89-6 | 7439-89-6 | concentration of iron as Fe in milligrams/litre | mg/L mg/kg ug/L | Iron | Iron | Metal | Cation
Formalization: table – structure + mappings
Healthy Headwater - NGIS Terms
Formalization: RDF – SKOS for basic vocabularies
chem:sodium
    a skos:Concept ;
    rdfs:label "sodium"^^xsd:string ;
    skos:broader chem:alkali ;
    skos:exactMatch <http://dbpedia.org/resource/Sodium> ;
    skos:inScheme skos:chemicals ;
    skos:prefLabel "nátrium"@hu , "sodio"@it , "sodium"@fr , "sodium"@en .
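A minimal sketch of how a client might use the language-tagged labels in the concept above, assuming the concept has already been parsed into a plain Python dictionary. The dictionary layout and the `pref_label` helper are illustrative assumptions, not part of SKOS or of any library.

```python
# Hypothetical in-memory form of the chem:sodium concept shown above.
sodium = {
    "id": "chem:sodium",
    "type": "skos:Concept",
    "broader": ["chem:alkali"],
    "exactMatch": ["http://dbpedia.org/resource/Sodium"],
    "prefLabel": {"hu": "nátrium", "it": "sodio", "fr": "sodium", "en": "sodium"},
}


def pref_label(concept, lang, fallback="en"):
    """Return the preferred label for lang, or the fallback language's label."""
    labels = concept["prefLabel"]
    return labels.get(lang, labels.get(fallback))


print(pref_label(sodium, "hu"))  # nátrium
print(pref_label(sodium, "de"))  # sodium (no German label, so English is used)
```

Falling back to English is a design choice made for this sketch; a real client might prefer to report a missing label instead.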
Formalization: RDFS/OWL add rich predicates
• Water Quality Vocabulary
Formalization: alignment with existing vocabularies (Water Quality extension to QUDT)
(Figure: alignment diagram; labelled components: QUDT, OP)
Formalization: link detailed model to SKOS access using SKOS API
Other approaches: OWL Class per concept
• deep subsumption hierarchy: SWEET, OBO
• intersecting constraints: CGI Lithology
Formalization challenge
• Sometimes formalized as OWL - usually as SKOS (example? SWEET / GEMET?)
• Class vs individuals (Example from QUDT?)
• Hybrid approaches exist – vocabulary as individuals of classes from an ontology but aligned with SKOS (Example from OP?)
• https://www.seegrid.csiro.au/wiki/Siss/VocabularyFormalizationInSKOS
Collections
skos:Collection – skos:member skos:Concept | skos:Collection
• A new collection can claim existing concepts as members
• Nested collections
skos:Concept – skos:inScheme skos:ConceptScheme
• Concepts assert their own membership
• No nesting
owl:Ontology
• No membership predicate
– rdfs:member? dct:hasPart?
void:Dataset, ldp:Container, reg:Register
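The difference in membership direction between the two SKOS mechanisms can be sketched with triples held as plain tuples. All `ex:` names are hypothetical, and the helper functions are an illustration written for this example rather than any library's API.

```python
# Triples as (subject, predicate, object) tuples; all ex: names are hypothetical.
triples = [
    ("ex:sodium", "skos:inScheme", "ex:scheme"),      # concept asserts its own membership
    ("ex:chloride", "skos:inScheme", "ex:scheme"),
    ("ex:ions", "skos:member", "ex:sodium"),          # collection claims an existing concept
    ("ex:waterQuality", "skos:member", "ex:ions"),    # nested collection
    ("ex:waterQuality", "skos:member", "ex:chloride"),
]


def scheme_concepts(scheme):
    """Concepts that declare themselves in the scheme via skos:inScheme."""
    return {s for s, p, o in triples if p == "skos:inScheme" and o == scheme}


def is_collection(node):
    """Treat any node that has skos:member triples as a collection."""
    return any(s == node and p == "skos:member" for s, p, o in triples)


def collection_concepts(collection, seen=None):
    """Concepts in a collection, following nested collections recursively."""
    seen = set() if seen is None else seen
    if collection in seen:  # guard against cyclic nesting
        return set()
    seen.add(collection)
    found = set()
    for s, p, o in triples:
        if s == collection and p == "skos:member":
            if is_collection(o):
                found |= collection_concepts(o, seen)
            else:
                found.add(o)
    return found


print(sorted(scheme_concepts("ex:scheme")))
print(sorted(collection_concepts("ex:waterQuality")))
```

Note that the collection triples can be added by a third party without touching the concepts, whereas scheme membership has to be stated on each concept.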
Re-use: new collections from old – clone, or leave alone
• eReefs WQ vocabulary includes a subset of 330+ chemicals from 36000+ in ChEBI
• New resources in local namespace
• SKOS *Match predicate gives provenance, link to more detail
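One way to picture the "new resource in a local namespace plus a SKOS match back to the source" pattern is a small generator of Turtle text. This is a sketch only: both namespaces and the `term-123` identifier are placeholders invented for the example, not the eReefs or ChEBI identifiers.

```python
LOCAL_NS = "http://example.org/wq/"        # hypothetical local namespace
SOURCE_NS = "http://example.org/source/"   # stand-in for the external vocabulary

SKOS_MAPPING = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def local_concept(local_name, label, source_id, match="exactMatch"):
    """Return Turtle for a local concept that points back to its source term."""
    if match not in SKOS_MAPPING:
        raise ValueError(f"not a SKOS mapping property: {match}")
    return (
        f"<{LOCAL_NS}{local_name}>\n"
        f"    a skos:Concept ;\n"
        f'    skos:prefLabel "{label}"@en ;\n'
        f"    skos:{match} <{SOURCE_NS}{source_id}> .\n"
    )


print(local_concept("sodium", "sodium", "term-123"))
```

The local concept carries only a label and the link; anything more detailed stays with the source vocabulary and is reached by following the match.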
Clone or leave alone?
• Question of caching content vs federating queries/discovery of content
• Consider ChEBI – big
• Cache or just link to its definitions?
• Trade-off between performance and convenience versus the effort of updating and synchronising
• LDR allows registration of external resources
• New register = subset or combination of terms already published elsewhere?
URI Patterns – opaque?
What does the URL path imply?
http://vocab.nerc.ac.uk/collection/G04/current/008/
G04 ISO RoleCode, 008 Principal Investigator
http://resource.geosciml.org/classifier/ics/ischart/Pliocene
= Pliocene, URI supplied by GeoSciML, definition sourced from International Commission for Stratigraphy (ics), in the collection known as ‘International Stratigraphic Chart’ (ischart)
Semantics? Management? Set-membership?
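A short sketch of what a machine can and cannot get from the two URIs above, using only the Python standard library. The `path_segments` helper is an assumption made for this example.

```python
from urllib.parse import urlsplit


def path_segments(uri):
    """Split a URI into its host and its non-empty path segments."""
    parts = urlsplit(uri)
    return parts.netloc, [seg for seg in parts.path.split("/") if seg]


nerc = "http://vocab.nerc.ac.uk/collection/G04/current/008/"
ics = "http://resource.geosciml.org/classifier/ics/ischart/Pliocene"

print(path_segments(nerc))
# ('vocab.nerc.ac.uk', ['collection', 'G04', 'current', '008'])
print(path_segments(ics))
# ('resource.geosciml.org', ['classifier', 'ics', 'ischart', 'Pliocene'])
```

The segments come apart mechanically, but nothing in the output says that `G04` is a collection code or that `ics` names the defining authority. That reading depends on each publisher's convention, which is why treating the URI as opaque is the safer default.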
Versioning - 2
Are these the same thing? How can we tell? How can a machine tell?
http://sweet.jpl.nasa.gov/1.1/time.owl#PLEISTOCENE
http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pleistocene
http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pleistocene
http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pleistocene
Compare with http://resource.geosciml.org/classifier/ics/ischart/Pliocene
– URI for the concept
http://def.seegrid.csiro.au/sissvoc/isc2014/resource.html?uri=http://resource.geosciml.org/classifier/ics/ischart/Pliocene
– URI for a description of the concept (i.e. record), according to the 2014 version of the service
Care with version number in URI!
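To make the "how can a machine tell?" question concrete, the sketch below pulls the version, module and local name out of the four SWEET URIs. The `describe` and `maybe_same` helpers are hypothetical, and the comparison is a guess based on spelling, not a statement about what SWEET intends.

```python
from urllib.parse import urlsplit

SWEET_URIS = [
    "http://sweet.jpl.nasa.gov/1.1/time.owl#PLEISTOCENE",
    "http://sweet.jpl.nasa.gov/2.0/timeGeologic.owl#Pleistocene",
    "http://sweet.jpl.nasa.gov/2.2/stateTimeGeologic.owl#Pleistocene",
    "http://sweet.jpl.nasa.gov/2.3/stateTimeGeologic.owl#Pleistocene",
]


def describe(uri):
    """Return (version, module, local name) read off a SWEET-style URI."""
    parts = urlsplit(uri)
    version, module = parts.path.strip("/").split("/", 1)
    return version, module, parts.fragment


def maybe_same(a, b):
    """Heuristic only: local names are equal after case folding."""
    return describe(a)[2].casefold() == describe(b)[2].casefold()


for uri in SWEET_URIS:
    print(describe(uri))

# As plain identifiers the four URIs are all different.
print(len(set(SWEET_URIS)))  # 4
print(all(maybe_same(SWEET_URIS[0], other) for other in SWEET_URIS[1:]))  # True
```

String equality sees four unrelated resources, while the heuristic sees one. Neither answer is reliable without an explicit assertion published alongside the vocabulary, which is the argument for keeping the version out of the concept URI.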
Versioning - 3
• Version info in item?
<http://vocab.nerc.ac.uk/collection/G04/current/008/> a skos:Concept ;
    skos:prefLabel "principalInvestigator" ;
    owl:versionInfo "1" ;
    dc:date "2012-07-04 10:56:53.0" .
• Version info in registration record?
Versioning
• How do we manage versions of definitions?
• Do we version a definition of an abstract concept?
• Does the definition of the concept change or does our understanding change?
• Version the set or individual items?
Distribution
• Vocabulary packaged in a file or page
http://resource.geosciml.org/vocabulary/timescale/isc2014.ttl
http://resource.geosciml.org/vocabulary/timescale/isc2014.html
• Dereference the URI for a resource in the vocabulary
http://resource.geosciml.org/classifier/ics/ischart/ (all)
http://resource.geosciml.org/classifier/ics/ischart/Cambrian
• SPARQL endpoint
http://resource.geosciml.org/sparql/isc2014
• Vocabulary service
http://def.seegrid.csiro.au/sissvoc/isc2014/collection
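Two of the access routes listed above can be sketched as requests built with the Python standard library. Nothing is sent over the network here. Whether these servers honour an `Accept` header, accept a `DESCRIBE` query, or are still online is an assumption; the `query` parameter name follows the SPARQL protocol.

```python
from urllib.parse import urlencode
from urllib.request import Request

CONCEPT = "http://resource.geosciml.org/classifier/ics/ischart/Cambrian"
ENDPOINT = "http://resource.geosciml.org/sparql/isc2014"


def dereference_request(uri, media_type="text/turtle"):
    """Build (but do not send) a GET request asking for an RDF serialisation."""
    return Request(uri, headers={"Accept": media_type})


def sparql_url(endpoint, query):
    """Build a SPARQL protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


req = dereference_request(CONCEPT)
url = sparql_url(ENDPOINT, f"DESCRIBE <{CONCEPT}>")

print(req.get_method(), req.full_url, req.get_header("Accept"))
print(url)
```

Passing `req` to `urllib.request.urlopen` would perform the dereference; it is left out so the sketch runs offline.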
Mapping challenge
• Linking between ontologies – which to use? All or some?
• SKOS relations - exactMatch, closeMatch, narrowMatch, broadMatch
• OWL predicates - sameAs for individuals, equivalentClass for classes and equivalentProperty for properties
• Dublin Core
• PROV-O
• VoID
• VOAF
• Linking between classes and individuals in OWL – logics-based reasoning support
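The choice between the OWL and SKOS predicates listed above can be written down as a small lookup. The selection policy is an assumption made for illustration: OWL equivalence when both ends are the same kind of OWL entity and the match is exact, otherwise a SKOS mapping relation. Falling back to SKOS for a class-to-individual link is the pragmatic option, but it implies treating both ends as SKOS concepts.

```python
# OWL equivalence predicate for each kind of resource.
OWL_EQUIVALENCE = {
    "individual": "owl:sameAs",
    "class": "owl:equivalentClass",
    "property": "owl:equivalentProperty",
}

# SKOS mapping relation by match strength; "broader" means the target is broader.
SKOS_MATCH = {
    "exact": "skos:exactMatch",
    "close": "skos:closeMatch",
    "narrower": "skos:narrowMatch",
    "broader": "skos:broadMatch",
}


def mapping_predicate(source_kind, target_kind, strength="exact"):
    """Pick a mapping predicate; the selection policy is illustrative only."""
    if source_kind == target_kind and strength == "exact":
        owl = OWL_EQUIVALENCE.get(source_kind)
        if owl:
            return owl
    return SKOS_MATCH[strength]


print(mapping_predicate("class", "class"))             # owl:equivalentClass
print(mapping_predicate("class", "individual"))        # skos:exactMatch
print(mapping_predicate("concept", "concept", "close"))  # skos:closeMatch
```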
Standards…
• The standard ISO 8601 concerns dates, a common type of information used for data and documentation.
• March 5, 2014
• 2014-03-05
• 3/5/14
• 05/03/2014
• 5 Mar 2014
• Multiple representations but essentially one meaning
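The point can be checked with a few lines of Python: each of the five strings parses to the same calendar date once its format is declared. The format assigned to each string is an assumption. In particular, `3/5/14` is read month-first and `05/03/2014` day-first, which cannot be inferred from the strings themselves.

```python
from datetime import datetime

# (text, strptime format) pairs; the formats are declared by hand.
REPRESENTATIONS = [
    ("March 5, 2014", "%B %d, %Y"),
    ("2014-03-05", "%Y-%m-%d"),   # ISO 8601 calendar date
    ("3/5/14", "%m/%d/%y"),       # assumed month/day/two-digit year
    ("05/03/2014", "%d/%m/%Y"),   # assumed day/month/year
    ("5 Mar 2014", "%d %b %Y"),
]

iso_dates = {
    datetime.strptime(text, fmt).date().isoformat()
    for text, fmt in REPRESENTATIONS
}

print(iso_dates)  # {'2014-03-05'}
```

The month names are parsed with the default locale, so the first and last entries assume English month names.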
Source: http://dataabinitio.com/?p=449
Challenges still exist
• Variation of formalisation and publication
• conceptualisation as both classes and individuals – pragmatic but problematic
• URI patterns
• Versioning and keeping track
Jonathan Yu
Research Software Engineer
Bruce Simons
SDI Modeller
Thank you
Terms of use: Image sources from Wikipedia under CC2.0 licence
http://en.wikipedia.org/wiki/File:Amazing_Great_Barrier_Reef_1.jpg
Simon Cox
Research Scientist
http://ereefs.org.au/