A water information R & D alliance between the Bureau of Meteorology and CSIRO’s Water for a Healthy Country Flagship
Vocabulary Services, RDF, SKOS and REST
Peter Fitch
Outline
• Outline the problem
• Background on
  • Linked data
  • RDF
  • SKOS
  • REST – Linked Data API
• Vocabulary Service
  • What is it
  • How to develop a vocabulary service
• Test case with USGS code list
• Demo
Warning
Frequent use of XML
Motivations
Xlink is all well and good, but the real problem is what is at the end of the link and how to use it.
Agreed, and I wish I knew more about the semantic technologies.
The need for semantic context
From Lemon OSDM Linked Data workshop 2010
• Semantic Context
  • Black and White
  • Bessie
  • Good Milker
Machines need it too.
From Lemon OSDM Linked Data workshop 2010
Information Needed
• Internal structure – the information model
• Supported functions – the operations
• Semantics
  • What are the concepts
  • What are the vocabularies
  • How are they related
  • Where are they defined
• Where did it come from?
• How was it created?
Current Metadata
Adapted from Lemon OSDM Linked Data workshop 2010
Semantic Context
The need for semantic information in Hydro-Domain data exchange
Don’t Information Models solve the problem?
Take a closer look
http://www.bom.gov.au/std/water/xml/wio0.2/property//bom/WaterCourseLevel_m
The O word
You need an ontology!
O what? I know one O word and it's not that. I'd better find out more.
4 Rules of Linked Data TBL – key take home!
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things.
Comment:
So by following rule 3 we might be able to get some useful information, but we still need semantic context.
Tim Berners-Lee http://www.w3.org/DesignIssues/LinkedData.html
Linked data quality scheme – Sir Tim BL
Rating   Description
★        Available on the web (whatever format), but with an open license
★★       Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
★★★      As (2), plus non-proprietary format (e.g. CSV instead of Excel)
★★★★     All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★    All the above, plus: link your data to other people’s data to provide context
Tim Berners-Lee http://www.w3.org/DesignIssues/LinkedData.html
Intro to RDF
• RDF is a data model for describing resources
• Resource Description Framework
[Diagram: Subject – Predicate – Object, i.e. Things – Have properties – Property Value]
• The object of one statement can become the subject in another.
• The set of linked statements forms a directed graph
• Subject, Object and Predicate are all Resources*
A set of Subject, Predicate, Object entities is called a Triple
RDF Example
Remember – Resources are URIs
Peter – http://www.csiro.au/people/PeterFitch.html
hasColleague – http:/mydefinitions/defintions#hasColleague
Nate - http://cida.usgs.gov/professional-pages/booth.html
[Diagram: Peter – hasColleague → Nate]
Subject: Peter Predicate: hasColleague Object: Nate
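As a rough illustration of the triple above, the following Python sketch stores statements as plain (subject, predicate, object) tuples and shows how the object of one statement can be the subject of another. The `HAS_COLLEAGUE` URI and the third person (`ALEX`) are hypothetical stand-ins introduced for this example only; they are not taken from the slides.

```python
# Minimal sketch: RDF-style statements as (subject, predicate, object) tuples.
PETER = "http://www.csiro.au/people/PeterFitch.html"
NATE = "http://cida.usgs.gov/professional-pages/booth.html"
# Hypothetical URIs, invented for this example.
ALEX = "http://example.org/people/alex"
HAS_COLLEAGUE = "http://example.org/definitions#hasColleague"

# A graph is simply a set of triples.
graph = {
    (PETER, HAS_COLLEAGUE, NATE),
    (NATE, HAS_COLLEAGUE, ALEX),
}


def objects(graph, subject, predicate):
    """Return the objects of all statements with this subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


# The object of the first statement (Nate) is the subject of the second,
# so the statements chain into a directed graph that can be walked.
first_hop = objects(graph, PETER, HAS_COLLEAGUE)
second_hop = [o for person in first_hop for o in objects(graph, person, HAS_COLLEAGUE)]
print(first_hop, second_hop)
```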
RDF Landscape
• RDF – basic resource descriptions
• RDFS – express resources as classes, with properties and class relationships
• OWL (Web Ontology Language) – exact description and relationships
• SKOS (Simple KOS) – simple description and relationships
• SPARQL – RDF query
[Diagram labels: Expressivity; Basic Building Blocks]
Intro to SKOS
• SKOS: Simple Knowledge Organization System. A KOS provides semantic context.
• Built on RDF and RDFS
• Designed to bridge the current chaotic, poorly described web and the full semantic web (OWL).
• See SKOS primer at http://www.w3.org/TR/skos-primer
• Limited vocabulary, e.g.:
  • skos:ConceptScheme
  • skos:Concept
  • skos:prefLabel
  • skos:scopeNote
• And some limited standard relationships
  • skos:exactMatch
  • skos:narrower, skos:broader
• Allows for limited inference
• Because of its limited vocabulary, really useful for thesauri, classification lists, taxonomies etc.
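To make "limited inference" a little more concrete, here is a small sketch in plain Python. It relies on two facts from the SKOS specification: skos:narrower is the inverse of skos:broader, and broader links can be followed upward (the idea behind skos:broaderTransitive). The concept identifiers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical concepts; each pair reads (concept, skos:broader, broader concept).
BROADER = [
    ("ex:Chromium", "ex:Metal"),
    ("ex:Metal", "ex:Inorganic"),
    ("ex:Phosphorus", "ex:Nutrient"),
]


def narrower_index(broader_pairs):
    """Derive skos:narrower statements as the inverse of skos:broader."""
    index = defaultdict(set)
    for child, parent in broader_pairs:
        index[parent].add(child)
    return dict(index)


def broader_transitive(concept, broader_pairs):
    """Collect every concept reachable by following broader links upward."""
    parents = defaultdict(set)
    for child, parent in broader_pairs:
        parents[child].add(parent)
    seen, stack = set(), [concept]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(narrower_index(BROADER))
print(broader_transitive("ex:Chromium", BROADER))
```

Only the broader links are stated; the narrower links and the ancestor set are inferred.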
SPARQL Queries
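Purpose: query an RDF triple store; works by matching triples to patterns.
Example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept
WHERE { ?concept rdf:type skos:Concept }
Returns all resources that are of rdf:type skos:Concept.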
Other Queries
CONSTRUCT – returns an RDF graph
ASK – returns a boolean indicating whether the pattern is matched
DESCRIBE – returns a graph describing a resource
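The pattern-matching idea behind SELECT and ASK can be sketched without a triple store. The snippet below is a toy matcher in plain Python, not a SPARQL engine; the `ex:` identifiers and the label are made up, and `None` plays the role of a query variable.

```python
RDF_TYPE = "rdf:type"

# Hypothetical triples standing in for the contents of a triple store.
triples = [
    ("ex:code1", RDF_TYPE, "skos:Concept"),
    ("ex:code1", "skos:prefLabel", "Example label"),
    ("ex:code2", RDF_TYPE, "skos:Concept"),
    ("ex:ParameterCodeList", RDF_TYPE, "skos:ConceptScheme"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples that fit the pattern; None matches anything."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def select_concepts(triples):
    """Rough analogue of: SELECT ?concept WHERE { ?concept rdf:type skos:Concept }"""
    return [s for s, _, _ in match(triples, p=RDF_TYPE, o="skos:Concept")]


def ask(triples, s=None, p=None, o=None):
    """Rough analogue of ASK: True if at least one triple fits the pattern."""
    return any(True for _ in match(triples, s, p, o))


print(select_concepts(triples))
print(ask(triples, o="skos:ConceptScheme"))
```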
Intro to Linked Data API
• Familiar with RESTful services, right?
• LD API is designed as a bridge between the complexity of SPARQL endpoints and a standard REST API
• Provides standard URI matching patterns and additional specification for behaviors, e.g.:
  • /doc/school/12345 should respond with a document that includes information about /id/school/12345
  • /doc/school should respond with a document of schools
  • /doc/school/12345.json should return a JSON document
Intro to linked data API
URI pattern
SPARQL for Result Set
SPARQL for view on Result Set
Response in RDF, Turtle, etc.
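The flow above (URI pattern, then SPARQL, then a formatted response) can be sketched roughly as follows. This is a simplified illustration, not the Linked Data API specification: it produces a single query per request rather than separate selector and viewer queries, and the base URI, the `/def/` class URI and the default format are all assumptions.

```python
import re

BASE = "http://example.org"  # assumed base URI for the dataset

# Hypothetical endpoint patterns modelled on the /doc/school examples.
ITEM_PATTERN = re.compile(r"^/doc/(?P<kind>[a-z]+)/(?P<id>[^/.]+)(?:\.(?P<ext>[a-z]+))?$")
LIST_PATTERN = re.compile(r"^/doc/(?P<kind>[a-z]+)(?:\.(?P<ext>[a-z]+))?$")


def route(path):
    """Map a request path to a (SPARQL query string, response format) pair."""
    m = ITEM_PATTERN.match(path)
    if m:
        # An item document describes the matching /id/ resource.
        uri = f"{BASE}/id/{m['kind']}/{m['id']}"
        return f"DESCRIBE <{uri}>", m["ext"] or "rdf"
    m = LIST_PATTERN.match(path)
    if m:
        # A list document selects every resource of the (assumed) class URI.
        query = f"SELECT ?item WHERE {{ ?item a <{BASE}/def/{m['kind']}> }}"
        return query, m["ext"] or "rdf"
    raise ValueError(f"no endpoint matches {path!r}")


for path in ("/doc/school/12345", "/doc/school/12345.json", "/doc/school"):
    print(path, "->", route(path))
```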
Vocabulary Service
• In the semantic web, a vocabulary is defined as a set of URIs
• Functionally we want:
  • Ability to look up definitions of terms and/or code lists – skos:Concept, skos:definition, skos:prefLabel
  • Ability to resolve synonyms – skos:exactMatch, skos:broader, skos:narrower
  • Ability to deal with different languages – skos:prefLabel lang=en
  • Standard API – Linked Data API and REST
  • Standard information/data model – SKOS
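A minimal sketch of the lookup behaviour listed above, using an in-memory dictionary in place of a real service. All URIs, labels and definitions here are invented for illustration, and the fallback to English is an assumption rather than anything SKOS prescribes.

```python
# Hypothetical vocabulary entries keyed by concept URI.
LEVEL = "http://example.org/vocab/level"
STAGE = "http://example.org/other/stage"

VOCAB = {
    LEVEL: {
        "prefLabel": {"en": "Water level", "fr": "Niveau d'eau"},
        "definition": "Example definition of water level.",
        "exactMatch": [STAGE],
    },
    STAGE: {
        "prefLabel": {"en": "Stage"},
        "definition": "Example definition of stage.",
        "exactMatch": [],
    },
}


def pref_label(uri, lang="en"):
    """Return the preferred label in the requested language, else English."""
    labels = VOCAB[uri]["prefLabel"]
    return labels.get(lang, labels["en"])


def definition(uri):
    """Look up the definition attached to a concept URI."""
    return VOCAB[uri]["definition"]


def synonyms(uri):
    """Return URIs linked to this one by exactMatch, in either direction."""
    found = set(VOCAB[uri]["exactMatch"])
    found |= {other for other, entry in VOCAB.items() if uri in entry["exactMatch"]}
    return found


print(pref_label(LEVEL, "fr"), "|", definition(LEVEL), "|", synonyms(STAGE))
```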
Simon Cox vocab proposal
Proposal by S. Cox https://www.seegrid.csiro.au/wiki/Siss/VocabularyService3
Vocab development process
1. Select code list or vocab for service
2. Map code list to SKOS
3. Check code list for web compatibility and harmonise with other code lists or vocabs
   a. e.g. use a standard units vocab
   b. remove any non-conforming content
4. Convert code list to SKOS RDF
5. Validate RDF using W3C RDF validator
6. Import to Triple Store
7. Publish Service
8. Use: Link to in documents!
Case study: USGS Parameter Code List – Proof of Concept
• Code List is a CSV table of parameter codes.
Mapping code list to SKOS
Parameter Code          SKOS Mapping
Parameter Code List     skos:ConceptScheme
Parameter Code          skos:Concept
Group Name              skos:broader
Parameter Name          skos:prefLabel
cas Name                skos:exactMatch
srs Name                skos:broadMatch
Units                   usgs:hasUnits (needs an additional relationship)
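The mapping table can be applied row by row. The sketch below reads a tiny CSV extract and emits one set of triples per parameter code. The column names, the sample rows (including the CAS-style number) and the scheme URI are all hypothetical; the real USGS file uses its own headers and values.

```python
import csv
import io

SCHEME = "http://example.org/vocab/ParameterCodeList"  # assumed scheme URI

# Hypothetical two-row extract; headers and values are made up for illustration.
SAMPLE = """code,group,name,casrn,srsname,units
P0001,Physical,Example parameter one,,,deg C
P0002,Inorganics,Example parameter two,1234-56-7,Example SRS,ug/L
"""


def row_to_triples(row):
    """Apply the mapping table to one code list row."""
    concept = f"{SCHEME}/{row['code']}"
    triples = [
        (concept, "rdf:type", "skos:Concept"),
        (concept, "skos:inScheme", SCHEME),
        (concept, "skos:prefLabel", row["name"]),
        (concept, "skos:broader", row["group"]),
        (concept, "usgs:hasUnits", row["units"]),
    ]
    # Cross-references are optional, so only emit them for non-empty cells.
    if row["casrn"]:
        triples.append((concept, "skos:exactMatch", row["casrn"]))
    if row["srsname"]:
        triples.append((concept, "skos:broadMatch", row["srsname"]))
    return triples


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
all_triples = [t for row in rows for t in row_to_triples(row)]
for triple in all_triples:
    print(triple)
```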
Content conformance-harmonization
• Issues
  • need skos:definition – Parameter Name?
  • Invalid characters for web in Parameter Name, e.g. &, <, > – escape as &amp;, &lt;, &gt;
  • Units – non-standard representation, e.g. Mi2 (square miles), mgd (million gallons per day), %, nu (number of bad characters TX by DCP)
    • Fix up
    • Leave as a literal?
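One way to handle the character issues, sketched with the Python standard library: escape `&`, `<` and `>` before writing a name into RDF/XML, and percent-encode unit strings before placing them in a URI. The sample parameter name is hypothetical, and whether units should become URIs at all (rather than stay as literals) is the open question raised above.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Base URI as it appears in the conversion template; treat it as illustrative.
UNITS_BASE = "http://usgs.gov/vocabularies/units#"


def pref_label_element(name):
    """Render a skos:prefLabel element with &, < and > escaped for XML."""
    return f"<skos:prefLabel>{escape(name)}</skos:prefLabel>"


def unit_uri(unit):
    """Percent-encode a unit string so it is safe inside a URI fragment."""
    return UNITS_BASE + quote(unit, safe="")


print(pref_label_element("Oil & grease, <5 um"))  # hypothetical parameter name
print(unit_uri("%"))
print(unit_uri("ug/L"))
```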
Comments on code list
• Conflation of information
  • Chromium(VI), water, unfiltered, recoverable, micrograms per liter
    • Observable phenomena – Chromium(VI)
    • Procedure – unfiltered/recoverable
    • Media – water
    • Units – ug/L
  • Phosphorus, suspended sediment, total digestion, dry weight, percent
    • Observable phenomena – Phosphorus
    • Procedure – total digestion (but not linked to standard method)
    • Media – suspended sediment
    • Units – dry weight percent
• Some meaningless codes – Precipitation, cumulative at given time, location 6, inches
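A naive attempt at separating the conflated parts shows why this needs domain knowledge. The sketch below splits a parameter name on commas and assumes a fixed order (observable, medium, procedure terms, units). That ordering is an assumption made for this example, not a documented rule of the code list.

```python
def split_parameter_name(name):
    """Split a conflated parameter name on commas.

    Assumed layout (heuristic only): first part is the observable, second
    the medium, the last the units, and anything in between the procedure.
    """
    parts = [part.strip() for part in name.split(",")]
    if len(parts) < 3:
        raise ValueError("expected at least observable, medium and units")
    return {
        "observable": parts[0],
        "medium": parts[1],
        "procedure": parts[2:-1],
        "units": parts[-1],
    }


chromium = split_parameter_name(
    "Chromium(VI), water, unfiltered, recoverable, micrograms per liter"
)
phosphorus = split_parameter_name(
    "Phosphorus, suspended sediment, total digestion, dry weight, percent"
)
print(chromium)
print(phosphorus)
```

The heuristic handles the chromium entry, but for phosphorus it files "dry weight" under procedure and reports the units as just "percent", whereas the breakdown above reads the units as dry weight percent. Entries like this cannot be split reliably by position alone.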
Duplication
• Turbidity, water, unfiltered, broad band light source (400-680 nm), detection angle 90 +/- 30 degrees to incident light, nephelometric turbidity units (NTU)
• Turbidity, water, unfiltered, laboratory, Hach 2100AN, nephelometric turbidity units
• USEPA method 180.1???
• Are they the same?
• Why not link to standard methods?
• Needs work by domain experts to resolve.
Conversion to SKOS – Excel2SKOS
• .NET utility to convert a table into SKOS using NVelocity
[Diagram components: Spreadsheet, Template, Office Interop, Formatter, Excel2Skos]
NVelocity Template – Mapping to SKOS
#foreach($row in $excelsheet)
  ## One skos:Concept per spreadsheet row, identified by its parameter code
  <skos:Concept rdf:about="$Globals.get_item("URI-Base")/$row.get($code)">
    <skos:inScheme rdf:resource="$Globals.get_item("URI-Base")"/>
    <skos:definition>Definition of parameter code $row.get($code)</skos:definition>
    <skos:prefLabel>$row.get($name)</skos:prefLabel>
    ## Optional cross-references: emitted only when the cell is non-empty
    #if($row.get($casrn) != "")
    <skos:exactMatch rdf:resource="http://casrn.namespace/$row.get($casrn)"/>
    #end
    #if($row.get($srsname) != "")
    <skos:broadMatch rdf:resource="http://srs.namespace/$row.get($srsname)"/>
    #end
    ## NOTE: the group name is passed as a literal here; a true skos:broader link needs a concept URI
    <skos:broader rdfs:literal="$row.get($groupname)"/>
    <usgs:preferredUnit rdf:resource="http://usgs.gov/vocabularies/units#$row.get($units)"/>
  </skos:Concept>
#end
• Classes passed in
  • Globals – ConceptScheme definitions
  • excelsheet – 2D table of values
Conversion to skos
Converting list to RDF
RDF Validation – http://www.w3.org/RDF/Validator/
Import to Triple store
Test Services
• Developed REST services using Microsoft WCF, dotNetRDF and NVelocity libraries
• Test API
  • /Vocab/ParameterCodeList – responds with a document of skos:ConceptScheme
  • /Vocab/ParameterCodeList/ParameterCode – responds with the first page of parameter codes (my implementation returns all!)
  • /Vocab/ParameterCodeList/ParameterCode/{ID}
Process reminder
[Diagram: USGS Code List → Harmonise and Map → Excel2RDF → RDF Validator (validate RDF) → Load → Sesame RDF triple store (SPARQL API) → dotNetRDF → WCF REST Test Services]
http://localhost:8080/VocabService/ParameterCodes/
Demo
Next Steps
• Try the AuScope tooling
  • the process is the same, uses Sesame RDF store
  • has different tooling for Excel to RDF
  • different service interface, LD not quite ready
• If we have time, we should set up a test service before I leave.
• Below is an example of what LD vocabs in WaterML2.0 might look like.
Conclusions
• The need for semantic context to assist with data integration is pressing.
• Vocabularies are foundation services and need to be put in place for data mediation.
• Technologies and approaches are now mature enough to use
  • RDF, SKOS, SPARQL, LD API
• There is tooling available through AuScope, but it needs assessment.
• USGS & CIDA have the opportunity to make a range of standard vocabularies available for the hydro community.
Pillars and foundations of Interoperability
System of Systems Interoperability
• Identity and Registration
• Service Standards
• Application Schema
• Network Standards
• Community Profiles
• Feature Catalogs
• Agreed Vocabularies and Ontologies
• Semantic Brokering
Final word
I don’t look it, but I’m so happy, I know what is at the end of the xlink!
Thank you
Peter Fitch
Program Leader, Environmental Information Systems
Phone: +61 2 6246 5763
Email: [email protected]
Web: www.csiro.au/clw/eis
Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au
Lessons of Climate Gate
• Theft of e-mails from UEA, Nov 2009
• E-mails indicated manipulation of data, and suppression of raw data
• Investigations found
  • methods disorganised
  • bunker mentality
  • lack of transparency
• Researchers promised to
  • improve scientific data management
  • open access to data
  • improve transparency
climatic research unit, University of East Anglia
From RDF Primer W3C
<rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
  <ex:editor>
    <rdf:Description>
      <ex:homePage>
        <rdf:Description rdf:about="http://purl.org/net/dajobe/">
        </rdf:Description>
      </ex:homePage>
    </rdf:Description>
  </ex:editor>
</rdf:Description>
SWT in other domains
Eco-Informatics
Bio-Informatics
HWB – WIRADA Symposium August 2011
Terminology
• Activity – single process block which can perform a useful task and which can be linked to another process block
• Workflow – a linked set of process blocks