Converting WHO’s Global Health Observatory Data to RDF
Amrapali Zaveri, PhD student
August 27, 2012
1
Outline
• Background
• What is the RDF Data Cube Vocabulary?
• Semi-automated approach
• OntoWiki's CSVImport plug-in
• RDFized GHO data
• Limitations and Future Work
2
Background
• Biomedical statistical data
  • Published as Excel sheets
  • Advantage
    • Readable by humans
  • Disadvantages
    • Cannot be queried efficiently
    • Difficult to integrate with other data (in different formats)
• Our approach
  • Converting data into a single data model: RDF
  • Using the RDF Data Cube Vocabulary*
    • Designed particularly to represent multidimensional statistical data using RDF
* http://www.w3.org/TR/vocab-data-cube/
3
What is the RDF Data Cube Vocabulary?
• Dimensions
• Attributes
• Measures
• Observations
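As a rough illustration of how these four terms relate: dimensions identify an observation, the measure is the observed value, and attributes qualify how the value should be read. The sketch below is only an assumption-laden toy model. The class name, field names, and the unit string are invented for this example; only the value 127 for tuberculosis in Afghanistan comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One cell of a statistical table, described in Data Cube terms."""
    dimensions: tuple       # (name, value) pairs that identify the observation
    measure: int            # the observed value itself
    attributes: tuple = ()  # (name, value) pairs that qualify the value


obs = Observation(
    dimensions=(("country", "Afghanistan"), ("disease", "Tuberculosis")),
    measure=127,
    attributes=(("unit", "hypothetical unit"),),  # assumption: no unit is given in the slides
)

# A cube can be pictured as observations keyed by their dimension values.
cube = {obs.dimensions: obs}
key = (("country", "Afghanistan"), ("disease", "Tuberculosis"))
print(cube[key].measure)
```

Fixing every dimension to one value selects exactly one observation, which is what makes the data "multidimensional" rather than a flat list of numbers.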
5
Semi-automated approach
• Transforming CSV to RDF in a fully automated way is not feasible.
  • Dimensions are often encoded in the heading or label of a sheet
• Our semi-automatic approach:
  • As a plug-in in OntoWiki#
    • a semantic collaboration platform developed by the AKSW research group.
  • A CSV file is converted into RDF using the RDF Data Cube Vocabulary: http://aksw.org/Projects/Stats2RDF
# Sören Auer, Sebastian Tramp (né Dietzold), Jens Lehmann, and Thomas Riechert: OntoWiki: A Tool for Social Semantic Collaboration. In: Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge (CKC 2007) at the 16th International World Wide Web Conference (WWW2007), Banff, Canada, May 8, 2007.
6
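The idea of a semi-automated conversion can be sketched roughly as follows. This is not the OntoWiki plug-in's code: the function name, its parameters, the cell-naming scheme, and every CSV value except Afghanistan/Tuberculosis = 127 are assumptions made for illustration. The manual part is represented by `header_row` and `label_col`, which a person supplies to say where the dimension labels sit.

```python
import csv
import io

# Hypothetical input: diseases label the rows, countries label the columns.
# Only the Afghanistan/Tuberculosis value 127 appears in the slides;
# the other rows, columns, and numbers are invented placeholders.
CSV_TEXT = """Disease,Afghanistan,Albania
Tuberculosis,127,3
Malaria,10,0
"""


def csv_to_observations(text, header_row=0, label_col=0):
    """Turn a cross-tab CSV into Turtle-like observation statements.

    header_row and label_col stand in for the manual step: a person
    tells the converter where the dimension labels are located.
    """
    rows = list(csv.reader(io.StringIO(text)))
    column_labels = rows[header_row]
    statements = []
    for r, row in enumerate(rows):
        if r == header_row:
            continue  # the header row holds labels, not data
        for c, cell in enumerate(row):
            if c == label_col:
                continue  # the label column holds labels, not data
            statements.append(
                f'gho:c{c}-r{r} rdf:type qb:Observation ; '
                f'rdf:value "{cell}"^^xsd:integer ; '
                f'qb:dimension gho:{column_labels[c]} ; '
                f'qb:dimension gho:{row[label_col]} .'
            )
    return statements


for statement in csv_to_observations(CSV_TEXT):
    print(statement)
```

Each data cell becomes one observation linked to the label of its column and the label of its row. A fully automatic converter would have to guess `header_row` and `label_col`, which is where the approach relies on a human.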
1. Create Knowledge Base
7
2. Import a CSV file
8
3. Define dimensions
9
4. Define data range
10
5. Save template, extract triples
11
6. Re-use template for similar files
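Saving and re-using a template could look roughly like the sketch below. The JSON layout and all field names are hypothetical and do not reflect the format OntoWiki actually stores. The point is only that the dimension positions and data range chosen once can be applied to any other file with the same layout.

```python
import json

# Hypothetical template: field names are illustrative, not OntoWiki's format.
template = {
    "dimensions": {
        "Country": {"row": 0, "first_col": 1, "last_col": 2},
        "Disease": {"col": 0, "first_row": 1, "last_row": 2},
    },
    "data_range": {"first_row": 1, "first_col": 1, "last_row": 2, "last_col": 2},
}


def save_template(template):
    """Serialise the template so it can be stored and applied again later."""
    return json.dumps(template, indent=2, sort_keys=True)


def cells_in_range(template_json):
    """List the (row, col) cells a stored template selects in a sheet."""
    data_range = json.loads(template_json)["data_range"]
    return [
        (r, c)
        for r in range(data_range["first_row"], data_range["last_row"] + 1)
        for c in range(data_range["first_col"], data_range["last_col"] + 1)
    ]


stored = save_template(template)
print(cells_in_range(stored))
```

Because the template records positions rather than values, the same stored string can be applied to any similar file without repeating the manual selection steps.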
12
7. View resources
13
RDFized GHO data
gho:Country rdfs:subClassOf qb:DimensionProperty ;
    rdf:type rdfs:Class ;
    rdfs:label "Country" .
gho:Disease rdfs:subClassOf qb:DimensionProperty ;
    rdf:type rdfs:Class ;
    rdfs:label "Disease" .
gho:Afghanistan rdf:type gho:Country ;
    rdfs:label "Afghanistan" .
gho:Tuberculosis rdf:type gho:Disease ;
    rdfs:label "Tuberculosis" .
gho:c1-r6 rdf:type qb:Observation ;
    rdf:value "127"^^xsd:integer ;
    qb:dimension gho:Afghanistan ;
    qb:dimension gho:Tuberculosis .
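To show why this representation is easier to query than a spreadsheet, here is a minimal sketch that holds the observation statements above as plain Python tuples and looks up a value by its dimension values. The function name is invented for this example, and a real deployment would presumably use a triple store and SPARQL rather than a list scan.

```python
# The observation statements from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    ("gho:c1-r6", "rdf:type", "qb:Observation"),
    ("gho:c1-r6", "rdf:value", 127),
    ("gho:c1-r6", "qb:dimension", "gho:Afghanistan"),
    ("gho:c1-r6", "qb:dimension", "gho:Tuberculosis"),
]


def observed_values(triples, *dimension_values):
    """Return the values of observations linked to all given dimension values."""
    wanted = set(dimension_values)
    linked = {}
    # First pass: collect the dimension values attached to each subject.
    for s, p, o in triples:
        if p == "qb:dimension":
            linked.setdefault(s, set()).add(o)
    # Second pass: keep the values whose subject carries every wanted dimension.
    return [
        o for s, p, o in triples
        if p == "rdf:value" and wanted <= linked.get(s, set())
    ]


print(observed_values(TRIPLES, "gho:Afghanistan", "gho:Tuberculosis"))
```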
14
RDFized GHO data
• Available at http://gho.aksw.org
• 50 datasets
• ~ 8 million triples
• Paper published at SWJ Call for Dataset descriptions: http://www.semantic-web-journal.net/content/publishing-and-interlinking-global-health-observatory-dataset
15
Limitations and Future Work
• Conversion
• Coherence
• Temporal Comparability
• Exploring GHO
16
Thank You!
Questions?
17
http://aksw.org/AmrapaliZaveri