A Primer on Converting Analysis Results Data to RDF Data Cubes using Free and Open Source Tools
Tim Williams, Principal Statistical Solutions Analyst, Global Statistical Sciences, UCB BioSciences, Inc.
PhUSE 2014
TT03
The Semantic Web (circa 2011)
"I want to take the clinical trials results..."
"...and put them in an RDF Data Cube!"

                Placebo      LowDose      HighDose
Baseline        N=28         N=30         N=29
---------------------------------------------------
Sex   F         12 (42.9)    14 (46.7)    16 (55.2)
      M         16 (57.1)    16 (53.3)    13 (44.8)

ds:obs1 a qb:Observation ;
    prop:treatment "Plc" ;
    prop:sex "F" ;
    prop:statistic "count" ;
    prop:result "12"^^xsd:double ;
    qb:dataSet ds:dataset-demog .
ds:obs2 a qb:Observation ; ...
"op": "rdf-extension/save-rdf-schema",
"description": "Save RDF schema skeleton",
"schema": {
  "baseUri": "http://www.example.org/",
  "prefixes": [
    { "name": "dccs", "uri": "http://www.example.org/dc/demog/dccs/" },
    { "name": "rdfs", "uri": "http://www.w3.org/2000/01/rdf-schema#" },
    { "name": "prov", "uri": "http://www.w3.org/ns/prov#" },
    ........
JSON
ts:i7832 ts:firstName "Homer" ;
    ts:lastName "Simpson" ;
    ts:hasSpouse ts:i5628 .
ts:i5628 ts:firstName "Marge" ;
    ts:lastName "Simpson" .
Turtle
Tribble
Homer Simpson hasSpouse Marge Simpson
Triple
Turtle
Jason
How to start?
In Scope
• Simplified RDF Data Cube
• Two creation methods (overview)
Out of Scope
• Introduction to Semantic Web, RDF, ...
  • PhUSE Wiki "PhUSE Semantic Technology Curriculum"
• Detailed tutorial
PhUSE Wiki: Companion Documents
What is an RDF Data Cube?
3 Main Components in the Cube Model
• Attributes
  • metadata
  • status=final, issued="2014-08-06T00:00:00"^^xsd:dateTime
• Measure (or Primary Measure)
  • the observed value of primary interest
  • count=12
• Dimensions
  • value keys or indices that identify the measure
  • treatment="Plc", sex="F", statistic="count"
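The three components above can be pictured with a small in-memory sketch. This is only an illustration of the model, not an RDF implementation: the names `DIMENSIONS`, `ATTRIBUTES`, `cube`, and `lookup` are hypothetical, and the values are taken from the demographics table in this talk.

```python
# Dimensions: the keys that together identify exactly one measure.
DIMENSIONS = ("treatment", "sex", "statistic")

# Attributes: metadata about the observations (values from the slide).
ATTRIBUTES = {"status": "final", "issued": "2014-08-06T00:00:00"}

# Measure: the observed value, stored under one value per dimension.
cube = {
    ("Plc", "F", "count"): 12,
    ("Plc", "F", "percentage"): 42.9,
    ("Plc", "M", "count"): 16,
    ("Plc", "M", "percentage"): 57.1,
}


def lookup(cube, **dims):
    """Return the measure identified by one value for every dimension."""
    key = tuple(dims[name] for name in DIMENSIONS)
    return cube[key]


print(lookup(cube, treatment="Plc", sex="F", statistic="count"))
```

A plain dictionary is used here because each combination of dimension values should map to a single measure, which is the property the cube model relies on.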
[Figure: the demographics table drawn as a cube with three dimensions: Treatment (Plc, LowD, HighD), Sex (F, M), and Statistic (count, percentage)]

Baseline          Placebo      LowDose      HighDose
Characteristic    N=28         N=30         N=29
-----------------------------------------------------
Sex   F           12 (42.9)    14 (46.7)    16 (55.2)
      M           16 (57.1)    16 (53.3)    13 (44.8)

Statistic
• count
• percentage
[Figure: locating one cell of the cube from its dimension values]
Dimensions: treatment="Plc", sex="F", statistic="count"
Measure: count=12
It's a hit!!
[Figure: comparison table with one column each for Publisci and OpenRefine]
Publisci Method
Map table

Baseline          Placebo      LowDose      HighDose
Characteristic    N=28         N=30         N=29
-----------------------------------------------------
Sex   F           12 (42.9)    14 (46.7)    16 (55.2)
      M           16 (57.1)    16 (53.3)    13 (44.8)

Statistic
• count
• percentage

CSV

Treatment,Sex,Statistic,Result
Plc,F,count,12
Plc,F,percentage,42.9
Plc,M,count,16
Plc,M,percentage,57.1
LowD,F,count,14
LowD,F,percentage,46.7
LowD,M,count,16
LowD,M,percentage,53.3
etc.

Ruby Script
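The "map table" step above, turning the wide summary table into one CSV row per result, could be sketched as follows. This is a hedged illustration and not part of Publisci: the `WIDE` dictionary, the `CELL` pattern, and the function names are assumptions, and the cell format "count (percentage)" is taken from the table in this talk.

```python
import csv
import io
import re

# Wide table from the slides: {sex: {treatment: "count (percentage)"}}
WIDE = {
    "F": {"Plc": "12 (42.9)", "LowD": "14 (46.7)", "HighD": "16 (55.2)"},
    "M": {"Plc": "16 (57.1)", "LowD": "16 (53.3)", "HighD": "13 (44.8)"},
}

# Matches cells such as "12 (42.9)" and captures the count and the percentage.
CELL = re.compile(r"^\s*(\d+)\s*\(\s*([\d.]+)\s*\)\s*$")


def wide_to_long(wide):
    """Yield (Treatment, Sex, Statistic, Result) rows, two per table cell."""
    treatments = list(next(iter(wide.values())))
    for treatment in treatments:
        for sex, cells in wide.items():
            match = CELL.match(cells[treatment])
            if match is None:
                raise ValueError(f"unexpected cell format: {cells[treatment]!r}")
            count, percentage = match.groups()
            yield (treatment, sex, "count", count)
            yield (treatment, sex, "percentage", percentage)


def to_csv(rows):
    """Serialize the long-format rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Treatment", "Sex", "Statistic", "Result"])
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(wide_to_long(WIDE)))
```

Each table cell becomes two rows because the Statistic dimension distinguishes the count from the percentage.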
require 'publisci'
include PubliSci::DSL

# Describe the source CSV and the role of each column
data do
  source 'demog3DimSource.csv'
  dimension 'Treatment', 'Sex', 'Statistic'
  measure 'Result'
  option :base_url, 'http://example.org'
  option 'base', 'http://example.org/'
  option 'label_column', 'Statistic'
end

# Dataset-level metadata
metadata do
  dataset 'Demographics Analysis Results'
  title 'Demographics'
  creator 'Your-Name-Here'
  description 'Table example for Demographics and Baseline Characteristics'
  date '2014-07-07T00:00:00'
end

# Write the generated cube to a Turtle file
open('demog3Dim_p.ttl', 'w') { |file| file.write generate_n3 }

...
ns:obs1 a qb:Observation ;
    qb:dataSet ns:dataset-demog3DimSource ;
    rdfs:label "1" ;
    prop:Treatment <code/treatment/Plc> ;
    prop:Sex <code/sex/F> ;
    prop:Statistic <code/statistic/count> ;
    prop:Result 12 ;
    ... .
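To make the shape of the generated observations concrete, here is a rough sketch that writes similar Turtle by plain string formatting. It is not how Publisci works internally, and it omits the code lists and structure definition a full cube needs. The `ns:` and `prop:` namespace URIs are assumptions for illustration; `qb:` and `rdfs:` are the standard vocabulary URIs.

```python
# Namespace URIs for ns: and prop: are assumed; qb: and rdfs: are standard.
PREFIXES = """\
@base <http://example.org/> .
@prefix ns: <http://example.org/ns/> .
@prefix prop: <http://example.org/properties/> .
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
"""


def observation(index, row, dataset="dataset-demog3DimSource"):
    """Format one (Treatment, Sex, Statistic, Result) row as a qb:Observation.

    Values are assumed to be simple tokens that are safe inside an IRI.
    """
    treatment, sex, statistic, result = row
    return "\n".join([
        f"ns:obs{index} a qb:Observation ;",
        f"    qb:dataSet ns:{dataset} ;",
        f'    rdfs:label "{index}" ;',
        f"    prop:Treatment <code/treatment/{treatment}> ;",
        f"    prop:Sex <code/sex/{sex}> ;",
        f"    prop:Statistic <code/statistic/{statistic}> ;",
        f"    prop:Result {result} .",
    ])


def to_turtle(rows):
    """Join the prefix header and one observation per row."""
    body = "\n\n".join(
        observation(i, row) for i, row in enumerate(rows, start=1)
    )
    return PREFIXES + "\n" + body + "\n"


rows = [("Plc", "F", "count", "12"), ("Plc", "F", "percentage", "42.9")]
print(to_turtle(rows))
```

Dimension values are written as relative IRIs, as in the output above, so they resolve against the `@base` line rather than being stored as plain strings.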
Publisci Method
Advantages
• Simple, quick, easy
• Minimal cube knowledge
• Automatic code list generation
Disadvantages
• Limited support*
• Harder to extend unless you are a Ruby and Cube expert
• Not as flexible as OpenRefine
RDF Data Cube
OpenRefine Method
• Map table: CSV/XLS
• Import: Create Project
• Construct: Cube Skeleton Components
  • Attributes
  • Dimensions
  • Measure
• Attach: Values
• Export
OpenRefine
Save & Re-use JSON from OpenRefine
OpenRefine method
Advantages
• Flexibility in cube design
• Incremental development
• Cube components available within interface
• Data reconciliation
Disadvantages
• Greater cube knowledge required
• Steep learning curve
• Labour-intensive, manual steps
• Measures in the same cube all receive the same data type. Example: count and percentage are both stored as xsd:double.
Query the data with SPARQL
SPARQL Protocol and RDF Query Language

"Where did I go wrong with this child?"
"I blame The Internets, honey."

PREFIX prop: <http://www.example.org/dc/demog/prop/>
SELECT ?value
WHERE {
  ?obs prop:treatment "Plc" ;
       prop:sex "F" ;
       prop:statistic "count" ;
       prop:result ?value .
}
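What the SPARQL query above asks for can be mimicked in a few lines of plain Python: find every subject whose triples match all three dimension constraints, then return its result. This is only a sketch of the matching idea, not a SPARQL engine. The function name `select_values` is hypothetical, and the contents of `ds:obs2` are assumed from the demographics table, since the slides elide that observation.

```python
# Triples as (subject, predicate, object); ds:obs1 follows the earlier slide.
triples = [
    ("ds:obs1", "prop:treatment", "Plc"),
    ("ds:obs1", "prop:sex", "F"),
    ("ds:obs1", "prop:statistic", "count"),
    ("ds:obs1", "prop:result", 12.0),
    # Assumed content for a second observation (not shown in the slides).
    ("ds:obs2", "prop:treatment", "Plc"),
    ("ds:obs2", "prop:sex", "F"),
    ("ds:obs2", "prop:statistic", "percentage"),
    ("ds:obs2", "prop:result", 42.9),
]


def select_values(triples, constraints, target="prop:result"):
    """Return the target value of each subject matching every constraint.

    Assumes each subject has at most one value per predicate.
    """
    by_subject = {}
    for subject, predicate, obj in triples:
        by_subject.setdefault(subject, {})[predicate] = obj
    return [
        props[target]
        for props in by_subject.values()
        if target in props
        and all(props.get(p) == o for p, o in constraints.items())
    ]


print(select_values(
    triples,
    {"prop:treatment": "Plc", "prop:sex": "F", "prop:statistic": "count"},
))
```

Only the first observation satisfies all three constraints, which mirrors how the query pins down a single cell of the cube by its dimensions.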
Cube Construction: an Evolution
Publisci
• "My first cube!"
• Codelists
OpenRefine
• Structure and Skeletons
• Customization
• Data reconciliation
rrdf, rrdflibs
• Production solution
Data Transparency?
• Metadata
  • embedded with the data
• Standardization
  • data reconciliation with online vocabularies & thesauri
  • translation between different coding systems and data models
• Merge data
  • similar and dissimilar sources
• Machine readable
  • Reasoning, logic, intelligent search
Semantic Interoperability: "The ability for computer systems to exchange data with unambiguous, shared meaning". - Wikipedia.
Thank you!
Contact:
Tim Williams, UCB BioSciences, Inc., Raleigh, NC, USA
[email protected]
www.linkedin.com/in/timpwilliams/
Acknowledgements
• Will Strinz - Publisci author
• OpenRefine team
• Ian Fleming, Marc Andersen - PhUSE WG leads
• PhUSE WG team members
• Open Source Movement
• The Internets
Copyright & Source Attributions
All images are copyrights of their creators and respective owners.
• Paramount Pictures
• Hasbro Inc.
• Nickelodeon
• Ron Leishman. Image 440722, IllustrationsOf.com
• LOD Cloud Diagram as of September 2011. CC BY-SA 3.0. Anja Jentzsch, own work
• http://dgallery.s3.amazonaws.com/sparql-protocol.png
• Davidson University Dept. of Biology, Herpetology Lab Research. http://www.bio.davidson.edu/people/midorcas/research/stresearch/tercar.jpg
• My life with Fly Ball dogs. http://mylifewithflyballdogs.com http://farm6.staticflickr.com/5341/7186194778_3c9d6b56be.jpg
• Daily Tombstone Photo. Mausoleum of Charles "Lucky" Luciano, St. John's Cemetery, Middle Village, New York. http://dailytombstonephoto.blogspot.com/2010/05/mausoleum-of-charles-lucky-luciano-st.html
Image modifications by TW, Aug 2014