Data-as-a-Service: DataGraft
“Data is the new oil”… but many of us just need gasoline
Data-as-a-Service… is the new filling station
Data-as-a-Service
• Outsourcing of various data operations to the cloud
• Eliminates
– upfront costs on data infrastructure
– ongoing investment of time and resources in managing the data infrastructure
• Complete package for
– transformation of raw data into meaningful data assets
– reliable delivery of data assets
Example #1: Using open data – petroleum activities on the Norwegian continental shelf
• ~70 tabular datasets
• Difficult to query across tables or integrate with other data, e.g. the Business Registry
• Simplified integration with external datasets
• Distribution of integrated dataset
• Live service
• Reliable access
• …
• Which companies have been owners in license X?
• What is the oil production for each field in year X?
• What is the total production of the top 10 companies by number of employees in year X?
• ....
Integration and querying service
Tabular data on the Web
Data Insights
factpages.npd.no data.brreg.no/oppslag/enhetsregisteret
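The questions above become straightforward once the production tables and the company registry share a key. The following is a minimal sketch of that idea in plain Python. The rows, the column names (`org_id`, `field`, `oil`, `employees`) and both function names are assumptions made up for illustration; they are not real figures or the actual schema of factpages.npd.no or the Business Registry.

```python
from collections import defaultdict

# Made-up sample rows for illustration only; not real production or registry data.
PRODUCTION = [
    {"org_id": "100", "field": "Alpha", "year": 2014, "oil": 10.0},
    {"org_id": "200", "field": "Alpha", "year": 2014, "oil": 5.0},
    {"org_id": "300", "field": "Beta", "year": 2014, "oil": 7.5},
    {"org_id": "100", "field": "Beta", "year": 2015, "oil": 4.0},
]
COMPANIES = [
    {"org_id": "100", "employees": 900},
    {"org_id": "200", "employees": 50},
    {"org_id": "300", "employees": 400},
]


def production_per_field(production, year):
    """Sum oil production per field for one year."""
    totals = defaultdict(float)
    for row in production:
        if row["year"] == year:
            totals[row["field"]] += row["oil"]
    return dict(totals)


def production_of_top_companies(production, companies, year, top_n):
    """Total production in `year` of the `top_n` companies by number of employees."""
    largest = sorted(companies, key=lambda c: c["employees"], reverse=True)[:top_n]
    top_ids = {c["org_id"] for c in largest}
    return sum(
        row["oil"]
        for row in production
        if row["year"] == year and row["org_id"] in top_ids
    )
```

With the sample rows, `production_per_field(PRODUCTION, 2014)` groups the three 2014 rows by field, and `production_of_top_companies(PRODUCTION, COMPANIES, 2014, 2)` only counts the two largest employers.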
Example #2: Reporting state-owned real estate properties in Norway
• Published as a 314-page hard copy and as a PDF file
• 6 person-months
• Data collection with spreadsheets
• Quality assurance through e-mails and phone correspondence
Pains
• Time consuming
• Poor data quality
• Static report without live updating
• Live service
• Efficient sharing of data
• Simplified integration with external datasets
• Live updating
• Reliable access
• …
• Risk and vulnerability analysis, e.g. buildings affected by flooding
• Analysis of leasing prices
Report Reporting Service 3rd party services
Sample data
Cleaning, Transformation, Publishing,
Integration, Querying, Visualization,
Service Access
Example #3: Personalized and Localized Urban Quality Index (PLUQI)
The index includes data from various domains:
• Daily life satisfaction: weather, transportation, community, …
• Healthcare level: number of doctors, hospitals, suicide statistics, …
• Safety and security: number of police stations, fire stations, crimes per capita, …
• Financial satisfaction: prices, incomes, housing, savings, debt, insurance, pension, …
• Level of opportunity: jobs, unemployment, education, re-education, …
• Environmental needs and efficiency: green space, air quality, …
Sample data
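The slides do not say how PLUQI combines its domains. As one possible reading of "personalized", the sketch below assumes each domain has already been normalized to a score between 0 and 1 and that a user supplies a weight per domain; the index is then a weighted average. The function name, the scores and the weights are all hypothetical.

```python
def weighted_index(domain_scores, weights):
    """Weighted average of per-domain scores (assumed to lie in [0, 1])."""
    total_weight = sum(weights[domain] for domain in domain_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(score * weights[domain] for domain, score in domain_scores.items())
    return weighted_sum / total_weight


# Hypothetical scores for one location and one user's preferences.
scores = {"Safety and security": 0.5, "Healthcare level": 1.0}
preferences = {"Safety and security": 1, "Healthcare level": 3}
index = weighted_index(scores, preferences)
```

A user who cares more about healthcare than safety gets an index that leans towards the healthcare score.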
DataGraft was developed to allow data workers to manage their data in a simple, effective, and efficient way
Powerful data transformation and reliable data access capabilities
Tabular Data
• Open Data is mostly tabular data
• Excel, CSV, TSV, etc.
• Records organized in silos of collections
• Very few links within and/or across collections
• Difficult to understand the nature of the data
• Difficult to integrate / query
Graph Data
Based on Linked Data
• Method for publishing data on the Web
• Self-describing data and relations
• Interlinking
• Accessed using semantic queries
• Open standards by W3C
– Data format: RDF
– Knowledge representation: RDFS/OWL
– Query language: SPARQL
http://www.w3.org/standards/semanticweb/data
europeandataportal.eu
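To make the step from tabular data to graph data concrete, here is a minimal sketch that turns CSV rows into RDF triples serialized as N-Triples, using only the Python standard library. The `http://example.org/` namespace, the sample CSV and the function names are assumptions for illustration; this is not how DataGraft itself maps columns, and every value is emitted as a plain string literal.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not a DataGraft URL


def escape_literal(value):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row, key_column):
    """One triple per non-key column: subject from the key, predicate from the column name."""
    subject = f"<{BASE}resource/{quote(row[key_column], safe='')}>"
    lines = []
    for column, value in row.items():
        if column == key_column:
            continue
        predicate = f"<{BASE}property/{quote(column, safe='')}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Made-up sample table.
CSV_TEXT = "id,name,population\n1,Exampleville,1000\n"
triples = [
    triple
    for row in csv.DictReader(io.StringIO(CSV_TEXT))
    for triple in row_to_ntriples(row, "id")
]
```

Each row becomes a resource with its own IRI, which is what makes interlinking with other datasets possible.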
Data Transformation and RDF Publication Process
• Interactive design of transformations?
• Repeatable transformations?
• Reuse/share transformations (user-based access)?
• Cloud-based deployment of transformations?
• Self-serviced process?
• Data and Transformation as-a-Service?
Semantic graph database
Tabular Data
Graph Data
DataGraft: Data-as-a-Service for the Data Transformation and RDF Publication Process
https://www.ssb.no/statistikkbanken
Example: Using statistical data
Data records (rows)
• Add row
• Take row(s)
• Drop row(s)
• Shift row
• Filter rows (grep)
• Remove duplicate rows
Entire dataset
• Sort
• Reshape dataset
• Group (categorize) and aggregate
Columns
• Add column(s)
• Take column(s)
• Drop column(s)
• Move column
• Merge columns
• Split column
• Rename column(s)
• Apply function to all values in a column
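A few of the operations listed above can be sketched as small composable steps, so that the same pipeline can be re-run on every new upload of a spreadsheet. The step names below mirror the list but are hypothetical helpers written for this example; they are not the Grafter or Grafterizer API, and the sample CSV is made up.

```python
import csv
import io


def remove_duplicate_rows():
    def step(rows):
        seen, unique = set(), []
        for row in rows:
            key = tuple(row.items())
            if key not in seen:
                seen.add(key)
                unique.append(row)
        return unique
    return step


def apply_to_column(name, fn):
    def step(rows):
        return [{**row, name: fn(row[name])} for row in rows]
    return step


def filter_rows(predicate):
    def step(rows):
        return [row for row in rows if predicate(row)]
    return step


def rename_column(old, new):
    def step(rows):
        return [{(new if k == old else k): v for k, v in row.items()} for row in rows]
    return step


def take_columns(names):
    def step(rows):
        return [{name: row[name] for name in names} for row in rows]
    return step


def run(pipeline, rows):
    """Apply each step in order; the pipeline itself is plain data and can be reused."""
    for step in pipeline:
        rows = step(rows)
    return rows


# Made-up sample input with a duplicate row and untidy values.
RAW = "Region,Year,Persons\n oslo ,2015,10\n oslo ,2015,10\nbergen,2014,5\n"
PIPELINE = [
    remove_duplicate_rows(),
    apply_to_column("Region", lambda value: value.strip().title()),
    filter_rows(lambda row: row["Year"] == "2015"),
    rename_column("Persons", "employed"),
    take_columns(["Region", "employed"]),
]
result = run(PIPELINE, list(csv.DictReader(io.StringIO(RAW))))
```

Because `PIPELINE` is just a list of steps, it can be shared, copied and extended, which is the property the slides call repeatable and reusable transformations.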
Data pages and federated querying
What is the population of locations and total number of persons employed in Human health and social work activities?
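A question like this spans two datasets, which is where federated querying comes in. The sketch below builds a SPARQL 1.1 query that uses `SERVICE` clauses to pull population from one endpoint and employment figures from another. The endpoint URLs, the `ex:` vocabulary and the function name are assumptions; the slides do not show the actual query or the vocabulary used on DataGraft data pages.

```python
def build_federated_query(population_endpoint, employment_endpoint, activity_label):
    """Return a SPARQL 1.1 federated query as a string (hypothetical vocabulary).

    `activity_label` is inserted verbatim, so it must not contain double quotes.
    """
    return f"""
PREFIX ex: <http://example.org/vocab#>
SELECT ?location ?population (SUM(?employed) AS ?totalEmployed)
WHERE {{
  SERVICE <{population_endpoint}> {{
    ?location ex:population ?population .
  }}
  SERVICE <{employment_endpoint}> {{
    ?observation ex:location ?location ;
                 ex:activity "{activity_label}" ;
                 ex:personsEmployed ?employed .
  }}
}}
GROUP BY ?location ?population
""".strip()


query = build_federated_query(
    "http://example.org/population/sparql",   # hypothetical endpoint
    "http://example.org/employment/sparql",   # hypothetical endpoint
    "Human health and social work activities",
)
```

The join happens on `?location`, so both datasets need to identify locations with the same IRIs.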
Configuring data visualizations
APIs
DataGraft key feature: Flexible management and sharing of data and transformations
Fork, reuse and extend transformations built by other professionals from DataGraft’s transformations catalogue
Interactively build, modify and share data transformations
Share transformations privately or publicly
Reuse transformations to repeatably clean and transform spreadsheet data
Programmatically access transformations and the transformation catalogue
Reuse of transformations in environmental data publishing
TRAGSA Pilot
• Number of transformations: 42
– Created via reuse: 25
• Number of triples: ~7.7M
ARPA Pilot
• Number of transformations: 5
– Created via reuse: 2
• Number of triples: ~14K
Forking/reusing transformations helped us spend less time on creating new transformations
DataGraft key feature: Reliable data hosting and querying services
Host data on DataGraft’s reliable, cloud-based semantic graph database
Share data privately or publicly
Query data through your own SPARQL endpoint
Programmatically access the data catalogue
Operations & maintenance performed on behalf of users
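Querying a hosted SPARQL endpoint from an application usually follows the W3C SPARQL 1.1 Protocol: the query goes in a `query` parameter and the `Accept` header selects the result format. The sketch below only builds the request and does not send it. The endpoint URL is a placeholder, not a real DataGraft address, and any authentication a private endpoint might need is left out.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint_url, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint_url}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_sparql_request(
    "https://example.org/sparql",  # placeholder endpoint
    "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
)
# To execute it: urllib.request.urlopen(request), then parse the body with json.load.
```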
Grafter
Grafterizer
Semantic Graph DBaaS
Data Portal
DataGraft
DataGraft Enablers
DataGraft – 1 package, 2 audiences
DataGraft
Data Publisher, Application Developer
Helping to integrate and publish data
Giving better, easier tools
DataGraft – targeted impacts
Reduction in costs for organisations that lack sufficient expertise and resources to make their data available
Reduction in the dependency of data owners on generic Cloud platforms to build, deploy and maintain their linked data from scratch
Increase in the speed of publishing new datasets and updating existing datasets
Reduction in the cost and complexity of developing applications that use data
Increase in the reuse of data by providing reliable access to numerous datasets hosted on DataGraft.net
Example: The benefit of DataGraft in PLUQI
Before: Datasets gathering, Data transformation, Data provisioning/access, Implementing App
After (with DataGraft): Datasets gathering, Data transformation, Data provisioning/access, Implementing App
1. 23% reduction in development cost
• Reducing cost for implementing transformations
• Integrating the process is simpler
2. Able to focus on service quality
• Gathering enough good datasets
• Designing/implementing
DataGraft in numbers (as of end of Jan 2016)
• 238 registered users
• 607 registered data transformations (208 public)
• 1828 uploaded files
• 192 public data pages
DataGraft in the wild
• Investigating crime data in small geographies
• Used DataGraft to transform data and publish RDF
http://benproctor.co.uk/investigating-crime-data-at-small-geographies/
Data Science and DataGraft
Greater Data Science:
1. Data Exploration and Preparation
2. Data Representation and Transformation
3. Computing with Data
4. Data Visualization and Presentation
5. Data Modeling
6. Science about Data Science
“50 years of Data Science” by David Donoho
http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf
DataGraft
Summary
• DataGraft – emerging Data-as-a-Service solution for making (linked) data more accessible
– Platform, portal, methodology, APIs
– Online service, functional and documented
– Validated through several use cases
• Key features:
– Support for Sharable/Repeatable/Reusable Data Transformations
– Reliable RDF Database-as-a-Service
https://datagraft.net
Thank you!
Contact: [email protected]