Data-as-a-Service: DataGraft

Data-as-a-Service DataGraft Dumitru Roman [email protected] https://datagraft.net

Upload: dapaasproject

Post on 15-Feb-2017


TRANSCRIPT

Page 1: Data-as-a-Service: DataGraft

Data-as-a-Service: DataGraft

Dumitru Roman [email protected]

https://datagraft.net

Page 2: Data-as-a-Service: DataGraft


“Data is the new oil” … but many of us just need gasoline

Data-as-a-Service … is the new filling station

Page 3: Data-as-a-Service: DataGraft

Data-as-a-Service

• Outsourcing of various data operations to the cloud

• Eliminates

– upfront costs on data infrastructure

– ongoing investment of time and resources in managing the data infrastructure

• Complete package for

– transformation of raw data into meaningful data assets

– reliable delivery of data assets


Page 4: Data-as-a-Service: DataGraft

Example #1: Using open data – petroleum activities on the Norwegian continental shelf

Tabular data on the Web (factpages.npd.no, data.brreg.no/oppslag/enhetsregisteret)

• ~70 tabular datasets
• Difficult to query across tables and to integrate with other data, e.g. the Business Registry

Integration and querying service

• Simplified integration with external datasets
• Distribution of integrated dataset
• Live service
• Reliable access
• …

Data Insights

• Which companies have been owners in license X?
• What is the oil production for each field in year X?
• What is the total production of the top 10 companies by number of employees in year X?
• …
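A minimal Python sketch of how the last question above (total production of the top companies by number of employees in a given year) could be answered once the production tables and the Business Registry are integrated. The sample rows, the column names (`company`, `employees`, `year`, `oil`) and the function name are illustrative assumptions, not the actual schema of factpages.npd.no or data.brreg.no.

```python
# Hypothetical, hand-made rows standing in for the integrated datasets.
PRODUCTION = [
    {"company": "A", "field": "F1", "year": 2015, "oil": 10.0},
    {"company": "A", "field": "F2", "year": 2015, "oil": 5.0},
    {"company": "B", "field": "F1", "year": 2015, "oil": 7.0},
    {"company": "C", "field": "F3", "year": 2015, "oil": 3.0},
    {"company": "A", "field": "F1", "year": 2014, "oil": 9.0},
]
COMPANIES = [
    {"company": "A", "employees": 500},
    {"company": "B", "employees": 50},
    {"company": "C", "employees": 300},
]


def total_production_of_top_companies(production, companies, year, top_n):
    """Sum the production in `year` over the `top_n` companies by employees."""
    # Rank companies by number of employees (registry data).
    ranked = sorted(companies, key=lambda c: c["employees"], reverse=True)
    top_names = {c["company"] for c in ranked[:top_n]}
    # Join on the company name and aggregate (production data).
    return sum(
        row["oil"]
        for row in production
        if row["year"] == year and row["company"] in top_names
    )


TOTAL_2015_TOP_2 = total_production_of_top_companies(
    PRODUCTION, COMPANIES, year=2015, top_n=2
)
```

The point of the sketch is the join: the question cannot be answered from either source alone, which is why the slide places an integration and querying service between the tabular data and the insights.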

Page 5: Data-as-a-Service: DataGraft

Example #2: Reporting state-owned real estate properties in Norway

Report

• A hard copy of 314 pages and as a PDF file
• 6 Person-Months
• Data collection with spreadsheets
• Quality assurance through e-mails and phone correspondence

Pains

• Time consuming
• Poor data quality
• Static report without live updating

Reporting Service

• Live service
• Efficient sharing of data
• Simplified integration with external datasets
• Live updating
• Reliable access
• …

3rd party services

• Risk and vulnerability analysis, e.g. buildings affected by flooding
• Analysis of leasing prices

Page 6: Data-as-a-Service: DataGraft

Sample data


Cleaning, Transformation, Publishing,

Integration, Querying, Visualization,

Service Access

Page 7: Data-as-a-Service: DataGraft


Example #3: Personalized and Localized Urban Quality Index (PLUQI)

The index includes data from various domains:

• Daily life satisfaction: weather, transportation, community, …
• Healthcare level: number of doctors, hospitals, suicide statistics, …
• Safety and security: number of police stations, fire stations, crimes per capita, …
• Financial satisfaction: prices, incomes, housing, savings, debt, insurance, pension, …
• Level of opportunity: jobs, unemployment, education, re-education, …
• Environmental needs and efficiency: green space, air quality, …

Page 8: Data-as-a-Service: DataGraft

Sample data


Page 9: Data-as-a-Service: DataGraft

DataGraft was developed to allow data workers to manage their data in a simple, effective, and efficient way

Powerful data transformation and reliable data access capabilities

Page 10: Data-as-a-Service: DataGraft

Tabular Data

• Open Data is mostly tabular data
– Excel, CSV, TSV, etc.
• Records organized in silos of collections
• Very few links within and/or across collections
• Difficult to understand the nature of the data
• Difficult to integrate / query

Graph Data

Based on Linked Data

• Method for publishing data on the Web
• Self-describing data and relations
• Interlinking
• Accessed using semantic queries
• Open standards by W3C
– Data format: RDF
– Knowledge representation: RDFS/OWL
– Query language: SPARQL

http://www.w3.org/standards/semanticweb/data

europeandataportal.eu
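To make the contrast between tabular and graph data concrete, here is a minimal Python sketch that turns CSV rows into RDF triples serialized as N-Triples. It is only an illustration of the idea, not how DataGraft does it: the `http://example.org/` namespace, the sample CSV and the function names are assumptions, and every cell value is emitted as a plain string literal.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only

SAMPLE_CSV = "id,name,municipality\n1,City Hall,Oslo\n2,Library,Bergen\n"


def escape_literal(value):
    """Escape the characters that N-Triples string literals require escaping."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(row, key_column):
    """One row becomes one subject; every other column becomes a property."""
    subject = f"<{BASE}resource/{quote(row[key_column], safe='')}>"
    triples = []
    for column, value in row.items():
        if column == key_column or value == "":
            continue  # the key is already in the subject IRI; skip empty cells
        predicate = f"<{BASE}property/{quote(column, safe='')}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def csv_to_ntriples(text, key_column):
    """Convert CSV text into a list of N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_triples(row, key_column))
    return lines


NTRIPLES = csv_to_ntriples(SAMPLE_CSV, key_column="id")
```

Each row gets its own IRI, so other datasets can link to it. That interlinking is what the silos of tabular collections lack.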

Page 11: Data-as-a-Service: DataGraft

Data Transformation and RDF Publication Process

• Interactive design of transformations?

• Repeatable transformations?

• Reuse/share transformations (user-based access)?

• Cloud-based deployment of transformations?

• Self-serviced process?

• Data and Transformation as-a-Service?

Semantic graph database

Page 12: Data-as-a-Service: DataGraft

Tabular Data

Graph Data

DataGraft: Data-as-a-Service for the Data Transformation and RDF Publication Process

Page 13: Data-as-a-Service: DataGraft


https://www.ssb.no/statistikkbanken

Example: Using statistical data

Page 14: Data-as-a-Service: DataGraft


Page 15: Data-as-a-Service: DataGraft
Page 16: Data-as-a-Service: DataGraft
Page 17: Data-as-a-Service: DataGraft
Page 18: Data-as-a-Service: DataGraft
Page 19: Data-as-a-Service: DataGraft
Page 20: Data-as-a-Service: DataGraft
Page 21: Data-as-a-Service: DataGraft
Page 22: Data-as-a-Service: DataGraft
Page 23: Data-as-a-Service: DataGraft
Page 24: Data-as-a-Service: DataGraft
Page 25: Data-as-a-Service: DataGraft
Page 26: Data-as-a-Service: DataGraft
Page 27: Data-as-a-Service: DataGraft
Page 28: Data-as-a-Service: DataGraft
Page 29: Data-as-a-Service: DataGraft
Page 30: Data-as-a-Service: DataGraft


Page 31: Data-as-a-Service: DataGraft


Page 32: Data-as-a-Service: DataGraft

Data records (rows)

• Add row
• Take row(s)
• Drop row(s)
• Shift row
• Filter rows (grep)
• Remove duplicate rows

Entire dataset

• Sort
• Reshape dataset
• Group (categorize) and aggregate

Columns

• Add column(s)
• Take column(s)
• Drop column(s)
• Move column
• Merge columns
• Split column
• Rename column(s)
• Apply function to all values in a column
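A few of the operations listed above can be sketched in plain Python as small functions chained into a repeatable pipeline. This is an illustrative sketch only: the function names, the list-of-dicts row representation and the sample data are assumptions, not the Grafter or Grafterizer API.

```python
import re


def take_rows(rows, start, end):
    """Take row(s): keep rows in the half-open index range [start, end)."""
    return rows[start:end]


def filter_rows(rows, pattern):
    """Filter rows (grep): keep rows where any cell matches the regex."""
    regex = re.compile(pattern)
    return [r for r in rows if any(regex.search(str(v)) for v in r.values())]


def remove_duplicate_rows(rows):
    """Remove duplicate rows, keeping the first occurrence of each."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def rename_columns(rows, mapping):
    """Rename column(s) according to an old-name to new-name mapping."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def apply_to_column(rows, column, fn):
    """Apply a function to all values in a column."""
    return [{**r, column: fn(r[column])} for r in rows]


def run_pipeline(rows, steps):
    """Run the steps in order; the same steps can be rerun on new input."""
    for step in steps:
        rows = step(rows)
    return rows


RAW = [
    {"Name": " oslo ", "Pop": "700"},
    {"Name": "bergen", "Pop": "285"},
    {"Name": "bergen", "Pop": "285"},
    {"Name": "total", "Pop": "985"},
]
STEPS = [
    lambda rows: take_rows(rows, 0, 3),  # leave out the trailing "total" row
    remove_duplicate_rows,
    lambda rows: rename_columns(rows, {"Name": "city", "Pop": "population"}),
    lambda rows: apply_to_column(rows, "city", lambda v: v.strip().title()),
    lambda rows: apply_to_column(rows, "population", int),
]
CLEAN = run_pipeline(RAW, STEPS)
```

Keeping the steps as an ordered list separate from the data is what makes a transformation repeatable: the same `STEPS` can be applied to next month's spreadsheet unchanged.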

Page 33: Data-as-a-Service: DataGraft


Page 34: Data-as-a-Service: DataGraft


Page 35: Data-as-a-Service: DataGraft


Page 36: Data-as-a-Service: DataGraft


Page 37: Data-as-a-Service: DataGraft


Page 38: Data-as-a-Service: DataGraft

Data pages and federated querying


What is the population of locations and total number of persons employed in Human health and social work activities?

Page 39: Data-as-a-Service: DataGraft

Configuring data visualizations


Page 40: Data-as-a-Service: DataGraft


Page 41: Data-as-a-Service: DataGraft


Page 42: Data-as-a-Service: DataGraft


Page 43: Data-as-a-Service: DataGraft


APIs

Page 44: Data-as-a-Service: DataGraft

DataGraft key feature: Flexible management and sharing of data and transformations

Fork, reuse and extend transformations built by other professionals from DataGraft’s transformations catalogue

Interactively build, modify and share data transformations

Share transformations privately or publicly

Reuse transformations to repeatably clean and transform spreadsheet data

Programmatically access transformations and the transformation catalogue
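The fork, reuse and extend idea can be illustrated with a small Python sketch in which a transformation is just an immutable record of ordered steps. The `Transformation` class, its fields and the step names are hypothetical and chosen for illustration; they do not describe DataGraft's actual data model or API.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Transformation:
    """A hypothetical catalogue entry: named, owned, and made of ordered steps."""

    name: str
    owner: str
    steps: Tuple[tuple, ...] = ()
    public: bool = False
    forked_from: Optional[str] = None

    def fork(self, new_owner, new_name):
        """Copy the transformation for another user and record where it came from."""
        return replace(
            self,
            name=new_name,
            owner=new_owner,
            public=False,  # assumption: a fork starts out private
            forked_from=f"{self.owner}/{self.name}",
        )

    def extend(self, *extra_steps):
        """Return a new transformation with additional steps appended."""
        return replace(self, steps=self.steps + tuple(extra_steps))


BASE_TRANSFORMATION = Transformation(
    name="clean-spreadsheet",
    owner="alice",
    steps=(("drop-rows", 1), ("rename-columns", "Name", "city")),
    public=True,
)

# Reuse: bob forks alice's public transformation and adds one step of his own.
FORKED = BASE_TRANSFORMATION.fork("bob", "clean-env-data").extend(
    ("remove-duplicate-rows",)
)
```

Because the record is immutable, forking never changes the original, so a published transformation stays stable for everyone else who reuses it.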

Page 45: Data-as-a-Service: DataGraft

Reuse of transformations in environmental data publishing

TRAGSA Pilot

• Number of transformations: 42

– Created via reuse: 25

• Number of triples:

– ~ 7.7M

ARPA Pilot

• Number of transformations: 5

– Created via reuse: 2

• Number of triples:

– ~ 14K


Forking/reusing transformations helped us spend less time on creating new transformations

Page 46: Data-as-a-Service: DataGraft

DataGraft key feature: Reliable data hosting and querying services

Host data on DataGraft’s reliable, cloud-based semantic graph database

Share data privately or publicly

Query data through your own SPARQL endpoint

Programmatically access the data catalogue

Operations & maintenance performed on behalf of users
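Querying a SPARQL endpoint programmatically follows the W3C SPARQL 1.1 Protocol, so a client needs nothing beyond HTTP. The sketch below, using only the Python standard library, builds a GET request and parses a response in the SPARQL JSON results format. The endpoint URL and the query are placeholders (assumptions), since the slides do not give a concrete endpoint address, and no network call is made here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # placeholder, not a real DataGraft URL
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(json_text):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    document = json.loads(json_text)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in document["results"]["bindings"]
    ]


REQUEST = build_sparql_request(ENDPOINT, QUERY)

# A hand-written response in the standard results format, used instead of a live call.
SAMPLE_RESPONSE = json.dumps(
    {
        "head": {"vars": ["s"]},
        "results": {
            "bindings": [{"s": {"type": "uri", "value": "http://example.org/a"}}]
        },
    }
)
ROWS = parse_bindings(SAMPLE_RESPONSE)
```

Against a real endpoint, passing `REQUEST` to `urllib.request.urlopen` and feeding the decoded response body to `parse_bindings` would complete the round trip.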

Page 47: Data-as-a-Service: DataGraft

DataGraft Enablers

• Grafter
• Grafterizer
• Semantic Graph DBaaS
• Data Portal

Page 48: Data-as-a-Service: DataGraft

DataGraft – 1 package, 2 audiences

Data Publisher: helping to integrate and publish data

Application Developer: giving better, easier tools

Page 49: Data-as-a-Service: DataGraft

DataGraft – targeted impacts

Reduction in costs for organisations which lack sufficient expertise and resources to make their data available

Reduction in the dependency of data owners on generic Cloud platforms to build, deploy and maintain their linked data from scratch

Increase in the speed of publishing new datasets and updating existing datasets

Reduction in the cost and complexity of developing applications that use data

Increase in the reuse of data by providing reliable access to numerous datasets hosted on DataGraft.net


Page 50: Data-as-a-Service: DataGraft

Example: The benefit of DataGraft in PLUQI

Before: Datasets gathering, Data transformation, Data provisioning/access, Implementing App

After (with DataGraft): Datasets gathering, Data transformation, Data provisioning/access, Implementing App

1. 23% reduction in development cost
• Reducing cost for implementing transformations
• Integrating the process is simpler

2. Able to focus on service quality
• Gathering enough good datasets
• Designing/implementing

Page 51: Data-as-a-Service: DataGraft

DataGraft in numbers (as of end of Jan 2016)

• 238 registered users
• 607 registered data transformations (208 public)
• 1828 uploaded files
• 192 public data pages

Page 52: Data-as-a-Service: DataGraft

DataGraft in the wild

• Investigating crime data in small geographies

• Used DataGraft to transform data and publish RDF

http://benproctor.co.uk/investigating-crime-data-at-small-geographies/

Page 53: Data-as-a-Service: DataGraft

Data Science and DataGraft

Greater Data Science:

1. Data Exploration and Preparation

2. Data Representation and Transformation

3. Computing with Data

4. Data Visualization and Presentation

5. Data Modeling

6. Science about Data Science

“50 years of Data Science” by David Donoho: http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf

DataGraft

Page 54: Data-as-a-Service: DataGraft

Summary

• DataGraft – emerging Data-as-a-Service solution for making (linked) data more accessible

– Platform, portal, methodology, APIs

– Online service, functional and documented

– Validated through several use cases

• Key features:

– Support for Sharable/Repeatable/Reusable Data Transformations

– Reliable RDF Database-as-a-Service


Page 55: Data-as-a-Service: DataGraft

https://datagraft.net

Thank you!

Contact: [email protected]