KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)

Upload: integrum-solutions-ag

Post on 11-May-2015

DESCRIPTION

Lecture given at IFMO and the KESW Semantic Web School

TRANSCRIPT

Page 1: KESW2012 Linked Data for Enterprises and Governments (5 Oct 2012)

Linked Data @ KESW school Knowledge Engineering and Semantic Web (KESW), 5 Oct 2012, St-Petersburg

Dr Sören Auer „Linked Open Data“

Senior scientist and head of the research group Agile Knowledge Engineering and Semantic Web at University of Leipzig

Daniel Hladky, MBA „Enterprise Linked Data“

Researcher at NRU HSE “Semantic Lab”, Deputy Director W3C Russia Office

Board member at Ontos, Avicomp Services, Intecor, MatchCode Software

Page 2:

Agenda (morning)

Time   Topic                                                    Speaker
10:00  Welcome, Intro and Objectives; Essentials and W3C View   Daniel
10:15  Evolution of LOD; Status Quo and Current Challenges      Sören
11:30  Break
12:00  LOD Lifecycle                                            Sören
13:30  Lunch-Break

© AKSW (LOD2) – NRU HSE / W3C Slide 2

Page 3:

Agenda (afternoon)

Time   Topic                                    Speaker
14:30  Linked Data for Enterprises; Use Cases   Daniel
15:30  Hands-On LOD                             “Students”
16:00  Break
16:30  Hands-On continuation
17:30  Team presentation of hands-on
       Wrap-Up                                  Daniel
18:00  End

Page 4:

Objectives

• Understand the building blocks
– URI, RDF, RDFa, SPARQL …
• Know how to «Publish» and «Consume» Linked Open Data
• Tools, use cases and references
• Understand benefits and limitations

Page 5:

The Vision of the new Internet

Linked Data realizes the vision of evolving the Web into a global data commons, allowing applications to operate on top of an unbounded set of data sources, via standardised access mechanisms. I expect that Linked Data will enable a significant evolutionary step in leading the Web to its full potential.

CC-BY-SA by campuspartybrasil (flickr)

Page 6:

5 Stars for Open Data by Tim Berners-Lee

Page 7:

W3C View

A new wave of transformations

Just as the Web has transformed everything…

…It will transform everything again

Working Groups (W3C Standards) (http://www.w3.org/standards/semanticweb/data)

- RDF, RDFa, SPARQL, RDB2RDF, OWL, RIF, SKOS

Page 8:

Some statistics

HTML/CSS Validation

Markup Validation

Page 9:

http://bit.ly/d37p4i

~30 billion triples

The Semantic Web is already there!

Page 10:

Put the «L» in front of Open Data

• Give things a URI!
• Use RDF for Publishing!
• Link your Data to other Data (as well as the data models)!
• Provide a Standard-API on top

• Provide an API!
• Organise Data!
• License Data!
• Raw Data now!

Publish Data!

Use Web-Technologies

Use Linked Data!

• The web is an Ecosystem
• Networked Data creates Network Effects
• Lowers Costs of Data Integration
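The first three Linked Data steps on this slide (give things a URI, publish as RDF, link to other data) can be sketched in a few lines of Python. This is a minimal sketch and not part of the lecture: the `http://example.org/id/` namespace and the helper names `mint_uri` and `to_ntriples` are hypothetical, the DBpedia URI serves only as an illustrative link target, and the N-Triples output is assembled by hand instead of with an RDF library.

```python
# Minimal sketch: mint URIs, describe things as RDF triples, link to other data.
# The example.org namespace and the resources below are hypothetical.

BASE = "http://example.org/id/"  # assumed namespace for our own things
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def mint_uri(local_name):
    """Step 1: give a thing a URI."""
    return BASE + local_name


def literal(text):
    """Quote a plain string as an N-Triples literal (escapes \\ and ")."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ntriples(triples):
    """Step 2: publish as RDF, one N-Triples statement per line."""
    lines = []
    for s, p, o, o_is_uri in triples:
        obj = "<" + o + ">" if o_is_uri else literal(o)
        lines.append("<" + s + "> <" + p + "> " + obj + " .")
    return "\n".join(lines)


leipzig = mint_uri("city/leipzig")
triples = [
    (leipzig, RDFS_LABEL, "Leipzig", False),
    # Step 3: link your data to other data (illustrative target).
    (leipzig, OWL_SAME_AS, "http://dbpedia.org/resource/Leipzig", True),
]

print(to_ntriples(triples))
```

In practice an RDF library and a SPARQL endpoint would cover the serialization and the "Standard-API on top"; the sketch only shows how little is needed to get from a record to linked statements.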

Page 11:

Linked Open Data

Dr Sören Auer

Page 12:

LINKED ENTERPRISE DATA DANIEL HLADKY

HTTP://WWW.W3.ORG/2001/SW/SWEO/PUBLIC/USECASES/

HTTP://WWW.W3.ORG/2012/LDP/WIKI/USE_CASES_AND_REQUIREMENTS

LOD for Enterprise and Government

Page 13:

What are Enterprise Data?

• Legacy (ERP) System
• CRM System
• E-Mail (Outlook)
• Wiki (MediaWiki)
• CMS System

Page 14:

Data managed in silos

Finance, Student affairs, Equipment and assets

Own schemas – DB structures

• Institutions, organizations and departments create and store their own data
• Departments do not effectively share information; they exchange data
• Data inconsistencies, redundancies, and errors affect business results and increase costs

Page 15:

Connect the silos

Finance, Student Affairs, Equipment & Assets

Enterprise-Wide Reusable Information

Page 16:

Data Integration by SAP

SUPPLIER, EMPLOYEE, CUSTOMER, PRODUCT

MDM

SAP MDM
• Load master data from multiple transactional systems (SAP & non-SAP) into a single, unified repository
• Identify and consolidate similar master data values to eliminate duplicates
• Enrich master data values centrally for enterprise-wide purposes (such as reporting)

SAP BI (BW)
• Integrate data from any SAP or non-SAP data source for analytics or business-transaction processing
• Extract, transform, and load (ETL) data in batch or real time

Page 17:

Next generation SAP Real-time Data Platform and “EIM”

[Architecture diagram. Labels recovered from the figure:]
• 3rd Party BI Client, Custom Apps, SAP Business Suite, SAP Business Warehouse, SAP Big Data Applications, SAP Analytics, SAP Mobile
• SAP NetWeaver (On Premise / Cloud)
• Open Developer APIs and Protocols
• SAP Smart Data Services Platform
• SAP HANA Platform
• SAP Real-time Data Platform: SAP Sybase ASE, SAP Sybase SQLA, SAP Sybase ESP, SAP Sybase IQ, SAP Sybase Replication Server, SAP Data Services, SAP MDG, MDM
• Hadoop, 3rd Party DB, MPP Scale-Out
• Common Landscape Management
• Common Modeling (Sybase PowerDesigner)

Page 18:

Approach using LOD technology (W3C)

Page 19:

Linked Data in Enterprise Information Integration

Ref.: P. Frischmuth et al.

Page 20:

LED principles (or W3C LOD Cookbook)

Publishing
• Analyse Data
• Clean your Data
• Model your Data (Vocab.)
• Choose vocabularies
• Specify license(s)
• Convert to RDF
• Link Data to other Data
• Publish and promote

Consuming LOD
• Specify use cases
• Evaluate relevant data sources and data sets
• Check licenses
• Create consumption patterns
• Manage alignment
• Create Mashups, GUIs, services and applications on top

Page 21:

LED Best Practice - Vocabularies

• Prerequisites Linked Data Vocabs
– Terms must be referenceable (e.g. via URI)
– References have to be unambiguous
– Terms have to be mappable (maybe using SKOS)
• Vocabularies (co-existence)
– UDEF, AGROVOC, folksonomies (del.icio.us), Company Data Dictionaries
– Apply SKOS (W3C standard)
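To make "terms have to be mappable (maybe using SKOS)" concrete, here is a minimal sketch of how two co-existing vocabularies could be aligned with SKOS mapping properties. The two vocabularies, all term URIs under `http://example.org/` and the helper `mapped_terms` are hypothetical; only the SKOS namespace and the properties `skos:exactMatch` and `skos:closeMatch` come from the W3C standard.

```python
# Minimal sketch: mapping terms between two co-existing vocabularies with SKOS.
# Both vocabularies and all term URIs below are hypothetical.

SKOS = "http://www.w3.org/2004/02/skos/core#"
EXACT_MATCH = SKOS + "exactMatch"
CLOSE_MATCH = SKOS + "closeMatch"

# Each mapping is an RDF-style statement: (term, SKOS mapping property, term).
mappings = [
    ("http://example.org/finance/Supplier", EXACT_MATCH,
     "http://example.org/purchasing/Vendor"),
    ("http://example.org/finance/Asset", CLOSE_MATCH,
     "http://example.org/equipment/Device"),
]


def mapped_terms(term, statements, properties=(EXACT_MATCH,)):
    """Return terms linked to `term` by the given mapping properties.

    Both SKOS properties used here are symmetric, so links are followed
    in both directions. Transitive closure is not computed in this sketch.
    """
    found = set()
    for s, p, o in statements:
        if p not in properties:
            continue
        if s == term:
            found.add(o)
        elif o == term:
            found.add(s)
    return found


print(mapped_terms("http://example.org/purchasing/Vendor", mappings))
```

Because every term is a URI, the mapping table itself is just more linked data and can live next to the vocabularies it connects.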

Page 22:

Example of Ontology/Vocab Repository

http://ontowiki.net/Projects/OntoWiki

http://protege.stanford.edu/

Page 23:

LED Best Practice – Data Curation

• The Business Need for Curation
– Complete, Accurate, Consistent, Provenance, Timeliness
• Leads to a process:
> Identify data you need > Who will curate it > Define curation process > Define tools, processes needed to support the curation.
• How? Which Community approach:
– Internal (private data)
– (External) Pre-competitive
– External – Crowd-sourcing

Page 24:

Data Curation Examples

• Wikipedia (crowd-sourcing) > DBpedia
• NYT Index (Started in 1913)
• Print «Index» once a year
– What about Online business?

Page 25:

NYT Index (Online)

Workflow at NYT (simplified)

1. Editor writes articles
2. Process article using automatic tagging (rNews) with NLP
3. Publish article online
4. Data curator reviews the tagging and corrects it manually

Page 26:

Demo of possible data curation process

RDFaCE PlugIn

- Various NLP

- RDFa in HTML

- rNews/schema.org

- RDF to EKB/IKB

- Data Curation

Ontos Framework

Page 27:

A possible framework (LED)

[Layered framework diagram. Labels recovered from the figure:]
• Sources: RDBMS (Org. Data), Docs (HTML), Social Networks, Linked Open Data
• Base Technology: Triple Store
• Extraction: Unstructured, Semi-structured, Structured
• Manag. Knowledge, Quality & Coherence: Linking, Matching, Data-Quality, Co-Evolution, Curation, Orchestration
• Scalability
• User-Interface: Scalable Search in Linked Data
• Apps: Eventos – Filter, Categorize, Visualise; CRM Int.; Media-News; E-Gov Eco (API); Predictive Analysis; ...

Page 28:

Tool Box (excerpt)

• W3C
– Guides and charters (http://www.w3.org/standards/semanticweb/data)
– Validator suite (http://www.w3.org/QA/Tools/)
• LOD2 Technology Stack
• Sindice
• Silk
• LIMES
• NLP: OntosMiner, OpenCalais, GATE, UIMA
• RDF Store: Ontos, Virtuoso, AllegroGraph, 4Store (http://www.garshol.priv.no/blog/231.html)

Based on EU FPx

Often Open Source

Page 29:

LED – USE CASES

Early adopters

Page 30:

Digital News and Semantics

Early adopters of RDF(a), SPARQL etc.

– NYTIMES, BBC, Guardian, AP etc.

Page 31:

rNews (vocab/ontology)

http://dev.iptc.org/rNews

RDF triple: subject – predicate – object

Intro by Evan Sandhaus/NYT: http://vimeo.com/22891051
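The subject, predicate, object model can be illustrated with a very small in-memory store. This is a sketch under assumptions, not rNews tooling: the article and author URIs are hypothetical, the property names are taken from the schema.org namespace, and the helper `match` is an illustrative stand-in for a real triple store query.

```python
# Minimal sketch of the subject - predicate - object model.
# The article and author URIs are hypothetical; properties follow schema.org.

SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
article = "http://example.org/news/2012/10/05/example-story"

triples = {
    (article, RDF_TYPE, SCHEMA + "NewsArticle"),
    (article, SCHEMA + "headline", "Example headline"),
    (article, SCHEMA + "author", "http://example.org/people/jane-doe"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Which headlines are known?" corresponds to the pattern (?, headline, ?).
for _, _, headline in match(triples, p=SCHEMA + "headline"):
    print(headline)
```

A SPARQL query generalises this idea: it combines several such patterns and joins them on shared variables.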

Page 33:

rNews Guideline

Article

http://dev.iptc.org/rNews-Sample-Story

Guideline:

http://dev.iptc.org/rNews-10-Implementation-Guide-Introduction

Using schema.org (namespace)

http://dev.iptc.org/rNews-10-Implementation-Guide-HTML-5-Microdata

Using IPTC (namespace)

http://dev.iptc.org/Implementation-Guide-HTML-5-Microdata-in-IPTC-namespace

Example

http://www.nytimes.com/2012/09/19/world/asia/nato-curbs-joint-operations-with-afghan-troops.html?_r=3

Validation:

http://www.w3.org/RDF/Validator/

http://www.google.com/webmasters/tools/richsnippets

Page 35:

Why rNews

[Figure: comparison of “With structured data” and “No structured data”]

By understanding the structured data on a web page, search engines can better present that web page to users.

Source: schema.org 2011

rNews markup allows you to describe the content on your site in a machine-understandable way using RDFa.
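As an illustration of describing page content "in a machine-understandable way using RDFa", the sketch below renders an HTML fragment with RDFa Lite attributes (`vocab`, `typeof`, `property`). It is an assumption-laden example and not taken from the rNews implementation guide: it uses the schema.org namespace with the `NewsArticle` type, the function name `news_article_rdfa` is hypothetical, and the author is simplified to plain text.

```python
from html import escape


def news_article_rdfa(headline, author, date_published):
    """Render an HTML fragment annotated with RDFa Lite attributes.

    Assumes schema.org terms (NewsArticle, headline, author, datePublished);
    the author is given as plain text for brevity.
    """
    return (
        '<article vocab="http://schema.org/" typeof="NewsArticle">\n'
        '  <h1 property="headline">' + escape(headline) + '</h1>\n'
        '  <span property="author">' + escape(author) + '</span>\n'
        '  <time property="datePublished" datetime="' + escape(date_published) + '">'
        + escape(date_published) + '</time>\n'
        '</article>'
    )


print(news_article_rdfa("Example headline", "Jane Doe", "2012-10-05"))
```

The visible text stays the same for human readers; the extra attributes are what a search engine or an RDFa parser turns into triples about the article.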

Page 36:

Cash/Ringier

Page 37:

Cash Project

Objectives

• Similarity of articles

• Relevancy, Ranking

• SEO optimisation

• Metadata for MashUp

Page 38:

RIA Novosti

[Figure: chart with numeric data labels; the values cannot be assigned to categories from the transcript]

Page 39:

BBC – Dynamic Semantic Publishing

Page 41:

RDF(a) vs Schema.org

by Google, Yahoo, Bing, Yandex

http://schema.org/docs/schemas.html

Page 42:

Google Knowledge Graph

Page 45:

LINKED DATA AT CAR COMPANY

Based on http://semantic-web-journal.net/content/linked-data-enterprise-information-integration

http://semantic-web-journal.net/sites/default/files/swj300.pdf

Page 46:

LED at abc (Proof of Concept)

• The situation at abc:
• 3,000 heterogeneous IT systems
• Different units (car, bus, truck etc.) with very different views
• No common language
• Inability to identify crucial entities (parts, locations etc.) enterprise-wide
• There is no (cannot be a) single Enterprise Information Model
• A distributed, iterative, bottom-up integration approach such as Linked Data might be able to help (pay-as-you-go).

Finance, Student Affairs, Equipment & Assets

Enterprise-Wide Reusable Information

Page 47:

Extraction from RDBMS

“SPARQLMap – Mapping RDB 2 RDF”

1. Either the resulting RDF knowledge base is materialized in a triple store and subsequently queried using SPARQL,
2. or the materialization step is avoided by dynamically mapping an input SPARQL query into a corresponding SQL query, which returns exactly the same results as the SPARQL query executed against the materialized RDF dump.
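The two options can be contrasted with a toy example. This is a minimal sketch and not the SPARQLMap implementation or its API: the `part` table, the URI pattern, the `name` property and all function names are hypothetical, and a single triple pattern stands in for a full SPARQL query.

```python
import sqlite3

# Hypothetical relational source: one table of car parts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO part VALUES (?, ?)",
                 [(1, "Brake disc"), (2, "Air filter")])

BASE = "http://example.org/part/"        # assumed URI pattern for rows
NAME = "http://example.org/vocab/name"   # assumed property for the name column


def materialize(db):
    """Option 1: dump the table as RDF triples (to be loaded into a triple store)."""
    rows = db.execute("SELECT id, name FROM part")
    return {(BASE + str(pid), NAME, name) for pid, name in rows}


def query_materialized(triples, predicate):
    """Stand-in for the SPARQL pattern `?s <predicate> ?o` over the dump."""
    return sorted((s, o) for s, p, o in triples if p == predicate)


def query_rewritten(db, predicate):
    """Option 2: answer the same pattern by translating it to SQL on the fly."""
    if predicate != NAME:
        return []  # no column is mapped to this predicate
    rows = db.execute("SELECT id, name FROM part")
    return sorted((BASE + str(pid), name) for pid, name in rows)


# Both routes yield the same bindings for ?s and ?o.
assert query_materialized(materialize(conn), NAME) == query_rewritten(conn, NAME)
print(query_rewritten(conn, NAME))
```

The trade-off the slide hints at: materialization costs storage and needs refreshing when the database changes, while rewriting always sees live data but depends on how well the generated SQL performs.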

Page 48:

Linked Government Data

W3C eGovernment Interest Group: http://www.w3.org/egov/wiki/Main_Page

Data.gov / data.gov.uk / W3C LGD

Page 49:

What is Open (Government) Data?

Open Government Data is a worldwide movement to open data (& information) of the government / public administration* - that is NOT personal (individual related) – in human- and machine-readable open formats (non-proprietary) for use & re-use!

OPEN stands for lowering the barriers to ensure as broad as possible re-use (for everybody)!

There is a new paradigm in publishing Open Government Data = look, take and play!

* data and information produced or commissioned by government or government controlled entities

Page 50:

What is Important? For Whom?

What is important when thinking about open data in use?

• Interoperability to ensure broad & easy use & re-use
• Human AND machine readable data and meta data
• In open formats
• For smooth and cost efficient data integration
• To generate effects on several levels: local – regional – national – EU wide & worldwide

For several target groups with several interests!

• Public administration (also for internal use)
• Politicians & decision makers
• Citizens (Citizen Analysts)
• Economy & Industry (data integration, -enrichment, APPs)
• (Data) Journalists, media & publishers
• Academia & Science

Page 51:

Data.gov (Open Data Sets) and Mashups

Civic Commons has a great collection of good open use cases:

http://civiccommons.org/

Page 53:

E.g. Chicago - https://data.cityofchicago.org/

Page 54:

5 Star Pyramid of Open Data

http://5stardata.info/ (Dr M. Hausenblas, DERI)

See also: Christopher Gutteridge has a Linked Data crash course for programmers: http://openorg.ecs.soton.ac.uk/wiki/Linked_Data_Basics_for_Techies

Page 55:

HANDS-ON

Let’s apply our knowledge

Page 56:

Example…

https://www.dropbox.com/s/uzulsw3zu9eyff2/LOD_Test.zip

Page 57:

SUMMARY

Wrap-Up: Benefits and Limitations

Page 58:

Misconceptions about Linked Open Data

• All of us have to use ONE schema
• Everything needs to be switched to RDF
• We all have to learn SPARQL, there are no standard (web) APIs
• LOD is a pure academic approach
• LOD can only be used by Semantic Web experts
• We have to change our data integration & -management approaches

Page 59:

The Power of Linked Open Data

• Enables web-scale data publishing - distributed publication with web-based discovery mechanisms
• Everything is a resource – follow your nose to discover more about properties, classes, or codes within a code list
• Everything can be annotated - make comments about observations, data series, points on a map
• Easy to extend - create new properties as required, no need to plan everything up-front
• Easy to merge - slot together RDF graphs, no need to worry about name clashes
• Easy use and re-use on top of common schemas AND schema mapping
• Allows complex querying of several distributed data sources & systems
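The "easy to merge" point can be shown with plain sets of triples. This is a simplified sketch with hypothetical data: both departments and all URIs under `http://example.org/` are invented for illustration, and treating a merge as a set union holds only for graphs without blank nodes.

```python
# Minimal sketch: merging two RDF graphs is a set union when every name is a URI.
# The departments, vocabularies and item URIs are hypothetical.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Both silos use the local names "item/42" and "name", but in different
# namespaces, so the full URIs never clash.
finance = {
    ("http://example.org/finance/item/42",
     "http://example.org/finance/vocab#name", "Projector"),
    ("http://example.org/finance/item/42",
     "http://example.org/finance/vocab#cost", "899.00"),
}
equipment = {
    ("http://example.org/equipment/item/42",
     "http://example.org/equipment/vocab#name", "Projector X1"),
    # An explicit link states that both URIs denote the same thing.
    ("http://example.org/equipment/item/42", OWL_SAME_AS,
     "http://example.org/finance/item/42"),
}

merged = finance | equipment

# Nothing was overwritten: every statement from both sources is still there.
print(len(merged), "triples after merging")
```

In a relational setting the same merge would first need agreement on table and column names; here the only extra work is the explicit link between the two identifiers.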

Page 60:

The Benefits of Linked Open Data

• Less replication (offering same datasets in different places)
• Encouragement to re-use existing datasets
• Clear which datasets are providing similar / same information
• More innovation because datasets can be put in a new context and lead to interesting applications
• Put information in context and thereby create knowledge

Page 61:

Cost of Data Integration – 2 Approaches

Source: Price Waterhouse Coopers – Technology Forecast, Spring 2009

Can we afford to mash the data with ours?

Page 62:

Q & A

End of the Day (tomorrow hackathon for Open Gov Data)