KESW 2012: Linked Data for Enterprises and Governments (5 Oct 2012)
Lecture given at IFMO and the KESW Semantic Web School
Linked Data @ KESW school Knowledge Engineering and Semantic Web (KESW), 5 Oct 2012, St-Petersburg
Dr Sören Auer „Linked Open Data“
Senior scientist and head of the research group Agile Knowledge
Engineering and Semantic Web at University of Leipzig
Daniel Hladky, MBA „Enterprise Linked Data“
Researcher at NRU HSE “Semantic Lab”, Deputy Director W3C Russia Office
Board member at Ontos, Avicomp Services, Intecor, MatchCode Software
Agenda (morning)
Time   Topic                                 Speaker
10:00  Welcome, Intro and Objectives         Daniel
       Essentials and W3C View
10:15  Evolution of LOD                      Sören
       Status Quo and Current Challenges
11:30  Break
12:00  LOD Lifecycle                         Sören
13:30  Lunch Break
© AKSW (LOD2) – NRU HSE / W3C Slide 2
Agenda (afternoon)
Time   Topic                                 Speaker
14:30  Linked Data for Enterprises           Daniel
       Use Cases
15:30  Hands-On LOD                          “Students”
16:00  Break
16:30  Hands-On continuation
17:30  Team presentation of hands-on
       Wrap-Up                               Daniel
18:00  End
Objectives
• Understand the building blocks
– URI, RDF, RDFa, SPARQL …
• Know how to «Publish» and
«Consume» Linked Open Data
• Tools, use cases and references
• Understand benefits and
limitations
The Vision of the new Internet
Linked Data realizes the vision of evolving the Web into a global data commons, allowing applications to operate on top of an unbounded set of data sources, via standardised access mechanisms. I expect that Linked Data will enable a significant evolutionary step in leading the Web to its full potential.
CC-BY-SA by campuspartybrasil (flickr)
5 Stars for Open Data by Tim Berners-Lee
W3C View
A new wave of transformations
Just as the Web has transformed everything…
…It will transform everything again
Working Groups (W3C Standards) (http://www.w3.org/standards/semanticweb/data)
- RDF, RDFa, SPARQL, RDB2RDF, OWL, RIF, SKOS
Some statistics
HTML/CSS Validation
Markup Validation
http://bit.ly/d37p4i
~30 billion triples
The Semantic Web is already there!
Put the «L» in front of Open Data
• Give things a URI!
• Use RDF for Publishing!
• Link your Data to other Data (as well as the data models)!
• Provide a Standard-API on top
• Provide an API!
• Organise Data!
• License Data!
• Raw Data now!
Publish Data!
Use Web-Technologies
Use Linked Data!
• The web is an Ecosystem
• Networked Data creates Network Effects
• Lowers Costs of Data Integration
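The first three steps on this slide (give things a URI, publish as RDF, link to other data) can be illustrated with a minimal Python sketch. The `http://example.org/id/` namespace and the helper names are hypothetical, and the DBpedia resource is assumed to be the matching entry; the `rdfs:label` and `owl:sameAs` property URIs are the standard ones. The sketch writes N-Triples by hand instead of using an RDF library.

```python
# Minimal sketch: mint a URI, describe the thing in RDF, link it to another dataset.
# The example.org namespace is a hypothetical placeholder.

BASE = "http://example.org/id/"  # assumed namespace for our own things
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(local_name):
    """Give a thing a URI inside our own namespace."""
    return BASE + local_name


def triple(subject, predicate, obj, literal=False):
    """Serialise one RDF statement as an N-Triples line."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = '"' + escaped + '"'
    else:
        obj_term = "<" + obj + ">"
    return "<" + subject + "> <" + predicate + "> " + obj_term + " ."


city = uri("Saint_Petersburg")
statements = [
    # Publish a fact about the thing as RDF.
    triple(city, RDFS_LABEL, "Saint Petersburg", literal=True),
    # Link our URI to the same thing in another dataset (assumed DBpedia match).
    triple(city, OWL_SAME_AS, "http://dbpedia.org/resource/Saint_Petersburg"),
]
print("\n".join(statements))
```

In practice a library and a triple store would handle serialisation and publishing; the point here is only that each step produces plain, linkable statements.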
Linked Open Data
Dr Sören Auer
LINKED ENTERPRISE DATA DANIEL HLADKY
http://www.w3.org/2001/sw/sweo/public/UseCases/
http://www.w3.org/2012/ldp/wiki/Use_Cases_And_Requirements
LOD for Enterprise and Government
What are Enterprise Data
Legacy (ERP) System CRM System
E-Mail (Outlook) Wiki (MediaWiki)
CMS System
Data managed in silos
Finance Student affairs
Equipment
and assets
Institutions, organizations and departments create and store their own data
Departments do not effectively share information; they exchange data
Data inconsistencies, redundancies, and errors affect business results and increase
costs
Own schemas –
DB structures
Connect the silos
Finance Student Affairs
Equipment & Assets
Enterprise-Wide Reusable
Information
Data Integration by SAP
SUPPLIER EMPLOYEE CUSTOMER PRODUCT
MDM
SAP MDM Load master data from multiple transactional
systems (SAP & non-SAP) into a single, unified repository
Identify and consolidate similar master data values to eliminate duplicates
Enrich master data values centrally for enterprise wide purposes (such as reporting)
SAP BI (BW)
Integrate data from any SAP or non-SAP data
source for analytics or business-transaction
processing
Extract, transform, and load (ETL) data in
batch or real time
Next generation SAP Real-time
Data Platform and “EIM”
3rd Party BI Client
SAP NetWeaver (On Premise / Cloud)
Custom Apps
SAP Business Suite
SAP Business Warehouse
SAP Big Data Applications
SAP Analytics
SAP Mobile
Open Developer APIs and Protocols
Common Landscape Management
SAP Smart Data Services Platform
SAP HANA Platform
SAP Real-time Data Platform
SAP Sybase ASE
Common Modeling
Sybase PowerDesigner
HADOOP
3rd Party DB
MPP Scale-Out
SAP Sybase SQLA
SAP Sybase ESP
SAP Sybase IQ
SAP Sybase Replication Server
SAP Data Services
SAP MDG, MDM
Approach using LOD technology (W3C)
Linked Data in Enterprise Information Integration
Ref.: P. Frischmuth et al.
LED principles (or W3C LOD Cookbook)
Publishing
• Analyse Data
• Clean your Data
• Model your Data (Vocab.)
• Choose vocabularies
• Specify license(s)
• Convert to RDF
• Link Data to other Data
• Publish and promote
Consuming LOD
• Specify use cases
• Evaluate relevant data
sources and data sets
• Check licenses
• Create consumption
patterns
• Manage alignment
• Create mashups, GUIs,
services and
applications on top
LED Best Practice - Vocabularies
• Prerequisites Linked Data Vocabs
– Terms must be referenceable (e.g. via
URI)
– References have to be unambiguous
– Terms have to be mappable (maybe using
SKOS)
• Vocabularies (co-existence)
– UDEF, AGROVOC, folksonomies
(del.icio.us), Company Data Dictionaries
– Apply SKOS (W3C standard)
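The prerequisites above (referenceable, unambiguous, mappable terms) can be made concrete with a small Python sketch that maps terms of an in-house data dictionary to an external vocabulary. All term URIs are hypothetical; `skos:exactMatch` and `skos:closeMatch` are the actual SKOS mapping properties. The helper names are illustrative only.

```python
# Sketch: map terms of an in-house data dictionary to an external vocabulary
# using SKOS mapping properties. All term URIs below are hypothetical.

SKOS = "http://www.w3.org/2004/02/skos/core#"

# (local term, external term, SKOS mapping property)
MAPPINGS = [
    ("http://example.org/dict/Supplier", "http://example.com/vocab/Vendor", "exactMatch"),
    ("http://example.org/dict/Part", "http://example.com/vocab/Component", "closeMatch"),
]


def mapping_triples(mappings):
    """Turn the mapping table into N-Triples lines."""
    return [
        "<{}> <{}{}> <{}> .".format(local, SKOS, prop, external)
        for local, external, prop in mappings
    ]


def translate(term, mappings, allowed=("exactMatch",)):
    """Return the external terms a local term maps to, for the allowed mapping strengths."""
    return [ext for local, ext, prop in mappings if local == term and prop in allowed]


for line in mapping_triples(MAPPINGS):
    print(line)
```

Keeping the mapping strength explicit lets vocabularies co-exist: a consumer that needs strict equivalence can ignore `closeMatch` links, while a search application may accept them.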
Example of Ontology/Vocab Repository
http://ontowiki.net/Projects/OntoWiki
http://protege.stanford.edu/
LED Best Practice – Data Curation
• The Business Need for Curation
– Complete, Accurate, Consistent, Provenance,
Timeliness
• Leads to a process:
> Identify data you need > Who will curate it >
Define curation process > Define tools, processes
needed to support the curation.
• How? Which Community approach:
– Internal (private data)
– (External) Pre-competitive
– External – Crowd-sourcing
Data Curation Examples
• Wikipedia (crowd-sourcing) > DBpedia
• NYT Index (Started in 1913)
• Print «Index» once a year
– What about Online business?
NYT Index (Online)
Workflow at NYT (simplified)
1. Editor writes the article
2. Article is processed with automatic tagging (rNews) using NLP
3. Article is published online
4. Data curator reviews the tagging and corrects it manually
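The tag-then-curate steps of this workflow can be sketched in Python as follows. This is an illustration only, not the NYT pipeline: a tiny dictionary lookup stands in for the NLP step, and the gazetteer, entity URIs, article text and function names are all made up for the example.

```python
# Hypothetical gazetteer: surface form -> entity URI (stand-in for a real NLP service).
GAZETTEER = {
    "NATO": "http://example.org/entity/NATO",
    "Kabul": "http://example.org/entity/Kabul",
}


def auto_tag(text):
    """Step 2: propose tags automatically; every proposal starts unreviewed."""
    return [
        {"label": label, "uri": uri, "status": "proposed"}
        for label, uri in GAZETTEER.items()
        if label in text
    ]


def review(tags, rejected_labels):
    """Step 4: the curator rejects wrong tags; the rest are accepted."""
    return [
        dict(tag, status="rejected" if tag["label"] in rejected_labels else "accepted")
        for tag in tags
    ]


article = "NATO curbs joint operations; officials in Kabul respond."  # made-up text
proposed = auto_tag(article)
curated = review(proposed, rejected_labels={"Kabul"})
published_tags = [t["uri"] for t in curated if t["status"] == "accepted"]
print(published_tags)
```

Recording a status per tag, instead of deleting rejected ones, keeps the curator's decisions available as provenance.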
Demo of possible data curation process
RDFaCE PlugIn
- Various NLP
- RDFa in HTML
- rNews/schema.org
- RDF to EKB/IKB
- Data Curation
Ontos Framework
A possible framework (LED)
Sources: RDBMS (Org. Data), Docs (HTML), Social Networks, Linked Open Data
Base Technology: Triple Store
Manag. Knowledge, Quality & Coherence
Extraction: Unstructured, Semi-structured, Structured
Linking, Matching, Data Quality, Co-Evolution, Curation, Orchestration
Scalability: Scalable Search in Linked Data
User Interface: Eventos – Filter, Categorize, Visualise
Apps: CRM Int., Media/News, E-Gov, Eco (API), Predictive Analysis, ...
Tool Box (excerpt)
• W3C
– Guides and charters (http://www.w3.org/standards/semanticweb/data)
– Validator suite (http://www.w3.org/QA/Tools/)
• LOD2 Technology Stack
• Sindice
• Silk
• LIMES
• NLP: OntosMiner, OpenCalais, GATE, UIMA
• RDF Store: Ontos, Virtuoso, AllegroGraph,
4Store http://www.garshol.priv.no/blog/231.html
Based on EU FPx
Often Open Source
LED – USE CASES
Early adopters
Digital News and Semantics
Early adopters of RDF(a), SPARQL etc
– NYTimes, BBC, Guardian, AP etc.
rNews (vocab/ontology)
http://dev.iptc.org/rNews
RDF triple: subject – predicate – object
Intro by Evan Sandhaus/NYT: http://vimeo.com/22891051
References to RDF(a)
http://www.w3.org/TR/rdf-primer/
http://dev.iptc.org/Introduction-To-RDFa
http://www.w3.org/TR/2011/WD-rdfa-primer-20110419/
http://www.w3.org/TR/rdfa-lite/
rNews Guideline
Article
http://dev.iptc.org/rNews-Sample-Story
Guideline:
http://dev.iptc.org/rNews-10-Implementation-Guide-Introduction
Using schema.org (namespace)
http://dev.iptc.org/rNews-10-Implementation-Guide-HTML-5-Microdata
Using IPTC (namespace)
http://dev.iptc.org/Implementation-Guide-HTML-5-Microdata-in-IPTC-namespace
Example
http://www.nytimes.com/2012/09/19/world/asia/nato-curbs-joint-operations-with-afghan-troops.html?_r=3
Validation:
http://www.w3.org/RDF/Validator/
http://www.google.com/webmasters/tools/richsnippets
Demo RDFa (rNews)
http://hladky.ch/digipub/fake_news_html.html
http://dev.iptc.org/rNews-10-Implementation-Guide-HTML-5-Microdata
With structured data
No structured data
By understanding the structured data on a web page, search
engines can better present that web page to users.
Source: schema.org 2011
Why rNews
rNews markup allows you to describe the content on your site in a
machine-understandable way using RDFa.
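To show what such machine-understandable markup can look like, here is a hedged Python sketch that renders an article header with RDFa Lite attributes (`vocab`, `typeof`, `property`). It uses the schema.org terms `NewsArticle`, `headline`, `author` and `datePublished`; the function name and the sample values are made up, and a real rNews implementation would carry many more properties.

```python
from html import escape


def news_article_rdfa(headline, author, date_published):
    """Render an article header as HTML with RDFa Lite attributes."""
    return (
        '<article vocab="http://schema.org/" typeof="NewsArticle">\n'
        '  <h1 property="headline">{}</h1>\n'
        '  <span property="author">{}</span>\n'
        '  <time property="datePublished" datetime="{}">{}</time>\n'
        "</article>"
    ).format(
        escape(headline),
        escape(author),
        escape(date_published),
        escape(date_published),
    )


# Sample values are placeholders, not taken from a real article.
markup = news_article_rdfa("Example headline", "A. Reporter", "2012-10-05")
print(markup)
```

The visible text stays the same for human readers; only the attributes are added, which is why the markup can be validated with the tools listed earlier without changing the page layout.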
Cash/Ringier
Cash Project
Objectives
• Similarity of articles
• Relevancy, Ranking
• SEO optimisation
• Metadata for MashUp
RIA Novosti
BBC – Dynamic Semantic Publishing
More from BBC
http://www.w3.org/2001/sw/sweo/public/UseCases/BBC/
http://www.bbc.co.uk/blogs/bbcinternet/2012/07/olympic_data_services_and_the.html
RDF(a) vs Schema.org
by Google, Yahoo, Bing, Yandex
http://schema.org/docs/schemas.html
Google Knowledge Graph
E-Commerce - GoodRelations
http://purl.org/goodrelations/
http://www.ebusiness-unibw.org/tools/goodrelations-annotator/
Introduction by Dr M. Hepp from SemTech 2010
http://www.slideshare.net/mhepp/goodrelations-semtech2010-4590918
Magento Extension
http://www.magentocommerce.com/magento-connect/semantium/extension/2838/semantium_msemanticbasic#overview
http://www.heppnetz.de/ontologies/goodrelations/v1.html
LINKED DATA AT CAR COMPANY
Based on http://semantic-web-journal.net/content/linked-data-enterprise-information-integration
http://semantic-web-journal.net/sites/default/files/swj300.pdf
LED at abc (Proof of Concept)
• The situation at abc:
• 3,000 heterogeneous IT systems
• Different units (car, bus, truck etc.) with very different views
• No common language
• Inability to identify crucial entities (parts, locations etc.) enterprise-wide
• There is no (and cannot be a) single Enterprise Information Model
• A distributed, iterative, bottom-up integration approach such as Linked Data might be able to help (pay-as-you-go).
Finance Student Affairs
Equipment & Assets
Enterprise-Wide
Reusable Information
Extraction from RDBMS
“SPARQLMap – Mapping RDB 2 RDF“
1. Either the resulting RDF knowledge base is materialized in a triple store and subsequently queried using SPARQL,
2. or the materialization step is avoided by dynamically mapping an input SPARQL query into a corresponding SQL query, which returns exactly the same results as the SPARQL query executed against the materialized RDF dump.
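The two options can be contrasted with a small Python sketch using the standard-library `sqlite3` module. This is not SPARQLMap's mapping language or its query-rewriting algorithm: the table, the `http://example.org/` namespace, the URI pattern and the `materialize` helper are assumptions for illustration, and a plain filter over triples stands in for a SPARQL query.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for generated URIs


def materialize(conn, table, key_column):
    """Dump every row of a table as (subject, predicate, object) triples.

    The table name is interpolated into SQL, so it must be a trusted identifier.
    """
    cursor = conn.execute("SELECT * FROM {}".format(table))
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = "{}{}/{}".format(BASE, table, record[key_column])
        for column, value in record.items():
            if column != key_column and value is not None:
                predicate = "{}{}#{}".format(BASE, table, column)
                triples.append((subject, predicate, str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT, plant TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [(1, "Axle", "Plant A"), (2, "Gearbox", None)],
)

# Option 1: materialize the rows as triples, then query the triples.
triples = materialize(conn, "part", "id")
names_from_triples = sorted(o for s, p, o in triples if p == BASE + "part#name")

# Option 2: skip materialization and answer the same question directly in SQL.
names_from_sql = sorted(r[0] for r in conn.execute("SELECT name FROM part"))

print(names_from_triples == names_from_sql)
```

Both routes return the same names; the trade-off is between storing a second copy of the data (option 1) and translating every query at run time (option 2).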
Linked Government Data W3C eGovernment Interest Group http://www.w3.org/egov/wiki/Main_Page
Data.gov / data.gov.uk / W3C LGD
Open Government Data is a worldwide movement to open data (& information) of the government / public administration* – that is NOT personal (individual-related) – in human- and machine-readable open formats (non-proprietary) for use & re-use!
OPEN stands for lowering the barriers to ensure as broad as
possible re-use (for everybody)!
There is a new paradigm in publishing Open Government Data
= look, take and play!
* Data and information produced or commissioned by government or government-controlled entities
What is Open (Government) Data?
What is important when thinking about open data in use?
•Interoperability to ensure broad & easy use & re-use
•Human AND machine readable data and meta data
•In open formats
•For smooth and cost efficient data integration
•To generate effects on several levels:
local – regional – national – EU wide & worldwide
For several target groups with several interests!
•Public administration (also for internal use)
•Politicians & decision makers
•Citizens (Citizen Analysts)
•Economy & Industry (data integration, -enrichment, APPs)
•(Data) Journalists, media & publishers
•Academia & Science
What is Important? For Whom?
Data.gov (Open Data Sets) and Mashups
Civic Commons has a great collection of good open use cases:
http://civiccommons.org/
Where my money goes (Greece)
http://publicspending.medialab.ntua.gr/en/#/~/total
http://dl.dropbox.com/u/46182458/2012-06-19%20ps.gr%20BRU.pdf
E.g. Chicago - https://data.cityofchicago.org/
5 Star Pyramid of Open Data
http://5stardata.info/ (Dr M. Hausenblas, DERI)
http://openorg.ecs.soton.ac.uk/wiki/Linked_Data_Basics_for_Techies
See also: Christopher Gutteridge has a Linked Data crash course for programmers.
HANDS-ON
Let’s apply our knowledge
Example:
https://www.dropbox.com/s/uzulsw3zu9eyff2/LOD_Test.zip
SUMMARY
Wrap-Up: Benefits and Limitations
Misconceptions about Linked Open Data
• All of us have to use ONE schema
• Everything needs to be switched to
RDF
• We all have to learn SPARQL, there
are no standard (web) APIs
• LOD is a pure academic approach
• LOD can only be used by Semantic
Web experts
• We have to change our data
integration and management
approaches
The Power of Linked Open Data
• Enables web-scale data publishing - distributed publication with web-
based discovery mechanisms
• Everything is a resource – follow your nose to discover more about
properties, classes, or codes within a code list
• Everything can be annotated - make comments about observations,
data series, points on a map
• Easy to extend - create new properties as required, no need to plan
everything up-front
• Easy to merge - slot together RDF graphs, no need to worry about name
clashes
• Easy use and re-use on top of common schemas AND schema mapping
• Allows complex querying of several distributed data sources & systems
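The "easy to merge" and "follow your nose" points can be illustrated with a minimal Python sketch that treats each dataset as a set of triples. The datasets, the `http://example.org/` URIs and the `describe` helper are hypothetical; real RDF merging also has to handle blank nodes, which this sketch ignores.

```python
# Two hypothetical datasets, each a set of (subject, predicate, object) triples.
EX = "http://example.org/"

finance = {
    (EX + "dept/7", EX + "budget", "120000"),
    (EX + "dept/7", EX + "name", "Student Affairs"),
}
assets = {
    (EX + "asset/42", EX + "ownedBy", EX + "dept/7"),
    (EX + "dept/7", EX + "name", "Student Affairs"),  # same statement, stated twice
}

# Merging is set union: shared URIs line up, duplicate statements collapse.
merged = finance | assets


def describe(graph, resource):
    """Follow your nose: everything said about a resource, as subject or object."""
    return sorted(t for t in graph if resource in (t[0], t[2]))


about_dept = describe(merged, EX + "dept/7")
for statement in about_dept:
    print(statement)
```

Because both datasets use the same URI for the department, no join key or schema alignment is needed before the union; name clashes cannot occur as long as URIs are globally unique.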
The Benefits of Linked Open Data
• Less replication (offering same
datasets in different places)
• Encouragement to re-use existing
datasets
• Clear which datasets are providing
similar / same information
• More innovation because datasets
can be put in a new context and
lead to interesting applications
• Put information in context and
thereby create knowledge
Cost of Data Integration – 2 Approaches
Source: PricewaterhouseCoopers – Technology Forecast, Spring 2009
Can we afford to
mash the data with
ours?
Q & A
End of the Day (tomorrow hackathon for Open Gov Data)