© 2008 OpenLink Software, All rights reserved
Open Conceptual Data Models
Making the Conceptual Layer Real
via HTTP based Linked Data
Situation Analysis
Linked Data Vision: The transition of the HTTP based Webs (Intranet, Extranet, or Internet) from Webs of Linked Documents to Webs of interlinked Structured Data Items (aka entities, data objects, resources).
Concurrent trend in the IT industry: A recognition of the benefits of conceptual data models over logical data models.
The Big Question: To what extent does Linked Data support conceptual level data models?
Open Conceptual Data Models
Topics:
Conceptual & Logical Data Models
Conceptual Models for the Semantic Web
Realizing Conceptual Models through Ontologies & Linked Data
Virtuoso’s RDF based Linked Data Views
ADO.NET Data Services & the Entity Data Model
Data Model Layers
Physical: How data is physically represented on disk.
Logical (aka logical schema): Expresses the problem domain in terms of data management technology (tables / columns), e.g. a relational schema.
Conceptual (aka conceptual schema): A purely semantic description of the problem space. Describes things (entities), their characteristics (attributes) & associations between things (relationships).
Logical Data Model
Most prominent of the three data model types.
Main focus of database driven applications, due to the pervasiveness of relational database driven applications within the enterprise and across the Web.
Weaknesses:
Impedance mismatch
Loss of semantics during the development process
Heterogeneous databases & interoperability
Logical Data Model Weaknesses
Impedance Mismatch
SQL expresses queries in terms of tables / views => targets the logical schema.
Normalization fragments the data model.
Entities & their attributes may be split across several tables.
Navigation between objects requires relational joins over two or more tables.
Table rows must be reconstituted into higher level conceptual entities.
A conceptual level data model is desirable to: remove the impedance mismatch; isolate the application from changes to the logical data model; provide a framework for productive human level interaction.
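The impedance mismatch can be illustrated with a small sketch. It assumes a hypothetical normalized music schema (the table and column names below are illustrative, not taken from any real database) and uses Python's built-in sqlite3 module: a single conceptual entity has to be joined from three tables and then rebuilt by application code.

```python
import sqlite3

# Hypothetical normalized schema: one conceptual "artist with albums and
# tracks" entity is spread across three tables (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album  (id INTEGER PRIMARY KEY, artist_id INTEGER, title TEXT);
    CREATE TABLE track  (id INTEGER PRIMARY KEY, album_id INTEGER, title TEXT);
    INSERT INTO artist VALUES (1, 'John Lennon');
    INSERT INTO album  VALUES (10, 1, 'Imagine');
    INSERT INTO track  VALUES (100, 10, 'Imagine');
    INSERT INTO track  VALUES (101, 10, 'Jealous Guy');
""")

# Navigating artist -> album -> track needs joins over all three tables.
rows = conn.execute("""
    SELECT artist.name, album.title, track.title
    FROM artist
    JOIN album ON album.artist_id = artist.id
    JOIN track ON track.album_id = album.id
    ORDER BY track.id
""").fetchall()

# The flat result rows must be reconstituted into an entity-shaped structure.
artists = {}
for artist_name, album_title, track_title in rows:
    albums = artists.setdefault(artist_name, {})
    albums.setdefault(album_title, []).append(track_title)

print(artists)
```

The join and the regrouping loop are exactly the work a conceptual layer would take off the application.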
Logical Data Model Weaknesses
Loss of Semantic Fidelity During Development
Process: Develop the conceptual model (E-R modelling); transform it to a logical model for implementation; the DBMS generates the physical model.
Problems:
Each move to a lower level model degrades the semantic fidelity of the higher level model.
Conceptual model semantics are fragmented across schema / business rules / application code.
Applications & users must understand the logical data model, which must be hardcoded or inferred (imperfectly) from system tables.
Logical Data Model Weaknesses
Heterogeneous Databases & Interoperability
Logical data model: Describes the problem domain in terms of tables / columns. Requires costly table joins to navigate the model.
Application: Exposed to the specifics of a particular vendor’s RDBMS. In a heterogeneous database environment, it must handle different SQL dialects and different schemas.
No explicit data model. No explicit semantics. Interoperability / integration = perpetual problem for IT depts.
Conceptual Models for Linked Data Webs
The explosion of User Generated Data from Web 2.0 applications and their Data Silos is driving recognition of the need to move from logical to conceptual models, exemplified by:
Microsoft’s Entity Data Model / Entity Framework
W3C’s Semantic Web Project, which includes powerful technologies for this paradigm shift, such as:
Resource Description Framework (RDF Data Model and Data Representation Formats)
Web Ontology Language (OWL)
SPARQL (Query Language, RESTful Interface, and Query Result Serialization Formats)
Benefits of Conceptual Models
More faithfully represents the human view of the domain of interest.
Conceptual model & semantics: Explicit & available globally. Not implicit & fragmented across business logic / UI etc.
Better / explicit semantics facilitates the move from “search” to “esoteric precision find”.
Much easier heterogeneous data integration: User Generated Data is inherently heterogeneous & disparately located.
Application Areas – Present & Future
Social Media, eCommerce, Distributed Collaborative Apps: Require shareable, standards-based, cross-platform conceptual views of data.
Data portability: Needed as users maintain multiple points of presence & identity across blogs, social network accounts etc.
Open business models: Require exchange & integration of large amounts of data.
Scientific research (sharing of knowledge & findings): Requires transparent access to distributed heterogeneous data. Requires database integration using a global schema.
Autonomous intelligent agents: Free humans from large-volume data processing.
Semantic Web Project Technologies
These technologies offer:
Ontologies: For representing common semantics spanning databases, applications, enterprises, on-line communities. Deliver a shared conceptual model. Provide common schemas (Dublin Core, FOAF, SIOC, GoodRelations etc).
Common Semantics (Ontologies) & Common Data Representation (RDF): Enable cross data source querying using SPARQL. Data across several databases (or data spaces) can be meshed, expanded, and explored. Querying using proprietary APIs is unnecessary. Brute force data merging via code is unnecessary.
Open Data Formats, Platform Independence, Common Models: Facilitate data portability, accessibility, and integration.
Realizing Conceptual Models
Ontologies: Provide the building blocks for conceptual models. Define the concepts and their relationships in a domain of interest (or world view).
Describing Classes & Properties (Ontology Languages):
RDFS: Introduces the notions of concepts (classes) & instances.
OWL: Adds more vocabulary for describing relations between classes, cardinality, richer typing of properties, etc.
Goodness of Fit
RDF was designed from the ground up as a metadata data model
RDF / RDFS / OWL work directly at the level of conceptual models
Conceptual model terminology matches RDF/OWL terminology: concepts, entities, attributes, relationships.
A natural fit! RDF lends itself naturally to describing conceptual models.
Semantic Expressivity Comparison
Data Definition Language (DDL)-based Relational Model:
The relationship between two entities isn’t explicit.
A foreign key relating two rows in separate tables doesn’t express the nature of the relationship.
Semantics must often be inferred from table definitions.
RDF-based Conceptual Model:
The relationship between two entities is stated explicitly by the predicate in a subject-predicate-object triple.
The semantic expressivity of RDF/RDFS/OWL is much greater than that of DDL.
Has richer semantic content than the equivalent DDL-based logical/relational model.
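The contrast can be sketched in a few lines of Python. The row layout and the `http://example.org/music#` vocabulary (`made`, `hasTrack`, `wrote`) are hypothetical and only serve to show how a predicate names a relationship that a foreign key leaves implicit.

```python
# Relational view: the foreign key column says *that* the track row is
# related to an album row, but not *how*.
track_row = {"id": 100, "album_id": 10, "title": "Imagine"}

# RDF view: each statement is a (subject, predicate, object) triple, so the
# nature of every relationship is named by its predicate.
EX = "http://example.org/music#"  # hypothetical vocabulary namespace
triples = [
    (EX + "JohnLennon", EX + "made", EX + "ImagineAlbum"),
    (EX + "ImagineAlbum", EX + "hasTrack", EX + "ImagineTrack"),
    (EX + "JohnLennon", EX + "wrote", EX + "ImagineTrack"),
]


def relationships(subject, obj, triples):
    """Return the predicates that explicitly relate subject to obj."""
    return [p for s, p, o in triples if s == subject and o == obj]


print(relationships(EX + "JohnLennon", EX + "ImagineTrack", triples))
```

Asking how two items are related is a lookup over the data itself, with no need to read table definitions.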
RDF Conceptual Model – Artist / Records / Tracks
Global Granular Information Sharing
Traditional Logical/Relational Data Model:
The schema described by DDL is internal to the DBMS.
Primary keys identifying an individual table row (i.e. entity instance) are not globally unique and not easily usable outside the host DBMS.
Gives rise to ‘data silos’.
RDF’s use of Generic HTTP-based URIs:
Externalises the data and schema.
Makes both globally accessible & scalable.
Provides globally unique IDs for entities / relations / classes.
A vehicle for granular, global information sharing down to the equivalent of the record level.
Linked Data – What is It?
A method for exposing, sharing & connecting data on HTTP based Data Networks.
A term coined by Tim Berners-Lee that describes a RESTful mechanism for HTTP based Data Access & Manipulation by Reference
A record level HTTP based Open Data Access & Connectivity mechanism
A richer hyperlinking mechanism that takes us from Hypertext Links (Document to Document) to Hyperdata Links (Data Item to Data Item).
Linked Data – Why Is It Important
It exposes the compound nature of Data Containers (e.g., Documents) such that: Data Containers are uniquely identified & referenceable; Data Items within Data Containers are uniquely identified & referenceable.
It provides a conceptual model oriented Open Data Access & Connectivity mechanism.
It delivers a powerful mechanism for meshing disparate and heterogeneous data sources.
Linked Data Model
Changes the focus from linked documents to linked entities.
The document as a data container becomes less relevant.
Hyperdata Links Between Data Items
Linked Data Benefits – Data Exploration
Natural Navigation Through Typed Links:
RDF entities (instance data, classes, and properties) are identified by dereferenceable HTTP URIs.
Navigating from one data item to another is easy via: a single link click from any HTTP user agent, which commences data item relationship navigation; Linked Data Browsers such as OpenLink Data Explorer.
Relational/Logical Model: Cumbersome. Requires SQL joins + typically Object-Relational mapping, e.g. in C#: track = lennonAlbum.Tracks["Imagine"]
Linked Data Benefits - Aggregatable Data
Often desirable to have an integrated view of all the data available about an item or topic
Database Realm: Integration is problematic; logical schemas are difficult to combine.
Semantic Web: Data aggregation is easy, since every resource has a unique URI.
Individual items can be linked.
Conceptual models can be linked.
Cross-domain links enrich domain knowledge.
Different facets of the same data item may be described by different URIs minted by different authors. These can be linked, e.g. via owl:sameAs and rdf:type predicates, and may expose facts not directly represented in any one source.
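A minimal sketch of this kind of aggregation, assuming two hypothetical sources (`a.example` and `b.example`) that mint different URIs for the same person. Only the owl:sameAs and foaf:name URIs are real vocabulary terms; the birthPlace property is made up for illustration. Subjects connected by owl:sameAs are grouped with a small union-find, and their facts are merged under one representative URI.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
BIRTH_PLACE = "http://b.example/vocab/birthPlace"  # hypothetical property

A = "http://a.example/artist/1"        # hypothetical URI minted by source A
B = "http://b.example/people/lennon"   # hypothetical URI minted by source B

source_a = [(A, FOAF_NAME, "John Lennon")]
source_b = [(B, BIRTH_PLACE, "Liverpool")]
links = [(A, OWL_SAME_AS, B)]


def aggregate(triples):
    """Merge facts about subjects that are connected by owl:sameAs."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(o)] = find(s)

    merged = {}
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged.setdefault(find(s), set()).add((p, o))
    return merged


view = aggregate(source_a + source_b + links)
print(view)
```

For brevity the sketch only canonicalizes subjects; a fuller treatment would also rewrite URIs that appear in the object position.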
Linked Data – Data Aggregation
Linked Data Benefits - Self Describing Data
Resource Description Framework (RDF):
A technology for creating self-describing Web resources.
A Data Item’s type definition ‘accompanies’ it via rdf:type relations.
RDF based data can be queried using SPARQL without knowing anything beforehand about the data definition (schema comes last in this realm).
Provides the basis for powerful deductive data exploration tools.
Logical / Relational Schema:
Users / applications need a detailed understanding of the schema to use and navigate the data.
The application’s knowledge of the schema is typically hardcoded.
Ad-hoc end-user data exploration is potentially error prone.
Linked Data Benefits - SPARQL
If a user agent has no built-in knowledge of a particular Data Item, it can dereference its Generic HTTP URI to obtain such information.
The Power of SPARQL
Discover what sorts of things a data source contains:
select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }
Determine all the properties of a data item’s class:
select * where { <http://my.org/resourceTypes/Department> ?property ?hasValue }
Determine all the properties and values of a data item instance:
DESCRIBE <http://my.org/resource/Accounts>
No prior knowledge of the RDF data source is needed.
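As a rough sketch of how a client might submit these queries, the SPARQL Protocol lets a query be sent over HTTP GET as a URL-encoded `query` parameter. The endpoint address `http://my.org/sparql` is an assumption made for illustration, and the code only builds the request URLs without contacting any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://my.org/sparql"  # hypothetical SPARQL endpoint address

# The exploratory queries from the slide, unchanged.
DISCOVER_TYPES = "select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }"
CLASS_PROPERTIES = (
    "select * where { <http://my.org/resourceTypes/Department> "
    "?property ?hasValue }"
)
DESCRIBE_ITEM = "DESCRIBE <http://my.org/resource/Accounts>"


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying the query text."""
    return endpoint + "?" + urlencode({"query": query})


for q in (DISCOVER_TYPES, CLASS_PROPERTIES, DESCRIBE_ITEM):
    print(sparql_get_url(ENDPOINT, q))
```

None of the three requests encodes any schema knowledge beyond the URI of the item being asked about.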
Virtuoso - Linked Data Generation Options
Conceptual layer insulates Linked Data consumers from RDFization infrastructure & data source heterogeneity
Virtuoso RDF based Linked Data Views
Expose relational model data as RDF graph model data.
Provide the means to move from a logical model to a conceptual model view.
Available for querying through SPARQL or SPASQL (SPARQL embedded in SQL).
No physical regeneration of relational data.
RDF Views = Virtuoso RDF Meta-Schema + Meta-Schema Language (MSL)
MSL = A domain specific, declarative language for mapping a logical SQL data model to a conceptual RDF data model
Northwind Demo Database: RDF View Definition Extract
Source table Demo.demo.Customers (columns): Customer ID, Company Name, Contact Name, Contact Title, Address, City, Postal Code, Country, Phone, Fax
Northwind RDF View Definition:
prefix northwind: <http://www.openlinksw.com/schemas/northwind#>
…
create iri class northwind:Customer <http://^{URIQADefaultHost}^/Northwind/Customer/%U#this> (in customer_id varchar not null)
…
alter quad storage virtrdf:DefaultQuadStorage
…
from Demo.demo.Customers as customers
from Demo.demo.Orders as orders … {
create virtrdf:NorthwindDemo as graph iri ("http://^{URIQADefaultHost}^/Northwind") {
…
northwind:Customer(customers.CustomerID) a foaf:Organization as virtrdf:Customer-CustomerID ;
northwind:companyName customers.CompanyName as … ;
…
northwind:fax customers.Fax as virtrdf:Customer-fax .
…
northwind:Customer(orders.CustomerID) northwind:has_order northwind:Order(orders.OrderID) as virtrdf:Order-has_order .
} }
Northwind Demo Database: Customer Table to RDF Data Item Mapping
Customers table row:
Customer ID: ALFKI
Company Name: Alfreds Futterkiste
Contact Name: Maria Anders
Contact Title: Sales Representative
Address: Obere Str. 57
City: Berlin
Postal Code: 12209
Country: Germany
Phone: 030 - 0074321
Fax: 030 - 0076545
RDF data item Customer/ALFKI#this (prefix <http://demo.openlinksw.com/Northwind/>):
companyName: Alfreds Futterkiste
contactName: Maria Anders
contactTitle: Sales Representative
address: Obere Str. 57
city: Berlin
PostalCode: 12209
country: Germany
phone: 030-0074321
fax: 030-0076545
has_order: Order/10643#this
has_order: Order/10692#this
…
Order/10643#this has_customer Customer/ALFKI#this
Order/10692#this has_customer Customer/ALFKI#this
Orders Table (Order ID, CustomerID, …):
10643, ALFKI, …
10692, ALFKI, …
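The mapping on this slide can be approximated in Python. This is only a sketch of the idea, not Virtuoso's implementation: the `ContactName` and `City` column names are inferred from the table header, placing `has_customer` in the northwind namespace is an assumption, and `urllib.parse.quote` stands in for the `%U` placeholder of the IRI class.

```python
from urllib.parse import quote

NORTHWIND = "http://www.openlinksw.com/schemas/northwind#"
BASE = "http://demo.openlinksw.com/Northwind/"

# Column -> property mapping (a subset; some column names are assumptions).
COLUMN_TO_PROPERTY = {
    "CompanyName": "companyName",
    "ContactName": "contactName",
    "City": "city",
    "Fax": "fax",
}


def customer_iri(customer_id):
    """Mint a global HTTP URI from the table-local primary key."""
    return f"{BASE}Customer/{quote(customer_id, safe='')}#this"


def order_iri(order_id):
    return f"{BASE}Order/{order_id}#this"


def customer_triples(row):
    """One triple per mapped, non-null column of a Customers row."""
    subject = customer_iri(row["CustomerID"])
    return [
        (subject, NORTHWIND + prop, row[col])
        for col, prop in COLUMN_TO_PROPERTY.items()
        if row.get(col) is not None
    ]


def order_triples(order):
    """The foreign key becomes two explicitly named links."""
    customer = customer_iri(order["CustomerID"])
    this_order = order_iri(order["OrderID"])
    return [
        (customer, NORTHWIND + "has_order", this_order),
        (this_order, NORTHWIND + "has_customer", customer),
    ]


alfki = {"CustomerID": "ALFKI", "CompanyName": "Alfreds Futterkiste",
         "ContactName": "Maria Anders", "City": "Berlin", "Fax": "030-0076545"}
orders = [{"OrderID": 10643, "CustomerID": "ALFKI"},
          {"OrderID": 10692, "CustomerID": "ALFKI"}]

graph = customer_triples(alfki) + [t for o in orders for t in order_triples(o)]
for triple in graph:
    print(triple)
```

Unlike this sketch, an RDF View produces no physical triples; the mapping is applied when the relational data is queried.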
LinqToRdf + Virtuoso
LinqToRdf to MusicBrainz - Conceptual Model Veneer
ADO.NET Data Services & Entity Data Model
A framework for exposing ‘pure data’ services over HTTP.
No support for RDF: Gains none of RDF’s inherent benefits.
Lack of platform independence & standards compliance:
Supports REST-style interfaces.
Supports Atom, JSON and XML payloads.
But: Server-side is Windows only. Consuming Astoria services at a higher level requires a Windows .NET client or a Silverlight-supported browser.
ADO.NET Data Services & Entity Data Model
Server-side only conceptual model:
Powerful URL addressing to query / navigate / sort / filter etc.
Customers collection: http://myserver/data.svc/Customers
Customer ALFKI: http://myserver/data.svc/Customers('ALFKI')
Customer ALFKI's orders: http://myserver/data.svc/Customers('ALFKI')/Orders
But the client must know the conceptual schema, e.g. to construct the above URIs.
Lack of Dereferenceable Entity IDs:
The ability to discover entities and dereference their descriptions (attributes / relations) is confined to the facilities offered by .NET.
c.f. SPARQL’s ability to handle unknown data sources.
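The addressing scheme above can be sketched with a small helper. The function name and parameters are hypothetical, and it assumes string keys without embedded quotes. Its arguments show the point being made: the caller has to supply the entity set name, the key, and the navigation property, all of which come from prior knowledge of the conceptual schema.

```python
def entity_url(service_root, entity_set, key=None, navigation=None):
    """Compose a data service URL from schema terms the client already knows."""
    url = f"{service_root}/{entity_set}"
    if key is not None:
        url += f"('{key}')"       # string key in single quotes, as on the slide
    if navigation is not None:
        url += f"/{navigation}"   # follow a relationship by its property name
    return url


ROOT = "http://myserver/data.svc"
print(entity_url(ROOT, "Customers"))
print(entity_url(ROOT, "Customers", "ALFKI"))
print(entity_url(ROOT, "Customers", "ALFKI", "Orders"))
```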
ADO.NET Data Services & Entity Data Model
No Support for Non-SQL Data Sources: Astoria is aimed exclusively at making relational data Web accessible.
c.f. Linked Data Realm:
Recognizes that vast amounts of data reside in unstructured and semi-structured data sources.
Support for embedding RDF into existing (X)HTML: RDFa, GRDDL, eRDF.
Emerging tools for converting non-RDF data to RDF model data.
Emerging tools for exposing Relational data as RDF Graph Model data.
Astoria lacks the scalability & scope of Semantic Web technologies.