Reading Group: From Database to Dataspaces


Upload: juergen-umbrich

Post on 20-Jan-2015


Page 1: Reading Group: From Database to Dataspaces

From Databases to Dataspaces*

Wearing the Linked Data goggles

* M. Franklin, A. Halevy, D. Maier in ACM SIGMOD Record, Dec. 2005

DERI reading group presentation 23.02.2011 PhD J. Umbrich


Background of the paper

• Motivation of the paper in 2005
– Development of relational database management systems showed spectacular results
– BUT: “data everywhere” and use cases relying on large amounts of diverse, interrelated data sources pose new challenges for data management

• The authors
– M. Franklin: UC Berkeley, large-scale data management
– A. Halevy: Google Inc., usage of structured data in web search
– D. Maier: Portland State University, coined Datalog, data stream processing


Topic of the paper

Dataspaces and their support systems as a new agenda for data management


The Problem: Data Management

• Loosely connected data sources
• Information is available in various formats
• Not always control over the data
• Low-level data management challenges across heterogeneous collections
– Search & querying
– Integrity constraints
– Naming conventions
– Tracking lineage
– Availability & recovery
– Access control
– (Meta) data evolution
– Enforcing rules


The Solution

• Define space of data
– Identifiable scope and control across the data and underlying systems

• DataSpace Support Platforms (DSSPs)
– Offer a suite of interrelated services and guarantees over self-managed data sources (no complete data control)

• Pay-as-you-go
– Keyword search is the bare minimum
– More function and increased consistency as you add work
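
The pay-as-you-go idea above can be illustrated with a minimal sketch, assuming a toy in-memory dataspace. Every source answers keyword search from the start; a source gains field-based querying only after someone invests the work of registering a mapping for it. The class `ToyDataspace` and all of its methods are hypothetical names for illustration, not an API from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataspace:
    """Hypothetical in-memory dataspace, for illustration only."""
    sources: dict = field(default_factory=dict)   # source name -> raw text records
    mappings: dict = field(default_factory=dict)  # source name -> record parser

    def add_source(self, name, records):
        self.sources[name] = list(records)

    def keyword_search(self, word):
        """Bare minimum service: works on every source without any setup."""
        word = word.lower()
        return [(name, rec) for name, recs in self.sources.items()
                for rec in recs if word in rec.lower()]

    def add_mapping(self, name, parser):
        """Extra human work: teach the system the structure of one source."""
        self.mappings[name] = parser

    def structured_query(self, field_name, value):
        """Only covers sources for which a mapping has been registered."""
        hits = []
        for name, parser in self.mappings.items():
            for rec in self.sources[name]:
                if parser(rec).get(field_name) == value:
                    hits.append((name, rec))
        return hits


ds = ToyDataspace()
ds.add_source("contacts.csv", ["alice,Galway", "bob,Berlin"])
ds.add_source("notes.txt", ["Met Alice in Galway last week"])

keyword_hits = ds.keyword_search("galway")      # both sources can answer
before = ds.structured_query("city", "Galway")  # no mapping yet, so no hits
ds.add_mapping("contacts.csv",
               lambda rec: dict(zip(("name", "city"), rec.split(","))))
after = ds.structured_query("city", "Galway")   # one precise hit
```

The keyword search is imprecise but always available, while the structured query becomes useful only for the sources where the mapping effort has been spent.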


DataSpaces: System


DataSpaces: Logical Components

• Data co-existence approach (not data integration)
• Contains all information relevant to a particular organisation regardless of format and location
• Models a rich collection of relationships between data repositories

Participants

• Individual data sources
• RDBs, XML, text, services
• Stored or streamed
• Different query support
• Support updates or read only

Relations

• Any kind of relationship
• A replica of B
• C mapping for A and B
• Broader set of relations
• E and F created independently but cover same physical system
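
One way to picture participants and relations is as two small record types. This is only a sketch under assumptions: the field names (`kind`, `streamed`, `query_support`, `read_only`) and the relationship labels are invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    name: str
    kind: str           # e.g. "rdb", "xml", "text", "service"
    streamed: bool      # False = stored, True = streamed
    query_support: str  # e.g. "sql", "xpath", "keyword"
    read_only: bool


@dataclass(frozen=True)
class Relationship:
    kind: str       # free-form label, since any kind of relationship is allowed
    members: tuple  # names of the participants involved


participants = {p.name: p for p in [
    Participant("A", "rdb", False, "sql", False),
    Participant("B", "rdb", False, "sql", True),
    Participant("C", "xml", False, "xpath", True),
]}

relationships = [
    Relationship("replica_of", ("A", "B")),    # A is a replica of B
    Relationship("mapping", ("C", "A", "B")),  # C holds a mapping for A and B
]


def related_to(name):
    """Return every relationship in which the named participant takes part."""
    return [r for r in relationships if name in r.members]
```

Keeping the relationship label free-form reflects the "any kind of relationship" bullet: the model does not fix a closed vocabulary.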


DataSpaces: Services

• Content heterogeneity requires multiple styles of data access
• Cataloging data resources (source, name, size, creation date, location)
• Search as a primary mechanism to deal with large collections and unfamiliar data (similarity search, ranking)
– Search is applicable to all content of the dataspace regardless of data format (including metadata)
• Updates (major research)
• Monitoring, event detection, support for complex workflows


DataSpaces: System

Source: Franklin et al.: From Databases to Dataspaces, SIGMOD Rec. 2005


DSSP: Catalog

• Contains information about all the participants
– E.g. rate of change, query answering, statistics, ownership, access, privacy policies, relationships
• Basic inventory
– Identifier, type, creation date
• Answering presence or absence of a data element
• Model management environment on top of the catalog
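
A minimal sketch of such a catalog, assuming it is just an in-memory registry: each entry carries the basic inventory fields, optional richer metadata, and the set of data elements known to exist in that participant. `CatalogEntry`, `Catalog`, and the sample sources are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    identifier: str
    type: str
    creation_date: date
    extra: dict = field(default_factory=dict)    # owner, rate of change, ...
    elements: set = field(default_factory=set)   # data elements known to exist


class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.identifier] = entry

    def inventory(self):
        """Basic inventory: identifier, type and creation date per participant."""
        return sorted((e.identifier, e.type, e.creation_date.isoformat())
                      for e in self._entries.values())

    def where_is(self, element):
        """Answer presence or absence of a data element across participants."""
        return sorted(e.identifier for e in self._entries.values()
                      if element in e.elements)


catalog = Catalog()
catalog.register(CatalogEntry("hr_db", "rdb", date(2004, 3, 1),
                              extra={"owner": "HR", "rate_of_change": "daily"},
                              elements={"employee", "salary"}))
catalog.register(CatalogEntry("wiki", "text", date(2005, 6, 15),
                              elements={"employee", "project"}))
```

An empty result from `where_is` is itself an answer: the catalog can state that no registered participant is known to hold the element.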


DSSP: Search & Query

• Query everything
– Query data items regardless of format
– Keyword search

• Structured query
– Common interfaces (mediated schema)
– Over specific source
– Peer-data management systems
– Various query formats with mappings

• Meta-data queries
– Result sources, timestamps, uncertainty
– Source location and similarity queries

• Monitoring
– Stateless or stateful
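
The difference between stateless and stateful monitoring can be sketched as follows, assuming that "stateless" means a decision based on a single update and "stateful" means a decision that depends on earlier updates. Both monitors, the update format, and the threshold rule are illustrative assumptions.

```python
def stateless_monitor(update, keyword="error"):
    """Fires based on the single update alone; keeps no history."""
    return keyword in update["text"].lower()


class StatefulMonitor:
    """Fires once a source has produced `threshold` updates in total."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = {}  # source name -> number of updates seen so far

    def observe(self, update):
        src = update["source"]
        self.counts[src] = self.counts.get(src, 0) + 1
        return self.counts[src] >= self.threshold


updates = [
    {"source": "s1", "text": "ok"},
    {"source": "s1", "text": "Error: disk full"},
    {"source": "s1", "text": "ok"},
]

stateless_alerts = [stateless_monitor(u) for u in updates]
monitor = StatefulMonitor(threshold=3)
stateful_alerts = [monitor.observe(u) for u in updates]
```

Here the stateless monitor fires only on the second update, while the stateful one fires on the third because it has counted the two before it.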


DSSP: Local Store and Index

• Create efficiently queryable associations between participants
• Improve access to data sources with limited access patterns
• Data replication
• Support of high availability and recovery
• Highly adaptive to heterogeneous data
• Identifies information across participants
• Robust to multiple references to the same real-world object
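
As a rough illustration of an index that spans participants, the sketch below builds a plain inverted index from tokens to (participant, item) pairs. It assumes every item can be reduced to text; the function names and sample data are hypothetical, and a real DSSP index would need to handle far more than this.

```python
import re
from collections import defaultdict


def build_index(participants):
    """Map each lower-cased token to the (participant, item id) pairs containing it."""
    index = defaultdict(set)
    for source, items in participants.items():
        for item_id, text in items.items():
            for token in re.findall(r"\w+", text.lower()):
                index[token].add((source, item_id))
    return index


def lookup(index, *words):
    """Return items that contain all the given words, from any participant."""
    postings = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*postings)) if postings else []


participants = {
    "mail":  {"m1": "Budget meeting with Alice", "m2": "Lunch on Friday"},
    "files": {"f1": "budget 2005 spreadsheet, owner Alice", "f2": "holiday photos"},
}
index = build_index(participants)
budget_items = lookup(index, "budget", "alice")
```

Because postings carry the participant name, one lookup associates an e-mail and a file that mention the same terms even though they live in different sources.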


DSSP: Discovery

• Locating participants

• Creation of relationships
– Semi-automatically

• Monitoring/Learning


DSSP: Enhancement

• Imbue participants with additional capabilities
– Schema
– Keyword search
– Update monitoring


Research Challenges

• Data models and querying
• Dataspace discovery
• Reusing human attention
• Dataspace storage and indexing
• Correctness guarantees
• Theoretical foundations


Data Models and Querying

• Heterogeneous data models and query languages

• Query reformulation (complex -> simple and vice versa)

• Hierarchy of query languages (pay-as-you-go)
– File system-like queries
– Keyword query (bag-of-words)
– Path/containment queries (semi-structured)
– Structured queries (XML, RDF, OWL)
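
Reformulation in the complex -> simple direction can be sketched as relaxing a structured query into a bag of words for sources that only offer keyword search. This is a toy illustration under assumptions: the source descriptions, the `supports` flag, and both functions are hypothetical.

```python
def to_keyword_query(structured):
    """Relax a structured query (field -> value) to a bag of words.

    Field names are dropped, so the result is less precise but can be
    sent to any source that only offers keyword search.
    """
    words = []
    for value in structured.values():
        words.extend(str(value).lower().split())
    return sorted(set(words))


def answer(source, structured):
    """Use the richest query form the source supports."""
    if source["supports"] == "structured":
        return [r for r in source["records"]
                if all(r.get(f) == v for f, v in structured.items())]
    words = to_keyword_query(structured)
    return [r for r in source["records"]
            if all(w in r.lower() for w in words)]


db = {"supports": "structured",
      "records": [{"author": "Smith", "year": 2005},
                  {"author": "Jones", "year": 1999}]}
docs = {"supports": "keyword",
        "records": ["Smith, workshop report 2005", "Notes from the 2005 meeting"]}

query = {"author": "Smith", "year": 2005}
db_hits = answer(db, query)
doc_hits = answer(docs, query)
```

The same query reaches both sources, with exact matching where structure is available and a looser word match elsewhere.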


DataSpace Discovery

• Locate participants

• Semi-automatic tool for clustering and finding relationships between data sources

• Creation of more precise relationships


Reusing Human Attention

• Semantic integration evolves over time

• Humans are the scarcest resource

• Machine learning


Storage & Indexing

• Heterogeneity of the index (different data formats)

• Ideally, uniform indexing of all data items

• Dealing with multiple identifiers for the same real-world thing

• Updates

• Automated tuning: which data items to cache, which indexes to build?
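
One common way to handle multiple identifiers for the same real-world thing is a union-find (disjoint-set) structure that groups identifiers once they are known to be equivalent. The slides do not prescribe this technique; it is swapped in here as an illustration, and the identifier strings are made up.

```python
class IdentifierSets:
    """Union-find over identifiers; each set stands for one real-world thing."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative identifier of x's set."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def same_as(self, a, b):
        """Record that identifiers a and b denote the same thing."""
        self.parent[self.find(a)] = self.find(b)


ids = IdentifierSets()
ids.same_as("mail:alice@example.org", "hr:emp-17")
ids.same_as("hr:emp-17", "wiki:Alice")

same = ids.find("mail:alice@example.org") == ids.find("wiki:Alice")
different = ids.find("hr:emp-99") != ids.find("wiki:Alice")
```

An index can then store entries under the representative returned by `find`, so a lookup by any of the equivalent identifiers reaches the same entries.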


Correctness guarantees

• Quality of answers for accessing disparate data sources
– Involving updates

• Define levels of service guarantees

• Rethinking of fundamental data management principles

• Inherent tradeoffs in terms of quality, performance and control


Theoretical Foundations

• Formal understanding of different data models

• What queries are expressible over a dataspace?

• Detection of semantically equivalent but syntactically different query languages?


Linked Data …

… as a major step towards a concrete implementation of a dataspace support platform?

Use and reuse of HTTP URIs for real-world things

Provide useful (self-descriptive) content in RDF


Data Models and Querying

• Unified data model (RDF)
• URIs as identifiers for real-world things
• Linkage as relationships between sources and entities
• Data co-exists (everyone can say everything about everybody)

Keyword query (bag-of-words)

SPARQL
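
To make the RDF view concrete, the sketch below stores triples as plain Python tuples and matches a single triple pattern in the spirit of a SPARQL basic graph pattern. It is a simplification under assumptions: URIs are ordinary strings, the `example.org` resources are invented, and `match` is a hypothetical helper rather than a SPARQL engine.

```python
# A triple is (subject, predicate, object); URIs are plain strings here.
EX = "http://example.org/"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

triples = {
    (EX + "alice", FOAF_NAME, "Alice"),
    (EX + "bob", FOAF_NAME, "Bob"),
    (EX + "alice", FOAF_KNOWS, EX + "bob"),  # a link between two entities
}


def match(pattern, data):
    """Return variable bindings for one triple pattern.

    Pattern terms starting with '?' are variables, as in SPARQL;
    every other term must match the triple exactly.
    """
    results = []
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


knows = match(("?who", FOAF_KNOWS, "?whom"), triples)
names = sorted(b["?n"] for b in match(("?s", FOAF_NAME, "?n"), triples))
```

The first call corresponds roughly to the SPARQL pattern `{ ?who foaf:knows ?whom }`; the link it returns is exactly the kind of relationship between entities that the slide refers to.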


Remaining Challenges

• Querying
– Meta data queries

• Discovery
– Link traversal, link creation
– Reasoning, graph mining

• Storage & Indexing
– Consolidation

• Correctness guarantees
• Reuse human attention
• Updates / Monitoring
• Data access / privacy
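
Discovery by link traversal can be sketched as a breadth-first walk that starts from a seed URI and follows links up to a fixed depth. The `WEB` dictionary below is a hypothetical stand-in for dereferencing URIs over HTTP, and `discover` is an illustrative function, not a tool named in the slides.

```python
from collections import deque

# Hypothetical stand-in for the Web: document URI -> URIs it links to.
WEB = {
    "http://example.org/a": ["http://example.org/b", "http://example.com/c"],
    "http://example.org/b": ["http://example.org/a"],
    "http://example.com/c": ["http://example.net/d"],
    "http://example.net/d": [],
}


def discover(seed, fetch_links, max_depth=2):
    """Breadth-first link traversal from a seed URI, at most max_depth hops."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        uri, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in fetch_links(uri):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


def fetch(uri):
    """Look up outgoing links in the toy WEB; unknown URIs have none."""
    return WEB.get(uri, [])


one_hop = discover("http://example.org/a", fetch, max_depth=1)
two_hops = discover("http://example.org/a", fetch, max_depth=2)
```

The depth limit is the simplest possible stopping rule; it keeps the walk finite even though the link graph contains a cycle between the first two documents.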


LinkedData: DataSpaces

Source: Franklin et al.: From Databases to Dataspaces, SIGMOD Rec. 2005


DSSPs Examples

• Search engines (SWSE, Sindice, FalconS, …)
– Keyword search, ranking
– SPARQL

• Data access
– RDB2RDF, RDFizers

• Discovery
– SILK

• All-in-one
– Structured Dynamics LLC


Questions? Opinions?