VLDB ’99, Edinburgh, Scotland

Capturing and Querying Multiple Aspects of Semistructured Data

Curtis Dyreson
(formerly) Dept. of Comp. Sci., James Cook University

Michael Böhlen, Christian S. Jensen
Nykredit Center for Database Research
Department of Computer Science, Aalborg University
www.cs.auc.dk/NDB


Outline

• meta-data
• representation: properties
• queries: collapse, match, coalesce
• AUCQL
• summary

Meta-data

• database meta-data: schema, security, transaction time
• web meta-data: author, language, subject (Dublin Core), privacy
• web ‘meta-data’ standards: RDF, P3P
• intrinsic: informational, but also exclusional; irregular; ad-hoc

Movie database

• movie data
  Bruce Willis stars in Colour of Night.
  Colour of Night premiered 1/Jul/1995.
• publication meta-data
  language: English
  URL: http://www.auc.dk
  publication date: 2/Apr/1997
  privacy/security: ‘over 18’
  publication history: v1.2, modified 31/Jul/1998
  subject: Film, Suspense, Thriller
• queries
  Retrieve information published at Danish web sites.
  Find reviews published in the first week of the movie’s release.
  Get suspense films starring Bruce Willis.

A semistructured database

• database: edges with labels, nodes, values
• meta-data: schema, security, language, URL, subject, time

[Figure: a graph whose nodes include &1 and &2, with edges labelled movie, name, star, age and Oscars, and the value Bruce Willis; further nodes are elided (...).]

Properties

• property name: property value
• default name property
• A label is a set of properties.

[Figure: the edge to node &1 (Colour of Night) drawn three times: with the plain label title, with the label {name: title}, and with the label {name: title, URL: www.movie.com}.]
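A minimal Python sketch of the idea on this slide, assuming a label is modelled as a dictionary from property names to values. The helper name `make_label` is hypothetical and not part of AUCQL; it only illustrates how a plain label can be read as the value of the default name property.

```python
from typing import Dict, Union

# Assumption: a label is a mapping from property name to property value.
Label = Dict[str, str]


def make_label(label: Union[str, Label]) -> Label:
    """Normalise a label: a bare string becomes the default name property."""
    if isinstance(label, str):
        return {"name": label}
    return dict(label)


plain = make_label("title")                       # old-style label
explicit = make_label({"name": "title"})          # same label, name made explicit
with_url = make_label({"name": "title", "URL": "www.movie.com"})

print(plain == explicit)   # True
print(sorted(with_url))    # ['URL', 'name']
```

Treating the bare string as shorthand keeps old-style labels and property-set labels interchangeable in the rest of a program.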

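Label semistructure

• meta-meta-data: Joe authored the URL meta-data

[Figure: the label {name: title, URL: www.movie.com} on the edge to &1 (Colour of Night) drawn as a semistructure of its own: a name edge to title and a URL edge to www.movie.com, where the URL edge itself carries the label {name: URL, author: Joe}.]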

Properties (continued)

• required properties
• missing properties

[Figure: the edge to &1 is labelled {name: movie, security! over 18}; the "!" marks security as required, and the URL property is missing. The edge to &2 (Colour of Night) is labelled {name: title, URL: www.movie.com}; here the security property is missing.]

Property semantics

• transaction time example

[Figure: nodes &1, &2 and &3 joined by edges labelled {name: reviewed, trans. time: [1/Sep/1999 - uc]}, {name: movie}, {name: title, trans. time: [1/Aug/1998 - uc]} and {name: title, trans. time: [2/Apr/1997 - 31/Jul/1998]}, with the values Colour of Night and Color of Night. One edge sequence through &1, &2 and &3 is marked “Not a path!”.]

Using an existing model

• meta-data and data edges
• retrieve titles of reviewed movies

SELECT X.data
FROM reviewed R, R.movie M, M.title X
WHERE R.metadata.transtime INTERSECT M.metadata.transtime
AND M.metadata.transtime INTERSECT X.metadata.transtime

[Figure: the title edge from &1 leads to &2; &2 has a data edge to Colour of Night and a metadata edge to &3; &3 has a transtime edge to 1/Aug/1998 - uc.]

Design flaws

• query must enforce semantics to avoid fictive results

SELECT X.data
FROM *.title X

• wildcard unintentionally accesses meta-data
• no means of enforcing required properties
• even correctly formed queries are brittle
• user guesses at meta-data encoding


Shortest paths

[Figure: two paths with edge weights 5, 17 and 3, 12. Collapse (sum) reduces each path to a single edge, 22 and 15; Coalesce (min) then combines the two collapsed edges.]
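The shortest-path analogy can be sketched in a few lines of Python: Collapse folds the weights along one path, Coalesce folds the results across alternative paths. The function names and the list-of-lists encoding of the two paths are assumptions made for illustration only.

```python
from functools import reduce

# The two paths from the figure, each given as a list of edge weights.
paths = [[5, 17], [3, 12]]


def collapse(path, combine):
    """Reduce the values along one path to a single value."""
    return reduce(combine, path)


def coalesce(values, combine):
    """Reduce the values of alternative (parallel) edges to one value."""
    return reduce(combine, values)


collapsed = [collapse(p, lambda a, b: a + b) for p in paths]  # sum per path
shortest = coalesce(collapsed, min)                           # min across paths

print(collapsed)  # [22, 15]
print(shortest)   # 15
```

Swapping the two combining functions for property-specific ones is what turns this arithmetic example into the label operations discussed in the rest of the talk.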

Collapse

• Collapse the information along a path to a single edge.

[Figure: the transaction time example graph, with edges {name: reviewed, trans. time: [1/Sep/1999 - uc]}, {name: movie}, {name: title, trans. time: [2/Apr/1997 - 31/Jul/1998]} and {name: title, trans. time: [1/Aug/1998 - uc]} over nodes &1, &2 and &3 and the values Color of Night and Colour of Night. The labels of the two collapsed edges are shown as “?”.]

Collapse example

• PropertyCollapse for name is concatenation, for trans. time it is temporal intersection.

[Figure: the same graph with the two collapsed edges filled in: {name: reviewed.movie.title, trans. time: [1/Sep/1999 - uc]} and {name: reviewed.movie.title, trans. time: undefined}.]

Match (retrieval)

• find paths that meet some condition(s)
• path regular expression
  role - exact match, e.g., title
  regular expression operators (.|?*+)

(reviewed.movie)*.(title | name)

• only label matching changes
  labels are sets of properties
  required properties
  values may be from non-string domains, use PropertyMatch
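For the backwards-compatible case where every label is just a name string, the path expression above can be illustrated with Python's `re` module. This is a hand translation under the assumption that a path is written as its dot-separated names; it does not cover labels with further properties, which need LabelMatch instead of string comparison.

```python
import re

# Hand translation of (reviewed.movie)*.(title | name):
# zero or more "reviewed.movie." prefixes followed by "title" or "name".
PATH_RE = re.compile(r"(?:reviewed\.movie\.)*(?:title|name)")


def path_matches(names):
    """Check a path, given as a list of edge names, against the expression."""
    return PATH_RE.fullmatch(".".join(names)) is not None


print(path_matches(["reviewed", "movie", "title"]))  # True
print(path_matches(["name"]))                        # True
print(path_matches(["reviewed", "title"]))           # False
```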

LabelMatch example

query role: {name! movie, trans. time: [now - now]}
label in database: {name: movie, security! over 18, URL: www.movie.com}

• name property - ‘movie’ compares to ‘movie’, continue
• transaction time property - missing in target, continue
• URL property - missing in query, continue
• security property - required by database, no match!
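The four steps of this example can be sketched in Python as follows. This is one reading of the slide, not the prototype's code: the `Prop` type, the function names, and the use of plain equality as the PropertyMatch for every property are assumptions.

```python
from typing import Dict, NamedTuple


class Prop(NamedTuple):
    value: object
    required: bool = False   # True models the "!" marker


Label = Dict[str, Prop]


def property_match(name: str, a: object, b: object) -> bool:
    # Assumption: default "=" semantics for every property.
    return a == b


def label_match(query: Label, target: Label) -> bool:
    for name in query.keys() | target.keys():
        q, t = query.get(name), target.get(name)
        if q is not None and t is not None:
            # Property present on both sides: values must match.
            if not property_match(name, q.value, t.value):
                return False
        else:
            # Property missing on one side: fails only if it is required.
            present = q if q is not None else t
            if present.required:
                return False
    return True


query = {"name": Prop("movie", required=True),
         "trans. time": Prop(("now", "now"))}
database = {"name": Prop("movie"),
            "security": Prop("over 18", required=True),
            "URL": Prop("www.movie.com")}

print(label_match(query, database))  # False: security is required but not in the query

without_security = {k: v for k, v in database.items() if k != "security"}
print(label_match(query, without_security))  # True
```

The loop visits each property once, which is consistent with a cost linear in the number of properties.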

Retrieval queries

• retrieval queries
  replace only LabelMatch
  test validity of each path with Collapse
• cost
  LabelMatch now O(m) where m is number of properties
  Collapse is O(m*n) where n is length of path
• backwards compatible
  implicit name property
  LabelMatch is string comparison
  Collapse can be ignored
• both kinds of labels can coexist

Additional operations

• Coalesce - compute a distributed property value

[Figure: two edges between &1 and &2, labelled {name: review, security! developer, trans. time: [1/Jul/1999 - 15/Jul/1999]} and {name: review, security! subscriber, trans. time: [16/Jul/1999 - uc]}. Coalescing the transaction time gives trans. time: [1/Jul/1999 - uc].]
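A minimal sketch of temporal coalescing for the two review edges, assuming closed intervals at day granularity and using `date.max` as a stand-in for "uc" (until changed). The function name is hypothetical.

```python
from datetime import date, timedelta

UC = date.max  # assumption: stand-in for "uc" (until changed)


def coalesce_intervals(intervals):
    """Merge overlapping or adjacent closed day-granularity intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged:
            last_start, last_end = merged[-1]
            # Check for UC first so that date.max + 1 day is never computed.
            if last_end == UC or start <= last_end + timedelta(days=1):
                merged[-1] = (last_start, max(last_end, end))
                continue
        merged.append((start, end))
    return merged


developer = (date(1999, 7, 1), date(1999, 7, 15))
subscriber = (date(1999, 7, 16), UC)

print(coalesce_intervals([developer, subscriber]) == [(date(1999, 7, 1), UC)])  # True
```

The two intervals merge because 15/Jul/1999 and 16/Jul/1999 are adjacent days; with a gap between them the result would keep two separate intervals.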

Meta-data modification

• framework is extensible
• specify the semantics and domain.
• Or just use it, default semantics.

                    name             trans. time       default
PropertyCollapse    concatenation    intersect         last
PropertyMatch       =                overlaps          =
PropertySlice       semantic error   intersect         semantic error
PropertyCoalesce    union            coalesce          semantic error
domain              strings          time intervals    objects


AUCQL

• Lorel SELECT statement derivative
• example, retrieve all movie titles.

SELECT Title
FROM movie.title Title;

• AUCQL replaces role with unordered list of properties

SELECT Title
FROM (name! movie).(name! title) Title;

• default to required name property

AUCQL (continued)

• can use any property, retrieve current movie titles

SELECT Title
FROM (name! movie, trans. time: [now - now]).
     (name! title, trans. time: [now - now]) Title;

• can set properties for entire query

SET PROPERTY (trans. time: [now - now]);
SELECT Title
FROM movie.title Title;

AUCQL (continued)

• can use MATCH, COALESCE, COLLAPSE
• example, show names along all current paths in the database

SELECT PROPERTY(name, COLLAPSE(All))
FROM (trans. time: [now - now])* All;

result, e.g.,
reviewed
reviewed.movie
reviewed.movie.title

Summary

• meta-data
• representation
  labels with properties
  property semantics
• new query operations
• extensible
• AUCQL website
  implemented research prototype
  free, downloadable, Unix environment
  http://www.cs.auc.dk/~curtis/AUCQL
  interactive query engine
  tutorials

Related work

• Lorel (Abiteboul et al., JDL ’97)
• non-simple labels
  Chorel/DOEM (Chawathe et al., ICDE ’98)
  Deterministic Paths (Buneman et al., ICDT ’99)
• RDF query languages (QL ’98)
  Query Service for RDF (Decker et al.)
  P3P (Cranor)
  RDF Query Specification (Malhotra and Sundaresan)

Future work

• XML/RDF/DCD translation
  labels can share common properties
  no container termination
  property terminators?
• recursive semi-structured labels
• heterogeneous meta-data
  does security mean security?
  AUCQL has single property name space
  dynamic scoping of properties
• property semantics keyed to single property name

Future work (continued)

• soundness and completeness
  incomplete with respect to graph operations
  minimal set of operations
  information preserving?
  property-specific basis
  design guidelines for property semantics
• implementation
  path indexing (when labels have properties)
  query optimization

Collapse mechanics

• collapse pair-wise along path
• LabelCollapse: Label × Label → Label
  for each property in either label
    if property is in both then apply PropertyCollapse
    else add to result
• PropertyCollapse is a property-specific constructor
  T × T → T ∪ {undefined}
• required properties stay required
• path is valid if no property is undefined
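The mechanics above can be sketched in Python and checked against the reviewed.movie.title example from earlier in the talk. Several things here are assumptions for illustration: labels are plain dictionaries, `None` stands for undefined, `date.max` stands for "uc", the "last" rule is used for properties without specific semantics, and the required flag is left out for brevity.

```python
from datetime import date
from functools import reduce

UC = date.max        # assumption: stand-in for "uc" (until changed)
UNDEFINED = None     # assumption: marker for an undefined property value


def collapse_name(a, b):
    return f"{a}.{b}"                     # concatenation


def collapse_time(a, b):
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else UNDEFINED   # temporal intersection


def collapse_default(a, b):
    return b                              # "last": keep the later value


PROPERTY_COLLAPSE = {"name": collapse_name, "trans. time": collapse_time}


def label_collapse(x, y):
    """Collapse two adjacent labels into one."""
    result = {}
    for prop in x.keys() | y.keys():
        if prop in x and prop in y:
            a, b = x[prop], y[prop]
            if a is UNDEFINED or b is UNDEFINED:
                result[prop] = UNDEFINED  # undefined stays undefined
            else:
                fn = PROPERTY_COLLAPSE.get(prop, collapse_default)
                result[prop] = fn(a, b)
        else:
            result[prop] = x[prop] if prop in x else y[prop]
    return result


def path_is_valid(label):
    return all(v is not UNDEFINED for v in label.values())


reviewed = {"name": "reviewed", "trans. time": (date(1999, 9, 1), UC)}
movie = {"name": "movie"}
new_title = {"name": "title", "trans. time": (date(1998, 8, 1), UC)}
old_title = {"name": "title", "trans. time": (date(1997, 4, 2), date(1998, 7, 31))}

current = reduce(label_collapse, [reviewed, movie, new_title])
stale = reduce(label_collapse, [reviewed, movie, old_title])

print(current["name"], path_is_valid(current))  # reviewed.movie.title True
print(stale["name"], path_is_valid(stale))      # reviewed.movie.title False
```

Each pairwise step touches every property once, so collapsing a path of length n with m properties stays within the O(m*n) cost quoted for Collapse.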