VLDB ’99 Edinburgh, Scotland
Capturing and Querying Multiple Aspects of Semistructured Data
Curtis Dyreson, (formerly) Dept. of Comp. Sci., James Cook University
Michael Böhlen, Christian S. Jensen, Nykredit Center for Database Research
Department of Computer Science, Aalborg University
www.cs.auc.dk/NDB
Outline
• meta-data
• representation
  - properties
• queries
  - collapse
  - match
  - coalesce
• AUCQL
• summary
Meta-data
• database meta-data: schema, security, transaction time
• web meta-data: author, language, subject (Dublin Core), privacy
• web 'meta-data' standards: RDF, P3P
• intrinsic
  - informational, but also exclusional
  - irregular
  - ad hoc
Movie database
• movie data
  - Bruce Willis stars in Colour of Night.
  - Colour of Night premiered 1/Jul/1995.
• publication meta-data
  - language: English
  - URL: http://www.auc.dk
  - publication date: 2/Apr/1997
  - privacy/security: 'over 18'
  - publication history: v1.2, modified 31/Jul/1998
  - subject: Film, Suspense, Thriller
• queries
  - Retrieve information published at Danish web sites.
  - Find reviews published in the first week of the movie's release.
  - Get suspense films starring Bruce Willis.
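A semistructured database
• database
  - edges with labels
  - nodes with values
• meta-data
  - schema, security, language, URL, subject, time
[Figure: an edge-labelled graph rooted at &1, with edges such as movie, star, name, age, and Oscars leading through nodes such as &2 to values such as "Bruce Willis"]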
Properties
• property name: property value
• default name property
• A label is a set of properties.
[Figure: the edge from &1 to "Colour of Night" labelled three ways: title; name: title; and name: title, URL: www.movie.com]
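Label semistructure
[Figure: the edge from &1 to "Colour of Night" with label name: title, URL: www.movie.com; the label is itself drawn as a small semistructured graph with name and URL edges, and the URL property carries an author name, Joe]
• meta-meta-data: Joe authored the URL meta-data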
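One way to model these two slides in code, as a minimal sketch: a label is a mapping from property names to property values, a bare string stands for the default name property, and a property can carry a label of its own to hold meta-meta-data. The names `Property`, `Label`, and `make_label` are hypothetical and not taken from the AUCQL prototype.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class Property:
    value: object
    meta: Optional[Label] = None  # meta-meta-data: a property can have its own label


@dataclass
class Label:
    props: Dict[str, Property] = field(default_factory=dict)  # a set of properties

    def get(self, name: str) -> Optional[object]:
        prop = self.props.get(name)
        return None if prop is None else prop.value


def make_label(spec: Union[str, Dict[str, object]]) -> Label:
    """Build a label; a bare string is shorthand for the default name property."""
    if isinstance(spec, str):
        return Label({"name": Property(spec)})
    return Label({key: Property(value) for key, value in spec.items()})


bare = make_label("title")
explicit = make_label({"name": "title"})
with_url = make_label({"name": "title", "URL": "www.movie.com"})
with_url.props["URL"].meta = make_label({"author": "Joe"})  # Joe authored the URL meta-data

print(bare == explicit)                          # True
print(with_url.props["URL"].meta.get("author"))  # Joe
```

Because `title` and `name: title` build equal labels, an old-style edge and a property-based edge are interchangeable under this encoding.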
Properties (continued)
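• required properties
• missing properties
[Figure: path from &1 to &2 to "Colour of Night". The first edge is labelled name: movie, security! over 18, where "!" marks security as a required property; this edge is missing the URL property. The second edge is labelled name: title, URL: www.movie.com; this edge is missing the security property.]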
Property semantics
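• transaction time example
[Figure: &1 to &2 via name: reviewed, trans. time: [1/Sep/1999 - uc]; &2 to &3 via name: movie; from &3, one edge name: title, trans. time: [2/Apr/1997 - 31/Jul/1998] to "Color of Night" and one edge name: title, trans. time: [1/Aug/1998 - uc] to "Colour of Night" (uc = until changed). The path reviewed.movie.title ending at "Color of Night" is marked "Not a path!"]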
Using an existing model
• meta-data and data edges
• retrieve titles of reviewed movies
SELECT X.data
FROM reviewed R, R.movie M, M.title X
WHERE R.metadata.transtime INTERSECT M.metadata.transtime
AND M.metadata.transtime INTERSECT X.metadata.transtime
[Figure: from &1 through &2, a title edge to &3; &3 has a data edge to "Colour of Night" and a metadata edge leading to a transtime edge with value 1/Aug/1998 - uc]
Design flaws
• query must enforce semantics to avoid fictive results
SELECT X.data
FROM *.title X
• wildcard unintentionally accesses meta-data
• no means of enforcing required properties
• even correctly formed queries are brittle
• user guesses at meta-data encoding
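To make the first two flaws concrete, here is a hypothetical sketch of the existing-model encoding, where meta-data lives in ordinary `metadata` and `transtime` edges. The node names and the `walk` helper are assumptions for illustration. Evaluating `*.title` returns both titles, although "Color of Night" stopped being current (31/Jul/1998) before the review was recorded (1/Sep/1999), and the same traversal also wanders through the meta-data edges.

```python
# Hypothetical graph: meta-data stored as ordinary edges beside the data edges.
EDGES = {
    "root": [("reviewed", "review")],
    "review": [("metadata", "review_md"), ("movie", "movie")],
    "review_md": [("transtime", "1/Sep/1999 - uc")],
    "movie": [("title", "title1"), ("title", "title2")],
    "title1": [("data", "Color of Night"), ("metadata", "title1_md")],
    "title1_md": [("transtime", "2/Apr/1997 - 31/Jul/1998")],
    "title2": [("data", "Colour of Night"), ("metadata", "title2_md")],
    "title2_md": [("transtime", "1/Aug/1998 - uc")],
}


def walk(node, path=()):
    """Yield every (edge-label path, end node) pair, as a '*' wildcard would."""
    yield path, node
    for label, child in EDGES.get(node, []):  # leaf values have no outgoing edges
        yield from walk(child, path + (label,))


# SELECT X.data FROM *.title X
title_nodes = [end for path, end in walk("root") if path and path[-1] == "title"]
results = [child for node in title_nodes
           for label, child in EDGES[node] if label == "data"]
print(results)  # ['Color of Night', 'Colour of Night']

# The same wildcard also descended into the meta-data encoding.
touched_metadata = [".".join(path) for path, _ in walk("root") if "metadata" in path]
print(touched_metadata[0])  # reviewed.metadata
```

Filtering out the outdated title would require the query itself to spell out the `transtime` checks, which is the brittleness the slide describes.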
Collapse
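• Collapse the information along a path to a single edge.
[Figure: the transaction time graph again: &1 to &2 via name: reviewed, trans. time: [1/Sep/1999 - uc]; &2 to &3 via name: movie; from &3, title edges to "Color of Night" (trans. time: [2/Apr/1997 - 31/Jul/1998]) and "Colour of Night" (trans. time: [1/Aug/1998 - uc]). Each path from &1 is to be collapsed into a single edge whose label is shown as "?"]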
Collapse example
• PropertyCollapse for name is concatenation; for trans. time it is temporal intersection.
[Figure: the same graph with each path collapsed into one edge from &1. The edge to "Colour of Night" is labelled name: reviewed.movie.title, trans. time: [1/Sep/1999 - uc]. The edge to "Color of Night" is labelled name: reviewed.movie.title, trans. time: undefined]
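The two collapsed labels can be checked with a small sketch of the two PropertyCollapse operations. It assumes day-granularity dates, models 'uc' as `date.max`, and uses `None` for undefined; the dictionary layout of labels and the helper names are hypothetical.

```python
from datetime import date
from functools import reduce
from typing import Optional, Tuple

Interval = Tuple[date, date]
UC = date.max  # assumption: 'uc' (until changed) modelled as the largest date


def collapse_name(a: str, b: str) -> str:
    """PropertyCollapse for name: concatenation."""
    return f"{a}.{b}"


def collapse_time(a: Optional[Interval], b: Interval) -> Optional[Interval]:
    """PropertyCollapse for trans. time: temporal intersection (None = undefined)."""
    if a is None:  # once undefined, the collapsed value stays undefined
        return None
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def collapse_path(path):
    """Collapse the name and trans. time properties along a path of edge labels."""
    result = {"name": reduce(collapse_name, (edge["name"] for edge in path))}
    times = [edge["trans. time"] for edge in path if "trans. time" in edge]
    if times:  # a property that no edge carries stays missing
        result["trans. time"] = reduce(collapse_time, times)
    return result


reviewed = {"name": "reviewed", "trans. time": (date(1999, 9, 1), UC)}
movie = {"name": "movie"}  # carries no trans. time property
old_title = {"name": "title", "trans. time": (date(1997, 4, 2), date(1998, 7, 31))}
new_title = {"name": "title", "trans. time": (date(1998, 8, 1), UC)}

print(collapse_path([reviewed, movie, new_title]))
# {'name': 'reviewed.movie.title', 'trans. time': (datetime.date(1999, 9, 1), datetime.date(9999, 12, 31))}
print(collapse_path([reviewed, movie, old_title]))
# {'name': 'reviewed.movie.title', 'trans. time': None}
```

The second path is undefined because its collapsed interval would start on 1/Sep/1999 but end on 31/Jul/1998.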
Match (retrieval)
• find paths that meet some condition(s)
• path regular expression
  - role: exact match, e.g., title
  - regular expression operators (. | ? * +), e.g., (reviewed.movie)*.(title | name)
• only label matching changes
  - labels are sets of properties
  - required properties
  - values may be from non-string domains; use PropertyMatch
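A sketch of why only label matching changes: the matcher below evaluates a path regular expression over a sequence of edge labels and takes the label test as a parameter, so replacing role comparison with a property-aware LabelMatch leaves the regular-expression machinery untouched. The tuple encoding of expressions and the function names are assumptions, not AUCQL's implementation.

```python
from typing import Callable, FrozenSet, Sequence

# Hypothetical path-expression encoding (nested tuples):
#   ("atom", q)    one edge whose label matches query label q
#   ("seq", a, b)  a then b          ("alt", a, b)  a or b
#   ("opt", a) a?  ("star", a) a*    ("plus", a) a+
LabelMatch = Callable[[object, dict], bool]


def ends(expr, path: Sequence[dict], i: int, match: LabelMatch) -> FrozenSet[int]:
    """Positions where a match of expr starting at position i can end."""
    kind = expr[0]
    if kind == "atom":
        ok = i < len(path) and match(expr[1], path[i])
        return frozenset({i + 1}) if ok else frozenset()
    if kind == "seq":
        return frozenset(k for j in ends(expr[1], path, i, match)
                         for k in ends(expr[2], path, j, match))
    if kind == "alt":
        return ends(expr[1], path, i, match) | ends(expr[2], path, i, match)
    if kind == "opt":
        return frozenset({i}) | ends(expr[1], path, i, match)
    if kind == "star":
        reached, frontier = {i}, [i]
        while frontier:  # repeat the body until no new end position appears
            for k in ends(expr[1], path, frontier.pop(), match):
                if k not in reached:
                    reached.add(k)
                    frontier.append(k)
        return frozenset(reached)
    if kind == "plus":
        return ends(("seq", expr[1], ("star", expr[1])), path, i, match)
    raise ValueError(f"unknown path expression {kind!r}")


def path_matches(expr, path: Sequence[dict], match: LabelMatch) -> bool:
    return len(path) in ends(expr, path, 0, match)


def role_match(role: str, label: dict) -> bool:
    """Backwards-compatible label test: compare the role with the name property."""
    return label.get("name") == role


# (reviewed.movie)*.(title | name)
expr = ("seq",
        ("star", ("seq", ("atom", "reviewed"), ("atom", "movie"))),
        ("alt", ("atom", "title"), ("atom", "name")))

print(path_matches(expr, [{"name": "reviewed"}, {"name": "movie"}, {"name": "title"}], role_match))  # True
print(path_matches(expr, [{"name": "movie"}, {"name": "title"}], role_match))  # False
```

Passing a different `match` function, for instance one that compares whole property sets, changes which edges qualify without touching `ends`.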
LabelMatch example
• name property: 'movie' compares to 'movie', continue
• transaction time property: missing in target, continue
• URL property: missing in query, continue
• security property: required by database, no match!
[Figure: the query role label name! movie, trans. time: [now - now] compared with a label in the database, name: movie, security! over 18, URL: www.movie.com]
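The four steps can be read as a LabelMatch rule. This sketch rests on assumptions: labels are dictionaries of (value, required) pairs, PropertyMatch is equality except for trans. time, where it tests interval overlap (the trans. time entry in this talk's property-semantics table), and a property required by the query but missing in the target also fails, a case the slide does not show.

```python
from datetime import date
from typing import Callable, Dict, Tuple

# Hypothetical encoding: property name -> (value, required?).
# '!' in the slides becomes required=True, ':' becomes required=False.
Prop = Tuple[object, bool]
Label = Dict[str, Prop]


def overlaps(a, b) -> bool:
    """PropertyMatch for trans. time: do two closed intervals overlap?"""
    return a[0] <= b[1] and b[0] <= a[1]


# Property-specific PropertyMatch; anything unregistered uses equality.
PROPERTY_MATCH: Dict[str, Callable[[object, object], bool]] = {"trans. time": overlaps}


def property_match(name: str, query_value, target_value) -> bool:
    return PROPERTY_MATCH.get(name, lambda a, b: a == b)(query_value, target_value)


def label_match(query: Label, target: Label) -> bool:
    for name in query.keys() | target.keys():
        if name in query and name in target:
            if not property_match(name, query[name][0], target[name][0]):
                return False
        elif name in query:
            if query[name][1]:   # required by the query but missing in target
                return False
        elif target[name][1]:    # required by the database but missing in query
            return False
    return True                  # every other property: continue


now = date.today()
query = {"name": ("movie", True), "trans. time": ((now, now), False)}
database = {
    "name": ("movie", False),
    "security": ("over 18", True),
    "URL": ("www.movie.com", False),
}

print(label_match(query, database))  # False: security is required by the database
print(label_match(dict(query, security=("over 18", False)), database))  # True
```

With only name properties on both sides, `label_match` reduces to comparing the two names as strings.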
Retrieval queries
• retrieval queries
  - replace only LabelMatch
  - test validity of each path with Collapse
• cost
  - LabelMatch is now O(m), where m is the number of properties
  - Collapse is O(m*n), where n is the length of the path
• backwards compatible
  - implicit name property
  - LabelMatch is string comparison
  - Collapse can be ignored
• both kinds of labels can coexist
Additional operations
• Coalesce: compute a distributed property value
[Figure: two edges between &1 and &2, labelled name: review, security! developer, trans. time: [1/Jul/1999 - 15/Jul/1999] and name: review, security! subscriber, trans. time: [16/Jul/1999 - uc]; coalescing their transaction times gives trans. time: [1/Jul/1999 - uc]]
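Coalescing the transaction times of the two review edges amounts to merging intervals that overlap or meet. A minimal sketch, assuming closed intervals at day granularity and 'uc' modelled as `date.max`; like the slide, it coalesces only the trans. time values.

```python
from datetime import date
from typing import List, Tuple

Interval = Tuple[date, date]  # closed interval [start, end]
UC = date.max                 # assumption: 'uc' (until changed) as the largest date


def coalesce(intervals: List[Interval]) -> List[Interval]:
    """Merge closed day-granularity intervals that overlap or meet."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        # Subtracting dates never overflows, unlike adding a day to date.max.
        if merged and (start - merged[-1][1]).days <= 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


review_times = [
    (date(1999, 7, 1), date(1999, 7, 15)),  # security! developer
    (date(1999, 7, 16), UC),                # security! subscriber
]
print(coalesce(review_times))
# [(datetime.date(1999, 7, 1), datetime.date(9999, 12, 31))]
```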
Meta-data modification
• framework is extensible
  - specify the semantics and domain
• Or just use it, with the default semantics.

                  name            trans. time      default
PropertyCollapse  concatenation   intersect        last
PropertyMatch     =               overlaps         =
PropertySlice     semantic error  intersect        semantic error
PropertyCoalesce  union           coalesce         semantic error
domain            strings         time intervals   objects
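The table suggests a registry of per-property semantics, with the default column as the fallback for unregistered properties. A hedged sketch under that reading: `Semantics`, `REGISTRY`, `semantics_for`, and the `language` entry are hypothetical, intervals are (start, end) date pairs, and `None` stands for an undefined intersection.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Tuple

Interval = Tuple[date, date]


class PropertySemanticError(Exception):
    """Raised wherever the table says 'semantic error'."""


def semantic_error(operation: str) -> Callable:
    def fail(*args):
        raise PropertySemanticError(operation)
    return fail


def intersect(a: Interval, b: Interval):
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None  # None: undefined


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def coalesce_intervals(values: List[Interval]) -> List[Interval]:
    merged: List[Interval] = []
    for start, end in sorted(values):
        if merged and (start - merged[-1][1]).days <= 1:  # overlapping or adjacent
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


@dataclass(frozen=True)
class Semantics:
    domain: type
    collapse: Callable  # value x value -> value
    match: Callable     # value x value -> bool
    slice: Callable     # value x value -> value
    coalesce: Callable  # list of values -> value


DEFAULT = Semantics(object, lambda a, b: b, lambda a, b: a == b,
                    semantic_error("slice"), semantic_error("coalesce"))

REGISTRY: Dict[str, Semantics] = {
    "name": Semantics(str, lambda a, b: f"{a}.{b}", lambda a, b: a == b,
                      semantic_error("slice"), set),
    "trans. time": Semantics(tuple, intersect, overlaps, intersect, coalesce_intervals),
}


def semantics_for(prop: str) -> Semantics:
    """Unregistered properties fall back to the default column."""
    return REGISTRY.get(prop, DEFAULT)


# Extending the framework (hypothetical property and semantics).
REGISTRY["language"] = Semantics(str, lambda a, b: b,
                                 lambda a, b: a.casefold() == b.casefold(),
                                 semantic_error("slice"), set)

print(semantics_for("name").collapse("reviewed", "movie"))         # reviewed.movie
print(semantics_for("security").collapse("developer", "over 18"))  # over 18
```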
AUCQL
• Lorel SELECT statement derivative
• example: retrieve all movie titles
SELECT Title
FROM movie.title Title;
• AUCQL replaces role with unordered list of properties
SELECT Title
FROM (name! movie).(name! title) Title;
• default to required name property
AUCQL (continued)
• can use any property, retrieve current movie titles
SELECT Title
FROM (name! movie, trans. time: [now - now]).
(name! title, trans. time: [now - now]) Title;
• can set properties for entire query
SET PROPERTY (trans. time: [now - now]);
SELECT Title
FROM movie.title Title;
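The two rules on these slides, that a bare role is shorthand for a required name property and that SET PROPERTY applies to every step of the query, can be sketched as a small desugaring pass. The parser is hypothetical: it treats property values as uninterpreted text and assumes that properties written explicitly in a step take precedence over query-wide ones.

```python
from typing import Dict, List, Optional, Tuple

Spec = Dict[str, Tuple[str, bool]]  # property -> (value text, required?)


def split_top(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring separators nested inside () or []."""
    parts: List[str] = []
    depth, current = 0, []
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_label(step: str) -> Spec:
    """Parse one path step; a bare role becomes a required name property."""
    step = step.strip()
    if not step.startswith("("):
        return {"name": (step, True)}
    spec: Spec = {}
    for item in split_top(step[1:-1], ","):
        # The earlier of '!' (required) and ':' (optional) ends the property name.
        cut = min(i for i in (item.find("!"), item.find(":")) if i >= 0)
        spec[item[:cut].strip()] = (item[cut + 1:].strip(), item[cut] == "!")
    return spec


def parse_path(path: str, query_wide: Optional[Spec] = None) -> List[Spec]:
    """Parse a FROM path; SET PROPERTY values are added to every step."""
    steps = [parse_label(step) for step in split_top(path, ".")]
    for spec in steps:
        for prop, value in (query_wide or {}).items():
            spec.setdefault(prop, value)  # assumption: explicit properties win
    return steps


set_property = parse_label("(trans. time: [now - now])")
implicit = parse_path("movie.title", set_property)
explicit = parse_path("(name! movie, trans. time: [now - now])."
                      "(name! title, trans. time: [now - now])")
print(implicit == explicit)  # True
```

Under these assumptions the SET PROPERTY query and the fully spelled-out query denote the same sequence of query labels.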
AUCQL (continued)
• can use MATCH, COALESCE, COLLAPSE
• example: show names along all current paths in the database
SELECT PROPERTY(name, COLLAPSE(All))
FROM (trans. time: [now - now])* All;
result, e.g.:
reviewed
reviewed.movie
reviewed.movie.title
…
Summary
• meta-data
• representation
  - labels with properties
  - property semantics
• new query operations
• extensible
• AUCQL website
  - implemented research prototype
  - free, downloadable, Unix environment
  - http://www.cs.auc.dk/~curtis/AUCQL
  - interactive query engine
  - tutorials
Related work
• Lorel (Abiteboul et al., JDL ’97)
• non-simple labels
  - Chlorel/DOEM (Chawathe et al., ICDE ’98)
  - Deterministic Paths (Buneman et al., ICDT ’99)
• RDF query languages (QL ’98)
  - Query Service for RDF (Decker et al.)
  - P3P (Cranor)
  - RDF Query Specification (Malhotra and Sundaresan)
Future work
• XML/RDF/DCD translation
  - labels can share common properties
  - no container termination
  - property terminators?
• recursive semistructured labels
• heterogeneous meta-data
  - does security mean security?
  - AUCQL has a single property name space
  - dynamic scoping of properties
• property semantics keyed to a single property name
Future work (continued)
• soundness and completeness
  - incomplete with respect to graph operations
  - minimal set of operations
  - information preserving?
  - property-specific basis
  - design guidelines for property semantics
• implementation
  - path indexing (when labels have properties)
  - query optimization
Collapse mechanics
• collapse pair-wise along the path
• LabelCollapse: Label × Label → Label
    for each property in either label
        if the property is in both labels then apply PropertyCollapse
        else add it to the result
• PropertyCollapse is a property-specific constructor
    T × T → T ∪ {undefined}
• required properties stay required
• path is valid if no property is undefined
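The pseudocode translates into a short fold over the path. This is a sketch, not the prototype's code: labels are dictionaries of (value, required) pairs, `UNDEFINED` is a sentinel for the undefined result, 'uc' is modelled as `date.max`, and the PropertyCollapse table uses concatenation for name, intersection for trans. time, and 'last' for everything else.

```python
from datetime import date
from functools import reduce
from typing import Callable, Dict, List, Tuple

UNDEFINED = object()  # the extra element in T ∪ {undefined}
UC = date.max         # assumption: 'uc' (until changed) as the largest date

Prop = Tuple[object, bool]  # (value, required?)
Label = Dict[str, Prop]


def intersect(a, b):
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else UNDEFINED


# PropertyCollapse per property; unlisted properties use the default 'last'.
PROPERTY_COLLAPSE: Dict[str, Callable] = {
    "name": lambda a, b: f"{a}.{b}",
    "trans. time": intersect,
}


def property_collapse(name: str, a, b):
    if a is UNDEFINED or b is UNDEFINED:  # undefined is absorbing
        return UNDEFINED
    return PROPERTY_COLLAPSE.get(name, lambda x, y: y)(a, b)


def label_collapse(x: Label, y: Label) -> Label:
    """LabelCollapse: Label × Label → Label."""
    result: Label = {}
    for name in x.keys() | y.keys():  # each property in either label
        if name in x and name in y:
            value = property_collapse(name, x[name][0], y[name][0])
            result[name] = (value, x[name][1] or y[name][1])  # required stays required
        else:
            result[name] = x[name] if name in x else y[name]  # add to result
    return result


def collapse(path: List[Label]) -> Tuple[Label, bool]:
    """Collapse pair-wise along the path; valid if no property is undefined."""
    label = reduce(label_collapse, path)
    return label, all(value is not UNDEFINED for value, _ in label.values())


reviewed = {"name": ("reviewed", False), "trans. time": ((date(1999, 9, 1), UC), False)}
movie = {"name": ("movie", False), "security": ("over 18", True)}
old_title = {"name": ("title", False), "trans. time": ((date(1997, 4, 2), date(1998, 7, 31)), False)}
new_title = {"name": ("title", False), "trans. time": ((date(1998, 8, 1), UC), False)}

label, valid = collapse([reviewed, movie, new_title])
print(label["name"][0], label["security"], valid)  # reviewed.movie.title ('over 18', True) True
print(collapse([reviewed, movie, old_title])[1])   # False
```

Each pair-wise step visits every property of the two labels once, which is where the O(m*n) cost for a path of length n comes from.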