XML + Databases = ? (DIMACS Workshop, 3/2000) Mike Carey Exploratory Database Systems Department IBM Almaden Research Center [email protected]

Page 1: XML + Databases = ? (DIMACS Workshop, 3/2000)

XML + Databases = ? (DIMACS Workshop, 3/2000)

Mike Carey
Exploratory Database Systems Department
IBM Almaden Research Center
[email protected]

Page 2: XML + Databases = ? (DIMACS Workshop, 3/2000)

Plan for Today’s Talk

Thoughts on DB and web technologies
– The web and web “querying”
– Semistructured databases
– Object-relational databases
– XML and databases

XML/DB research at IBM Almaden
– The XPERANTO project
  • Motivation and approach
  • Whirlwind tour of the system

Page 3: XML + Databases = ? (DIMACS Workshop, 3/2000)

The Web is Great at Supporting URL-Based Sharing

Ex: Online conference proceedings

Web browsers have given us
– Universal file access (ftp++)
– Universal document access (html)
– Universal service access (forms)

What more could we navigational couch potatoes possibly want?
– Universal platform for e-shopping!

Page 4: XML + Databases = ? (DIMACS Workshop, 3/2000)

The Web is Lousy at Supporting Parametric Searches

Ex: Find all the used Musicman Sterling bass guitars currently available for under $750 within a 50-mile radius of my San Jose home

This is hard for a number of reasons
– Data buried in web pages, news groups, classified ads, store sites, auction sites, …
– No schema (no metal fish, please!)
– No data types (miles, US$, instruments)
– No regularity within/across (good!) sites

Page 5: XML + Databases = ? (DIMACS Workshop, 3/2000)

Aren’t We Supposed to be the Experts on Data Management?

The DB community brought the world
– Data models, schemas, and views
– Query languages, optimizers, fast joins
– Scalable parallel servers
– Federated database systems

What do we have in our bag of tricks?
– Semistructured databases
– Object-relational database systems

Page 6: XML + Databases = ? (DIMACS Workshop, 3/2000)

Is Semistructured Database Technology the Answer?

Database characteristics
– Collections of [name, value] pairs or maybe [name, type, value] triples
– Collections typically set<any> or list<any>

System characteristics
– “Typeloose” query languages
– Indexes for nested, typeloose structures
– Appropriate query processing techniques
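To make the data model above concrete, here is a minimal Python sketch that is not drawn from the talk. It assumes a collection is a list of (name, value) pairs whose values may themselves be collections, and it treats a "typeloose" path query as a walk over names that never consults a schema. The function name `find` and the sample ads are hypothetical.

```python
# Hypothetical illustration: a semistructured collection as nested
# (name, value) pairs, queried by path with no schema and no types.

def find(collection, path):
    """Return every value reachable by following `path` (a list of names)."""
    if not path:
        return [collection]
    results = []
    if isinstance(collection, list):      # only collections can be descended into
        for name, value in collection:
            if name == path[0]:
                results.extend(find(value, path[1:]))
    return results

ads = [
    ("ad", [("item", "bass guitar"), ("price", 700)]),
    ("ad", [("item", "bass guitar"), ("price", "about $650")]),  # same name, different type
    ("ad", [("item", "amplifier")]),                             # no price at all
]

prices = find(ads, ["ad", "price"])
print(prices)
```

The query succeeds without any declared structure, but the two prices come back with different types, so a range predicate such as "under $750" has nothing reliable to compare against.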

Page 7: XML + Databases = ? (DIMACS Workshop, 3/2000)

Are Semistructured Databases the Answer? (2)

No, because schemas are critical for
– Data readers
  • What info is in a given collection?
  • Thus, what queries might make sense?
– Data writers
  • What should I call this piece of info?
  • Is it okay to put this kind of data here?
– Efficient/effective query processors
  • Indexing, statistics, ... (e.g., range queries)
  • Integration mappings (e.g., unit conversions)

Page 8: XML + Databases = ? (DIMACS Workshop, 3/2000)

Are Semistructured Databases the Answer? (3)

It has some nice features, though
– Flexible, dynamic schemas
  • Forgiving w.r.t. variations and exceptions
  • Schema evolution is not a big deal
– Richer data modeling (vs. relational)
  • Nested structures, ordered collections
– More powerful query languages
  • Blurring of schema and data querying
  • Ordering, nesting, restructuring handled

Page 9: XML + Databases = ? (DIMACS Workshop, 3/2000)

Is Object-Relational Database Technology the Answer?

Database characteristics
– Base types, user-defined structured types, inheritance, reference types, collections
– Collections are well-typed

System characteristics
– Extended SQL-based query languages
– Support for methods (fenced/unfenced)
– Also triggers, LOBs, extensible indexes

Page 10: XML + Databases = ? (DIMACS Workshop, 3/2000)

Are Object-Relational Databases the Answer? (2)

No, because most O-R DBMSs have
– Overly rigid schemas
  • Every instance is of one (known) type
  • Evolving a type can be a major burden
  • Distributed type management is hard
– Crufty old storage managers
  • Ragged or sparse records poorly supported
– Insufficient power in extended SQL
  • Prehistoric assumptions get in the way
  • Weak on restructuring, schema-querying

Page 11: XML + Databases = ? (DIMACS Workshop, 3/2000)

Is XML the Answer? (Yes!! ...What Was the Question Again?)

Structured documents (for the web)

<book>
  <booktitle> Tables Are The Answer </booktitle>
  <author id="cdate">
    <name>
      <firstname> Chris </firstname>
      <lastname> Date </lastname>
    </name>
    <address>
      <city> Saratoga </city>
      <state> CA </state>
    </address>
  </author>
</book>

Page 12: XML + Databases = ? (DIMACS Workshop, 3/2000)

Is XML the Answer? (2)

W3C’s XML Schema working group
– Typed elements, attributes, documents
– Simple types and complex types
– Derived types (extension, restriction)
– Facets, anonymous types, groups, …
– Uniqueness, keys and key references

W3C’s XML Query working group
– XML-QL, XPath, XQL, XSL/T, XSQL, …
– Recommendation due in late 2000 (?)

Page 13: XML + Databases = ? (DIMACS Workshop, 3/2000)

Is XML the Answer? (3)

XML Schema might help because
– XML has achieved a huge mindshare for data interchange on the web
– DTD standardization is happening for documents within vertical industries, and XML Schemas should take over
– When finished, XML Schema should be a widely used schema description tool
  • Similar to O-R schemas, but with more flexibility (and web-based sex appeal)

Page 14: XML + Databases = ? (DIMACS Workshop, 3/2000)

Some Useful XML+DB Topics

Publish documents with XML Schemas from O-R databases
– B2B e-commerce messages
– B2C comparison shopping (if permitted!)
– Robust O-R DB-resident web sites with XML for page content generation

Use XML Schema as the central data model for data integration middleware
– I.e., web information integration

Page 15: XML + Databases = ? (DIMACS Workshop, 3/2000)

Useful XML+DB Topics (2)

Build a “native” XML Repository on top of an O-R DBMS
– Map from XML Schema model to O-R DBMS modeling constructs
– Map from XML queries to O-R queries (including tag variables and loose typing)
– Thereby provide XML document storage management with industrial-strength robustness, scalability, and performance

Page 16: XML + Databases = ? (DIMACS Workshop, 3/2000)

Useful XML+DB Topics (3)

Evolve XML-QL into a complete web data manipulation language
– Typing a la XML Schema
– Ordered/unordered collections
– XPath-inspired expressions
– Easier grouping and aggregation
– Updates (insert/delete, modify)
– Etc.

Page 17: XML + Databases = ? (DIMACS Workshop, 3/2000)

The XPERANTO Project

Middleware for publishing O-R (or plain relational) DB content on the web
– Provides a virtual XML document view
– Based on a “pure XML” approach
– Using XML-QL (as W3C placeholder)

Born at Almaden in summer of 1999
– Mike Carey, Dana Florescu, Zack Ives, Ying Lu, Jai Shanmugasundaram, Beau Shekita, Subbu Subramanian

Page 18: XML + Databases = ? (DIMACS Workshop, 3/2000)

The XPERANTO Belief System

Databases contain, and will continue to contain, the world’s “data jewels”
– Transactional data (RDBMS)
– Important multimedia assets (ORDBMS)

XML application developers of the future may not love SQL like we do
– View databases as default XML documents
– Let them define appropriate (query-able) views of these XML documents

Page 19: XML + Databases = ? (DIMACS Workshop, 3/2000)

XPERANTO Architecture

[Architecture diagram; only its labels survive in this transcript. Labels: Views; XML Schema; O-R Database (SQL Query Processor, Stored Tables, System Catalog); Metadata Services (View Services, Type & Table Services, XML Schema Generator); Query Translation (XML-QL Parser, XQGM, Query Rewrite, XQGM, SQL Translation); XML Tagger; Catalog Info; Data Tuples; Table & Type Info; SQL Queries.]

Page 20: XML + Databases = ? (DIMACS Workshop, 3/2000)

XPERANTO Components

XML-QL Parser
– Neutral query representation (XQGM)

Query Rewrite
– View composition and other rewrites

SQL Translation
– Produce SQL query(s) to get the required data from the underlying DBMS

XML Tagger
– Tag and structure the tabular results

Page 21: XML + Databases = ? (DIMACS Workshop, 3/2000)

XPERANTO Components

View Services
– Repository for XML view definitions

Type & Table Services
– Interface (and cache) for DB catalog info

XML Schema Generator
– Give DB catalog info in XML Schema form for default views
– Infer XML Schema info for queries and non-default view definitions

Page 22: XML + Databases = ? (DIMACS Workshop, 3/2000)

Consider a Simple O-R Schema

Create Table book AS
  (bookID CHAR(30), name VARCHAR(255), publisher VARCHAR(30))

Create Table publisher AS
  (name VARCHAR(30), address VARCHAR(255))

Create Type author_type AS
  (bookID CHAR(30), first VARCHAR(30), last VARCHAR(30))

Create Table author OF author_type
  (REF IS ssn USER GENERATED)

Page 23: XML + Databases = ? (DIMACS Workshop, 3/2000)

Part of the Default XML View

<simpleType name="string255" source="string">
  <maxLength value="255" />
</simpleType>

<simpleType name="string30" source="string">
  <maxLength value="30" />
</simpleType>

<complexType name="bookTupleType">
  <element name="bookID" type="string30" />
  <element name="name" type="string255" />
  <element name="publisher" type="string30" />
</complexType>

<complexType name="bookSetType">
  <element name="bookTuple" type="bookTupleType" maxOccurs="*" />
</complexType>

<element name="book" type="bookSetType" />

...

Page 24: XML + Databases = ? (DIMACS Workshop, 3/2000)

XPERANTO’s Default Views

XPERANTO generates default O-R to XML Schema mappings
– Each DB shown as an XML file
– Subtyping handled via XML Schema’s refinement facilities
– OIDs and references become ids/idrefs

“Don’t use this at home!”
– Application developers are expected to define the real view(s) using XML-QL
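The flavor of such a default mapping can be sketched in Python. This is an illustration under stated assumptions, not XPERANTO's generator: SQLite (via the standard `sqlite3` module) stands in for the O-R DBMS, the element names follow the `book` / `bookTuple` pattern from the default-view slide, NULL columns are omitted as a simplifying choice, and the function name `default_view` and the sample row are hypothetical. Subtyping and id/idref handling are left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

def default_view(conn, db_name, tables):
    """Render each table as <table><tableTuple>...</tableTuple></table>."""
    root = ET.Element(db_name)
    for table in tables:
        table_el = ET.SubElement(root, table)
        cursor = conn.execute(f"SELECT * FROM {table}")  # table names are trusted here
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            tuple_el = ET.SubElement(table_el, table + "Tuple")
            for column, value in zip(columns, row):
                if value is not None:      # assumption: NULL columns are simply omitted
                    ET.SubElement(tuple_el, column).text = str(value)
    return root

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (bookID TEXT, name TEXT, publisher TEXT)")
conn.execute("INSERT INTO book VALUES ('b1', 'Tables Are The Answer', 'Kluwer')")

library = default_view(conn, "library", ["book"])
xml_text = ET.tostring(library, encoding="unicode")
print(xml_text)
```

The output mirrors the table layout one-to-one, which is why the slide treats it as a starting point for user-defined views rather than something to hand to applications directly.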

Page 25: XML + Databases = ? (DIMACS Workshop, 3/2000)

Creating a Better XML View

WHERE <library.book.bookTuple>
        <bookID> $bid </>
        <name> $bname </>
        <publisher> $bpub </>
      </> IN "db2:xml:books/library",
      $bpub = "Kluwer"
CONSTRUCT
  <book id=$bid>
    <name> $bname </>
    { WHERE <library.publisher.publisherTuple>
              <name> $bpub </>
              <address> $addr </>
            </> IN "db2:xml:books/library"
      CONSTRUCT <publisher> <address> $addr </> </> }
    { WHERE <library.author.authorTuple>
              <bookID> $bid </>
              <first> $fname </>
              <last> $lname </>
            </> IN "db2:xml:books/library"
      CONSTRUCT <author first=$fname last=$lname/> }
  </>

...
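What this view computes can be sketched procedurally in Python. This is a hand-written illustration rather than anything XPERANTO generates: the in-memory rows are hypothetical sample data, the `result` wrapper element is added only so the sketch has a single root, and the nested loops play the role of the nested WHERE/CONSTRUCT blocks.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample rows standing in for the three default-view tuple sets.
books = [{"bookID": "b1", "name": "XML Views", "publisher": "Kluwer"},
         {"bookID": "b2", "name": "Other Book", "publisher": "Elsewhere"}]
publishers = [{"name": "Kluwer", "address": "Example Street 1"}]
authors = [{"bookID": "b1", "first": "Ann", "last": "Lee"},
           {"bookID": "b1", "first": "Bo", "last": "Kim"}]

def build_view(books, publishers, authors, wanted_publisher="Kluwer"):
    result = ET.Element("result")                 # wrapper element, added for the sketch
    for b in books:
        if b["publisher"] != wanted_publisher:    # $bpub = "Kluwer"
            continue
        book = ET.SubElement(result, "book", id=b["bookID"])
        ET.SubElement(book, "name").text = b["name"]
        for p in publishers:                      # first nested WHERE/CONSTRUCT
            if p["name"] == b["publisher"]:
                pub = ET.SubElement(book, "publisher")
                ET.SubElement(pub, "address").text = p["address"]
        for a in authors:                         # second nested WHERE/CONSTRUCT
            if a["bookID"] == b["bookID"]:
                ET.SubElement(book, "author", first=a["first"], last=a["last"])
    return result

view = build_view(books, publishers, authors)
print(ET.tostring(view, encoding="unicode"))
```

Evaluating the view this way issues one inner scan per book, which is the cost the single-query approach later in the talk is designed to avoid.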

Page 26: XML + Databases = ? (DIMACS Workshop, 3/2000)

XPERANTO Query Rewrite

XML-QL queries first translated into XQGM representation
– Neutral, well-poised for more features
– Easier to go from XML-QL to SQL
– Borrow rewrites from DB2 UDB engine

XQGM is an extension of DB2’s QGM
– XML data type for “columns”
– Set of XML-specific functions

Page 27: XML + Databases = ? (DIMACS Workshop, 3/2000)

SQL Generation and XML Document Tagging/Structuring

Sorted Outer Union queries are used to obtain the data
– Fetch the data in one query that brings it back in the appropriate order
– Tag and nest it to create XML document

Advantages of this approach
– Shown to be stable as well as fast
– Simple (linear-space) tagging possible
  • Just watch for nesting-related changes
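The tagging step can be sketched in Python as a single pass over rows that are already sorted. This is a simplified illustration, not XPERANTO's tagger: the row layout mirrors the outer-union column list used in this talk (type, bookID, bookName, pubName, pubAddr, authFirst, authLast), the sample rows are hypothetical, and the sketch assumes rows arrive ordered by bookID and then by type so that each book's '0' row comes first.

```python
from xml.sax.saxutils import escape, quoteattr

def tag_rows(rows):
    """Yield XML text fragments for rows sorted by (bookID, type)."""
    current = None                      # bookID of the <book> element currently open
    for rtype, book_id, book_name, pub_name, pub_addr, first, last in rows:
        if book_id != current:          # nesting-related change: a new book begins
            if current is not None:
                yield "</book>"
            yield f"<book id={quoteattr(book_id)}>"
            current = book_id
        if rtype == "0":
            yield f"<name>{escape(book_name)}</name>"
        elif rtype == "1":
            yield f"<publisher><address>{escape(pub_addr)}</address></publisher>"
        elif rtype == "2":
            yield f"<author first={quoteattr(first)} last={quoteattr(last)}/>"
    if current is not None:
        yield "</book>"

rows = [
    ("0", "b1", "XML Views", None, None, None, None),
    ("1", "b1", None, "Kluwer", "Example Street 1", None, None),
    ("2", "b1", None, None, None, "Ann", "Lee"),
    ("0", "b2", "More Views", None, None, None, None),
]
document = "".join(tag_rows(rows))
print(document)
```

The tagger remembers only which book element is open, so it never has to buffer earlier rows; all of the structural work was done by the sort.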

Page 28: XML + Databases = ? (DIMACS Workshop, 3/2000)

Outer Union Query Example

WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS (
  SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL
  FROM book b
  WHERE b.publisher = 'Kluwer'
  UNION ALL
  SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL
  FROM book b, publisher p
  WHERE b.publisher = 'Kluwer' AND b.publisher = p.name
  UNION ALL
  SELECT '2', b.bookID, NULL, NULL, NULL, a.first, a.last
  FROM book b, author a
  WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID
)
SELECT * FROM OuterUnion ORDER BY bookID
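The query can be tried end to end with Python's built-in `sqlite3` module. Several things here are assumptions rather than part of the talk: SQLite stands in for DB2, the tables are plain relational tables with hypothetical sample rows, string literals use single quotes, and the ORDER BY adds the type column as a secondary key so that each book's own row precedes its publisher and author rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (bookID TEXT, name TEXT, publisher TEXT);
CREATE TABLE publisher (name TEXT, address TEXT);
CREATE TABLE author (bookID TEXT, first TEXT, last TEXT);
INSERT INTO book VALUES ('b1', 'XML Views', 'Kluwer'), ('b2', 'Other Book', 'Elsewhere');
INSERT INTO publisher VALUES ('Kluwer', 'Example Street 1');
INSERT INTO author VALUES ('b1', 'Ann', 'Lee'), ('b2', 'Cy', 'Doe');
""")

OUTER_UNION = """
WITH OuterUnion (type, bookID, bookName, pubName, pubAddr, authFirst, authLast) AS (
  SELECT '0', b.bookID, b.name, NULL, NULL, NULL, NULL
  FROM book b WHERE b.publisher = 'Kluwer'
  UNION ALL
  SELECT '1', b.bookID, NULL, p.name, p.address, NULL, NULL
  FROM book b, publisher p
  WHERE b.publisher = 'Kluwer' AND b.publisher = p.name
  UNION ALL
  SELECT '2', b.bookID, NULL, NULL, NULL, a.first, a.last
  FROM book b, author a
  WHERE b.publisher = 'Kluwer' AND b.bookID = a.bookID
)
SELECT * FROM OuterUnion ORDER BY bookID, type
"""

# Each row carries a type tag plus only the columns relevant to that type.
rows = conn.execute(OUTER_UNION).fetchall()
for row in rows:
    print(row)
```

Every branch pads the columns it does not use with NULL, so the three differently shaped results share one schema and can be merged into a single sorted stream.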

Page 29: XML + Databases = ? (DIMACS Workshop, 3/2000)

XPERANTO Project Summary

Goal is to publish O-R data in XML form
– Default XML views
– XML-QL for defining useful views
– “Look Ma, no SQL!”

Currently (re)building our prototype
– View composition is our first stop
– Updates in addition to queries
– Queries over both data and metadata
– Other needs for XML web sites...?

Page 30: XML + Databases = ? (DIMACS Workshop, 3/2000)

A Few Closing Remarks

DB community must ensure that the web will support real queries…!
– XML Schema and XML Query standards need ongoing input from DB researchers
– Large-scale technologies needed for XML indexing, caching, querying, etc.

DB community should also work on important underlying technologies
– Publishing XML both from and to RDBMSs and ORDBMSs, for example!