using xml to view relational data
Post on 07-Jan-2016
49 Views
Preview:
DESCRIPTION
TRANSCRIPT
Using XML to View Relational Data
Xin He
AMPS SeminarNovember 30, 2001
Outline
Introduction
An XML Primer
Several related systems and results
An XML middle-ware system: SilkRoute
XML and Spreadsheets
XML and OLAP
Introduction
XML -- Basic ideas are simple but potential impact is significant Easy to read Simple and flexible Easy to extract useful information
Research opportunities XML brings to Database Management XML will “turn the Web into a database” Thus general Database Management issues arise for
XML
Introduction
Why using XML to view Relational Data
XML is emerging as the standard data-exchange format between applications on the Web
While most existing data is stored in relational databases This scenario is common This scenario is challenging
Relational data is flat, normalized, its schema is often proprietary XML data is nested, unnormalized, its schema is public So, the mapping is inherently complex and maybe difficult to
compute efficiently
An XML(eXtensible Markup Language) Primer
Example XML file (From Apache Tomcat 4.0 configuration file):
1. <Server>2. <Service name=“Tomcat-Standalone”>3. <Connector className=“http.HttpConnector” port=“80”>4. <Connector className=“http.HttpConnector” port=“8443”>5. <Factory className=“SSLServerSocketFactory”>6. </Connector>7. <Engine>8. <parameter>9. <name>mail.smtp.host</name>10. <value>localhost</value>11. </parameter>12. </Engine>13. </Service>14. <Service name=“Tomcat-Apache”>15. <Connector className=“warp.WarpConnector” port=“8008”>16. </Service>
17.</Server>
An XML PrimerExample DTD(Document Type Definitions)
1. <?xml encoding="US-ASCII"?>
2. <!ELEMENT Server (Service*)>
3. <!ELEMENT Service (Connector*, Engine*)>
4. <!ATTLIST Service name ID>
5. <!ELEMENT Connector (Factory?)>
6. <!ATTLIST Connector className #REQUIRED
7. port #REQUIRED>
8. <!ELEMENT Engine (Parameter)>
9. <!ELEMENT Parameter (name, value)>
10. <!ELEMENT name (#PCDATA)>
11. <!ELEMENT value (#PCDATA)>
An XML Primer
XML is a method for putting structured data in a text file
XML looks a bit like HTML but isn't HTML
XML is a family of technologies http://www.w3c.org
XML is new, but not that new
XML is license-free, platform-independent and well-supported
Several Related Systems and Results
An XML middle-ware system in AT&T research labs: SilkRoute Automate the conversion of realtional data into XML
A new paper published to optimize the query processing algorithm
IBM research center: Efficiently Publishing Relational Data as XML Documents Language specification is based on SQL with minor extension
So standard APIs like ODBC can be used
Query performance is worse than the revised SilkRoute
Several Related Systems and Results
UCSD: MIX--Mediation of Information using XML DTD inference Concentrate on Information Integration A more complicated architecture
University of Wisconsin-Madison: Relational Databases for Querying XML Documents Objective is different, but part of techniques is related Limitations and opportunities: some valuable points
SilkRoute Introduction
Public DTDs: Numerious industries are working on it http://www.oasis-open.org/cover
Construct XML views which conform to the public DTDs from vast stores of relational data automatically
The system is general, dynamic, and efficient
Motivating ExampleA simple example from electronic commerce: Suppliers provide product information to resellers
For mutual benefit, they have agreed on a particular DTD
Supplier's business data is organized according to a relational schema
Supplier: convert its relational data into an XML view conforms to the DTD and make the XML view available to resellers
Assume supplier wants to export a subset of its inventory(e.g. only its winter-outerwear stock)
Resellers: access that data by formulating queries over the XML view
Reseller is typically only interested in a small subset of the info(e.g. sale price less than half of the retail price)
Relational schemas differ from supplier to supplier
Architecture
SilkRoute’s Architecture
The View Query: RXL
Full power on both sides Joins, selection conditions, aggregates, and nested
queries Generate XML data with arbitrary levels of nesting
RXL has three powerful features which make it possible to create arbitrary complex XML structures Nested queries, Skolem functions, and Block structure
1. construct2. <supplier ID=Supp()>3. <company ID=Comp()>”Acme Clothing”</company>4. {5. from Clothing $c6. where $c.category = “outerwear”7. construct 8. <product ID=Prod($c.pid)>9. <name ID=Name($c.pid,$c.item)>$c.item</name>10. <category ID=Cat($c.pid,$c.category)>$c.category</category>11. <retail ID=Retail($c.pid,$c.price)>$c.price</retail>12. {from SalePrice $s13. where $s.pid = $c.pid14. construct15. <sale ID=Sale($c.pid,$s.pid,$s.price)>$s.price</retail>16. }17. {from Problems $p18. where $p.pid = $c.pid19. construct20. <report code=$p.code ID=Prob($c.pid,$p.pid,$p.code,$p.comments)>21. $p.comments22. </report>23. }24. </product>25. } 26. </supplier>
RXL (Relational to XML transformation Language) view query ( V )
The User Query: XML-QL1. Construct2. <results>3. { where <supplier>4. <company>$company</company>5. <product>6. <name>$name</name>7. <retail>$retail</retail>8. <sale>$sale</sale>9. </product>10. </supplier> in http://acme.com/products.xml,11. $sale < 0.5 * $retail12. construct13. <result ID=Result($Company)>14. <supplier>$company</supplier>15. <name>$name</name>16. </result>17. }18. </results>
XML-QL user query (U)
The Query Composer
The composed RXL query is equivalent to the user query evaluated on the materialized view
Composed query often contain constraints on scalar values that can be evaluated using indexes in the relational database
The Query Composer
1. Construct
2. <results>
3. { from Clothing $c, SalePrice $sfrom Clothing $c, SalePrice $s
4.4. where $c.category = “outerwear”,where $c.category = “outerwear”,
5.5. $c.pid = $s.pid,$c.pid = $s.pid,
6. $s.price < 0.5 * $c.retail$s.price < 0.5 * $c.retail
7. construct
8. <result ID=Result(<result ID=Result( “Acme Clothing” )>)>
9. <supplier><supplier> ”Acme Clothing” </supplier></supplier>
10. <name<name ID=Name($c.pid, $c.item)>$c.item </name></name>
11. }
12. </results>
Composed RXL query ( C )
Composition Algorithm
Problem Statement:
* C = UοV * XD = V ( RDB )
* A = U ( XD ) = U ( V ( RDB ) )
* C ( RDB ) = A = U ( V ( RDB ) )
= ( UοV ) ( RDB )
Composition Algorithm
Key Idea: Match U's pattern on V directly, without constructing XD
First step: match U's pattern with V's template (Next slide shows V again with U's patterns matched in it highlighted)
Second step: Construct C C's construct clause is the same as U's construct clause, with
variable substitutions C's from and where clauses consist of all the "relevant" from and
where in V and all the where filters in U, with variable renaming
1. construct2.2. <supplier ID=Supp()><supplier ID=Supp()>3. <company ID=Comp()>”Acme Clothing”</company><company ID=Comp()>”Acme Clothing”</company>4. {5. from Clothing $c6. where $c.category = “outerwear”7. construct 8. <product ID=Prod($c.pid)><product ID=Prod($c.pid)>9.9. <name ID=Name($c.pid,$c.item)>$c.item</name><name ID=Name($c.pid,$c.item)>$c.item</name>10. <category ID=Cat($c.pid,$c.category)>$c.category</category>11. <retail ID=Retail($c.pid,$c.price)>$c.price</retail><retail ID=Retail($c.pid,$c.price)>$c.price</retail>12. {from SalePrice $s13. where $s.pid = $c.pid14. construct15. <sale ID=Sale($c.pid,$s.pid,$s.price)>$s.price</retail><sale ID=Sale($c.pid,$s.pid,$s.price)>$s.price</retail>16. }17. {from Problems $p18. where $p.pid = $c.pid19. construct20. <report code=$p.code ID=Prob($c.pid,$p.pid,$p.code,$p.comments)>21. $p.comments22. </report>23. }24. </product>25. } 26. </supplier>
RXL view query ( V ) with patterns from XML-QL query in REDRED
Composition Algorithm
Diagram of Query Composition
Translator and XML Generator
The translator takes an RXL query and decomposes it into one or more SQL queries and an XML template
Initial SilkRoute uses full partition strategy The IBM research paper: sorted, outer union strategy The 2001 SIGMOD paper gives an optimal algorithm
XML generator merges the result tuples into XML document in a single pass
Other Scenarios
Minor changes to the information flow permit other
scenarios
Export the entire database as one, large XML document by materializing the view query
The result of query composition can be kept virtual for later composition with other user queries
Alternative Approaches Materialized XML view
Precompute or compute on demand Feasible when the XML view is small and the
applicaton needs to load the entire view in memory Data may become stale
Use a native XML database engine Stanford DB group: Lore Project One can materialize an XML view using SilkRoute
and store the result in an XML engine Avoid the cost of query composition Performance is unlikely to compete with SQL engine
anytime soon Can't guarantee data freshness and incur a high space cost
XML and SpreadsheetsXML support in Microsoft Excel 2002 for Office XP
"..these new features mean that Microsoft Excel is set to play an important role in any organization's application environment."
Bi-direction transformation Excel can recognize and open XML documents including XSL
processing XML flattening
Any Excel Spreadsheet can be saved as an XML file while preserving "the new XML Spreadsheet file format"
XML and Spreadsheets
Well, Microsoft enable Spreadsheets to manipulate
XML documents
We can enable Spreadsheets to manipulate
relational data using XML view!
XML and SpreadsheetsCan we execute Spreadsheet style processing directly on
XML files ?
XML is hierarchical, and it's unnormalized, which is exactly what people would like to see in Spreadsheets
Given system like SilkRoute
Can we define Spreadsheet function map on XML view? Think the function executions as XML view queries
Define one function on the other is similar as define a new XML view from an existing composed view
It will also arise challenge in the view generation systems
XML and OLAP (OnLine Analytical Processing)
Physically integrating unexpected data into OLAP systems is time-consuming
Logical integration is the better choice XML’s increasing use in data-exchange suggests that the
required data can be available through XML views Possibilities:
Reference external XML data in OLAP queries XML data can be presented along with dimensional data in the result
of an OLAP query Use XML data for selection and grouping
Microsoft and Hyperion published "Open XML for Analysis
Specification" in April 2001
top related