![Page 1: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/1.jpg)
Query Processing with XML
CSE 350 – Advanced Database Topics
Jeffrey R. Ellis
![Page 2: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/2.jpg)
Query Processing Topics
Why?
Java and Other Programming Languages
XPath/XSLT
XQuery (W3C-sponsored Query Language)
Current Research
– Other Query Languages
– XISS (XML Indexing and Storage System)
![Page 3: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/3.jpg)
FIRST – Distinction between XML and HTML/Web Technologies
XML spotlight is analogous to Java
– Immediate benefits applied to World Wide Web
– Long-range, more exciting benefits in applications
XML IS NOT AN HTML REPLACEMENT
– HTML marks pages up for presentation on the web
– XML marks text for semantic information purposes
XML can encode HTML pages, but HTML works well on the Web
![Page 4: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/4.jpg)
XML Data Storage
XML Documents
– Data is delineated semantically
– Schemas/DTDs control contents of elements
– Semi-structured attitude allows flexibility
– Text is human-readable and machine-parsable
– Open standards work with common tools
– File data storage allows for easy sharing
– Can queries control access to data?
![Page 5: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/5.jpg)
Traditional Database Storage
Databases
– Data is delineated semantically
– Schemas control contents of rows
– No flexibility from semi-structured storage
– Data is not human-readable, but only machine-parsable
– Proprietary standards prevent interoperability
– Proprietary storage prevents data sharing
– Queries control access to data
![Page 6: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/6.jpg)
XML for Query Processing
If we can get efficient query processing, XML document storage provides many benefits over traditional database storage.
Sample application
– Employee database document
– XML Schema assumed to exist
– Employee information queried as per standard HR processing
![Page 7: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/7.jpg)
<?xml version="1.0"?>
<!DOCTYPE employees SYSTEM "employee.xsd">
<employees>
  <emp gender='m'>
    <name>
      <last>Bissell</last>
      <first>Brian</first>
    </name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name>
      <last>Pham</last>
      <first>Hung</first>
      <mi>Q</mi>
    </name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
  …
</employees>
![Page 8: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/8.jpg)
Tree Structure of XML Document
Remember that XML documents are trees
emp
  gender
  name
    last
    first
    mi
  position
  salary
  location
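The tree view can be reproduced by walking a parsed record. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module rather than any tool named in the slides; the `outline` helper and the `@` prefix for attributes are illustrative choices, not part of the original material.

```python
import xml.etree.ElementTree as ET

# One employee record taken from the sample document.
EMP_XML = """
<emp gender='m'>
  <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
  <position>Senior IT Specialist</position>
  <salary>45,000</salary>
  <location>CT</location>
</emp>
"""

def outline(elem, depth=0):
    """Return indented lines: the element, its attributes, then its children."""
    pad = "  " * depth
    lines = [f"{pad}{elem.tag}"]
    # Attributes hang off their owning element, like 'gender' under 'emp'.
    lines += [f"{pad}  @{name}" for name in elem.attrib]
    for child in elem:
        lines += outline(child, depth + 1)
    return lines

tree_lines = outline(ET.fromstring(EMP_XML))
print("\n".join(tree_lines))
```

The printed outline has `emp` at the root, `@gender` and the four child elements one level down, and `last`, `first`, `mi` nested under `name`.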
![Page 9: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/9.jpg)
Query Processing – Programming Languages
XML documents are flat files
Any language with file I/O can read an XML document
Any language with string parsing capabilities can use XML data
Query processing is done through language syntax
“Obvious” result, different from traditional databases
![Page 10: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/10.jpg)
Query Processing – Programming Languages
Strategy
– Basic file I/O through the language
– Basic string matching to identify elements
– Processing possible, but not necessarily efficient
Languages have gathered XML processing tools in libraries
– xerces – Apache library for Java and C++
Two methods for parsing XML data
– DOM
– SAX
![Page 11: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/11.jpg)
DOM
Document Object Model
Defined by W3C for XML, HTML, and stylesheets
Provides a hierarchical, object view of the document
DOMParser parses through the file, then provides access to nodes
Key: Every item in an XML document is a node
![Page 12: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/12.jpg)
DOM Example
Node (Element): name="emp", attribute1, child1
Node (Attr): name="gender", value="m", parent
Node (Element): name="name", parent, child1
Node (Element): name="last", parent, child1
Node (Text): value="Bissell", parent
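The node chain in this example can be checked against a real DOM implementation. This is a sketch only, assuming Python's standard `xml.dom.minidom` in place of the xerces DOMParser mentioned earlier; the record is written without whitespace so that `firstChild` lands on elements rather than on whitespace text nodes. In this implementation an attribute node is reached from its element through `ownerElement`, not through a parent link.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# Compact form of the first sample record (no whitespace between tags).
doc = parseString(
    "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name></emp>"
)

emp = doc.documentElement                # Element node named "emp"
gender = emp.getAttributeNode("gender")  # Attr node with value "m"
name = emp.firstChild                    # Element node named "name"
last = name.firstChild                   # Element node named "last"
text = last.firstChild                   # Text node holding "Bissell"

# Every item is a node; only the node type differs.
node_types = [n.nodeType for n in (emp, gender, name, last, text)]
print(node_types, text.data)
```

Because the whole document is held in memory, any node can be revisited later, for example by following `text.parentNode` back up to `last`.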
![Page 13: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/13.jpg)
SAX
Simple API for XML
Defined by the XML-DEV mailing list
Provides event-driven processing of the document
XMLReader parses through the file and activates different methods and functions based on the elements retrieved
Key: Methods are defined in an interface and implemented in user code
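The event-driven style can be sketched with Python's standard `xml.sax` module, which follows the same pattern of a reader calling handler methods supplied by user code. The `SalaryHandler` class and its fields are hypothetical names chosen for this illustration; the comma stripping is an assumption needed because the sample salaries are written as "35,000".

```python
import xml.sax

class SalaryHandler(xml.sax.ContentHandler):
    """Collects salary values as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.in_salary = False
        self.buffer = []
        self.salaries = []

    def startElement(self, name, attrs):
        # Called by the parser for every opening tag.
        if name == "salary":
            self.in_salary = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_salary:
            self.buffer.append(content)

    def endElement(self, name):
        # Called for every closing tag; convert the collected text here.
        if name == "salary":
            self.salaries.append(int("".join(self.buffer).replace(",", "")))
            self.in_salary = False

SAMPLE = (
    b"<employees>"
    b"<emp gender='m'><salary>35,000</salary></emp>"
    b"<emp gender='m'><salary>45,000</salary></emp>"
    b"</employees>"
)

handler = SalaryHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.salaries)
```

Nothing but the current salary text is kept between events, which is why this style suits large documents that are read once.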
![Page 14: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/14.jpg)
DOM versus SAX
SAX is primarily Java-based; DOM defined for most languages
DOM requires storage of entire document in memory; SAX processes as it reads
DOM mirrors a document that can be revisited; suited for document processing
SAX mirrors object lifecycles; suited for data processing
![Page 15: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/15.jpg)
Query Processing - XPath/XSLT
Standard XML technologies XPath and XSLT provide a ready-made querying infrastructure
XPath identifies the location of various document elements
XSL stylesheets provide methods for transforming data from one format to another
Combining XPath and XSLT provides easy generation of result sets based on queries
![Page 16: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/16.jpg)
XPath
Provides element, value, and attribute identification
employees/emp/name/first = “Brian”, “Hung”, “Sara”, “Brian”
//salary = “35,000”, “40,000”, “35,000”, “60,000”
count(/employees/emp) = 4
//mi = “Q”
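Path lookups of this kind can be tried out with the limited XPath subset in Python's standard `xml.etree.ElementTree`. This is a sketch under two assumptions: only the two employee records shown in full in the sample document are included, and `len()` stands in for `count()`, which that subset does not provide. Paths are written relative to the root `employees` element.

```python
import xml.etree.ElementTree as ET

# The two records given in full in the sample document.
EMPLOYEES_XML = """
<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
</employees>
"""

root = ET.fromstring(EMPLOYEES_XML)  # root is the <employees> element

# employees/emp/name/first, relative to the root element
first_names = [e.text for e in root.findall("emp/name/first")]
# //salary: any depth below the root
salaries = [e.text for e in root.findall(".//salary")]
# count(/employees/emp), expressed with len()
emp_count = len(root.findall("emp"))
# //mi: only records that carry a middle initial contribute
middle_initials = [e.text for e in root.findall(".//mi")]

print(first_names, salaries, emp_count, middle_initials)
```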
![Page 17: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/17.jpg)
XSLT
Stylesheet transforms data from one form into another
<xsl:template match="name">
<xsl:value-of select="first"/>
<xsl:text> </xsl:text>
<xsl:value-of select="last"/>
</xsl:template>
= Brian Bissell, Hung Pham, Sara Menillo, Brian Chicos
![Page 18: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/18.jpg)
Combine XPath and XSLT for Queries
Query: Find the last name and position of each employee named Brian
<xsl:template match='employees'>
  <xsl:for-each select='emp'>
    <xsl:if test='name/first="Brian"'>
      <xsl:value-of select='name/last'/>
      <xsl:text>:</xsl:text>
      <xsl:value-of select='position'/>
      <xsl:text>; </xsl:text>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
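For comparison, the same query logic can be written as ordinary program code. The sketch below is a Python stand-in for the stylesheet, not an XSLT processor: the loop plays the role of `xsl:for-each`, the `if` plays the role of `xsl:if`, and the formatted string mirrors the `value-of` and `text` instructions. The function name `last_and_position` is hypothetical, and only the two records shown in full in the sample document are used.

```python
import xml.etree.ElementTree as ET

EMPLOYEES_XML = """
<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <position>IT Specialist</position>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <position>Senior IT Specialist</position>
  </emp>
</employees>
"""

def last_and_position(root, first_name):
    """Emit 'last:position; ' for each employee with the given first name."""
    parts = []
    for emp in root.findall("emp"):                      # xsl:for-each select='emp'
        if emp.findtext("name/first") == first_name:     # xsl:if test=...
            last = emp.findtext("name/last")
            position = emp.findtext("position")
            parts.append(f"{last}:{position}; ")
    return "".join(parts)

root = ET.fromstring(EMPLOYEES_XML)
result = last_and_position(root, "Brian")
print(result)
```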
![Page 19: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/19.jpg)
Combine XPath and XSLT for Queries
Query: Find the average salary of all non-managers
<xsl:template match='employees'>
<xsl:variable name='running_sum'>
<xsl:value-of select='sum(emp/salary[../position!="Manager"])'/>
</xsl:variable>
<xsl:variable name='running_count'>
<xsl:value-of select='count(emp[position!="Manager"])'/>
</xsl:variable>
<xsl:value-of select='$running_sum div $running_count'/>
</xsl:template>
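The arithmetic behind this query can be checked outside XSLT. The following Python sketch mirrors the sum, count, and division steps; it is a stand-in, not the stylesheet itself. Two assumptions are made and labeled in the code: the third record is hypothetical and exists only to show a manager being excluded, and commas are stripped before conversion, since in XPath 1.0 a string such as "35,000" does not convert to a number and `sum()` over such values would be expected to yield NaN.

```python
import xml.etree.ElementTree as ET

EMPLOYEES_XML = """
<employees>
  <emp><position>IT Specialist</position><salary>35,000</salary></emp>
  <emp><position>Senior IT Specialist</position><salary>45,000</salary></emp>
  <!-- hypothetical record, added only to show the filter at work -->
  <emp><position>Manager</position><salary>60,000</salary></emp>
</employees>
"""

def to_number(text):
    """Convert '35,000' style text to a float (assumed cleanup step)."""
    return float(text.replace(",", ""))

root = ET.fromstring(EMPLOYEES_XML)

# emp[position != "Manager"]; assumes every emp has a position child
non_managers = [e for e in root.findall("emp")
                if e.findtext("position") != "Manager"]

running_sum = sum(to_number(e.findtext("salary")) for e in non_managers)
running_count = len(non_managers)
average = running_sum / running_count   # $running_sum div $running_count

print(average)
```

Storing salaries without thousands separators in the document would remove the need for the cleanup step.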
![Page 20: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/20.jpg)
Results XSLT/XPath
Many SQL queries can be accomplished
– XPath provides element (data) access
– XPath provides basic functions (e.g., sum() )
– XPath provides WHERE functionality
– XSLT provides SELECT functionality
– XSLT provides ORDER BY functionality (sort)
– XSLT provides result set formatting
– UNION functionality provided ..?
![Page 21: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/21.jpg)
Querying with XPath and XSLT
Important questions
– Is it sufficient?
– Is it efficient?
– Is there a better way?
The XML community needs to design a full query language
XQuery – Working draft published 7 June 2001
![Page 22: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/22.jpg)
Query Processing - XQuery
XML provides flexibility in representing many kinds of information
A good query language must be likewise flexible
– Pre-XQuery languages are good for specific types of data
Goal: “[S]mall, easily implementable language in which queries are concise and easily understood.”
![Page 23: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/23.jpg)
XQuery Forms
1. Path expressions
2. Element constructors
3. FLWR expressions
4. Operator/Function expressions
5. Conditional expressions
6. Quantified expressions
7. Data Type expressions
![Page 24: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/24.jpg)
XQuery – Path Expressions
Contribution of XPath
XQuery 1.0 and XPath 2.0 Data Model
document("sample1.xml")//emp/salary
/employees/emp/name[../@gender='f']
//emp[1 TO 3]/name/first
![Page 25: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/25.jpg)
XQuery – Element Constructors
Queries can generate new elements
Similar to XSLT abilities
<worker>
{$name/last}
{$position}
</worker>
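What an element constructor produces can be illustrated by building the same `worker` element by hand. This is a Python sketch using the standard `xml.etree.ElementTree` module, not XQuery; the variables `name` and `position` stand in for `$name` and `$position`, which are assumed here to be bound to the first sample employee.

```python
import copy
import xml.etree.ElementTree as ET

emp = ET.fromstring(
    "<emp gender='m'>"
    "<name><last>Bissell</last><first>Brian</first></name>"
    "<position>IT Specialist</position>"
    "</emp>"
)

name = emp.find("name")          # stands in for $name
position = emp.find("position")  # stands in for $position

# <worker> {$name/last} {$position} </worker>
worker = ET.Element("worker")
worker.append(copy.deepcopy(name.find("last")))  # copy, so emp is left untouched
worker.append(copy.deepcopy(position))

worker_xml = ET.tostring(worker, encoding="unicode")
print(worker_xml)
```

The result is a new element whose children are copies of nodes selected from the source document, which is the same idea as an XSLT template emitting literal result elements.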
![Page 26: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/26.jpg)
XQuery – FLWR Expressions
For clause/Let clause/Where clause/Return
Similar to SQL
FOR $e IN document("sample1.xml")//emp
WHERE $e/salary > 38000
AND $e/@gender = 'f'
RETURN $e/name
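The FOR/WHERE/RETURN shape maps closely onto a list comprehension. The sketch below is a Python analogue, not an XQuery engine. It rests on labeled assumptions: the third record ("Jane Doe") is hypothetical and added only so that the filter has something to return, and the `to_number` helper strips the commas from salary text before comparing.

```python
import xml.etree.ElementTree as ET

EMPLOYEES_XML = """
<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <salary>35,000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <salary>45,000</salary>
  </emp>
  <!-- hypothetical record, not from the sample document -->
  <emp gender='f'>
    <name><last>Doe</last><first>Jane</first></name>
    <salary>52,000</salary>
  </emp>
</employees>
"""

def to_number(text):
    """Convert '45,000' style text to a float (assumed cleanup step)."""
    return float(text.replace(",", ""))

root = ET.fromstring(EMPLOYEES_XML)

matches = [
    e.find("name")                                  # RETURN $e/name
    for e in root.iter("emp")                       # FOR $e IN ...//emp
    if to_number(e.findtext("salary")) > 38000      # WHERE $e/salary > 38000
    and e.get("gender") == "f"                      # AND $e/@gender = 'f'
]

names = [(n.findtext("first"), n.findtext("last")) for n in matches]
print(names)
```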
![Page 27: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/27.jpg)
XQuery – Operator/Function Expressions
Pre-defined and user-defined operators and functions
Still under development: Union, Intersect, Except
FOR $e IN //employees/emp
WHERE not(empty($e//mi))
RETURN $e/name
![Page 28: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/28.jpg)
XQuery – Conditional Expressions
If-then-else expressions are not yet limited to boolean (ongoing discussion)
FOR $e IN /employees/emp
RETURN
<worker>
{$e/name}
IF ($e/position="Manager") THEN <manager />
</worker>
![Page 29: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/29.jpg)
Quantified Expressions
Some/Every conditions
Some/Every evaluates to True or False
FOR $e IN //employees
WHERE SOME $p IN $e//emp/position SATISFIES $p = "Manager"
RETURN $e
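Some/Every conditions behave like Python's built-in `any()` and `all()`. The sketch below is an analogue rather than XQuery, using only the two records shown in full in the sample document; the second test (every location is "CT") is an extra illustration that does not appear on the slide.

```python
import xml.etree.ElementTree as ET

EMPLOYEES_XML = """
<employees>
  <emp gender='m'>
    <position>IT Specialist</position>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <position>Senior IT Specialist</position>
    <location>CT</location>
  </emp>
</employees>
"""

employees = ET.fromstring(EMPLOYEES_XML)

# SOME $p IN $e//emp/position SATISFIES $p = "Manager"
has_manager = any(
    p.text == "Manager" for p in employees.findall(".//emp/position")
)

# EVERY $l IN $e//emp/location SATISFIES $l = "CT"  (illustrative extra)
all_in_ct = all(
    loc.text == "CT" for loc in employees.findall(".//emp/location")
)

print(has_manager, all_in_ct)
```

With these two records neither employee is a manager, so the Some test is false and the `employees` element would not be returned; the Every test is true because both locations are "CT".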
![Page 30: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/30.jpg)
Data Types
Data Types based on those available from XML Schema
Data types can be literal (“Brian”), from constructor functions (date(“2001-10-11”) ), or from casting ( CAST AS xsd:integer(24) )
User-defined data types are also allowable and parsable
![Page 31: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/31.jpg)
XQuery
More choices than the XSLT/XPath combination
Work in progress
Focus of current W3C query language efforts
Influencing the future design of the core XML technologies (XPath)
Hopes to be fully flexible for all future XML applications
![Page 32: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/32.jpg)
Query Processing – Research
XQuery specification continues to undergo review and change
– 6 of 7 specification documents released since June
– All specifications released in 2001
Other avenues of research
– Other query languages
– Indexing strategies
– Implementation
![Page 33: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/33.jpg)
Query Processing – Other Query Languages
Many query languages exist
– Quilt (basis for XQuery)
– W3C early languages (XML-QL, XQL)
– Adopted traditional languages (OQL, XSQL)
– Research papers (XML-GL, YATL, Lorel)
Other query languages often optimized for a particular subset of XML documents
Query language field *MAY* be standardizing to XQuery
![Page 34: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/34.jpg)
Query Processing – Indexing Strategy
Query language less important; better indexing techniques lead to efficiency
XISS (XML Indexing and Storage System)
– Published September 19, 2001
– Builds sets of indexes on XML data elements and attributes on initial parse of the XML document
– Lookup becomes constant-time through the various built indexes
– Demonstrated successes in test runs
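The general idea of indexing on the initial parse can be sketched with plain hash tables. This is not XISS's actual index design, which the slides do not describe in detail; it is a simplified illustration in Python in which one pass over the document fills two dictionaries, after which a lookup by element name or by attribute value is an average-case constant-time dictionary access. The names `build_indexes`, `by_tag`, and `by_attr` are hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

EMPLOYEES_XML = """
<employees>
  <emp gender='m'>
    <name><last>Bissell</last><first>Brian</first></name>
    <salary>35,000</salary>
  </emp>
  <emp gender='m'>
    <name><last>Pham</last><first>Hung</first><mi>Q</mi></name>
    <salary>45,000</salary>
  </emp>
</employees>
"""

def build_indexes(root):
    """Single pass over the tree, filling an element index and an attribute index."""
    by_tag = defaultdict(list)    # element name -> elements, in document order
    by_attr = defaultdict(list)   # (attribute name, value) -> owning elements
    for elem in root.iter():
        by_tag[elem.tag].append(elem)
        for attr_name, value in elem.attrib.items():
            by_attr[(attr_name, value)].append(elem)
    return by_tag, by_attr

root = ET.fromstring(EMPLOYEES_XML)
by_tag, by_attr = build_indexes(root)

# Lookups no longer walk the document.
salary_texts = [e.text for e in by_tag["salary"]]
male_count = len(by_attr[("gender", "m")])
print(salary_texts, male_count)
```

The cost moves to the initial parse and to the memory held by the indexes, which is the usual trade-off for this kind of scheme.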
![Page 35: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/35.jpg)
Query Processing - Implementation
XML is currently in a state of flux
– Standards are still being revised
– Industry is cautious before embracing a new technology
– Economic slowdown may prevent new research and development efforts
XML is still waiting for its “Killer App”, the application that forces immediate acceptance
![Page 36: Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis](https://reader035.vdocuments.site/reader035/viewer/2022062518/56649e4a5503460f94b3ded4/html5/thumbnails/36.jpg)
XML Query Processing
XML is a functional database storage language
Efficient query language needed to turn XML into a viable database
Query language solutions are being developed
– Java/C++ hooks first developed – OK
– XSLT/XPath implemented – GOOD
– XQuery being designed – GREAT?
– Future additions – ????