xml schemas in oracle xml db ravi murthy sandeepan banerjee oracle corporation
TRANSCRIPT
Talk Overview
Why XML in DB ? Background
– XML Schema– Object-Relational DB
XML Schema Support in Oracle XML DB – XML Storage– XML Query – XML Update
Future directions and Conclusions
Why XML in DB ?
Trends in industry– More XML content being generated– More applications dealing with (structured) data and (semi-
structured) documents– Need for unified management for all kinds of data– XML Schema language provides strong typing
Many benefits to storing XML in DB– Better Queriability– Optimized Updates– Stronger Validation
Fidelity of XML very important
Oracle XML DB Overview
Native XML data type Supports W3C XML Schema data model XML/SQL Duality Support for XML standards
– Namespaces, XPath, XSLT, SQL/XML
High performance XML repository– Hierarchical File System Abstraction– Protocols : FTP, HTTP/WebDAV– Access Control (ACL)– Versioning
XML Schema W3C Recommendation Large number of built-in scalar data types Simple type definitions
– Length, pattern and other constraints Complex type definitions
– sequence / choice / all– Mixed content
Extensible– Derivation by extension and restriction– Substitution Groups– Wildcards
Gradually replacing the traditional DTD Commonly used as a validation mechanism
Object Relational DB Basics
Object types Collection types Object References LOBs
XML Schema Construct
Object Construct
Complex type Object type
Local complex type Embedded object type
Complex type with maxOccurs > 1
Collection type
Derived complex type
Subtype
XMLType Native data type for XML Used to define columns of tables and views,
arguments to stored procedures, etc. XML specific methods and operators for
– Querying and extracting XML using XPath– Transforming XML using XSLT– Validating XML using XML Schema
Multiple Storage Options– Unstructured Storage in CLOB– Structured Storage into object-relational rows and columns– Hybrid Storage
Maintains application transparency to physical storage choice
XML DB and XML Schema
XML Schema controls all aspects of processing – Storage mappings – In-memory representations– Language Bindings
XML Schema Registration Process– Associates XML Schema with URL– Generates Object types– Creates default tables
XMLType column can be constrained to a global element of registered schema
XML Schema Example
<schema targetNamespace=“http://www.oracle.com/PO.xsd” xmlns:po=“http://www.oracle.com/PO.xsd”elementFormDefault=”qualified”xmlns="http://www.w3.org/2001/XMLSchema">
<complexType name="PurchaseOrderType"> <sequence>
<element name="PONum" type="decimal"/> <element name="Company"> <simpleType>
<restriction base="string"> <maxLength value="100"/> </restriction>
</simpleType> </element>
XML Schema Example (contd)
<element name="Item" maxOccurs="1000"> <complexType> <sequence>
<element name="Part"> <simpleType> <restriction base="string">
<maxLength value="1000"/> </restriction> </simpleType> </element> <element name="Price" type="float"/>
</sequence> </complexType> </element>
</sequence> </complexType>
<element name="PurchaseOrder" type="po:PurchaseOrderType"/> </schema>
Generated Object Types
TYPE "Item_T" (part varchar2(1000), price number);
TYPE "Item_COLL" AS VARRAY(1000) OF "Item_T";
TYPE "PurchaseOrderType_T" (ponum number, company varchar2(100), item Item_COLL);
Generated Tables
TABLE po_tab
OF XMLTYPE
XMLSCHEMA “http://www.oracle.com/PO.xsd"
ELEMENT "PurchaseOrder“
VARRAY(item) STORE AS item_tab;
XML Document Example<PurchaseOrder xmlns="http://www.oracle.com/PO.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.oracle.com/PO.xsd
http://www.oracle.com/PO.xsd"> <PONum>1001</PONum> <Company>Oracle Corp</Company> <Item>
<Part>9i Doc Set</Part> <Price>2550</Price>
</Item> <Item>
<Part>8i Doc Set</Part> <Price>350</Price>
</Item> </PurchaseOrder>
XML Storage – PO_TAB and LINE_TAB
Row ID ponum company
1 1001 Oracle Corp
Parent Row ID Array Index part price
1 1 9i Doc Set 2250
1 2 8i Doc Set 350
Structured Storage Attributes and single-valued elements
– Stored as columns in single row– SQL data types correspond to XML Schema types– SQL constraints correspond to XML Schema constraints
Multi-valued elements (collections) stored in separate nested tables
– One row per item in collection– Nested table row stores parent key– Array Index column stores the position information
Number column – uses full range of floating points Elements inserted in the middle of the collection get
(previous_array_index + next_array_index)/2 Supports multiple levels of nesting
– Embedded object types– Embedded collection types with multiple nested tables
DOM Fidelity
Structured storage guarantees DOM fidelity– No whitespace fidelity
System binary attribute in object types– SYS_XDBPD$
PD attribute stores non-relational information– Ordering of elements – Comments – Processing Instructions– Namespace declarations – Prefix information – Mixed content – text nodes that are intermixed with
elements are stored in the system column
XDB Attributes in XML Schema
Mapping details captured as new attributes within the XML Schema
– Oracle attributes use namespace : http://xmlns.oracle.com/xdb
Input XML Schema does not need to be annotated– Default assumptions for all XDB attributes
Input XML Schema can explicitly specify XDB attributes
– SQLName : Name of column or type attribute– SQLType : Name of object type – SQLCollType : Name of collection type– MaintainOrder : Permits turning off order maintenance
Hybrid Storage of XML
Two ends of storage spectrum– Store entire content in a single LOB– Full shredding of XML data
Hybrid Storage Options– Combination of shredding and LOB storage– xdb:SQLType=“CLOB” applied to <complexType>– CLOB storage ideal for fragments that are not intended for
query or partial updates– Example : XHTML content embedded within resource
descriptors (metadata)
Querying XMLType
XPath based operators existsNode
– Boolean operator– Checks for existence of node identified by XPath
extract– Extracts a fragment identified by XPath
extractValue– Retrieves the raw value of leaf node identified by XPath
Namespace Aware ANSI SQL/XML Standards effort
Query Rewrite
Automatic rewrite of XPaths during query compilation
Rewritten query directly accesses underlying relational columns
Introduces joins with nested tables Enables use of indexes
Query Rewrite - Example
Original QuerySELECT extractValue(value(p),
'/PurchaseOrder/PONum') FROM po_tab pWHERE existsNode(value(p),
'/PurchaseOrder[Company=Oracle]') = 1; Rewritten Query
SELECT p.ponum FROM po_tab pWHERE p.company = 'Oracle';
Query Rewrite Examples
XPath Expression Rewrite LogicSimple XPath expressions /PurchaseOrder/Company/PurchaseOrder/@poDate
Traversals of object attributes1 or more levels
Collection traversals/PurchaseOrder/LineItem/Part
Joins with appropriate nested tables
Predicates[Company=“Oracle”]
Relational predicates
List indexesLineitem[1]
Operator to access i-th item of collectionUses index on array_index column
Updating XMLType
Updating entire XML document Partial Update
– Uses UpdateXML() operator– XPath identifies element or attribute to be
updated– New value for the updated node is specified– Rewritten to directly update underlying column(s)
Similar mechanisms for – Inserting new nodes– Deleting node(s)
Update Rewrite - Example
Original StatementUPDATE po_tab p SET value(p) = updatexml(value(p),
'/PurchaseOrder/PONum/text()',
9999)WHERE existsNode(value(p),
‘/PurchaseOrder[Company=Oracle]’) = 1; Rewritten Statement
UPDATE po_tab p SET p.ponum = 9999WHERE p.Company = ‘Oracle’;
XMLType Views
Provide an XML view of relational data Good evolutionary strategy
– New XML apps on XML abstraction of existing data
SQL/XML Standard operators used to generate XML Views can generate schema based XML Insert / Update / Delete operations via ‘Instead of’
Triggers Queries over XML views are rewritten to directly
access underlying relational columns
XMLType View - Example
CREATE VIEW po_view of XMLTYPEXMLSCHEMA "po.xsd" ELEMENT "PurchaseOrder" AS SELECT
XMLElement("PurchaseOrder", XMLForest(p.ponum "PONum",
p.company "Company"), (SELECT XMLAGG(
XMLElement("Item", XMLForest(i.part "Part",
i.price "Price")) FROM items_rel_tab i WHERE i.po_id = p.id))
FROM po_rel_tab p;
Complex XML Schemas
Cyclic Definitions– Object types created with references– Table row contains reference to other rows stored in same
or different table– Keys to document and parent rows stored in nested rows
Complex type derivation – Mapped to object type inheritance
Wildcards– Mapped to CLOB attributes
Indexing XMLType
Multiple index types B-Tree and bitmap indexes Function-based indexes
– Create index on specific XPath expressions
Text indexes– Inverted lists provide section-based search– Also support keyword based search within textual
content
Future Directions
Optimize XML Query support– Translate XML Query to SQL– Works against distributed databases using
Oracle connectivity solutions
Optimize support for highly variable documents
– New storage and indexing techniques