xml use cases: customer strategies and experiences€¦ · why store xml in a database? • xml use...
TRANSCRIPT
© 2010 IBM Corporation
XML Use Cases: Customer Strategies and Experiences
https://www.ibm.com/developerworks/wikis/display/db2xml/XMLUseCases Charts and Recording
Susan Malaika, Matthias [email protected], [email protected]
2
Agenda
• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases
• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…
• Summary
3
Agenda
• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases
• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…
• Summary
4
What is DB2 pureXML ?
• Native XML storage, indexing and processing in DB2
• Tight integration of XML and relational data management
• Based on standards: XQuery, XPath, SQL/XML, XML Schema, XSLT, Namespaces, etc.
• Facilitates end-to-end XML
5
DB2 pureXML Overview 1/2create table customer (cid integer, info XML)
insert into customer (cid, info) values (?,?)
select cid, info from customer
select xmlquery('$INFO/customer/name') from customer where cid > 1234 and xmlexists('$INFO/customer/addr[zip = 95123]')
xquery for $i in db2-fn:xmlcolumn("CUSTOMER.INFO")/customerwhere $i/addr/zip = 95123return <myresult>{$i/name}</myresult>
1
2
3
4
5
6
create index idx1 on customer(info) generate key using xmlpattern '/customer/addr/zip' as sql varchar(5)
update customer set info = ? where ….
update customerset info = xmlquery(‘copy $new := $INFO
modify do replace value of $new/customer/addr/zip with 95141
return $new')where …;
Plus: XML Schema Support, Utilities, Shredding, XSLT, etc.
DB2 pureXML Overview 2/2
9
10
7
8
7
Agenda• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases
• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…
• Summary
8© 2010 IBM Corporation
Why/When store XML instead of Relational ?When business objects are typically represented as XML when transmitted When data format (schema) changes over timeWhen schemas from different sources are combined to represent an external entityWhen schema is complex and highly variable, such as:
XML Schema Files
Type Definitions
Elements +Attributes
HL7 CDA 3 6 1953 945 + 477
STAR 192 5846 77319 + 625
FpML 4.2: 23 686 1867 + 196
FIXML 4.4: 41 1310 619 + 2593
Health Level 7, Clinical Document Architecture
Standards for Technology in Automotive Retail (OAGIS)
Financial products Markup Language
Financial Information eXchange Protocol
Many industry formats have similar XML Schema complexity/variability
9© 2010 IBM Corporation
In Relational: Many tables
In pureXML: as simple ascreate table T(ID int, trade XML);
Generate Relational Schema for FpML
10
Web Applications Before DB2 pureXML
Client Application ServerXML
XML ObjectsMapping Mapping
Relational schema
Relational Database
• XML to object mapping + object to relational mapping• Complex, very costly, inflexible, inefficient, difficult to
maintain over time
11
End-to-end XML Processing
Client XML Forms, XHMTL etc. DB2 pureXML
XMLSOA Gateway / WAS
optional: XSLT
• End-to-End Straight-Through Processing using XML• XML programming paradigm and architecture pattern
• XForms• REST/SOAP web services• XML Standards: XPath, XQuery, XML Schema, XSLT, etc.
Protocol SOAP, REST,
MQ
XMLrelational
XML XML XML
SQL/XMLXQuery
12
Parse Once – Access Many
Client XML Forms, XHMTL etc. DB2 pureXML
XMLSOA Gateway / WAS
optional: XSLT
• The XML is parsed once on input • The XML is processed many times – through declarative query
languages (SQL/XML, XQuery) • Benefits
• Reduce CPU usage overall and in the middle tier in particular• Access to the XML is through a declarative language that can be
optimized outside the application e.g., through indexing
Protocol SOAP, REST,
MQ
XMLrelational
XML XML XML
SQL/XMLXQuery
Reduce CPU Usagein Middle Tier
13
XML vs. Relational
DepartmentDEPTID DEPTNAME
15 SalesEmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY
15 27 MICHAEL THOMPSON NULL 4125015 10 CHRISTINE SMITH 408-463-4963 52750
DEPTDOC<DEPARTMENT deptid="15" deptname="Sales">
<EMPLOYEE><EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>
</EMPLOYEE><EMPLOYEE>
<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME> <SALARY>41250.00</SALARY>
</EMPLOYEE></DEPARTMENT>
…
create table dept(deptdoc XML);
14
Schema Evolution
PhoneEMPNO PHONE
27 406-463-123410 415-010-123410 408-463-4963
Relational Requires:• Normalization of existing data !• Change of applications
Costly!
DepartmentDEPTID DEPTNAME
15 SalesEmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY
15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 408-463-4963 52750
“Employees are now allowed to have multiple phone numbers…”DEPTDOC<DEPARTMENT deptid="15" deptname="Sales">
<EMPLOYEE><EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><PHONE>415-010-1234</PHONE><SALARY>52750.00</SALARY>
</EMPLOYEE><EMPLOYEE>
<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME><PHONE>406-463-1234</PHONE><SALARY>41250.00</SALARY>
</EMPLOYEE></DEPARTMENT>
…
15
Denormalization
Sales
EmployeeDEPTID EMPNO FIRSTNAME LASTNAME PHONE SALARY
15 27 MICHAEL THOMPSON 406-463-1234 4125015 10 CHRISTINE SMITH 415-010-1234 52750Sales
DEPTNAME
15 10 CHRISTINE SMITH 408-463-4963 52750Sales
DEPTDOC<DEPARTMENT deptid="15" deptname="Sales">
<EMPLOYEE><EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><PHONE>415-010-1234</PHONE><SALARY>52750.00</SALARY>
</EMPLOYEE><EMPLOYEE>
<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME><PHONE>406-463-1234</PHONE><SALARY>41250.00</SALARY>
</EMPLOYEE></DEPARTMENT>
…
• …can simplify a database schema• …can reduce the number of joins• …can simplify application development• …can reduce number of DBMS API calls• …can improve performance
In relational, denormalizationintroduces redundancy
XML allows denormalizationwithout redundancy
create table dept(deptdoc XML);
16
12 Normalized Relational Tables representing 250K External Entities
XML
1 XML column representing 250K External Entities
Java ProgramPerforms 15SQL Selects
Java ProgramPerforms 1XML Select
3.2 million entities retrieved in 15 minutes at same server CPU utilization
5 million entities retrieved in 15 minutes at same server CPU, and much lower client CPU utilization
Two Equivalent Data RetrievalsComparing Normalized Relational and XML in a Customer Scenario
::
55% higherthroughput
18
Typical XML Collection Structure• Not either XML or relational• Hybrid database schema• Simple Storage Structure• Small number of tables• Few fixed fields in relational
columns• Variable data in XML• DB2 can extract selected XML
elements in relational columns if needed
Key1 Key2 XML
… … <DEPARTMENT deptid="15" deptname="Sales"><EMPLOYEE>
<EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>
</EMPLOYEE><EMPLOYEE>
<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME> <SALARY>41250.00</SALARY>
</EMPLOYEE></DEPARTMENT>
… … <DEPARTMENT deptid="33" deptname="HR"><EMPLOYEE>
<EMPNO>17</EMPNO><FIRSTNAME>Mary</FIRSTNAME><LASTNAME>JONES</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>
</EMPLOYEE></DEPARTMENT>
… … …
TABLE
19
Typical XML Collection Structure• Not either XML or relational• Hybrid database schema• Simple Storage Structure• Small number of tables• Few fixed fields in relational
columns• Variable data in XML• DB2 can extract selected XML
elements in relational columns if needed
• Can also define relational views over portions of XML
Key1 Key2 EmpID FNname LNname … …
15 … 10 Christine Smith … …
15 … 27 Michael Thompson … …
14 33 Mary Jones … …
… … … … … … …
Key1 Key2 XML
15 … <DEPARTMENT deptid="15" deptname="Sales"><EMPLOYEE>
<EMPNO>10</EMPNO><FIRSTNAME>CHRISTINE</FIRSTNAME><LASTNAME>SMITH</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>
</EMPLOYEE><EMPLOYEE>
<EMPNO>27</EMPNO><FIRSTNAME>MICHAEL</FIRSTNAME><LASTNAME>THOMPSON</LASTNAME> <SALARY>41250.00</SALARY>
</EMPLOYEE></DEPARTMENT>
14 … <DEPARTMENT deptid="14" deptname="HR"><EMPLOYEE>
<EMPNO>33</EMPNO><FIRSTNAME>Mary</FIRSTNAME><LASTNAME>JONES</LASTNAME><PHONE>408-463-4963</PHONE><SALARY>52750.00</SALARY>
</EMPLOYEE></DEPARTMENT>
… … …
CREATE VIEW…TABLE
VIEW
20
Agenda• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases
• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…
• Summary
21
XML Use Cases - Categories• XML at the Core of Application Functionality
• Forms • SOA, Web Services• Data exchange with government and regulatory bodies
• Auditing & Compliance• Event logging / Application logging• Persisting non-XML data for audit purposes
• Adding Flexibility• Flexible Dimensions in Data Warehouses• Custom/user-defined fields in packaged applications• Configuration files and properties
• Metadata• XML for metadata • XML annotations for metadata
• XML Content Management • Creating business reports• Document management
and more…
22
XML Use Cases - Categories• XML at the Core of Application Functionality
• Forms • SOA, Web Services• Data exchange with government and regulatory bodies
• Auditing & Compliance• Event logging / Application logging• Persisting non-XML data for audit purposes
• Adding Flexibility• Flexible Dimensions in Data Warehouses• Custom/user-defined fields in packaged applications• Configuration files and properties
• Metadata• XML for metadata • XML annotations for metadata
• XML Content Management • Creating business reports• Document management
and more…
23
• Schema Diversity• In relational: complex relational database schema
and complex application logic• In XML: no fixed schema required
• Sparse Data• In relational: very many NULL values• In XML: space savings (omit optional elements)
• Schema Evolution• In relational: costly, time consuming, normalization• In XML: much less costly, no need to alter database tables
Core: Forms Processing<USAddress>– <Address>
<AddressLine1>PO BOX 228</AddressLine1><City>SCHENECTADY</City><State>NY</State><ZIPCode>123080000</ZIPCode></USAddress>
– </Address></USAddress></StateOfIncorporation><HdrCode>
– <FederalReturnFiledOther>String</FederalReturnFiledOther><FilerClassificationCode>AA3</FilerClassificationCode><FormType>CT5</FormType><ReturnTypeCode>CT5</ReturnTypeCode><SoftwareDeveloper><DeveloperName>
<BusinessNameLine1>Sunrise Investments Inc</BusinessNameLine1><BusinessNameLine2>A A</BusinessNameLine2>
XML enables end-to-end forms processing:
XMLrelational
XML XML XML
SQL/XMLXQueryXML
relational
XML XML XML
SQL/XMLXQuery
24
Application 1 Application 2
XMLAdapter
DB2 pureXML
XML
ApplicationA
ApplicationB
ApplicationM
Operational Systems/Heterogeneous Data Producers
Core: XML – The Backbone of SOA 1
XML
XML
Application NApplication 3
XMLAdapter
…
Service Bus
XMLXML XML
Stored procedures
XMLAdapter
…
XMLXML
Data Consumers
XML
XML • All SOA messages and business objects are XML• Standard XML format…
• …eases the integration of heterogeneous sources• …avoids implementing n x m conversions• …simplifies the integration of new applications (A2A)
• Examples: OAGIS, FpML, Origo• SQL/XML Stored Procedures implement services
• Simplifies application layer, reduces JDBC calls
Service Consumers
Service Providers
25
Application 1 Application 2
XMLAdapter
DB2 pureXML
XML
ApplicationA
ApplicationB
ApplicationM
XML
XML
Application NApplication 3
XMLAdapter
…
Service Bus
XMLXML XML
Stored procedures
XMLAdapter
…
XMLXML XML
XML
• XML manipulation on parsed XML in DB2 with declarative languages (SQL/XML, XQuery)
• Parse once, access many• Better end-to-end performance and lower CPU than
XML manipulation in the middle tier
Reduced CPU Usagein Middle Tier
Operational Systems/Heterogeneous Data Producers
Core: XML – The Backbone of SOA 2
Data Consumers
Service Consumers
Service Providers
26
Audit: Application and Event Logging• Requirement: Record "everything" that's happening, e.g. in online
banking, investment brokerage, insurance applications• Trace every step a customer takes, by date/sessionID/userID etc.• Log every event in an A2A or B2B environment
• Purpose:• Functional: application level monitors, used by call-centers, audits, etc.• Technical: system and infrastructure trouble shooting, used by IT
• Data Characteristics• "Event records" in XML format, common header fields, highly variable
body that cannot be represented in relational format• Header fields in relational columns, body in XML in the same row
XML
27http://www.ibm.com/developerworks/websphere/techjournal/0608_shetty/0608_shetty.html
Audit: Logging SOA Messages• DB2 pureXML logs all SOAP messages
regardless of structure• Troubleshooting and traceability
• Use XML indexes, XQuery, SQL/XML to analyze the log
28
DB2 Data Warehouse
Time
Sales
Product
Store
(Fact table)
XML
(Dimension table)
(Dimension table)
(Dimension table)
Flexibility: Dimensions in a Data Warehouse• Traditional relational warehouse
• Dimension table have fixed columns• Often hard to change/extend the table
definitions• Cannot add new columns for every new
characteristic of every new product• Limited agility in business reporting
• XML-enriched warehouse• Add attributes to products,
customers, etc. through XML insert and update operations
• Flexibility without adding columns• No impact to existing queries, but
allows new queries/reports to be introduced quicklyCustomer
(Dimension table)
29
Flexibility: Augment Applications with Temp Data Store
RelationalTables
Package ApplicationJSON
identification basic contact info & addressescontact persons & addressesagencycollaboratorsNomenclatures
XML(JSONx)
WebServer
Complex mappings
JSON <-> Relational
Simple mappings
JSON <-> JSONx
HTML Templates
DB2
XQuery
• Application uses JSON to communicate with User Interface• Benefits of storing JSON as JSONx in DB2 pureXML
• Can see what is happening in the application debugging • Can query the temporary data to understand application usage• Can monitor how application usage changes over time
• Variation on Audit: Logging JSONx messages
30
Content: Creating Business Reports A Sample Architecture
• System is simple and easy to understand • Less code - Many SQL stored procedures
became one XQuery statement • Quick changes can be made, e.g,
• When interfaces change application changes require half a day
• Changes are mostly in declarative languages which sometimes do not even require a recompile
• Easy access to report status • Because all state info and interim reports are
in XML, a Web based viewer is used • Piece-meal and parallel development possible
• Because of architected XML interfaces between components
• Different versions of reports handled concurrently • Because report versions and states are all
treated as well formed XML and are stored and queried together
• Source data and final reports are XML-based• DB2 pureXML simplifies data gathering, assembly, & validation• Bonus:
• Consistent and straightforward system configuration Because the applications configuration files are in XML
DataReadiness
DataCompilation &
Validation
ReportRendering Publishing
DataRetrieval
Data Readiness and
Queue Loader
Workflow ControllerJobScheduling
DataValidation
ReportAssembly Publish
Business
Technical
pureXML
Many Data Sources
ReportRender
XML XML XML XML XML XML
Report QueueViewer
DataAssembly
XML
XML
XML
XML
DataReadiness
DataCompilation &
Validation
ReportRendering Publishing
DataRetrieval
Data Readiness and
Queue Loader
Workflow ControllerJobScheduling
DataValidation
ReportAssembly Publish
Business
Technical
pureXML
Many Data Sources
ReportRender
XML XML XML XML XML XML
Report QueueViewer
DataAssembly
XML
XML
XML
XML
31
Agenda• What is DB2 pureXML? • Why store XML in a Database?• XML Use Cases
• Forms Processing• SOA and Data Integration• Application and Event Logging• Adding Flexibility to Fixed Schema Applications• and more…
• Summary
32
Summary 1• DB2 pureXML:
• Native XML storage, indexing and processing in DB2• Access and update through SQL/XML, XQuery, XPath• Based on well-formed XML simplifying schema evolution
• When to store XML in a database?• When business objects are typically represented as
XML when transmitted • When data format (schema) changes over time• When schemas from different sources are combined to
represent an external entity• When schema is complex and highly variable
33
Summary 2• Use Cases include:
• XML at the Core of Application Functionality, e.g., forms • Auditing & Compliance, e.g., event logging• Adding Flexibility, e.g., flexible dimensions in data warehouse • XML Content Management, e.g., reporting
• Use Cases Audit and Flexibility are sensible to start with• Why XML in a database?
• Simpler to design (no data normalisations)• Speedier to develop (less mappings)• Faster to execute (less joins and shredding)• Quicker to evolve (less database structure changes)• Easier to understand (consistent data structures) • Straightforward to prototype (less customization)• Smooth integration with Web technologies (XML based)
34
Further Information 1• DB2 pureXML Wiki:
http://www.ibm.com/developerworks/wikis/display/db2xml/Technical+Papers+and+Articles
• DB2 pureXML Devotees: http://www.ibm.com/developerworks/wikis/display/db2xml/devotee
• XML Database Blog: http://nativexmldatabase.com/
• "pureXML™ in DB2 9: Which way to query your XML Data?" http://www.ibm.com/developerworks/db2/library/techarticle/dm-0606nicola/
• "XMLTABLE by Example", Part 1 & 2 http://www.ibm.com/developerworks/db2/library/techarticle/dm-0708nicola/ http://www.ibm.com/developerworks/db2/library/techarticle/dm-0709nicola/
• “Update XML in DB2 9.5”: http://www.ibm.com/developerworks/db2/library/techarticle/dm-0710nicola/
• 15 best practices for pureXML performance in DB2 9 http://www.ibm.com/developerworks/db2/library/techarticle/dm-0610nicola/
• Performance of DB2 9 pureXML vs. CLOB and shredded XML storage http://www.ibm.com/developerworks/db2/library/techarticle/dm-0612nicola/
• XML Database Benchmark: “Transaction Processing over XML (TPoX)” http://tpox.sourceforge.net/ , http://tpox.sourceforge.net/tpoxresults.htm
35
Further Information 2• Get started with Industry Formats and Services with pureXML
http://www.ibm.com/developerworks/data/library/techarticle/dm-0705malaika/ • Industry Bundles for pureXML http://www.alphaworks.ibm.com/tech/purexml/download• Universal Services for pureXML using Data Web Services
http://www.ibm.com/developerworks/data/library/techarticle/dm-0805malaika/ • Lotus Forms, XForms, and DB2 9 pureXML
http://www.ibm.com/developerworks/wikis/download/attachments/1824/LotusFormsXForms andDB2pureXML.pdf
• Build a pureXML and JSON application, Parts 1-3 http://www.ibm.com/developerworks/xml/library/x-db2JSONpt1/index.html?ca=drs-
• WebSphere DataPower and DB2 pureXML, Parts 1-2: DB2 pureXML as an audit log for WebSphere DataPower http://www.ibm.com/developerworks/data/library/techarticle/dm- 0806malaika
• IBM WebSphere Developer Technical Journal: Using DB2 Version 9 pureXML with WebSphere ESB http://www.ibm.com/developerworks/websphere/techjournal/0608_shetty/0608_shetty.html
• Using industry standard data formats with WebSphere ESB and DB2 Version 9 pureXML http://www.ibm.com/developerworks/websphere/techjournal/0706_elhilaly/0706_elhilaly.html
• Processing SEPA payments using DB2 pureXML and WebSphere Transformation Extender http://www.ibm.com/developerworks/data/library/techarticle/dm-0904sepapayments/
• IBM InfoSphere DataStage and DB2 pureXML, Parts 1-2: Integrate XML operational data into a data warehouse http://www.ibm.com/developerworks/views/data/libraryview.jsp?search_by=purexml+infosphere+s eries
36
• Comprehensive coverage of pureXML in DB2 for Linux, UNIX, Windows and DB2 for z/OS
• http://tinyurl.com/pureXML
37
pureXML Bootcamps – World-Wide• A 2 or 3 day pureXML Bootcamp – Run on demand – in
many cities globally• Includes tutoring on XQuery, SQL/XML, XML storage and
indexing, XML schema handling, universal services and more:
• The bootcamp has plenty of hands on labs to practice – including building a complete application
• Feedback includes “Fantastic – Fast paced”, “I really enjoyed the bootcamp”.
• See Q&A’s and photos here https://www.ibm.com/developerworks/wikis/display/ db2xml/BootCamps
• Watch the pureXML forum for announcements of upcoming bootcamps http://www.ibm.com/developerworks/forums/forum. jspa?forumID=1423
• Contact [email protected] if you would like a bootcamp run in your area
• Join the pureXML Devotee mailing list http://www.ibm.com/developerworks/wikis/display/ db2xml/devotee
38
Backup
39
Core: XML End-To-End – pureXML and Forms
Universal Services for pureXML using Data Web Services:http://www.ibm.com/developerworks/db2/library/techarticle/dm-0805malaikaForms and pureXML: http://www.ibm.com/developerworks/db2/library/techarticle/dm-0805malaika2
40
40
Relational Views over XML
<dept bldg="101"><employee id="901">
<name><first>John</first><last>Doe</last>
</name><office>344</office>
</employee><employee id="902">
<name><first>Peter</first><last>Pan</last>
</name><office>216</office>
</employee></dept>
CREATE VIEW empview(empid, firstname, lastname, office)AS SELECT X.* FROM dept,
XMLTABLE ('$DOC/dept/employee' COLUMNS empid INTEGER PATH '@id', firstname VARCHAR(30) PATH 'name/first', lastname VARCHAR(30) PATH 'name/last', office INTEGER PATH 'office') AS X
empid firstname lastname office901 John Doe 344902 Peter Pan 216
create table dept(doc XML);select lastname, officefrom empviewwhere empid = 901;
create index idx1 on dept(doc) generate keys using xmlpattern '/dept/employee/@id as sql double;
TABLE
VIEW
VIEW DEFINITION
SQL Access to XML