complex legacy system archiving/data retention with mongodb and xquery
TRANSCRIPT
Legacy System Archiving With XML, XQuery and MongoDB
Dave Watson
SVP, iWay Software
@watsondaveny
Agenda
XML Archive Overview and Business Use Cases
XML Archive Technical Discussion
Copyright 2009, Information Builders. Slide 2
iWay Archive
What is XML Archive
An extension of ESB for archiving data
Leverage ESB process-oriented integration and data
federation capabilities
Long term data retention
Large repository, large index (Big Data)
Search and retrieve capabilities (High performance)
Business use examples
Satisfy regulatory requirements
e-Discovery (e.g. research, forensic)
Business analytics
Archive – Solving Business Needs
Examples of Business Requirements:
Regulations / Reqrs | Example Data | Retention
Federal Record Retention Requirement | Patient health records | 75 years (after last episode of care)
FDA 21 CFR Part 11 | Clinical trials and FDA approval | 35 years
HIPAA (Healthcare) | Pediatric medical records | 21 years
Sarbanes-Oxley (public companies) | Audit | 7 years
SEC 17a-4 (Financial services) | Account records | 6 years
SEC 17a-4 (Financial services) | Corporate documentation | Life of the enterprise
Research | Life science | Long-term
Analytics | Financial / Legal | Long-term
Archive – Types of Data
Can handle all types of data, for example:
Electronic Documents
Word, Excel, EDI, HL7, XML, …
Applications
ERPs, CRMs, SAP, SFDC, …
Database Data
IMS, DB2, Oracle, Sybase, SQL Server, MUMPS, …
Electronic Files
VSAM, Unix, Logs, …
Email
Outlook, Lotus Notes
Others
Multimedia files, Paper, Blueprints, Forms, Claims, …
ESB adapter components can be used to connect to the different types of
data.
Archive – Archiving Needs
Examples of Archiving Requirements:
Archive Requirements
Policy Based – Logical selection of DB records/transactions to be archived
Store very large amounts of data in archive
Keep data for very long periods of time
Become independent from Applications/DBMS/Systems – future proof
Protect authenticity of data – regulation and compliance
Access archived data when needed / as needed
Quickly search huge numbers of archived documents
Discard data after retention period – regulation and compliance
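The "discard data after retention period" requirement can be illustrated with a small Python sketch. The retention periods echo examples given in this deck, but the policy keys, the `ArchivedRecord` type and the function names are hypothetical and not part of the product.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention policies, in years, keyed by record type.
# The periods echo examples from this deck; the key names are made up.
RETENTION_YEARS = {
    "sox_audit": 7,
    "sec_account_record": 6,
    "pediatric_medical_record": 21,
}


@dataclass
class ArchivedRecord:
    record_id: str
    record_type: str
    retention_start: date  # date from which the retention clock runs


def discard_date(record: ArchivedRecord) -> date:
    """Return the first date on which the record may be discarded."""
    years = RETENTION_YEARS[record.record_type]
    start = record.retention_start
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # A Feb 29 start date landing in a non-leap year: use Mar 1.
        return date(start.year + years, 3, 1)


def is_expired(record: ArchivedRecord, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today >= discard_date(record)


audit = ArchivedRecord("A-1", "sox_audit", date(2010, 6, 30))
print(discard_date(audit))                   # prints 2017-06-30
print(is_expired(audit, date(2015, 1, 1)))   # prints False
print(is_expired(audit, date(2017, 6, 30)))  # prints True
```

A real policy engine would also need legal holds and per-jurisdiction rules, which this sketch leaves out.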
Store 75 years' worth of patient data
Diverse Sources
XML
MUMPS
Oracle
HL7
Support archive, query and integration scenarios
XML to remain unchanged and exist outside the data store
Ability to query documents
Ability to retrieve original XML or part of XML using XQuery
Ability to integrate XML archived data in federated services
with operational sources (e.g. MUMPS, HL7, Oracle)
Archive – Example Business Use Case
Highly scalable, high-performance document
management database
Easily integrates into an ESB architecture
Multi-threaded parallel processing
Distributed processing
Just another data source along with, e.g., Oracle and
MUMPS databases
Leverage ESB Tools for process orchestration,
process monitoring, data mapping/transformation,
security and data aggregation capabilities.
Implementation and vendor neutral – archived data (e.g.
XML) stored in the operating system's native file system
Archive – Example Business Requirements
XML Archive Technical Discussion
Overview
Load Channel
Reads XML documents and loads them into the
document repository.
Query Channel
Handles query requests and responses against the
document repository.
Test Channel
Simple visual interface displaying functionality and
usage of the Query API.
Highly configurable ESB Java application that can be
customized to specific needs.
Technology Involved
ESB -
iWay Service Manager (commercial)
IBM WebSphere ESB (commercial)
Oracle Service Bus (commercial)
WSO2 ESB (open source)
mongoDB - http://www.mongodb.org/
JSON - JavaScript Object Notation
XQuery - XML query language
mongoDB
“Humongous”
Scalable, high-performance, document-oriented database.
JSON-style documents.
Mirror capable.
Auto-sharding (clustering), horizontal scaling, automatic
failover, no single point of failure.
MapReduce support for complex processing. Work is
distributed among the cluster.
GridFS support.
A distributed file system.
Commercial support from 10gen (OEM by iWay Software)
XQuery
A query and functional programming language for XML
documents.
Is to XML documents what SQL is to databases.
“FLWOR” expressions.
FOR, LET, WHERE, ORDER BY, RETURN
Example:
for $x in /FEDREG/CNTNTS/AGCY
where $x/EAR = 'Agricultural'
order by $x ascending
return $x
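For readers less familiar with FLWOR, the clauses of the example can be mirrored step by step in Python. This is only an illustration of the semantics using the standard library, not an XQuery engine. The sample document and the `HD` element are made up, and the `where` step checks only the first `EAR` child.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like the /FEDREG/CNTNTS/AGCY path in the query.
SAMPLE = """
<FEDREG>
  <CNTNTS>
    <AGCY><EAR>Agricultural</EAR><HD>Marketing Service</HD></AGCY>
    <AGCY><EAR>Commerce</EAR><HD>Census Bureau</HD></AGCY>
    <AGCY><EAR>Agricultural</EAR><HD>Forest Service</HD></AGCY>
  </CNTNTS>
</FEDREG>
"""


def string_value(elem):
    """Concatenated text of an element, similar to XQuery's string value."""
    return "".join(elem.itertext())


def agricultural_agencies(xml_text):
    root = ET.fromstring(xml_text)
    # for $x in /FEDREG/CNTNTS/AGCY
    candidates = root.findall("./CNTNTS/AGCY")
    # where $x/EAR = 'Agricultural' (first EAR child only in this sketch)
    matches = [x for x in candidates if x.findtext("EAR") == "Agricultural"]
    # order by $x ascending
    matches.sort(key=string_value)
    # return $x
    return matches


for agency in agricultural_agencies(SAMPLE):
    print(agency.findtext("HD"))
# prints:
# Forest Service
# Marketing Service
```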
Supports syntax for constructing new documents.
JSON – JavaScript Object Notation
The new data-interchange language of the web.
www.json.org
Base Loading Architecture
[Diagram: Listener, ESB Flow (XML to JSON, Store JSON, Store XML), mongoDB (Binary Storage, GridFS)]
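The loading flow (XML to JSON, Store JSON, Store XML) might look roughly like the Python sketch below. The XML-to-JSON mapping ("@" for attributes, "#text" for mixed text) is an assumed convention, since the transcript does not show the product's actual mapping. Plain dictionaries stand in for the mongoDB collection and GridFS, and the sample document is made up.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element to a JSON-style value (hypothetical mapping).

    Attributes become "@name" keys, text becomes "#text", and repeated
    child tags become lists.
    """
    node = {"@" + k: v for k, v in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # leaf element with text only
        node["#text"] = text
    return node


class InMemoryArchive:
    """Stand-in for the document collection and the GridFS file store."""

    def __init__(self):
        self.documents = {}  # doc_id -> JSON-style dict
        self.files = {}      # doc_id -> original XML bytes

    def load(self, doc_id, xml_bytes):
        root = ET.fromstring(xml_bytes)
        self.documents[doc_id] = {root.tag: element_to_dict(root)}  # Store JSON
        self.files[doc_id] = xml_bytes                               # Store XML


archive = InMemoryArchive()
archive.load("doc-1", b'<a name="bob"><b>hello</b></a>')
print(json.dumps(archive.documents["doc-1"]))
# prints {"a": {"@name": "bob", "b": "hello"}}
```

Keeping the original bytes next to the JSON form reflects the stated requirement that the XML remain unchanged. The JSON copy exists only to make the document queryable.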
Base Query Architecture
[Diagram: Requester, HTTP Listener, ESB Flow (Query DB, (Optional) Get XML), mongoDB (Binary Storage, GridFS)]
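A minimal Python sketch of the query flow (Query DB, then optionally Get XML) follows. The request parameters `field`, `value` and `xml` are hypothetical and are not the product's HTTP API. Dictionaries stand in for the mongoDB collection and GridFS, and the data is made up.

```python
from urllib.parse import parse_qs

# Stand-ins for the mongoDB collection and the GridFS store (made-up data).
DOCUMENTS = {
    "doc-1": {"a": {"@name": "bob", "b": "hello"}},
    "doc-2": {"a": {"@name": "alice", "b": "world"}},
}
XML_FILES = {
    "doc-1": '<a name="bob"><b>hello</b></a>',
    "doc-2": '<a name="alice"><b>world</b></a>',
}


def lookup(doc, dotted_path):
    """Follow a dotted path such as "a.@name" through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def handle_query(query_string):
    """Answer a request like "field=a.@name&value=bob&xml=true".

    The parameter names are hypothetical, chosen for this sketch only.
    """
    params = parse_qs(query_string)
    field = params["field"][0]
    value = params["value"][0]
    want_xml = params.get("xml", ["false"])[0] == "true"

    results = []
    for doc_id, doc in DOCUMENTS.items():      # Query DB
        if lookup(doc, field) == value:
            hit = {"id": doc_id, "json": doc}
            if want_xml:                       # (Optional) Get XML
                hit["xml"] = XML_FILES[doc_id]
            results.append(hit)
    return results


print(handle_query("field=a.@name&value=bob&xml=true"))
```

Fetching the XML only on request keeps the common case, a search over the indexed JSON, from touching the file store.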
Loading Modification: External Storage
[Diagram: Listener, ESB Flow (XML to JSON, Store JSON, Store XML), mongoDB, File System]
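The external-storage variant can be sketched in Python as follows: the XML goes to the native file system while the JSON form and a pointer to the file go to the database. The record layout (`json`, `xml_path`, `sha256`) and the checksum step are assumptions added for illustration. The slides do not say how the product links the two or verifies authenticity.

```python
import hashlib
import tempfile
from pathlib import Path


def store_externally(archive_dir, documents, doc_id, xml_bytes, json_doc):
    """Write the XML to the file system and index its JSON form.

    `documents` stands in for the mongoDB collection; the record layout
    (keys "json", "xml_path", "sha256") is a made-up example.
    """
    xml_path = Path(archive_dir) / f"{doc_id}.xml"
    xml_path.write_bytes(xml_bytes)                      # Store XML
    documents[doc_id] = {                                # Store JSON
        "json": json_doc,
        "xml_path": str(xml_path),
        "sha256": hashlib.sha256(xml_bytes).hexdigest(),
    }
    return xml_path


def fetch_original(documents, doc_id):
    """Read the XML back and verify it is byte-for-byte unchanged."""
    record = documents[doc_id]
    data = Path(record["xml_path"]).read_bytes()
    if hashlib.sha256(data).hexdigest() != record["sha256"]:
        raise ValueError(f"archived file for {doc_id} was modified")
    return data


documents = {}
with tempfile.TemporaryDirectory() as archive_dir:
    original = b'<a name="bob"><b>hello</b></a>'
    store_externally(archive_dir, documents, "doc-1", original,
                     {"a": {"@name": "bob", "b": "hello"}})
    print(fetch_original(documents, "doc-1") == original)  # prints True
```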
Loading Modification: SAP Loading Architecture
[Diagram: SAP System, RFC Server, ESB Flow (IDOC to XML, XML to JSON, Store JSON, Store XML, Store IDOC), mongoDB (Binary Storage, GridFS)]
Loading Modification: Change Data Capture Loading Architecture
[Diagram: RDBMS, CDC Listener, ESB Flow (XML to JSON, Store JSON, Store XML), mongoDB (Binary Storage, GridFS)]
Loading Modification: Salesforce.com Loading Architecture
[Diagram: Salesforce System, SOAP Listener, ESB Flow (XML to JSON, Store JSON, Store XML), mongoDB (Binary Storage, GridFS)]
Loading Modification: FTP Loading Architecture
[Diagram: File System, FTP Server, ESB Flow (XML to JSON, Store JSON, Store XML), mongoDB (Binary Storage, GridFS)]
Query Modification: Web Service SOAP Query Architecture
[Diagram: Web Service Client, SOAP Listener, ESB Flow (Query DB, (Optional) Get XML/IDOC), mongoDB (Binary Storage, GridFS)]
The Test Client
Note: The archive is designed to be called from other
flows or programs.
A simple AJAX-based human interface for querying the XML
Archive.
Provides examples of the HTTP query interface provided by
the base XML Archive.
Installed with the base implementation of the XML Archive.
Simple Example
Loaded this simple XML Doc:
Displaying the Document
XML Link:
JSON Link:
Basic Query
Return all documents whose <a> element has a name
attribute equal to “bob”.
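The query described above amounts to an equality match on a nested field. The Python sketch below assumes the `name` attribute of `<a>` is stored under a key `a.@name`. That naming and the sample documents are hypothetical, since the transcript does not show the actual request.

```python
# Made-up archive contents; "@name" for the attribute is an assumed convention.
DOCUMENTS = [
    {"a": {"@name": "bob", "b": "hello"}},
    {"a": {"@name": "alice", "b": "world"}},
    {"a": {"@name": "bob", "b": "again"}},
]


def get_path(doc, dotted_path):
    """Resolve a dotted field path the way mongoDB addresses nested fields."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(documents, criteria):
    """Return documents where every field in `criteria` equals its value."""
    return [
        doc for doc in documents
        if all(get_path(doc, path) == wanted for path, wanted in criteria.items())
    ]


hits = find(DOCUMENTS, {"a.@name": "bob"})
print(len(hits))                    # prints 2
print([d["a"]["b"] for d in hits])  # prints ['hello', 'again']
```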
Advanced Queries
Support for:
And
Or
Regular Expressions
Ranges
Query handler is a wrapper around the mongoDB
query language.
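Since the query handler wraps the mongoDB query language, the four features listed probably correspond to mongoDB operators such as `$and`, `$or`, `$regex` and `$gte`/`$lte`. The Python sketch below evaluates such a filter document in memory so it can run without a server. The evaluator covers only these operators on flat documents, the sample data is made up, and the transcript does not show how the archive actually exposes these operators.

```python
import re

DOCUMENTS = [
    {"name": "bob", "age": 42, "dept": "claims"},
    {"name": "bobby", "age": 17, "dept": "audit"},
    {"name": "alice", "age": 35, "dept": "audit"},
]


def field_matches(value, condition):
    """Check one field against a literal or an operator document."""
    if not isinstance(condition, dict):
        return value == condition
    for op, operand in condition.items():
        if op == "$regex":
            ok = isinstance(value, str) and re.search(operand, value) is not None
        elif op == "$gte":
            ok = value is not None and value >= operand
        elif op == "$lte":
            ok = value is not None and value <= operand
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
        if not ok:
            return False
    return True


def matches(doc, query):
    """Evaluate a filter document against a flat document."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif not field_matches(doc.get(key), condition):
            return False
    return True


query = {
    "$and": [
        {"name": {"$regex": "^bob"}},                      # regular expression
        {"$or": [{"dept": "claims"}, {"dept": "audit"}]},  # or
        {"age": {"$gte": 18, "$lte": 65}},                 # range
    ]
}
print([d["name"] for d in DOCUMENTS if matches(d, query)])  # prints ['bob']
```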
Basic XQUERY
Return only the <b> element from the document.
Formatted Result:
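Returning only the `<b>` element is roughly what an XQuery path expression such as `//b` would select. The Python sketch below shows the idea with the standard library rather than an XQuery engine. The sample document is made up, since the document actually loaded is not reproduced in this transcript.

```python
import xml.etree.ElementTree as ET

# Made-up document; the original sample is not reproduced in this transcript.
ARCHIVED_XML = '<a name="bob"><b>hello</b><c>ignored</c></a>'


def extract_elements(xml_text, tag):
    """Return each <tag> element in the document, serialized as XML."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(elem, encoding="unicode") for elem in root.iter(tag)]


print(extract_elements(ARCHIVED_XML, "b"))  # prints ['<b>hello</b>']
```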