Complex Legacy System Archiving/Data Retention with MongoDB and XQuery


Uploaded by dataversity on 20-Aug-2015


Page 1: Complex Legacy System Archiving/Data Retention with MongoDB and Xquery

Legacy System Archiving With XML, XQuery and MongoDB

Dave Watson

SVP, iWay Software

@watsondaveny

[email protected]

Page 2

Agenda

XML Archive Overview and Business Use Cases

XML Archive Technical Discussion

Copyright 2009, Information Builders. Slide 2

Page 3

iWay Archive

What Is the XML Archive?

An extension of the ESB for archiving data

Leverages the ESB's process-oriented integration and data federation capabilities

Long-term data retention

Large repository, large index (Big Data)

Search and retrieval capabilities (high performance)

Business use examples

Satisfying regulatory requirements

e-Discovery (e.g., research, forensics)

Business analytics

Page 4

Archive – Solving Business Needs

Examples of Business Requirements:

Regulation / Requirement | Example Data | Retention
Federal record retention requirement | Patient health records | 75 years (after last episode of care)
FDA 21 CFR Part 11 | Clinical trials and FDA approval | 35 years
HIPAA (healthcare) | Pediatric medical records | 21 years
Sarbanes-Oxley (public companies) | Audit | 7 years
SEC 17a-4 (financial services) | Account records | 6 years
SEC 17a-4 (financial services) | Corporate documentation | Life of the enterprise
Research | Life science | Long-term
Analytics | Financial / Legal | Long-term
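Retention periods like these lend themselves to a simple policy lookup. The sketch below is a minimal illustration, not the archive's actual policy engine: the record-type keys are hypothetical, the trigger date (for example, the last episode of care) is assumed to be supplied by the caller, and open-ended periods are represented as None.

```python
from datetime import date
from typing import Optional

# Hypothetical record-type keys mapped to the retention periods in the table.
# Open-ended periods ("Life of the enterprise", "Long-term") are represented as None.
RETENTION_YEARS = {
    "patient_health_record": 75,        # counted from the last episode of care
    "clinical_trial": 35,
    "pediatric_medical_record": 21,
    "sox_audit": 7,
    "sec_account_record": 6,
    "sec_corporate_documentation": None,
}


def add_years(start: date, years: int) -> date:
    """Add whole calendar years; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def discard_date(record_type: str, trigger: date) -> Optional[date]:
    """Earliest date a record may be discarded, or None if it is kept indefinitely."""
    years = RETENTION_YEARS[record_type]
    return None if years is None else add_years(trigger, years)


def may_discard(record_type: str, trigger: date, today: date) -> bool:
    """True once the retention period for this record has fully elapsed."""
    due = discard_date(record_type, trigger)
    return due is not None and today >= due


if __name__ == "__main__":
    print(discard_date("sox_audit", date(2009, 6, 30)))  # 2016-06-30
```

A discard job could run may_discard over archived records periodically; records with an open-ended policy are never selected.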

Page 5

Archive – Types of Data

Can handle all types of data, for example:

Electronic Documents

Word, Excel, EDI, HL7, XML, …

Applications

ERPs, CRMs, SAP, SFDC, …

Database Data

IMS, DB2, Oracle, Sybase, SQL Server, MUMPS, …

Electronic Files

VSAM, Unix, Logs, …

Email

Outlook, Lotus Notes

Others

Multimedia files, Paper, Blueprints, Forms, Claims, …

ESB adapter components can be used to connect to the different types of data.

Page 6

Archive – Archiving Needs

Examples of Archiving Requirements:

Policy based – logical selection of DB records/transactions to be archived

Store very large amounts of data in the archive

Keep data for very long periods of time

Become independent of applications/DBMS/systems – future-proof

Protect authenticity of data – regulation and compliance

Access archived data when and as needed

Quickly search huge numbers of archived documents

Discard data after the retention period – regulation and compliance
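One common way to support the "protect authenticity" requirement is to record a cryptographic digest of each document at load time and re-check it on retrieval. The slides do not describe the archive's actual mechanism; this sketch assumes SHA-256 and a hypothetical metadata record with `_id`, `sha256` and `size` fields.

```python
import hashlib


def fingerprint(xml_bytes: bytes) -> str:
    """SHA-256 hex digest of the XML exactly as it was received."""
    return hashlib.sha256(xml_bytes).hexdigest()


def archive_record(doc_id: str, xml_bytes: bytes) -> dict:
    """Hypothetical metadata record stored alongside the original XML."""
    return {"_id": doc_id, "sha256": fingerprint(xml_bytes), "size": len(xml_bytes)}


def verify(record: dict, xml_bytes: bytes) -> bool:
    """True if the retrieved XML still matches the digest recorded at load time."""
    return record["size"] == len(xml_bytes) and record["sha256"] == fingerprint(xml_bytes)


if __name__ == "__main__":
    original = b'<a name="bob"><b>hello</b></a>'
    rec = archive_record("doc-1", original)
    print(verify(rec, original))                           # True
    print(verify(rec, original.replace(b"bob", b"rob")))   # False
```

A digest only proves anything if the digest itself is protected (for example, kept in write-once storage); that choice is outside this sketch.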

Page 7

Copyright 2010, Information Builders. Slide 7

Archive – Example Business Use Case

Store 75 years' worth of patient data

Diverse sources:

XML

MUMPS

Oracle

HL7

Support archive, query and integration scenarios:

XML to remain unchanged and to exist outside the data store

Ability to query documents

Ability to retrieve the original XML, or part of it, using XQuery

Ability to integrate archived XML data into federated services with operational sources (e.g., MUMPS, HL7, Oracle)

Page 8

Copyright 2007, Information Builders. Slide 8

Archive – Example Business Requirements

Highly scalable, high-performance document management database

Easily integrates into an ESB architecture

Multi-threaded parallel processing

Distributed processing

Just another data source alongside, e.g., Oracle and MUMPS databases

Leverage ESB tools for process orchestration, process monitoring, data mapping/transformation, security and data aggregation

Implementation- and vendor-neutral – archived data (e.g., XML) stored in the operating system's native file system

Page 9

XML Archive Technical Discussion

Page 10

Overview

Load Channel

Reads XML documents and loads them into the document repository.

Query Channel

Handles query requests and responses against the document repository.

Test Channel

Simple visual interface demonstrating the functionality and usage of the Query API.

Highly configurable ESB Java application that can be customized to specific needs.

Page 11

Technology Involved

ESB:

iWay Service Manager (commercial)

IBM WebSphere ESB (commercial)

Oracle Service Bus (commercial)

WSO2 ESB (open source)

mongoDB – http://www.mongodb.org/

JSON – JavaScript Object Notation

XQuery - XML query language

Page 12

mongoDB

“Humongous”

Scalable, high-performance, document-oriented database.

JSON-style documents.

Mirror (replication) capable.

Auto-sharding (clustering), horizontal scaling, automatic failover, no single point of failure.

MapReduce support for complex processing; work is distributed across the cluster.

GridFS support: a distributed file system.

Commercial support from 10gen (OEM by iWay Software)

Page 13

XQuery

A query and functional programming language for XML documents.

It is to XML documents what SQL is to relational databases.

"FLWOR" expressions:

for, let, where, order by, return

Example:

for $x in /FEDREG/CNTNTS/AGCY
where $x/EAR = 'Agricultural'
order by $x ascending
return $x

Supports syntax for constructing new documents.
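For readers without an XQuery processor at hand, the FLWOR example can be mirrored clause by clause with Python's standard-library ElementTree, which supports only a limited XPath subset. This is a stand-in that shows what each clause does, not how the archive evaluates XQuery; the sample document and its values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document shaped like the paths used in the FLWOR example.
SAMPLE = """<FEDREG><CNTNTS>
<AGCY><EAR>Agricultural</EAR><NAME>Forest Service</NAME></AGCY>
<AGCY><EAR>Commerce</EAR><NAME>Census Bureau</NAME></AGCY>
<AGCY><EAR>Agricultural</EAR><NAME>Animal Inspection</NAME></AGCY>
</CNTNTS></FEDREG>"""


def agricultural_agencies(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    if root.tag != "FEDREG":                      # /FEDREG must be the document element
        return []
    result = []
    for x in root.findall("CNTNTS/AGCY"):         # for $x in /FEDREG/CNTNTS/AGCY
        # where $x/EAR = 'Agricultural' (general comparison: any EAR child may match)
        if any(ear.text == "Agricultural" for ear in x.findall("EAR")):
            result.append(x)
    # order by $x ascending: sort on the element's string value,
    # i.e. the concatenation of all of its descendant text.
    result.sort(key=lambda x: "".join(x.itertext()))
    return result                                 # return $x


if __name__ == "__main__":
    for agency in agricultural_agencies(SAMPLE):
        print(agency.findtext("NAME"))
```

Ordering by $x compares the whole element's text, so here the agencies sort by the EAR value followed by the NAME value.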

Page 14

JSON – JavaScript Object Notation

The new data-interchange format of the web.

www.json.org

Page 15

Base Loading Architecture

Diagram: inside the ESB, a listener flow converts incoming XML to JSON, stores the JSON in mongoDB, and stores the original XML in mongoDB's GridFS binary storage.
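The listener flow on this slide (XML to JSON, Store JSON, Store XML) can be sketched as follows. The slides do not show how the archive maps XML to JSON, so this assumes one plausible convention (attributes as "@name" keys, element text as "#text", repeated elements as lists) and uses plain dictionaries as stand-ins for the mongoDB collection and GridFS, so it runs without a database.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element into JSON-ready data (hypothetical mapping convention).

    Attributes become "@name" keys, element text becomes "#text", and repeated
    child elements with the same tag are collected into a list. Text that follows
    a child element (mixed content) is ignored in this sketch.
    """
    node = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            node[child.tag] = [node[child.tag], value]
    text = (elem.text or "").strip()
    if text and not node:
        return text                      # simple leaf element: just its text
    if text:
        node["#text"] = text
    return node


# In-memory stand-ins for the mongoDB collection and GridFS (assumptions).
json_collection = {}
xml_store = {}


def load(doc_id, xml_bytes):
    """XML to JSON, Store JSON, Store XML: the three steps of the listener flow."""
    root = ET.fromstring(xml_bytes)
    doc = {"_id": doc_id, root.tag: element_to_dict(root)}
    json_collection[doc_id] = doc        # Store JSON (queryable copy)
    xml_store[doc_id] = xml_bytes        # Store XML (original bytes, unchanged)
    return doc


if __name__ == "__main__":
    print(load("d1", b'<r><a name="bob"><b>hi</b></a><a name="al"/></r>'))
```

Storing the original bytes separately means the JSON is only a queryable copy; the XML itself is never rewritten.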

Page 16

Base Query Architecture

Diagram: a requester sends a query over HTTP to an ESB listener flow, which queries the mongoDB database and, optionally, gets the original XML from GridFS binary storage.
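A matching sketch of the query flow searches the JSON copies and, only when asked, fetches the original XML by `_id`. The dotted-path lookup, the `include_xml` flag and the pre-populated in-memory stores are illustrative assumptions. In mongoDB itself the equivalent filter would be a query document such as {"r.a.@name": "bob"}; unlike mongoDB's dot notation, this simple matcher does not look inside arrays.

```python
# In-memory stand-ins (assumptions) for the mongoDB collection and GridFS,
# pre-populated as a loading flow might have left them.
json_collection = {
    "d1": {"_id": "d1", "r": {"a": {"@name": "bob", "b": "hi"}}},
    "d2": {"_id": "d2", "r": {"a": {"@name": "al", "b": "yo"}}},
}
xml_store = {
    "d1": b'<r><a name="bob"><b>hi</b></a></r>',
    "d2": b'<r><a name="al"><b>yo</b></a></r>',
}

_MISSING = object()  # sentinel so a missing field never equals a queried value


def get_path(doc, dotted):
    """Follow a dotted path such as "r.a.@name"; _MISSING if any step is absent."""
    for key in dotted.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return _MISSING
        doc = doc[key]
    return doc


def query(field, value, include_xml=False):
    """Return matching JSON documents, optionally paired with the original XML."""
    hits = [d for d in json_collection.values() if get_path(d, field) == value]
    if include_xml:
        # Optional "Get XML" step: fetch the untouched original by _id.
        return [(d, xml_store[d["_id"]]) for d in hits]
    return hits


if __name__ == "__main__":
    print(query("r.a.@name", "bob", include_xml=True))
```

Keeping the XML fetch optional means most queries touch only the JSON copies.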

Page 17

Loading Modification: External Storage


Diagram: an ESB listener flow converts XML to JSON and stores the JSON in mongoDB, but stores the original XML in the file system rather than in GridFS.
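The external-storage variant changes only where the original XML goes. A minimal sketch, assuming a hypothetical directory layout (one `<doc_id>.xml` file per document) and a hypothetical `xml_path` field that links the JSON record to its file; a plain dictionary again stands in for the mongoDB collection.

```python
import tempfile
from pathlib import Path

json_collection = {}  # in-memory stand-in for the mongoDB collection


def store_external(archive_root, doc_id, xml_bytes, json_doc):
    """Store XML on the native file system and Store JSON with a pointer to it.

    Assumes doc_id is already a safe file name (no path separators).
    """
    target = Path(archive_root) / f"{doc_id}.xml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(xml_bytes)                    # original XML, unchanged
    record = dict(json_doc, _id=doc_id, xml_path=str(target))
    json_collection[doc_id] = record                 # queryable JSON copy
    return record


def fetch_xml(doc_id):
    """Read the original XML back through the path recorded in the JSON record."""
    return Path(json_collection[doc_id]["xml_path"]).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store_external(tmp, "d1", b"<r><a name='bob'/></r>", {"r": {"a": {"@name": "bob"}}})
        print(fetch_xml("d1"))
```

Because the XML sits in ordinary files, it remains readable without mongoDB, which is the vendor-neutrality point made in the business requirements.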

Page 18

Loading Modification: SAP Loading Architecture


Diagram: an SAP system sends IDOCs to an RFC server in the ESB; the flow stores the IDOC, converts IDOC to XML and XML to JSON, stores the JSON in mongoDB, and stores the XML in GridFS binary storage.

Page 19

Loading Modification: Change Data Capture Loading Architecture


Diagram: a CDC listener captures changes from an RDBMS and passes them to an ESB flow, which converts XML to JSON, stores the JSON in mongoDB, and stores the XML in GridFS binary storage.
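In the CDC path, a captured row change has to arrive as XML before the XML-to-JSON step. The slides do not show that format; this sketch assumes a hypothetical `<change>` envelope with table, operation and capture-time attributes, one child element per column, and a `null="true"` marker for SQL NULLs. Column names are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET


def change_to_xml(table, operation, row, captured_at):
    """Serialize one captured row change as an XML document (hypothetical schema)."""
    root = ET.Element("change", {"table": table, "op": operation, "captured": captured_at})
    row_elem = ET.SubElement(root, "row")
    for column, value in row.items():
        col = ET.SubElement(row_elem, column)
        if value is None:
            col.set("null", "true")      # distinguish SQL NULL from an empty string
        else:
            col.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(change_to_xml("PATIENT", "UPDATE",
                        {"ID": 42, "NAME": "Smith", "DISCHARGED": None},
                        "2009-06-01T12:00:00"))
```

The resulting string could then enter the same XML-to-JSON and Store XML steps as any other source.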

Page 20

Loading Modification: Salesforce.com Loading Architecture


Diagram: the Salesforce system sends messages to a SOAP listener in the ESB; the flow converts XML to JSON, stores the JSON in mongoDB, and stores the XML in GridFS binary storage.

Page 21

Loading Modification: FTP Loading Architecture


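Diagram: files arriving through an FTP server or from the file system are picked up by an ESB flow, which converts XML to JSON, stores the JSON in mongoDB, and stores the XML in GridFS binary storage.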

Page 22

Query Modification: Web Service SOAP Query Architecture


Diagram: a web service client sends a SOAP request to an ESB listener flow, which queries the mongoDB database and, optionally, gets the original XML or IDOC from GridFS binary storage.

Page 23

The Test Client

Note: The archive is designed to be called from other flows or programs.

A simple AJAX-based human interface for querying the XML Archive.

Provides examples of the HTTP query interface offered by the base XML Archive.

Installed with the base implementation of the XML Archive.

Page 24

Simple Example

Loaded this simple XML Doc:

Page 25

Displaying the Document

XML Link:

JSON Link:

Page 26

Basic Query

Return all documents in which the name attribute of the <a> element equals "bob".

Page 27

Advanced Queries

Support for:

And

Or

Regular Expressions

Ranges

The query handler is a wrapper around the mongoDB query language.
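Because the query handler wraps the mongoDB query language, the four supported forms map naturally onto standard mongoDB operators: `$and`, `$or`, `$regex` (with `$options`), and `$gte`/`$lte` for ranges. The helper names and field paths below are hypothetical; the sketch only builds filter documents and does not reproduce the handler's real interface.

```python
def eq(field, value):
    """Equality match on a (possibly dotted) field path."""
    return {field: value}


def matches_regex(field, pattern, ignore_case=False):
    """Regular-expression match; mongoDB evaluates the pattern with PCRE syntax."""
    clause = {"$regex": pattern}
    if ignore_case:
        clause["$options"] = "i"
    return {field: clause}


def in_range(field, low, high):
    """Inclusive range match."""
    return {field: {"$gte": low, "$lte": high}}


def all_of(*clauses):
    """Logical AND of several filter documents."""
    return {"$and": list(clauses)}


def any_of(*clauses):
    """Logical OR of several filter documents."""
    return {"$or": list(clauses)}


if __name__ == "__main__":
    # Documents where <a name> is "bob" or starts with "al", loaded during 2009
    # ("loaded" is a hypothetical field holding an ISO date string).
    print(all_of(
        any_of(eq("r.a.@name", "bob"), matches_regex("r.a.@name", "^al", ignore_case=True)),
        in_range("loaded", "2009-01-01", "2009-12-31"),
    ))
```

With a driver such as PyMongo, a dictionary like this would be passed as the filter argument to `collection.find(...)`.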

Page 28

Basic XQuery


Return only the <b> element from the document.

Formatted Result: