2017 MLW CHI: The Unique Approach and Value of the Operational Data Hub
TRANSCRIPT
13 June 2017© COPYRIGHT MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Damon Feldman, PhD, Solutions Director, MarkLogic
The ODH Pattern, the Data Hub Framework and a Case Study
The Unique Approach and Value of the Operational Data Hub
SLIDE: 2
What is an Operational Data Hub?
§ Operational
- Real-time
- “Run the business”
§ Data
- Data services
§ Hub
- Hub-and-spoke data integration
- Centralized operations and access
- Governance, security
SLIDE: 3
Data Silos
§ A group of different storage systems holding different data
§ Typically good systems, suited to one task or to one department
- Not suited to enterprise-wide tasks
§ Shows up as a slow, painful, or puzzling rate of development for:
- Reports
- Analyses
- New applications
- Cross-application functionality
SLIDE: 4
Data Silos – One Example
“Who’s on food assistance, but can work, but isn’t? Meaning, they are not caring for children in the home, neither disabled nor homeless. They’re of working age, not pregnant. Otherwise, check their work and training activities, notify them, or end benefits.”
§ Yeah, but…
- Disability status in one DB (mainframe)
- Work programs in another (SQL Server)
- Some notices already sent
- Finally, where should we put the new notices? Removal events?
§ Data silos show up as a slow, painful, or puzzling rate of development
SLIDE: 5
“ABAWDs” (Able-Bodied Adults Without Dependents)
Vernon
- Previously unable to find high-paying work
- Supporting a young son
- Retrained as an energy retrofit installer
Vernon said he plans to continue furthering his career and being an example for his son and those around him. “The sky is the limit. Life is beautiful. It’s on you to choose your outcome.”
SLIDE: 6
ABAWD Data Flow
§ Data from multiple silos
§ OLTP + Reporting/BI
§ Follows the Customer 360 pattern
[Diagram: three silos (Work Activities: SQL Server CSV; Human Services: DB2 mainframe; Notice History: Excel .xlsx files) → Data Loader → Ingest → Operational Data Hub (Person 360 entity) → Middle tier (REST) → Real-time access (fuzzy search, type-ahead, summary list, person details) and Reporting (interview, review & notify; progress; demographics; result/status/notices)]
SLIDE: 7
ABAWD Application Screen
[Screenshot: ABAWD application]
SLIDE: 8
Operational Data Hub Pattern
[Diagram: Sources (message bus, RDBMS, content feed) → Ingest → Staging (raw data as is; Source 1…N documents; index, search, discovery & harmonization) → Harmonize, f(x) → Final (harmonized, indexed data; enveloped docs for Entity 1…N; index, search & services) → Serve → Analytical apps, operational apps, downstream systems]
SLIDE: 9
Operational Data Hub Pattern – Not a One-Way Flow
[Diagram: the ODH pattern from Slide 8]
SLIDE: 10
ABAWD Processing - Operational Data Hub Pattern
[Diagram: the ODH pattern from Slide 8, ingesting Work Activities, Human Services, and Excel Notice History]
§ Ingest raw data as-is
§ All formats from all systems
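To make “ingest raw data as-is” concrete, here is a minimal Python sketch. It is not MarkLogic’s API; the function names, the URI scheme, and the metadata fields are hypothetical. Each record lands in staging untouched, with its source and ingest time stored alongside it, so no model has to exist before the data arrives.

```python
import csv
import io
import uuid
from datetime import datetime, timezone
from typing import Any


def ingest_raw(content: Any, source: str, fmt: str) -> dict:
    """Wrap raw content, unchanged, with ingest metadata for the staging area."""
    return {
        "uri": f"/staging/{source}/{uuid.uuid4()}",  # hypothetical URI scheme
        "source": source,
        "format": fmt,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "raw": content,  # no mapping, renaming, or type conversion at this stage
    }


def ingest_csv(text: str, source: str) -> list:
    """Turn a CSV export into one staging document per row, keeping the source's column names."""
    reader = csv.DictReader(io.StringIO(text))
    return [ingest_raw(dict(row), source, "csv") for row in reader]
```

The same `ingest_raw` call would accept an XML notice or an Excel row; all interpretation is deferred to harmonization.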
SLIDE: 11
ABAWD Processing - Operational Data Hub Pattern
[Diagram: the ODH pattern from Slide 8]
§ Immediately use raw data
§ Fast joins and lookups
§ Metadata with the data
§ Security from day 1
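A rough sketch of “fast joins and lookups” on raw data, assuming each silo names the person key differently (the source names and field names below are made up for illustration). Indexing staging documents by that key lets an application pull one person’s records from every silo before any harmonization. A real hub would use the database’s own indexes, not an in-memory dict.

```python
from collections import defaultdict

# Hypothetical: each source system names the person identifier differently.
KEY_FIELDS = {
    "work-activities": "person_id",
    "human-services": "CLIENT_NO",
    "notice-history": "PersonID",
}


def build_person_index(staging_docs: list) -> dict:
    """Index raw staging docs by person key so a cross-source lookup is one dict access."""
    index = defaultdict(list)
    for doc in staging_docs:
        field = KEY_FIELDS.get(doc["source"])
        if field is None or field not in doc["raw"]:
            continue  # unknown source or missing key: left for later harmonization
        key = str(doc["raw"][field]).strip()
        index[key].append(doc)
    return index


def lookup(index: dict, person_id: str) -> dict:
    """Return one person's raw records grouped by source (a simple join across silos)."""
    grouped = defaultdict(list)
    for doc in index.get(person_id, []):
        grouped[doc["source"]].append(doc["raw"])
    return dict(grouped)
```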
SLIDE: 12
ABAWD Processing - Operational Data Hub Pattern
[Diagram: the ODH pattern from Slide 8, with a Person 360 entity in the final layer]
§ Move toward entities
§ Centralize transforms
§ MPP scale
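One way to picture “move toward entities” and “centralize transforms” is a single harmonize function that builds an envelope: a small set of canonical fields next to the untouched source data. This is a hypothetical Python sketch loosely inspired by the envelope idea on the pattern slide; the section names (`headers`, `instance`, `attachments`) and all field names are this sketch’s choices, not a documented format.

```python
from datetime import datetime, timezone


def harmonize_person(person_id: str, raw_by_source: dict) -> dict:
    """Build a hypothetical Person 360 envelope from raw records grouped by source."""
    human = (raw_by_source.get("human-services") or [{}])[0]
    activities = raw_by_source.get("work-activities", [])
    notices = raw_by_source.get("notice-history", [])
    return {
        "envelope": {
            "headers": {
                "entity": "Person",
                "personId": person_id,
                "harmonizedAt": datetime.now(timezone.utc).isoformat(),
                "sources": sorted(raw_by_source),
            },
            "instance": {
                # Canonical fields: the small, agreed-upon part of the model.
                "personId": person_id,
                "disabled": human.get("DISABLED") == "Y",  # hypothetical source code value
                "workActivityCount": len(activities),
                "noticeCount": len(notices),
            },
            # Everything else is kept as-is, so nothing is lost during harmonization.
            "attachments": raw_by_source,
        }
    }
```

Keeping the transform in one place means every consumer sees the same canonical fields, while the raw attachments remain available for anything the model does not cover yet.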
SLIDE: 13
ABAWD Processing - Operational Data Hub Pattern
[Diagram: the ODH pattern from Slide 8, serving the Interview, Review & Notify; Progress; and Demographics reports]
§ Serve data to any consumer
§ In many formats / “lenses”
§ MPP scale
§ Exports
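A minimal sketch of serving one stored envelope through two “lenses”: JSON for an application and CSV for an export. The envelope shape and column names are hypothetical and follow no particular product API; the point is that both outputs are projections of the same stored record.

```python
import csv
import io
import json

# Hypothetical export layout; a real export would follow the receiving agency's spec.
EXPORT_COLUMNS = ["personId", "disabled", "workActivityCount", "noticeCount"]


def serve_json(envelope: dict) -> str:
    """Application lens: the canonical instance as JSON."""
    return json.dumps(envelope["envelope"]["instance"], sort_keys=True)


def export_csv(envelopes: list) -> str:
    """Export lens: one CSV row per person, fixed column order, extra fields ignored."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=EXPORT_COLUMNS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for env in envelopes:
        writer.writerow(env["envelope"]["instance"])
    return out.getvalue()
```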
SLIDE: 14
Key Aspects of the ODH Pattern
[Diagram: the ODH pattern from Slide 8]
• Transactional + analytic
• Streaming + batch
• Agile model / no up-front model
SLIDE: 15
Deeper Dive: Operational Data Hub Pattern
§ Movement
§ Harmonization
§ Indexing
§ Security
§ Agility
§ Multi-model
§ Data Lenses
§ Combined Operational and Analytic workloads
§ Governance
SLIDE: 16
Deeper Dive Into The Operational Hub Pattern
§ Movement
§ Harmonization
§ Indexing
§ Security
§ Agility
§ Multi-model
§ Data Lenses
§ Combined Operational and Analytic workloads
§ Governance
Key Differentiating Factors
SLIDE: 17
Demystify Some Terms
Approach | Movement | Harmonization | Indexing | Security | Agility
Data Lake | Y | Possible | N | N | Some
Federation/Virtual DB | N | Y | Delegated | Delegated | N
Data Warehouse | Y | Y | Y | Some | N
Data Hub | Y | Y | Star/Cols | Maybe | Y
Operational Data Hub | Y | Y | Y | Y | Y
SLIDE: 18
Agility vs a Data Warehouse
Approach | Movement | Harmonization | Indexing | Security | Agility
Data Lake | Y | Possible | N | N | Some
Federation/Virtual DB | N | Y | Delegated | Delegated | N
Data Warehouse | Y | Y | Y | Some | N
Data Hub | Y | Y | N | Maybe | Y
Operational Data Hub | Y | Y | Y | Y | Y
Government Mainframe Migration
§ Failed migration: mainframe → RDBMS
§ MarkLogic succeeded in 6 months
§ Keys to success
- A little bit of canonical data (30%)
- A lot of raw data (70%)
- Powerful search + query on all data
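A toy sketch of “powerful search + query on all data” when only part of each record is canonical: structured filters apply to the canonical header, while full-text terms match anywhere, including the raw, unmodeled part. The `header`/`raw` layout and the record contents are hypothetical; a production system would use the database’s indexes rather than scanning lists.

```python
import re


def _words(value) -> set:
    """Lower-cased word tokens from every value in a nested dict/list (keys are not indexed)."""
    if isinstance(value, dict):
        return set().union(*(_words(v) for v in value.values()))
    if isinstance(value, list):
        return set().union(*(_words(v) for v in value))
    return set(re.findall(r"\w+", str(value).lower()))


def search(docs: list, text: str = "", **header_filters) -> list:
    """All text terms must appear somewhere in the doc; filters must match the canonical header."""
    terms = set(re.findall(r"\w+", text.lower()))
    hits = []
    for doc in docs:
        header = doc.get("header", {})
        if any(header.get(k) != v for k, v in header_filters.items()):
            continue
        if terms <= _words(doc):
            hits.append(doc)
    return hits
```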
SLIDE: 19
Congratulations!
SLIDE: 20
Security vs a Data Lake
Approach | Movement | Harmonization | Indexing | Security | Agility
Data Lake | Y | Possible | N | N | Some
Federation/Virtual DB | N | Y | Delegated | Delegated | N
Data Warehouse | Y | Y | Y | Some | N
Data Hub | Y | Y | N | Maybe | Y
Operational Data Hub | Y | Y | Y | Y | Y
Fairfax County Land Use (LDIP)
§ Cannot expose PII to the public
§ Yet the Government needs all the data
§ Used Flexible Replication + Redaction
§ Could also (now) use Element-Level Security
§ Takeaway: Integration requires Security
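The redaction half of “Flexible Replication + Redaction” can be illustrated with a small Python sketch: the public copy is built by masking PII fields at any depth, while the internal original is left intact. The PII field names are hypothetical, and this is not MarkLogic’s redaction or replication API, only the idea behind it.

```python
# Hypothetical PII field names; in a hub this list would be governed centrally.
PII_FIELDS = {"ownerName", "ssn", "phone"}


def redact(doc: dict, pii_fields=PII_FIELDS, mask: str = "REDACTED") -> dict:
    """Return a new document with PII values masked at any nesting depth."""
    def walk(node):
        if isinstance(node, dict):
            return {k: (mask if k in pii_fields else walk(v)) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v) for v in node]
        return node  # scalars are returned unchanged

    return walk(doc)


def replicate_public(docs: list) -> list:
    """Build the public-facing copies; the internal originals are not modified."""
    return [redact(d) for d in docs]
```

Element-level security takes the other route: one stored copy, with the sensitive fields hidden per role at query time instead of in a second, redacted copy.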
SLIDE: 21
Agility vs. Federation
Approach | Movement | Harmonization | Indexing | Security | Agility
Data Lake | Y | Possible | N | N | Some
Federation/Virtual DB | N | Y | Delegated | Delegated | N
Data Warehouse | Y | Y | Y | Some | N
Data Hub | Y | Y | N | Maybe | Y
Operational Data Hub | Y | Y | Y | Y | Y
HealthCare.gov Data Services
§ Message traffic from dozens of sources
§ Federation using MarkLogic for data storage
§ All messages are tracked, de-bulked, de-identified
§ Schema agnostic
§ Envelope with standard header
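A hedged sketch of the message handling described above: one bulk message is split (“de-bulked”) into per-record messages, each wrapped in an envelope with a standard header that tracks lineage back to the bulk message, and a chosen identifier is replaced with a salted hash. The payload itself is not otherwise inspected, which is what keeps the approach schema-agnostic. Field names such as `memberId` are hypothetical, and a salted hash is pseudonymization, not a complete de-identification scheme.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def de_identify(value: str, salt: str) -> str:
    """Replace an identifier with a salted SHA-256 token (same input gives the same token)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def debulk(bulk: dict, source: str, id_field: str, salt: str) -> list:
    """Split one bulk message into enveloped per-record messages with a standard header."""
    bulk_id = str(uuid.uuid4())
    received = datetime.now(timezone.utc).isoformat()
    messages = []
    for seq, record in enumerate(bulk.get("records", [])):
        payload = dict(record)  # shallow copy; the payload schema is not inspected
        if id_field in payload:
            payload[id_field] = de_identify(str(payload[id_field]), salt)
        messages.append({
            "header": {
                "messageId": str(uuid.uuid4()),
                "bulkId": bulk_id,  # lineage back to the original bulk message
                "sequence": seq,
                "source": source,
                "receivedAt": received,
            },
            "payload": payload,
        })
    return messages
```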
SLIDE: 22
Demystify Some Terms
Approach | Movement | Harmonization | Indexing | Security | Agility
Data Lake | Y | Possible | N | N | Some
Federation/Virtual DB | N | Y | Delegated | Delegated | N
Data Warehouse | Y | Y | Y | Some | N
Data Hub | Y | Y | N | Maybe | Y
Operational Data Hub | Y | Y | Y | Y | Y
SLIDE: 23
Deeper Dive Into The Operational Hub Pattern
§ Movement
§ Harmonization
§ Indexing
§ Security
§ Agility
§ Multi-model
§ Data Lenses
§ Combined Operational and Analytic workloads
§ Governance
SLIDE: 24
Multi-Model
§ Data Hubs are about integrating data “as-is”
- Documents
- RDF (graph, triples, ontologies)
- Relational
- Text
- Geospatial
- Binary
§ First-class support
- Index & query
- Standard languages for query and transform
- Parse, validate, stream
[Diagram labels: BOLO list DB (suspected terrorists); Whitehall knife suspect]
SLIDE: 25
Deeper Dive Into The Operational Hub Pattern
§ Movement
§ Harmonization
§ Indexing
§ Security
§ Agility
§ Multi-model
§ Data Lenses
§ Combined Operational and Analytic workloads
§ Governance
SLIDE: 26
Data Lenses
§ Data Lenses are about getting data out “as-needed”
- Store in one format
- See data in four ways
§ Declarative (TDE, Template Driven Extraction)
§ Transform / schema on read
§ Transactional
§ Secure
§ Everything works with everything
[Diagram: Search, SQL, and SPARQL lenses over documents (JSON or XML)]
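The “store in one format, see data in several ways” idea can be sketched as a declarative template that maps document paths to row columns (a SQL-style lens) and to triples (a graph lens). This is a hypothetical Python illustration of the concept only; it does not reproduce TDE’s template syntax, and the paths and the `livesWith` predicate are made up.

```python
# A hypothetical declarative template: document paths mapped to columns and triples.
TEMPLATE = {
    "columns": {"person_id": "instance.personId", "disabled": "instance.disabled"},
    "triples": [("instance.personId", "livesWith", "instance.householdMemberIds")],
}


def get_path(doc: dict, path: str):
    """Follow a dotted path through nested dicts; None if any step is missing."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def to_row(doc: dict, template: dict = TEMPLATE) -> dict:
    """SQL-style lens: one row per document."""
    return {col: get_path(doc, path) for col, path in template["columns"].items()}


def to_triples(doc: dict, template: dict = TEMPLATE) -> list:
    """Graph lens: (subject, predicate, object) triples; list-valued objects fan out."""
    out = []
    for subj_path, predicate, obj_path in template["triples"]:
        subj, obj = get_path(doc, subj_path), get_path(doc, obj_path)
        if subj is None or obj is None:
            continue
        for o in obj if isinstance(obj, list) else [obj]:
            out.append((subj, predicate, o))
    return out
```

Because the template is data rather than code, changing a lens means editing the mapping, not rewriting or reloading the stored documents.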
SLIDE: 27
Deeper Dive Into The Operational Hub Pattern
§ Movement
§ Harmonization
§ Indexing
§ Security
§ Agility
§ Multi-model
§ Data Lenses
§ Combined Operational and Analytic workloads
§ Governance
SLIDE: 28
Data Modeling Cost: OLTP + Analytic
§ Operational vs. Analytical
§ Most people accept this cost, but why?
[Diagram: Work Activity, Disability & Human Services, and Notice History sources; MDM for Operational (“Run the Business”) and Data Marts for Analytics (“Observe the Business”), e.g. ABAWD Progress and Case Worker Monitoring; shown beside the ODH pattern from Slide 8]
SLIDE: 29
ABAWD: OLTP + Analytic
[Diagram: the ODH pattern from Slide 8, fed by Work Activity, Notice History, and Disability & Human Services, and serving SQL Reporting & BI, the transactional ABAWD processing app, person relationships, export to federal, person finder / agile mastering, and document access, through SQL, transactional (OLTP), RDF/SPARQL, CSV transform, and document access]
SLIDE: 30
Deeper Dive Into The Operational Hub Pattern
§ Movement
§ Harmonization
§ Indexing
§ Security
§ Agility
§ Multi-model
§ Data Lenses
§ Combined Operational and Analytic workloads
§ Security (again)
§ Governance
SLIDE: 31
Security
§ Integration should improve security, not harm it
§ Need security to share data
§ Default method: hard-shell security
§ Leads to scattered security enforcement
[Diagram: separate Child Welfare, Child Support, Benefits, and Case Review systems, used by case workers and DBAs]
SLIDE: 32
Security
§ Centralized security
- Internal and external threats
- Governable
- Shared fields, records, roles, configuration
[Diagram: Child Welfare, Child Support, Benefits, and Case Review, used by case workers and DBAs, behind centralized security]
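To contrast centralized security with scattered “hard shell” checks, here is a minimal Python sketch in which permissions travel with each record as (role, capability) pairs and every read or query goes through one enforcement point in the data layer. The class names and roles are hypothetical; the design loosely echoes role-based document permissions but is not MarkLogic’s security API.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    uri: str
    content: dict
    # (role, capability) pairs stored alongside the data itself
    permissions: set = field(default_factory=set)


class SecureStore:
    """A single enforcement point in the data layer, shared by every application."""

    def __init__(self):
        self._records: dict = {}

    def put(self, record: Record) -> None:
        self._records[record.uri] = record

    @staticmethod
    def _can_read(record: Record, user_roles: set) -> bool:
        return any((role, "read") in record.permissions for role in user_roles)

    def read(self, uri: str, user_roles: set) -> dict:
        record = self._records[uri]
        if not self._can_read(record, user_roles):
            raise PermissionError(f"no read permission on {uri}")
        return record.content

    def search(self, user_roles: set, predicate) -> list:
        """Queries only ever return records the caller is allowed to read."""
        return [
            r.content
            for r in self._records.values()
            if self._can_read(r, user_roles) and predicate(r.content)
        ]
```

Because the check lives with the data, a new application (or a DBA tool) cannot bypass it by forgetting to implement its own.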
SLIDE: 33
Deeper Dive Into The Operational Hub Pattern
§ Movement
§ Harmonization
§ Indexing
§ Security
§ Agility
§ Multi-model
§ Data Lenses
§ Combined Operational and Analytic workloads
§ Security (again)
§ Governance
SLIDE: 34
Governance
§ Centralize and control
- Security
- Transforms
- Data quality
- Auditing
- Data models
§ Richer data = more governance
- Data and Metadata together
- Bi-temporal and temporal tracking
§ Archiving, Lineage, Tracking
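Bi-temporal tracking keeps two timelines per fact: when it was true in the world (valid time) and when the system believed it (system time). The Python sketch below is a minimal, hypothetical illustration of that idea, not MarkLogic’s temporal API. An update never deletes: it closes the system period of any overlapping current version and re-asserts the parts of the old valid period that the update does not cover.

```python
from dataclasses import dataclass, replace
from datetime import datetime

FOREVER = datetime.max  # open-ended period marker


@dataclass(frozen=True)
class Version:
    value: dict
    valid_from: datetime
    valid_to: datetime
    sys_from: datetime  # when the system recorded this belief
    sys_to: datetime    # when it was superseded (FOREVER = still current)


class BitemporalRecord:
    def __init__(self):
        self.versions: list = []

    def update(self, value: dict, valid_from: datetime, valid_to: datetime, now: datetime) -> None:
        kept = []
        for v in self.versions:
            overlaps = (v.sys_to == FOREVER and v.valid_from < valid_to
                        and valid_from < v.valid_to)
            if not overlaps:
                kept.append(v)
                continue
            kept.append(replace(v, sys_to=now))  # old belief is closed, not deleted
            if v.valid_from < valid_from:        # part before the new period still holds
                kept.append(Version(v.value, v.valid_from, valid_from, now, FOREVER))
            if valid_to < v.valid_to:            # part after the new period still holds
                kept.append(Version(v.value, valid_to, v.valid_to, now, FOREVER))
        kept.append(Version(value, valid_from, valid_to, now, FOREVER))
        self.versions = kept

    def as_of(self, valid_at: datetime, known_at: datetime):
        """What did the system believe at known_at about the state at valid_at?"""
        for v in self.versions:
            if v.valid_from <= valid_at < v.valid_to and v.sys_from <= known_at < v.sys_to:
                return v.value
        return None
```

For example, correcting a status for March to May after the fact still lets an auditor ask what the system showed before the correction was made.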
SLIDE: 35
Data Warehouse Governance
Approach | Movement | Harmonization | Indexing | Security | Agility
Data Lake | Y | Possible | N | N | Some
Federation/Virtual DB | N | Y | Delegated | Delegated | N
Data Warehouse | Y | Y | Y | Some | N
Data Hub | Y | Y | N | Maybe | Y
Operational Data Hub | Y | Y | Y | Y | Y
§ Governance / HealthCare.gov: lots of data cleanups. MarkLogic tracks “audit” information to provide history and lineage for all changed data. Old and new data are always tracked, with zero modeling.
Government Mainframe Migration
§ Failed migration from a mainframe to a relational Oracle system
§ CSV files existed for about a dozen data sources
§ MarkLogic implementation: 6 months
§ Keys to success
- Standardized, canonical “header” in an envelope
- Powerful search + query on all data
- Generic display of 70% of the data + specific display of 30%
SLIDE: 36
The Big Picture
§ Operational Data Hub pattern is an Enterprise pattern
- Above single-system architecture level
- Initial focus on integration of systems, not specific systems
§ Move Data Concerns to the Data Layer
- Secure, govern, track history, check quality, track changes
- Transform, join, project via lenses
§ Combine operational and analytic functions
§ Write back and OLTP storage
- For cross-line-of-business data
SLIDE: 37
Questions and Discussion