Operational Data Vault
DESCRIPTION
I gave this presentation at the Advanced Architecture Conference (Bill Inmon, 2011) in Evergreen, Colorado. It covers a new breed of data warehousing called Operational Data Warehousing: the next steps in business intelligence towards self-service BI, enabling users to do more with their enterprise data warehouse solution. Specifically, it discusses how the Data Vault model fits into this picture. If you would like to use the slides, please e-mail me first; I'd be happy to discuss it with you.
TRANSCRIPT
1
Data Vault: What’s Next?
© Dan Linstedt, 2011-2012 all rights reserved
2
Agenda
• Introduction – why are you here?
• Short Data Vault Review
• What’s Next? Advanced Architecture…
• Defining Operational Data Warehousing
• Why is Data Vault a Good Fit?
• <BREAK>
• Fundamental Paradigm Shift
• Business Keys & Business Processes
• Technical Review
• Query Performance (PIT & Bridge)
• What wasn’t covered in this presentation…
3
A bit about me…
• Author, Inventor, Speaker – and part time photographer…
• 25+ years in the IT industry
• Worked in DoD, US Gov’t, Fortune 50, and so on…
• Find out more about the Data Vault:
o http://YouTube.com/LearnDataVault
o http://LearnDataVault.com
• Slides available:
o http://SlideShare.net
o Search: “Advanced Architecture Data Vault”
• Full profile on http://www.LinkedIn.com/dlinstedt
4
Why Are You Here?
• Your Expectations?
• Your Questions?
• Your Background?
• Areas of Interest?
• Biggest question: What are the top 3 pains your current EDW / BI solution is experiencing?
5
Short Data Vault Review
What is it and where did it come from?
Data Warehousing Timeline
(Timeline axis: 1960, 1970, 1980, 1990, 2000, 2010)
• E.F. Codd invented relational modeling
• Chris Date and Hugh Darwen maintained and refined modeling
• Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
• Early 70’s: Bill Inmon began discussing Data Warehousing
• Mid 70’s: AC Nielsen popularized Dimension & Fact terms
• 1976: Dr Peter Chen created E-R Diagramming
• Mid 80’s: Bill Inmon popularizes Data Warehousing
• Mid – Late 80’s: Dr Kimball popularizes Star Schema
• Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
• 1990: Dan Linstedt begins R&D on Data Vault Modeling
• 2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
• 2010: DV alive and well around the world
7
Data Vault Modeling…
Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable
8
What IS a Data Vault? (Business Definition)
• Data Vault Model
o Detail oriented
o Historical traceability
o Uniquely linked set of normalized tables
o Supports one or more functional areas of business
• Data Vault Methodology
– CMMI, Project Plan
– Risk, Governance, Versioning
– Peer Reviews, Release Cycles
– Repeatable, Consistent, Optimized
– Complete with Best Practices for BI/DW
(Diagram: Functional Areas Procurement, Sales, Delivery, Contracts, Finance, Planning, and Operations; Business Keys span / cross Lines of Business)
9
Supply Chain Analogy
Data Vault(EDW)
Source Systems
Data Marts
10
What Does One Look Like?
(Diagram: Hubs Customer, Product, and Order, each with three Satellites, joined by a Link that has its own Satellite. The Link records a history of the interaction.)
Elements:
• Hub = List of Unique Business Keys
• Link = List of Relationships, Associations
• Satellites = Descriptive Data
HUB
LINK
Satellite
Satellite
Colorized Perspective…Data Vault
Details
Business Keys
Associations
The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links) and both of these from the Details that describe them and provide context (Satellites).
3rd NF & Star Schema
11
(separation)
(Colors Concept Originated By: Hans Hultgren)
12
A Quick Look at Methodology Issues
Business Rule Processing, Lack of Agility, and Future-proofing your new solution
13
EDW Architecture: Generation 1
• Quality routines
• Cross-system dependencies
• Source data filtering
• In-process data manipulation

• High risk of incorrect data aggregation
• Larger system = increased impact
• Often re-engineered at the SOURCE
• History can be destroyed (completely re-computed)
(Diagram labels: sources Sales, Finance, Contracts; Complex Business Rules + Dependencies; Staging (EDW); Staging + History; Complex Business Rules #2; Star Schemas with Conformed Dimensions, Junk Tables, Helper Tables, Factless Facts; Enterprise BI Solution; (batch))
14
#1 Cause of BI Initiative Failure
Re-EngineeringFor
Every Change!
Anyone?
Let’s take a look at one example…
15
Re-Engineering
(Diagram labels: Current Sources Customer, Customer Transactions, Sales, Finance; Data Flow (Mapping) with Source, Join, and Business Rules; ** NEW SYSTEM ** Customer Purchases; IMPACT!!)
16
Federated Star Schema Inhibiting Agility
(Chart: Effort & Cost, Low to High, plotted against Time, with markers Start and Maintenance Cycle Begins)
Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
RESULT: Business builds their own Data Marts! (Data Mart 1, Data Mart 2, Data Mart 3)
The main drivers for this are the maintenance costs and the re-engineering of the existing system that occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
17
EDW Architecture: Generation 2
(Diagram labels: sources Sales, Finance, Contracts, Unstructured Data; Staging (batch); SOA (real-time); EDW (Data Vault); Complex Business Rules; Star Schemas, Error Marts, Report Collections (batch); Enterprise BI Solution)
The business rules are moved closer to the business, improving IT reaction time, reducing cost, and minimizing impacts to the enterprise data warehouse (EDW).
FUNDAMENTAL GOALS
• Repeatable
• Consistent
• Fault-tolerant
• Supports phased release
• Scalable
• Auditable
18
NO Re-Engineering
(Diagram labels: Current Sources Customer, Customer Transactions, Sales, Finance, each with a Stage Copy; Data Vault with Hub Customer, Hub Acct, Hub Product, Link Transaction; ** NEW SYSTEM ** Customer Purchases with its own Stage Copy; IMPACT!!)
NO IMPACT!!! NO RE-ENGINEERING!
19
Progressive Agility and Responsiveness of IT
(Chart: Effort & Cost, Low to High, plotted against Time, with markers Start, Foundational Base Built, Initial DV Build Out, New Functional Areas Added, Maintenance Cycle Begins)
Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down and maintenance easy. It also reduces the complexity of the existing architecture.
20
Why is Data Vault a Good Fit?
21
What are the top business obstacles in your data warehouse today?
22
Poor Agility
Inconsistent Answer Sets
Needs Accountability
Demands Auditability
Desires IT Transparency
Are you feeling Pinned Down?
23
What are the top technology obstacles in your data warehouse today?
24
Complex Systems
Real-Time Data Arrival
Unimaginable Data Growth
Master Data Alignment
Bad Data Quality
Late Delivery/Over Budget
Are your systems CRUMBLING?
25
Existing Solutions have led you down a painful path…
(Image: Yugo, World’s Worst Car)
26
Projects Cancelled & Restarted
Re-engineering required to absorb new systems
Complexity drives maintenance cost sky high
Disparate Silo Solutions provide inaccurate answers!
Severe lack of Accountability
27
There must be a better way…
There IS a better way!
How can you overcome
these obstacles?
28
It’s Called the
Data Vault Model
and Methodology
29
What is it?
It’s a simple, easy-to-use plan to build your valuable Data Warehouse!
30
Uncomplicated Design
Simple Build-out
Rapid Adaptability
Understandable Standards
Effortless Scalability
Painless Auditability
Pursue Your Goals!
What’s the Value?
31
Why Bother With Something New?
Old Chinese proverb: 'Unless you change direction, you're apt to end up where you're headed.'
32
What Are the Issues?
This is NOT what you want happening to your project!
THE GAP!!
33
What Are the Foundational Keys?
Flexibility
Scalability
Productivity
34
Key: Flexibility
Enabling rapid change on a massive scale without downstream impacts!
35
Key: Scalability
Providing no foreseeable barrier to increased size and scope
People, Process, & Architecture!
36
Key: Productivity
Enabling low complexity systems with high value output at a rapid pace
37
How does it work?
Bringing the Data Vault to Your Project
38
Key: Flexibility
Adding new components to the EDW has NEAR ZERO impact to:
• Existing Loading Processes
• Existing Data Model
• Existing Reporting & BI Functions
• Existing Source Systems
• Existing Star Schemas and Data Marts
No Re-Engineering!
39
Case In Point:
Result of flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA!
40
Key: Scalability in Architecture
Scaling is easy; it’s based on the following principles:
• Hub and spoke design
• MPP Shared-Nothing Architecture
• Scale Free Networks
41
Case In Point:
Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today!
42
Key: Scalability in Team Size
You should be able to SCALE your TEAM as well!
With the Data Vault methodology, you can scale your team when desired, at different points in the project!
43
Case In Point:(Dutch Tax Authority)
Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault
44
Key: Productivity
Increasing Productivity requires a reduction in complexity.
The Data Vault Model simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Ease of Monitoring, managing and optimizing processes
45
Case in Point:
Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports.
These individuals generated:
• 90% of the ETL code for moving the data set
• 100% of the Staging Data Model
• 75% of the finished EDW data Model
• 75% of the star schema data model
46
The Competing Bid?
The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system)
Our total cost? $30k and 2 weeks!
47
Results?
Changing the direction of the river takes less effort than stopping the flow of water
48
< BREAK TIME >
49
What’s Next?
A look at what’s around the corner for Data Warehousing and Business Intelligence. Believe me, it’s going to get interesting fast.
50
Operational Data Vault
Data Co-Location:
• Transactions & Transaction History
• Master Data & Master Data History
• Metadata & Metadata History
• External Data & External Data History
• Business Rules & Business Rule History
• Security / Access data & History
• Unstructured Data Ties & History
• Real-time Data Feeds DIRECTLY into the data store
Operational Applications ON TOP of the warehouse!
51
Extreme Automation!
Automated Creation of Data Models:
• Staging Models
• Data Vault Models
• Star Schema Models
• Cube Models
• Excel Models (spreadsheets)
• Data Mining Models (table structures)
Automated Creation of ETL Processes:
• Staging Loads
• Data Vault (Data Warehouse Loads)
• Star Schema Loads (80% solutions)
• Cube Loads (80% solutions)
• Excel Loads / Queries (80% solutions)
• Data Mining Queries (80% solutions)
Other Automated Components:
• Initial Metadata Population
• Initial Master Data Population
• Generated Testing Scripts
http://www.jmorganmarketing.com/should-social-crm-be-automated/
52
Results of all of this?
EDW Will:
• become BACK OFFICE!!
• become SELF-RELIANT / SELF-HEALING
• adapt to new structures, new hardware, and new data
• automatically backup and remove old data
Self-Reliance
http://images.businessweek.com/ss/06/10/bestunder25/source/1.htm
53
How Long Will it Take?
My milestone predictions:
• 1 yr: Operational Data Vault
• 2 yrs: Beginning automation of business rules
• 3 yrs: Beginning dynamic restructuring in the DV
• 4 yrs: Oper Apps contain BI & metadata & Master data GUI’s in a single place
• 5 yrs: the “all-in-one” appliance, containing 75% of what we need at the firmware levels to do all these things
http://thypolarlife.wordpress.com/2011/08/02/this-moment-in-time/
54
Why Should I Care?
• Because the Data Warehouse, combined with the operational applications on top, makes for a self-service BI environment
• Because this technology is the heart of Data Warehousing!
• Because the future is now
• Because it will happen with or without you… You do want a job, right?
What About Tooling?
55
(Diagram: Automation at the center, surrounded by Ontology, Cross-Reference, Config Templates, Source DDL, Target DDL, New Models, ETL Code, Documentation, Test Data, SQL Code, and Data Patterns)
56
Who’s Tooling Today?
WhereScape
Quipu
RapidACE
BI-Ready
Centennium
AnalytixDS
Nexus
57
What Does It Add Up To?
58
What’s the Key Ingredient?
59
Defining Operational Data Warehousing
What is an ODW and How did we get here?
60
What IS An Operational DW?
• A raw, time-variant, integrated, non-volatile data warehouse, on top of which sits an operational application – “editing and changing data”.
• However, instead of updates and deletes in place, the data is “marked” deleted, and updates are turned into Inserts, creating a delta audit trail along the way.
• Yes, it’s an operational application on top of the integrated data warehouse (or in this case, Data Vault model).
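The insert-only behaviour described above can be sketched in a few lines of Python. This is a minimal illustration only, not the author's implementation: the names `SatRow`, `InsertOnlySatellite`, `upsert`, and `delete` are hypothetical, and an in-memory list stands in for a Satellite table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SatRow:
    parent_sqn: int            # sequence of the owning Hub row
    load_dts: datetime         # when this version was recorded
    payload: Optional[dict]    # descriptive attributes; None once deleted
    is_deleted: bool = False   # "marked" deleted, never physically removed


class InsertOnlySatellite:
    """Hypothetical in-memory Satellite: every change is a new row."""

    def __init__(self):
        self.rows = []  # appended in load order; nothing is updated in place

    def current(self, parent_sqn):
        versions = [r for r in self.rows if r.parent_sqn == parent_sqn]
        return versions[-1] if versions else None

    def upsert(self, parent_sqn, payload, load_dts):
        """An 'update' becomes an insert, and only when something changed."""
        latest = self.current(parent_sqn)
        if latest and not latest.is_deleted and latest.payload == payload:
            return False  # no delta, nothing to store
        self.rows.append(SatRow(parent_sqn, load_dts, dict(payload)))
        return True

    def delete(self, parent_sqn, load_dts):
        """A 'delete' becomes an inserted row that marks the key deleted."""
        latest = self.current(parent_sqn)
        if latest is None or latest.is_deleted:
            return False
        self.rows.append(SatRow(parent_sqn, load_dts, None, True))
        return True
```

After an insert, a changed update, and a delete for the same key, `rows` holds three versions in order, which is the delta audit trail the slide refers to.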
61
Oper/Active DW Timeline
(Timeline axis: 1980, 1990, 2000, 2010)
• Data Warehouses split from Operational Systems
• Mid 90’s: “Active” DW becomes important, but has to wait for technology to catch up!
• Real-Time & Oper BI make the scene (users want direct control & up-to-the-minute data)
• Teradata makes real advances in Active DW
• “Appliances” begin appearing on-scene
• 2002: Cendant-TRG creates world’s first Operational Data Vault
62
How Did We Get Here?
Parts are © Teradata – Stephen Brobst, CTO
(Diagram: seven stages of data warehouse evolution)
1. Report: What happened? (Primarily batch)
2. Analyze: WHY did it happen? (Increase in ad-hoc queries)
3. Predict: What WILL happen? (Analytical modeling grows)
4. Operationalize: What IS happening? (Continuous update & time-sensitive queries become important)
5. Activate: What do you WANT to happen? (Event-based triggering takes hold)
6. ODW: Can you change what is happening? (Application direct edits to data in the EDW)
7. DDW: How do you dynamically adapt to business? (Dynamic alterations to structure; System of Record)
63
ODV Overview
(Diagram: the ODV at the center, Direct Inserts, NO STAGING AREA, connected to:)
• Web-Services (Direct Feeds)
• Applications (Direct Edits)
• Virtual Marts (Direct Access)
• Metadata Rules (Direct Edits)
• Batch Loads (Direct Feeds)
• Unstructured Feeds (Indirect Feeds)
64
What is the architecture?
(Diagram labels: Operational Systems; Web Interface (usually); Real-Time Collector; SOR Real-Time Data; Non-SOR Batch Data; Unstructured / Semi-Structured; Staging Area (Non-S.O.R. Historical Batch Data); Data Vault EDW (Stored, Analyzed / Scored); Real-Time Mining Engine; Virtual Marts; Operational Alerts; Strategic Reports & OLAP; Operational Applications, Master Data (Direct Edits); Operational Metadata Management (Direct Edits); Master Data)
• Flexible
• Accountable
• Compliant
• Scalable

• Normalized
• Dynamic
• Granular
• Historic
• Integrated by business key
65
What must an ODW have?
• Operational Application(s) on-top of the single data store
• All the up-time and maintenance requirements of a standard operational application (24x7x365, 6 9’s reliance, etc…)
• Inflow and outflow of information; bi-directional data flow to & from the service bus (SOA/ESB, etc..)
• Capacity to incorporate and store existing batch loads and accept real-time data from other feeds
• Ability to interface with unstructured data sets
• All the inherent design necessities of an EDW
66
Why should I care?
TWO REASONS:
67
Under the Covers…
(Diagram: an Application sits on a Data Access Control Layer, which sits on the Operational Data Vault (ODW) Layer: Hub Seller, Hub Product, and Hub Parts joined by Links with Link Satellites, plus Sat 1 to Sat 4. The application presents data to the user in conformed screens.)
1. Read Data for Edit
2. Lock Business Key Rows
3. Present in GUI
4. Accept Ins, Upd, Del
5. Perform Insert / Status change
6. Release Lock On Business Key Rows
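The six-step edit cycle on this slide might look roughly like the following. This is a sketch under stated assumptions: `DataAccessControlLayer`, `edit_session`, and `apply` are hypothetical names, a per-key `threading.Lock` stands in for row locks in a real DBMS, and the read happens before the lock only because that is the order the slide gives (a production system would likely re-read after locking).

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class DataAccessControlLayer:
    """Hypothetical layer between the application and the Data Vault."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)  # one lock per business key
        self._guard = threading.Lock()             # protects the lock table
        self.history = defaultdict(list)           # key -> [(status, attrs)]

    @contextmanager
    def edit_session(self, business_key):
        snapshot = list(self.history[business_key])  # 1. read data for edit
        with self._guard:
            lock = self._locks[business_key]
        lock.acquire()                               # 2. lock business key rows
        try:
            yield snapshot                           # 3-4. present in GUI, accept edits
        finally:
            lock.release()                           # 6. release the lock

    def apply(self, business_key, action, attrs=None):
        """5. Perform an insert or a status change; never update in place.

        Intended to be called inside edit_session; this sketch does not
        verify that the caller actually holds the lock.
        """
        if action not in ("INS", "UPD", "DEL"):
            raise ValueError(f"unknown action: {action}")
        status = "DELETED" if action == "DEL" else "ACTIVE"
        self.history[business_key].append((status, dict(attrs or {})))
```

A typical call would be `with dacl.edit_session("ABC123") as rows: dacl.apply("ABC123", "UPD", {...})`, so the lock is released even if the edit raises.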
68
Dropping by the Way-Side
• No…
o ETL
o BATCH DRIVEN PROCESSING
o “Synchronization” with the Source System
o missing source data
o No scalability problems
o No ODS needed!
o No “Master Data” system needed
o No Staging area needed
69
Positives
• Data in the ODW can be governed
• Audit trail built in
• Delta’s only are stored
• NEW applications can be created to “automatically” generate Cubes/Star Schemas – these apps can be run by the users…
• Self-Service BI is enabled!
• Master data can be “marked, scored, stored” in the same place as the EDW
70
Old Components Still There?
• Staging areas will exist as long as there is external data to load and integrate
• ODS areas may still exist as long as there are other legacy applications existing as source systems
• Master Data areas may still exist as long as the logic is not built directly in to the “operational DW application”
71
Secure ODV Technical Layers
(Diagram: layered architecture, annotated with Component Groups)
• APIs: Inbound API, Outbound API, Authentication API, Pedigree API, Aggregation API, Security Key Mgr API, Kit API, Transaction API, Packaging API, Master Data API, Busn. Intelligence API, Vault Accessibility Subject Area API
• Visible Objects, Services, Common Data Object Area
• Interfaces: Database Interface, Local DB Interface, Global DB Interface, Persistence Cache DB Interface, Security Interface (Encryption Too), Logging Interface, Scheduling Interface, Notification Interface, File Management Interface, Format Interface
• Storage: Web Server Locally Based Persistent DB Cache for Joining (shown twice), Global DB, Local DB1, Local DB2
72
What are the benefits?
• Simplified Architecture
• Single Copy of the data!
• No “intermediate” IT work to do
• Users become empowered, with direct access to data sets
• Of course, using the Data Vault model, you gain ALL the benefits of the Data Vault (Scalability, flexibility, etc…)
• NOTE: Two or more “users” can actually EDIT different parts of the same record at the same time!
• Integrating external data basically makes it all available to the application immediately!
• NO NEED TO BUILD A SEPARATE EDW!!
73
What are the drawbacks?
• No current “application” is using the Data Vault for operational data
• In other words, off-the-shelf apps in this area do not yet exist – you have to “build it” yourself
• Self-Service BI application technology is nascent or non-existent today
• Master Data & Metadata Applications are not currently available on top of Data Vault
74
Technical Review
Hub, Link, Satellite - Definitions
75
HUB Data Examples
HUB_CUST_ACCT
SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
1    ABC123     10-14-2000  SALES
2    ABC-123    10-14-2000  SALES
3    *ABC-123   10-14-2000  FINANCE
4    123,ABCD   10-15-2000  CONTRACTS
5    PEF-2956   10-16-2000  CONTRACTS

HUB_PART_NUMBER
SQN  PART_NUM   LOAD_DTS    RECORD_SRC
1    MFG-25862  10-14-2000  MANUFACT
2    MFG*25266  10-14-2000  MANUFACT
3    *P25862    10-14-2000  PLANNING
4    MFG_25862  10-15-2000  DELIVERY
5    CN*25266   10-16-2000  DELIVERY

Hub Structure
SEQUENCE
<BUSINESS KEY>     (Unique Index)
{LAST SEEN DATE}   (Optional)
<LOAD DATE>
<RECORD SOURCE>
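A Hub load that respects the unique index on the business key can be sketched as follows. This is an illustration under assumptions, not a prescribed loader: the `Hub` class and its `load` method are hypothetical, and the sample rows are the HUB_CUST_ACCT rows from the slide. Note that the slide keeps `ABC123`, `ABC-123`, and `*ABC-123` as three distinct keys, so the sketch does no cleansing either.

```python
from datetime import date


class Hub:
    """Hypothetical in-memory Hub: a list of unique business keys."""

    def __init__(self):
        self.rows = []     # (sqn, business_key, load_dts, record_src)
        self._by_key = {}  # plays the role of the unique index

    def load(self, business_key, load_dts, record_src):
        """Insert the key if it is new; always return its sequence number."""
        if business_key in self._by_key:
            return self._by_key[business_key]  # already known, no new row
        sqn = len(self.rows) + 1
        self.rows.append((sqn, business_key, load_dts, record_src))
        self._by_key[business_key] = sqn
        return sqn


# Rows copied from the HUB_CUST_ACCT example.
SLIDE_ROWS = [
    ("ABC123", date(2000, 10, 14), "SALES"),
    ("ABC-123", date(2000, 10, 14), "SALES"),
    ("*ABC-123", date(2000, 10, 14), "FINANCE"),
    ("123,ABCD", date(2000, 10, 15), "CONTRACTS"),
    ("PEF-2956", date(2000, 10, 16), "CONTRACTS"),
]

hub_cust_acct = Hub()
sequences = [hub_cust_acct.load(*row) for row in SLIDE_ROWS]
```

Loading `ABC123` again from another source returns the existing sequence and leaves the first-seen load date and record source untouched.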
76
Link Structures
Link_Product_Supplier
LPS_SQN
PRODUCT_SQN      (Unique Index)
SUPPLIER_SQN     (Unique Index)
LPS_LOAD_DTS
LPS_REC_SOURCE
LPS_ENCR_KEY

Link_Customer_Account_Employee
LCAE_SQN
CUSTOMER_SQN     (Unique Index)
ACCOUNT_SQN      (Unique Index)
EMPLOYEE_SQN     (Unique Index)
LCAE_LOAD_DTS
LCAE_REC_SOURCE

Link Structure
SEQUENCE
<HUB KEY SQN 1>    (Unique Index)
<HUB KEY SQN 2>    (Unique Index)
<HUB KEY SQN N>    (Unique Index)
{LAST SEEN DATE}   (Optional)
{CONFIDENCE}       (Optional; Dynamic Link)
{STRENGTH}         (Optional; Dynamic Link)
<LOAD DATE>
<RECORD SOURCE>
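As a rough sketch of the Link structure, the loader below treats the combination of Hub sequences as the unique index. The `Link` class and its `load` method are hypothetical names, and the sample sequence values are invented for illustration.

```python
from datetime import date


class Link:
    """Hypothetical in-memory Link: unique combinations of Hub sequences."""

    def __init__(self, hub_columns):
        self.hub_columns = tuple(hub_columns)  # e.g. PRODUCT_SQN, SUPPLIER_SQN
        self.rows = []       # (link_sqn, hub_sqn_tuple, load_dts, record_src)
        self._by_combo = {}  # unique index over the Hub sequence columns

    def load(self, hub_sqns, load_dts, record_src):
        """Record the association once; return the Link sequence number."""
        combo = tuple(hub_sqns[col] for col in self.hub_columns)
        if combo in self._by_combo:
            return self._by_combo[combo]
        link_sqn = len(self.rows) + 1
        self.rows.append((link_sqn, combo, load_dts, record_src))
        self._by_combo[combo] = link_sqn
        return link_sqn


link_product_supplier = Link(["PRODUCT_SQN", "SUPPLIER_SQN"])
# Illustrative sequence values only; they are not taken from the slides.
first = link_product_supplier.load(
    {"PRODUCT_SQN": 1, "SUPPLIER_SQN": 7}, date(2000, 10, 14), "MANUFACT")
second = link_product_supplier.load(
    {"PRODUCT_SQN": 1, "SUPPLIER_SQN": 8}, date(2000, 10, 14), "MANUFACT")
third = link_product_supplier.load(
    {"PRODUCT_SQN": 2, "SUPPLIER_SQN": 7}, date(2000, 10, 15), "PLANNING")
```

Because the table only stores pairs (or triples) of sequences, one product can relate to many suppliers and one supplier to many products without any structural change.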
77
Satellites Split By Source System
SAT_SALES_CUST
PARENT SEQUENCE
LOAD DATE
<LOAD-END-DATE>
<RECORD-SOURCE>
Name
Phone Number
Best time of day to reach
Do Not Call Flag

SAT_FINANCE_CUST
PARENT SEQUENCE
LOAD DATE
<LOAD-END-DATE>
<RECORD-SOURCE>
First Name
Last Name
Guardian Full Name
Co-Signer Full Name
Phone Number
Address
City
State/Province
Zip Code

SAT_CONTRACTS_CUST
PARENT SEQUENCE
LOAD DATE
<LOAD-END-DATE>
<RECORD-SOURCE>
Contact Name
Contact Email
Contact Phone Number

Satellite Structure
PARENT SEQUENCE    (Primary Key)
LOAD DATE          (Primary Key)
<LOAD-END-DATE>
<RECORD-SOURCE>
{user defined descriptive data}
{or temporal based timelines}
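One way to read the `<LOAD-END-DATE>` column is as the load date of the next version for the same parent, with the current version left open. The slides do not spell this rule out, so treat it as an assumption. The helper name `with_load_end_dates` is hypothetical, and the sample names come from the SAT_CUST_CONTACT_NAME example later in the deck.

```python
from datetime import date


def with_load_end_dates(rows):
    """Derive a load-end-date for each Satellite row.

    rows: iterable of (parent_sqn, load_dts, attrs).
    Returns (parent_sqn, load_dts, load_end_dts, attrs) tuples ordered by the
    primary key (parent sequence, load date). The newest row per parent gets
    None, meaning it is still current. Assumed rule, not from the slides.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    result = []
    for i, (sqn, load_dts, attrs) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        end_dts = nxt[1] if nxt is not None and nxt[0] == sqn else None
        result.append((sqn, load_dts, end_dts, attrs))
    return result


name_history = with_load_end_dates([
    (1, date(2000, 12, 31), {"NAME": "Dan Linstedt"}),
    (1, date(2000, 10, 14), {"NAME": "Dan L"}),
    (1, date(2000, 11, 1), {"NAME": "Dan Linedt"}),
])
```

Deriving the end date at query time keeps the Satellite itself insert-only; physically storing it would require an update to the previous row.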
78
Why do we build Links this way?
History Teaches Us…
If we model for ONE relationship in the EDW, we BREAK the others!
79
(Diagram: Hub Portfolio related directly to Hub Customer as 1 to M)
• 10 years ago: Portfolio to Customer was M to 1 (X: does not fit)
• Today: Portfolio to Customer is 1 to M
• 5 years from now: Portfolio to Customer is M to M (X: does not fit)
The EDW is designed to handle TODAY’S relationship; as soon as history is loaded, it breaks the model!
This situation forces re-engineering of the model, load routines, and queries!
80
History Teaches Us…
If we model with a LINK table, we can handle ALL the requirements!
(Diagram: Hub Portfolio and Hub Customer each relate 1 to M to LNK Cust-Port)
• 10 years ago: Portfolio to Customer was M to 1
• Today: Portfolio to Customer is 1 to M
• 5 years from now: Portfolio to Customer is M to M
This design is flexible and handles past, present, and future relationship changes with NO RE-ENGINEERING!
Applying the Data Vault to Global DW2.0
(Diagram: three regional models, each built from Hubs, Links, and Satellites, connected to one another by Links)
• Base EDW Created in Corporate Financials in USA
• Manufacturing EDW in China
• Planning in Brazil
81
82
Extreme Data Vault Partitioning
(Diagram: Hub Customer, Lnk Cust-Order, Hub Order, Sat Customer, and two Sat Order tables, each on its own DASD – Raid 0+1)
Each table receives its own I/O channel, and its own Raid 0+1 Disk
83
Query Performance
Point-in-time and Bridge Tables, overcoming query issues
84
Purpose Of PIT & Bridge
• To reduce the number of joins, and to reduce the amount of data being queried for a given range of time.
• These two together allow “direct table match”, as well as table elimination, to occur in the queries.
• These tables are not necessary for the entire model; only when:
o Massive amounts of data are found
o Large numbers of Satellites surround a Hub or Link
o Large query across multiple Hubs & Links is necessary
o Real-time-data is flowing in, uninterrupted
• What are they?
o Snapshot tables – Specifically built for query speed
85
PIT Table Architecture
(Diagram: Hub Customer, Hub Order, and Hub Product joined by Link Line Item with Satellite Line Item. Two of the Hubs have Sat 1 to Sat 4 plus a PIT Sat; the third has Sat 1 and Sat 2.)

Satellite: Point In Time
PARENT SEQUENCE    (Primary Key)
LOAD DATE          (Primary Key)
{Satellite 1 Load Date}
{Satellite 2 Load Date}
{Satellite 3 Load Date}
{…}
{Satellite N Load Date}
86
PIT Table Example
SAT_CUST_CONTACT_NAME
SQN  LOAD_DTS    NAME
1    10-14-2000  Dan L
1    11-01-2000  Dan Linedt
1    12-31-2000  Dan Linstedt

SAT_CUST_CONTACT_CELL
SQN  LOAD_DTS    CELL
1    10-14-2000  999-555-1212
1    10-15-2000  999-111-1234
1    10-16-2000  999-252-2834
1    10-17-2000  999.257-2837
1    10-18-2000  999-273-5555

SAT_CUST_CONTACT_ADDR
SQN  LOAD_DTS    ADDR
1    08-01-2000  26 Prospect
1    09-29-2000  26 Prosp St.
1    12-17-2000  28 November
1    01-01-2001  26 Prospect St

PIT table (LOAD_DTS is the Snapshot Date)
SQN  LOAD_DTS    SAT_NAME_LDTS  SAT_CELL_LDTS  SAT_ADDR_LDTS
1    08-01-2000  NULL           NULL           08-01-2000
1    09-01-2000  NULL           NULL           08-01-2000
1    10-01-2000  NULL           NULL           09-29-2000
1    11-01-2000  11-01-2000     10-18-2000     09-29-2000
1    12-01-2000  11-01-2000     10-18-2000     09-29-2000
1    01-01-2001  12-31-2000     10-18-2000     01-01-2001
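The PIT rows in this example are consistent with a simple rule: for each snapshot date, record the latest load date in each Satellite that falls on or before the snapshot. The sketch below applies that rule to the slide's data. The rule is inferred from the example rather than stated on the slide, and `pit_row` is a hypothetical helper name.

```python
from bisect import bisect_right
from datetime import date


def pit_row(snapshot_date, sat_load_dates):
    """Return, per Satellite, the latest load date on or before the snapshot.

    sat_load_dates maps a PIT column name to the load dates recorded in one
    Satellite for one parent sequence. None plays the role of NULL when the
    Satellite has no row yet.
    """
    row = {}
    for column, load_dates in sat_load_dates.items():
        ordered = sorted(load_dates)
        idx = bisect_right(ordered, snapshot_date)  # count of rows <= snapshot
        row[column] = ordered[idx - 1] if idx else None
    return row


# Load dates for SQN 1, copied from the three Satellites on the slide.
SATS = {
    "SAT_NAME_LDTS": [date(2000, 10, 14), date(2000, 11, 1), date(2000, 12, 31)],
    "SAT_CELL_LDTS": [date(2000, 10, day) for day in range(14, 19)],
    "SAT_ADDR_LDTS": [date(2000, 8, 1), date(2000, 9, 29),
                      date(2000, 12, 17), date(2001, 1, 1)],
}

SNAPSHOTS = [date(2000, 8, 1), date(2000, 9, 1), date(2000, 10, 1),
             date(2000, 11, 1), date(2000, 12, 1), date(2001, 1, 1)]

pit_table = [(snap, pit_row(snap, SATS)) for snap in SNAPSHOTS]
```

A query for a given snapshot can then equi-join each Satellite on (parent sequence, stored load date) instead of searching each Satellite for the right version.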
87
BridgeTable Architecture
(Diagram: Hub Seller, Hub Product, and Hub Parts joined by two Links, each with a Link Satellite, plus Sat 1 to Sat 4; a Bridge table spans the Hubs and Links)

Satellite: Bridge
UNIQUE SEQUENCE    (Primary Key)
LOAD DATE          (Primary Key)
{Hub 1 Sequence #}
{Hub 2 Sequence #}
{Hub 3 Sequence #}
{Link 1 Sequence #}
{Link 2 Sequence #}
{…}
{Link N Sequence #}
{Hub 1 Business Key}
{Hub 2 Business Key}
{…}
{Hub N Business Key}
88
Bridge Table Data Example
Bridge Table: Seller by Product by Part (LOAD_DTS is the Snapshot Date)
SQN  LOAD_DTS    SELL_SQN  SELL_ID  PROD_SQN  PROD_NUM    PART_SQN  PART_NUM
1    08-01-2000  15        NY*1     2756      ABC-123-9K  525       JK*2*4
2    09-01-2000  16        CO*24    2654      DEF-847-0L  324       MN*5-2
3    10-01-2000  16        CO*24    82374     PPA-252-2A  9938      DD*2*3
4    11-01-2000  24        AZ*25    25222     UIF-525-88  7         UF*9*0
5    12-01-2000  99        NM*5     81        DAN-347-7F  16        KI*9-2
6    01-01-2001  99        NM*5     81        DAN-347-7F  24        DL*0-5
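A Bridge row is essentially a pre-joined walk across Hubs and Links, stored with the business keys so queries can skip those joins. The following sketch builds Bridge rows for one snapshot date from two Links. It assumes the Links are given as plain pairs of Hub sequences, omits the Link sequence columns for brevity, and uses the hypothetical name `build_bridge`; the sample keys are taken from rows 5 and 6 of the slide.

```python
from datetime import date


def build_bridge(load_dts, hub_keys, link_seller_product, link_product_part):
    """Pre-join Seller -> Product -> Part for one snapshot date.

    hub_keys maps a Hub name to {sequence: business key}.
    Each Link is a list of (left_hub_sqn, right_hub_sqn) pairs.
    """
    rows = []
    for sell_sqn, prod_sqn in link_seller_product:
        for link_prod_sqn, part_sqn in link_product_part:
            if link_prod_sqn != prod_sqn:
                continue  # this part belongs to a different product
            rows.append({
                "SQN": len(rows) + 1,
                "LOAD_DTS": load_dts,
                "SELL_SQN": sell_sqn,
                "SELL_ID": hub_keys["seller"][sell_sqn],
                "PROD_SQN": prod_sqn,
                "PROD_NUM": hub_keys["product"][prod_sqn],
                "PART_SQN": part_sqn,
                "PART_NUM": hub_keys["part"][part_sqn],
            })
    return rows


# Business keys copied from rows 5 and 6 of the example table.
HUB_KEYS = {
    "seller": {99: "NM*5"},
    "product": {81: "DAN-347-7F"},
    "part": {16: "KI*9-2", 24: "DL*0-5"},
}

bridge = build_bridge(date(2001, 1, 1), HUB_KEYS,
                      link_seller_product=[(99, 81)],
                      link_product_part=[(81, 16), (81, 24)])
```

Here one seller-product association and two product-part associations yield two Bridge rows, one per part.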
89
What WASN’T Covered
• ETL Automation
• ETL Implementation
• SQL Query Logic
• Balanced MPP design
• Data Vault Modeling on Appliances
• Deep Dive on Structures (Hubs, Links, Satellites)
• What happens when you break the rules?
• Project management, Risk management & mitigation, methodology & approach
• Automation: Automated DV modeling, Automated ETL production
• Change Management
• Temporal Data Modeling Concerns
… And so on…
90
Conclusions
Who’s Using It?
The Experts Say…“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon
“The Data Vault is foundationally strong and exceptionally scalable architecture.”
Stephen Brobst
“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
More Notables…
“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.”
Howard Dresner
“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..”
Scott Ambler
94
Where To Learn More
• The Technical Modeling Book: http://LearnDataVault.com
• The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions
• Contact me: http://DanLinstedt.com - web, [email protected] - email
• World wide User Group (Free): http://dvusergroup.com
• Certification Training:
o Contact me, or learn more at: http://GeneseeAcademy.com
95
ODV – Case Study
Operational Data Vault – IN THE REAL WORLD!
96
E-Pedigree, Drug Track & Trace
(Diagram labels: Supply Chain participants Manufacturer, Product Packager, 3rd Party Logistics, Distribution Warehouse; Packaging Orders; Product Packaging; Secure Integration Services; Corporate Serialization Vault; Serialization Analytics Engine; Corp Site Server; Product Authenticator; E-Pedigree Management; Product Returns And Recalls; SSI Reporting, Analytics, and Data Mining)
97
Label Serialization Vault
(Diagram labels: Serialization Vault; Cust Pkg Line; Warehouse (WMS); ERP; E-Pedigree; EPC Global Standards; WS/SOAP; Flat Files; ASN; Shipping Reasons; Product Master Data; Serialization Marts; Corp Domain; Corp Applications)
Data
• Master Data: Products, Locations, Trading Partners, Users
• Shipping Data: Transactions
• Serialization/Packaging Data: Serial #’s, Hierarchical Relationships, Containers
Serialization Vault
• Global – Master Data
• Local – Private Data
98
Corporate Security
(Diagram: separate Data Vaults per customer, shown for Manufacturer and Shipper, each behind a SQL View Layer feeding Mart 1, Mart 2, and Mart 3, reached through a Customer Login and a Corp Login with an Encrypt Key. Other labels: Admin Login, Employee Validation, Tracking #, Machine Info, Global, Web-Services and Flat File Delivery, Data Exchange/Sharing Through Code Only.)
Pros
• Unique Logins Limit Access
• Physical Data Separation in Logical “Database” units
• No single login has 100% data access.
• Customers can be CHARGED for disk space, indexing, utilization
Cons
• Maintenance, Backup and Restore
• Changes to the data model ripple (larger impacts) as more customers are signed up.
• Each “support call” requires separate login to see the data set.
99
Web Services File Delivery
(Diagram: Web-Services and Flat File Delivery Machine, Global DB Machine, and two Local DB Machines)
• Encryption at multiple levels
• Multi-machine Utilization
• RAM Based encryption / decryption through services
100
Secure Machine Transfers
(Diagram labels: Web-Services and Flat File Delivery Machine with Encrypt / Decrypt and an https layer; External IP Cards; VPN Tunnel; DBMS Machine with Encrypt / Decrypt; Encrypted / Compressed Storage; Encrypted Local Director Database)
101
Secure Client Data Interchange
(Diagram labels: Customer Login; Corp Login; Corp Encrypt Key; Web Services + HTTPS; Encrypted Flat Files + SFTP; Corp Managed / Owned Copy; Decryption Key; Customer Local Copy; Customer Copy)
• Decrypt using Corp Key, then Re-Encrypt with Customer Unique Key before storing
• Customer Owned Key (Dictated by Customer)
• Corporate Owned Key (Encrypts data internally)
102
Security: ODV Web Services
(Diagram labels: Web Browser; Customer Login; Corp Login; Web Site / Server running Java Script or PHP; Web Services; Corporate Encrypt Key (Corporate Owned Encryption Key); Global DB; Corp Managed / Owned Copy)
103
Inflow/Outflow Applications
Inflow (Customer to Corporation), Web Service Sender to Web Service Collector:
1. Source Machine Encrypts Data Using Customer Key
2. Transmit Encrypted Data over HTTPS
3. Corp Decrypts Data According to Customer Key
4. Corp Re-Encrypts Data According to Internal Key For Specific Customer
5. DB
Outflow (Corporation to Customer):
1. DB
2. Corp Decrypts Data According to Internal Key For Specific Customer
3. Corp Encrypts Data According to Customer Key
4. Transmit Encrypted Data over HTTPS
5. Customer Decrypts Data According to Customer Key
104
ODV: Secure File Request
(Diagram: Corporation to Customer)
1. Encrypted File
2. Transmit Encrypted Data over FTPS
3. Customer Decrypts File According to Customer Key
** Note: Each Customer DB is encrypted via an internally owned Corp key which is unique to EACH customer.
105
ODV: Front-End Ping Request
(Diagram labels, Customer to Corporation: Login / Auth; Web-Based PING Validation; Unencrypted Data Transfer; Corp One-Way Hash of Key Number To Execute Ping; DBMS)