data vault what's next: part 2
DESCRIPTION
Part 2 of a two-part presentation that I gave in 2009. This part covers more about unstructured data and operational Data Vault components. YES, even then I was commenting on how this market would evolve. If you want to use these slides, please let me know, and add: "(C) Dan Linstedt, all rights reserved, http://LearnDataVault.com" in a VISIBLE fashion on your slides.
TRANSCRIPT
Data Vault Modeling
What’s Next? Part 2
© Dan Linstedt 2009-2012
This was PART 2 of a presentation I gave at an Array Conference in the Netherlands in 2009.
A bit about me…
• Author, Inventor, Speaker – and part time photographer…
• 25+ years in the IT industry
• Worked in DoD, US Gov’t, Fortune 50, and so on…
• Find out more about the Data Vault:
o http://www.youtube.com/LearnDataVault
o http://LearnDataVault.com
• Full profile on http://www.LinkedIn.com/dlinstedt
LearnDataVault.com
Where are We Today?
• IF you are using Data Vault…
o Auto Generation of Staging Loads
o Auto Generation of Data Vault Loads
o Auto Generation of Data Vault Reconciliation Routines
o Auto Generation of RAW Star Schemas
o Rapid Build out of Star Schemas
• If you are lucky…
o Auto Generation of the Data Vault Model
o Auto Consolidation of Source System Data Models
o Auto Generation of the Staging Data Model
Where do all these pieces fit?
DW2.0 Framework!
DW2.0 Framework
[Diagram of the DW2.0 framework. Labels: METADATA; Interactive, Integrated, Near-Line, Archival; Tactical, Historical, Strategic, Extended; Enterprise Data Warehouse; Active Data Mining, Transformation, Active Cleansing, Cube Processing, Temporal Indexing, Semantic Management; Enterprise Service Bus / SOA / Web Services; Unstructured Data (Email, Plain Text, Word Docs, Images); HOT, MEDIUM, TEMP, WARM, COLD; SSD! (Cloud RAM); Cloud Storage]
How do we get there?
Virtual Marts: What are they?
They Are:
• RAM based data marts, or SSD drive based Data Marts
• OLAP cubes (most of the time) built on the fly by new queries
• “hot-data” that are continually accessed by the BI tool
• the result sets of the most frequently used queries
• built dynamically, accessed regularly, and destroyed after being “idle” for a specific time
• FAST
• only a subset of data from the EDW
NOTE: They have WRITE-BACK capabilities!!
Virtual Marts
REQUIREMENTS
• Cloud based RDBMS
o with expandable RAM
o Unlimited computing power
o Maximum parallelism
o Extreme scalability
• OR: Big Hardware with similar attributes
BENEFITS
• Highly Alterable Answer Sets
• Write Back to BDV
• Dynamic create/destroy capability
• No “copy” of the data except in RAM
Virtual Marts: How do I build one?
• You can, if you have Solid-State-Disk (RAM-DISK) in your database server
• You can, if you are using Cloud Technology
• Building one is the job of the 2010 RDBMS engine (today’s database engines do not provide these capabilities)
• However: To emulate, you can build one as follows:
o Monitor the queries most frequently executed
o Build the Cubes / stars on a regular schedule (automated queries)
o Tear the cubes down when queries no longer access the data
Remember: It will be YOUR job to maintain, monitor and manage these components until the database engines get there with HOT data.
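The three emulation steps (monitor, build on a schedule, tear down when idle) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a database feature: `VirtualMartManager`, `hot_threshold`, `idle_seconds`, and the `build` callback are hypothetical names, and the "cube" is simply a result set held in a Python dict.

```python
import time
from collections import Counter
from typing import Callable, Dict, Optional


class VirtualMartManager:
    """Emulates virtual marts: monitor queries, build hot ones, drop idle ones.

    Hypothetical sketch; names and thresholds are assumptions.
    """

    def __init__(self, build: Callable[[str], object],
                 hot_threshold: int = 3, idle_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.build = build                  # runs the automated query for a cube/star
        self.hot_threshold = hot_threshold  # executions before a query counts as "hot"
        self.idle_seconds = idle_seconds    # idle time before a mart is torn down
        self.clock = clock
        self.hits: Counter = Counter()      # step 1: query frequency monitor
        self.marts: Dict[str, object] = {}  # in-RAM result sets
        self.last_access: Dict[str, float] = {}

    def record_query(self, query: str) -> Optional[object]:
        """Call for every BI query; returns the mart's result set if one exists."""
        self.hits[query] += 1
        if query in self.marts:
            self.last_access[query] = self.clock()
            return self.marts[query]
        return None

    def refresh(self) -> None:
        """Run on a regular schedule: build hot marts, tear down idle ones."""
        now = self.clock()
        for query, count in self.hits.items():
            if count >= self.hot_threshold and query not in self.marts:
                self.marts[query] = self.build(query)  # step 2: build
                self.last_access[query] = now
        for query in list(self.marts):
            if now - self.last_access[query] >= self.idle_seconds:
                del self.marts[query]                  # step 3: tear down
                del self.last_access[query]
                del self.hits[query]                   # must become hot again
```

In practice `record_query` would be fed from the database's query log and `refresh` would be called by a scheduler; both of those hooks are left out here because they depend on the platform.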
Virtual Marts Affect The BDV
Write Back Capability:
• Write-backs from Virtual Marts affect business decisions
• New Business transactions/changed transactions will be fed back to operational systems
• Changes will be sent on the bus to notify other systems of business decisions
• User security and control will have to be in place to authorize WHO can change WHAT in which parts of the marts.
• Tracking of each change will become a required standard
Eventually the Virtual Marts will become a MIXED BI Application with an operational front end!
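The write-back rules on this slide (authorize WHO can change WHAT, track each change, send it on the bus) might look something like the sketch below. All names are hypothetical: `WriteBackGate`, the `grants` mapping, and the `publish` callback standing in for the bus are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ChangeRecord:
    """One tracked change: who changed what, in which mart, and when."""
    user: str
    mart: str
    key: str
    old_value: object
    new_value: object
    changed_at: datetime


@dataclass
class WriteBackGate:
    """Hypothetical gate in front of a virtual mart's write-back path."""
    grants: Dict[str, Set[str]]              # user -> marts the user may change
    publish: Callable[[ChangeRecord], None]  # stand-in for sending on the bus
    data: Dict[Tuple[str, str], object] = field(default_factory=dict)
    audit_log: List[ChangeRecord] = field(default_factory=list)

    def write_back(self, user: str, mart: str, key: str, value: object) -> ChangeRecord:
        # Security and control: reject users without a grant for this mart.
        if mart not in self.grants.get(user, set()):
            raise PermissionError(f"{user} may not change {mart}")
        record = ChangeRecord(user, mart, key, self.data.get((mart, key)),
                              value, datetime.now(timezone.utc))
        self.data[(mart, key)] = value
        self.audit_log.append(record)  # tracking of each change
        self.publish(record)           # notify other systems of the decision
        return record
```

Recording the old value alongside the new one keeps the audit trail usable for impact analysis without a second lookup.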
Unstructured Data: What is it?
• It is: Information that resides on your desktop, on your servers, on the web, is multi-lingual, and conceptually based.
• Technically: Documents, E-Mails, Transcripts, Videos, Images, Sound Files.
• It is 80% of the data, yet it goes unused by EDW/BI operations around the world
• It is 10x harder to deal with than structured data due to privacy concerns, ownership issues, and ethical concerns.
• Data Governance and Data Stewardship play a HUGE role in the success/failure of working with Unstructured Data Sets
Unstructured Data
REQUIREMENTS
• Pre-Processed data sets
• Pointers to data sets
• Use of & Loading of Ontologies
• Multi-Language processing
BENEFITS
• Highly Alterable Answer Sets
• Write Back to BDV
• Dynamic create/destroy capability
• No “copy” of the data except in RAM
Unstructured Data Engines Vs Search Engines
Unstructured Data Engine
• Indexes Documents
• Locates ALL potential matches
• Uses Data Mining / Neural Nets
• Correlates across multiple languages, multiple meanings of phrases
• Induction based reasoning
• Similarity Ratings based on Confidence and Strength
• Deep Analysis (focused on 1 question)
• Utilizes Ontologies
Search Engine
• Indexes key terms
• Locates “most likely match”
• Uses Statistical Analysis
• Correlates based on “Term matching”
• Wide search, but not “deep analysis”
U-Data & Data Vault
[Diagram labels: Unstructured Data – Loaded To Database; Structured RAW Data Vault; Dynamic Links Built from Analyzing Queries And Ontologies, Used to Load Cubes!; Ontology, Loaded to Database]
U-Data & Ontologies
• Ontologies describe term relationships
• Ontologies house term hierarchies
• Ontologies can correlate terms across languages
• Ontologies can provide synonyms, homonyms, and antonyms
• Ontologies are the key piece of Metadata needed to cross unstructured mining results to structured data sets in source systems
• Ontologies define the manner in which natural language ties together concepts
Ontologies (or pieces of them) are required to successfully understand combinations of Unstructured Data & Structured Data
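To make these points concrete, here is a minimal sketch of an ontology held as a data structure: a term hierarchy plus synonyms (which may come from several languages), used to map words found in a document onto concepts that could then be matched to structured data. The `Ontology` class, its methods, and the sample terms are all invented for illustration; real ontologies and ontology tools are far richer.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set


class Ontology:
    """Hypothetical, minimal ontology: hierarchy + synonyms across languages."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}      # concept -> broader concept
        self.synonyms: Dict[str, Set[str]] = defaultdict(set)  # concept -> terms
        self.concept_of: Dict[str, str] = {}  # lower-cased term -> concept

    def add_concept(self, concept: str, broader: Optional[str] = None) -> None:
        self.add_synonym(concept, concept)
        if broader is not None:
            self.parent[concept] = broader

    def add_synonym(self, concept: str, term: str) -> None:
        self.synonyms[concept].add(term.lower())
        self.concept_of[term.lower()] = concept

    def resolve(self, term: str) -> Optional[str]:
        """Map a surface term (any registered language) to its concept."""
        return self.concept_of.get(term.lower())

    def ancestors(self, concept: str) -> List[str]:
        """Walk the term hierarchy upward, guarding against cycles."""
        chain: List[str] = []
        seen = {concept}
        while concept in self.parent:
            concept = self.parent[concept]
            if concept in seen:
                break
            seen.add(concept)
            chain.append(concept)
        return chain

    def concepts_in(self, text: str) -> Set[str]:
        """Concepts mentioned in a piece of unstructured text."""
        words = re.findall(r"\w+", text.lower())
        return {self.concept_of[w] for w in words if w in self.concept_of}
```

For example, after registering "Car" under "Vehicle" with the synonyms "automobile" and the Dutch "wagen", `concepts_in` would tag an e-mail in either language with the same concept, which is the hook for linking it to a structured record.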
Ontologies and BI Applications
• Business Users will shift their BI applications to include managing data sets THROUGH ontology specifications
• Business Users will assign governance to ontologies and manage changes to ontologies as their metadata definitions
• Tomorrow’s BI tool set will provide visualizations of Ontologies cross-mapped to analytical data sets
Ontologies ARE the metadata of tomorrow
Plateau: Operational Data Warehouse
REQUIREMENTS
• Web-Services feeds with real-time data
• Applications for metadata management on top of the EDV
• Applications for Ontology Management on top of the EDV
• Applications to edit/maintain Operational Data
• Virtual Data Marts
• In-DB Data Mining Engine Capabilities
BENEFITS
• Direct ties between the operational world and the Data Warehouse
• Rapid turn around/impact analysis by business users
Operational DV: How to Build One
• The Easy Way:
o Start with standard Data Vault Modeling
o Attach Web-Services for In-flow/Out-Flow of Data (putting the DV on the ESB as a 24x7x365 operational component)
o Use Business Workflow Engines to monitor, create, edit, change and build applications on top of the web-services and web messages components
o Never allow direct access to the data in the Data Vault EXCEPT through web-services
• The Hard Way:
o Start with Standard Data Vault Modeling
o Attach Web Services for In-Flow/Out-Flow of Data
o Build a common data access layer (CDAL) that houses transactions in RAM (manages locking of data sets)
o Build applications on top of the CDAL
o Put the whole thing on the CLOUD to allow dynamic data marts
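The CDAL step of "The Hard Way" could be sketched along these lines: each transaction is buffered in RAM under a per-data-set lock and only sent onward when it completes without error. This is an assumption-laden illustration, not a description of any existing product; `CommonDataAccessLayer` and the `send` callback (a stand-in for the web-service call into the Data Vault) are hypothetical.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class CommonDataAccessLayer:
    """Hypothetical CDAL: in-RAM transactions with per-data-set locking."""

    def __init__(self, send: Callable[[str, str, object], None]) -> None:
        self.send = send  # stand-in for the web-service call: (data_set, key, value)
        self._locks: Dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, data_set: str) -> threading.Lock:
        # Create each data set's lock exactly once, even under concurrency.
        with self._registry_lock:
            return self._locks.setdefault(data_set, threading.Lock())

    @contextmanager
    def transaction(self, data_set: str) -> Iterator[Dict[str, object]]:
        with self._lock_for(data_set):       # one writer per data set at a time
            pending: Dict[str, object] = {}  # the transaction lives in RAM
            yield pending
            # Reached only if the caller's block raised no exception.
            for key, value in pending.items():
                self.send(data_set, key, value)
```

Applications built on top would write inside `with cdal.transaction("customer") as tx:` and never touch the Data Vault tables directly, which keeps the "web-services only" rule from the easy way intact.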
The Experts Say…
“The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon
“The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst
“The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
More Notables…
“This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner
“[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler
Where To Learn More
• The Technical Modeling Book: http://LearnDataVault.com
• The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions
• Contact me:
o http://DanLinstedt.com - web
o [email protected] - email
• World wide User Group (Free): http://dvusergroup.com