All About DV 2.0
© Dan Linstedt, 2015, all rights reserved
FOR: Data Vault Day by BI-Podium
http://KeyLDV.com (C) Dan Linstedt, 2015 all rights reserved
Data Vault 2.0 Definition:
A System of Business Intelligence containing the necessary components needed to accomplish enterprise vision in Data Warehousing and Information Delivery.
What Does That REALLY Mean?
DV2.0 System – Foundation Pillars
… A system of Business Intelligence comprised of…
Methodology • Architecture • Model • Implementation
• Consistent • Repeatable • Pattern Based
• Multi-Tier • Scalable • Supports NoSQL
• Flexible • Scalable (Big Data) • Hub & Spoke
What, then, are the differences between DV1.0 and DV2.0?
Data Vault 1.0 is FOCUSED on the Modeling! Oh… and did I mention Relational Data Only?
Data Vault 2.0 includes Big Data, NoSQL, agility, process, design, architecture, methodology, and so much more…
Bottom line? DV2 encapsulates the growing and changing market to bring true business value.
ISSUES FACED TODAY
Inside the pressure cooker that is BI and EDW
Business Issues…
• Big Data (volume, velocity)
• Unstructured/Multi-Structured Data (variety)
• Managed Self-Service BI (analytics)
• Managed Self-Service Data Discovery (bypassing IT)
• Auditability / Accountability
• Ownership and Governance
• Security and Privacy
Market Driver: Big Data
❖ Volume
❖ Velocity
❖ Variety
Market Driver: Unstructured Data
❖ Images
❖ Video
❖ Audio
❖ Documents
What about… ❖ E-Mail & XML?
Wait… All this talk about Big Data… Unstructured Data… “Data Warehousing”
Anyone else see a Pattern Here?
Big Data
IT??
“Data Warehousing”…
and… Information Delivery
We need BOTH!
Diametrically opposed goals!
“Data” Warehouse Goals: ✓ Sourcing ✓ Latency ✓ Scalability ✓ Auditability ✓ Historical Storage
Information Mart Goals: ✓ Interpretation ✓ Interpolation ✓ Correlation ✓ Quality ✓ Rapid Delivery
How do we turn DATA into Information, and ALIGN it?
Data → Information → Knowledge
Maslow’s Pyramid, reinvented: The more and better the data, the more quickly and more accurately organizations and management can make accurate decisions based on quality information and knowledge. The better the foundation, the more accurate the solutions.
See more at: http://gfxspeak.com/2012/01/09/big-fortunes-lie-hidden-in-big-files/
Wisdom (INTERNALIZATION): Correct personal choice whether to climb Mt. Everest
Knowledge (MENTAL APPLICATION): Memorizing a report on the practical, best way to reach Mt. Everest’s peak
Information (PROCESSED DATA / PERCEPTION): Books and reports on geological characteristics and weather patterns of Mt. Everest
Data (DISCRETE ELEMENTS / FACTS): Facts about Mt. Everest: height, average temperature, and so on
(Vertical axis of the pyramid: Organization and Meaning)
The DV2.0 Architecture defines the separation of data from information.
The DV2.0 Model defines standards for high-performance Big Data storage and retrieval.
It’s All About Context!
(Figure labels: Context (Global, Local, Personal); Experience)
(Figure: Data → Information → Knowledge → Wisdom, spanning Producers to Consumers. Activities shown: Research, Creation, Gathering, Discovery, Presentation, Organization, Conversation, Storytelling, Integration, Contemplation, Evaluation, Interpretation, Retrospection.)
Source: http://www.nathan.com/thoughts/unified/3.html
Market Driver: IT Agility
Desired by Business:
• 2 / 3 week sprints
• Dynamic Team Size and Variability
• Parallel Team Efforts
• Highly Adaptive EDW System
• Low cost / low impact to absorb changes
• Incremental build out / organic build
• Automated Processes
• Scalable
• Auditable
• “Data” tied to business processes (assets)
Market Driver: Managed Self-Service BI
Desired by Business:
• All-access pass to information
• Recording / tracking of business rule (BR) changes
• Security and governance of BRs
• Point-and-click distribution / sharing
• Easy GUI for data mining operations
• Leverage of existing integration
MANAGED SELF-SERVICE BI
Managing effectively, but empowering users
Changing Gears: One Part of Success
If you give a kid a bunch of finger paint, does that automatically make them a master artist?
Why then, do we assume that giving access to raw data will make BI users “self-sufficient?”
Stages of Data Anarchy - Threat to SSBI
http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2012/11/27/data-anarchy-a-real-threat-to-self-service-bi-part-i
(Figure: four stages, progressing from Controlled Chaos to Data Anarchy)
Derived Reports & Dashboards
• Joining existing pieces
• Actual data set & properties preserved
• Original data model still maintained
Creating New Data Sets from Original Source
• Reports & dashboards created from new data
• Original data model may be compromised
Mashing up Non-IT-Blessed Data
• End-users ingest personal data sets into master data
• Data model, security, privacy, ethics, and data quality severely compromised
Multiple Copies of Data Cubes Distributed (Ungoverned) by Business Users
• No single version of truth OR facts
• Enterprise concept model NON-EXISTENT!
True Self-Service BI is a misnomer, a falsehood.
True Self-Service BI opens Pandora’s box of data, not information!
True Self-Service BI is an enterprise’s worst nightmare.
The correct approach is: Managed Self-Service BI
Comparing & Contrasting
http://www.slideshare.net/johnberry21/microsoft-power-bi-38584223
Managed Self-Service BI
➢ Often focuses on mashups of internal and external data
➢ Ad-hoc adaptive data models
➢ Designed by Business & IT
➢ Support for any-sized data sets
➢ May include real-time data
➢ Change control governed automatically by hosted server processes
Self-Service BI
➢ Mashup of any data
➢ Ad-hoc structures
➢ Designed by BI users
➢ Complaints about Big Data access & performance
➢ Difficult to embed real-time data
➢ ZERO change control
➢ Security breaches, ethics violations, morality questions
Conventional BI
➢ Focuses on internal business data
➢ Highly structured, well-defined data model
➢ Designed by BI professionals
➢ Support for large data sets
➢ May contain real-time data
➢ High level of change control
Governance & MSSBI
http://www.sas.com/content/dam/SAS/en_us/image/sas-com/graphs-charts/bi-competency-chart.png
Where’s the INFORMATION?
Why is it managed?
Business users have controlled access to information in the EDW system
Managed Self-Service BI
Put the world in your business users’ hands, but give them a map to help them find their way…
What is the answer?
Agile Methodology
BENEFITS:
• Drives Agile Deliveries (2/3 weeks)
• Includes CMM, Six Sigma, TQM
• Manages Risk, Governance, Versioning
• Defines Automation, Generation
• Designs Repeatable Optimized Processes
• Combines Best Practices for BI
DV 2.0 Methodology & CMM
5 Optimized business processes: repeatable, scalable, fault-tolerant, automatable (generate-able)
4 Metrics, Estimates vs Actuals, Function Point Analysis, Identification of broken processes
3 Defined Business Processes, Defined Goals, Defined Objectives
2 Risk assessments / analysis, managed processes, basic alignment efforts
1 Process unpredictable and poorly controlled
Follows: SEI/CMMI Level 5, PMP, Six Sigma, TQM, and Agile elements
Agile Methodology = Automation
Repeatable, Consistent, Standardized
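To make “Agile Methodology = Automation” concrete, here is a minimal Python sketch of pattern-based generation: one hub-load template is rendered from metadata, so every hub load follows the same repeatable pattern. The template, the table and column names (HUB_CUSTOMER, CUSTOMER_ID_HK, LOAD_DTS, RECORD_SOURCE, STG_*), and the SQL shape are hypothetical illustrations, not a published DV2.0 template.

```python
from string import Template
from typing import Dict, List

# Hypothetical hub-load pattern; every table/column name is illustrative only.
# Assumes each staging batch holds one row per business key.
HUB_LOAD = Template(
    "INSERT INTO $hub (${bk}_HK, $bk, LOAD_DTS, RECORD_SOURCE)\n"
    "SELECT DISTINCT stg.${bk}_HK, stg.$bk, stg.LOAD_DTS, stg.RECORD_SOURCE\n"
    "FROM $staging stg\n"
    "WHERE NOT EXISTS (\n"
    "    SELECT 1 FROM $hub h WHERE h.${bk}_HK = stg.${bk}_HK\n"
    ");"
)

def generate_hub_loads(metadata: List[Dict[str, str]]) -> List[str]:
    """Render the same load pattern once per metadata entry."""
    return [HUB_LOAD.substitute(entry) for entry in metadata]

metadata = [
    {"hub": "HUB_CUSTOMER", "bk": "CUSTOMER_ID", "staging": "STG_CRM_CUSTOMER"},
    {"hub": "HUB_PRODUCT", "bk": "PRODUCT_ID", "staging": "STG_ERP_PRODUCT"},
]

hub_loads = generate_hub_loads(metadata)
print(hub_loads[0])
```

Under this approach, adding a new source means adding a metadata entry rather than hand-writing another load, which is one way to get the consistency the slide describes.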
Model
BENEFITS:
• Follows Scale-Free Architecture
• Based on Hub & Spoke Design
• Backed by Set Logic & MPP Math
(Figure: Hub, Link, and two Satellites)
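A minimal Python sketch of the three table types named on the slide, assuming illustrative columns (hash key, load date, record source). The hash keys below are placeholder strings; the business keys, record sources, and attribute values are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Tuple

# Illustrative structures only; column names are assumptions, not a standard.

@dataclass
class Hub:
    """Unique list of business keys for one core business concept."""
    hash_key: str
    business_key: str
    load_dts: datetime
    record_source: str

@dataclass
class Link:
    """Association between two or more hubs, identified by their hash keys."""
    hash_key: str
    hub_hash_keys: Tuple[str, ...]
    load_dts: datetime
    record_source: str

@dataclass
class Satellite:
    """Descriptive attributes for a hub or link, versioned by load date."""
    parent_hash_key: str
    load_dts: datetime
    record_source: str
    attributes: Dict[str, str] = field(default_factory=dict)

loaded = datetime(2015, 1, 1)
customer = Hub("HK_CUST_1001", "CUST-1001", loaded, "CRM")
product = Hub("HK_PROD_42", "PROD-42", loaded, "ERP")
# The link is a spoke between the two hubs; a new relationship means a new
# link, while the existing hubs stay untouched.
purchase = Link("HK_PURCH_1", (customer.hash_key, product.hash_key), loaded, "ERP")
# Satellites hang off a hub (or link) and carry the changing context.
customer_name = Satellite(customer.hash_key, loaded, "CRM", {"name": "Acme BV"})
```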
Data Vault 2.0 Model
DV2 uses Hash Keys
Why?
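One commonly cited reason: a hash key is computed deterministically from the business key, so any platform (RDBMS, Hadoop, NoSQL) can derive the same key on its own, without a lookup against a central sequence generator. The sketch below assumes trimming and upper-casing each key part, a `;` delimiter between parts, and MD5 (suggested by the `_MD5` column name used later in the slides); treat these rules as assumptions, not the official standard.

```python
import hashlib

def hash_key(*business_key_parts: str, delimiter: str = ";") -> str:
    """Deterministic hash key from one or more business key parts.

    Assumed rules (illustrative): trim and upper-case each part, join the
    parts with a delimiter, hash with MD5, return 32 upper-case hex chars.
    """
    normalized = delimiter.join(p.strip().upper() for p in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()

# The same business key, formatted differently by two sources, gives one key,
# with no lookup against a central sequence generator.
crm_key = hash_key("  cust-1001 ")
erp_key = hash_key("CUST-1001")

# A link hash key combines the business keys of the hubs it connects.
link_key = hash_key("OU-17", "COMP-3")
```

The delimiter matters for composite keys: without it, the parts ("A", "B") and the single key "AB" would hash to the same value.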
Relationships Are HARD without COMMON KEYS
(Figure: NoSQL and RDBMS)
COMMON KEYS Make Communication Easier
(Figure: NoSQL and RDBMS)
Hashing / Data Vault 2.0 Model
JSON DOC {
  LNK_OU_COMP_MD5,
  SAT_LDTS,
  SAT_LEDTS,
  SAT_RSRC,
  ORG_UNIT_DETAILS {
    UNIT_DESCRIPTION,
    UNIT_LOCATION { UNIT_LAT, UNIT_LON },
    UNIT_DATES { UNIT_START_PRODUCTION, UNIT_END_PRODUCTION }
  }
}
(Figure: hash keys connect RDBMS tables with NoSQL / Hadoop content: JSON Document, Audio File, Video File, Multi-Structured, XML)
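A sketch of the idea in Python: a relational link row and a JSON satellite document shaped like the structure above each compute `LNK_OU_COMP_MD5` independently, so they match without a lookup. The hash rules (trim, upper-case, `;` delimiter, MD5), the reading of SAT_LDTS / SAT_LEDTS / SAT_RSRC as load date, load end date, and record source, the `HUB_*` column names, the business keys, and all attribute values are assumptions or invented sample data.

```python
import hashlib
import json

def md5_key(*parts: str) -> str:
    # Assumed normalization: trim, upper-case, ';' delimiter, MD5 hex digest.
    joined = ";".join(p.strip().upper() for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest().upper()

# Relational side: a link row (hypothetical business keys and column names).
link_row = {
    "LNK_OU_COMP_MD5": md5_key("OU-17", "COMP-3"),
    "HUB_OU_MD5": md5_key("OU-17"),
    "HUB_COMP_MD5": md5_key("COMP-3"),
}

# NoSQL side: a satellite stored as a JSON document (invented sample values).
sat_doc = json.dumps({
    "LNK_OU_COMP_MD5": md5_key("OU-17", "COMP-3"),
    "SAT_LDTS": "2015-03-01T00:00:00",   # assumed: load date timestamp
    "SAT_LEDTS": None,                   # assumed: load end date (still current)
    "SAT_RSRC": "PLANNING",              # assumed: record source
    "ORG_UNIT_DETAILS": {
        "UNIT_DESCRIPTION": "Assembly line 2",
        "UNIT_LOCATION": {"UNIT_LAT": 52.37, "UNIT_LON": 4.89},
        "UNIT_DATES": {
            "UNIT_START_PRODUCTION": "2014-06-01",
            "UNIT_END_PRODUCTION": None,
        },
    },
})

# Each platform derived the key on its own, so the two sides match directly.
doc = json.loads(sat_doc)
matches = doc["LNK_OU_COMP_MD5"] == link_row["LNK_OU_COMP_MD5"]
```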
Architecture
BENEFITS:
• Enhances De-Coupling
• Ensures Low Impact Changes
• Provides Managed Self-Service BI
• Includes Seamless NoSQL
DV2.0 Architecture
(Figure: Sources (Finance, Planning, Production, Excel, Word) feed the warehouse in Batch and Real Time. The RDBMS tier runs Staging → EDW – DV2 → Information Marts, with Hard Rules and Soft Rules applied along the way; a NoSQL tier is processed with Map / Reduce code. Also shown: Business Rules Engine / ESB (Soft Rules), Master Data App, Cubes, Write Back, In Memory, Appliances, Analytic Tooling, and Ontology Modeling & Metadata.)
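One way to read the Hard Rules / Soft Rules labels is sketched below in Python, under assumptions: hard rules only align data types on the way into the EDW and never change meaning, while soft rules (business interpretation) are applied when building an information mart. The field names and the 10,000 “key account” threshold are hypothetical.

```python
from datetime import date
from decimal import Decimal

def apply_hard_rules(raw: dict) -> dict:
    """Hard rules (assumed): align types and trim, but never change meaning."""
    return {
        "customer_id": raw["customer_id"].strip(),
        "order_date": date.fromisoformat(raw["order_date"]),
        "amount": Decimal(raw["amount"]),
    }

def apply_soft_rules(edw_row: dict) -> dict:
    """Soft rules (hypothetical business rule): interpretation for a mart."""
    segment = "KEY ACCOUNT" if edw_row["amount"] >= Decimal("10000") else "STANDARD"
    # Return a new row; the raw EDW row is never modified.
    return {**edw_row, "segment": segment}

raw_record = {"customer_id": " C-7 ", "order_date": "2015-02-14", "amount": "12500.00"}
edw_row = apply_hard_rules(raw_record)  # stored in the raw Data Vault
mart_row = apply_soft_rules(edw_row)    # derived, rebuildable information
```

Because the EDW row stays raw, changing the soft rule (say, the threshold) means rebuilding only the mart, which is one reading of the “low impact changes” benefit.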
Implementation
BENEFITS:
• Enhances Automation
• Ensures Scalability
• Provides Consistency
• Includes Fault-Tolerance
Data Vault 2.0 is an Enterprise BI System
• Scalability • Flexibility • Consistency • Repeatability • Agility • Adaptability • Auditability
Foundation Pillars: Architecture, Model, Methodology, Implementation
HOW DO WE GET THERE?
Understanding the value propositions
Optimization : Being Agile
You can’t optimize what you can’t measure
You can’t measure what you can’t identify
You can’t identify what you don’t define
You can’t define what you don’t understand
DV2.0 Methodology provides successful measurement techniques
Reducing Cycle Time
CMMI LEVEL 5!!! & Automation Enablement
Finding Business Problems
Business Perception: “What I believe is happening”, “What I know should happen”
Source Applications: “What is actually being collected”, “What is really happening”
Analytics: “What might happen”, “What did happen”, “What if this happens”
GAP
Source System Data (RAW, Un-integrated) vs. Altered, Merged, Integrated (EDW & Marts)
Reconciliation
Business Requirements
But… how do you reconcile?
Answer: Identify and Track
a) Implementation line items
b) Business processes
c) Information delivered from the marts
d) Data used in building information
e) Balancing data to source systems
DV2.0 Methodology contains these Foundational Concepts
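Item (e), balancing data to source systems, can be sketched as a comparison of control totals between the source extract and what landed in the EDW. The Python below assumes a row count and an amount sum as the balancing metrics; the column name `amount` and the sample rows are hypothetical, and a real implementation would use whatever totals the business agrees on.

```python
from decimal import Decimal
from typing import Dict, Iterable

def control_totals(rows: Iterable[dict], amount_col: str) -> Dict[str, object]:
    """Row count and amount total: an assumed pair of balancing metrics."""
    count, total = 0, Decimal("0")
    for row in rows:
        count += 1
        total += Decimal(str(row[amount_col]))
    return {"rows": count, "total": total}

def reconcile(source_rows, edw_rows, amount_col: str = "amount") -> Dict[str, object]:
    """Compare source and EDW control totals and report the gaps."""
    src = control_totals(source_rows, amount_col)
    edw = control_totals(edw_rows, amount_col)
    return {
        "balanced": src == edw,
        "row_gap": src["rows"] - edw["rows"],
        "amount_gap": src["total"] - edw["total"],
    }

source = [{"amount": "100.00"}, {"amount": "250.50"}]
loaded = [{"amount": "100.00"}]  # one row missing from the load
result = reconcile(source, loaded)
```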
Rapid Model Expansion
Merge New Business Units Rapidly
METHODOLOGY & RAPID DELIVERY
Disciplined Agile Delivery & DV2
LearnDataVault.com (C) Dan Linstedt, 2014, all rights reserved
What IS agility?
Agile software development is a group of software development methods based on iterative and incremental development, where requirements and solutions evolve through collaboration between self-organizing, cross-functional teams. It promotes adaptive planning, evolutionary development and delivery, a time-boxed iterative approach, and encourages rapid and flexible response to change.
http://en.wikipedia.org/wiki/Agile_software_development
Business agility is the ability of a business to adapt rapidly and cost efficiently in response to changes in the business environment. http://en.wikipedia.org/wiki/Business_agility
Our Focus…
• People – Why can’t my people deliver in an agile fashion?
• Processes – What does it mean to have agile delivery?
• Technology – How can technology enable agile delivery?
Why can’t my people deliver in an agile fashion?
Exploring the foundations of Agile…
Why can’t my people deliver in an agile fashion?
Inspiration • Motivation • Education • Tooling / Enablers
Contemplate the following:
Agile = framework
DV2 = methodology
Frameworks vs Methodologies
What makes “Agile” – well, “Agile”?
PEOPLE!!
Behind the people?
Pattern-Driven Automation Tooling
Agility is the ability to be quick and graceful. www.vocabulary.com/dictionary/agility
Scrum is a simple framework (small set of rules) for effective team collaboration on complex projects. https://www.scrum.org/Resources/What-is-Scrum
Kanban is a scheduling system for lean and just-in-time (JIT) production.
http://en.wikipedia.org/wiki/Kanban
Disciplined Agile Delivery (DAD) is a hybrid framework that builds upon the solid foundation of other methods and software process frameworks.
http://disciplinedagiledelivery.wordpress.com/introduction-to-dad/
There are nearly as many definitions of lean as there are lean practitioners and consultants. In short, the aim of any lean initiative is to eliminate waste. http://www.qualitydigest.com/feb02/html/lean.html
Data Vault 2.0 Methodology is a group of implementation directives: a set of repeatable standards that demonstrate how to efficiently implement the DAD framework.
CMMI is a capability maturity model framework for measuring the progress of the methodology components.
Six Sigma is a set of measurement tools that assist in collecting metrics on failure rates and errors in the methodology implementation.
TQM is Total Quality Management, the round-trip methodology for controlling the quality of data and processes.
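As an example of the Six Sigma failure-rate metrics mentioned above, defects per million opportunities (DPMO) and the corresponding sigma level can be computed as below. DPMO = defects × 1,000,000 / (units × opportunities per unit) and the 1.5-sigma shift are standard Six Sigma conventions; the load-run figures are hypothetical.

```python
from statistics import NormalDist

def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)

def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Short-term sigma level using the conventional 1.5-sigma shift."""
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + shift

# Hypothetical sprint metrics: 200 load runs, 5 checked steps each, 3 failures.
load_dpmo = dpmo(defects=3, units=200, opportunities_per_unit=5)
load_sigma = sigma_level(load_dpmo)
```

With these numbers the process runs at 3,000 DPMO, roughly a 4.2-sigma level; the familiar “Six Sigma” target corresponds to about 3.4 DPMO.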
DAD Goals
http://en.wikipedia.org/wiki/Disciplined_agile_delivery
DV2 Methodology provides workable and repeatable steps for implementing these goals within the DAD framework.
Is it (DV2 Methodology) agile?
My Customer Says…
Acurity (main system) did a structural change to every single table it contains and contributes to the data warehouse (40-odd tables).
We automated the change impacts and they were dealt with by one developer over a period of 3 weeks.
Most of the work was to use the metadata to drive changes in database and CDC subscriptions. Most importantly – no re-engineering occurred!
Nols Ebersohn, Manager of Information Architecture, QSuper
How can technology enable agile delivery?
Understanding the role of technology in AGILE systems…
Technology must automate the generation of standard processes, removing errors and enabling push-button consistency.
AnalytiX Data Services develops differentiated, value-added technology products for the data integration industry, which aim to close capability and functionality gaps and automate manual processes in the pre-ETL source-to-target data mapping, ETL conversion, and code automation space.
DV2.0 has been proven in the RELATIONAL world by customer successes
Does DV2 really Work?
Decide for yourself…
C.I.T.O., Queensland Super Fund
“DV2.0 brings the assurance that we can cope with an increased velocity in change, without falling behind in our ability to support time-sensitive decision-making.
The quality improvement and estimate accuracy resulting from the disciplined process are bonus factors in project delivery.”
Nols Ebersohn (QSuper, Manager of Information Architecture)
“DV2.0 training provides all the patterns and sample code, so the learning curve for developers is contracted.
We ingested 7 systems, 6500 data items into our DV2.0 with the use of 3 ETL templates in 8 months, all using 2 week sprints for delivery cycles.”
Data Vault 2.0 Key Findings
Define
Separate
Govern
KPAs & KPIs to identify, track, and measure
Raw Data Integration from Interpolation and Interpretation
to be compliant and scalable
Implementation, and Methodology to be nimble, agile, and responsive
Who’s Using DV2?
Book & Training: http://KeyLDV.com/ (Intro to Data Vault is a FREE course)
CORPORATE PACKAGES AVAILABLE
Contact Us: [email protected] @LearnDataVault.com
http://LinkedIn.com – “Data Vault Discussions” Forum
Custom Packages:
• Kick Start Package
• Accelerator Package
• Advanced Assessment Package