All Together Now: A Recipe for Successful Data Governance
TRANSCRIPT
Twitter Tag: #briefr
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
• July: Disruption
• August: Analytics
• September: Integration
• October: Database
• November: Cloud
• December: Innovators
• Disruptive innovation produces an unexpected new market and value network, and is usually geared toward a new set of customers.
• The consumer technology market teems with such game-changers: MP3 players, iPhones and iPads, portable storage devices, digital media, etc.
• While disruptive technologies often take time to gain a foothold in the market, they can have a serious impact on industry incumbents, who can be slow to innovate.
David Loshin, president of Knowledge Integrity, Inc, is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author regarding business intelligence best practices and has written numerous books and papers on data management, including the just-published “Practitioner’s Guide to Data Quality Improvement.” David is a frequent invited speaker at conferences, web seminars, and sponsored web sites and channels including www.b-eye-network.com. His best-selling book, “Master Data Management,” has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at www.mdmbook.com. David can be reached at: [email protected] or (301) 754-6350.
• Focuses on agility and flexibility for data governance and standards
• Offers a core technology suite, DataStar, that delivers data modeling, integration, aggregation and automation
• Developed a NoSQL alternative for data consolidation
Dr. Geoffrey Malafsky earned a Ph.D. in Nanotechnology from Pennsylvania State University. He was a research scientist at the Naval Research Laboratory before becoming a technology consultant in advanced system capabilities for numerous Government agencies and corporate clients. He has over thirty years of experience and is an expert in multiple fields including Nanotechnology, Knowledge Discovery and Dissemination, and Information Engineering. He founded and operated the technology consulting company TECHi2 prior to founding Phasic Systems Inc., where he is the CEO and CTO.
Bringing Agility and Flexibility to Data Design and Integration
Phasic Systems Inc
Delivering Agile Data
www.phasicsystemsinc.com
Introduction to Phasic Systems Inc
• Bringing Agile capabilities to data lifecycle for business success
• Methods and tools tested and refined over years of in-depth large-scale efforts
• Solve toughest data problems where traditional methods fail
• Based on extensive consulting lessons learned and real-world results
• Began in 2005 to commercialize advanced Agile methods successfully deployed in competitive development contracts
Phasic Systems Inc Management
• Geoffrey Malafsky, Ph.D, Founder and CEO
  ▫ Research scientist
  ▫ Supported many organizations in their quest to access the right information at the right time
• Tim Traverso, Sr VP Federal
  ▫ Technical Director, Navy Deputy CIO
• Marshall Maglothin, Sr VP HealthCare
  ▫ Sr. Executive multiple large health care systems
• Deborah Malafsky Sr VP Business Development
Our Agile Methods
• Why be Agile?
  ▫ Provide flexibility and adaptability to changing business needs while maintaining accuracy and commonality
  ▫ Segmented approach is too slow, rigid, and costly
• How?
  ▫ Treat data lifecycle as one continuous operation from governance to modeling to integration to warehouses to Business Intelligence
  ▫ Emphasize value produced at each step and overall coordination
  ▫ Seamlessly fit with existing organization, procedures, tools but add Agility, commonality, flexibility, and reduced cost and time
• We are Agile and comprehensive
  ▫ Typical 60-90 day engagement
  ▫ Deliver completed products not just plans or partial results
Methods and Tools
• DataStar Discovery: Agile data governance, standards and design
  ▫ Add business and security context to data
  ▫ Flexible, common data definitions/semantics, models
• DataStar Unifier: Agile warehousing and aggregation
  ▫ Simplified, common semantics using Corporate NoSQL™
  ▫ Source to target mapping with flexibility, standardization
  ▫ Aggregate data using all use case and system variations simply and easily into standard or NoSQL databases
“As a COO of a Wall Street firm and a former Vice Admiral in the United
States Navy in charge of a large integrated organization of thousands of people
and numerous IT systems, I have seen firsthand the critical role that high-quality
enterprise data plays in day-to-day operations of an organization. Without
timely access to reliable and trusted data all of our operations were vulnerable
to poor decision making, weak performance, and a failure to compete. With
Phasic Systems Inc.’s agile methodology and technology, we were finally able to
solve our data challenges at a fraction of the time, cost, and organizational
turmoil that all the previous and more expensive, time-consuming approaches
failed to do. Phasic Systems Inc. offers a new and much-needed approach to
this important area of Business Intelligence.”
PSI Customer Testimonial
VADM (ret) J. “Kevin” Moran
The Business Case

Today’s Response Timeline (15 to 27 Months)
IT Groups
• Develop Systems & Applications
• Physical Data Models
• Databases / Data Warehouse
• ETL controls
• MDM
Business Groups
• Requirements
• Conceptual/Logical Models
• Data Quality
• Business Rules
• Standards
BI Groups
• BI Data Models
• Reports
• Dashboards
Users
• Capability Problems
• New Capabilities
• Missing Data
Stage durations as listed on the slide: 3 to 6 Months, 6 to 9 Months, 3 to 6 Months, 3 to 6 Months

Tomorrow’s Initial Response Timeline with PSI (Subsequent Response Timeline – Days)
• Requirements
• Conceptual Data Model
• Logical Data Model
• Business Rules
• Standards
• BI Data Models
• Data Quality

• Develop Systems & Applications
• Physical Data Models
• Databases / Data Warehouse
• ETL controls
• MDM
2 to 6 Months
Agile: Overcome Hurdles
• Group rivalry
  ▫ Embrace important business variations; recognize no valid reason to force everyone to use only one view exclusively.
• Terminology confusion
  ▫ Use a guided framework of well-known concepts to rapidly identify and implement variations as related entities.
• Poor knowledge sharing
  ▫ Use integrated metadata where important products (business models, data models, glossaries, code lists, and integration rules) are visible, coordinated, and referenceable
• Inflexible designs
  ▫ Use a hybrid approach (Corporate NoSQL™) for Agile warehousing and integration blending traditional tables and NoSQL for its immense flexibility and inherent speed
Schema Are Not Enough
Must be agile in order to adapt quickly to new business needs
  ▫ Continuous change is norm: requirements, consolidation
  ▫ We must use all the important business variations of key terms (e.g. account, client, policy) – No such thing as single version for all!
Governance Design MDM
Integration ?
Which Value? Whose?
?
My “customer” or your “customer”?
Sales, Accounting
CEO/CFO/CIO SAP/IBM/ORACLE
How is data used?
D. Loshin 2008
Status Quo: Non-Agile
Agile: Visible, Common
Unified Business Model™
Intuitive, List-based
Real Estate Listing Example
• Seems simple and well-defined
  ▫ Each house has a type, id, address, etc.
  ▫ Industry standards: OSCRE, RETS
• Yet, data systems are very different
  ▫ Data model tied tightly to business workflow
  ▫ Extensions and “make-it-work” changes added over time
• Similar to customer relationship mgmt, ERP, and many other fields
Semantic Conflict in Real Estate Models
NKY
HOMESEEKERS
NKY attribute ‘basement’ does not have a corollary in
HOMESEEKERS
Data Value Semantic Errors = Inconsistent, Difficult to Merge, Report, Analyze
Lot_dimensions: implied semantics for size data. Actually has all sorts of data
Semiannual_taxes: implied semantics for numeric data. Actually has all sorts of data
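One way to make the "actually has all sorts of data" observation concrete is to profile a column by the shape of its values. The sketch below is only an illustration: the pattern labels, the regular expressions, and the sample values are all hypothetical and are not taken from the NKY or HomeSeekers data.

```python
import re
from collections import Counter

# Hypothetical value shapes one might expect in a lot-size column.
PATTERNS = [
    ("width_x_depth", re.compile(r"^\d+(\.\d+)?\s*x\s*\d+(\.\d+)?$", re.I)),
    ("acres", re.compile(r"^\d+(\.\d+)?\s*(ac|acre|acres)$", re.I)),
    ("plain_number", re.compile(r"^\d+(\.\d+)?$")),
]

def classify(value):
    """Return the label of the first pattern the value matches."""
    text = value.strip()
    if not text:
        return "empty"
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    return "other"

def profile(values):
    """Count how many values fall into each shape."""
    return Counter(classify(v) for v in values)

# Invented sample values, for illustration only.
sample = ["50x120", "0.5 acres", "7500", "irregular", "", "75 X 150"]
counts = profile(sample)
```

A column whose name implies one kind of data but whose profile spreads across several shapes is a candidate for the semantic-conflict review described on these slides.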
NKY HomeSeekers Texas
Fully Integrated Metadata for Business, IT, and BI
DataStar Corporate NoSQL™
• Large systems use NoSQL for its flexibility, performance, and adaptability
  ▫ But, it is poorly suited for corporate use – lacks connection to business
• DataStar Corporate NoSQL™
  ▫ Blends traditional techniques and NoSQL
  ▫ Entities come directly from Unified Business Model
  ▫ Object structure with simple tables
  ▫ Key-value pairs are basic repeating structure of all tables
  ▫ Business driven terminology
  ▫ Easily handles semantic variations & updates w/o changes to logical or physical models
  ▫ Can be as ‘dimensional’ or ‘normalized’ as desired
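The idea of key-value pairs as the repeating structure of every table can be sketched roughly as follows. This is not DataStar's actual schema: the row layout, the term mapping, the `lotsize` key, and the sample records are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from (source, source key) to a common business term.
TERM_MAP = {
    ("NKY", "basement"): "basement_type",
    ("NKY", "lot_dimensions"): "lot_size",
    ("HOMESEEKERS", "lotsize"): "lot_size",
}

def to_key_value_rows(source, entity_id, record):
    """Flatten one source record into (entity_id, key, value, source) rows."""
    rows = []
    for key, value in record.items():
        # Unmapped keys pass through unchanged, lowercased.
        common_key = TERM_MAP.get((source, key.lower()), key.lower())
        rows.append((entity_id, common_key, value, source))
    return rows

def regroup(rows):
    """Rebuild one dict per entity from the key-value rows."""
    entities = defaultdict(dict)
    for entity_id, key, value, _source in rows:
        entities[entity_id][key] = value
    return dict(entities)

# Invented sample records from two sources.
rows = []
rows += to_key_value_rows("NKY", "nky-1", {"basement": "full", "lot_dimensions": "50x120"})
rows += to_key_value_rows("HOMESEEKERS", "hs-7", {"lotsize": "0.25 acre"})
listings = regroup(rows)
```

In this sketch a new attribute or a new source-specific spelling only adds rows or a mapping entry; the row layout itself does not change, which is the property the slide attributes to the key-value approach.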
Speed &
Agility
Position Data Model
Results
• Applied to production data:
  ▫ Fully cleaned & integrated data governance approved
    - Requirement: 500,000 records in 2 hrs on Sun E25K
    - Actual: 50 minutes on 3-year-old low-cost server
• Governance documents produced and approved
  ▫ Legacy data models – first time in ten years
  ▫ Common data model – directly derived from ontology. Position-Resume model
• Standing governance board created with short decision-making monthly meetings
  ▫ Position-Resume Governance Board
• Process approach and technology applied to new IT systems
Navy HR Data Analysis
• Groups “share” data and control only if they don’t lose project control or funds
• Governance, business process, data engineers create separate designs and don’t know how to coordinate
• Try hard to follow industry guidance but stuck
• Actual data is very different than policy, mgmt awareness
  ▫ Example 1: Multiple Rate/Rating entries. Person xxxxxx has 5 entries: 4 end on the same date, 2 have start dates after their end dates, 2 start and end on the same days but are different
  ▫ Example 2: 30 different values used for RACE but only 6 allowed values in the Navy Military Personnel Manual derived from DoD policy
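The two kinds of problems in these examples (date ranges that run backwards, and coded values outside the allowed list) can be detected with very simple checks. The sketch below uses invented records and placeholder codes; the real allowed values would come from the governing policy document, not from this code.

```python
from datetime import date

# Placeholder codes only; the actual allowed list is defined by policy.
ALLOWED_RACE_CODES = {"A", "B", "C", "D", "E", "F"}

def find_bad_date_ranges(entries):
    """Return entries whose start date falls after their end date."""
    return [e for e in entries if e["start"] > e["end"]]

def find_disallowed_values(values, allowed):
    """Return the sorted distinct observed values that are not allowed."""
    return sorted(set(values) - set(allowed))

# Invented sample entries for one person.
entries = [
    {"rating": "R1", "start": date(2005, 1, 1), "end": date(2006, 1, 1)},
    {"rating": "R2", "start": date(2007, 6, 1), "end": date(2006, 1, 1)},
]
bad_ranges = find_bad_date_ranges(entries)
bad_codes = find_disallowed_values(["A", "B", "X", "unk", "A"], ALLOWED_RACE_CODES)
```

Checks like these only surface the discrepancies; deciding which value is correct remains a governance decision.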
Agile Warehousing and BI
Resume Data Model
Key-Value Vocabulary
Resume Identifiers
Key-Value Vocabulary
Competency KSAs
Agility and Collaboration for Data Governance
David Loshin Knowledge Integrity, Inc.
www.knowledge-integrity.com
© 2012 Knowledge Integrity, Inc. www.knowledge-integrity.com (301) 754-6350
Business Metadata Interdependencies
Concept
Context
Process
Business Policy
Objective: Translate Business Policies into Data Rules
Business Goals
Business Policy
Information Policy
Metadata
Business Rules
Data Rules
Operational governance integrates monitoring conformance to data rules
Motivation: Complexity in Data Meanings & Semantics
• What is a customer?
• These are potentially conflicting definitions
• Representations and underlying meanings from different business functions may differ
Sales: Someone who pays for our products or services
Support: Someone who has a license for use of our product
Finance
Sales
Marketing
Customer Service
Human Resources
Legal
Compliance
“customer”
?
Build from the Bottom Up
Concepts Business Terms Definitions Semantics
Business Definitions
Conceptual Domains
Value Domains
Reference Tables Mappings
Reference Metadata
Critical Data Elements
Data Element Definitions Data Formats Aliases/Synonyms
Data Elements
Entity Models Relational Tables Domain Directory
Information Architecture
Information Usage
Information Quality
Data Quality SLAs Access Control
Data Governance
Business Terms
• Within different contexts, business terms may be used with a specific definition to refer to:
  ▫ An action
  ▫ An entity
  ▫ A characteristic
• A business term may be used multiple times with different definitions
Example – Identifying Business Terms
• Order Confirmation
If you do not receive a confirmation number (in the form of a confirmation page or email) after submitting payment information, or if you experience an error message or service interruption after submitting payment information, it is your responsibility to confirm with FizzDizzle Customer Service whether or not your order has been placed.
Nouns
• You
• Confirmation number
• Confirmation page
• Confirmation email
• Payment information
• Error message
• Service interruption
• FizzDizzle Customer Service
• Order
Verbs
• Receive
• Submitting
• Experience
• Confirm
• Placed
Bring it All Together: The Chain of Definition
Harmonization
• Use Chain of Definition to determine when:
  ▫ Similarly-named data elements refer to the same data element concept
  ▫ Same-named data elements refer to different data element concepts
  ▫ Consolidating when possible and
  ▫ Differentiating when necessary
Data Element    Type
FirstName       VARCHAR(35)
LastName        VARCHAR(40)
SSN             CHAR(11)
Telephone       VARCHAR(20)

Data Element    Type
First           VARCHAR(25)
Middle          VARCHAR(25)
Last            VARCHAR(30)
SocialSec       CHAR(9)
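A first pass at harmonizing the two element lists above could compare declared types for elements believed to share a data element concept. The sketch below is a minimal illustration: the pairing of elements (`CANDIDATE_PAIRS`) and the names `SYSTEM_A` and `SYSTEM_B` are assumptions, and in practice the pairing would come from the chain of definition, not from name similarity alone.

```python
import re

SYSTEM_A = {"FirstName": "VARCHAR(35)", "LastName": "VARCHAR(40)",
            "SSN": "CHAR(11)", "Telephone": "VARCHAR(20)"}
SYSTEM_B = {"First": "VARCHAR(25)", "Middle": "VARCHAR(25)",
            "Last": "VARCHAR(30)", "SocialSec": "CHAR(9)"}

# Assumed pairing of elements that may refer to the same concept.
CANDIDATE_PAIRS = {"FirstName": "First", "LastName": "Last", "SSN": "SocialSec"}

def parse_type(type_text):
    """Split a declaration such as 'VARCHAR(35)' into ('VARCHAR', 35)."""
    match = re.fullmatch(r"(\w+)\((\d+)\)", type_text)
    if match is None:
        raise ValueError(f"unrecognized type declaration: {type_text!r}")
    return match.group(1), int(match.group(2))

def compare_pairs(a, b, pairs):
    """Report base-type agreement and the wider length for each pair."""
    report = {}
    for name_a, name_b in pairs.items():
        base_a, len_a = parse_type(a[name_a])
        base_b, len_b = parse_type(b[name_b])
        report[name_a] = {
            "same_base": base_a == base_b,
            "lengths_differ": len_a != len_b,
            "max_length": max(len_a, len_b),
        }
    return report

def unmatched(a, b, pairs):
    """List elements on each side that have no counterpart in the pairing."""
    return sorted(set(a) - set(pairs)), sorted(set(b) - set(pairs.values()))

report = compare_pairs(SYSTEM_A, SYSTEM_B, CANDIDATE_PAIRS)
only_a, only_b = unmatched(SYSTEM_A, SYSTEM_B, CANDIDATE_PAIRS)
```

A length difference such as CHAR(11) versus CHAR(9) may indicate a formatting difference (for example, stored with or without separators) and not a different concept; that is the kind of question the harmonization review has to settle.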
Impact Assessment
• Use chain of definition model to identify the instances that are impacted as a result of harmonization
Data Element    Type
FirstName       VARCHAR(35)
LastName        VARCHAR(40)
SSN             CHAR(11)
Telephone       VARCHAR(20)

Data Element    Type
First           VARCHAR(25)
Middle          VARCHAR(25)
Last            VARCHAR(30)
SocialSec       CHAR(9)
Questions and Open Discussion
• www.knowledge-integrity.com
• If you have questions, comments, or suggestions, please contact me:
  David Loshin
  301-754-6350
  [email protected]
www.dataqualitybook.com
www.mdmbook.com
• One of the common themes in the material you provided is the need for collaboration as part of the lifecycle management for the creation of a unified business model. To what extent is this collaboration driven by the software and how much requires processes designed around the software?
• What is your approach for transferring the knowledge for identifying semantic conflicts and resolving them within the organization?
• A lot of the slides suggest that the intent of the use of the technology is for developing data warehouse or business intelligence models. Is the use limited to consuming data from existing systems, or can it be used for reengineering operational or transaction systems, and if so how, and if not, why?
• One of the barriers to value for existing metadata and governance tools is the need for ongoing maintenance of the content. How can the product be used to facilitate ongoing management and assurance of consistency of business terminology?
• Presuming that I am now a data consumer (say a business analyst) within the organization, how would I use this technology to clarify the definitions and lineage of business terms presented to me in a BI report?
• What is your approach for capturing the semantics of implicit business concepts? In your real estate example, one of the columns for lot dimensions had implied semantics for size data, with an implication of measurement systems, units of measure, and even “topography” of the lot size. This implies the use of business concepts that are not explicit (acreage vs. square footage, transformations across frames of reference, qualification of lot shape, presentation of dimensionality). How does the tool capture implicit semantic information?
• Going back to collaboration: What types of interactive notifications are integrated into your environment to apprise individuals of changes to business terms, data element concepts, data elements, value domains, etc.?