Growing your business with combined data sources
Post on 04-Jul-2015
© 2014 IBM Corporation
Governance in a (Big) Data environment: The Data Reservoir
Johan Huizingalaan 765
1066 VH Amsterdam
The Netherlands
November 2014
Ron van der Starre,
Information Architect,
IBM Software Group
Most organizations struggle to get started with (Big) Data
Most organizations do not have the in-house expertise
Accelerating initial success and demonstrating business value is key to gaining organizational support
The reality of information management lacking governance today
Top sources of information used as part of initial efforts – typically start with data already being captured
Source: The real world use of Big Data, IBM & University of Oxford
Data sources
Respondents with active big data efforts were asked which data
sources are currently being collected and analyzed as part of active
big data efforts within their organization.
Data source               Global respondents   Financial services respondents
Transactions              88%                  92%
Log Data                  73%                  81%
Events                    59%                  70%
Emails                    57%                  65%
Social Media              43%                  27%
Sensors                   42%                  19%
External Feeds            42%                  36%
RFID Scans or POS Data    41%                  47%
Free-form Text            41%                  32%
Geospatial                40%                  0%
Audio                     38%                  21%
Still Images / Videos     34%                  22%
Data blues & skills issues
A disproportionate share of the time spent in analytics projects goes to data preparation: acquiring, preparing, formatting and normalizing the data.
In addition to raw data, augmented data and analytical assets can significantly speed up the analytics process and partially bridge the talent gap.
A growing demand …
Business Teams want
• Open access to more information
• More powerful analysis and visualization tools
IT Teams are
• Concerned about cost
• Concerned about governance and regulatory requirements
How to manage your existing and new data initiatives
Treat data as a high-value company asset
– Define a strategy on how to use data as a differentiator
– Embed the strategy firmly in the organization
Enable value of data by strong governance principles
– Strong empowerment by senior management (or CDO)
– Define clear and efficient procedures and policies to ensure compliance with risk and security requirements
Improve trust in, and understanding of data
– Define clear, agreed business terms for data
– Profile data to assess and baseline data quality
– Clear reporting on progress and data quality initiatives
– Data quality as an everyday job
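To make the "profile data to assess and baseline data quality" step concrete, here is a minimal sketch in Python. It is not taken from the presentation or from any IBM tool: the function name profile, the sample records and the chosen metrics (completeness, distinct count, most common value) are all illustrative assumptions.

```python
from collections import Counter

def profile(rows, columns):
    """Return per-column completeness, distinct count and most common value.

    Hypothetical helper for illustration; empty strings and None count as missing.
    """
    report = {}
    total = len(rows)
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[col] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report

# Made-up sample records, used only to exercise the sketch.
customers = [
    {"id": 1, "country": "NL", "email": "a@example.com"},
    {"id": 2, "country": "NL", "email": ""},
    {"id": 3, "country": "DE", "email": None},
    {"id": 4, "country": "", "email": "d@example.com"},
]

baseline = profile(customers, ["id", "country", "email"])
for column, stats in baseline.items():
    print(column, stats)
```

Storing such a baseline and re-running the same profile periodically is one simple way to report progress on data quality initiatives over time.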
A business scenario - enhanced 360º view of the customer data
Behavioral data: Orders, Transactions, Payment history, Usage history
Descriptive data: Attributes, Characteristics, Relationships, Self-declared info, (Geo)demographics
Attitudinal data: Opinions, Preferences, Needs and Desires
Interaction data: Email / chat transcripts, Call center notes, Web click-streams, In-person dialogues
Who? What? Why? How?
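As a rough illustration of how the four kinds of customer data could be brought together into one view, the following Python sketch groups records from each source under a shared customer key. The function build_customer_view, the customer_id field and all sample records are assumptions made for this example; the presentation does not prescribe an implementation.

```python
from collections import defaultdict

def build_customer_view(**sources):
    """Group records from each named source under their customer_id.

    Hypothetical helper: every record is assumed to carry a customer_id key.
    """
    view = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for record in records:
            key = record["customer_id"]
            details = {k: v for k, v in record.items() if k != "customer_id"}
            view[key][name].append(details)
    return dict(view)

# Made-up records, one list per data category named on the slide.
view = build_customer_view(
    descriptive=[{"customer_id": "c1", "segment": "retail"},
                 {"customer_id": "c2", "segment": "business"}],
    behavioral=[{"customer_id": "c1", "order": "A-100", "amount": 25.0}],
    attitudinal=[{"customer_id": "c1", "preference": "email contact"}],
    interaction=[{"customer_id": "c2", "channel": "call center", "note": "billing question"}],
)

print(view["c1"])
print(view["c2"])
```

In practice the hard part is usually agreeing on the shared key; entity matching and resolution is needed when the sources do not share a reliable customer identifier.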
Other business scenarios we see
Subject matter experts want access to their organization’s data to explore the content, select, control, annotate and access information using their terminology with an underpinning of protection and governance.
Data Scientist seeking data for new analytics models.
Marketer seeking data for new campaigns.
Fraud investigator seeking data to understand the details of suspicious activity.
• Day-to-day activity.
• Requiring ad hoc access to a wide variety of data sources.
• Supporting analysis and decision making.
• Using the subject matter experts' terminology.
The Vision Statement for the Data Reservoir
Enable an organization to operate as one for all platforms, functions and clients to have an agile and self-service operating model with trust and confidence across traditional and new sources of data.
Enablers
1. Agile and self-service
• Find Information
• Access Information
• Provision Information
• Integrate (Cleanse, Transform, Enact, Match, Enhance)
• Project and enhance
• Hypothesis Validation
• Model and report generation
• Archive / Remove / Revive
• Refine
• Curation
2. Trust and confidence
• Information lifecycle and governance
• Data quality
• Reference data
• Entity matching and resolution
• Lineage/Provenance
• Classification
• Regulatory compliance reports
3. Traditional and new sources of information
• New types of repositories, tools and processors
• Heterogeneous information virtualization
What is a Data Lake/Reservoir?
A data reservoir is a data lake that provides data to an organization for a variety of analytics processing, including:
– Discovery and exploration of data
– Simple ad hoc analytics
– Complex analysis for business decisions
– Reporting
– Real-time analytics
It is possible to deploy analytics into the data reservoir to generate additional insight from the data loaded into the data reservoir.
A data reservoir manages shared repositories of information for analytical purposes.
Each Data Reservoir Repository is optimized for a particular type of processing.
– Real-time analytics, deep analytics (such as data mining), exploratory analytics, OLAP, reporting, …
Data values may be replicated in multiple repositories in the data reservoir. However, the data reservoir ensures that the copying and updating of this data is managed and governed using well-defined information supply chains.
Information in the data reservoir can be accessed through different types of interfaces and provisioning mechanisms provided by the Data Reservoir Services.
[Figure labels: Data Reservoir; Information Management and Governance Fabric; Data Reservoir Services; Data Reservoir Repositories]
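The idea of governed information supply chains between repositories can be sketched as a small lineage graph: each dataset records which datasets it was copied or derived from, so that origin and impact can be traced. The Python below is an illustrative assumption only; the dataset names and the functions reachable and downstream are invented for this example and are not part of any IBM product API.

```python
def reachable(graph, start):
    """Return every node reachable from start by following graph edges."""
    seen = []
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in graph.get(current, []):
            if neighbour not in seen:
                seen.append(neighbour)
                stack.append(neighbour)
    return seen

def downstream(lineage, dataset):
    """Return every dataset that depends on dataset (impact analysis)."""
    inverted = {}
    for target, sources in lineage.items():
        for source in sources:
            inverted.setdefault(source, []).append(target)
    return reachable(inverted, dataset)

# Hypothetical supply chain: each key is fed by the datasets in its list.
lineage = {
    "reporting_mart.customer": ["warehouse.customer"],
    "warehouse.customer": ["staging.crm_extract", "staging.billing_extract"],
    "sandbox.churn_features": ["warehouse.customer"],
}

origins = reachable(lineage, "reporting_mart.customer")   # where the data came from
impacted = downstream(lineage, "warehouse.customer")      # what a change would affect
print(origins)
print(impacted)
```

Following the edges upstream answers the lineage question, and following them downstream answers the impact question, for example before changing a replicated table.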
Governance - differing user perspectives
[Figure: different users working with Data Stores, Sand Boxes and the Information Governance Catalogue]
• Curation of metadata about stores, models and definitions.
• Search for, locate and download data and related artifacts.
• Provision Sand Boxes.
• Add additional insight into data sources through automated analysis.
• Develop data management models and implementations.
• Define governance policies, rules and classifications. Monitor compliance.
• View lineage (business and technical) and perform impact analysis.
Information Governance Overview
Governance Activity Type   Information Exchange   Role that technology can play
Communication              Policies & Metrics     Delivering education, best practices, assessments, templates.
Compliance                 Design Changes         Implementing control points and enforcement points. Support for design and code reviews. Test Data Management.
Exception                  Exception Requests     Exception process management, incident reporting.
Feedback                   Measurements           Dashboards and reports on compliance.
Vitality                   New Requirements       Change process management.
Successful Information Governance is implemented with a combination of:
• Skilled people, correct roles and organization
• Processes that create a pragmatic, targeted and agile work environment.
• Standards, templates and assets that improve consistency between implementations.
• Technology that automates classification, enforcement, validation, and correction of data.
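As a hedged illustration of what automated classification and validation might look like at its simplest, the Python sketch below labels a column by matching its values against patterns and lists the values that break a rule so they can go to an exception process. The patterns, the 0.8 threshold and the names classify_column and find_exceptions are assumptions for this example; real governance tooling is considerably richer.

```python
import re

# Deliberately simple patterns, assumed for illustration only.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
IBAN_LIKE = re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{10,30}$")

PATTERNS = {"email_address": EMAIL, "bank_account": IBAN_LIKE}

def classify_column(values, threshold=0.8):
    """Return the first label whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unclassified"
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return label
    return "unclassified"

def find_exceptions(values, pattern):
    """Return (index, value) pairs for non-empty values that violate the pattern."""
    return [(i, v) for i, v in enumerate(values) if v and not pattern.match(v)]

# Made-up sample columns.
contact = ["a@example.com", "b@example.org", "not-an-email", "c@example.net", "d@example.com"]
account = ["NL91ABNA0417164300", "DE89370400440532013000"]
notes = ["call back", "sent invoice"]

print(classify_column(contact), find_exceptions(contact, EMAIL))
print(classify_column(account))
print(classify_column(notes))
```

The classification label can then drive enforcement, for instance by masking columns labelled as personal data, while the exception list feeds the exception management process.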
Three lifecycles of information governance
[Figure labels: Policy, Operations, Development, Metadata]
InfoSphere Integration and Governance features
Define terms and policies
Information Profiling
Job Creation
Data Modeling
Mapping
Metadata Exploration
Compliance Reports
Model information supply chains
Rule Definition
Rule Execution
Exception Management
Information Maintenance
Matching
De-duplication
Information Provisioning
Discovery of Structure
Lineage
[Figure labels: Policy, Operations, Development, Metadata]
Data Reservoir system context diagram
[Figure: the Data Reservoir, with Data Reservoir Operations, shown in the context of the systems and users around it.
Surrounding users and systems: Line of Business Applications; Decision Model Management; Governance, Risk and Compliance Team; Simple, Ad Hoc Discovery and Analytics; Reporting; Information Curator; Enterprise IT; System of Record Applications; Front Office Applications; Back Office Applications; Enterprise Service Bus; New Sources; Third Party Feeds; Third Party Services; Support Services; Mobile and other Channels; Internal Sources; Other Data Reservoirs.
Information flows: Events to Evaluate; Information Service Calls; Data Feed In; Data Feed Out; Search Requests; Report Queries; Understand Information Sources; Deploy Decision Models; Deploy Real-time Decision Models; Understand Compliance; Report Compliance; Data Import; Data Export; Advertise Information Source; Information Federation Calls; Inter-lake Exchange; Curation Interaction; Management; Notifications.]
Data Reservoir major subsystems
[Figure: the Data Reservoir system context diagram, with the Data Reservoir broken down into its major subsystems: Catalog Interfaces; Advanced Data Provisioning; Data Refineries; Analyst Interaction; Information Integration & Governance; Data Reservoir Repositories. The surrounding users, systems and information flows are the same as in the system context diagram, with Other Data Lakes shown alongside Other Data Reservoirs.]
Start small, think big …
[Figure: the Data Reservoir diagram with its major subsystems (Catalog Interfaces; Advanced Data Provisioning; Data Refineries; Analyst Interaction; Information Integration & Governance; Data Reservoir Repositories), marked with an example.
Example labels: Information Integration & Governance; Access; Analyst Interaction; Harvested Data; Deep Data; Descriptive Data; Information Views; Catalog; Information Ingestion; Information Access; Information Broker; Operational Governance Hub; Staging Areas; Information Warehouse; Find; Access; Front Office Applications; Internal Sources; Simple, Ad Hoc Discovery and Analytics.]
Tools to address practical challenges managing Big Data
InfoSphere BigInsights for Hadoop
• For data at rest
• 100% standard Hadoop
• IBM Big SQL, BigSheets
• Developer tools, Accelerators
• Ease of use for all roles
InfoSphere Information Server
• For all data integration requirements
• Business-driven Information Governance Catalog
• Sustainable data quality
• Governance Dashboard
Tools to address practical challenges managing Big Data
InfoSphere Watson Explorer
• Enterprise Search engine
• Discover and explore structured and unstructured data
Five key findings and key success criteria
1. Start with existing sources of internal data that must be captured and maintained anyway
2. Focus on how to generate increased customer insights in support of an existing initiative
3. Determine up front what KPIs you are trying to impact and how you will deliver business value
4. Success depends upon a scalable and extensible platform, with security and governance
5. Delivering analytical insights faster is a differentiator and provides business value
Governing and Managing Big Data for Analytics and Decision Makers
[Figure: the Data Reservoir system context diagram, showing the Data Reservoir, Data Reservoir Operations, and the surrounding users, systems and information flows]
http://www.redbooks.ibm.com/redpieces/abstracts/redp5120.html?Open