
Page 1: Cream 8_final_16

Analytics-Centric Data Architecture Strategy

Cream 8

Eric Hasty

Prajakta Patil

Rachel Robin

Robb (Muyang) Su

Page 2: Cream 8_final_16

Agenda

Our Understanding .......... 3
Recommendations .......... 4
Implementation .......... 14
Cost Analysis .......... 17
Risk & Mitigation .......... 18
Concluding Remarks .......... 20

Page 3: Cream 8_final_16

Our understanding

Cummins utilizes multiple data warehouse environments to support enterprise analytics objectives.

Traditional data warehousing strategies inhibit exploitation of recently developed analytical practices and tools.

What strategy and architecture should Cummins pursue in order to take advantage of rapidly developing predictive and prescriptive analytics practices?

Page 4: Cream 8_final_16

Three initiatives can produce a data architecture that best supports current and future predictive and prescriptive analytics at Cummins

1. Use a logical data warehouse structure as a blueprint for Cummins' next generation data warehouse.

2. Shift to a cloud-based analytics environment in Amazon Web Services to best support predictive and prescriptive analytic methods.

3. Construct an information governance structure by following a five-step process to ensure promotion of trust and consistent use of the analytics environment.

Page 5: Cream 8_final_16

Use a logical data warehouse structure as a blueprint for Cummins’ next generation data warehouse

[Diagram: logical data warehouse reference architecture]

Data sources, structured and unstructured: spreadsheets, RDBMS, SaaS, ERP, CEP streams, RDF graphs, IT logs.

A data integration layer feeds the logical data warehouse, which combines three styles (repository, virtualization, and distributed processing) wrapped by a semantic layer and metadata management.

The environment supports storing, managing, organizing, and correlating data; exploring and analyzing it through descriptive, diagnostic, predictive, and prescriptive analytics; and sharing, acting, and collaborating on the results.

Page 6: Cream 8_final_16

A logical data warehouse will present a consolidated view of enterprise data without requiring the deployment of a single consolidated warehouse

EDW repository
• Consolidates structured data into central repositories
• Analytical tools access data from the repository via predefined schemas ("schema on write")

Distributed processing
• Cost-effective way of processing massive structured and unstructured data
• Pattern analysis over historical/cold data
• Tools can define their own schemas later ("schema on read"; see the sketch below)

Data virtualization
• Retrieves and processes data on demand
• Supports rendering memory- or cursor-only types of data resources, which directly read source systems
• Benefits include reduced data sprawl, lower data latency, and higher flexibility

Source: http://www.gartner.com/document/2841217?ref=ddrec
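To make the "schema on write" vs. "schema on read" distinction above concrete, here is a minimal Python sketch; the file names and fields are hypothetical, not Cummins' actual data model.

```python
# Minimal sketch contrasting the two access patterns described above.
import csv
import json

# "Schema on write": rows must match a table layout defined before loading (EDW repository).
ORDER_SCHEMA = ["order_id", "plant", "amount_usd"]          # predefined schema

def load_orders(path="orders.csv"):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Reject anything that does not fit the predefined schema at load time.
            assert set(row) == set(ORDER_SCHEMA), f"unexpected columns: {row.keys()}"
            yield {k: row[k] for k in ORDER_SCHEMA}

# "Schema on read": raw engine-diagnostic events land as-is; each analysis imposes its
# own structure only when the data is read (distributed processing / virtualization side).
def read_diagnostics(path="diagnostics.jsonl"):
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            # The consumer decides which fields matter, tolerating missing or extra keys.
            yield {"engine_id": event.get("engine_id"),
                   "fault_code": event.get("fault_code", "NONE")}
```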


Page 7: Cream 8_final_16

Six key benefits come with adoption of a logical data warehouse environment

Six benefits: improved decision making, collaboration & data governance, balanced investment portfolio, improved flexibility, reduced data sprawl, and meeting Big Data challenges.

• Newer distributed computing and complex event processing help meet Big Data challenges
• Helps balance technical and human investment portfolios; the cloud and the three LDW styles offer paybacks at various time and risk levels
• Key tasks are iterative, and concurrently executed work streams generate synergy
• Data virtualization reduces sprawl and enables better data management and security
• Connects diverse data sources to deliver insights in strategic and operational contexts
• Follows proven design principles like "separation of concerns" and integration using a toolbox approach

Source: http://www.gartner.com/document/2841217?ref=ddrec

Page 8: Cream 8_final_16

Shift to a cloud-based analytics environment in Amazon Web Services to best support predictive and prescriptive analytic methods

AWS provides extreme scalability, ensuring Cummins can grow its analytics environment without physical machine encumbrances.

A wide variety of tools are available, facilitating rapid pursuit of the latest opportunities in predictive and prescriptive analytics.

AWS can easily connect to just about any analytics platform imaginable.

Important AWS tools: S3, DynamoDB, RDS, Redshift, EMR (see the sketch below).
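As a hedged illustration of how these services are reached programmatically, the boto3 sketch below creates clients for the tools named above and stages a file in S3; the bucket, file, and key names are placeholders, not actual Cummins resources.

```python
# Sketch: client handles for the AWS services named above.
import boto3

s3 = boto3.client("s3")              # data objects: schematics, video, images
dynamodb = boto3.client("dynamodb")  # low-latency key-value lookups
rds = boto3.client("rds")            # selected transactional data
redshift = boto3.client("redshift")  # the analytics data warehouse
emr = boto3.client("emr")            # managed Hadoop for distributed processing

# Example: stage an unstructured diagnostics file into S3 for later processing.
s3.upload_file("engine_diagnostics.log",
               "example-cummins-raw-data",
               "diagnostics/engine_diagnostics.log")
```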

Page 9: Cream 8_final_16

An example AWS-based logical data warehouse for Cummins

[Diagram: example AWS-based logical data warehouse]

• Data objects: schematics, video, images
• Unstructured data: machine output, engine diagnostics
• Selected transactional data: financial information, inventory

These feeds flow through the AWS environment into enterprise analytics applications.

Page 10: Cream 8_final_16

AWS provides unmatched and expanding data residency/sovereignty support

Each region contains multiple locations with data residency support.

AWS provides audit support to validate compliance.

Source: https://aws.amazon.com/about-aws/global-infrastructure/

Page 11: Cream 8_final_16

Construct an information governance structure by following a five step process to ensure promotion of trust and consistent use of the analytics environment

1. Identify and certify trusted sources
2. Formalize responsibilities
3. Establish data quality metrics
4. "Watermark" outputs
5. Make lineage visible at the point of consumption, providing context via metadata (see the metadata sketch below)

Roles involved across the steps: business stakeholders, SMEs, data stewards, the Chief Data Officer, and data consumers.

Source: http://www.gartner.com/document/code/254668?ref=ggrec&refval=2552018
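As an illustration only, the sketch below shows one way the five steps could surface as metadata attached to a dataset; the field and dataset names are assumptions for the example, not a Cummins standard.

```python
# Illustrative record of the governance attributes the five steps call for.
from dataclasses import dataclass, field
from typing import List

@dataclass
class GovernedDataset:
    name: str
    certified_source: str    # step 1: identified and certified trusted source
    data_steward: str        # step 2: formalized responsibility
    quality_metrics: dict    # step 3: e.g. completeness, freshness
    watermark: str           # step 4: trust label stamped on outputs
    lineage: List[str] = field(default_factory=list)  # step 5: source-to-consumption path

report = GovernedDataset(
    name="monthly_engine_sales",
    certified_source="ERP order history",
    data_steward="Engine BU data steward",
    quality_metrics={"completeness": 0.98, "freshness_hours": 24},
    watermark="CERTIFIED-2016-Q4",
    lineage=["ERP", "LDW repository", "Redshift mart", "BI dashboard"],
)
```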


Page 12: Cream 8_final_16

A tiered approach and dedicated team for information governance will ensure proper data prioritization and strategy

[Diagram: tiered information governance organization]

Information Governance board: business and IT representation; sets and enforces information management policies.

MDM team (centralized or distributed): manages the program, and authors and maintains master data.
• Information Steward: point of activity for data quality monitoring, improvement, and issue resolution
• Info Architect

MDM infrastructure team (IT): designs, builds out, and manages the technology infrastructure for MDM, covering modeling/metadata, app dev/integration, system management, security, privacy, monitoring, reporting, and data quality.

Source: http://www.gartner.com/document/3119918?ref=solrAll&refval=160151876&qid=f8616fb761d32d3be334435dd3008808

Page 13: Cream 8_final_16

Use a parallel and simultaneous deployment strategy to move to cloud

Run the old on-premise DW system and Redshift (plus other AWS services) in parallel, which allows for optimizing before full conversion.

Benefit of parallel deployment
• Parallel conversion has relatively low risk compared to other conversion methods; the new system can be tuned and corrected without significantly interfering with regular operations.

Reasons for simultaneous deployment
• BUs operate autonomously, so problems and resolutions will be unique per BU; no advantage from a pilot.
• The AWS payment structure is based on data volume, not per user, and all data is available immediately; no advantage from a pilot.
• Deployment by geographic location does not make sense for cloud operations.

Source: http://www.baselinemag.com/cloud-computing/migrating-a-big-data-warehouse-to-the-cloud.html
Source: Systems Analysis & Design with UML, V.2, Alan Dennis, 2014

Page 14: Cream 8_final_16

Deployment of AWS & Redshift will take less than one year

[Gantt chart: milestones at 2 wk, 4 wk, 6 wk, 2 mo, 3 mo, 4 mo, 5 mo, 6 mo, and 8 mo]

Phases: Plan/Analyze; Design/Prepare; Proof of Concept (POC); Migrate; Tune/Optimize; Change Management.

Page 15: Cream 8_final_16

Realize full benefits of Redshift through thorough planning and execution

Plan/Analyze: cloud adoption strategy; business requirements; technical requirements.

Design/Prepare: cloud migration roadmap; success/failure conditions; integration/consolidation design.

Proof of Concept: end-to-end migration testing and validation; check against acceptance criteria.

Migrate Data: migrate using AWS Import/Export and Attunity CloudBeam; consolidate; integrate (see the load sketch below).

Tune/Optimize: elasticity and scalability; availability; optimize utilization.

Source: http://www.slideshare.net/tomlaszewski/data-center-migration-to-the-aws-cloud

Source: http://www.ibm.com/developerworks/data/library/techarticle/dm-1309migtera

Source: http://www.attunity.com/attunity-cloudbeam-for-amazon-redshift-0
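The slide names AWS Import/Export and Attunity CloudBeam as the migration tools; as a complementary sketch only, once extracted files are staged in S3 a bulk load into Redshift over its standard PostgreSQL-compatible interface might look like the following. The cluster endpoint, table, bucket, and IAM role are placeholders.

```python
# Hedged sketch of the "migrate data" step: bulk-load staged S3 files into Redshift.
import psycopg2  # PostgreSQL driver; Redshift accepts the same wire protocol

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="migration_user", password="***",
)
with conn, conn.cursor() as cur:
    # COPY pulls the files from the staging bucket directly into the target table.
    cur.execute("""
        COPY engine_diagnostics
        FROM 's3://example-cummins-staging/diagnostics/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy'
        FORMAT AS CSV;
    """)
```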


Page 16: Cream 8_final_16

Support and prepare employees with effective change management

Major challenges
• Different maturity levels across BUs
• Achieving successful adoption within an aggressive timeline
• Parallel deployment makes it easy to revert to the old system

MYTH: Change management is not as important when switching to cloud.
REALITY: Switching to cloud affects IT roles, service delivery, and processes.

Actions for effective change management
1. Make the business case relevant.
2. Align change activities with the SDLC.
3. Present the governance structure clearly and ensure it is understood.
4. Set up and manage correct expectations.

Source: http://www.cmswire.com/cms/information-management/cloud-implementations-change-management-need-challenges-best-practices-016381.php?pageNum=2
Source: http://www.slideshare.net/gaurav1069/change-management-framework-33310710

Source: http://www.slideshare.net/gaurav1069/change-management-framework-33310710


Page 17: Cream 8_final_16

Costs of the new analytics environment are driven by provision of AWS services and implementation advisory services

Cost summary (three-year service):
• AWS Enterprise-Level Service: $380k upfront payment plus $15k in monthly expenses; three-year total cost $920k
• AWS Business-Level Service: three-year total cost $467k, including $84k of anticipated additional advisory implementation assistance
• AWS Basic-Level Service: $362k upfront plus $224k of anticipated additional advisory implementation assistance; three-year total cost $604k

(A quick arithmetic check of the Enterprise figures appears below.)

Source: https://media.amazonwebservices.com/AWS_Pricing_Overview.pdf
Source: http://calculator.s3.amazonaws.com/index.html
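A quick arithmetic check, assuming the $380k upfront and $15k monthly figures correspond to the Enterprise-level option quoted on the slide:

```python
# Sketch: reproduce the three-year Enterprise total from the slide's figures.
upfront_aws_services = 380_000   # upfront payment
monthly_expenses = 15_000        # monthly expenses (Enterprise-level support)
years = 3

three_year_total = upfront_aws_services + monthly_expenses * 12 * years
print(f"Three-year Enterprise total: ${three_year_total:,}")  # -> $920,000, matching the slide
```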


Page 18: Cream 8_final_16

Three key risks are inherent in this proposal

Top risk concerns and mitigation strategies:

Breach of sensitive data due to weak controls and system vulnerabilities in the cloud
• Implement a hybrid cloud deployment and keep highly sensitive information and data on-premise
• Purchase data breach insurance for the cloud solution

Cultural resistance from users who are more familiar with the Oracle environment
• Design new policies that standardize the adoptive behaviors and reward adopters
• Support and sponsorship from the executive level is required to pursue the new policies

Leveraging the AWS product suite can cause vendor lock-in with Amazon
• Use the existing data centers to provide redundant backup and storage of critical information, facilitating a quick switch away from AWS if necessary

See Appendix for the entire list.


Page 19: Cream 8_final_16

AWS includes robust support for audit and compliance functions

Amazon Redshift
Logs all SQL operations, including connection attempts, queries, and changes to the database.

Amazon CloudTrail
Records AWS API calls and delivers log files, covering:
• API calls made via the AWS Management Console
• AWS SDKs
• Command line tools
• Higher-level AWS services

AWS is certified under and compliant with numerous assurance programs (see the complete list in the appendix). A sketch of wiring these audit features together follows below.

Source: https://aws.amazon.com/redshift/
Source: https://aws.amazon.com/cloudtrail/
Source: https://aws.amazon.com/compliance/
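A hedged boto3 sketch of combining these audit features: enabling Redshift audit logging to an S3 bucket and querying CloudTrail for recent Redshift API activity. Cluster and bucket names are placeholders.

```python
# Sketch: turn on Redshift audit logging and inspect CloudTrail events for Redshift.
import boto3

redshift = boto3.client("redshift")
redshift.enable_logging(
    ClusterIdentifier="example-cummins-dw",
    BucketName="example-cummins-audit-logs",
    S3KeyPrefix="redshift/",
)

cloudtrail = boto3.client("cloudtrail")
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventSource",
                       "AttributeValue": "redshift.amazonaws.com"}],
    MaxResults=10,
)
for e in events["Events"]:
    print(e["EventName"], e["EventTime"])
```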


Page 20: Cream 8_final_16

These initiatives each enable security, agility, or usability, contributing to the achievement of Cummins’ data strategy

Security
• Built-in security from leading cloud service providers
• Encryption and network isolation
• Integration with audit and compliance tools like AWS CloudTrail

Agility
• "Pay as you go" model
• Scalable and fast
• Fast restores

Usability
• Move from a Capex to an Opex model
• Ease of adoption is greater because implementation time is shorter
• Fully managed and fault tolerant, with automated backups
• SQL compatible and easily integrates with other AWS products

[Feature matrix: Security, Agility, and Usability each mapped with check marks across the LDW, AWS Redshift, and data governance initiatives]

Source: http://www.gartner.com/document/2772517?ref=solrAll&refval=159699515&qid=cb58c9b2ee8945205c418718e533419b
Source: https://aws.amazon.com/redshift/


Page 21: Cream 8_final_16

Cummins can achieve an ideal environment for real-time predictive and prescriptive analytics through the use of an AWS-based logical data warehouse

• Deploy a logical data warehouse structure that incorporates a prioritized data classification system
• Shift the analytics environment to Amazon Web Services in order to leverage the flexibility and scalability of the cloud
• Implement a data governance structure to ensure effective management of the logical data warehouse and analytics environment

Together these initiatives deliver security, agility, and usability.


Page 22: Cream 8_final_16

Appendix

22

Page 23: Cream 8_final_16

Appendix

Appendix contents:
• 80/10/5 analytics rule
• Comparable service cost - Teradata
• AWS Redshift compatibility considerations - BI tools
• Feature comparison
• Risk catalogue
• Complete list of certifications and compliance
• LDW framework - Gartner
• Emerging trends in DW
• Analytics portfolio
• Use cases for analytics portfolio - Descriptive & Diagnostic
• Use cases for analytics portfolio - Predictive & Prescriptive
• Debunking LDW myths
• Cloud data integration comparison
• Information governance board sample responsibilities
• Metrics for data management
• Outcomes of each stage of data governance
• Maturity model for MDM
• Criteria for evaluating MDM maturity
• Detailed costs of AWS services likely to be provisioned by Cummins
• Additional advisory expenses - assumptions
• LDW components at Cummins
• Simplified data backup in Amazon Redshift
• AWS services included in the new Cummins LDW structure
• AWS support plans

Page 24: Cream 8_final_16

Simplified Data Backup in Amazon Redshift

[Diagram: data loaded into Redshift is kept as an original copy and a replica copy on a compute node, with automated snapshots providing backup.]

A sketch of configuring snapshots follows below.
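An illustrative boto3 sketch of the snapshot behaviour described above; the cluster identifier, retention period, and snapshot name are placeholder assumptions.

```python
# Sketch: configure automated snapshot retention and take a manual snapshot.
import boto3

redshift = boto3.client("redshift")

# Keep automated snapshots for 7 days (the retention period is configurable per cluster).
redshift.modify_cluster(
    ClusterIdentifier="example-cummins-dw",
    AutomatedSnapshotRetentionPeriod=7,
)

# Take an additional manual snapshot, e.g. before a migration cut-over.
redshift.create_cluster_snapshot(
    SnapshotIdentifier="pre-cutover-2016-12",
    ClusterIdentifier="example-cummins-dw",
)
```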

Appendix

Page 25: Cream 8_final_16

AWS services included in the new Cummins LDW structure

AWS services and their roles in the LDW:
• Data object storage
• Unstructured data storage
• Transactional data storage
• Managed Hadoop framework
• Data warehouse
• Amazon CloudTrail
• Amazon Kinesis: real-time streaming data analysis (see the sketch below)
• Transfer of large amounts of data into and out of AWS
• Amazon Glacier: inactive storage of large amounts of data
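A hedged sketch of the Kinesis streaming role: pushing a single engine-telemetry event into a stream with boto3. The stream name and payload fields are invented for the example.

```python
# Sketch: publish one telemetry event to an Amazon Kinesis stream.
import json
import boto3

kinesis = boto3.client("kinesis")
event = {"engine_id": "X15-000123", "fault_code": "P0299",
         "timestamp": "2016-11-01T12:00:00Z"}

kinesis.put_record(
    StreamName="example-engine-telemetry",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["engine_id"],   # events for one engine stay ordered per shard
)
```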

Appendix

Page 26: Cream 8_final_16

Use the 80/10/5 rule to address 95% of analytics use cases

80% of use cases: fulfill using a traditional repository
• Build a traditional repository
• Analytic data should be written into an established schema, that is, a repository specifically designed for that function

10% of use cases: fulfill using virtualization and/or distributed processing
• Wide access to the asset
• Structure is complex and not always consistent
• Explore unexpected forms in the data

5% of use cases: fulfill using data virtualization
• Direct availability of source data from various BUs
• Allow each use case to apply its own analytics schema at read time for transformations and integration

Remaining 5%: data and information have been recorded, but their relationship to business processes, and even to other data, is not readily apparent and requires multiple exploratory efforts to resolve.

Source: http://www.gartner.com/document/code/252003?ref=grbody&refval=3142720

Appendix

Page 27: Cream 8_final_16

Comparable Service Cost - Teradata

27

Source: https://roianalyst.alinean.com/ent_04/AutoLogin.do?d=807140806693380932

Appendix

Page 28: Cream 8_final_16

AWS Support Plans

Appendix

Basic (Free)
Provides customers immediate, around-the-clock access to customer service and technical support for system health issues that are detected by AWS. Customers also have access to technical FAQs, best practices guides, the AWS Service Health Dashboard, and the AWS Developer Forums, which are monitored and responded to by AWS support engineers.

Business Plan
Provides one-hour response time, available 24/7 via phone, chat, or email. Customers also gain access to AWS Trusted Advisor, a program which monitors AWS infrastructure services, identifies customer configurations, compares them to known best practices, and then notifies customers where opportunities may exist to save money, improve system performance, or close security gaps. New to this plan, customers have on-demand access to the Trusted Advisor self-service tool. In addition, customers receive third-party software support for OS, web servers, databases, storage, FTP, and email.

Enterprise Plan
Provides customers with all the components of the Business plan, plus mission-critical responses within 15 minutes and a dedicated Technical Account Manager who is intimately aware of the customer's specific AWS architecture. Technical Account Managers also conduct periodic business reviews for infrastructure planning, report metrics, collaborate on launches, and connect customers to solution architects as needed. The Trusted Advisor program is also available to all Enterprise plan customers.

Source: http://calculator.s3.amazonaws.com/index.html

Page 29: Cream 8_final_16

AWS Redshift Compatibility Considerations

"Amazon Redshift uses industry-standard SQL and is accessed using standard JDBC and ODBC drivers."

Additionally, AWS and Redshift have an extensive partner network of popular tools, a few of which are highlighted below: [partner tool logos]

Source: https://aws.amazon.com/redshift/faqs/
Source: https://aws.amazon.com/redshift/partners/

Appendix

Page 30: Cream 8_final_16

Features Comparison

Feature | AWS Redshift | Teradata (on AWS) | IBM dashDB
Fully managed | Yes | No | Yes
In-memory | No | Yes | Yes
Scalability | Petabyte level | Terabyte level | Terabyte level (12)
Columnar storage | Yes | No | Yes
Data residency | High | Low | Middle
Security certification | Yes | Yes | Yes
Durability | 99.99999999% | N/A | N/A
Redundancy | Yes | Yes | Yes
Availability | High | High | High

Appendix

Page 31: Cream 8_final_16

Risk Catalog 1

People risks and mitigations:

1. Complicated transition for business analysts and other users
• Assemble a change management team to design a strategy for the new changes
• Acquire support and training from vendors

2. Cultural resistance due to Cummins' history as an Oracle shop
• Design new policies that standardize the adoptive behaviors and reward adopters
• Support and sponsorship from the executive level is required to pursue the new policies

3. Stakeholders' expectations of returns and cost savings are overconfident
• The change management team should help manage expectations to a realistic level by clarifying the benefits and costs

4. Lack of internal knowledge base and talent for cloud services
• Leverage Amazon's knowledge base and purchase the enterprise support package
• Start building Cummins' own knowledge management based on ITIL best practice

Appendix

Page 32: Cream 8_final_16

Risk Catalog 2

Technology risks and mitigations:

1. Data breach and security concerns related to the cloud solution
• Use private cloud deployment and keep highly sensitive data on premise
• Purchase data breach insurance to transfer potential loss

2. Vendor lock-in with AWS and neglect of possible new alternatives in the future
• The IT department periodically scans and evaluates new technologies with competitive advantage
• The parallel strategy guarantees the flexibility for Cummins to quickly and smoothly switch to other solutions

3. High dependency on the Internet/intranet connection and the vendor's availability
• Assess and design the disaster recovery and business continuity strategy for the worst-case scenario
• Specify and manage SLAs with AWS

Appendix

Page 33: Cream 8_final_16

Risk Catalog 3

Process risks and mitigations:

1. The complexity of management is increased due to the new IT architecture
• Integrate the current risk management and incident management systems with an adjusted incident management process
• Leverage the support services provided by AWS

2. Moving to the cloud decreases the company's level of control over data
• Negotiate with the service provider for certain administrative privileges over data management

3. Implementation of the LDW intrudes on personal privacy and ethics
• The IT department needs to assess the priority of information governance and balance value, reusability, compliance, and risk

4. The to-be status is not described accurately or further restrictions are implied
• Constantly revisit the objectives and evaluate the IT investment portfolio

Appendix

Page 34: Cream 8_final_16

Complete List of Certifications and Compliance

34

AWS Assurance Programs

• SOC1

• SOC2

• SOC3

• IRAP (Australia)

• PCI DSS Level 1

• ISO 9001

• ISO 27001

• ISO 27017

• ISO 27018

• MTCS Tier 3 Certification

• FERPA

• HIPAA

• ITAR

• Section 508 / VPAT

• FISMA, RMF and DIACAP

• NIST

• CJIS

• FIPS 140-2

• DoD SRG Levels 2 and 4

• G-Cloud

• IT-Grundschutz

• MPAA

• CSA

• Cyber Essentials Plus

• FedRAMP (SM)

• FISMA

Appendix

Page 35: Cream 8_final_16

Logical data warehouse reference framework

35

Source: http://www.gartner.com/document/code/234996?ref=ggrec&refval=2267615

Appendix

Page 36: Cream 8_final_16

LDW components at Cummins

[Diagram: a single logical data warehouse environment consisting of an enterprise-wide repository and data marts for every BU: Component BU, Engine BU, Power Generation BU, and Distribution & Service BU.]

Appendix

Page 37: Cream 8_final_16

Emerging trends in modernizing DW initiatives

Source: http://www.gartner.com/document/code/234996?ref=ggrec&refval=2267615

1. Logical data warehouse
Description: Architecture that accelerates data warehouse initiatives by combining traditional and nontraditional approaches to support rapidly evolving or innovative use cases, using new technology to federate relational and nonrelational (Hadoop and NoSQL) data stores and processes.
Addressed in our recommendation: Yes. The LDW practice has entered a maturing cycle, and the time to pursue it is now.

2. Data lakes
Description: A persistence strategy for centralizing data assets in support of discovery and analytics. Data lakes can serve as a data source for data warehouse initiatives; processed and curated data from the data lake can be integrated into the data warehouse.
Addressed in our recommendation: No. The ability to find and make proper use of the data in a lake will prove challenging even for the most advanced users over time. Moreover, the inability to track what is being collected in the data lake will lead to potential governance and regulatory issues for data that has retention policies attached to it.

3. HTAP
Description: Enables a single DBMS platform to support both transactional and analytical workloads, thus simplifying the information management infrastructure. HTAP DBMS platforms can participate in an LDW, and the HTAP model is especially suitable for real-time data warehousing requirements.
Addressed in our recommendation: Yes
Notes: Not mature enough; it sits on the trough of disillusionment in Gartner's hype cycle. The concept is immature, industry experience is still limited to the most leading-edge organizations in a few industry sectors (primarily financial services), best practices are not yet crystallized, the vendor landscape is still quite turbulent, and relevant skills are almost impossible to find.

4. Data virtualization
Description: Capabilities that bring agility to a data integration strategy and are used in logical data warehouse architecture.
Addressed in our recommendation: Yes. The use of federated views of data to leverage distributed enterprise data in the logical data warehouse (LDW) is gaining early interest as a way to aggregate and provide data rapidly to the business.

Appendix

Page 38: Cream 8_final_16

Analytics portfolio

Source: http://www.gartner.com/document/2594822?ref=solrAll&refval=160099419&qid=5745a45724e4db2a39aeabe8334d985a

[Diagram: the analytics portfolio spans a decision cycle from past to future: create awareness that a decision must be made, understand the scope and context of the decision, identify likely outcomes, identify the best course of action, act, and report on the results of the action.]

Descriptive: "What happened?" Example: annual sales by region.
Diagnostic: "Why did it happen?" Example: web analytics to understand usage or abandonment patterns.
Predictive: "What will happen?" Example: fraud detection and credit rating.
Prescriptive: "What should I do?" Example: price optimization.

Appendix

Page 39: Cream 8_final_16

Use Cases for analytical portfolio

Source: http://www.gartner.com/document/2594822?ref=solrAll&refval=160099419&qid=5745a45724e4db2a39aeabe8334d985a

Descriptive techniques and sample use cases:
• Report/Dashboard: sales report
• Alerts: segmentation of customers by historical revenues
• Segmentation: how positive (or negative) are statements about your brand?

Diagnostic techniques and sample use cases:
• OLAP cube: web analytics to understand usage or abandonment patterns
• Data discovery: why are customers expressing negative sentiment?
• Bayesian networks: churn analysis to diagnose reasons for losing customers

Appendix

Page 40: Cream 8_final_16

Use Cases for analytical portfolio

Source: http://www.gartner.com/document/2594822?ref=solrAll&refval=160099419&qid=5745a45724e4db2a39aeabe8334d985a

Predictive techniques and sample use cases (see the regression sketch below):
• Regression: predictive maintenance
• Time series: fraud detection
• Neural networks: propensity modeling for direct marketing/cross-selling

Prescriptive techniques and sample use cases:
• Game theory: airline scheduling
• Influence diagrams: supply chain optimization
• Optimization: price optimization
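As a toy illustration of the regression-for-predictive-maintenance pairing above, the sketch below fits a linear model to synthetic operating-hours data (not Cummins data) with scikit-learn.

```python
# Sketch: predict remaining component life from hours of operation (synthetic numbers).
import numpy as np
from sklearn.linear_model import LinearRegression

hours_run = np.array([[100], [500], [1000], [2000], [4000]])
remaining_life_pct = np.array([98, 90, 80, 62, 25])

model = LinearRegression().fit(hours_run, remaining_life_pct)
predicted = model.predict([[3000]])
print(f"Predicted remaining life at 3000 h: {predicted[0]:.0f}%")
```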

Appendix

Page 41: Cream 8_final_16

Debunking common LDW myths

41

Source: http://www.gartner.com/document/2841217?ref=ddrec

Appendix

Page 42: Cream 8_final_16

Cloud Data Integration Architecture Comparison

42

Source: http://www.gartner.com/document/2841217?ref=ddrec

Appendix

Page 43: Cream 8_final_16

Candidate Items for Information Governance board

43

Source: http://www.gartner.com/document/code/260884?ref=grbody&refval=3119918

Appendix

Page 44: Cream 8_final_16

Metrics for Data Management

Source: http://www.gartner.com/document/code/213255?ref=grbody&refval=3024120

Metrics for achieving MDM should be applied at four different levels.

Action item: Use this MDM metrics framework as an outline to develop your own. Build links between the various parts of your MDM program, since each will link to different aspects of the various levels of the framework.

Appendix

Page 45: Cream 8_final_16

Key outcomes of each stage of data governance framework

Key outcomes of each stage:

1. Identify and certify trusted sources: an agreed and communicated set of data sources supporting the organization's key reporting and analytical activities.

2. Formalize responsibilities: defined and formalized business stakeholder and data steward roles to address each certified information source.

3. Establish data quality metrics: an initial set of data quality metrics used to benchmark and communicate the state of data quality and the business impact of data quality issues in certified data sources.

4. "Watermark" outputs: critical reports and queries labeled with a symbol of certified trustworthiness; data consumers seek out and acknowledge outputs bearing the watermark as authoritative and of high value.

5. Make lineage visible: structured processes for capturing and presenting metadata describing data's lineage from certified source to consumption point. From a data consumer's perspective, simple ways to visualize data's lineage at the point of consumption are key to deriving value from it.

Appendix

Page 46: Cream 8_final_16

Maturity model for MDM

46

Source: http://www.gartner.com/document/code/276417?ref=ggrec&refval=2088116

Appendix

Page 47: Cream 8_final_16

Criteria for evaluating MDM maturity

47

Source: http://www.gartner.com/document/code/276417?ref=ggrec&refval=2088116

Appendix

Page 48: Cream 8_final_16

Detailed costs of AWS services likely to be provisioned by Cummins

48

Redshift $246,675

RDS $124,146

AWS Support $15,902

SimpleDB $2,604

EMR $1,793

ElastiCache $834

S3 $688

EC2 $603

Glacier $280

AWS Transfer Out $89

CloudWatch $2

AWS Transfer In $-

These calculations are based on conservative storage estimates, data transfer amounts, and computation volume for a 40TB analytics environment.

Appendix

Page 49: Cream 8_final_16

Additional Advisory Expenses - Assumptions

AWS service level | Business | Basic
Number of consultants | 3 | 3
Duration of engagement (weeks) | 12 | 32
Hourly billable rate | $175 | $175
Total advisory expenses | $84,000 | $224,000

Appendix