bi project estimating

Post on 19-Dec-2016


Jim Gallo

Senior Data Warehouse Architect Information Control Corporation

October 21, 2010

Tools and Techniques for Accurately Estimating BI/DW Projects

Agenda

•  The Business – IT Conundrum

•  Requirements Gathering – the Goldilocks Approach

•  Deriving Information for the Estimate

•  The Estimating Process

•  Building an Estimating Model

•  Risk Abatement and Uncertainty

•  Improving the Estimating Process

The Business – IT Conundrum

O&M Funds

Capital Funds

The color of money matters!

Information to be Gathered – What’s Important?

•  Business Definition
   o  Goals/Measures
   o  Business Problems
   o  Questions to Be Answered

•  Scope
   o  Ad Hoc
   o  Canned Reports
   o  Self-service Reports
   o  KPI/Scorecard
   o  Dashboard

•  Queries and Reports
   o  Define layout and content
   o  Get samples, if available

•  Audience Profile
   o  Target Audience
   o  Types of Users
   o  Number of Users
   o  Frequency of Access/Concurrency

•  Analysis
   o  Facts and Dimensions
   o  Hierarchies
   o  Granularity

•  Content
   o  Entities
   o  Attributes
   o  Sources (internal and external)

•  Output Format
   o  Reporting Tool
   o  Web (HTML, etc.)
   o  Other (.pdf, .xls, .doc, etc.)

•  Operational
   o  Availability
   o  Refresh frequency

•  Data Quality

•  Security

•  History and Retention

The Goldilocks Approach


Jim’s Postulate

The total effort is directly related to 3 and only 3 variables. Any guesses?

1.  Number of data elements
2.  Number of source files
3.  Expectations and reality of data quality

Why is the postulate important?

Fact: 60% - 80% of BI/DW project hours deal specifically with DATA INTEGRATION.

Fact: DATA INTEGRATION represents 70% - 90% of project RISK.

Derivations from the 3 Key Variables

If you can adequately estimate the number of fields and source files, you can then derive, with relative certainty, hours for:

•  Data modeling
•  Data quality assessment and abatement
•  ETL design and development
•  Physical database design

As well as the majority of hours needed for:

•  Business involvement
•  Testing

Quantifying the Key Variables

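[Figure: hierarchy levels, target structures, and requirements phases]

Hierarchy levels shown:

•  Commonwealth, County, Region, City, Zip Code
•  Year, Quarter, Month, Week, Day
•  Male, Female, Unknown
•  16 - 25, 26 - 35, 36 - 45, 46 - 55, >55

Target structures shown: MOLAP, ROLAP (Star), Full Relational

Other labels: Source Systems, Initial Requirements Definition, Detailed Requirements Definition, Focus Here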

Business Questions – Retail Bank - Marketing

1.  How profitable is the customer?
2.  What do we know about our customers and how can we know them better?
3.  How can we retain desirable customers?
4.  The specific list of questions that the group would like to be able to answer quickly and in a self-service fashion includes:
5.  How do we define profitability of sales?
6.  What customers are using what channels and why? How profitable are they?
7.  Who are my at risk customers?
8.  What fees are we refunding at the associate and account level?
9.  What fees are we retaining at the associate and account level?
10.  What is the customer’s spending behavior and how do we make customers “sticky”?
11.  Should we have more quantity or quality of sales?
12.  How can we know our customers better across channels?
13.  Are we performing transactions accurately?
14.  Are customers in the right products?
15.  What are my customers’ activities?
16.  Who are the best potential customers so that we can focus our sales efforts on them?
17.  Which methods are the most/least expensive from a channel perspective?
18.  How do we drive customers to the least expensive channel and retain them?
19.  What benefits are received by HNB when fees are waived?
20.  Are we staffing resources accurately?
21.  Are we forecasting and measuring our efficiency?
22.  How are we managing costs, forecasting and how do we measure this?
23.  What is the customer’s profitability potential?
24.  across all mechanisms at customer accounts?

Quantifying the Key Variables

Facts qualifier matrix: one row per Measure, one column per dimension, with an x where the measure is analyzed by that dimension.

Dimensions (columns): Time, Geography, Customer, Bank Product, Channel, Bank Organization/Associate, Customer Demographics, Consumer Product Category

Measures (rows):

Profitability ($) x x x x x x x
Revenue ($) x x x x x x x
Cost ($) x x x x x x x
Products (#) x x x x x x x
Purchase Transactions (#) x x x x x x x x
Purchase Transactions ($) x x x x x x x x
Refunds/Waived Fees ($) x x x x x x x
Retention (#) x x x x x x x
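The matrix above can be reduced to a few counts that feed the estimate. The sketch below is only an illustration: it records how many dimensions are marked for each measure (the counts come from the rows above, but which column is unmarked in the 7-mark rows is not recoverable from the slide), and the function name `summarize_matrix` is hypothetical.

```python
# Dimension names as listed on the slide.
DIMENSIONS = [
    "Time", "Geography", "Customer", "Bank Product", "Channel",
    "Bank Organization/Associate", "Customer Demographics",
    "Consumer Product Category",
]

# Number of x marks per measure, counted from the matrix rows.
MARKS_PER_MEASURE = {
    "Profitability ($)": 7,
    "Revenue ($)": 7,
    "Cost ($)": 7,
    "Products (#)": 7,
    "Purchase Transactions (#)": 8,
    "Purchase Transactions ($)": 8,
    "Refunds/Waived Fees ($)": 7,
    "Retention (#)": 7,
}


def summarize_matrix(marks, dimension_count):
    """Return simple counts derived from a facts qualifier matrix."""
    return {
        "measures": len(marks),
        "dimensions": dimension_count,
        # Total measure-by-dimension combinations that must be supported.
        "intersections": sum(marks.values()),
        # Measures that are analyzed by every dimension.
        "fully_qualified": [m for m, n in marks.items() if n == dimension_count],
    }


summary = summarize_matrix(MARKS_PER_MEASURE, len(DIMENSIONS))
print(summary)
```

Counts like these are a starting point for sizing facts and dimensions; they do not replace the source system mapping that follows.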

Source System Mapping

Initial or Detailed Requirements Definition.

From here you can estimate the number of sources and “guesstimate” data quality and ETL complexity

The Estimating Process

•  Questions
•  Measures
•  Goals

Facts & Dimensions (Facts Qualifier Matrix)

Warehouse
   o  Tables (DM x3)
   o  Attributes (tables x15)

Marts
   o  Tables
   o  Attributes (tables x15)

Cubes

•  Semantic Layers
•  Reports
•  KPIs
•  Dashboard Components

Initial Load
   o  Columns
   o  Rows

History
   o  Number of years
   o  Variability vs. Initial Load

Source System Map

Data Quality Profile
   o  Columns
   o  Rows

Testing

Project Management

Business Analysis

Information Delivery

Data Integration

Requirements

Primary Derivations

Secondary Derivations
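The multipliers on this slide can be chained into a quick sizing calculation. The sketch below is one possible reading, not the author's model: it assumes "DM x3" means three warehouse tables per data mart table and "tables x15" means fifteen attributes per table. The names `SizeEstimate` and `derive_sizes` are hypothetical.

```python
from dataclasses import dataclass

# Assumed reading of the slide's multipliers (not confirmed by the source).
WAREHOUSE_TABLES_PER_MART_TABLE = 3   # "Tables (DM x3)"
ATTRIBUTES_PER_TABLE = 15             # "Attributes (tables x15)"


@dataclass(frozen=True)
class SizeEstimate:
    mart_tables: int
    mart_attributes: int
    warehouse_tables: int
    warehouse_attributes: int


def derive_sizes(mart_tables: int) -> SizeEstimate:
    """Derive table and attribute counts from the number of data mart tables."""
    if mart_tables < 0:
        raise ValueError("mart_tables must be non-negative")
    warehouse_tables = mart_tables * WAREHOUSE_TABLES_PER_MART_TABLE
    return SizeEstimate(
        mart_tables=mart_tables,
        mart_attributes=mart_tables * ATTRIBUTES_PER_TABLE,
        warehouse_tables=warehouse_tables,
        warehouse_attributes=warehouse_tables * ATTRIBUTES_PER_TABLE,
    )


# Example: a mart with 10 tables (facts plus dimensions).
sizes = derive_sizes(10)
print(sizes)
```

The attribute counts are the "number of data elements" input that the later hour calculations depend on.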

Build an Estimating Model – ETL Example

Work with data modeler

Staging Area

Populate DW

Populate DM

Hours Summary

Target Columns

Data Sources

Complexity

Tabs

•  Total Hours (all)
•  Data Modeling
•  DBA
•  Source System Profile
•  Data Quality
•  ETL
•  BI and Reporting
•  Training
•  Testing
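A minimal sketch of how an ETL tab in such a model might turn target columns, data sources, and complexity into an hours summary. Every rate and multiplier here is hypothetical: the slide names the inputs but gives no values, so `hours_per_column`, `hours_per_source`, and `COMPLEXITY_FACTOR` are placeholders to be calibrated against your own project history.

```python
from dataclasses import dataclass

# Hypothetical complexity multipliers; the source gives no values.
COMPLEXITY_FACTOR = {"low": 1.0, "medium": 1.5, "high": 2.0}


@dataclass(frozen=True)
class EtlStage:
    name: str            # e.g. "Staging Area", "Populate DW", "Populate DM"
    target_columns: int
    data_sources: int
    complexity: str      # key into COMPLEXITY_FACTOR


def stage_hours(stage, hours_per_column, hours_per_source):
    """Hours for one stage: per-column and per-source effort, scaled by complexity."""
    base = (stage.target_columns * hours_per_column
            + stage.data_sources * hours_per_source)
    return base * COMPLEXITY_FACTOR[stage.complexity]


def hours_summary(stages, hours_per_column, hours_per_source):
    """Hours per stage plus a grand total."""
    summary = {s.name: stage_hours(s, hours_per_column, hours_per_source)
               for s in stages}
    summary["Total"] = sum(summary.values())
    return summary


stages = [
    EtlStage("Staging Area", target_columns=100, data_sources=4, complexity="low"),
    EtlStage("Populate DW", target_columns=150, data_sources=4, complexity="high"),
    EtlStage("Populate DM", target_columns=60, data_sources=1, complexity="medium"),
]
etl_summary = hours_summary(stages, hours_per_column=1.0, hours_per_source=8.0)
print(etl_summary)
```

Keeping the rates as explicit parameters makes it easy to adjust them later when actual hours are compared with the estimate.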

Build an Estimating Model – Summarization

Break/fix after testing cycles

Schedule and FTE Approximation

Build an Estimating Model – Smoothing and Sequencing

Variance (Estimated vs. Planned)

Estimated Hours

Planned Hours (smoothed)
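One simple way to produce smoothed planned hours is to cap each period at the available capacity and carry the overflow forward. This is an assumed technique chosen for illustration, not necessarily what the author's spreadsheet does; `smooth_hours`, `variance`, and the capacity figure are hypothetical.

```python
def smooth_hours(estimated, capacity):
    """Cap each period at `capacity` hours, pushing excess work into later periods."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    planned = []
    backlog = 0
    for hours in estimated:
        demand = hours + backlog
        work = min(demand, capacity)
        planned.append(work)
        backlog = demand - work
    # Add periods until the remaining backlog is scheduled.
    while backlog > 0:
        work = min(backlog, capacity)
        planned.append(work)
        backlog -= work
    return planned


def variance(estimated, planned):
    """Per-period variance (estimated minus planned), padding the shorter list with 0."""
    periods = max(len(estimated), len(planned))
    est = list(estimated) + [0] * (periods - len(estimated))
    pln = list(planned) + [0] * (periods - len(planned))
    return [e - p for e, p in zip(est, pln)]


estimated_hours = [100, 400, 250, 50]          # hours per month from the model
planned_hours = smooth_hours(estimated_hours, capacity=200)
print(planned_hours)
print(variance(estimated_hours, planned_hours))
```

Smoothing moves hours between periods without changing the total, so the per-period variances sum to zero.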

The Giggle Test and The Law of Big Numbers

70% – 77% of hours have been given due consideration

70% – 90% of RISK has been thought through

Assume a 1-year project

Risk Abatement and Uncertainty

Identify Risks

Identify Unknowns

Create a Risk Abatement Action Statement

Insert Hours and Tasks into Project Plan

Risk Abatement and Uncertainty (continued)

So you’re still not feeling good about the plan?

If you still do not believe that all risks and uncertainties have been accounted for, apply a management contingency.

A 15% - 20% contingency is not uncommon and is a standard practice in most Project Management methodologies.

Therefore, total estimate = detailed estimate × 1.15 (or × 1.20)
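The contingency arithmetic from this slide, as a small sketch. The function name `apply_contingency` and the 2,000-hour input are hypothetical; only the 15% and 20% rates come from the slide.

```python
def apply_contingency(detailed_hours, contingency_pct=15):
    """Return the total estimate after adding a management contingency.

    contingency_pct is a whole-number percentage, e.g. 15 or 20.
    """
    if detailed_hours < 0:
        raise ValueError("detailed_hours must be non-negative")
    if contingency_pct < 0:
        raise ValueError("contingency_pct must be non-negative")
    return detailed_hours * (100 + contingency_pct) / 100


detailed = 2000  # hypothetical detailed estimate in hours
low_total = apply_contingency(detailed, 15)
high_total = apply_contingency(detailed, 20)
print(low_total, high_total)
```

Reporting both figures gives stakeholders a range rather than a single number.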

Improving The Estimation Process

Continuous Improvement Cycle

Develop Estimating Model

Requirements

Tasks

Assumptions

Risks

Monitor Project

Project Plan

Determine Cause of Variances

Variances

List Variances by Task and Role

Adjust Model to Account for Observed Variances

Compare Actual Hours to Estimate

Clarity

Initial Estimate
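The compare, list, and adjust steps of the improvement cycle can be sketched as follows. This is an illustrative assumption about how a model might be recalibrated, not the author's procedure: the rates, the (task, role) keys, and the function names are hypothetical, and the adjustment simply scales each rate by the ratio of actual to estimated hours.

```python
def variances_by_task_and_role(estimated, actual):
    """Actual minus estimated hours for every (task, role) key present in both."""
    return {key: actual[key] - estimated[key]
            for key in estimated if key in actual}


def adjust_rates(rates, estimated, actual):
    """Scale each model rate by actual/estimated hours for its (task, role) key."""
    adjusted = dict(rates)
    for key, est_hours in estimated.items():
        if key in actual and key in rates and est_hours > 0:
            adjusted[key] = rates[key] * actual[key] / est_hours
    return adjusted


# Hypothetical figures for one completed project.
rates = {("ETL", "Developer"): 4.0, ("Data Modeling", "Modeler"): 2.0}
estimated = {("ETL", "Developer"): 400, ("Data Modeling", "Modeler"): 200}
actual = {("ETL", "Developer"): 500, ("Data Modeling", "Modeler"): 180}

task_variances = variances_by_task_and_role(estimated, actual)
new_rates = adjust_rates(rates, estimated, actual)
print(task_variances)
print(new_rates)
```

In practice the cause of each variance should be determined before adjusting a rate, since a one-off event does not justify changing the model.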

Contact Information

•  If you have further questions or comments:

Jim Gallo

Senior Data Warehouse Architect Information Control Corporation

jgallo@iccohio.com

(614) 523-3070
