Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Concepts Pushpinder Singh PAXCEL Technologies Pvt Ltd.

Post on 21-Jul-2016

DESCRIPTION

Explains data warehousing and dimensional modeling

TRANSCRIPT

Page 1: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Concepts

Pushpinder Singh, PAXCEL Technologies Pvt Ltd.

Page 2: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse
– It is a database, managed by a database management system
– It is Subject-Oriented:
  – the data addresses a specific subject such as sales, inventory, etc.
– It is Integrated:
  – the data is obtained from a variety of sources
– It is Time-Variant & Non-Volatile:
  – when data changes, the previous values are kept as well, i.e. historical data is preserved
– It facilitates on-line analytical processing (OLAP):
  – by allowing the data to be viewed in different dimensions or perspectives, to provide business intelligence
– It is a collection of data
  – in support of management’s decision-making process

Page 3: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse

• Important requirements to be a data warehouse:
  – On-line query analysis
  – Based on historical data

Page 4: Data Warehouse Concepts With Dimensional Modeling

Data Mart

– A scaled down version of the data warehouse that addresses only one subject is called a “Data Mart”.

– For example in an organization we can create data marts for each department

– Data Marts are well suited for medium and small business enterprises as well as for different departments of large organizations.

– Data marts can be combined together to form a data warehouse

Page 5: Data Warehouse Concepts With Dimensional Modeling

Types of Analysis organizations are keen on

– Profitability analysis
– Analysis of customer feedback
– Analysis of market research/surveys
– Production planning, etc.

Page 8: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process

• End-user requirements
  – What data is presently available, and what type of information is being generated at present?
  – What measures (or metrics) are presently being used for judging the performance of the organization/division/region, and what additional metrics are required?
  – Why do the managers consider this information inadequate?
  – What objectives would the managers like to achieve through the data mart/data warehouse?
  – Are the personnel working with the manager IT-savvy, or do they dislike using computers?
  – What are the immediate expectations from the data warehouse, and what are the long-term expectations?
  – Is historical data available, for how many years, and in what form (paper, flat files, databases, etc.)?
  – NOTE: Managers are interested in business processes rather than technology. Hence, while obtaining the user requirements, the data warehouse specialists need to focus on the business requirements rather than on technology and tools

Page 10: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process

• Software Requirements Specifications
  – During the requirement elicitation
    • Different users give different inputs
    • Some requirements are common to everyone
    • Some requirements are unique to a manager
    • Some requirements from 2 or more users will be contradictory to each other
  – In the end you will have to
    • Produce a consolidated list of user requirements and prepare a document
    • If required, assign priority levels to these requirements
  – Other issues that should be addressed in the SRS document are:
    • Security of data
    • Data warehouse administration
  – Note
    • Do circulate this document to all the users
    • Ensure that the document captures all the important specifications
    • Do get the document validated

Page 12: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process

• Study of existing data sources
  – What are the data sources from which the data has to be extracted (flat files, database management systems, ERP applications, etc.)?
  – The physical location of the data sources and the communication links used to access this information by users
  – For each data source, the computer’s hardware configuration and the operating system
  – For RDBMS applications, the database engine that is being used (Oracle, Sybase, Informix, DB2, etc.) and the front-end tools used for developing the GUI (VB, VC++, Java, etc.)
  – The size of the database and the operational reports that are being generated
  – The table structure and the details of each and every field for RDBMS and ERP applications
  – The person responsible for database administration and the persons authorized to give permissions for accessing the data for use in the data warehouse
  – The measures taken by the information security officer of the organization to provide data security
  – NOTE:
    • As a data warehouse is a combination of a number of data marts, the first step is to identify the data marts.
    • Each data mart can be that of a functional area (sales, inventory, marketing, etc.) or that of a branch/sales office
    • It is always good to start with a single-source data mart and then move on to multiple-source data marts

Page 14: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process
• Selection of Development Tools
  – A fundamental question: is it necessary to buy commercial data warehouse development tools at all?
  – There is no strict answer to this
  – Using commercially available tools has the following advantages:
    • Makes data warehouse development very fast
    • Fast development means low project cost
    • Using tools enhances the productivity of the team members
    • Tools have the functionality of accessing data from various data sources
    • Administration tools for data warehouses are also available
  – Note: The management has to allocate the necessary budget to procure the software tools and get the development team trained on these tools
  – The tools available in the market can be broadly classified as:
    • Tools integrated with other development tools. Organizations such as Microsoft, Oracle, IBM, SAP, etc. supply these tools
    • Specialized tools focusing only on business intelligence. Some examples are:
      – Informatica
      – Cognos
      – Business Objects
      – DataStage
      – MicroStrategy

Page 15: Data Warehouse Concepts With Dimensional Modeling

If your Job is to assess & select from the various tools

• Your selection basis could be:
  – What type of architecture does the tool support? One may like to choose a tool that supports both client/server (C/S) architecture and web-based architecture
  – What type of OLAP does the tool support? Some tools support only ROLAP and some tools support MOLAP. We may like to buy a tool that supports HOLAP
  – What hardware platforms/operating systems/databases do the tools require? Do you have that infrastructure or do you have to buy new hardware/software? The natural inclination will be to buy a tool that can run on existing infrastructure
  – What are the features of the tool, and does it meet all of the requirements, such as
    • Extraction of data from the data sources of your organization
    • A user interface that readily provides some important reports and a facility to generate ad hoc queries and carry out the analysis
    • Tools for administration
  – How good are the security mechanisms provided by the tool? This is important as the data warehouse has very sensitive data
  – Is it possible to mix-and-match? One may like to buy the front-end tools from one vendor and back-end tools from another vendor
  – How good is the customer support? Is it in your locality?
  – Though vendors provide initial training on the tools, do not assume that it would be enough
  – What is the cost of software upgrades? Do check whether the upgrades are provided free of cost and for how long
  – What is the cost? Look at the cost keeping in view the return on investment rather than just the absolute dollar value of the software
  – For a complete evaluation one has to definitely study the technical features of the tools as well

Page 17: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process

• Data Modeling
• Involves
  – Identification of fact tables and dimension tables for each data mart and the enterprise-wide data warehouse
  – Slowly changing dimensions and their type (1, 2 or 3)
• Modeling tools are available now, for example: Erwin

Page 18: Data Warehouse Concepts With Dimensional Modeling

Dimensional Modeling

[Figure: star schema. A central Sales fact table is linked to four dimension tables: Product, Time, Location and Customer.]

Page 19: Data Warehouse Concepts With Dimensional Modeling

Dimensional Modeling

• Dimensional Modeling is a logical data modeling technique. As shown in the last figure, there will be two types of tables:
  – Fact Table
  – Dimension Table
• As the fact table is central and all the dimension tables are linked to it, this model can be drawn as a star; hence it is also referred to as a star schema
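
The star layout can be sketched with Python's built-in sqlite3 module. This is only an illustration: every table name, column name and sample row below (sales_fact, product_dim, and so on) is an assumption made for the example and does not come from the slides.

```python
import sqlite3

# Hypothetical star schema: one central fact table, four dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_id  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE time_dim     (time_id     INTEGER PRIMARY KEY, calendar_year INTEGER);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, customer_name TEXT);

-- The fact table holds measures plus one foreign key per dimension.
CREATE TABLE sales_fact (
    product_id  INTEGER REFERENCES product_dim(product_id),
    time_id     INTEGER REFERENCES time_dim(time_id),
    location_id INTEGER REFERENCES location_dim(location_id),
    customer_id INTEGER REFERENCES customer_dim(customer_id),
    units_sold  INTEGER,
    revenue     REAL
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, 2015), (2, 2016)])
conn.execute("INSERT INTO location_dim VALUES (1, 'Delhi')")
conn.execute("INSERT INTO customer_dim VALUES (1, 'Acme')")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 10, 100.0), (1, 2, 1, 1, 5, 50.0), (2, 2, 1, 1, 2, 80.0)],
)

def revenue_by_product_and_year(connection):
    """Join the fact table to two dimensions and total the revenue measure."""
    return connection.execute("""
        SELECT p.product_name, t.calendar_year, SUM(f.revenue)
        FROM sales_fact f
        JOIN product_dim p ON p.product_id = f.product_id
        JOIN time_dim t    ON t.time_id = f.time_id
        GROUP BY p.product_name, t.calendar_year
        ORDER BY p.product_name, t.calendar_year
    """).fetchall()

star_rows = revenue_by_product_and_year(conn)
```

Every analytical query follows the same shape: start from the fact table, join the dimensions needed for the report, and aggregate the measures.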

Page 20: Data Warehouse Concepts With Dimensional Modeling

Dimensional Modeling

• Fact Table
  – A central table
  – Contains measures or facts of a business process
  – The two types of fields in a fact table: facts or measures, and foreign keys from dimension tables
  – In a fact table, we can represent data at different levels of detail, called grains
  – Grain: granularity of data is a very important factor in the design of a fact table.
  – What detail has to be stored, i.e. what granularity is required, is decided based on the business intelligence reports to be generated
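
A small sketch of why the grain matters, assuming made-up sales figures: facts stored at a daily grain can be rolled up to a monthly grain when a report needs it, but facts stored only at the monthly grain could not be broken back down into days.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows at the finest grain: one row per product per day.
daily_sales = [
    (date(2016, 7, 1), "Widget", 100.0),
    (date(2016, 7, 2), "Widget", 150.0),
    (date(2016, 8, 1), "Widget", 200.0),
    (date(2016, 7, 1), "Gadget", 80.0),
]

def roll_up_to_month(rows):
    """Aggregate daily-grain facts to a coarser monthly grain."""
    totals = defaultdict(float)
    for day, product, revenue in rows:
        totals[(day.year, day.month, product)] += revenue
    return dict(totals)

monthly_sales = roll_up_to_month(daily_sales)
```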

Page 21: Data Warehouse Concepts With Dimensional Modeling

Dimensional Modeling

• Dimension Table
  – Answers one of the following questions:
    • Who (purchased the product)
    • What (what model was purchased)
    • When (when the product was purchased, date/time)
    • Where (through which regional office the product was purchased) and
    • How the measure is obtained
  – Has a primary key and a number of attributes
  – The attributes in this table describe the dimension
  – The Time dimension is a dimension one comes across in many data marts and data warehouses. It can contain the following attributes:
    • Time_ID (an integer)
    • CalendarMonth (small integer)
    • CalendarQuarter (small integer)
    • CalendarYear (small integer)
    • FiscalMonth (small integer)
    • FiscalQuarter (small integer)
    • FiscalYear (small integer)
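
A Time dimension with these attributes is usually generated rather than extracted. The sketch below produces one row per calendar month. The April fiscal-year start and the rule that a fiscal year is labeled by the calendar year in which it begins are assumptions for the example; the slides do not specify either.

```python
def build_time_dimension(start_year, end_year, fiscal_start_month=4):
    """Return one row per calendar month with calendar and fiscal attributes.

    fiscal_start_month is an assumption: 4 means the fiscal year runs
    April to March and is labeled by the calendar year in which it starts.
    """
    rows = []
    time_id = 1
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            # Position of this month within the fiscal year (1..12).
            fiscal_month = (month - fiscal_start_month) % 12 + 1
            fiscal_year = year if month >= fiscal_start_month else year - 1
            rows.append({
                "Time_ID": time_id,
                "CalendarMonth": month,
                "CalendarQuarter": (month - 1) // 3 + 1,
                "CalendarYear": year,
                "FiscalMonth": fiscal_month,
                "FiscalQuarter": (fiscal_month - 1) // 3 + 1,
                "FiscalYear": fiscal_year,
            })
            time_id += 1
    return rows

time_dim = build_time_dimension(2015, 2016)
```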

Page 22: Data Warehouse Concepts With Dimensional Modeling

Dimensional Modeling
• Surrogate Key
  – Is a very important concept in data warehousing
  – Surrogate means ‘deputy’ or ‘substitute’
  – Is a small integer that can uniquely identify a record in a dimension table.
  – Is generated automatically
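
As a minimal sketch of automatic surrogate key generation, the hypothetical class below assigns the next small integer to each new natural key coming from a source system. In practice the database or the ETL tool usually provides this (for example through a sequence or auto-increment column); the class and the product codes are invented for illustration.

```python
import itertools

class SurrogateKeyGenerator:
    """Hand out small sequential integers, one per distinct natural key."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._assigned = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._assigned:
            self._assigned[natural_key] = next(self._next_key)
        return self._assigned[natural_key]

keys = SurrogateKeyGenerator()
# The product codes are made-up natural keys from a source system.
first = keys.key_for("PRD-00017-X")
second = keys.key_for("PRD-00042-B")
repeat = keys.key_for("PRD-00017-X")
```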

• Storage Space Management:
  – In data warehouse design, the storage space requirement needs to be estimated to plan the hardware configuration of the servers. For database size estimation, the inputs are:
    – Size of the fact table records
    – The size of the dimension table records
    – The number of years for which the data has to be stored and
    – The data type of each attribute
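
These inputs combine into a rough estimate as sketched below. The byte widths per data type, the column lists and the row counts are all assumptions for illustration; real widths depend on the database engine, and indexes and storage overhead are ignored.

```python
# Assumed byte widths per data type; real widths depend on the database engine.
TYPE_BYTES = {"integer": 4, "smallint": 2, "decimal": 8, "char20": 20}

def record_size(column_types):
    """Sum the assumed widths of every attribute in one record."""
    return sum(TYPE_BYTES[t] for t in column_types)

def estimate_bytes(fact_columns, fact_rows_per_year, years, dimensions):
    """Estimate raw storage: fact rows accumulate yearly, dimensions are counted once.

    dimensions is a list of (column_types, row_count) pairs.
    """
    fact_bytes = record_size(fact_columns) * fact_rows_per_year * years
    dim_bytes = sum(record_size(cols) * rows for cols, rows in dimensions)
    return fact_bytes + dim_bytes

# Hypothetical sales fact: four integer foreign keys plus two decimal measures.
sales_fact_columns = ["integer"] * 4 + ["decimal"] * 2
product_dim = (["integer", "char20"], 1_000)
total_bytes = estimate_bytes(sales_fact_columns, 1_000_000, 5, [product_dim])
```

In this example the fact table accounts for almost the entire estimate, which is typical: fact rows grow every year while dimension tables stay comparatively small.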

Page 23: Data Warehouse Concepts With Dimensional Modeling

Dimensional Modeling
• Slowly Changing Dimension
  – If the values of attributes in a dimension table change over a period of time, then these dimensions are called slowly changing dimensions (SCDs).
• These can be represented in three forms:
  – Type 1 SCD: new data replaces the old data, i.e. historical data is not preserved
  – Type 2 SCD: new records are added to the dimension table. The old record is retained
  – Type 3 SCD: new fields are added to the dimension table so that it can hold both old and new values in the same record.
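
The three SCD types can be compared side by side on a single change: a customer moving from one city to another. This is a sketch under assumptions. The customer record, the valid_from/valid_to columns used to mark the current Type 2 row, and the previous_city field are illustrative choices, not part of the slides.

```python
from datetime import date

def apply_type1(row, new_city):
    """Type 1: overwrite in place, so the old value is lost."""
    updated = dict(row)
    updated["city"] = new_city
    return updated

def apply_type2(rows, customer_id, new_city, change_date, next_key):
    """Type 2: close the current record and append a new one with a new surrogate key."""
    result = []
    for row in rows:
        row = dict(row)
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            row["valid_to"] = change_date
        result.append(row)
    result.append({"customer_key": next_key, "customer_id": customer_id,
                   "city": new_city, "valid_from": change_date, "valid_to": None})
    return result

def apply_type3(row, new_city):
    """Type 3: keep the previous value in an extra field of the same record."""
    updated = dict(row)
    updated["previous_city"] = row["city"]
    updated["city"] = new_city
    return updated

# Made-up dimension record and change date.
original = {"customer_key": 1, "customer_id": "C100", "city": "Delhi",
            "valid_from": date(2015, 1, 1), "valid_to": None}
moved_on = date(2016, 7, 21)

type1 = apply_type1(original, "Mumbai")
type2 = apply_type2([original], "C100", "Mumbai", moved_on, next_key=2)
type3 = apply_type3(original, "Mumbai")
```

Type 2 keeps full history at the cost of extra rows, while Type 3 keeps only one previous value per tracked attribute.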

Page 24: Data Warehouse Concepts With Dimensional Modeling

Dimensional Modeling

• Snowflaking
  – There may be a need to take some data out of a dimension table, keep it in a separate table, and link that table to the original dimension table. This is called snowflaking
  – For example: in the product dimension, the attributes related to the model (model number, model name, etc.) can be separated out into their own table, which is then linked to the product dimension.
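
The product/model example can be sketched with Python's built-in sqlite3 module. Table names, column names and rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Model attributes are moved out of the product dimension into their own table.
CREATE TABLE model_dim (
    model_id     INTEGER PRIMARY KEY,
    model_number TEXT,
    model_name   TEXT
);
CREATE TABLE product_dim (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    model_id     INTEGER REFERENCES model_dim(model_id)
);
INSERT INTO model_dim VALUES (1, 'M-100', 'Standard'), (2, 'M-200', 'Deluxe');
INSERT INTO product_dim VALUES (10, 'Widget', 1), (11, 'Widget Plus', 2), (12, 'Gadget', 1);
""")

# Reading model attributes now takes one extra join compared with a flat dimension.
products_with_model = conn.execute("""
    SELECT p.product_name, m.model_number, m.model_name
    FROM product_dim p
    JOIN model_dim m ON m.model_id = p.model_id
    ORDER BY p.product_id
""").fetchall()
```

The model details are stored once per model instead of once per product, at the price of an additional join in every query that needs them.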

Page 25: Data Warehouse Concepts With Dimensional Modeling

Dimensional Modeling

• Junk Dimension
  – Sometimes, while designing the dimensional model from the operational databases, we find a few attributes that do not fit properly in either the fact tables or the dimension tables. In such a case the options are:
    • To discard them, which may result in the loss of information
    • To put them in different dimension tables, which unnecessarily increases the number of dimensions, or
    • To use a junk dimension, wherein a junk dimension table is created with the ‘junk’ attributes
  – NOTE: a junk dimension will increase the query processing time and should be used with caution
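
One common way to build such a table, sketched here under assumptions, is to give every distinct combination of the leftover attributes a single surrogate key and store that key in the fact row. The attribute names (gift_wrap, payment_type) and the order rows are hypothetical.

```python
def build_junk_dimension(source_rows, junk_attributes):
    """Give each distinct combination of leftover attributes one surrogate key.

    Returns (junk_dimension, fact_keys): the dimension maps each combination
    to its key, and fact_keys lists the key each source row should carry.
    """
    junk_dimension = {}
    fact_keys = []
    for row in source_rows:
        combo = tuple(row[name] for name in junk_attributes)
        if combo not in junk_dimension:
            junk_dimension[combo] = len(junk_dimension) + 1
        fact_keys.append(junk_dimension[combo])
    return junk_dimension, fact_keys

# Hypothetical order rows with flags that fit no other dimension.
orders = [
    {"order_id": 1, "gift_wrap": "Y", "payment_type": "card"},
    {"order_id": 2, "gift_wrap": "N", "payment_type": "cash"},
    {"order_id": 3, "gift_wrap": "Y", "payment_type": "card"},
]
junk_dim, order_junk_keys = build_junk_dimension(orders, ["gift_wrap", "payment_type"])
```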

Page 27: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process

• Architecture Finalization
  – This decision should be based on
    • Technical requirements, as well as
    • Tools
  – Another decision concerns which OLAP type the tools should support:
    • ROLAP or MOLAP or HOLAP
  – If cost is a major consideration, then the data warehouse size becomes the deciding factor
    • Generally, if the size is likely to be less than about 100 GB, then MOLAP tools give good performance, whereas for very large warehouses ROLAP is better
  – Data warehouse sizing includes
    • Listing all the dimension tables and fact tables with their attributes
    • Estimating the number of rows for each table for a year
    • NOTE: the estimate should take into consideration the future requirements (say, for the next 10 years) because the data warehouse has to keep the historical data
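
The sizing steps and the 100 GB rule of thumb can be put together in a short sketch. The row widths and yearly row counts are invented for illustration, and the threshold is only the approximate figure quoted on the slide, not a hard limit.

```python
GB = 1024 ** 3

def projected_size_gb(tables, years):
    """Project raw table size, given per-table row width and yearly row growth.

    tables maps a table name to (bytes_per_row, rows_added_per_year).
    """
    total_bytes = sum(width * rows * years for width, rows in tables.values())
    return total_bytes / GB

def suggest_olap(size_gb, molap_limit_gb=100):
    """Apply the rule of thumb from the slide: MOLAP below about 100 GB, else ROLAP."""
    return "MOLAP" if size_gb < molap_limit_gb else "ROLAP"

# Hypothetical row widths and volumes for one fact table and one dimension.
tables = {
    "sales_fact": (32, 50_000_000),
    "customer_dim": (200, 100_000),
}
size_10_years = projected_size_gb(tables, years=10)
choice = suggest_olap(size_10_years)
```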

Page 29: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process

• Data Extraction, Transformation and Loading (ETL) & Metadata creation
  – Generally the most time-consuming process
  – The commercially available tools provide a GUI to carry out all the ETL operations:
    • Defining the data sources
    • Carrying out various transformations
    • Loading the data to the target database
  – These tools also provide
    • A facility to schedule the ETL jobs either in real-time mode or in batch mode at a specific time (say, during the night)
    • A facility to monitor the complete ETL process and check whether the data loaded into the target database is correct
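
Commercial tools hide these steps behind a GUI, but the three operations and the final check can be sketched in a few lines of standard-library Python. The CSV content, the sales table and the cleaning rules are assumptions for illustration, not the behavior of any particular ETL product.

```python
import csv
import io
import sqlite3

# Stand-in for a flat-file source; the columns and values are made up.
SOURCE_CSV = """order_id,product,amount
1, widget ,100.50
2,GADGET,80
3,Widget,
"""

def extract(text):
    """Read source rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalize product names, convert amounts, and skip rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue
        cleaned.append((int(row["order_id"]), row["product"].strip().title(), float(row["amount"])))
    return cleaned

def load(rows, connection):
    """Insert transformed rows into the target table and return the loaded row count."""
    connection.execute("CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, product TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()
    return connection.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

target = sqlite3.connect(":memory:")
staged = transform(extract(SOURCE_CSV))
loaded_count = load(staged, target)
```

Comparing the loaded row count with the number of staged rows is the simplest form of the monitoring check mentioned above.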

Page 31: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process

• Deployment of End-user Applications and Administration Tools
  – Always follow the best practices

Page 33: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process

• Data Warehouse Testing
  – Thorough integration testing of the various modules should be done
  – Initially a warehouse field trial should be done, wherein:
    • The warehouse is deployed to a selected group of users.
    • The main focus during the testing should be on meeting the functional requirements
  – In addition you need to test the system for
    • Performance
    • Reliability
    • Usability, and
    • Security

Page 35: Data Warehouse Concepts With Dimensional Modeling

Data Warehouse Development Process

• Operation & Maintenance
  – The operational phase involves ensuring that the data warehouse is functioning as per the user requirements.
  – Reported bugs need to be fixed, and new features are to be added as required by the users.
  – The data warehouse experts have to make sure that they carry out the necessary modifications using the configuration management process.

Page 36: Data Warehouse Concepts With Dimensional Modeling

END