Data Warehouse Dimensional Model Components Concept

A dimensional model is the equivalent of the logical data design of a data warehouse, and much more. It is simpler in design and suits the purpose of a data warehouse.

Dimensional Modeling Concept

A dimensional model is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions. Every dimensional model is composed of one table with a multi-part key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multi-part key in the fact table. (See Figure) This characteristic 'star-like' structure is often called a star join.

A fact table, because it has a multi-part primary key made up of two or more foreign keys, always expresses a many-to-many relationship. The most useful fact tables also contain one or more numerical measures, or 'facts,' that occur for the combination of keys that define each record. In the figure, the facts are Units_Sold, Dollars_Sold, and Avg_sales. The most useful facts in a fact table are numeric and additive. Additivity is crucial because data warehouse applications almost never retrieve a single fact table record; rather, they fetch back hundreds, thousands, or even millions of these records at a time, and the only useful thing to do with so many records is to add them up.

Dimension tables, by contrast, most often contain descriptive textual information and the attributes (also called classification attributes) that are used for analysis. Dimension attributes are the source of most of the interesting constraints in data warehouse queries, and they are virtually always the source of the row headers in the SQL answer set.
Fact Table and Dimension Tables in a Dimensional Model Schema

Let's consider a data warehouse cube. This cube has 4 dimensions and three measures. This means that for every combination of values of these 4 dimensions, the cube holds a set of measure values at that coordinate. For example, with two of the measures shown:

Co-ordinate [City (X), Product (Y), Channel (Z), Month] = [Sales (Quantity), Sales (Value)]
or
[NY, Standard Desk-top, Mail, September 2005] = [2000 units, $15000]

In the dimensional modeling schema, the fact table contains the values at each coordinate, at the lowest granularity, for all the possible combinations of dimensions. The dimension tables contain the details of the dimensions, which include the attributes of the dimensions and all the higher-level hierarchies. The link between the fact table and all the associated dimension tables is through a dimension key, which is the primary key of the dimension table at its lowest level of granularity.
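As a minimal sketch of the idea above, the example coordinate can be stored as one fact row whose multi-part key is made of four dimension keys. The table and column names (dim_city, fact_sales, sales_quantity and so on) are assumptions chosen for illustration, and SQLite through Python's sqlite3 module is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_city    (city_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE dim_channel (channel_key INTEGER PRIMARY KEY, channel TEXT);
CREATE TABLE dim_month   (month_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales (
    city_key    INTEGER REFERENCES dim_city(city_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    channel_key INTEGER REFERENCES dim_channel(channel_key),
    month_key   INTEGER REFERENCES dim_month(month_key),
    sales_quantity INTEGER,
    sales_value    REAL,
    PRIMARY KEY (city_key, product_key, channel_key, month_key)
);
INSERT INTO dim_city    VALUES (1, 'NY');
INSERT INTO dim_product VALUES (1, 'Standard Desk-top');
INSERT INTO dim_channel VALUES (1, 'Mail');
INSERT INTO dim_month   VALUES (1, 'September 2005');
INSERT INTO fact_sales  VALUES (1, 1, 1, 1, 2000, 15000.0);
""")

def lookup(city, product, channel, month):
    """Return (quantity, value) for one cube coordinate, or None if empty."""
    return conn.execute("""
        SELECT f.sales_quantity, f.sales_value
        FROM fact_sales f
        JOIN dim_city c     ON c.city_key = f.city_key
        JOIN dim_product p  ON p.product_key = f.product_key
        JOIN dim_channel ch ON ch.channel_key = f.channel_key
        JOIN dim_month m    ON m.month_key = f.month_key
        WHERE c.city = ? AND p.product = ? AND ch.channel = ? AND m.month = ?
    """, (city, product, channel, month)).fetchone()

cell = lookup("NY", "Standard Desk-top", "Mail", "September 2005")
```

Here `cell` holds the two measure values for the coordinate, and a coordinate with no fact row simply returns `None`.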

Fact Table- The central linkage in Dimensional Modeling

A fact table contains the values of all the measures for the set of dimensions linked to it. It holds the measure values for each combination of dimensions at their lowest level of granularity. The measures are typically numeric, so they can undergo mathematical aggregation and analysis.

Families of FACT Tables

• Chains and circles.
• Heterogeneous products.
• Transactions and snapshots.
• Aggregates.

Dimension Table- What does and should it contain

The dimension table contains all the information on the dimension. This includes:

a. The primary key (Equivalent foreign key in the Fact Table).

b. All attributes of the dimension. These include:

• The hierarchy attributes: consider a business hierarchy of pin-code to city to district to state to country for a location dimension. Each hierarchy element will be an attribute.
• Textual as well as code attributes: the location code as well as the name of the location. Both are required because they could be used for different reasons by different users. A power user could be looking for the location code (NY01), whereas an end user could be looking for a more explicit header (New York).
• All parallel hierarchies: a product could have different hierarchies, depending upon whether the CFO or the head of sales is looking at it. This enables analysis to be done on all hierarchies as well as across hierarchies.
• Surrogate primary key, used instead of the production primary key as the link to the fact table (refer to Surrogate Keys as Primary Keys of Dimension Tables): these keys are used because the production keys could change or could be reused. For example, a bill number could be reused after 5 years, or a part number (especially in FMCG) could be reused after a few years.
• Production or source system key: this is required for auditability or to link to the extraction data and source
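The contents listed above can be sketched as a single location dimension table. This is an illustration under assumptions, not the article's schema: every name (dim_location, source_loc_id, the sample rows) is hypothetical, and SQLite through Python's sqlite3 module is used only for convenience. Because each hierarchy level is an ordinary attribute, the same fact rows can be rolled up to any level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (
    location_key  INTEGER PRIMARY KEY,  -- surrogate key, referenced by the fact table
    source_loc_id TEXT,                 -- production (source system) key, kept for audit
    location_code TEXT,                 -- code attribute
    location_name TEXT,                 -- textual attribute
    pin_code TEXT, city TEXT, district TEXT, state TEXT, country TEXT  -- hierarchy levels
);
CREATE TABLE fact_sales (location_key INTEGER, units_sold INTEGER);
INSERT INTO dim_location VALUES
    (1, 'SRC-17', 'LOC01', 'Branch A', '10001', 'City A', 'District 1', 'State X', 'Country Q'),
    (2, 'SRC-42', 'LOC02', 'Branch B', '10002', 'City B', 'District 1', 'State X', 'Country Q'),
    (3, 'SRC-58', 'LOC03', 'Branch C', '20001', 'City C', 'District 2', 'State Y', 'Country Q');
INSERT INTO fact_sales VALUES (1, 10), (2, 20), (3, 5);
""")

HIERARCHY_LEVELS = ("pin_code", "city", "district", "state", "country")

def units_by(level):
    """Total units sold, rolled up to one hierarchy level of the location dimension."""
    if level not in HIERARCHY_LEVELS:  # whitelist, since the name goes into the SQL text
        raise ValueError(f"unknown hierarchy level: {level}")
    rows = conn.execute(
        f"SELECT d.{level}, SUM(f.units_sold) FROM fact_sales f "
        f"JOIN dim_location d ON d.location_key = f.location_key "
        f"GROUP BY d.{level} ORDER BY d.{level}"
    ).fetchall()
    return dict(rows)

by_district = units_by("district")
by_state = units_by("state")
```

Keeping the whole hierarchy in one table is what lets `units_by` switch levels without touching the fact table or adding joins.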


Dimensional Model Schemas- Star, Snow-Flake and Constellation

A dimensional model can be organized as a star schema or as a snowflake schema.

Dimensional Model Star Schema using Star Query

The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The center of the star consists of a large fact table and the points of the star are the dimension tables.

A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.

A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The cost-based optimizer recognizes star queries and generates efficient execution plans for them.

A typical fact table contains keys and measures. For example, in the sample schema, the fact table, sales, contains the measures quantity_sold, amount, and average, and the keys time_key, item_key, branch_key, and location_key. The dimension tables are time, branch, item and location.

A star join is a primary key to foreign key join of the dimension tables to a fact table.
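A star query over the sample schema might look like the following sketch. The fact table name, its keys and the measures quantity_sold and amount come from the text; the dimension table names (time_dim, item_dim, branch_dim, location_dim), their attribute columns and the sample rows are assumptions made for illustration, and SQLite through Python's sqlite3 module stands in for the warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE sales (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    quantity_sold INTEGER,
    amount        REAL
);
INSERT INTO time_dim     VALUES (1, 2005, 'Q3'), (2, 2005, 'Q4');
INSERT INTO item_dim     VALUES (1, 'Desk', 'Acme'), (2, 'Chair', 'Acme');
INSERT INTO branch_dim   VALUES (1, 'Main');
INSERT INTO location_dim VALUES (1, 'City A', 'Country Q'), (2, 'City B', 'Country Q');
INSERT INTO sales VALUES
    (1, 1, 1, 1, 10, 1000.0),
    (1, 2, 1, 1,  4,  200.0),
    (2, 1, 1, 2,  6,  600.0),
    (2, 1, 1, 1,  1,  100.0);
""")

# Each dimension joins to the fact table only; dimensions never join to each other.
# Dimension attributes supply the constraint (WHERE) and the row headers (GROUP BY),
# while the additive facts are summed.
STAR_QUERY = """
SELECT t.quarter, l.city, SUM(s.quantity_sold) AS units, SUM(s.amount) AS amount
FROM sales s
JOIN time_dim t     ON t.time_key = s.time_key
JOIN item_dim i     ON i.item_key = s.item_key
JOIN location_dim l ON l.location_key = s.location_key
WHERE i.item_name = 'Desk'
GROUP BY t.quarter, l.city
ORDER BY t.quarter, l.city
"""
star_rows = conn.execute(STAR_QUERY).fetchall()
```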

The main advantages of star schemas are that they:

• Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
• Provide highly optimized performance for typical star queries.
• Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contains dimension tables.

Snow-Flake Schema in Dimensional Modeling

The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a location dimension table in a star schema might be normalized into a location table and a city table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. The figure above presents a graphical representation of a snowflake schema.
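The location example can be sketched side by side in both forms. All table names, columns and rows here are hypothetical, and SQLite through Python's sqlite3 module is used only for illustration. The snowflaked form stores each city once, but the same question needs one more join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized location dimension (city and state repeated per row).
CREATE TABLE location_star (location_key INTEGER PRIMARY KEY, street TEXT,
                            city TEXT, state TEXT);
-- Snowflake: city attributes moved into their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE location_snow (location_key INTEGER PRIMARY KEY, street TEXT,
                            city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales (location_key INTEGER, quantity_sold INTEGER);

INSERT INTO location_star VALUES (1, '1 Main St', 'City A', 'State X'),
                                 (2, '9 Oak Ave', 'City A', 'State X'),
                                 (3, '5 Elm Rd',  'City B', 'State Y');
INSERT INTO city VALUES (1, 'City A', 'State X'), (2, 'City B', 'State Y');
INSERT INTO location_snow VALUES (1, '1 Main St', 1), (2, '9 Oak Ave', 1), (3, '5 Elm Rd', 2);
INSERT INTO sales VALUES (1, 10), (2, 5), (3, 7);
""")

# Star schema: one join from the fact table to the dimension.
star = conn.execute("""
    SELECT l.city, SUM(s.quantity_sold) FROM sales s
    JOIN location_star l ON l.location_key = s.location_key
    GROUP BY l.city ORDER BY l.city
""").fetchall()

# Snowflake schema: the same answer requires a second foreign key join.
snowflake = conn.execute("""
    SELECT c.city, SUM(s.quantity_sold) FROM sales s
    JOIN location_snow l ON l.location_key = s.location_key
    JOIN city c          ON c.city_key = l.city_key
    GROUP BY c.city ORDER BY c.city
""").fetchall()
```

Both queries return the same rows; only the number of joins differs.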

Fact Constellation Schema

This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. A fact table is split only when we want to focus on aggregation over a few facts and dimensions.

Dimensional Modeling vs. Relational Modeling

Dimensional modeling differs from normalized OLTP modeling in that it enables analysis and querying through massive and unpredictable queries, something a normalized relational model is ill-equipped to handle.

How Dimensional model is different from an E-R diagram?

• An E-R diagram (used in an OLTP or transactional system) has a highly normalized model (even at a logical level), whereas a dimensional model aggregates most of the attributes and hierarchies of a dimension into a single entity.
• An E-R diagram is a complex maze of hundreds of entities linked with each other, whereas the dimensional model has a logically grouped set of star schemas.
• The E-R diagram is split as per the entities. A dimensional model is split as per the dimensions and facts.
• In an E-R diagram all attributes for an entity, textual as well as numeric, belong to the entity table, whereas a 'dimension' entity in a dimensional model has mostly textual attributes, and the 'fact' entity has mostly numeric attributes.

Dimensional modeling is a better approach for Data warehouse compared to standard Data Model.

The dimensional model has a number of important data warehouse advantages that the ER model lacks.

The first advantage of the dimensional model is that it has a standard framework and standard types of joins. All dimensions can be thought of as symmetrically equal entry points into the fact table. The logical design can be done independent of expected query patterns. The user interfaces are symmetrical, the query strategies are symmetrical, and the SQL generated against the dimensional model is symmetrical. In other words:

• You will never find attributes in fact tables and facts in dimension tables.
• If you see a non-fact field in the fact table, you can assume that it is a key to a dimension table.

The second advantage of the dimensional model is that it is smoothly extensible to accommodate unexpected new data elements and new design decisions. All existing tables (both fact and dimension) can be changed in place by simply adding new data rows in the table. Data should not have to be reloaded. Typically, no query tool or reporting tool needs to be reprogrammed to accommodate the change. All old applications continue to run without yielding different results. You can make the following graceful changes to the design after the data warehouse is up and running:

• Adding new unanticipated facts (that is, new additive numeric fields in the fact table), as long as they are consistent with the fundamental grain of the existing fact table.
• Adding completely new dimensions, as long as there is a single value of that dimension defined for each existing fact record.
• Adding new, unanticipated dimensional attributes.
• Breaking existing dimension records down to a lower level of granularity from a certain point in time forward.
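The first three kinds of change can be sketched as in-place schema changes that leave an existing query untouched. The names used here (sales, product_dim, promotion_dim, discount_amount, color) are hypothetical, the 'No promotion' default member is one assumed way to give every existing fact row a single value of the new dimension, and SQLite through Python's sqlite3 module is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales (product_key INTEGER, units_sold INTEGER);
INSERT INTO product_dim VALUES (1, 'Desk'), (2, 'Chair');
INSERT INTO sales VALUES (1, 10), (1, 5), (2, 3);
""")

# A report written before any of the changes below.
OLD_QUERY = """
SELECT p.product_name, SUM(s.units_sold) FROM sales s
JOIN product_dim p ON p.product_key = s.product_key
GROUP BY p.product_name ORDER BY p.product_name
"""
before = conn.execute(OLD_QUERY).fetchall()

# New additive fact at the same grain, and a new dimensional attribute.
conn.execute("ALTER TABLE sales ADD COLUMN discount_amount REAL DEFAULT 0")
conn.execute("ALTER TABLE product_dim ADD COLUMN color TEXT DEFAULT 'Unknown'")

# New dimension: every existing fact row points at a single default member.
conn.execute(
    "CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT)"
)
conn.execute("INSERT INTO promotion_dim VALUES (0, 'No promotion')")
conn.execute("ALTER TABLE sales ADD COLUMN promotion_key INTEGER DEFAULT 0")

# The old report runs unchanged and returns the same rows.
after = conn.execute(OLD_QUERY).fetchall()
```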

The third advantage of the dimensional model is that there is a body of standard approaches for handling common modeling situations in the business world. Each of these situations has a well-understood set of alternatives that can be specifically programmed in report writers, query tools, and other user interfaces. These modeling situations include:

• Slowly changing dimensions, where a 'constant' dimension such as Product or Customer actually evolves slowly and asynchronously. Dimensional modeling provides specific techniques for handling slowly changing dimensions, depending on the business environment.
• Heterogeneous products, where a business such as a bank needs to:
  o track a number of different lines of business together within a single common set of attributes and facts, but at the same time
  o describe and measure the individual lines of business in highly idiosyncratic ways using incompatible measures.

Foundation & Conformed Dimensions and Facts in Data Warehouse Dimensional Model

A data warehouse is a repository which feeds data marts and other downstream systems. It has to be designed to have a global or reusable set of dimensions and measures.

Data Warehouse modeling has two components:

• The foundation, which supports medium- to long-term capabilities without the need to unsettle the structure time and again.
• The individual phases of data mart development, which eventually merge into the enterprise-wide data warehouse.

A project has to address both the foundation and phase elements. Every stage in the data warehouse project will address these two elements in a distinct and overt manner. For dimensional modeling, the following foundation-setting elements will work like reusable components. They will be the same across the data marts and the data warehouse for current and future phases of development:

Standard set of foundation or conformed dimensions. This means that:

• Dimensions are supersets of all possible attributes for that dimension. For example, the customer 'age' attribute may not be required for sales analysis, but is required for credit analysis. Therefore, when creating the standard dimensions, one makes the superset of attributes.
• Dimensions include all possible levels of the business hierarchy. For example, a portfolio analysis of a channel may not require the branch-level location, but an agent productivity analysis could.
• Dimensions include not only categories, but descriptive textual attributes as well, wherever needed. For example, a textual detail for a location code could be needed for distribution analysis, but may not be needed for portfolio analysis.
• Make the dimension most granular. Many times the analysis does not need to go down to the most granular level of customer ID. However, if the dimension starts from customer group upwards and a customer moves out of his existing customer segment, the whole dimensional model could run into issues.

Examples of foundation dimensions are Customer, Location, Channel, Sales Lead, etc. Please refer to Universal Dimensions for more examples.

Standard set of foundation or conformed facts. This means that:

• A fact table will include all possible units of measure for a given set of dimensions. For example, sales by numbers could need only the number of 'Crates' in one data mart and 'Pieces' in the other. However, both units for the given measure should be included even if there is a standard conversion rate, because these standard conversion rates keep changing with time.
• A fact table logically groups a business instance. For example, you could require the distribution of a 'product' to retail outlets for distribution analysis. However, you will require the fact on the final sale to the end customer for sales analysis. As a guideline, a highly linked business process should be combined in a single fact.

Standard set of foundation measures. This means that

• All the measures and their possible units are to be listed out.
• Measures are most susceptible to having confusing definitions or to being misnamed. Detailed formulas behind the measures are a must. Refer to Sales Revenue Fact-Measure as an example.

Examples of foundation measures are Sales Measures, Customer Measures, etc. Please refer to FACTS-Base Measures for more examples.

Slowly Changing Dimensions SCD in Dimensional Modeling

A dimensional model has to address some complex situations like slowly changing dimensions.

Slowly Changing Dimensions

Entities change over time. Customer demographics, product characteristics, classification rules, status of customers, etc. lead to changes in the attributes of dimensions. In a transaction system, the change is often overwritten and track of the change is lost.

For example, a source system may have only the latest customer PIN code, as it is needed to send the marketing and billing statements. However, a data warehouse needs to maintain all the previous PIN codes as well, because we need to track how many customers move to new locations and how frequently.

A key benefit of a data warehouse is to provide historical information, which is typically overwritten (and thus lost) in the transaction systems. How slowly changing dimensions are handled in a dimensional model is a key determinant of that benefit.

There are three ways to handle slowly changing dimensions:

Slowly Changing Dimension Method 1 (in short SCD 1)

This is the way most of the source systems handle it: overwrite the attribute value. For example, if a customer's marital status has moved from 'Unmarried' to 'Married', we overwrite 'Unmarried' with 'Married'. Similarly, if an insurance policy status has moved from 'Lapsed' to 'Re-instated', the new status is written over the old status. This is done when we are not analyzing the historical information.
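SCD 1 amounts to a plain in-place update, as in this small sketch. The table and function names are hypothetical, and SQLite through Python's sqlite3 module is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (customer_id TEXT PRIMARY KEY, marital_status TEXT)")
conn.execute("INSERT INTO customer_dim VALUES ('110002', 'Unmarried')")

def scd1_update(customer_id, marital_status):
    """SCD 1: overwrite the attribute in place; the old value is lost."""
    conn.execute(
        "UPDATE customer_dim SET marital_status = ? WHERE customer_id = ?",
        (marital_status, customer_id),
    )

scd1_update("110002", "Married")
rows = conn.execute("SELECT customer_id, marital_status FROM customer_dim").fetchall()
```

After the update there is still exactly one row for the customer, and nothing in the table records that the status was ever 'Unmarried'.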

Slowly Changing Dimension Method 2 (in short SCD 2)

This is the true-blue technique to deliver precise historical analysis. It is used when there can be more than one change in the attributes of an entity, and we need to track the date of change of the attribute.

In this method, a new record is added, and the new record is given a separate identifier as the primary key. We cannot use the production key as the primary key here, as it has not changed (the Customer ID has remained the same, while the value of its attribute 'marital status' has changed). This new identifier is called the surrogate key.

Apart from adding a new record and providing a new primary (surrogate) key, the validity period for this new record is also recorded (the Date Valid column in the example below).


For example, you have a dimension table with customer_ID '110002' and marital status 'Single'. Over time, the customer gets married and also moves to a new location. The customer dimension records will be:

Surrogate Key   Customer ID   Date Valid      Marital Status   Date of Birth   City
1100021         110002        Sept 23, 2004   Single           Jan 8, 1982     Palo Alto
1100022         110002        Oct 25, 2005    Married          Jan 8, 1982     Palo Alto
1100023         110002        Nov 23, 2005    Married          Jan 8, 1982     San Francisco
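The three records above can be produced with a sketch like the following. It makes several assumptions that are not in the article: the surrogate key is an auto-incremented integer rather than the 1100021-style values shown, dates are stored as ISO strings so that they sort correctly, Date of Birth is omitted for brevity, and the function names are hypothetical. SQLite through Python's sqlite3 module is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_dim (
    surrogate_key  INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id    TEXT,   -- production key, unchanged across versions
    date_valid     TEXT,   -- ISO date from which this version applies
    marital_status TEXT,
    city           TEXT
)""")

def scd2_add_version(customer_id, date_valid, marital_status, city):
    """SCD 2: never update; add a new row with a new surrogate key."""
    cur = conn.execute(
        "INSERT INTO customer_dim (customer_id, date_valid, marital_status, city) "
        "VALUES (?, ?, ?, ?)",
        (customer_id, date_valid, marital_status, city),
    )
    return cur.lastrowid

def version_as_of(customer_id, as_of_date):
    """Return (marital_status, city) of the version in force on as_of_date."""
    return conn.execute(
        "SELECT marital_status, city FROM customer_dim "
        "WHERE customer_id = ? AND date_valid <= ? "
        "ORDER BY date_valid DESC LIMIT 1",
        (customer_id, as_of_date),
    ).fetchone()

scd2_add_version("110002", "2004-09-23", "Single", "Palo Alto")
scd2_add_version("110002", "2005-10-25", "Married", "Palo Alto")
scd2_add_version("110002", "2005-11-23", "Married", "San Francisco")
```

A call such as `version_as_of("110002", "2005-11-01")` then picks the version that was valid on that date, which is the historical lookup that SCD 1 cannot answer.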

Slowly Changing Dimension Method 3 (in short SCD 3)

This is midway between method 1 and method 2. Here we don't add an additional record, but add a new field, 'old attribute value'. However, this has limitations. First, the method has to know from the beginning which attributes will change, because a new field has to be added in the design for every attribute that can change. Second, an attribute can change at most once in the lifetime of the entity, or at least in the lifetime of the data warehouse.

Surrogate Key   Customer ID   Marital Status   Date of Birth   City            Old Marital Status   Old City
1100021         110002        Married          Jan 8, 1982     San Francisco   Single               Palo Alto
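The record above can be reached with a sketch like this one. The column names (old_marital_status, old_city), the TRACKED mapping and the function name are hypothetical, Date of Birth is omitted for brevity, and SQLite through Python's sqlite3 module is used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_dim (
    surrogate_key      INTEGER PRIMARY KEY,
    customer_id        TEXT,
    marital_status     TEXT,
    old_marital_status TEXT,
    city               TEXT,
    old_city           TEXT
)""")
conn.execute(
    "INSERT INTO customer_dim VALUES (1100021, '110002', 'Single', NULL, 'Palo Alto', NULL)"
)

# Attributes designed up front to keep one previous value, and where it is kept.
TRACKED = {"marital_status": "old_marital_status", "city": "old_city"}

def scd3_update(customer_id, attribute, new_value):
    """SCD 3: move the current value into its 'old' column, then overwrite it."""
    if attribute not in TRACKED:  # untracked attributes cannot keep history
        raise ValueError(f"attribute not tracked: {attribute}")
    old_column = TRACKED[attribute]
    # The right-hand sides of SET see the row as it was before the update.
    conn.execute(
        f"UPDATE customer_dim SET {old_column} = {attribute}, {attribute} = ? "
        f"WHERE customer_id = ?",
        (new_value, customer_id),
    )

scd3_update("110002", "marital_status", "Married")
scd3_update("110002", "city", "San Francisco")
row = conn.execute("SELECT * FROM customer_dim").fetchone()
```

A second change to the same attribute would overwrite the 'old' column again, which is the once-only limitation described above.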

NOTE: The term 'slowly changing dimension' is used because it is a universally acknowledged term. However, the same methods apply to fast changing dimensions as well.

Surrogate Keys as Primary keys of dimension tables

