04 dimensional modelling_logical design
-
8/6/2019 04 Dimensional Modelling_Logical Design
Nkumba University
Logical Design in
Data Warehouses (DIMENSIONAL MODELLING)
Lecturer: Mulondo Edward
-
Dimensional Modeling -1
Dimensional Modeling is a design concept
used by many data warehouse designers to
build their data warehouse.
In this design model all the data is stored in two
types of tables: fact tables and dimension tables.
The fact table contains the facts/measurements of the
business, and the dimension table contains the context of the measurements, i.e. the dimensions on
which the facts are calculated.
-
Dimensional Modeling -1
Dimensional Modeling is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access.
It is inherently dimensional and it adheres to a discipline that uses the relational model with some important restrictions.
Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.
Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
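The key structure described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table and column names (dim_date, dim_store, dim_product, fact_sales) are assumptions made for the example and do not come from the lecture.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each dimension table has a single-part primary key.
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name    TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name  TEXT);

-- The fact table's multipart key has one component per dimension.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    store_key    INTEGER NOT NULL REFERENCES dim_store(store_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    sales_amount REAL    NOT NULL,
    PRIMARY KEY (date_key, store_key, product_key)
);
""")

# PRAGMA table_info reports, in its sixth field, each column's position
# in the primary key (0 when the column is not part of the key).
fact_key_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(fact_sales)") if row[5] > 0
]
print(fact_key_columns)
```

Each component of the fact table's key lines up with exactly one dimension table's primary key, which is the restriction the slide describes.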
-
Data Modeling
This is a very important step in the data warehousing project.
The foundation of the data warehousing system is the data model.
A good data model will allow the data warehousing system to grow easily, as well as allowing for good performance.
In a data warehousing project, the logical data model is built based on user requirements, and then it is translated into the physical data model.
Part of the data modeling exercise is often the identification of data sources.
-
Prerequisites to Logical design
Business requirements are statements of what users need the data warehouse for.
Defining requirements for a data warehouse is different from defining them for an operational (transactional) system.
Why?
Usage of information is unpredictable.
Guide to defining requirements: the dimensional nature of business data.
-
Prerequisites to Logical design
A conceptual data model identifies the highest-level relationships between the different entities.
Features of a conceptual data model include:
It includes the important entities and the relationships among them.
No attribute is specified.
No primary key is specified.
It enables us to understand at a high level the different entities in our data and how they relate to one another.
-
Prerequisites to Logical design
The figure shown is an example of a conceptual data model.
How many entities do we have here?
-
Prerequisites to Logical design
You translate your requirements into a system deliverable by creating the logical and physical design for the data warehouse. You then define:
The specific data content
Relationships within and between groups of data
The system environment supporting your data warehouse
The data transformations required
The frequency with which data is refreshed
-
Logical vs. Physical Design in Data
Warehouses
The logical design is more conceptual and
abstract than the physical design.
In the logical design, you look at the logical
relationships (dependence/cause/links) among
the objects.
In the physical design, you look at the most
effective way of storing and retrieving the objects, as well as handling them from a transportation
and backup/recovery perspective.
-
Logical vs. Physical Design in Data
Warehouses
A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database.
Features of a logical data model include:
All entities and the relationships among them are included.
All attributes for each entity are specified.
The primary key for each entity is specified.
Foreign keys (keys identifying the relationship between different entities) are specified.
Normalization occurs at this level.
-
Logical vs. Physical Design in Data
Warehouses
When creating a data warehouse, you create a
logical design before the physical one.
By beginning with the logical design, you focus
on the information requirements and save the
implementation details for later.
-
Creating a Logical Design
The process of logical design involves
arranging data into a series of logical
relationships called entities and attributes.
An entity represents a chunk of information. In relational databases, an
entity often maps to a table.
An attribute is a component of an entity that helps define the uniqueness of the entity. In relational
databases, an attribute maps to a column.
-
Creating a Logical Design
The steps for designing the logical data model
are as follows:
Specify primary keys for all entities.
Find the relationships between different entities.
Find all attributes for each entity.
Resolve many-to-many relationships.
Normalization.
-
Creating a Logical Design: Logical Data
Model
-
Creating a Logical Design
While entity-relationship diagramming has
traditionally been associated with highly
normalized models such as OLTP applications,
the technique is still useful for data
warehouse design in the form of dimensional
modeling.
-
Creating a Logical Design
In dimensional modeling, instead of seeking to discover atomic units of information (such as entities and attributes) and all of the relationships between them, you identify:
which information belongs to a central fact table and
which information belongs to its associated dimension tables.
You identify business subjects or fields of data, define relationships between business subjects, and name the attributes for each subject.
-
Physical Data Model(1)
A physical data model represents how the model
will be built in the database. A physical database
model shows all table structures, including:
column name,
column data type,
column constraints,
primary key, foreign key, and
relationships between tables.
-
Physical Data Model(2)
Features of a physical data model include:
Specification of all tables and columns.
Foreign keys are used to identify relationships between
tables.
Denormalization may occur based on user requirements.
Physical considerations may cause the physical data model
to be quite different from the logical data model.
The physical data model will be different for different RDBMSs.
For example, the data type for a column may be different
between MySQL and SQL Server.
-
Physical Data Model(3)
The steps for physical data model design are
as follows:
Convert entities into tables.
Convert relationships into foreign keys.
Convert attributes into columns.
Modify the physical data model based on physical
constraints / requirements.
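The conversion steps above can be sketched as a small Python routine that turns a logical model into CREATE TABLE statements. The logical_model dictionary, the entity names and the to_ddl helper are all hypothetical, invented for this illustration; the data types are SQLite's and would differ on another RDBMS.

```python
import sqlite3

# Hypothetical logical model: entities, attributes, keys and relationships.
logical_model = {
    "Customer": {
        "attributes": {"customer_id": "INTEGER", "customer_name": "TEXT"},
        "primary_key": "customer_id",
        "foreign_keys": {},
    },
    "SalesOrder": {
        "attributes": {"order_id": "INTEGER", "customer_id": "INTEGER",
                       "order_total": "REAL"},
        "primary_key": "order_id",
        "foreign_keys": {"customer_id": "Customer"},
    },
}

def to_ddl(entity, spec):
    """Convert one entity into a CREATE TABLE statement."""
    table = entity.lower()                              # entity -> table
    columns = [f"{name} {dtype}"                        # attribute -> column
               for name, dtype in spec["attributes"].items()]
    columns.append(f"PRIMARY KEY ({spec['primary_key']})")
    for column, target in spec["foreign_keys"].items():  # relationship -> foreign key
        ref_key = logical_model[target]["primary_key"]
        columns.append(
            f"FOREIGN KEY ({column}) REFERENCES {target.lower()}({ref_key})")
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(columns) + "\n);"

# Run the generated statements to check that they are valid SQL.
conn = sqlite3.connect(":memory:")
for entity, spec in logical_model.items():
    conn.execute(to_ddl(entity, spec))

table_names = sorted(
    row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(table_names)
```

The last step on the slide, adjusting the model for physical constraints, is not automated here; in practice it might mean adding indexes or denormalizing selected tables by hand.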
-
Physical Data Model(4)
Physical Data Model Example
-
Physical Data Model(4)
Comparing the physical data model shown in
the previous slide with the logical data
model diagram (seen earlier), we see the main
differences between the two:
Entity names are now table names.
Attributes are now column names.
The data type for each column is specified. Data types can differ depending on the actual
database being used.
-
How is a dimensional model different from
an E-R diagram?
An E-R diagram is split according to the entities. A
dimensional model is split according to the
dimensions and facts.
In an E-R diagram all attributes of an entity,
textual as well as numeric, belong to
the entity table.
In a dimensional model, by contrast, a 'dimension' entity
has mostly textual attributes, and the 'fact'
entity has mostly numeric attributes.
-
Data Warehousing Schemas
A schema is a collection of database objects, including tables, views, indexes, and synonyms.
You can arrange schema objects in the schema models designed for data warehousing in a variety of ways. Most data warehouses use a dimensional model.
The model of your source data and the requirements of your users help you design the data warehouse schema.
You can sometimes get the source model from your company's enterprise data model and reverse-engineer the logical data model for the data warehouse from this.
-
Data Warehousing Schemas
The physical implementation of the logical
data may require some changes to adapt it to
your system parameters: size of machine,
number of users, storage capacity, type of
network, and software.
-
Star Schemas
The star schema is the simplest data
warehouse schema.
It is called a star schema because the diagram resembles a star, with points radiating from a
center.
The center of the star consists of one or more
fact tables and the points of the star are the
dimension tables, as shown in Figure 21.
-
Figure 21
-
Star Schemas
The most natural way to model a data warehouse
is as a star schema, where only one join
establishes the relationship between the fact
table and any one of the dimension tables.
A star schema optimizes performance by keeping
queries simple and providing fast response time.
All the information about each level is stored in one row.
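As a rough sketch of the single-join property, the following Python example builds a tiny star schema in an in-memory SQLite database and totals sales by region. The table names, column names and figures are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension row holds every level (store and region) for that store.
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT,
    region     TEXT
);
CREATE TABLE fact_sales (
    store_key    INTEGER REFERENCES dim_store(store_key),
    sales_amount REAL
);
INSERT INTO dim_store VALUES (1, 'Store A', 'Central'), (2, 'Store B', 'Northern');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 70.0);
""")

# A single join connects the fact table to the dimension table.
sales_by_region = conn.execute("""
    SELECT d.region, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_store AS d ON d.store_key = f.store_key
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
print(sales_by_region)
```

Because the region is stored on the same row as the store, no further join is needed to report at the region level.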
-
Other Schemas
Some schemas in data warehousing
environments use third normal form rather
than star schemas.
Another schema that is sometimes useful is
the snowflake schema, which is a star schema
with normalized dimensions in a tree
structure.
-
Snowflake Schema
The snowflake schema is an extension of the
star schema, where each point of the star
explodes into more points.
In a star schema, each dimension is
represented by a single dimensional table,
whereas in a snowflake schema, that
dimensional table is normalized into multiple lookup tables, each representing a level in the
dimensional hierarchy.
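To make the normalization concrete, here is a minimal Python/sqlite3 sketch in which a store dimension is split into two lookup tables, one per hierarchy level. The names lookup_store and lookup_region and all the data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The region level is moved into its own lookup table.
CREATE TABLE lookup_region (
    region_key  INTEGER PRIMARY KEY,
    region_name TEXT
);
CREATE TABLE lookup_store (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT,
    region_key INTEGER REFERENCES lookup_region(region_key)
);
CREATE TABLE fact_sales (
    store_key    INTEGER REFERENCES lookup_store(store_key),
    sales_amount REAL
);
INSERT INTO lookup_region VALUES (1, 'Central'), (2, 'Northern');
INSERT INTO lookup_store VALUES (10, 'Store A', 1), (11, 'Store B', 1), (12, 'Store C', 2);
INSERT INTO fact_sales VALUES (10, 100.0), (11, 50.0), (12, 70.0);
""")

# Reaching the region level now takes two joins, one per lookup table.
snowflake_sales_by_region = conn.execute("""
    SELECT r.region_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN lookup_store  AS s ON s.store_key  = f.store_key
    JOIN lookup_region AS r ON r.region_key = s.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(snowflake_sales_by_region)
```

Each region name is stored once rather than once per store, at the cost of an extra join for region-level queries.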
-
Sample snowflake schema
-
Snowflake Schema advantages and
disadvantages
The main advantage of the snowflake schema
is reduced disk storage and smaller lookup tables
to join, which can improve query performance in some cases,
although the extra joins can also slow queries down.
The main disadvantage of the snowflake
schema is the additional maintenance effort
needed due to the increase in the number of lookup tables.
-
Data Warehousing Objects
Fact tables and dimension tables are the two types of objects commonly used in dimensional data warehouse schemas.
Fact tables are the large tables in your data warehouse schema that store business measurements.
Fact tables typically contain facts and foreign keys to the dimension tables.
Fact tables represent data, usually numeric and additive, that can be analyzed and examined.
Examples include sales, cost, and profit.
-
Data Warehousing Objects
Dimension tables, also known as lookup or
reference tables, contain the relatively static
data in the data warehouse.
Dimension tables store the information you
normally use to constrain queries.
Dimension tables are usually textual and
descriptive and you can use them as the row
headers of the result set.
Examples are customers or products.
-
Fact Tables
A fact table typically has two types of
columns: those that contain numeric facts
(often called measurements), and those that
are foreign keys to dimension tables.
A fact table contains either detail-level facts
or facts that have been aggregated.
Fact tables that contain aggregated facts are often called summary tables.
-
Fact Tables
A fact table usually contains facts (measurements) with the same level of aggregation.
Though most facts are additive, they can also be semi-additive or non-additive.
Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales.
Non-additive facts cannot be added at all. An example of this is averages.
Semi-additive facts can be aggregated along some of the dimensions and not along others. An example is a stock level, which can be summed across stores but not across days.
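The three kinds of fact can be illustrated with a few lines of plain Python. All figures below are invented for the example, and the stock-level case is one common illustration of a semi-additive fact rather than something taken from the slides.

```python
# Additive: sales amounts can be summed across any dimension.
sales = [("Mon", "A", 100.0), ("Mon", "B", 300.0), ("Tue", "A", 200.0)]
total_sales = sum(amount for _, _, amount in sales)

# Non-additive: averages cannot be added or naively combined.
group_a = [10.0, 20.0, 30.0]
group_b = [100.0]
avg_a = sum(group_a) / len(group_a)
avg_b = sum(group_b) / len(group_b)
average_of_averages = (avg_a + avg_b) / 2
# The correct overall average has to be recomputed from the detail values.
true_average = sum(group_a + group_b) / len(group_a + group_b)

# Semi-additive: closing stock can be summed across stores for one day,
# but adding Monday's stock to Tuesday's stock gives a meaningless number.
closing_stock = {("Mon", "A"): 20, ("Mon", "B"): 5,
                 ("Tue", "A"): 15, ("Tue", "B"): 5}
stock_on_tuesday = sum(units for (day, _), units in closing_stock.items()
                       if day == "Tue")

print(total_sales, average_of_averages, true_average, stock_on_tuesday)
```

The gap between average_of_averages and true_average is why averages are usually stored as their additive components (a sum and a count) and divided at query time.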
-
Fact Tables
A fact table contains the measures of interest
(Facts).
For example, sales amount would be such a
measure.
This measure is stored in the fact table with the
appropriate granularity.
For example, it can be sales amount by store by day.
In this case, the fact table would contain three
columns: a date column, a store column, and a sales
amount column.
-
Creating a New Fact Table
You must define a fact table for each star
schema.
From a modeling standpoint, the primary key
of the fact table is usually a composite key
that is made up of all of its foreign keys.
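A minimal sketch of such a composite key, using Python's sqlite3 module and a hypothetical sales-by-store-by-day fact table (the dimension tables are omitted for brevity, and all names and values are assumptions). It shows that the key allows only one row per combination of date and store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_daily_sales (
        date_key     INTEGER NOT NULL,  -- would reference a date dimension
        store_key    INTEGER NOT NULL,  -- would reference a store dimension
        sales_amount REAL    NOT NULL,
        PRIMARY KEY (date_key, store_key)
    )
""")
conn.execute("INSERT INTO fact_daily_sales VALUES (20190806, 1, 250.0)")

# A second row for the same date and store violates the composite key.
try:
    conn.execute("INSERT INTO fact_daily_sales VALUES (20190806, 1, 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM fact_daily_sales").fetchone()[0]
print(duplicate_rejected, row_count)
```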
-
Creating a New Fact Table
Granularity
The first step in designing a fact table is to determine the granularity of the fact table.
By granularity, we mean the lowest level of information that will be stored in the fact table. This involves two steps:
Determine which dimensions will be included.
Determine where along the hierarchy of each dimension the information will be kept.
The determining factors usually go back to the requirements.
-
Creating a New Fact Table
Which Dimensions To Include
Determining which dimensions to include is
usually a straightforward process, because
business processes will often dictate clearly what the relevant dimensions are.
For example, in an off-line retail world, the
dimensions for a sales fact table are usually time,
geography, and product.
-
Creating a New Fact Table
What Level Within Each Dimension To Include
Determining which part of the hierarchy the information is stored at along each dimension is a bit more tricky.
This is where user requirements (both stated and possibly future) play a major role.
In the retail example above, will the supermarket want to do analysis at the hourly level (i.e., looking at how certain products sell at different hours of the day)? If so, it makes sense to use 'hour' as the lowest level of granularity in the time dimension.
If daily analysis is sufficient, then 'day' can be used as the lowest level of granularity. Since the lower the level of detail, the larger the amount of data in the fact table, the granularity exercise is in essence figuring out the sweet spot in the tradeoff between detailed level of analysis and data storage.
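The storage side of this tradeoff can be estimated with simple arithmetic. The store, product and trading-hour counts below are assumed figures for a hypothetical retailer, and the result is an upper bound that supposes every product sells in every store in every period.

```python
# Assumed figures for a hypothetical retailer.
stores = 50
products = 2_000
days_per_year = 365
trading_hours_per_day = 12

# Upper bound on fact rows per year at each candidate grain.
daily_rows = stores * products * days_per_year
hourly_rows = daily_rows * trading_hours_per_day

print(f"daily grain:  {daily_rows:,} rows per year")
print(f"hourly grain: {hourly_rows:,} rows per year")
```

Under these assumptions, moving from a daily to an hourly grain multiplies the worst-case row count by the number of trading hours, which is the cost to weigh against the value of hourly analysis.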
-
Dimension Tables
A dimension is a structure, often composed of one or more hierarchies, that categorizes data.
Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values.
Several distinct dimensions, combined with facts, enable you to answer business questions.
Commonly used dimensions are customers, products, and time.
Dimension data is typically collected at the lowest level of detail and then aggregated into higher level totals that are more useful for analysis.
These natural rollups or aggregations within a dimension table are called hierarchies.
-
Hierarchies
Definitions:
Hierarchies are logical structures that use ordered
levels as a means of organizing data.
Hierarchies are the paths over which any data (or
measure) is summarized.
-
Hierarchies
A hierarchy can be used to define data aggregation.
For example, in a time dimension, a hierarchy
might aggregate data from the month level to the quarter level to the year level.
A hierarchy can also be used to define a navigational drill path and to establish a family
structure.
Moving between the levels of a hierarchy is called
drilling up and drilling down.
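Drilling up through a time hierarchy can be sketched in plain Python. The monthly figures are invented, and the quarter_of helper is a hypothetical name for the mapping from the month level to the quarter level.

```python
from collections import defaultdict

# Invented monthly sales, keyed by (year, month).
monthly_sales = {
    (2019, 1): 10.0, (2019, 2): 20.0, (2019, 3): 30.0,
    (2019, 4): 40.0, (2019, 7): 5.0,
}

def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1

# Drill up from the month level to the quarter level.
quarterly_sales = defaultdict(float)
for (year, month), amount in monthly_sales.items():
    quarterly_sales[(year, quarter_of(month))] += amount

# Drill up again from the quarter level to the year level.
yearly_sales = defaultdict(float)
for (year, _quarter), amount in quarterly_sales.items():
    yearly_sales[year] += amount

print(dict(quarterly_sales))
print(dict(yearly_sales))
```

Drilling down is the reverse navigation: starting from a yearly total and looking up the quarterly or monthly values that make it up.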
-
Hierarchies
Within a hierarchy, each level is logically
connected to the levels above and below it.
Data values at lower levels aggregate into the data
values at higher levels.
A dimension can be composed of more than
one hierarchy.
For example, in the product dimension, there might be two hierarchies: one for product
categories and one for product suppliers.
-
Hierarchies
Hierarchies are also essential components in
enabling more complex query rewrites.
For example, the database can aggregate
existing quarterly sales revenue to a
yearly aggregation when the dimensional
dependencies between quarter and year are
known.
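The idea behind such a rewrite can be imitated by hand: a yearly total computed from a quarterly summary table matches the one computed from the detail rows, because each quarter belongs to exactly one year. The sketch below uses Python's sqlite3 module with invented table names and data. SQLite does not rewrite queries automatically, so the second query is written out manually to stand in for what a rewrite-capable database would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (month_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE fact_sales (
    month_key INTEGER REFERENCES dim_time(month_key),
    revenue   REAL
);
INSERT INTO dim_time VALUES
    (201901, '2019-Q1', 2019), (201904, '2019-Q2', 2019), (202001, '2020-Q1', 2020);
INSERT INTO fact_sales VALUES
    (201901, 10.0), (201901, 15.0), (201904, 20.0), (202001, 7.0);

-- Summary table holding revenue already aggregated to the quarter level.
CREATE TABLE sales_by_quarter AS
    SELECT t.quarter, t.year, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.month_key = f.month_key
    GROUP BY t.quarter, t.year;
""")

# Yearly revenue computed from the detail rows.
yearly_from_detail = conn.execute("""
    SELECT t.year, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.month_key = f.month_key
    GROUP BY t.year ORDER BY t.year
""").fetchall()

# The same answer obtained from the much smaller quarterly summary.
yearly_from_summary = conn.execute("""
    SELECT year, SUM(revenue)
    FROM sales_by_quarter
    GROUP BY year ORDER BY year
""").fetchall()

print(yearly_from_detail, yearly_from_summary)
```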
-
Typical Dimension Hierarchy
Figure 22 illustrates a dimension hierarchy based on customers,
i.e. you can analyse your customers by region, sub-region, country and by individual customer.
-
Unique Identifiers
A unique identifier is specified for one distinct record in a dimension table.
Artificial unique identifiers are often used to avoid the potential problem of unique identifiers changing.
Unique identifiers can be represented with a prefix, e.g. the # character, depending on the DBMS.
For example, #customer_id (Oracle).
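The benefit of an artificial identifier can be sketched with Python's sqlite3 module. In this hypothetical example the dimension table keeps the source system's code in an ordinary column (source_code, an invented name), while fact rows reference only the artificial customer_id, so a change to the source code does not disturb them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,   -- artificial identifier
    source_code   TEXT NOT NULL UNIQUE,  -- identifier used by the source system
    customer_name TEXT
);
CREATE TABLE fact_sales (
    customer_id  INTEGER REFERENCES dim_customer(customer_id),
    sales_amount REAL
);
""")

cursor = conn.execute(
    "INSERT INTO dim_customer (source_code, customer_name) VALUES (?, ?)",
    ("CUST-A17", "Acme Ltd"),
)
customer_id = cursor.lastrowid  # value assigned by the database
conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (customer_id, 120.0))

# The source system renumbers the customer; only the dimension row changes.
conn.execute(
    "UPDATE dim_customer SET source_code = ? WHERE customer_id = ?",
    ("KLA-0042", customer_id),
)

joined_row = conn.execute("""
    SELECT d.source_code, f.sales_amount
    FROM fact_sales AS f
    JOIN dim_customer AS d ON d.customer_id = f.customer_id
""").fetchone()
print(joined_row)
```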
-
Relationships
Relationships guarantee business integrity.
An example is that if a business sells something,
there is obviously a customer and a product.
Designing a relationship between the sales
information in the fact table and the
dimension tables products and customers
enforces the business rules in databases.
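A minimal sketch of such enforcement, using Python's sqlite3 module with invented table names and data. SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection; other database systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE sales (
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL    NOT NULL
);
INSERT INTO products  VALUES (1, 'Soap');
INSERT INTO customers VALUES (1, 'Acme Ltd');
""")

# A sale with a known product and a known customer is accepted.
conn.execute("INSERT INTO sales VALUES (1, 1, 5.0)")

# A sale that refers to a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO sales VALUES (1, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

sales_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(orphan_rejected, sales_count)
```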
-
Example of Data Warehousing
Objects and Their Relationships
Figure 23 illustrates a common example of a
sales fact table and dimension tables
customers, products, promotions, times, and
channels.
-
Practical
Next lecture: use given data to create fact
tables, dimensions and a star schema