04 dimensional modelling_logical design

Upload: edward-mulondo

Post on 08-Apr-2018


  • 8/6/2019 04 Dimensional Modelling_Logical Design

    1/59

    Nkumba University

    Logical Design in

    Data Warehouses (DIMENSIONAL MODELLING)

    Lecturer: Mulondo Edward


    Dimensional Modeling -1

    Dimensional modeling is a design concept used by many data warehouse designers to build their data warehouses.

    In this design model, all the data is stored in two types of tables: fact tables and dimension tables.

    The fact table contains the facts/measurements of the business, and the dimension table contains the context of the measurements, i.e. the dimensions on which the facts are calculated.


    Dimensional Modeling -1

    Dimensional modeling is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access.

    It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions.

    Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.

    Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
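
    The key structure described on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (dim_product, dim_store, fact_sales) are hypothetical and do not come from the lecture.

```python
import sqlite3

# Hypothetical two-dimension model; all names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- single-part primary key
    product_name TEXT NOT NULL
);
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,     -- single-part primary key
    store_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    store_key    INTEGER NOT NULL REFERENCES dim_store(store_key),
    sales_amount REAL NOT NULL,
    PRIMARY KEY (product_key, store_key)  -- multipart key, one part per dimension
);
""")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk);
# pk > 0 marks a column that is part of the primary key.
fact_key = [row[1] for row in conn.execute("PRAGMA table_info(fact_sales)") if row[5] > 0]
print(fact_key)
```

    Each component of the fact table's key matches the single-part key of exactly one dimension table, which is the restriction the slide describes.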


    Data Modeling

    This is a very important step in the data warehousing project. The foundation of the data warehousing system is the data model.

    A good data model will allow the data warehousing system to grow easily, as well as allowing for good performance.

    In a data warehousing project, the logical data model is built based on user requirements, and then it is translated into the physical data model.

    Part of the data modeling exercise is often the identification of data sources.


    Prerequisites to Logical design

    Business requirements are statements of what users need the data warehouse for.

    Defining requirements for a data warehouse is different from defining them for an operational (transactional) system.

    Why?

    Usage of information is unpredictable.

    Guide to defining requirements: the dimensional nature of business data.


    Prerequisites to Logical design

    A conceptual data model identifies the highest-level relationships between the different entities. Features of a conceptual data model include:

    The important entities and the relationships among them.

    No attribute is specified.

    No primary key is specified.

    It enables us to understand at a high level the different entities in our data and how they relate to one another.


    Prerequisites to Logical design

    The figure shown is an example of a conceptual data model.

    How many entities do we have here?


    Prerequisites to Logical design

    You translate your requirements into a system deliverable by creating the logical and physical design for the data warehouse. You then define:

    The specific data content

    Relationships within and between groups of data

    The system environment supporting your data warehouse

    The data transformations required

    The frequency with which data is refreshed


    Logical vs. Physical Design in Data

    Warehouses

    The logical design is more conceptual and

    abstract than the physical design.

    In the logical design, you look at the logical

    relationships (dependence/cause/links) among

    the objects.

    In the physical design, you look at the most effective way of storing and retrieving the objects, as well as handling them from a transportation and backup/recovery perspective.


    Logical vs. Physical Design in Data

    Warehouses

    A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database.

    Features of a logical data model include:

    All entities and the relationships among them are included.

    All attributes for each entity are specified.

    The primary key for each entity is specified.

    Foreign keys (keys identifying the relationship between different entities) are specified.

    Normalization occurs at this level.


    Logical vs. Physical Design in Data

    Warehouses

    When creating a data warehouse, you create a

    logical design before the physical one.

    By beginning with the logical design, you focus

    on the information requirements and save the

    implementation details for later.


    Creating a Logical Design

    The process of logical design involves

    arranging data into a series of logical

    relationships called entities and attributes.

    An entity represents a chunk of information. In relational databases, an entity often maps to a table.

    An attribute is a component of an entity that helps define the uniqueness of the entity. In relational databases, an attribute maps to a column.


    Creating a Logical Design

    The steps for designing the logical data model

    are as follows:

    Specify primary keys for all entities.

    Find the relationships between different entities.

    Find all attributes for each entity.

    Resolve many-to-many relationships.

    Normalization.
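
    As an illustration of the "resolve many-to-many relationships" step, the sketch below uses Python's sqlite3 module. The product/supplier example and every table name in it are assumptions made for this sketch, not part of the lecture: a product can have several suppliers and a supplier several products, so the relationship is moved into its own linking table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Linking table: turns one many-to-many relationship into two one-to-many ones.
CREATE TABLE product_supplier (
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),
    PRIMARY KEY (product_id, supplier_id)
);

INSERT INTO product  VALUES (1, 'Milk'), (2, 'Bread');
INSERT INTO supplier VALUES (1, 'Farm Co'), (2, 'Metro Foods');
INSERT INTO product_supplier VALUES (1, 1), (1, 2), (2, 2);
""")

# All suppliers of product 1 (Milk).
suppliers_of_milk = [r[0] for r in conn.execute("""
    SELECT s.name
    FROM product_supplier ps
    JOIN supplier s ON s.supplier_id = ps.supplier_id
    WHERE ps.product_id = 1
    ORDER BY s.name
""")]

# All products of supplier 2 (Metro Foods).
products_of_metro = [r[0] for r in conn.execute("""
    SELECT p.name
    FROM product_supplier ps
    JOIN product p ON p.product_id = ps.product_id
    WHERE ps.supplier_id = 2
    ORDER BY p.name
""")]
print(suppliers_of_milk, products_of_metro)
```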


    Creating a Logical Design: Logical Data

    Model


    Creating a Logical Design

    While entity-relationship diagramming has traditionally been associated with highly normalized models such as OLTP applications, the technique is still useful for data warehouse design in the form of dimensional modeling.


    Creating a Logical Design

    In dimensional modeling, instead of seeking to discover atomic units of information (such as entities and attributes) and all of the relationships between them, you identify:

    which information belongs to a central fact table and

    which information belongs to its associated dimension tables.

    You identify business subjects or fields of data, define relationships between business subjects, and name the attributes for each subject.


    Physical Data Model(1)

    Physical data model represents how the model

    will be built in the database. A physical database

    model shows all table structures, including:

    column name,

    column data type,

    column constraints,

    primary key, foreign key, and

    relationships between tables.


    Physical Data Model(2)

    Features of a physical data model include:

    Specification of all tables and columns.

    Foreign keys are used to identify relationships between

    tables.

    Denormalization may occur based on user requirements.

    Physical considerations may cause the physical data model

    to be quite different from the logical data model.

    Physical data model will be different for different RDBMS.

    For example, data type for a column may be different

    between MySQL and SQL Server


    Physical Data Model(3)

    The steps for physical data model design are

    as follows:

    Convert entities into tables.

    Convert relationships into foreign keys.

    Convert attributes into columns.

    Modify the physical data model based on physical

    constraints / requirements.
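
    The first three steps above are mechanical enough to sketch in code. The following is a minimal illustration, assuming a hypothetical dictionary format for the logical model (the entity names, the dictionary layout and the to_ddl helper are all invented for this sketch). The fourth step, adapting to physical constraints, is where a real project would adjust data types or denormalize for its particular DBMS.

```python
import sqlite3

# Hypothetical logical model: entity -> attributes, primary key, relationships.
logical_model = {
    "customer": {
        "attributes": {"customer_id": "INTEGER", "name": "TEXT"},
        "primary_key": "customer_id",
        "references": {},
    },
    "orders": {
        "attributes": {"order_id": "INTEGER", "customer_id": "INTEGER", "amount": "REAL"},
        "primary_key": "order_id",
        "references": {"customer_id": "customer"},  # orders -> customer
    },
}

def to_ddl(model):
    """Return one CREATE TABLE statement per entity in the model."""
    statements = []
    for entity, spec in model.items():           # entities become tables
        cols = []
        for attr, sql_type in spec["attributes"].items():  # attributes become columns
            col = f"{attr} {sql_type}"
            if attr == spec["primary_key"]:
                col += " PRIMARY KEY"
            cols.append(col)
        for attr, target in spec["references"].items():    # relationships become foreign keys
            target_pk = model[target]["primary_key"]
            cols.append(f"FOREIGN KEY ({attr}) REFERENCES {target}({target_pk})")
        statements.append(f"CREATE TABLE {entity} (\n    " + ",\n    ".join(cols) + "\n);")
    return statements

ddl = to_ddl(logical_model)
print("\n".join(ddl))

# Check that the generated statements are valid SQL by running them in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(ddl))
tables = sorted(r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
```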


    Physical Data Model(4)

    Physical Data Model Example


    Physical Data Model(4)

    Comparing the physical data model shown in the previous slide with the logical data model diagram (seen earlier), we see the main differences between the two:

    Entity names are now table names.

    Attributes are now column names.

    The data type for each column is specified. Data types can be different depending on the actual database being used.


    How is a dimensional model different from an E-R diagram?

    The E-R diagram is split as per the entities. A dimensional model is split as per the dimensions and facts.

    In an E-R diagram, all attributes for an entity, textual as well as numeric, belong to the entity table.

    By contrast, a 'dimension' entity in a dimensional model has mostly textual attributes, and the 'fact' entity has mostly numeric attributes.


    Data Warehousing Schemas

    A schema is a collection of database objects, including tables, views, indexes, and synonyms.

    You can arrange schema objects in the schema models designed for data warehousing in a variety of ways. Most data warehouses use a dimensional model.

    The model of your source data and the requirements of your users help you design the data warehouse schema.

    You can sometimes get the source model from your company's enterprise data model and reverse-engineer the logical data model for the data warehouse from this.


    Data Warehousing Schemas

    The physical implementation of the logical data model may require some changes to adapt it to your system parameters: size of machine, number of users, storage capacity, type of network, and software.


    Star Schemas

    The star schema is the simplest data

    warehouse schema.

    It is called a star schema because the diagram resembles a star, with points radiating from a center.

    The center of the star consists of one or more

    fact tables and the points of the star are the

    dimension tables, as shown in Figure 21.


    Figure 21


    Star Schemas

    The most natural way to model a data warehouse

    is as a star schema, where only one join

    establishes the relationship between the fact

    table and any one of the dimension tables.

    A star schema optimizes performance by keeping

    queries simple and providing fast response time.

    All the information about each level is stored in one row.
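
    To make the "one join per dimension" point concrete, here is a small sketch using Python's sqlite3 module. The tables, columns and figures are hypothetical sample data, not taken from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region   TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, store_key INTEGER, sales_amount REAL);

INSERT INTO dim_product VALUES (1, 'Food'), (2, 'Drinks');
INSERT INTO dim_store   VALUES (10, 'Central'), (20, 'Western');
INSERT INTO fact_sales  VALUES (1, 10, 100.0), (2, 10, 50.0), (1, 20, 30.0);
""")

# Exactly one join connects the fact table to each dimension table.
rows = conn.execute("""
    SELECT s.region, p.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_store   s ON s.store_key   = f.store_key
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()

for region, category, total in rows:
    print(region, category, total)
```

    The dimension columns (region, category) act as the grouping labels, while the fact column is the one being summed.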


    Other Schemas

    Some schemas in data warehousing

    environments use third normal form rather

    than star schemas.

    Another schema that is sometimes useful is

    the snowflake schema, which is a star schema

    with normalized dimensions in a tree

    structure.


    Snowflake Schema

    The snowflake schema is an extension of the

    star schema, where each point of the star

    explodes into more points.

    In a star schema, each dimension is represented by a single dimension table, whereas in a snowflake schema, that dimension table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.
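
    The difference can be sketched by normalizing one star dimension into a snowflake form. The example below uses Python's sqlite3 module with hypothetical tables (dim_product_star, dim_product_snow, lkp_category) and sample rows invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star form: one dimension table, category repeated on every product row.
CREATE TABLE dim_product_star (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
INSERT INTO dim_product_star VALUES
    (1, 'Milk', 'Dairy'), (2, 'Cheese', 'Dairy'), (3, 'Bread', 'Bakery');

-- Snowflake form: the category level moves to its own lookup table.
CREATE TABLE lkp_category (
    category_key INTEGER PRIMARY KEY, category_name TEXT UNIQUE);
CREATE TABLE dim_product_snow (
    product_key INTEGER PRIMARY KEY, product_name TEXT,
    category_key INTEGER REFERENCES lkp_category(category_key));

INSERT INTO lkp_category (category_name)
    SELECT DISTINCT category_name FROM dim_product_star ORDER BY category_name;
INSERT INTO dim_product_snow
    SELECT s.product_key, s.product_name, c.category_key
    FROM dim_product_star s JOIN lkp_category c USING (category_name);
""")

star = conn.execute("""
    SELECT product_name, category_name
    FROM dim_product_star ORDER BY product_key
""").fetchall()

# The snowflake form needs an extra join to recover the same information.
snow = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM dim_product_snow p
    JOIN lkp_category c ON c.category_key = p.category_key
    ORDER BY p.product_key
""").fetchall()

category_rows = conn.execute("SELECT COUNT(*) FROM lkp_category").fetchone()[0]
print(star == snow, category_rows)
```

    Each category name is now stored once in the lookup table, at the cost of one more join when querying and one more table to maintain.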


    Sample snowflake schema


    Snowflake Schema advantages and

    disadvantages

    The main advantage of the snowflake schema

    is the improvement in query performance due

    to minimized disk storage requirements and

    joining smaller lookup tables.

    The main disadvantage of the snowflake schema is the additional maintenance effort needed due to the increase in the number of lookup tables.


    Data Warehousing Objects

    Fact tables and dimension tables are the two types of objects commonly used in dimensional data warehouse schemas.

    Fact tables are the large tables in your data warehouse schema that store business measurements.

    Fact tables typically contain facts and foreign keys to the dimension tables.

    Fact tables represent data, usually numeric and additive, that can be analyzed and examined.

    Examples include sales, cost, and profit.


    Data Warehousing Objects

    Dimension tables, also known as lookup or

    reference tables, contain the relatively static

    data in the data warehouse.

    Dimension tables store the information you normally use to constrain queries.

    Dimension tables are usually textual and

    descriptive and you can use them as the row

    headers of the result set.

    Examples are customers or products.


    Fact Tables

    A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables.

    A fact table contains either detail-level facts or facts that have been aggregated.

    Fact tables that contain aggregated facts are often called summary tables.


    Fact Tables

    A fact table usually contains facts (measurements) with the same level of aggregation.

    Though most facts are additive, they can also be semi-additive or non-additive.

    Additive facts can be aggregated by simple arithmetical addition. A common example of this is sales.

    Non-additive facts cannot be added at all. An example of this is averages.

    Semi-additive facts can be aggregated along some of the dimensions and not along others.
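
    The three kinds of fact can be compared on a few rows of made-up data. In the sketch below, all figures are hypothetical, and the use of stock on hand as the semi-additive fact is an assumption chosen for illustration (the slide itself names only sales and averages).

```python
# Hypothetical daily facts: (day, store, sales_amount, units_sold, stock_on_hand)
rows = [
    ("Mon", "A", 100.0, 10, 50),
    ("Mon", "B", 60.0, 3, 20),
    ("Tue", "A", 80.0, 8, 42),
    ("Tue", "B", 90.0, 9, 11),
]

# Additive: sales can be summed across every dimension (days and stores).
total_sales = sum(r[2] for r in rows)
total_units = sum(r[3] for r in rows)

# Semi-additive: stock on hand can be summed across stores for one day,
# but summing it across days gives a number with no business meaning.
stock_on_tuesday = sum(r[4] for r in rows if r[0] == "Tue")
meaningless_stock_sum = sum(r[4] for r in rows if r[1] == "A")

# Non-additive: an average price per row cannot be added or averaged again.
# It has to be recomputed from its additive components.
row_avg_prices = [r[2] / r[3] for r in rows]
naive_avg = sum(row_avg_prices) / len(row_avg_prices)
true_avg_price = total_sales / total_units

print(total_sales, stock_on_tuesday, naive_avg, true_avg_price)
```

    The naive average of the per-row averages differs from the average recomputed from total sales and total units, which is why such facts are usually stored as their additive components.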


    Fact Tables

    A fact table contains the measures of interest

    (Facts).

    For example, sales amount would be such a

    measure. This measure is stored in the fact table with the

    appropriate granularity.

    For example, it can be sales amount by store by day.

    In this case, the fact table would contain three

    columns: A date column, a store column, and a sales

    amount column.


    Creating a New Fact Table

    You must define a fact table for each star

    schema.

    From a modeling standpoint, the primary key

    of the fact table is usually a composite key

    that is made up of all of its foreign keys.


    Creating a New Fact Table

    Granularity

    The first step in designing a fact table is to determine the granularity of the fact table.

    By granularity, we mean the lowest level of information that will be stored in the fact table. This involves two steps:

    Determine which dimensions will be included.

    Determine where along the hierarchy of each dimension the information will be kept.

    The determining factors usually go back to the requirements.


    Creating a New Fact Table

    Which Dimensions To Include

    Determining which dimensions to include is usually a straightforward process, because business processes will often dictate clearly which dimensions are relevant.

    For example, in an off-line retail world, the

    dimensions for a sales fact table are usually time,

    geography, and product.


    Creating a New Fact Table

    What Level Within Each Dimension To Include

    Determining which part of the hierarchy the information is stored at along each dimension is a bit trickier.

    This is where user requirements (both stated and possible future ones) play a major role.

    In the above example, will the supermarket want to do analysis at the hourly level (i.e., looking at how certain products sell at different hours of the day)? If so, it makes sense to use 'hour' as the lowest level of granularity in the time dimension.

    If daily analysis is sufficient, then 'day' can be used as the lowest level of granularity. Since the lower the level of detail, the larger the amount of data in the fact table, the granularity exercise is in essence figuring out the sweet spot in the tradeoff between detailed level of analysis and data storage.
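
    The storage side of this tradeoff can be estimated with simple arithmetic. The sketch below uses entirely hypothetical figures (20 stores, 5,000 products, 12 opening hours per day) and computes an upper bound that assumes every product sells in every store in every period, which real data would rarely reach.

```python
# All figures are hypothetical assumptions for illustration.
stores = 20
products = 5_000
days_per_year = 365
opening_hours_per_day = 12

# Upper bound on fact rows per year at each grain.
rows_daily = stores * products * days_per_year
rows_hourly = rows_daily * opening_hours_per_day

print(f"daily grain:  {rows_daily:,} rows per year")
print(f"hourly grain: {rows_hourly:,} rows per year")
print(f"growth factor: {rows_hourly // rows_daily}x")
```

    Moving the time dimension from 'day' to 'hour' multiplies the worst-case row count by the number of hours tracked per day, so the finer grain is only worth it if the hourly analysis is actually required.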


    Dimension Tables

    A dimension is a structure, often composed of one or more hierarchies, that categorizes data.

    Dimensional attributes help to describe the dimensional value. They are normally descriptive, textual values.

    Several distinct dimensions, combined with facts, enable you to answer business questions.

    Commonly used dimensions are customers, products, and time.

    Dimension data is typically collected at the lowest level of detail and then aggregated into higher-level totals that are more useful for analysis.

    These natural rollups or aggregations within a dimension table are called hierarchies.


    Hierarchies

    Definitions:

    Hierarchies are logical structures that use ordered

    levels as a means of organizing data.

    Hierarchies are the paths over which any data (OR

    measure) is summarized


    Hierarchies

    A hierarchy can be used to define data aggregation.

    For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level.

    A hierarchy can also be used to define a navigational drill path and to establish a family structure.

    Moving between the levels of a hierarchy is called drilling up and drilling down.
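
    The month to quarter to year roll-up can be sketched in a few lines of plain Python. The monthly figures and the quarter_of helper are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict

# Hypothetical monthly sales keyed by (year, month).
monthly_sales = {
    (2018, 1): 10.0, (2018, 2): 20.0, (2018, 3): 30.0,
    (2018, 4): 40.0, (2018, 12): 5.0,
    (2019, 1): 7.0,
}

def quarter_of(month):
    """Map a month number (1-12) to its calendar quarter (1-4)."""
    return (month - 1) // 3 + 1

# Drill up one level: month -> quarter.
quarterly_sales = defaultdict(float)
for (year, month), amount in monthly_sales.items():
    quarterly_sales[(year, quarter_of(month))] += amount

# Drill up again: quarter -> year, reusing the quarterly totals.
yearly_sales = defaultdict(float)
for (year, _quarter), amount in quarterly_sales.items():
    yearly_sales[year] += amount

print(dict(quarterly_sales))
print(dict(yearly_sales))
```

    Because each month belongs to exactly one quarter and each quarter to exactly one year, the yearly totals can be derived from the quarterly ones without going back to the monthly data. Drilling down is the reverse: moving from a yearly total back to its quarters and months.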


    Hierarchies

    Within a hierarchy, each level is logically

    connected to the levels above and below it.

    Data values at lower levels aggregate into the data values at higher levels.

    A dimension can be composed of more than one hierarchy.

    For example, in the product dimension, there might be two hierarchies: one for product categories and one for product suppliers.


    Hierarchies

    Hierarchies are also essential components in enabling more complex query rewrites.

    For example, the database can aggregate existing sales revenue on a quarterly basis to a yearly aggregation when the dimensional dependencies between quarter and year are known.


    Typical Dimension Hierarchy

    Figure 22 illustrates a dimension hierarchy based on customers, i.e. you can analyse your customers by region, sub-region, country, and individual customer.


    Unique Identifiers

    A unique identifier is specified for each distinct record in a dimension table.

    Artificial unique identifiers are often used to avoid the potential problem of unique identifiers changing.

    Unique identifiers can be represented with a prefix, e.g. the # character, depending on the DBMS.

    For example, #customer_id (Oracle).
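
    The reason for using an artificial identifier can be shown with a small sqlite3 sketch. The table layout, the customer codes and the renumbering scenario are all hypothetical: the fact table refers to the artificial key, so a change to the identifier used by the source system does not disturb existing fact rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,   -- artificial identifier owned by the warehouse
    customer_code TEXT UNIQUE NOT NULL,  -- identifier supplied by the source system
    name          TEXT
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    sales_amount REAL
);
INSERT INTO dim_customer VALUES (1, 'CUST-0042', 'Acme Ltd');
INSERT INTO fact_sales   VALUES (1, 250.0);
""")

# The source system renumbers the customer; only the dimension row changes.
conn.execute(
    "UPDATE dim_customer SET customer_code = ? WHERE customer_key = ?",
    ("C-900042", 1),
)

# Existing facts still join to the customer through the unchanged artificial key.
row = conn.execute("""
    SELECT c.customer_code, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.customer_code
""").fetchone()
print(row)
```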


    Relationships

    Relationships guarantee business integrity.

    An example is that if a business sells something,

    there is obviously a customer and a product.

    Designing a relationship between the sales

    information in the fact table and the

    dimension tables products and customers

    enforces the business rules in databases.
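
    How a declared relationship enforces such a rule can be sketched with Python's sqlite3 module. The table names and rows are hypothetical. Note that SQLite only checks foreign keys when the foreign_keys pragma is switched on for the connection; other database systems have their own defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    sales_amount REAL NOT NULL
);
INSERT INTO dim_customer VALUES (1, 'Acme Ltd');
INSERT INTO dim_product  VALUES (1, 'Milk');
""")

# A sale with a known customer and a known product is accepted.
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 10.0)")

# A sale that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 1, 10.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(rejected, fact_rows)
```

    Only the valid sale ends up in the fact table, so every stored sale is guaranteed to have both a customer and a product.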


    Example of Data Warehousing

    Objects and Their Relationships

    Figure 23 illustrates a common example of a

    sales fact table and dimension tables

    customers, products, promotions, times, and

    channels.


    Practical

    Next lecture: use given data to create fact

    tables, dimensions and a star schema