8 dimensional modeling1

Upload: jonjon

Post on 03-Jun-2018


  • 8/12/2019 8 Dimensional Modeling1

    1/33

    Dimensional Modeling
    Kimball: 6, 7

    Make everything as simple as possible, but not simpler. (Albert Einstein)

    Agenda

    Topics covered:
    - Introduction to Dimensional Modeling
    - Designing a Dimensional Model
    - Designing a Physical Database
    - Planning for Performance
    - Introduction to Extract, Transform, and Load (ETL)

    Introduction to Dimensional Modeling

    Dimensional modeling divides the world into measurements and context.

    Measurements are captured by the organization's business processes and their supporting operational source systems. Measurements are usually numeric values; we refer to them as facts. Facts are surrounded by largely textual context that is true at the moment the fact is recorded. This context is intuitively divided into independent logical clumps called dimensions. Dimensions describe the "who, what, when, where, why, and how" context of the measurement.
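To make the split between measurements and context concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names, columns, and rows are hypothetical: the fact table holds only dimension keys and numeric measurements, while the dimension tables hold the descriptive context used to group and filter them.

```python
import sqlite3

# Hypothetical miniature star schema: dimension tables carry descriptive
# context, the fact table carries foreign keys plus numeric measurements.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim (product_key),  -- context: what
    store_key    INTEGER REFERENCES store_dim (store_key),      -- context: where
    quantity     INTEGER,                                       -- fact
    sales_amount REAL                                           -- fact
);
INSERT INTO product_dim VALUES (1, 'Milk', 'Dairy'), (2, 'Cheese', 'Dairy'), (3, 'Bread', 'Bakery');
INSERT INTO store_dim   VALUES (10, 'North'), (20, 'South');
INSERT INTO sales_fact  VALUES (1, 10, 3, 7.50), (2, 10, 1, 4.00), (3, 20, 2, 5.00), (1, 20, 1, 2.50);
""")

# Measurements are summed; a dimension attribute supplies the grouping.
rows = conn.execute("""
    SELECT p.category, SUM(f.quantity), SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)  # [('Bakery', 2, 5.0), ('Dairy', 5, 14.0)]
```

Swapping `p.category` for a store attribute (after joining `store_dim`) answers a "where" question from the same fact rows.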


    An Example of Facts and Dimensions

    Conformed Dimensions and Their Benefits

    Conformed dimensions are also referred to as master dimensions or common reference dimensions. They take two forms: (1) the two conformed dimensions are identical, or (2) one dimension is a perfect subset of the other.

    Benefits include:
    - Consistency
    - Integration
    - Reduced development time to market
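The two forms of conformance can be checked mechanically. The sketch below assumes each dimension is represented as a list of Python dicts, one per row (a hypothetical in-memory format, not any warehouse product's API); `conforms` returns True when the second dimension is identical to the first or a perfect subset of it, in both attributes and row values.

```python
def conforms(base, other):
    """Return True if `other` is identical to `base` or a perfect subset of it.

    Both arguments are lists of dicts (one dict per dimension row). Every
    attribute of `other` must exist in `base`, and every row of `other` must
    equal some row of `base` projected onto those attributes.
    """
    if not other:
        return True
    other_cols = set(other[0])
    if any(set(row) != other_cols for row in other):
        return False  # rows of `other` do not share one set of attributes
    if not base or not other_cols <= set(base[0]):
        return False  # `other` has an attribute that `base` lacks
    projected = {tuple(sorted((col, row[col]) for col in other_cols)) for row in base}
    return all(tuple(sorted(row.items())) in projected for row in other)

# Hypothetical atomic product dimension and two candidate rollups.
product = [
    {"product_key": 1, "product": "Milk 1L", "brand": "Acme", "category": "Dairy"},
    {"product_key": 2, "product": "Milk 2L", "brand": "Acme", "category": "Dairy"},
    {"product_key": 3, "product": "Rye Loaf", "brand": "Crust", "category": "Bakery"},
]
brand = [{"brand": "Acme", "category": "Dairy"}, {"brand": "Crust", "category": "Bakery"}]
bad_brand = [{"brand": "Acme", "category": "Frozen"}]

print(conforms(product, product))    # True: identical
print(conforms(product, brand))      # True: perfect subset
print(conforms(product, bad_brand))  # False: Acme is Dairy, not Frozen, in product
```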

    Shrunken conformed dimension tables are created to describe fact tables that either naturally capture measurements at a higher (coarser) level of granularity, or hold facts that have been aggregated to a less granular, rolled-up level for performance reasons (e.g., the product dimension is rolled up to a brand dimension).
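A minimal sketch of the product-to-brand rollup, assuming a hypothetical atomic product dimension in which every product belongs to exactly one brand: the shrunken brand dimension gets one row (and one surrogate key) per distinct brand, and the atomic fact rows are re-keyed and summed to the brand grain.

```python
from collections import defaultdict

# Hypothetical atomic product dimension: product_key -> attributes.
product_dim = {
    1: {"product": "Milk 1L", "brand": "Acme"},
    2: {"product": "Milk 2L", "brand": "Acme"},
    3: {"product": "Rye Loaf", "brand": "Crust"},
}
# Hypothetical atomic fact rows: (product_key, units_sold).
sales_facts = [(1, 3), (2, 1), (3, 2), (1, 4)]

# Shrunken brand dimension: one row per distinct brand, with its own key.
brands = sorted({row["brand"] for row in product_dim.values()})
brand_dim = {name: key for key, name in enumerate(brands, start=1)}

# Aggregate fact table at the brand grain: re-key each row, then sum.
brand_facts = defaultdict(int)
for product_key, units in sales_facts:
    brand_key = brand_dim[product_dim[product_key]["brand"]]
    brand_facts[brand_key] += units

print(brand_dim)          # {'Acme': 1, 'Crust': 2}
print(dict(brand_facts))  # {1: 8, 2: 2}
```

Because units sold is additive, the brand-level totals still add up to the atomic total, which is what makes the aggregate safe to query in place of the detail.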


    Modeling Process


    Figure 6.4. Diagram of data warehouse bus with conformed dimension interfaces.


    High level model bubble diagram

    Four-Step Dimensional Design Process

    Step 1: Choose the business process from the bus matrix.
    Step 2: Declare the grain, as atomic as possible.
    Step 3: Identify the dimensions.
    Step 4: Identify the facts.

    Step 1: Choose the business process from the matrix

    Figure 6.5. Bus matrix for manufacturing supply chain
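As a rough illustration (the processes and dimensions below are hypothetical and do not reproduce Figure 6.5), a bus matrix can be held as a mapping from each business process, a row, to the dimensions it uses, the columns. Any dimension that appears in more than one row will be shared by several fact tables and therefore has to be conformed.

```python
# Hypothetical bus matrix: each business process (row) maps to the
# dimensions (columns) it uses. Illustrative only, not Figure 6.5.
bus_matrix = {
    "Purchase orders": {"Date", "Product", "Vendor", "Warehouse"},
    "Warehouse inventory": {"Date", "Product", "Warehouse"},
    "Customer shipments": {"Date", "Product", "Warehouse", "Customer"},
}

def shared_dimensions(matrix):
    """Map each dimension used by two or more processes to those processes.

    These are the columns that must be conformed across fact tables.
    """
    usage = {}
    for process, dims in matrix.items():
        for dim in dims:
            usage.setdefault(dim, []).append(process)
    return {dim: sorted(procs) for dim, procs in sorted(usage.items()) if len(procs) > 1}

for dim, processes in shared_dimensions(bus_matrix).items():
    print(dim, "->", ", ".join(processes))
```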

    The first dimensional model built should be the one with the most impact: it should answer the most pressing business questions, and its data should be readily accessible for extraction.

    Avoid Common Matrix Mishaps

    Matrix row mishaps:
    - Departmental or overly encompassing rows: distinguish between departments (groups of processes) and business processes.
    - Report-centric or too narrowly defined rows: kitchen sink syndrome.

    Matrix column mishaps:
    - Overly generalized columns: check for overlaps.
    - Separate columns for each level of a hierarchy.

    Step 2: Declare the grain, as atomic as possible

    Example grain declarations include:
    - An individual line item on a customer's retail sales ticket, as measured by a scanner device
    - A line item on a bill received from a doctor
    - An individual boarding pass to get on a flight
    - A daily snapshot of the inventory levels for each product in a warehouse
    - A monthly snapshot for each bank account

    Preferably, you should develop dimensional models for the most atomic information captured by a business process. Atomic data is the most detailed information collected; such data cannot be subdivided further. Don't bypass this step!
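Once declared, the grain can be tested against the data. This sketch assumes a hypothetical grain of one fact row per line item on a sales ticket, identified by the assumed columns `ticket_number` and `line_number`; any repeated combination of those keys means the rows do not match the declared grain.

```python
from collections import Counter

# Hypothetical grain: one fact row per line item on a sales ticket,
# identified by the (assumed) columns ticket_number and line_number.
GRAIN = ("ticket_number", "line_number")

fact_rows = [
    {"ticket_number": 1001, "line_number": 1, "product": "Milk", "quantity": 2},
    {"ticket_number": 1001, "line_number": 2, "product": "Bread", "quantity": 1},
    {"ticket_number": 1002, "line_number": 1, "product": "Milk", "quantity": 1},
]

def grain_violations(rows, grain):
    """Return the grain-key values that occur more than once.

    An empty list means every row matches the declared grain.
    """
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

print(grain_violations(fact_rows, GRAIN))                   # []
print(grain_violations(fact_rows + [fact_rows[0]], GRAIN))  # [(1001, 1)]
```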

    Step 3: Identify the dimensions

    All dimensions in the bus matrix should be tested against the grain to see if they fit. Scrutinize the dimensions to make sure they make sense. Consider the impact on usability and performance of splitting a large dimension into several dimensions, or of combining dimensions.

    A careful grain statement determines the primary dimensionality of the fact table. It is then often possible to add more dimensions to the basic grain of the fact table, where these additional dimensions naturally take on only one value under each combination of the primary dimensions. If an additional dimension violates the grain by causing additional fact rows to be generated, then the grain statement must be revised to accommodate this dimension.
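The test for an additional dimension (it must take only one value under each combination of the primary dimensions) can be sketched as a functional-dependency check. The grain (daily product sales by store), the column names, and the `cashier` attribute are hypothetical.

```python
from collections import defaultdict

def fits_grain(rows, primary, candidate):
    """True if `candidate` takes exactly one value for each combination of the
    primary dimensions, so adding it would not create extra fact rows."""
    values_per_key = defaultdict(set)
    for row in rows:
        values_per_key[tuple(row[d] for d in primary)].add(row[candidate])
    return all(len(values) == 1 for values in values_per_key.values())

# Hypothetical source rows for a daily product-by-store grain.
PRIMARY = ("date", "product", "store")
rows = [
    {"date": "2018-06-01", "product": "Milk",  "store": 1, "promotion": "None",   "cashier": "Ann"},
    {"date": "2018-06-01", "product": "Milk",  "store": 1, "promotion": "None",   "cashier": "Bob"},
    {"date": "2018-06-01", "product": "Bread", "store": 1, "promotion": "Coupon", "cashier": "Ann"},
]

print(fits_grain(rows, PRIMARY, "promotion"))  # True: one promotion per date/product/store
print(fits_grain(rows, PRIMARY, "cashier"))    # False: Milk was rung up by two cashiers
```

Here `cashier` would violate the grain, so it either stays out of this fact table or the grain is revised to something finer, such as the individual line item.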

    Step 4: Identify the facts

    Wherever possible, make sure the facts are additive along all dimensions. The most useful facts are both numeric and additive, because BI applications seldom retrieve a single fact table row; queries typically select hundreds or thousands of fact rows and summarize them.

    A retail case

    Imagine that we work in the headquarters of a large grocery chain. Our business has 100 grocery stores spread over a five-state area. Each of the stores has a full complement of departments, including grocery, frozen foods, dairy, meat, produce, bakery, floral, and health/beauty aids. Each store has roughly 60,000 individual products on its shelves. The individual products are called stock keeping units (SKUs). About 55,000 of the SKUs come from outside manufacturers and have bar codes imprinted on the product package. These bar codes are called universal product codes (UPCs). UPCs are at the same grain as individual SKUs. Each different package variation of a product has a separate UPC and hence is a separate SKU.

    Step 1: Choose the business process

    In our retail case study, management wants to better understand customer purchases as captured by the POS system. Thus the business process we're going to model is POS retail sales. This data will allow us to analyze which products are selling in which stores on which days under which promotional conditions.

    Step 2: Declare the Grain

    In our case study, the most granular data is an individual line item on a POS transaction. To ensure maximum dimensionality and flexibility, we will proceed with this grain.

    A data warehouse almost always demands data expressed at the lowest possible grain of each dimension, not because queries want to see individual low-level rows, but because queries need to cut through the details in very precise ways.

    Step 3: Choose Dimensions

    In our case study we've decided on the following descriptive dimensions: date, product, store, and promotion. In addition, we'll include the POS transaction ticket number as a special dimension, known as a degenerate dimension because it has no descriptive attributes of its own.
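A sketch of this design as a star schema, written for SQLite through Python's sqlite3 module rather than for any particular warehouse product. The dimension and fact names follow the case study; the individual attribute columns and the composite primary key are assumptions for illustration.

```python
import sqlite3

# Retail sales star schema sketch. Attribute columns and the primary key
# are hypothetical; the source names only the dimensions and the facts.
SCHEMA = """
CREATE TABLE date_dim      (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, sku TEXT, brand TEXT, department TEXT);
CREATE TABLE store_dim     (store_key INTEGER PRIMARY KEY, store_name TEXT, state TEXT);
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
CREATE TABLE retail_sales_fact (
    date_key            INTEGER NOT NULL REFERENCES date_dim (date_key),
    product_key         INTEGER NOT NULL REFERENCES product_dim (product_key),
    store_key           INTEGER NOT NULL REFERENCES store_dim (store_key),
    promotion_key       INTEGER NOT NULL REFERENCES promotion_dim (promotion_key),
    pos_ticket_number   INTEGER NOT NULL,  -- degenerate dimension: no table of its own
    sales_quantity      INTEGER,
    unit_price          REAL,              -- nonadditive
    sales_dollar_amount REAL,
    cost_dollar_amount  REAL,
    -- assumes a product appears on at most one line per ticket
    PRIMARY KEY (pos_ticket_number, product_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```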

    Step 4: Identify the Facts

    The facts must be true to the grain: in this case, the individual line item on the POS transaction. The facts collected by the POS system include the sales quantity, the per-unit sales price, and the sales dollar amount. The sales dollar amount equals the sales quantity multiplied by the unit price. The cost dollar amount is also included.

    Three of the facts (sales quantity, sales dollar amount, and cost dollar amount) are beautifully additive across all the dimensions.

    We can compute the gross profit by subtracting the cost dollar amount from the sales dollar amount, or revenue. Although computed, this gross profit is also perfectly additive across all the dimensions.

    The gross margin can be calculated by dividing the gross profit by the dollar revenue. Gross margin is a nonadditive fact because it can't be summarized along any dimension; the margin for a group of rows must be computed by dividing the summed gross profit by the summed revenue.

    Unit price is also a nonadditive fact. Attempting to sum up unit price across any of the dimensions results in a meaningless, nonsensical number.
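The difference between additive and nonadditive facts shows up as soon as rows are aggregated. This sketch uses hypothetical line items with amounts in integer cents (to avoid floating-point rounding); it sums the additive facts and compares the correct gross margin (a ratio of sums) with a naive average of per-row margins.

```python
# Hypothetical POS line items: (quantity, unit_price_cents, unit_cost_cents).
line_items = [(2, 300, 200), (1, 1000, 600), (4, 50, 40)]

# Additive facts, stored per line item at the declared grain.
sales_cents = [q * price for q, price, _ in line_items]  # quantity x unit price
cost_cents = [q * cost for q, _, cost in line_items]
profit_cents = [s - c for s, c in zip(sales_cents, cost_cents)]

# Additive facts can simply be summed across any set of rows.
total_sales = sum(sales_cents)
total_profit = sum(profit_cents)

# Gross margin is nonadditive: divide the sums, never sum or average the ratios.
margin = total_profit / total_sales
naive_margin = sum(p / s for p, s in zip(profit_cents, sales_cents)) / len(line_items)

# Summing unit prices produces a number with no business meaning.
meaningless = sum(price for _, price, _ in line_items)

print(total_sales, total_profit)                 # 1800 640
print(round(margin, 4), round(naive_margin, 4))  # 0.3556 0.3111
```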


    Physical Design for Project 2

    DROP TABLE DEPARTMENT CASCADE CONSTRAINTS;

    CREATE TABLE DEPARTMENT (
        DEPT_ID   Number       NOT NULL,
        DEPT_NAME Varchar2(40) NOT NULL
    );

    ALTER TABLE DEPARTMENT ADD CONSTRAINT DEPT_UID PRIMARY KEY (DEPT_ID);

    DROP TABLE EMPLOYEE CASCADE CONSTRAINTS;

    CREATE TABLE EMPLOYEE (
        EMP_ID         Number       NOT NULL,
        DEPT_ID        Number       NOT NULL,
        EMP_SSN        Char(9)      NOT NULL,
        EMP_FIRST_NAME Varchar2(20) NOT NULL,
        EMP_LAST_NAME  Varchar2(30) NOT NULL,
        EMP_BIRTH_DATE Date         NOT NULL,
        EMP_GENDER     Char(1)      NOT NULL,
        EMP_HIRE_DATE  Date         NOT NULL,
        EMP_STREET     Varchar2(80),
        EMP_CITY       Varchar2(80),
        EMP_STATE      Char(2),
        EMP_ZIP        Char(5),
        -- Only 'E' or 'N' allowed (presumably exempt / non-exempt, matching the subtype tables)
        EMP_TYPE       Varchar2(1)
            CONSTRAINT ValidValuesEMP_TYPE CHECK (EMP_TYPE IN ('E', 'N'))
    );

    ALTER TABLE EMPLOYEE ADD CONSTRAINT EMP_UID1 PRIMARY KEY (EMP_ID, DEPT_ID);
    ALTER TABLE EMPLOYEE ADD CONSTRAINT EMP_UID2 UNIQUE (EMP_SSN);
    ALTER TABLE EMPLOYEE ADD CONSTRAINT EMP_UID3 UNIQUE (EMP_FIRST_NAME, EMP_LAST_NAME, EMP_BIRTH_DATE, EMP_GENDER);
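To see the EMPLOYEE constraints in action, the sketch below runs a trimmed copy of that DDL in SQLite through Python's sqlite3 module. SQLite is swapped in only because it is easy to run locally; it accepts Oracle type names such as `Varchar2` as column types, but it is not Oracle, and its error messages differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed copy of the EMPLOYEE table, keeping only the columns the
# CHECK, PRIMARY KEY, and UNIQUE (EMP_SSN) constraints need.
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EMP_ID   Number      NOT NULL,
    DEPT_ID  Number      NOT NULL,
    EMP_SSN  Char(9)     NOT NULL,
    EMP_TYPE Varchar2(1) CONSTRAINT ValidValuesEMP_TYPE CHECK (EMP_TYPE IN ('E', 'N')),
    CONSTRAINT EMP_UID1 PRIMARY KEY (EMP_ID, DEPT_ID),
    CONSTRAINT EMP_UID2 UNIQUE (EMP_SSN)
);
""")

def try_insert(row):
    """Insert one employee row; return None on success or the error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

ok = try_insert((1, 10, "123456789", "E"))
bad_type = try_insert((2, 10, "987654321", "X"))  # violates the CHECK constraint
dup_ssn = try_insert((3, 10, "123456789", "N"))   # violates UNIQUE (EMP_SSN)
print(ok)  # None
print(bad_type)
print(dup_ssn)
```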

    Database Systems, 8th Edition

    Triggers: Maintain PK Unique Across Subtypes

    A trigger is procedural SQL code automatically invoked by the RDBMS on a data manipulation event.

    Trigger definition:
    - Triggering timing: BEFORE or AFTER
    - Triggering event: INSERT, UPDATE, DELETE
    - Triggering level: statement-level trigger or row-level trigger
    - Triggering action

    To remove a trigger: DROP TRIGGER trigger_name;


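    The PL/SQL Code for Two Subtypes

    CREATE OR REPLACE TRIGGER non_exempt_employee_check
    BEFORE INSERT OR UPDATE OF emp_id ON non_exempt_employee
    FOR EACH ROW
    DECLARE
        dummy INTEGER := 0;
    BEGIN
        -- :old.emp_id is NULL on INSERT, so test INSERTING on its own;
        -- on UPDATE, only check when emp_id actually changes.
        IF INSERTING OR (UPDATING AND :new.emp_id <> :old.emp_id) THEN
            -- Reject the row if this emp_id already exists in the other subtype.
            SELECT COUNT(*) INTO dummy
              FROM exempt_employee
             WHERE emp_id = :new.emp_id;
            IF dummy > 0 THEN
                RAISE DUP_VAL_ON_INDEX;
            END IF;
        END IF;
    END;
    /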

    The PL/SQL Code for Three Subtypes

    CREATE OR REPLACE TRIGGER exempt_employee_check
    BEFORE INSERT OR UPDATE OF emp_id ON exempt_employee
    FOR EACH ROW
    DECLARE
        dummy INTEGER := 0;
    BEGIN
        -- :old.emp_id is NULL on INSERT, so test INSERTING on its own.
        IF INSERTING OR (UPDATING AND :new.emp_id <> :old.emp_id) THEN
            -- Check the first other subtype.
            SELECT COUNT(*) INTO dummy
              FROM non_exempt_employee
             WHERE emp_id = :new.emp_id;
            IF dummy > 0 THEN
                RAISE DUP_VAL_ON_INDEX;
            END IF;
            -- Check the third subtype (its table name is missing in the source slide).
            SELECT COUNT(*) INTO dummy
              FROM <third_subtype_table>
             WHERE emp_id = :new.emp_id;
            IF dummy > 0 THEN
                RAISE DUP_VAL_ON_INDEX;
            END IF;
        END IF;
    END;
    /
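For a locally runnable comparison, here is a SQLite analogue of the two-subtype check, driven from Python's sqlite3 module. It is not Oracle PL/SQL: SQLite triggers have no DECLARE block, so the sketch uses a `WHEN EXISTS (...)` condition and `RAISE(ABORT, ...)` instead of `RAISE DUP_VAL_ON_INDEX`, and it covers INSERT only; triggers for UPDATE OF emp_id would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exempt_employee     (emp_id INTEGER PRIMARY KEY);
CREATE TABLE non_exempt_employee (emp_id INTEGER PRIMARY KEY);

-- Row-level BEFORE INSERT trigger: abort if the id is already an exempt employee.
CREATE TRIGGER non_exempt_employee_check
BEFORE INSERT ON non_exempt_employee
FOR EACH ROW
WHEN EXISTS (SELECT 1 FROM exempt_employee WHERE emp_id = NEW.emp_id)
BEGIN
    SELECT RAISE(ABORT, 'emp_id already used by exempt_employee');
END;

-- Mirror-image trigger on the other subtype table.
CREATE TRIGGER exempt_employee_check
BEFORE INSERT ON exempt_employee
FOR EACH ROW
WHEN EXISTS (SELECT 1 FROM non_exempt_employee WHERE emp_id = NEW.emp_id)
BEGIN
    SELECT RAISE(ABORT, 'emp_id already used by non_exempt_employee');
END;
""")

def insert(table, emp_id):
    """Try to insert emp_id into a subtype table; return True on success.

    `table` is interpolated directly, so only pass trusted table names.
    """
    try:
        with conn:
            conn.execute(f"INSERT INTO {table} (emp_id) VALUES (?)", (emp_id,))
        return True
    except sqlite3.DatabaseError:
        return False

print(insert("exempt_employee", 1))      # True
print(insert("non_exempt_employee", 2))  # True
print(insert("non_exempt_employee", 1))  # False: 1 is already an exempt employee
```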


    Project 3: Data Warehouse Design and Development

    Choose a real-world case and design a mini data warehouse:
    - Business requirements and bus matrix
    - Design the dimensional model; include at least 4 main dimensions
    - Implement the design by loading it into real data warehouse software
    - Prepare the reports and queries based on the business requirements
    - Present to the class (15-20 minutes)

    For reference: The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd Edition, by Ralph Kimball and Margy Ross, John Wiley & Sons, 2002. This book contains many examples of dimensional modeling from different industries.

    Oracle Business Intelligence: http://www.oracle.com/technology/tech/bi/index.html

    Microsoft SQL Server 2008 and Visual Studio, downloadable from MSDN: http://msdn07.e-academy.com/elms/Storefront/Storefront.aspx?campus=temple_mis

    Team Discussion

    Get into your team and discuss the topic of the project. Need help? Let me know.