

Oracle Warehouse Builder 10g Release 2
Implementing Advanced Data Warehouse Concepts at Elizabeth Arden Inc.

October 2006


Note:

This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in this document remains at the sole discretion of Oracle. This document in any form, software or printed matter, contains proprietary information that is the exclusive property of Oracle. This document and information contained herein may not be disclosed, copied, reproduced, or distributed to anyone outside Oracle without prior written consent of Oracle. This document is not part of your license agreement nor can it be incorporated into any contractual agreement with Oracle or its subsidiaries or affiliates.


Oracle Warehouse Builder 10g Release 2
Implementing Advanced Data Warehouse Concepts at Elizabeth Arden

    INTRODUCTION

In many organizations, information requirements grow more advanced and call for an adaptive architecture that can deliver on them in a timely manner. That need forces organizations to put more emphasis on planning ahead and on implementing more advanced concepts in production environments.

The specific solutions we discuss in this paper are technically elegant and effective, but they need to achieve real business goals to warrant their development time. For Elizabeth Arden Inc. the goals were concrete and tangible:

• Ensure business users own good as well as bad data, allowing for better overall information quality and clear information ownership in the business.

• Empower business users to create their own complex reporting solutions, reducing the involvement of IT developers and reducing the cost to the business.

• Allow for quick and easy conversion of outside or external data, reducing the time and effort required to integrate these new data sets.

Ultimately, if all of these goals are achieved, Elizabeth Arden gains better information quality, faster turnaround cycles to add or move new data, and better decisions made directly by users. The company also offloads some reporting-related effort from the IT staff, removing bottlenecks from the reporting process for end users.

In this paper we discuss three advanced concepts that are useful in any data warehouse situation:

• In-place rejection and realignment of data elements, to allow for business ownership of all data.

• Embedded special time constructs, to give users pre-built time comparisons.

• Dimension-based data conversion, to integrate outside data sets without destructive changes.


IN-PLACE REJECTION AND AUTOMATED REALIGNMENT OF DATA ELEMENTS

The in-place rejection concept is based on the fact that business users ultimately own the data they view in their reports. This simple fact means that business users are also ultimately responsible for bad data. To allow business users to see for themselves where bad data resides, it should be allowed into the warehouse. In-place rejection and realignment is a way to achieve this in a dimensional model.

    Concepts

In a dimensional model, the fact table is linked to the dimensions based on key values. To load a fact record into the fact table, each of its dimension keys must find a match; otherwise the constraints are enforced and the record is not loaded. To load facts, therefore, the dimensions must already hold the exact data elements that the new fact records reference.

If a fact record contains bad data (for at least one of the dimensions it references), the normal process in a relational database is to reject the record. Rejecting the record, however, does not expose it to the business user. In some cases it ends up in a pile of incorrect data that someone should go through at some point in time. Experience shows that that task is hardly ever completed in a timely manner.

Hence the concept of in-place rejection. By rejecting records in place within the fact table, business users see the misaligned data in every report and are consciously aware that bad data is there and how much of it is lying around. They also see the business value (in dollars, units, or whatever the measurement type is) of the misaligned data.

In-place rejection

To achieve the goal of placing the bad data in the actual reporting system, in-place rejection uses generated dimension members to capture it. For each dimension, at the linked level (typically the lowest level of the dimension), a member is introduced that then serves as a garbage collector.

All data that does not match any of the existing members is assigned to this garbage collector value. The garbage collector value can either be a static value like 'rejected', or a dynamic value like 'REJ_originalvalue'. Alternatively, you can log the entire input row and the new key value in a table for later processing of reloads. In all cases the lowest level rolls up to a distinct member at the top of the hierarchy, and that total receives the in-place rejected records, so the total of a star always rolls up correctly.
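As a minimal sketch of how such a load could look in plain SQL (the table and column names here are illustrative, and a static surrogate key of -1 stands in for the generated reject member):

    -- Load every incoming fact row; rows whose product code has no
    -- matching dimension member are parked on the reject member (-1)
    -- instead of being rejected by the foreign key constraint.
    INSERT INTO f_sales (product_key, sales_units)
    SELECT NVL(d.product_key, -1),   -- -1 = generated garbage collector
           s.sales_units
    FROM   stg_sales s
           LEFT OUTER JOIN d_product d
           ON d.source_product_cd = s.product_cd;

Because the reject member is a real dimension record, it rolls up through the hierarchy like any other member, which is what keeps the star's totals correct.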

  • 8/8/2019 Data Warehouse Elizabeth Arden Wp

    6/18

    Implementing advanced data warehouse concepts at Elizabeth Arden Inc. Page 6

    Realignment

In some cases in-place rejection is caused not by bad data, but by bad timing. In that case the realignment happens without any intervention, directly upon the next load of the facts and dimensions. When bad data truly is the cause of misalignment, realignment can only happen once the data is fixed and reloaded. Regardless of the fix, the system must be built in such a way that the realignment is detected and the data that was labeled incorrect is realigned in the database. It is crucial that the system does not double count due to a previous error.

    A look at Warehouse Builder

These concepts are automated in the ETL processes in Warehouse Builder 10g Release 2. Here we describe some of the implementation details.

The first pass through the data loads the dimensions in a regular mapping. For this paper that mapping is not interesting, as it simply loads all dimension records currently present in the source.

The fact mapping is the first mapping that will encounter any issues with the load if a dimension record is not (yet) loaded. That mapping uses a plain Warehouse Builder key lookup, but returns a default key value if the dimension record is not found.

    Figure 1 Using no-match rows in key-lookup to always load a fact


This way, all facts are loaded, with a default dimension ID, which in Figure 1 is set to 4. Because of the lookup functionality that comes with Warehouse Builder, this is a straightforward step in the ETL process.

The next step is less trivial, because the keys need to be realigned once a dimension record is found. To match and recode, a source table is used that contains the original references; from there, a lookup is done against the final dimension to establish whether there is a new record that requires recoding.

    Figure 2 Lookup a dimension value for recoding

If there is no recoding (that is, the record is still not found in the dimension), the default key is loaded into the recoding table again.
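A minimal SQL sketch of such a recoding pass, assuming a recoding table that logged each rejected row's original reference (all table and column names here are illustrative):

    -- Repoint parked fact rows once their dimension member has arrived.
    -- Updating in place, rather than reloading, is what prevents the
    -- system from double counting a previously rejected record.
    MERGE INTO f_sales f
    USING (SELECT r.fact_id, d.product_key
           FROM   recode_log r
                  JOIN d_product d
                  ON d.source_product_cd = r.source_product_cd) k
    ON    (f.fact_id = k.fact_id)
    WHEN MATCHED THEN
      UPDATE SET f.product_key = k.product_key
      WHERE      f.product_key = -1;  -- only rows on the reject member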


    EMBEDDED SPECIAL TIME CONSTRUCTS

While time is a simple concept in our everyday life, it can actually complicate reporting on a system. In most cases the data warehouse or reporting database stores only a base time period. In other words, sales orders are loaded when they are booked, and that is the time of the transaction.

Users on the reporting side, however, often have questions like: how do this month's sales compare to last month, or to the same month last year? When the reporting system is only concerned with the base time, the user has to create the calculations to get the desired answers.

Embedded special time constructs move the calculation of these desired time constructs down into the reporting system. Essentially, the data model embeds the special time constructs for the user to choose from. By embedding them in the base data layer of the reporting system, the heavy lifting of running the calculations is done when the data is refreshed, not when the user needs an answer.

    Concepts

The most important concept is pre-calculation, where a reporting entity is pre-built for the users. A data warehouse is itself a good example of this concept: instead of having each user join the tables to get the sales for all customers over the last month out of the ERP system, the result is pre-built.
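As a small illustration of pre-calculation in plain SQL, a materialized view can pre-join and pre-aggregate monthly sales once, at refresh time. The table and column names below appear in snippets later in this paper, but their combination here is only a sketch:

    -- Pre-built monthly sales: users query one object instead of
    -- joining and aggregating the base tables themselves.
    CREATE MATERIALIZED VIEW mv_monthly_sales
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
    AS
    SELECT t.fiscal_period_cd,
           f.company_id,
           SUM(f.sales_units) AS sales_units
    FROM   c_global_sales_affiliate f
           JOIN d_ea_month t ON t.ea_month_id = f.ea_month_id
    GROUP BY t.fiscal_period_cd, f.company_id;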

Embedded special time goes one step further: it creates more reporting constructs up front, giving easy access to hard-to-get data.

Many options can be considered for pre-calculation. On-Line Analytical Processing (OLAP) is one storage technique that heavily uses pre-built information. In this section, however, we cover a strictly relational implementation, using two distinct techniques to pre-build the special time constructs:

• Analytic SQL

• A bitmap table linking the fact to the dimensions

    Using Analytic SQL

As with OLAP data, if you start to create facts that only exist in certain cells for the dimensions, sparsity is the result. The data is sparse in that for certain dimension combinations no fact is stored. The problem with sparse data is that it increases storage and typically can decrease performance.

To avoid this sparsity in the query objects, an extra join is used to condense the data set, and then analytic SQL functions enable the data to be shown to users as embedded special time constructs.


The other advantage (and part of why we do this) is that the heavy lifting is done not by users in client-side calculations, but by the database. This increases performance and alleviates the burden of creating the calculations.

    Using a Bitmap Table

A bitmap table is an intermediate table that links the fact table to the time dimension, allowing the facts to be viewed for different time constructs by pre-joining the data set. A bitmap table would look something like this (note that this is not a complete table, but it should convey the idea):

from_period  to_period  CP  LY_CP  LY_R2  LY_R3  QTD_CP  LY_QTD_CP
200701       200701     1   0      0      0      1       0
200601       200701     0   1      1      1      0       0
200512       200701     0   0      1      1      0       0
200511       200701     0   0      0      1      0       0

Table 1: Bitmap table example

The columns in the bitmap table represent the special time constructs we want to work with. The SQL joins on the rows to determine when there are values to be shown, leveraging the bitmaps in the table. Reading Table 1, for example: when reporting on period 200701, the row with from_period 200601 carries a 1 in LY_CP, so the January 2006 facts feed the "last year, current period" measure.

    A look at Warehouse Builder

These concepts are added to Warehouse Builder in the schema design, and the queries are created in views, which are again stored in the Warehouse Builder repository.


    Table 2 A densified time view in the data object editor

    Focus on Analytic SQL

As we said, there are two important concepts here. The first is the analytic SQL itself. The fragment below shows a snippet from the select clause of the statement; note the analytic SQL constructs.

    SELECT ...
         -- running (period-to-date) sum; 0 when no fact row exists
         , NVL( SUM(fact.quantity_on_hand)
                  OVER (PARTITION BY fact.company_id,
                                     fact.item_sku_id,
                                     v_time.fiscal_period_cd
                        ORDER BY v_time.fiscal_period_cd
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
              , 0 ) AS ptd_quantity_on_hand

The second part is the actual join used to densify the data. It literally squeezes the air out of the result set and gives us better performance in retrieving the results. That join is shown in the snippet below.

    SELECT ...
    FROM   Hybrid_Eaden.c_inventory_rpt fact
           -- partitioned outer join: densifies the data so every
           -- (company, SKU) combination gets a row for every month
           PARTITION BY (fact.company_id, fact.item_sku_id)
           RIGHT OUTER JOIN Hybrid_Eaden.d_ea_month v_time
           ON (fact.ea_month_id = v_time.ea_month_id)
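Putting the two fragments together, a fuller sketch of the densified query could look as follows; the window function and the join are taken from the fragments above, while the rest of the select list is an assumption:

    SELECT fact.company_id,
           fact.item_sku_id,
           v_time.fiscal_period_cd,
           NVL( SUM(fact.quantity_on_hand)
                  OVER (PARTITION BY fact.company_id,
                                     fact.item_sku_id,
                                     v_time.fiscal_period_cd
                        ORDER BY v_time.fiscal_period_cd
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
              , 0 ) AS ptd_quantity_on_hand
    FROM   Hybrid_Eaden.c_inventory_rpt fact
           PARTITION BY (fact.company_id, fact.item_sku_id)
           RIGHT OUTER JOIN Hybrid_Eaden.d_ea_month v_time
           ON (fact.ea_month_id = v_time.ea_month_id);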


    Focus on the Bitmap Table

The bitmap table is linked from the fact table to the dimension. The bitmap value (either a 1 or a 0) is then factored into the select clause of the query. If we look at our bitmap table again, and then at its usage in the select clause, we can see where data is shown (there is a 1) or not shown (there is a 0), allowing us to create a clean reporting solution.

    SELECT ...
         , (fact.sales_units * time.ly_qtd_cp)
         , (fact.sales_units * time.ly_tot_qtr_cp)
         , (fact.sales_units * time.ly_tot_hyr_cp)
         , (fact.sales_units * time.ly_tot_yr_cp)
         , (fact.sales_units * time.ly_r3)
         , (fact.sales_units * time.ly_r6)
         , (fact.sales_units * time.ly_r12)
    FROM   c_global_sales_affiliate fact
         , d_ea_month_matrix time
    WHERE  fact.ea_month_id = time.from_ea_month_id

The bitmap is simply joined to the fact table, and the measures in the fact are multiplied by the value in the bitmap.

This allows each measure either to hold data or to hold a zero value. In this case zero means that the cell is not applicable for the measure; rolling the totals up then just adds a zero and does not influence any totals. Where the bitmap holds a one, the value is shown in the measure and is counted in rollups and in reporting. Users do not have to figure this out; they simply see either a value or a zero.
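A usage sketch built on the fragment above shows why the rollups need no user-side logic; the grouping column to_ea_month_id and the flag columns cp and ly_cp are assumed names matching Table 1:

    -- Zeroed-out measures add nothing to the SUMs, so a plain
    -- aggregate returns correct current-period and last-year values.
    SELECT time.to_ea_month_id                 AS reporting_month
         , SUM(fact.sales_units * time.cp)     AS cp_units
         , SUM(fact.sales_units * time.ly_cp)  AS ly_cp_units
    FROM   c_global_sales_affiliate fact
         , d_ea_month_matrix time
    WHERE  fact.ea_month_id = time.from_ea_month_id
    GROUP BY time.to_ea_month_id;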

    Advantages and disadvantages

Both methods have their advantages and disadvantages. At Elizabeth Arden the following points came out of the woodwork.

Analytic SQL has the advantage that it outperforms the traditional approach of writing complex sub-queries. It has the disadvantage that performance can be impacted because query predicates are not always passed down to the underlying view.

The main disadvantage of the bitmap method is that the query pulls in more data than is strictly required, which of course impacts performance as well. Both methods, however, achieve their main goal of creating a simpler reporting environment for the end user, allowing the IT professionals to focus on developing better systems.


DIMENSION-BASED DATA CONVERSION

In today's world a successful project faces some interesting challenges, chief among them the addition of new data from new subsidiaries, acquisitions, or systems similar to the one the warehouse started with. Adding data to an existing system may sound simple: just load the new (or old, historical) data into the database and you are done. If it were that simple, this section would not be written.

The problem, especially when adding similar but different systems, is that in many cases data needs to be recoded or rolled up in different ways. When the new data is loaded, it needs to conform to the current hierarchies and strategies, and achieving this can be a daunting task.

    Concepts

The dimension-based conversion is built on a simple but elegant concept: rather than changing data structures, or changing the original data loaded into a dimension, the rollup in a hierarchy manages the changes.

This concept has a number of very compelling benefits. One is that a change is not permanent or destructive to the original data: the original record and linkage are still in place, they just roll up differently. The second benefit is that these changes can typically be made quickly, with much less impact on the system than a normal update.

    Load level vs. Reporting levels

The two benefits we mentioned earlier are achieved by adding a level to each dimension (or, better, to a hierarchy) that acts as its lowest level. This lowest level is called the load level; in other words, fact data is always loaded at this level in the hierarchy. The load level, however, is never exposed to users of the reporting system. These users only see the so-called reporting levels.
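A minimal sketch of this separation, using the column names from the example that follows (the table and view names are assumptions): the ETL loads against the booked level, while end users are only granted the reporting view.

    -- Dimension keyed on the load ("booked") level per source system.
    CREATE TABLE d_product (
      booked_products   VARCHAR2(30) NOT NULL,  -- load level, ETL only
      reported_products VARCHAR2(30) NOT NULL,  -- reporting level
      source_key        VARCHAR2(30) NOT NULL,  -- originating system
      CONSTRAINT d_product_pk
        PRIMARY KEY (booked_products, source_key)
    );

    -- End users see only the reporting level.
    CREATE OR REPLACE VIEW d_product_rpt AS
    SELECT DISTINCT reported_products
    FROM   d_product;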

    Realigning vs. Overwriting

In many cases conversion methods are destructive. For example, fact data might be stored at a higher grain, so the new data needs to be aggregated. Doing this in a normal load will convert the data once; if that turns out to be not completely accurate, there is no easy fix, and if the source is lost by that time, there is no fix at all. Realigning simply keeps the new records as they are, in their original grain. Overwriting is not what we want to do to our data.

    An example

If we look at a table structure and data, this concept becomes a lot clearer and the benefits become visible instantly.

Let's say we have loaded three products into the dimension table (in this example we show only a few dimension columns). The result, with the booked level (which is what the load level is called in this example) and the reporting level, looks like this:


booked_products  reported_products  source_key
A                A                  STANDARD
C                C                  STANDARD
D                D                  STANDARD

Table 3: The dimension table with booked and reported levels

Now let's say we need to merge products from a new vendor into this dimension, and we have the following mapping table showing how a product from the new vendor can be recoded to the Elizabeth Arden product codes (and therefore hierarchies):

Vendor XYZ Product  Elizabeth Arden Product
123                 A
654                 A
234                 D
A                   C

Table 4: Recoding or matching table

With the levels as shown in Table 3 we can now quickly load the dimension and see the following as a result after loading and recoding:

booked_products  reported_products  source_key
A                A                  STANDARD
C                C                  STANDARD
D                D                  STANDARD
123              A                  VENDORXYZ
654              A                  VENDORXYZ
234              D                  VENDORXYZ
A                C                  VENDORXYZ

Table 5: Adding new products is simple and retains history

We can now see where the big benefits come in. Table 5 shows both the original data and the Elizabeth Arden product presented to the user; the revenue for this dimension is now directly available in the relevant business units. The other advantage appears when a recoding is required: if, for example, product 234 should instead be linked to Elizabeth Arden product A, the dimension can be updated instead of all the related facts.
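In SQL that recoding could be as small as the following sketch (the table name d_product is an assumption; the values come from Table 5):

    -- Repoint vendor product 234 to Elizabeth Arden product A:
    -- one dimension row changes, no fact rows are touched.
    UPDATE d_product
    SET    reported_products = 'A'
    WHERE  booked_products   = '234'
    AND    source_key        = 'VENDORXYZ';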


    More to come

The concept of building a dimension with two separate levels playing two separate roles, one for loading and one for reporting, is not currently in production at Elizabeth Arden, but may be implemented in the future.


    SUMMARY AND CONCLUSION

Inventive use of an ETL tool, forward-looking data modeling, and embedding smarts in data structures allow Elizabeth Arden to achieve two goals for its business.

First, the IT staff can focus on their core capabilities of building and maintaining systems, not on creating reports for end users. Second, business users get better information more quickly, allowing them to get back to doing their job: running the business.

With Warehouse Builder, Elizabeth Arden has found a tool it can use to implement these concepts and get the maximum benefit for the company.
