

Oracle Warehouse Builder 10g Release 2
Implementing Advanced Data Warehouse Concepts at Elizabeth Arden Inc.

October 2006


Note:

This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in this document remains at the sole discretion of Oracle. This document in any form, software or printed matter, contains proprietary information that is the exclusive property of Oracle. This document and information contained herein may not be disclosed, copied, reproduced, or distributed to anyone outside Oracle without prior written consent of Oracle. This document is not part of your license agreement nor can it be incorporated into any contractual agreement with Oracle or its subsidiaries or affiliates.


Oracle Warehouse Builder 10g Release 2
Implementing Advanced Data Warehouse Concepts at Elizabeth Arden

    INTRODUCTION

In many organizations, information requirements grow more advanced and call for an adaptive architecture that can deliver on them in a timely manner. That need forces organizations to put more emphasis on planning ahead and on implementing more advanced concepts in production environments.

The specific solutions we discuss in this paper are technically elegant and effective, but they need to achieve real business goals to warrant their development time. For Elizabeth Arden Inc. the goals were concrete and tangible:

• Ensure business users own good as well as bad data, allowing for better overall information quality and clear information ownership in the business.

• Empower business users to create their own complex reporting solutions, reducing the involvement of IT developers and reducing the cost to the business.

• Allow for quick and easy conversion of outside or external data, reducing the time and effort required to integrate these new data sets.

Ultimately, if all of these goals are achieved, Elizabeth Arden gains better information quality, faster turnaround cycles to add or move new data, and better decisions made directly by users. The company also offloads some reporting-related effort from the IT staff, removing bottlenecks from the reporting process for end users.

In this paper we discuss three advanced concepts that are useful in any data warehouse situation:

• In-place rejection and realignment of data elements, to allow for business ownership of all data.

• Embedded special time constructs, to give users pre-built time comparisons.

• Dimension-based data conversion, to integrate outside data sets without destructive changes.


IN-PLACE REJECTION AND AUTOMATED REALIGNMENT OF DATA ELEMENTS

The in-place rejection concept is based on the fact that business users ultimately own the data they view in their reports. This simple fact means that business users are also ultimately responsible for bad data. To allow business users to see for themselves where bad data resides, it should be allowed into the warehouse. In-place rejection and realignment is a way to achieve this in a dimensional model.

    Concepts

In a dimensional model, the fact table is linked to the dimensions based on key values. To load a fact record into the fact table, each of its dimension keys must find a match; otherwise the constraints are enforced and the record is not loaded. To load facts, therefore, the dimensions must already hold the exact data elements that the new fact records reference.

If a fact record contains bad data (for at least one of the dimensions it references), the normal process in a relational database is to reject the record. Rejecting the record, however, does not expose it to the business user. In some cases it ends up in a pile of incorrect data that someone should go through at some point in time. Experience shows that that task is hardly ever completed in a timely manner.

Hence the concept of in-place rejection. By rejecting records in place within the fact table, business users see the misaligned data in every report and are consciously aware that bad data is there and how much of it is lying around. They also see the business value (in dollars, units, or whatever the measurement type is) of the misaligned data.

In-place rejection

To achieve the goal of placing the bad data in the actual reporting system, in-place rejection uses generated dimension members to capture it. For each dimension, at the linked level (typically the lowest level of the dimension), a member is introduced that then serves as a garbage collector.

All data that does not match any of the existing members is assigned to this garbage collector value. The garbage collector value can either be a static value like 'rejected', or a dynamic value like 'REJ_originalvalue'. Alternatively, you can log the entire input row and the new key value in a table for later processing of reloads. In all cases the lowest level rolls up to a distinct member at the top of the hierarchy, and that total receives the in-place rejected records, so the total of a star always rolls up correctly.
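As a minimal sketch of how such a load could look in plain SQL (the table and column names here are illustrative, and a static surrogate key of -1 stands in for the generated reject member):

    -- Load every incoming fact row; rows whose product code has no
    -- matching dimension member are parked on the reject member (-1)
    -- instead of being rejected by the foreign key constraint.
    INSERT INTO f_sales (product_key, sales_units)
    SELECT NVL(d.product_key, -1),   -- -1 = generated garbage collector
           s.sales_units
    FROM   stg_sales s
           LEFT OUTER JOIN d_product d
           ON d.source_product_cd = s.product_cd;

Because the reject member is a real dimension record, it rolls up through the hierarchy like any other member, which is what keeps the star's totals correct.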

  • 8/8/2019 Data Warehouse Elizabeth Arden Wp

    6/18

    Implementing advanced data warehouse concepts at Elizabeth Arden Inc. Page 6

    Realignment

In some cases in-place rejection is caused not by bad data, but by bad timing. In that case the realignment happens without any intervention, directly upon the next load of the facts and dimensions. When bad data truly is the cause of misalignment, realignment can only happen once the data is fixed and reloaded. Regardless of the fix, the system must be built in such a way that the realignment is detected and the data that was labeled incorrect is realigned in the database. It is crucial that the system does not double count due to a previous error.

    A look at Warehouse Builder

These concepts are automated in the ETL processes in Warehouse Builder 10g Release 2. Here we describe some of the implementation details.

The first pass through the data loads the dimensions in a regular mapping. For this paper that mapping is not interesting, as it simply loads all dimension records currently present in the source.

The fact mapping is the first mapping that will encounter any issues with the load if a dimension record is not (yet) loaded. That mapping uses a plain Warehouse Builder key lookup, but returns a default key value if the dimension record is not found.

    Figure 1 Using no-match rows in key-lookup to always load a fact


This way, all facts are loaded, with a default dimension ID, which in Figure 1 is set to 4. Because of the lookup functionality that comes with Warehouse Builder, this is a straightforward step in the ETL process.

The next step is less trivial, because the keys need to be realigned once a dimension record is found. To match and recode, a source table is used that contains the original references; from there, a lookup is done against the final dimension to establish whether there is a new record that requires recoding.

    Figure 2 Lookup a dimension value for recoding

If there is no recoding (that is, the record is still not found in the dimension), the default key is loaded into the recoding table again.
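A minimal SQL sketch of such a recoding pass, assuming a recoding table that logged each rejected row's original reference (all table and column names here are illustrative):

    -- Repoint parked fact rows once their dimension member has arrived.
    -- Updating in place, rather than reloading, is what prevents the
    -- system from double counting a previously rejected record.
    MERGE INTO f_sales f
    USING (SELECT r.fact_id, d.product_key
           FROM   recode_log r
                  JOIN d_product d
                  ON d.source_product_cd = r.source_product_cd) k
    ON    (f.fact_id = k.fact_id)
    WHEN MATCHED THEN
      UPDATE SET f.product_key = k.product_key
      WHERE      f.product_key = -1;  -- only rows on the reject member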


    EMBEDDED SPECIAL TIME CONSTRUCTS

While time is a simple concept in our everyday life, it can actually complicate reporting on a system. In most cases the data warehouse or reporting database stores only a base time period. In other words, sales orders are loaded when they are booked, and that is the time of the transaction.

Users on the reporting side, however, often have questions like: how do this month's sales compare to last month, or to the same month last year? When the reporting system is only concerned with the base time, the user has to create the calculations to get the desired answers.

Embedded special time constructs move the calculation of these desired time constructs down into the reporting system. Essentially, the data model embeds the special time constructs for the user to choose from. By embedding them in the base data layer of the reporting system, the heavy lifting of running the calculations is done when the data is refreshed, not when the user needs an answer.

    Concepts

The most important concept is pre-calculation, where a reporting entity is pre-built for the users. A data warehouse is itself a good example of this concept: instead of having each user join the tables to get the sales for all customers over the last month out of the ERP system, the result is pre-built.
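As a small illustration of pre-calculation in plain SQL, a materialized view can pre-join and pre-aggregate monthly sales once, at refresh time. The table and column names below appear in snippets later in this paper, but their combination here is only a sketch:

    -- Pre-built monthly sales: users query one object instead of
    -- joining and aggregating the base tables themselves.
    CREATE MATERIALIZED VIEW mv_monthly_sales
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
    AS
    SELECT t.fiscal_period_cd,
           f.company_id,
           SUM(f.sales_units) AS sales_units
    FROM   c_global_sales_affiliate f
           JOIN d_ea_month t ON t.ea_month_id = f.ea_month_id
    GROUP BY t.fiscal_period_cd, f.company_id;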

Embedded special time goes one step further: it creates more reporting constructs up front, giving easy access to hard-to-get data.

Many options can be considered for pre-calculation. On-Line Analytical Processing (OLAP) is one storage technique that heavily uses pre-built information. In this section, however, we cover a strictly relational implementation, using two distinct techniques to pre-build the special time constructs:

• Analytic SQL

• A bitmap table linking the fact to the dimensions

    Using Analytic SQL

As with OLAP data, if you start to create facts that only exist in certain cells for the dimensions, sparsity is the result. The data is sparse in that for certain dimension combinations no fact is stored. The problem with sparse data is that it increases storage and typically can decrease performance.

To avoid this sparsity in the query objects, an extra join is used to condense the data set, and then analytic SQL functions enable the data to be shown to users as embedded special time constructs.


The other advantage (and part of why we do this) is that the heavy lifting is done not by users in client-side calculations, but by the database. This increases performance and alleviates the burden of creating the calculations.

    Using a Bitmap Table

A bitmap table is an intermediate table that links the fact table to the time dimension, allowing the facts to be viewed for different time constructs by pre-joining the data set. A bitmap table would look something like this (note that this is not a complete table, but it should convey the idea):

from_period  to_period  CP  LY_CP  LY_R2  LY_R3  QTD_CP  LY_QTD_CP
200701       200701     1   0      0      0      1       0
200601       200701     0   1      1      1      0       0
200512       200701     0   0      1      1      0       0
200511       200701     0   0      0      1      0       0

Table 1: Bitmap table example

The columns in the bitmap table represent the special time constructs we want to work with. The SQL joins on the rows to determine when there are values to be shown, leveraging the bitmaps in the table. Reading Table 1, for example: when reporting on period 200701, the row with from_period 200601 carries a 1 in LY_CP, so the January 2006 facts feed the "last year, current period" measure.

    A look at Warehouse Builder

These concepts are added to Warehouse Builder in the schema design, and the queries are created in views, which are again stored in the Warehouse Builder repository.


    Table 2 A densified time view in the data object editor

    Focus on Analytic SQL

As we said, there are two important concepts here. The first is the analytic SQL itself. The fragment below shows a snippet from the select clause of the statement; note the analytic SQL constructs.

    SELECT ...
         -- running (period-to-date) sum; 0 when no fact row exists
         , NVL( SUM(fact.quantity_on_hand)
                  OVER (PARTITION BY fact.company_id,
                                     fact.item_sku_id,
                                     v_time.fiscal_period_cd
                        ORDER BY v_time.fiscal_period_cd
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
              , 0 ) AS ptd_quantity_on_hand

The second part is the actual join used to densify the data. It literally squeezes the air out of the result set and gives us better performance in retrieving the results. That join is shown in the snippet below.

    SELECT ...
    FROM   Hybrid_Eaden.c_inventory_rpt fact
           -- partitioned outer join: densifies the data so every
           -- (company, SKU) combination gets a row for every month
           PARTITION BY (fact.company_id, fact.item_sku_id)
           RIGHT OUTER JOIN Hybrid_Eaden.d_ea_month v_time
           ON (fact.ea_month_id = v_time.ea_month_id)
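Putting the two fragments together, a fuller sketch of the densified query could look as follows; the window function and the join are taken from the fragments above, while the rest of the select list is an assumption:

    SELECT fact.company_id,
           fact.item_sku_id,
           v_time.fiscal_period_cd,
           NVL( SUM(fact.quantity_on_hand)
                  OVER (PARTITION BY fact.company_id,
                                     fact.item_sku_id,
                                     v_time.fiscal_period_cd
                        ORDER BY v_time.fiscal_period_cd
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
              , 0 ) AS ptd_quantity_on_hand
    FROM   Hybrid_Eaden.c_inventory_rpt fact
           PARTITION BY (fact.company_id, fact.item_sku_id)
           RIGHT OUTER JOIN Hybrid_Eaden.d_ea_month v_time
           ON (fact.ea_month_id = v_time.ea_month_id);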


    Focus on the Bitmap Table

The bitmap table is linked from the fact table to the dimension. The bitmap value (either a 1 or a 0) is then factored into the select clause of the query. If we look at our bitmap table again, and then at its usage in the select clause, we can see where data is shown (there is a 1) or not shown (there is a 0), allowing us to create a clean reporting solution.

    SELECT ...
         , (fact.sales_units * time.ly_qtd_cp)
         , (fact.sales_units * time.ly_tot_qtr_cp)
         , (fact.sales_units * time.ly_tot_hyr_cp)
         , (fact.sales_units * time.ly_tot_yr_cp)
         , (fact.sales_units * time.ly_r3)
         , (fact.sales_units * time.ly_r6)
         , (fact.sales_units * time.ly_r12)
    FROM   c_global_sales_affiliate fact
         , d_ea_month_matrix time
    WHERE  fact.ea_month_id = time.from_ea_month_id

The bitmap is simply joined to the fact table, and the measures in the fact are multiplied by the value in the bitmap.

This allows each measure either to hold data or to hold a zero value. In this case zero means that the cell is not applicable for the measure; rolling the totals up then just adds a zero and does not influence any totals. Where the bitmap holds a one, the value is shown in the measure and is counted in rollups and in reporting. Users do not have to figure this out; they simply see either a value or a zero.
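A usage sketch built on the fragment above shows why the rollups need no user-side logic; the grouping column to_ea_month_id and the flag columns cp and ly_cp are assumed names matching Table 1:

    -- Zeroed-out measures add nothing to the SUMs, so a plain
    -- aggregate returns correct current-period and last-year values.
    SELECT time.to_ea_month_id                 AS reporting_month
         , SUM(fact.sales_units * time.cp)     AS cp_units
         , SUM(fact.sales_units * time.ly_cp)  AS ly_cp_units
    FROM   c_global_sales_affiliate fact
         , d_ea_month_matrix time
    WHERE  fact.ea_month_id = time.from_ea_month_id
    GROUP BY time.to_ea_month_id;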

    Advantages and disadvantages

Both methods have their advantages and disadvantages. At Elizabeth Arden the following points came out of the woodwork.

Analytic SQL has the advantage that it outperforms the traditional approach of writing complex sub-queries. It has the disadvantage that performance can be impacted because query predicates are not always passed down to the underlying view.

The main disadvantage of the bitmap method is that the query pulls in more data than is strictly required, which of course impacts performance as well. Both methods, however, achieve their main goal of creating a simpler reporting environment for the end user, allowing the IT professionals to focus on developing better systems.


DIMENSION-BASED DATA CONVERSION

In today's world a successful project faces some interesting challenges, chief among them the addition of new data from new subsidiaries, acquisitions, or systems similar to the one the warehouse started with. Adding data to an existing system may sound simple: just load the new (or old, historical) data into the database and you are done. If it were that simple, this section would not be written.

The problem, especially when adding similar but different systems, is that in many cases data needs to be recoded or rolled up in different ways. When the new data is loaded, it needs to conform to the current hierarchies and strategies, and achieving this can be a daunting task.

    Concepts

The dimension-based conversion is built on a simple but elegant concept: rather than changing data structures, or changing the original data loaded into a dimension, the rollup in a hierarchy manages the changes.

This concept has a number of very compelling benefits. One is that a change is not permanent or destructive to the original data: the original record and linkage are still in place, they just roll up differently. The second benefit is that these changes can typically be made quickly, with much less impact on the system than a normal update.

    Load level vs. Reporting levels

The two benefits we mentioned earlier are achieved by adding a level to each dimension (or, better, to a hierarchy) that acts as its lowest level. This lowest level is called the load level; in other words, fact data is always loaded at this level in the hierarchy. The load level, however, is never exposed to users of the reporting system. These users only see the so-called reporting levels.
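A minimal sketch of this separation, using the column names from the example that follows (the table and view names are assumptions): the ETL loads against the booked level, while end users are only granted the reporting view.

    -- Dimension keyed on the load ("booked") level per source system.
    CREATE TABLE d_product (
      booked_products   VARCHAR2(30) NOT NULL,  -- load level, ETL only
      reported_products VARCHAR2(30) NOT NULL,  -- reporting level
      source_key        VARCHAR2(30) NOT NULL,  -- originating system
      CONSTRAINT d_product_pk
        PRIMARY KEY (booked_products, source_key)
    );

    -- End users see only the reporting level.
    CREATE OR REPLACE VIEW d_product_rpt AS
    SELECT DISTINCT reported_products
    FROM   d_product;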

    Realigning vs. Overwriting

In many cases conversion methods are destructive. For example, fact data might be stored at a higher grain, so the new data needs to be aggregated. Doing this in a normal load will convert the data once; if that turns out to be not completely accurate, there is no easy fix, and if the source is lost by that time, there is no fix at all. Realigning simply keeps the new records as they are, in their original grain. Overwriting is not what we want to do to our data.

    An example

If we look at a table structure and data, this concept becomes a lot clearer and the benefits become visible instantly.

Let's say we have loaded three products into the dimension table (in this example we show only a few dimension columns). The result, with the booked level (which is what the load level is called in this example) and the reporting level, looks like this:


booked_products  reported_products  source_key
A                A                  STANDARD
C                C                  STANDARD
D                D                  STANDARD

Table 3: The dimension table with booked and reported levels

Now let's say we need to merge products from a new vendor into this dimension, and we have the following mapping table showing how a product from the new vendor can be recoded to the Elizabeth Arden product codes (and therefore hierarchies):

Vendor XYZ Product  Elizabeth Arden Product
123                 A
654                 A
234                 D
A                   C

Table 4: Recoding or matching table

With the levels as shown in Table 3 we can now quickly load the dimension and see the following as a result after loading and recoding:

booked_products  reported_products  source_key
A                A                  STANDARD
C                C                  STANDARD
D                D                  STANDARD
123              A                  VENDORXYZ
654              A                  VENDORXYZ
234              D                  VENDORXYZ
A                C                  VENDORXYZ

Table 5: Adding new products is simple and retains history

We can now see where the big benefits come in. Table 5 shows both the original data and the Elizabeth Arden product presented to the user; the revenue for this dimension is now directly available in the relevant business units. The other advantage appears when a recoding is required: if, for example, product 234 should instead be linked to Elizabeth Arden product A, the dimension can be updated instead of all the related facts.
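In SQL that recoding could be as small as the following sketch (the table name d_product is an assumption; the values come from Table 5):

    -- Repoint vendor product 234 to Elizabeth Arden product A:
    -- one dimension row changes, no fact rows are touched.
    UPDATE d_product
    SET    reported_products = 'A'
    WHERE  booked_products   = '234'
    AND    source_key        = 'VENDORXYZ';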


    More to come

The concept of building a dimension with two separate levels playing two separate roles, one for loading and one for reporting, is not currently in production at Elizabeth Arden, but may be implemented in the future.


    SUMMARY AND CONCLUSION

Inventive use of an ETL tool, forward-looking data modeling, and embedding smarts in data structures allow Elizabeth Arden to achieve two goals for its business.

First, the IT staff can focus on their core capabilities of building and maintaining systems, not on creating reports for end users. Second, business users get better information more quickly, allowing them to get back to doing their job: running the business.

With Warehouse Builder, Elizabeth Arden has found a tool it can use to implement these concepts and get the maximum benefit for the company.
