decision support and data warehouse. decision supports systems components data management function...

23
Decision Support and Data Warehouse

Post on 19-Dec-2015

222 views

Category:

Documents


1 download

TRANSCRIPT

Decision Support and Data Warehouse

Decision supports Systems Components

• Data management function– Data warehouse

• Model management function– Analytical models:

• Statistical model, management science model

• User interface– Data visualization

On-Line Analytical Processing (OLAP)• The use of a set of graphical tools that

provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques

• OLAP Operations– Cube slicing–come up with 2-D view of data– Drill-down–going from summary to more

detailed views– Roll-up – the opposite direction of drill-down– Reaggregation – rearrange the order of

dimensions

Slicing a data cube

Example of drill-down

Summary report

Drill-down with color added

Starting with summary data, users can obtain details for particular cells

Excel’s Pivot Table

• Data/Pivot Table– Drilldown, rollup, reaggregation

Data Warehouse• A subject-oriented, integrated, time-variant,

non-updatable collection of data used in support of management decision-making processes– Subject-oriented: e.g. customers, employees,

locations, products, time periods, etc.• Dimensions for data analysis

– Integrated: Consistent naming conventions, formats, encoding structures; from multiple data sources

– Time-variant: Can study trends and changes– Nonupdatable: Read-only, periodically refreshed

Data Warehouse Design- Star Schema -

• Fact table– contain detailed business data

• Ex. Line items of orders to compute total sales by product, by salesperson.

• Dimension tables– Dimension is a term used to describe any category or subjects of

the business used in analyzing data, such as customers, employees, locations, products, time periods, etc.

– contain descriptions about the subjects of the business such as customers, employees, locations, products, time periods, etc.

– Example: A sold item related to many business subjects such as salesperson, customer, product and time period.

Example:Order Processing System

Customer Order

Product

Has

Has

1 M

M

M

CID Cname City OID ODate

PIDPname

Price

RatingSalesPerson

Qty

Star Schema

FactTableLocationCodePeriodCode

RatingPIDQty

Amount

LocationDimension

LocationCodeStateCity

CustomerRatingDimension

RatingDescription

ProductDimension

PIDPname

Category

PeriodDimensionPeriodCode

YearQuarter

Can group by State, City

Snowflake Schema

FactTableLocationCodePeriodCode

RatingPIDQty

Amount

LocationDimension

LocationCodeStateCity

CustomerRatingDimension

RatingDescription

ProductDimension

PIDPname

CategoryID

ProductCategory

CategoryIDDescription

PeriodDimensionPeriodCode

YearQuarter

Can group by State, City

The ETL Process

E

T

LOne, company-wide warehouse

Periodic extraction data is not completely current in warehouse

The ETL Process

• Capture/Extract• Transform

– Scrub(data cleansing),derive– Example:

• City -> LocationCode, State, City• OrderDate -> PeriodCode, Year, Quarter

• Load and Index

ETL = Extract, transform, and load

From SalesDB to MyDataWarehouse

• Extract data from SalesDB:– Create query to get the data– Download to MyDataWareHouse

• File/Import/Save as Table

• Transform:– Transform City to Location– Transform Odate to Period

• Query FactPC

• Load data to FactTable

Star schema example

Fact table provides statistics for sales broken down by product, period and store dimensions

Dimension tables contain descriptions about the subjects of the business

Star schema with sample data

SQL GROUPING SETS

• GROUPING SETS– SELECT CITY,RATING,COUNT(CID) FROM CUSTOMERS

– GROUP BY GROUPING SETS(CITY,RATING,(CITY,RATING),())

– ORDER BY CITY;

• Note: Compute the subtotals for every member in the GROUPING SETS. () indicates that an overall total is desired.

ResultsCITY Rating COUNT(CID)-------------------- - ---------- ------------------CHICAGO A 1CHICAGO B 2CHICAGO 3LOS ANGELES A 1LOS ANGELES C 1LOS ANGELES 2SAN FRANCISCO A 2SAN FRANCISCO B 1SAN FRANCISCO 3 A 4 8

CITY R COUNT(CID)-------------------- - ---------- B 3 C 1

SQL CUBE

• Perform aggregations for all possible combinations of columns indicated.– SELECT CITY,RATING,COUNT(CID) FROM CUSTOMERS

– GROUP BY CUBE(CITY,RATING)

– ORDER BY CITY, RATING;

ResultsCITY Rating COUNT(CID)-------------------- - ------- ----------CHICAGO A 1CHICAGO B 2CHICAGO 3LOS ANGELES A 1LOS ANGELES C 1LOS ANGELES 2SAN FRANCISCO A 2SAN FRANCISCO B 1SAN FRANCISCO 3 A 4 B 3

CITY R COUNT(CID)-------------------- - ---------- C 1 8

SQL ROLLUP

• The ROLLUP extension causes cumulative subtotals to be calculated for the columns indicated. If multiple columns are indicated, subtotals are performed for each of the columns except the far-right column.– SELECT CITY,RATING,COUNT(CID) FROM CUSTOMERS– GROUP BY ROLLUP(CITY,RATING)– ORDER BY CITY, RATING;

Results

CITY Rating COUNT(CID)-------------------- - ----------CHICAGO A 1CHICAGO B 2CHICAGO 3LOS ANGELES A 1LOS ANGELES C 1LOS ANGELES 2SAN FRANCISCO A 2SAN FRANCISCO B 1SAN FRANCISCO 3 8

Need for Data Warehousing• Integrated, company-wide view of high-quality

information (from disparate databases)• Separation of operational and informational systems

and data (for improved performance)