Transcript
Page 1: 1 Topics about Data Warehouses What is a data warehouse? How does a data warehouse differ from a transaction processing database? What are the characteristics

Topics about Data Warehouses

What is a data warehouse?

How does a data warehouse differ from a transaction processing database?

What are the characteristics of a data warehouse?

What are the components of a data warehousing system?

How is a data warehouse created?

How is a data warehouse accessed?

Page 2:

TPS vs. DSS

Issue | TPS/MIS | DSS
Definition | Systems to support day-to-day operations. | Systems to support ad-hoc decision making.
Users | Clerks, data entry, low-level supervisors. | Managers, analysts, support staff, researchers.
Design goal | Performance. | Flexibility, ease of use, ease of access.
Transaction type | Updates. | Queries.
Query activity | Low; few joins. | High; many joins.

Page 3:

Transaction vs. DSS databases

Issue | Transaction database | DSS database
Content | Internal data, process-oriented. | Internal and external data, subject-oriented.
Data currency | Real time. Current. Volatile. | Batch. Historical. Non-volatile.
Summary level | Details of transactions; no (or very little) derived data. | Summarized; many aggregation levels.
Volume | Megabytes to gigabytes. | Gigabytes to terabytes.
Design | Normalized to prevent anomalies. | Denormalized to enhance query performance.

Page 4:

So, can one database support both transaction processing and decision support applications? Yes / No

Page 5:

What is a data warehouse?

A data warehouse is a database designed to support a decision support system.

A data warehouse is:

Integrated: It is a centralized, consolidated database integrating data from an entire organization.

Subject-oriented: Data warehouse data are organized around key subjects. The data are usually arranged by topic, such as customers, products, suppliers, etc.

Time-variant: Data in the warehouse contain a time dimension so that they may be used as a historical aggregation.

Non-volatile: Once data enter, they seldom leave. Data are appended rather than overwritten. Data are updated in batches.
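
The time-variant and non-volatile properties can be illustrated with a minimal sketch. The class name, field names, and sample values below are assumptions made up for illustration; the slides do not prescribe any implementation. Each batch is appended with its load date, so earlier values stay available as history instead of being overwritten.

```python
from datetime import date


class AppendOnlyWarehouse:
    """Toy in-memory store: rows are appended in batches and never overwritten."""

    def __init__(self):
        # Every row carries its own load date (the time dimension).
        self._rows = []

    def load_batch(self, as_of, records):
        """Append one batch; each record is stamped with the batch date."""
        for customer, balance in records:
            self._rows.append(
                {"as_of": as_of, "customer": customer, "balance": balance}
            )

    def history(self, customer):
        """Return every stored value for a customer, oldest first."""
        rows = [r for r in self._rows if r["customer"] == customer]
        return sorted(rows, key=lambda r: r["as_of"])


# Hypothetical month-end batches.
warehouse = AppendOnlyWarehouse()
warehouse.load_batch(date(2024, 1, 31), [("acme", 100.0), ("zenith", 50.0)])
warehouse.load_batch(date(2024, 2, 29), [("acme", 120.0)])

acme_history = [
    (r["as_of"].isoformat(), r["balance"]) for r in warehouse.history("acme")
]
print(acme_history)
```

A transaction database would typically overwrite the January balance in place; here both values remain queryable.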

Page 6:

Data warehouse design example

SalesFact

PK Day
PK Month
PK Year
PK,FK4 ProductID
PK,FK1 CustomerTypeID
PK,FK2 EmployeeID
PK,FK3 LocationID

SalesDollars
#ofSales

Product

PK ProductID

Description

Employee

PK EmployeeID

Name

CustomerType

PK CustomerTypeID

Description

Location

PK LocationID

Description
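
The design above can be sketched as a small star schema using Python's built-in sqlite3 module. Table and column names follow the slide, except that #ofSales is renamed NumberOfSales to keep it a plain SQL identifier. The column types and all sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Four dimension tables plus one fact table keyed by time and the dimensions.
conn.executescript("""
CREATE TABLE Product      (ProductID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Employee     (EmployeeID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CustomerType (CustomerTypeID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Location     (LocationID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE SalesFact (
    Day INTEGER, Month INTEGER, Year INTEGER,
    ProductID INTEGER REFERENCES Product(ProductID),
    CustomerTypeID INTEGER REFERENCES CustomerType(CustomerTypeID),
    EmployeeID INTEGER REFERENCES Employee(EmployeeID),
    LocationID INTEGER REFERENCES Location(LocationID),
    SalesDollars REAL,
    NumberOfSales INTEGER,  -- #ofSales on the slide
    PRIMARY KEY (Day, Month, Year, ProductID, CustomerTypeID, EmployeeID, LocationID)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.execute("INSERT INTO Employee VALUES (1, 'Pat')")
conn.execute("INSERT INTO CustomerType VALUES (1, 'Retail')")
conn.execute("INSERT INTO Location VALUES (1, 'Downtown')")
conn.executemany(
    "INSERT INTO SalesFact VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 3, 2024, 1, 1, 1, 1, 200.0, 4),
        (2, 3, 2024, 1, 1, 1, 1, 100.0, 2),
        (1, 3, 2024, 2, 1, 1, 1, 50.0, 1),
    ],
)

# A typical decision-support query: join the fact table to one dimension
# and aggregate the measures.
sales_by_product = conn.execute("""
    SELECT p.Description, SUM(f.SalesDollars), SUM(f.NumberOfSales)
    FROM SalesFact AS f
    JOIN Product AS p ON p.ProductID = f.ProductID
    GROUP BY p.Description
    ORDER BY p.Description
""").fetchall()
print(sales_by_product)
```

The fact table holds only keys and numeric measures; descriptive text lives in the dimension tables and is joined in at query time.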

Page 7:

Issues in designing a data warehouse

Must have a predefined subject focus.

Has the potential to be very large – must define the “grain” or granularity level of storage.

Will always have a dimension of time.

Will contain derived data.

Will be a summary of data, rather than each detailed transaction.

Does not always adhere to standard normalization rules.
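
The effect of choosing a grain can be sketched as follows. The transactions, field names, and the summarize helper are hypothetical. The same detailed transactions are summarized once per day and product, and once per month and product; the coarser grain stores fewer rows but can no longer answer day-level questions.

```python
from collections import defaultdict

# Hypothetical detailed transactions from an operational system.
transactions = [
    {"date": "2024-03-01", "product": "widget", "amount": 20.0},
    {"date": "2024-03-01", "product": "widget", "amount": 30.0},
    {"date": "2024-03-02", "product": "widget", "amount": 10.0},
    {"date": "2024-03-02", "product": "gadget", "amount": 5.0},
]


def summarize(rows, grain):
    """Aggregate rows to the given grain (a tuple of field names)."""
    totals = defaultdict(lambda: {"sales_dollars": 0.0, "num_sales": 0})
    for row in rows:
        key = tuple(row[field] for field in grain)
        totals[key]["sales_dollars"] += row["amount"]
        totals[key]["num_sales"] += 1
    return dict(totals)


# Derive a month field (derived data) so a coarser grain is available.
rows = [{**t, "month": t["date"][:7]} for t in transactions]

daily = summarize(rows, ("date", "product"))
monthly = summarize(rows, ("month", "product"))

print(len(daily), "rows at daily grain")
print(len(monthly), "rows at monthly grain")
print(monthly[("2024-03", "widget")])
```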

Page 8:

Creating a Data Warehouse

[Diagram: three source databases (Customer Transaction Database, Product Transaction Database, Order Transaction Database), each with its own Data Scrubbing and Data Extraction step, feed a single Data Integration step that produces the Sales Data Warehouse.]

Page 9:

Issues in creating a data warehouse

How to get accurate and complete data?

How to consolidate data?

Differing data meanings.

Differing storage mechanisms.

Differing data formats.
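
The consolidation issues above can be sketched with two hypothetical source systems that record the same kind of sale differently. All system names, field names, and formats are assumptions for illustration: one source uses US-style dates and amounts in cents, the other ISO dates and dollar amounts as strings. Each source gets its own mapping into one shared record layout, followed by a simple scrubbing step.

```python
from datetime import datetime


def from_orders_system(row):
    """Hypothetical source A: MM/DD/YYYY dates, integer amounts in cents."""
    return {
        "customer_id": row["cust"].strip().upper(),
        "sale_date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
        "amount_dollars": row["amount_cents"] / 100,
    }


def from_web_system(row):
    """Hypothetical source B: ISO dates, dollar amounts stored as strings."""
    return {
        "customer_id": row["customer_id"].strip().upper(),
        "sale_date": row["sold_on"],
        "amount_dollars": float(row["total"]),
    }


def consolidate(order_rows, web_rows):
    """Map both sources to one layout, then drop incomplete records."""
    unified = [from_orders_system(r) for r in order_rows]
    unified += [from_web_system(r) for r in web_rows]
    # Scrubbing step: discard records with no customer identifier.
    return [r for r in unified if r["customer_id"]]


order_rows = [
    {"cust": " c-001 ", "date": "03/15/2024", "amount_cents": 1250},
    {"cust": "  ", "date": "03/16/2024", "amount_cents": 500},  # incomplete
]
web_rows = [{"customer_id": "C-002", "sold_on": "2024-03-16", "total": "8.75"}]

consolidated = consolidate(order_rows, web_rows)
for record in consolidated:
    print(record)
```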

Page 10:

Components of a data warehousing system

Data store.

Extraction/filtering/transformation processes.

End user query tools.

End user visualization tools.

Page 11:

Two-tier data warehouse architecture

[Diagram labels: Operational database (two shown), External data source, Transformation process, Data warehouse server, Data warehouse, EDM, Summarized data, User departments.]

Page 12:

Three-tier data warehouse architecture

[Diagram labels: Operational database (two shown), External data source, Transformation process, Data warehouse server, Data warehouse, EDM, Summarized data, Extraction process, Data mart tier, Data mart (two shown), User departments.]

Page 13:

Accessing a data warehouse

Visualization tools.

Graphical.

Spreadsheet format - usually Excel or Lotus look-and-feel.

Dashboard. Example: http://tomcat.corda.com/superstore/sr.jsp

Query tools.

OLAP: Online analytical processing.

Data mining: Artificial intelligence based query methods.

Page 14:

Online analytical processing

Provides multi-dimensional data analysis techniques.

Works primarily with data aggregation.

Provides advanced statistical analysis.

Provides advanced graphical output.

Supports access to very large databases.

Provides enhanced query optimization algorithms.

Lots of acronyms: OLAP, ROLAP, MOLAP, HOLAP.

Can be add-ons to existing products (Excel is one example) or can have their own user interfaces.
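
Multi-dimensional aggregation can be sketched in a few lines. This is a hand-rolled illustration with made-up facts and a hypothetical roll_up helper, not how any OLAP product is implemented. Each fact has three dimensions (year, store, product) and one measure; rolling up means keeping some dimensions and summing the measure over the rest.

```python
from collections import defaultdict

DIMENSIONS = ("year", "store", "product")

# Hypothetical facts: (year, store, product, sales_dollars).
facts = [
    ("2023", "London", "shorts", 100.0),
    ("2023", "London", "shirts", 150.0),
    ("2023", "Paris", "shorts", 80.0),
    ("2024", "London", "shorts", 120.0),
]


def roll_up(rows, keep):
    """Sum the measure over every dimension not listed in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


by_year_store = roll_up(facts, ("year", "store"))
by_store = roll_up(facts, ("store",))
grand_total = roll_up(facts, ())

# "Which store made the highest sales in 2023?"
sales_2023 = {key[1]: total for key, total in by_year_store.items() if key[0] == "2023"}
best_store_2023 = max(sales_2023, key=sales_2023.get)

print(by_year_store)
print(by_store)
print(grand_total)
print(best_store_2023)
```

Adding a dimension back to keep corresponds to drilling down; removing one corresponds to rolling up.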

Page 15:

OLAP vs. Data Mining questions

OLAP | Data Mining
Which customers spent the most with us in the past year? | Which types of customers are likely to spend the most with us in the coming year?
How much did the bank lose from loan defaulters within the past two years? | What are the characteristics of the customers most likely to default on their loans before the year is over?
What were the highest selling fashion items in our London stores? | What additional products are most likely to be sold to customers who buy shorts?
Which store/location made the highest sales in the past year? | In which area should we open a new store next year?

Page 16:

Data mining

Data mining tools:

analyze the data;

uncover patterns hidden in the data;

form computer models based on the findings; and

use the models to predict business behavior.

Proactive tools.

Based on artificial intelligence software such as decision trees, neural networks, fuzzy logic systems, inductive nets and classification networking.
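
The analyze, model, and predict steps above can be sketched with a one-level decision tree (a decision stump). This is a deliberately tiny stand-in for the decision-tree tools the slide mentions; the loan data, the debt-to-income feature, and the function names are all assumptions made up for illustration.

```python
def fit_stump(values, labels):
    """Find the threshold that best separates the labels.

    The resulting model predicts True whenever a value exceeds the threshold.
    """
    best_threshold, best_errors = None, len(labels) + 1
    candidates = sorted(set(values))
    # Try the midpoint between each pair of neighbouring values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        errors = sum((v > threshold) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold, value):
    """Apply the fitted model to a new case."""
    return value > threshold


# Hypothetical history: debt-to-income ratio (percent) and whether the
# customer defaulted on a loan.
debt_to_income = [10, 20, 30, 60, 70, 90]
defaulted = [False, False, False, True, True, True]

threshold = fit_stump(debt_to_income, defaulted)
print(threshold)
print(predict(threshold, 50))  # new applicant above the threshold
print(predict(threshold, 25))  # new applicant below the threshold
```

The pattern (a threshold on one attribute) is uncovered from historical data and then reused to predict cases that have not happened yet, which is what distinguishes this from an OLAP query over past totals.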

Page 17:

What are some applications of data warehousing?

Customer relationship management.

Business process management.

Order management.

Strategic decision analysis.

