1 Topics about Data Warehouses What is a data warehouse? How does a data warehouse differ from a transaction processing database? What are the characteristics.


Post on 04-Jan-2016


TRANSCRIPT

  • Topics about Data Warehouses

    - What is a data warehouse?
    - How does a data warehouse differ from a transaction processing database?
    - What are the characteristics of a data warehouse?
    - What are the components of a data warehousing system?
    - How is a data warehouse created?
    - How is a data warehouse accessed?

  • TPS vs. DSS

    | Issue            | TPS/MIS                                    | DSS                                             |
    |------------------|--------------------------------------------|-------------------------------------------------|
    | Definition       | Systems to support day-to-day operations.  | Systems to support ad-hoc decision making.      |
    | Users            | Clerks, data entry, low-level supervisors. | Managers, analysts, support staff, researchers. |
    | Design goal      | Performance.                               | Flexibility, ease of use, ease of access.       |
    | Transaction type | Updates.                                   | Queries.                                        |
    | Query activity   | Low; few joins.                            | High; many joins.                               |
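
The contrast between update-oriented and query-oriented work can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `customer`, `product`, and `sale` tables, their columns, and the sample rows are assumptions made up for the example, not part of the slides.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, category TEXT, stock INTEGER);
    CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                           product_id INTEGER, quantity INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'West'), (2, 'East');
    INSERT INTO product  VALUES (10, 'Shoes', 50), (11, 'Hats', 20);
    INSERT INTO sale VALUES (100, 1, 10, 2, 80.0), (101, 2, 10, 1, 40.0),
                            (102, 2, 11, 3, 45.0);
""")

def record_sale(conn, sale_id, customer_id, product_id, quantity, amount):
    """TPS-style work: a short transaction that touches a few rows by key."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sale VALUES (?, ?, ?, ?, ?)",
                     (sale_id, customer_id, product_id, quantity, amount))
        conn.execute("UPDATE product SET stock = stock - ? WHERE product_id = ?",
                     (quantity, product_id))

def revenue_by_region_and_category(conn):
    """DSS-style work: a read-only query that joins and aggregates many rows."""
    return conn.execute("""
        SELECT c.region, p.category, SUM(s.amount)
        FROM sale s
        JOIN customer c ON c.customer_id = s.customer_id
        JOIN product  p ON p.product_id  = s.product_id
        GROUP BY c.region, p.category
        ORDER BY c.region, p.category
    """).fetchall()

record_sale(conn, 103, 1, 11, 1, 15.0)
report = revenue_by_region_and_category(conn)
```

The update path reads and writes single rows by key, while the report scans and joins every sale, which is why the two workloads pull the design in different directions.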

  • Transaction vs. DSS databases

    | Issue         | Transaction database                                       | DSS database                                  |
    |---------------|------------------------------------------------------------|-----------------------------------------------|
    | Content       | Internal data, process-oriented.                           | Internal and external data. Subject-oriented. |
    | Data currency | Real time. Current. Volatile.                              | Batch. Historical. Non-volatile.              |
    | Summary level | Details of transactions; no (or very little) derived data. | Summarized; many aggregation levels.          |
    | Volume        | Megabytes to gigabytes.                                    | Gigabytes to terabytes.                       |
    | Design        | Normalized to prevent anomalies.                           | Denormalized to enhance query performance.    |
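
As a rough sketch of the "Design" row, the example below builds a denormalized, pre-summarized table from normalized transaction tables using sqlite3. The table names (`store`, `item`, `order_line`, `sales_summary`) and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized transaction tables: each fact is stored once.
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE item  (item_id  INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE order_line (store_id INTEGER, item_id INTEGER,
                             order_date TEXT, amount REAL);
    INSERT INTO store VALUES (1, 'London'), (2, 'Leeds');
    INSERT INTO item  VALUES (7, 'Shorts'), (8, 'Coats');
    INSERT INTO order_line VALUES
        (1, 7, '2015-06-01', 20.0), (1, 7, '2015-06-01', 25.0),
        (1, 8, '2015-06-02', 90.0), (2, 7, '2015-06-02', 22.0);

    -- Denormalized warehouse table: city and category are repeated on
    -- every row and the amount is pre-summarized per day.
    CREATE TABLE sales_summary AS
        SELECT s.city, i.category, o.order_date,
               SUM(o.amount) AS total_amount, COUNT(*) AS line_count
        FROM order_line o
        JOIN store s ON s.store_id = o.store_id
        JOIN item  i ON i.item_id  = o.item_id
        GROUP BY s.city, i.category, o.order_date;
""")

# The decision-support query now needs no joins at all.
london_shorts = conn.execute(
    "SELECT SUM(total_amount) FROM sales_summary "
    "WHERE city = 'London' AND category = 'Shorts'"
).fetchone()[0]
```

The trade-off is redundancy: the summary table repeats descriptive values and would suffer update anomalies if it were edited row by row, which is acceptable only because it is rebuilt or appended to in batches.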

  • So, can one database support both transaction processing and decision support applications?

    - Yes
    - No

  • What is a data warehouse?

    A data warehouse is a database designed to support a decision support system.

    A data warehouse is:

    - Integrated: It is a centralized, consolidated database integrating data from an entire organization.
    - Subject-oriented: Data warehouse data are organized around key subjects. The data are usually arranged by topic, such as customers, products, suppliers, etc.
    - Time-variant: Data in the warehouse contain a time dimension so that they may be used as a historical aggregation.
    - Non-volatile: Once data enter, they seldom leave. Data are appended rather than overwritten. Data are updated in batches.
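
The time-variant and non-volatile properties can be illustrated with a small append-only store. This is a minimal sketch under assumptions: the `WarehouseTable` class, its methods, and the customer records are invented for the example.

```python
from datetime import date

class WarehouseTable:
    """Append-only store: rows are stamped with a load date and never overwritten."""

    def __init__(self):
        self._rows = []

    def load_batch(self, load_date, records):
        """Append one batch; earlier snapshots stay untouched (non-volatile)."""
        for record in records:
            self._rows.append({**record, "load_date": load_date})

    def as_of(self, key, when):
        """Return the latest version of `key` loaded on or before `when` (time-variant)."""
        versions = [r for r in self._rows
                    if r["customer_id"] == key and r["load_date"] <= when]
        return max(versions, key=lambda r: r["load_date"]) if versions else None

    def __len__(self):
        return len(self._rows)

# Hypothetical monthly batches for one customer who moved.
warehouse = WarehouseTable()
warehouse.load_batch(date(2015, 1, 31), [{"customer_id": 1, "city": "Reno"}])
warehouse.load_batch(date(2015, 2, 28), [{"customer_id": 1, "city": "Las Vegas"}])

january_view = warehouse.as_of(1, date(2015, 2, 1))
current_view = warehouse.as_of(1, date(2015, 12, 31))
```

A transaction database would simply overwrite the city; here both versions remain, so a historical question can still be answered as of any load date.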

  • Data warehouse design example

    Table

  • Issues in designing a data warehouse

    - Must have a predefined subject focus.
    - Has the potential to be very large, so the designer must define the grain (granularity level) of storage.
    - Will always have a dimension of time.
    - Will contain derived data.
    - Will be a summary of data, rather than each detailed transaction.
    - Does not always adhere to standard normalization rules.
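
To make the grain decision concrete, the sketch below rolls the same detailed transactions up to two different grains. The transaction tuples and the `summarize` helper are hypothetical, chosen only to show how the grain changes row counts and what derived data look like.

```python
from collections import defaultdict

# Hypothetical detailed transactions: (timestamp, product, amount).
transactions = [
    ("2015-03-01T09:15", "shorts", 20.0),
    ("2015-03-01T14:40", "shorts", 25.0),
    ("2015-03-01T16:05", "coats", 90.0),
    ("2015-03-02T10:30", "shorts", 22.0),
]

def summarize(transactions, grain):
    """Roll transactions up to the chosen grain: 'day' or 'month'."""
    prefix_length = {"day": 10, "month": 7}[grain]  # 'YYYY-MM-DD' or 'YYYY-MM'
    totals = defaultdict(lambda: {"total": 0.0, "count": 0})
    for timestamp, product, amount in transactions:
        period = timestamp[:prefix_length]
        cell = totals[(period, product)]
        cell["total"] += amount  # derived data: not stored in the source system
        cell["count"] += 1
    return dict(totals)

daily = summarize(transactions, "day")
monthly = summarize(transactions, "month")
```

A coarser grain stores fewer rows but can no longer answer finer questions: the monthly summary cannot say which day the shorts were sold.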

  • Creating a Data Warehouse

    [Diagram: the Customer, Product, and Order transaction databases each pass through their own data extraction and data scrubbing steps; the results feed a single data integration step that loads the Sales Data Warehouse.]

  • Issues in creating a data warehouse

    - How to get accurate and complete data?
    - How to consolidate data?
      - Differing data meanings.
      - Differing storage mechanisms.
      - Differing data formats.
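
The extraction, scrubbing, and integration steps, together with the consolidation issues listed above, might look like the following in miniature. Everything here is an assumption for illustration: the two source layouts, the field names, the US-style date format, and the rule that rows with a missing total are rejected.

```python
from datetime import datetime

# Hypothetical extracts from two source systems with differing formats.
order_system_rows = [
    {"cust": " ACME Corp ", "date": "03/01/2015", "total": "1,200.50"},
    {"cust": "Bolt Ltd", "date": "03/02/2015", "total": ""},  # incomplete
]
web_shop_rows = [
    {"customer_name": "acme corp", "ordered_on": "2015-03-05", "amount_cents": 9900},
]

def scrub_order_system(row):
    """Clean one order-system row; return None if it cannot be repaired."""
    if not row["total"]:
        return None
    return {
        "customer": row["cust"].strip().lower(),
        # Assumes month/day/year dates in this source.
        "order_date": datetime.strptime(row["date"], "%m/%d/%Y").date(),
        "amount": float(row["total"].replace(",", "")),
    }

def scrub_web_shop(row):
    """Clean one web-shop row, converting cents to currency units."""
    return {
        "customer": row["customer_name"].strip().lower(),
        "order_date": datetime.strptime(row["ordered_on"], "%Y-%m-%d").date(),
        "amount": row["amount_cents"] / 100,
    }

def integrate(*scrubbed_sources):
    """Merge scrubbed rows from every source into one list, dropping rejects."""
    merged = [row for source in scrubbed_sources for row in source if row is not None]
    return sorted(merged, key=lambda row: row["order_date"])

warehouse_rows = integrate(
    [scrub_order_system(r) for r in order_system_rows],
    [scrub_web_shop(r) for r in web_shop_rows],
)
```

Each source gets its own scrubbing function because the differences (name spelling, date layout, currency units) are source-specific; only after scrubbing do the rows share one meaning and one format.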

  • Components of a data warehousing system

    - Data store.
    - Extraction/filtering/transformation processes.
    - End user query tools.
    - End user visualization tools.

  • Two-tier data warehouse architecture

  • Three-tier data warehouse architecture

  • Accessing a data warehouse

    - Visualization tools.
      - Graphical.
      - Spreadsheet format - usually Excel or Lotus look-and-feel.
      - Dashboard. Example: http://tomcat.corda.com/superstore/sr.jsp
    - Query tools.
      - OLAP: Online analytical processing.
      - Data mining: Artificial intelligence based query methods.

  • Online analytical processing

    - Provides multi-dimensional data analysis techniques.
    - Works primarily with data aggregation.
    - Provides advanced statistical analysis.
    - Provides advanced graphical output.
    - Supports access to very large databases.
    - Provides enhanced query optimization algorithms.
    - Lots of acronyms: OLAP, ROLAP, MOLAP, HOLAP.
    - Can be add-ons to existing products (Excel is one example) or can have their own user interfaces.
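
Multi-dimensional aggregation can be sketched in a few lines of plain Python. The fact rows, the three dimensions, and the `roll_up` and `slice_cube` helpers are hypothetical; real OLAP tools add indexing, caching, and query optimization that this sketch ignores.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales amount).
facts = [
    (2014, "North", "shorts", 100.0),
    (2014, "South", "shorts", 150.0),
    (2015, "North", "shorts", 120.0),
    (2015, "North", "coats", 300.0),
    (2015, "South", "coats", 250.0),
]
DIMENSIONS = ("year", "region", "product")

def roll_up(facts, keep):
    """Aggregate sales over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)

def slice_cube(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [row for row in facts if row[position] == value]

by_year = roll_up(facts, ["year"])
by_year_region = roll_up(facts, ["year", "region"])
north_by_product = roll_up(slice_cube(facts, "region", "North"), ["product"])
```

Moving from `by_year` to `by_year_region` is a drill-down, and the reverse is a roll-up; fixing the region before aggregating is a slice.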

  • OLAP vs. Data Mining questions

    | OLAP                                                                       | Data Mining                                                                                                  |
    |----------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|
    | Which customers spent the most with us in the past year?                   | Which types of customers are likely to spend the most with us in the coming year?                            |
    | How much did the bank lose from loan defaulters within the past two years? | What are the characteristics of the customers most likely to default on their loans before the year is over? |
    | What were the highest selling fashion items in our London stores?          | What additional products are most likely to be sold to customers who buy shorts?                             |
    | Which store/location made the highest sales in the past year?              | In which area should we open a new store next year?                                                          |
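
The difference between the two kinds of question can be sketched on one small data set. The purchase records and both helper functions are assumptions for illustration; the mining side uses a simple co-occurrence confidence measure, which is only one of many possible techniques.

```python
from collections import Counter

# Hypothetical purchase history: (customer, items bought in one visit, amount).
purchases = [
    ("ann", {"shorts", "sandals"}, 60.0),
    ("bob", {"shorts", "sunglasses"}, 45.0),
    ("ann", {"coat"}, 120.0),
    ("cy", {"shorts", "sandals", "sunglasses"}, 80.0),
    ("bob", {"sandals"}, 30.0),
]

def top_spender(purchases):
    """OLAP-style question: aggregate known history and rank it."""
    spend = Counter()
    for customer, _, amount in purchases:
        spend[customer] += amount
    return spend.most_common(1)[0]

def likely_companions(purchases, item):
    """Mining-style question: how often does each other item appear with `item`?

    Returns {other_item: confidence}, where confidence is the share of
    baskets containing `item` that also contain `other_item`.
    """
    baskets = [items for _, items, _ in purchases if item in items]
    together = Counter(other for items in baskets for other in items if other != item)
    return {other: count / len(baskets) for other, count in together.items()}

best_customer = top_spender(purchases)
companions = likely_companions(purchases, "shorts")
```

The first function only reports what already happened; the second produces a pattern that can be used to guess what a future shorts buyer might also purchase.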

  • Data mining

    - Data mining tools:
      - analyze the data;
      - uncover patterns hidden in the data;
      - form computer models based on the findings; and
      - use the models to predict business behavior.
    - Proactive tools.
    - Based on artificial intelligence software such as decision trees, neural networks, fuzzy logic systems, inductive nets and classification networking.
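
The analyze, model, and predict cycle can be illustrated with the smallest possible decision tree, a single-split "decision stump". This is a toy sketch: the loan history is invented, and the debt-to-income ratio is an assumed predictor, not something the slides specify.

```python
# Hypothetical loan history: (debt-to-income ratio, defaulted?).
history = [
    (0.10, False), (0.15, False), (0.22, False), (0.30, False),
    (0.35, True), (0.41, True), (0.48, False), (0.55, True), (0.60, True),
]

def fit_stump(history):
    """Find the threshold that misclassifies the fewest past loans.

    The model is the rule "predict default when ratio >= threshold".
    """
    best_threshold, best_errors = None, len(history) + 1
    for threshold, _ in history:  # candidate thresholds: the observed ratios
        errors = sum((ratio >= threshold) != defaulted for ratio, defaulted in history)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors

def predict_default(threshold, ratio):
    """Apply the fitted rule to a new applicant."""
    return ratio >= threshold

threshold, training_errors = fit_stump(history)
```

Fitting uncovers the pattern in past data, and `predict_default` applies it to applicants who have no outcome yet, which is what makes the tool proactive rather than descriptive.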

  • What are some applications of data warehousing?

    - Customer relationship management.
    - Business process management.
    - Order management.
    - Strategic decision analysis.

    IS 475/675 - Overview of Data Warehousing
