Topics about Data Warehouses: What is a data warehouse? How does a data warehouse differ from a transaction processing database? What are the characteristics


Post on 04-Jan-2016






<ul><li><p>Topics about Data Warehouses</p><p>What is a data warehouse?</p><p>How does a data warehouse differ from a transaction processing database?</p><p>What are the characteristics of a data warehouse?</p><p>What are the components of a data warehousing system?</p><p>How is a data warehouse created?</p><p>How is a data warehouse accessed?</p></li>
<li><p>TPS vs. DSS</p><table><tr><th>Issue</th><th>TPS/MIS</th><th>DSS</th></tr><tr><td>Definition</td><td>Systems to support day-to-day operations.</td><td>Systems to support ad-hoc decision making.</td></tr><tr><td>Users</td><td>Clerks, data entry staff, low-level supervisors.</td><td>Managers, analysts, support staff, researchers.</td></tr><tr><td>Design goal</td><td>Performance.</td><td>Flexibility, ease of use, ease of access.</td></tr><tr><td>Transaction type</td><td>Updates.</td><td>Queries.</td></tr><tr><td>Query activity</td><td>Low; few joins.</td><td>High; many joins.</td></tr></table></li>
<li><p>Transaction vs. DSS databases</p><table><tr><th>Issue</th><th>Transaction database</th><th>DSS database</th></tr><tr><td>Content</td><td>Internal data; process-oriented.</td><td>Internal and external data; subject-oriented.</td></tr><tr><td>Data currency</td><td>Real time; current; volatile.</td><td>Batch; historical; non-volatile.</td></tr><tr><td>Summary level</td><td>Details of transactions; no (or very little) derived data.</td><td>Summarized; many aggregation levels.</td></tr><tr><td>Volume</td><td>Megabytes to gigabytes.</td><td>Gigabytes to terabytes.</td></tr><tr><td>Design</td><td>Normalized to prevent anomalies.</td><td>Denormalized to enhance query performance.</td></tr></table></li>
<li><p>So, can one database support both transaction processing and decision support applications?</p><p>Yes</p><p>No</p></li>
<li><p>What is a data warehouse?</p><p>A data warehouse is a database designed to support a decision support system.</p><p>A data warehouse is:</p><p>Integrated: It is a centralized, consolidated database integrating data from an entire organization.</p><p>Subject-oriented: Data warehouse data are organized around key subjects. 
The data are usually arranged by topic, such as customers, products, suppliers, etc.</p><p>Time-variant: Data in the warehouse carry a time dimension, so they can be used for historical analysis (for example, aggregating sales by month or year).</p><p>Non-volatile: Once data enter the warehouse, they seldom leave. Data are appended rather than overwritten, and updates arrive in periodic batches.</p></li>
<li><p>Data warehouse design example</p><p>[Table: example data warehouse design]</p></li>
<li><p>Issues in designing a data warehouse</p><p>Must have a predefined subject focus.</p><p>Has the potential to be very large, so the grain (granularity level) of storage must be defined.</p><p>Will always have a dimension of time.</p><p>Will contain derived data.</p><p>Will be a summary of data, rather than a record of each detailed transaction.</p><p>Does not always adhere to standard normalization rules.</p></li>
<li><p>Creating a Data Warehouse</p><p>[Figure: the Customer, Product, and Order transaction databases each pass through data scrubbing and data extraction; data integration then combines the results into the Sales Data Warehouse.]</p></li>
<li><p>Issues in creating a data warehouse</p><p>How to get accurate and complete data?</p><p>How to consolidate data?</p><p>Differing data meanings.</p><p>Differing storage mechanisms.</p><p>Differing data formats.</p></li>
<li><p>Components of a data warehousing system</p><p>Data store.</p><p>Extraction/filtering/transformation processes.</p><p>End-user query tools.</p><p>End-user visualization tools.</p></li>
<li><p>Two-tier data warehouse architecture</p></li>
<li><p>Three-tier data warehouse architecture</p></li>
<li><p>Accessing a data warehouse</p><p>Visualization tools:</p><p>Graphical.</p><p>Spreadsheet format, usually with an Excel or Lotus look-and-feel.</p><p>Dashboard. 
Example: http://tomcat.corda.com/superstore/sr.jsp</p><p>Query tools:</p><p>OLAP: Online analytical processing.</p><p>Data mining: Artificial-intelligence-based query methods.</p></li>
<li><p>Online analytical processing</p><p>Provides multi-dimensional data analysis techniques.</p><p>Works primarily with data aggregation.</p><p>Provides advanced statistical analysis.</p><p>Provides advanced graphical output.</p><p>Supports access to very large databases.</p><p>Provides enhanced query optimization algorithms.</p><p>Lots of acronyms: OLAP, ROLAP (relational OLAP), MOLAP (multidimensional OLAP), HOLAP (hybrid OLAP).</p><p>Can be an add-on to an existing product (for example, Excel) or have its own user interface.</p></li>
<li><p>OLAP vs. Data Mining questions</p><table><tr><th>OLAP</th><th>Data Mining</th></tr><tr><td>Which customers spent the most with us in the past year?</td><td>Which types of customers are likely to spend the most with us in the coming year?</td></tr><tr><td>How much did the bank lose from loan defaulters within the past two years?</td><td>What are the characteristics of the customers most likely to default on their loans before the year is over?</td></tr><tr><td>What were the highest-selling fashion items in our London stores?</td><td>What additional products are most likely to be sold to customers who buy shorts?</td></tr><tr><td>Which store/location made the highest sales in the past year?</td><td>In which area should we open a new store next year?</td></tr></table></li>
<li><p>Data mining</p><p>Data mining tools analyze the data, uncover patterns hidden in the data, form computer models based on the findings, and use the models to predict business behavior.</p><p>They are proactive tools.</p><p>They are based on artificial intelligence software such as decision trees, neural networks, fuzzy logic systems, inductive nets, and classification networking.</p></li>
<li><p>What are some applications of data warehousing?</p><p>Customer relationship management.</p><p>Business process management.</p><p>Order management.</p><p>Strategic decision analysis.</p><p>IS 475/675 - Overview of Data Warehousing</p></li></ul>
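<p>The "time-variant" and "non-volatile" characteristics listed above can be contrasted with a transaction store in a few lines of Python. This is a minimal sketch under stated assumptions. The class and method names (<code>OperationalStore</code>, <code>Warehouse</code>, <code>load_batch</code>, <code>city_as_of</code>) and the customer data are hypothetical. The operational store overwrites a customer's city on every update. The warehouse instead appends a dated snapshot on each batch load, so earlier values remain available for historical queries.</p>

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    load_date: date  # time dimension: the batch load this row arrived in
    customer_id: str
    city: str


class OperationalStore:
    """Hypothetical transaction-style store: an update overwrites the current value (volatile)."""

    def __init__(self):
        self.city = {}

    def update(self, customer_id, city):
        self.city[customer_id] = city


class Warehouse:
    """Hypothetical warehouse-style store: batches are appended, never overwritten (non-volatile)."""

    def __init__(self):
        self.rows = []

    def load_batch(self, load_date, operational):
        # Periodic batch load: copy the current operational state, stamped with its load date.
        for customer_id, city in sorted(operational.city.items()):
            self.rows.append(Snapshot(load_date, customer_id, city))

    def city_as_of(self, customer_id, as_of):
        # Most recent snapshot on or before `as_of`; older snapshots remain queryable.
        matches = [r for r in self.rows
                   if r.customer_id == customer_id and r.load_date <= as_of]
        return max(matches, key=lambda r: r.load_date).city if matches else None


if __name__ == "__main__":
    ops, wh = OperationalStore(), Warehouse()
    ops.update("C1", "Leeds")
    wh.load_batch(date(2024, 1, 31), ops)
    ops.update("C1", "London")  # the operational value is overwritten
    wh.load_batch(date(2024, 2, 29), ops)
    print(ops.city["C1"])                         # London
    print(wh.city_as_of("C1", date(2024, 2, 1)))  # Leeds
```

<p>The operational store can only answer "where is the customer now?". Because the warehouse never overwrites a row, it can also answer "where was the customer at the end of January?".</p>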
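<p>The table on the "Data warehouse design example" slide is not reproduced. As a stand-in, here is a minimal sketch of a star schema, one common warehouse layout, built with Python's built-in <code>sqlite3</code> module. It is an assumption and not necessarily the slide's own design, and all table names, columns, and rows are hypothetical. The sketch touches several of the design issues listed above:</p>
<ul><li>a declared grain (one fact row per product, store, and day);</li><li>a time dimension;</li><li>a derived value (revenue);</li><li>dimension tables that store derivable attributes, such as month, year, and region, redundantly instead of normalizing them away.</li></ul>

```python
import sqlite3

# Hypothetical star schema (illustrative names, not from the slides).
# Grain: one fact row per product, per store, per day.
SCHEMA = """
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,  -- e.g. 20240101
    full_date TEXT,
    month     TEXT,                 -- stored redundantly so queries need not derive it
    year      INTEGER
);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date (date_key),
    product_key INTEGER REFERENCES dim_product (product_key),
    store_key   INTEGER REFERENCES dim_store (store_key),
    units_sold  INTEGER,
    revenue     REAL,               -- derived: units_sold times unit price at load time
    PRIMARY KEY (date_key, product_key, store_key)  -- at most one row per grain
);
"""


def build_warehouse():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
        (20240101, "2024-01-01", "2024-01", 2024),
        (20240201, "2024-02-01", "2024-02", 2024),
    ])
    con.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
        (1, "Shorts", "Clothing"),
        (2, "Jacket", "Clothing"),
    ])
    con.executemany("INSERT INTO dim_store VALUES (?, ?, ?)", [
        (10, "London", "UK"),
        (11, "Leeds", "UK"),
    ])
    con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
        (20240101, 1, 10, 5, 100.0),
        (20240101, 2, 10, 2, 160.0),
        (20240201, 1, 11, 3, 60.0),
        (20240201, 2, 10, 1, 80.0),
    ])
    return con


def revenue_by_month_and_city(con):
    # A typical DSS query: join the fact table to its dimensions, then aggregate.
    return con.execute("""
        SELECT d.month, s.city, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date  AS d ON f.date_key  = d.date_key
        JOIN dim_store AS s ON f.store_key = s.store_key
        GROUP BY d.month, s.city
        ORDER BY d.month, s.city
    """).fetchall()


if __name__ == "__main__":
    for row in revenue_by_month_and_city(build_warehouse()):
        print(row)
```

<p>The grain is declared as the fact table's primary key, so SQLite rejects a second row for the same product, store, and day. This keeps the aggregates from double counting.</p>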
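<p>The "Creating a Data Warehouse" flow and the issues listed with it (differing data meanings and differing data formats) can be sketched in plain Python. Everything here is an assumption for illustration: the three source extracts, their field names, the gender code mapping, and the two date formats are all hypothetical. Scrubbing maps each source's conventions onto one standard. Integration then joins the scrubbed customer, product, and order data into sales fact rows.</p>

```python
from datetime import datetime

# Hypothetical source extracts: the sources store similar facts in differing formats.
customers_src = [
    {"cust_id": "C1", "name": " alice smith ", "gender": "F"},
    {"cust_id": "C2", "name": "BOB JONES", "gender": "male"},
]
products_src = [
    {"sku": "P-01", "title": "Shorts", "price_cents": 2000},
]
orders_src = [
    {"order_no": 1, "cust": "C1", "sku": "P-01", "qty": 2, "date": "03/01/2024"},  # assumed DD/MM/YYYY
    {"order_no": 2, "cust": "C2", "sku": "P-01", "qty": 1, "date": "2024-01-04"},  # ISO 8601
]

# --- Scrubbing: map differing codes and formats onto one agreed standard ---
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def scrub_customer(row):
    return {
        "cust_id": row["cust_id"],
        # Collapse stray whitespace and normalize capitalization.
        "name": " ".join(row["name"].split()).title(),
        "gender": GENDER_CODES.get(row["gender"].strip().lower(), "U"),  # U = unknown
    }


def parse_date(text):
    # Try each known source format; reject anything unrecognized rather than guess.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


# --- Integration: combine scrubbed sources into one sales fact row per order ---
def integrate(customers, products, orders):
    cust_by_id = {c["cust_id"]: scrub_customer(c) for c in customers}
    prod_by_sku = {p["sku"]: p for p in products}
    facts = []
    for o in orders:
        cust = cust_by_id[o["cust"]]
        prod = prod_by_sku[o["sku"]]
        facts.append({
            "sale_date": parse_date(o["date"]).isoformat(),
            "customer": cust["name"],
            "gender": cust["gender"],
            "product": prod["title"],
            "units": o["qty"],
            # Derived attribute: revenue in currency units rather than cents.
            "revenue": o["qty"] * prod["price_cents"] / 100,
        })
    return facts


if __name__ == "__main__":
    for fact in integrate(customers_src, products_src, orders_src):
        print(fact)
```

<p>The sketch raises an error on an unrecognized date instead of guessing. This reflects the slide's concern with getting accurate and complete data: a bad record is surfaced for correction rather than silently loaded.</p>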
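<p>The OLAP slide says these tools provide multi-dimensional analysis and work primarily with data aggregation. The following is a minimal in-memory sketch of that idea:</p>
<ul><li>it totals hypothetical sales at every combination of dimensions, similar in spirit to SQL's <code>GROUP BY CUBE</code>;</li><li>it slices the data to a single year before rolling up by product.</li></ul>
<p>The dimension names, the fact rows, and the helper names (<code>aggregate</code>, <code>cube</code>, <code>slice_facts</code>) are illustrative assumptions.</p>

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, region, product, sales). Illustrative data only.
FACTS = [
    (2023, "North", "Shorts", 120),
    (2023, "South", "Shorts", 80),
    (2024, "North", "Jacket", 200),
    (2024, "North", "Shorts", 150),
    (2024, "South", "Jacket", 90),
]
DIMENSIONS = ("year", "region", "product")


def aggregate(facts, group_by):
    """Total sales grouped by the named dimensions, summing over all others (a roll-up)."""
    positions = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)


def cube(facts):
    """Aggregate at every combination of dimensions, from grand total to full detail."""
    return {
        group_by: aggregate(facts, group_by)
        for r in range(len(DIMENSIONS) + 1)
        for group_by in combinations(DIMENSIONS, r)
    }


def slice_facts(facts, **fixed):
    """Keep only rows whose dimension values match, e.g. slice_facts(FACTS, year=2024)."""
    return [row for row in facts
            if all(row[DIMENSIONS.index(d)] == v for d, v in fixed.items())]


if __name__ == "__main__":
    print(aggregate(FACTS, ("year",)))   # {(2023,): 200, (2024,): 440}
    print(aggregate(FACTS, ()))          # {(): 640}
    print(aggregate(slice_facts(FACTS, year=2024), ("product",)))  # {('Jacket',): 290, ('Shorts',): 150}
    print(len(cube(FACTS)))              # 8 groupings for 3 dimensions
```

<p>Each key of the <code>cube</code> result is one grouping, from the grand total (the empty tuple) down to the full detail. Rolling up means moving to a grouping with fewer dimensions. Slicing means fixing one dimension's value before aggregating.</p>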
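<p>The data mining slide lists decision trees among the underlying techniques. The OLAP vs. data mining table also asks which customers are most likely to default on their loans. Below is a deliberately tiny sketch of that idea: a one-level decision tree (a "decision stump"). From hypothetical historical loans, it learns the single feature threshold that best separates defaulters, scoring candidate splits by Gini impurity. The feature names, the loan data, and the choice of Gini impurity are assumptions for illustration. Practical tools typically grow deeper trees and validate them on held-out data.</p>

```python
# Hypothetical historical loans: (debt_to_income_ratio, years_employed, defaulted).
LOANS = [
    (0.10, 8, False),
    (0.15, 5, False),
    (0.20, 1, False),
    (0.30, 6, False),
    (0.45, 9, False),
    (0.40, 2, True),
    (0.50, 7, True),
    (0.60, 1, True),
]
FEATURES = ("debt_to_income_ratio", "years_employed")


def gini(labels):
    """Gini impurity of a list of booleans: 0.0 for a pure group, 0.5 at worst."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(rows):
    """Return (score, feature_index, threshold, left_label, right_label) for the best split."""
    best = None
    for f in range(len(FEATURES)):
        for threshold in sorted({r[f] for r in rows}):
            left = [r[-1] for r in rows if r[f] <= threshold]
            right = [r[-1] for r in rows if r[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            # Weighted average impurity of the two sides; lower is better.
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                # Each side predicts its majority label (ties go to "no default").
                best = (score, f, threshold,
                        2 * sum(left) > len(left),
                        2 * sum(right) > len(right))
    return best


def predict(stump, features):
    _, f, threshold, left_label, right_label = stump
    return left_label if features[f] <= threshold else right_label


if __name__ == "__main__":
    stump = fit_stump(LOANS)
    score, f, threshold, _, _ = stump
    print(f"split on {FEATURES[f]} <= {threshold} (weighted Gini {score})")
    print(predict(stump, (0.55, 3)))  # True: predicted to default
```

<p>An OLAP query summarizes what has already happened. The stump, by contrast, produces a rule that can be applied to new applicants, which is the distinction the OLAP vs. data mining table draws.</p>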