An Analysis of the Publication "An Overview of Data Warehousing and OLAP Technology" by Surajit Chaudhuri and Umeshwar Dayal
Michael Goshey
University of Minnesota, Fall 2006
CSci 8701: Overview of Database Research
Michael Goshey: 9/19/2006
Outline
1. Introduction
2. Problem Addressed
3. Major Contributions
4. Key Concepts
5. Validation Methodology
6. Assumptions
7. 2006 Rewrite
Introduction
- Selected paper: S. Chaudhuri and U. Dayal, "An Overview of Data Warehousing and OLAP Technology," SIGMOD Record 26(1): 65-74 (1997).
- Motivation: personal interest
Problem Addressed
- Problem statement
  - Survey: organizing the data warehousing space
  - Differing requirements between OLTP and OLAP
- Significance
  - Growth area
  - Reference work establishing consensus on terms, architectures and issues
Major Contributions
- Bridging the gulf between industry and academia
- OLTP vs. OLAP: clarifying the differences
- Concise survey of relevant issues, architectures and tools
- Concrete list of data warehouse design and build steps
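To make the OLTP vs. OLAP distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample values are hypothetical and chosen only for illustration; they do not come from the paper. The first statement is a short, single-row transactional update (the OLTP pattern), while the second scans and aggregates many rows (the OLAP pattern).

```python
import sqlite3

# Hypothetical in-memory table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "east", 20.0), (3, "west", 5.0)],
)

# OLTP-style access: a short transaction that touches one row by its key.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (12.0, 1))

# OLAP-style access: a read-only query that scans and summarizes many rows.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)
```

On a table this small the two patterns cost about the same; the difference matters when the aggregate has to scan millions of historical rows while transactions keep updating individual records.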
Key Concepts
- Data warehouses and data marts
- OLTP, OLAP (ROLAP vs. MOLAP)
- Relational and dimensional data models
- Bitmap index
- ETL
- Metadata
- Managed query vs. ad hoc environments
- Materialized views
- SQL extensions (cube, rollup, rank, percentile, etc.)
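The cube and rollup extensions listed above can be illustrated with a small Python sketch. This is not how a database engine implements them; it only shows which groupings each operator produces. The `sales` rows, the column names, and the use of `None` as the "all values" marker are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`; None marks a rolled-up dimension."""
    totals = defaultdict(int)
    for size in range(len(dims) + 1):
        for kept in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in kept else None for d in dims)
                totals[key] += row[measure]
    return dict(totals)

def rollup(rows, dims, measure):
    """Sum `measure` for each prefix of `dims` only (a hierarchy, not all subsets)."""
    totals = defaultdict(int)
    for depth in range(len(dims), -1, -1):
        for row in rows:
            key = tuple(row[d] if i < depth else None for i, d in enumerate(dims))
            totals[key] += row[measure]
    return dict(totals)

# Hypothetical fact rows for illustration.
sales = [
    {"region": "east", "year": 2005, "amount": 10},
    {"region": "east", "year": 2006, "amount": 20},
    {"region": "west", "year": 2006, "amount": 5},
]

cube_totals = cube(sales, ["region", "year"], "amount")
rollup_totals = rollup(sales, ["region", "year"], "amount")
print(cube_totals[(None, 2006)])     # total for 2006 across all regions
print(rollup_totals[("east", None)])  # total for east across all years
```

With two dimensions, the cube yields all four groupings (region and year, region only, year only, grand total), while the rollup yields only three because it never groups by year alone.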
Data Warehouse, Data Mart
[Figure: Typical Data Warehouse Architecture. Components shown: Source Systems; Data Staging Area; ETL Services; Metadata Catalog; Dimensional Data Marts including atomic data; Dimensional Data Marts with only aggregated data; Ad Hoc Query and Analysis Tools; Reporting Tools; Other uses]
Relational or Dimensional?
[Figure: relational schema diagram with the following tables]
- Categories: CategoryID (PK), CategoryName, Description, Picture
- Shippers: ShipperID (PK), CompanyName, Phone
- Order Details: OrderID (PK, FK), ProductID (PK, FK), UnitPrice, Quantity, Discount
- Customers: CustomerID (PK), CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax
- Suppliers: SupplierID (PK), CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax, HomePage
- Orders: OrderID (PK), CustomerID (FK), EmployeeID (FK), OrderDate, RequiredDate, ShippedDate, ShipVia (FK), Freight, ShipName, ShipAddress, ShipCity, ShipRegion, ShipPostalCode, ShipCountry
- Employees: EmployeeID (PK), LastName, FirstName, Title, TitleOfCourtesy, BirthDate, HireDate, Address, City, Region, PostalCode, Country, HomePhone, Extension, Photo, Notes, ReportsTo
- Products: ProductID (PK), ProductName, SupplierID (FK), CategoryID (FK), QuantityPerUnit, UnitPrice, UnitsInStock, UnitsOnOrder, ReorderLevel, Discontinued
Relational or Dimensional?
(image from http://www.laynetworks.com)
Bitmap Indices

| customer | age 0-10 | age 11-20 | age 21-30 | age 31-40 |
| --- | --- | --- | --- | --- |
| Mary | 1 | 0 | 0 | 0 |
| John | 0 | 1 | 0 | 0 |
| Steve | 0 | 0 | 1 | 0 |
| Tom | 0 | 0 | 0 | 1 |
| Lisa | 0 | 0 | 1 | 0 |

- Cardinality: unique values / total rows
- B-tree vs. bitmap: 1% rule, uniqueness
- Boolean algebra directly on indices
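The bitmap idea on this slide can be sketched in a few lines of Python, using an integer as the bit vector for each distinct value. The age bands match the table above; the `region` column and its values are hypothetical, added only so there is a second index to combine. This is an illustrative sketch, not how any particular database engine stores its bitmaps.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set when row i has that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)

def customers_for(bitmap, rows):
    """Decode a bitmap back into the customer names of the rows whose bits are set."""
    return [row["customer"] for i, row in enumerate(rows) if (bitmap >> i) & 1]

# Age bands follow the slide; the region values are assumed for illustration.
customers = [
    {"customer": "Mary", "age_band": "0-10", "region": "east"},
    {"customer": "John", "age_band": "11-20", "region": "west"},
    {"customer": "Steve", "age_band": "21-30", "region": "east"},
    {"customer": "Tom", "age_band": "31-40", "region": "west"},
    {"customer": "Lisa", "age_band": "21-30", "region": "west"},
]

age_index = build_bitmap_index(customers, "age_band")
region_index = build_bitmap_index(customers, "region")

# Boolean algebra directly on the indices: AND and OR are single integer operations.
young_adults_east = age_index["21-30"] & region_index["east"]
under_21 = age_index["0-10"] | age_index["11-20"]

print(customers_for(young_adults_east, customers))  # ['Steve']
print(customers_for(under_21, customers))           # ['Mary', 'John']

# Cardinality as defined on the slide: unique values / total rows.
cardinality = len(age_index) / len(customers)
print(cardinality)  # 0.8
```

A ratio of 0.8 on five rows is only an artifact of the tiny sample. Bitmap indices pay off when the ratio is low, which is the point of the 1% rule of thumb on the slide.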
Validation Methodology
- Survey paper goals
- Academic and industry citations
- Referencing tools, vendors
- Case studies
Assumptions
- Read-only environments
- Shortcomings
  - (Occasional) transactional commitments
  - The data revision problem
2006 Rewrite
- Changes in terminology, tools, vendors
  - Fact constellations -> conformed dimensions
  - Decision support -> BI
  - Vendors and tools in BI, ETL, OLAP
- Multiple user constituencies
- Data history difficulties
  - Petabyte databases -> very large warehouses common
  - Data expiry challenges
  - Slowly changing dimensions
Slowly Changing Dimensions

Before

| CustomerID | Name | Status |
| --- | --- | --- |
| 001 | Mary Johnson | Gold |

After: Type 1

| CustomerID | Name | Status |
| --- | --- | --- |
| 001 | Mary Johnson | Platinum |

After: Type 2

| CustomerID | Name | Status |
| --- | --- | --- |
| 001 | Mary Johnson | Gold |
| 001 | Mary Johnson | Platinum |

After: Type 3

| CustomerID | Name | Original Status | Current Status | Effective Date |
| --- | --- | --- | --- | --- |
| 001 | Mary Johnson | Gold | Platinum | 10/1/2006 |
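The three slowly changing dimension strategies shown above can be sketched as small Python functions over a list of row dictionaries. The function names are hypothetical, and the rows mirror the Mary Johnson example from the slide. This is a minimal sketch of the bookkeeping, not a production design.

```python
from copy import deepcopy

def scd_type1(dim, customer_id, new_status):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    out = deepcopy(dim)
    for row in out:
        if row["CustomerID"] == customer_id:
            row["Status"] = new_status
    return out

def scd_type2(dim, customer_id, new_status):
    """Type 2: keep the old row and append a new row carrying the new value."""
    out = deepcopy(dim)
    latest = [row for row in out if row["CustomerID"] == customer_id][-1]
    out.append({**latest, "Status": new_status})
    return out

def scd_type3(dim, customer_id, new_status, effective_date):
    """Type 3: keep one row, with separate original and current columns plus a date."""
    out = []
    for row in dim:
        if row["CustomerID"] == customer_id:
            out.append({
                "CustomerID": row["CustomerID"],
                "Name": row["Name"],
                "Original Status": row["Status"],
                "Current Status": new_status,
                "Effective Date": effective_date,
            })
        else:
            out.append(dict(row))
    return out

before = [{"CustomerID": "001", "Name": "Mary Johnson", "Status": "Gold"}]

type1 = scd_type1(before, "001", "Platinum")
type2 = scd_type2(before, "001", "Platinum")
type3 = scd_type3(before, "001", "Platinum", "10/1/2006")

print([row["Status"] for row in type1])  # ['Platinum']
print([row["Status"] for row in type2])  # ['Gold', 'Platinum']
print(type3[0]["Original Status"], type3[0]["Current Status"])  # Gold Platinum
```

As on the slide, the Type 2 result repeats the same CustomerID in both rows. Real Type 2 dimensions usually add a surrogate key or effective-date columns so the two versions can be told apart; that detail is omitted here to stay close to the example.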
Questions?