ist722 data warehousing - syracuse universityclasses.ischool.syr.edu/...of-the-datawarehouse.pdf ·...
IST722 Data Warehousing
Components of the Data Warehouse
Michael A. Fudge, Jr.
Recall: Inmon’s Corporate Information Factory (CIF)
The CIF is a reference architecture
Understanding the Diagram
CIF Components
External World & Applications
• External World – the people and systems that generate operational data.
• Applications – the systems that provide the source for the operational data.
• Examples: ERPs, business applications, Internet data, external data streams.
• These are the inputs and data sources for the CIF.
• OLTP Systems – operational data, transaction-oriented.
Integration & Transformation Layer
• I&T layer – takes un-integrated data from multiple sources and integrates and consolidates it.
• Computer programs are written to transform data from the external world into corporate data.
• The data come from a variety of sources, in both structured and unstructured formats.
• Today’s Database Management Systems provide tooling to assist with this process.
• This is the most difficult and time-consuming component of the CIF.
• Two approaches: ETL and ELT
ETL – Extract Transform Load
• The data transformation occurs over staged data.
• The source data is not stored in the warehouse.
ELT – Extract Load Transform
• The data transformation occurs over warehoused data.
• The staged data is stored in the warehouse.
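The difference between the two approaches can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not the course's implementation: the table names (`stage_sales`, `dw_sales`), the sample rows, and the cleanup rule (trim and upper-case the region, convert cents to dollars) are all made up for the example.

```python
import sqlite3

# Hypothetical source rows: (region, amount_in_cents). Not from the slides.
SOURCE_ROWS = [(" east ", 1250), ("West", 300), ("east", 450)]


def clean(row):
    """Example transformation: normalize the region, convert cents to dollars."""
    region, cents = row
    return (region.strip().upper(), cents / 100.0)


def run_etl(rows):
    """ETL: transform outside the warehouse; only the cleaned rows are loaded."""
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE dw_sales (region TEXT, amount REAL)")
    staged = [clean(r) for r in rows]  # transformation runs over staged data
    dw.executemany("INSERT INTO dw_sales VALUES (?, ?)", staged)
    return dw


def run_elt(rows):
    """ELT: load the raw rows into the warehouse, then transform them with SQL."""
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE stage_sales (region TEXT, cents INTEGER)")
    dw.executemany("INSERT INTO stage_sales VALUES (?, ?)", rows)  # raw load
    dw.execute(
        "CREATE TABLE dw_sales AS "
        "SELECT UPPER(TRIM(region)) AS region, cents / 100.0 AS amount "
        "FROM stage_sales"
    )
    return dw


def tables(dw):
    """List the table names present in a warehouse connection."""
    cur = dw.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cur]


def totals(dw):
    """Total amount per region from the warehouse table."""
    cur = dw.execute(
        "SELECT region, SUM(amount) FROM dw_sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()
```

Both functions produce the same `dw_sales` contents. The visible difference is that the ELT warehouse also holds `stage_sales`, while the ETL warehouse never sees the raw rows.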
Operational Data Store
• Integrated, detailed, and current data from the External World and Applications.
• Consolidated from disparate sources.
• Does not grow over time.
• Performs similarly to a transactional database.
• Structured differently than a data warehouse, and therefore should be stored as a separate database.
• Receives data from the I&T layer and sends data to the data warehouse.
• The data warehouse can populate it, too.
• Think of it as a consolidated operational database.
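One way to picture "current data" that "does not grow over time" is a table where each incoming record overwrites the row for its key. The sketch below assumes a hypothetical `ods_customer` table and a made-up change feed; it uses SQLite's `INSERT OR REPLACE` purely as a convenient stand-in for whatever update mechanism a real ODS would use.

```python
import sqlite3

# Hypothetical feed from the I&T layer: (customer_id, city) change records.
FEED = [(1, "Syracuse"), (2, "Albany"), (1, "Rochester"), (2, "Buffalo")]


def load_ods(feed):
    """Apply each record as an overwrite so only the current state is kept."""
    ods = sqlite3.connect(":memory:")
    ods.execute(
        "CREATE TABLE ods_customer (customer_id INTEGER PRIMARY KEY, city TEXT)"
    )
    for record in feed:
        # INSERT OR REPLACE overwrites the existing row with the same key.
        ods.execute("INSERT OR REPLACE INTO ods_customer VALUES (?, ?)", record)
    return ods


def current_state(ods):
    """Return the current rows, ordered by customer_id."""
    return ods.execute(
        "SELECT customer_id, city FROM ods_customer ORDER BY customer_id"
    ).fetchall()
```

Four change records arrive, but the table ends with two rows, one per customer, each holding only the latest city.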
Enterprise Data Warehouse
• Subject-oriented, integrated, summarized, and current data from the External World and Applications.
• Optimized for query performance.
• Structured differently than operational data, typically in a dimensional model.
• Receives data from the I&T layer and the ODS.
• Used as a source for data marts and decision support systems.
• Grows in size over time due to historical data.
• The heart of the CIF.
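The "grows in size over time" point can be illustrated with a table that appends a dated snapshot on every periodic load instead of overwriting earlier rows. This is a rough sketch only: the `customer_snapshot` table, its columns, and the two batches are hypothetical, and a real warehouse would typically use a dimensional design rather than this single flat table.

```python
import sqlite3

# Hypothetical periodic batches: load_date -> list of (customer_id, city).
BATCHES = {
    "2024-01-31": [(1, "Syracuse"), (2, "Albany")],
    "2024-02-29": [(1, "Rochester"), (2, "Albany")],
}


def load_edw(batches):
    """Append every batch as a dated snapshot; earlier rows are never overwritten."""
    edw = sqlite3.connect(":memory:")
    edw.execute(
        "CREATE TABLE customer_snapshot "
        "(load_date TEXT, customer_id INTEGER, city TEXT)"
    )
    for load_date in sorted(batches):
        rows = [(load_date, cid, city) for cid, city in batches[load_date]]
        edw.executemany("INSERT INTO customer_snapshot VALUES (?, ?, ?)", rows)
    return edw


def history(edw, customer_id):
    """Return every snapshot recorded for one customer, oldest first."""
    return edw.execute(
        "SELECT load_date, city FROM customer_snapshot "
        "WHERE customer_id = ? ORDER BY load_date",
        (customer_id,),
    ).fetchall()
```

Each load adds rows, so the table keeps the history needed to answer questions such as where a customer was at the end of each month.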
ODS vs. EDW

| Characteristic | Operational Data Store | Data Warehouse |
| --- | --- | --- |
| Primary Purpose | Run the business on a current basis | Support managerial decision making |
| Design Goal | Performance throughput, availability | Easy reporting and analytics |
| Primary Users | Clerks, salespersons, administrators | Managers, business analysts, customers |
| Subject-Oriented | Yes | Yes |
| Integrated | Yes | Yes |
| Detailed Data | Yes | Yes |
| Summary Data | No | Yes |
| Time of Data | Current data | Historical snapshots |
| Updates | Frequent small updates | Periodic batch updates |
| Queries | Simple queries on a few rows | Complex queries on several rows |
Why No ODS in the EDW?
I need fast updates!
I need query performance!
You can’t have both!
Data Marts
• A collection of data tailored to the informational needs of a department or business process.
• Easy to control, low cost, and customizable due to their limited scope.
• Receive their inputs from the Enterprise Data Warehouse.
• Are source data for Online Analytical Processing (OLAP) engines.
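As a rough sketch of a mart receiving its inputs from the EDW, the example below derives a department-scoped summary table from a warehouse fact table. Everything named here is an assumption for illustration: the `fact_sales` table, its columns, the sample rows, and the `mart_<department>` naming rule. The mart is kept in the same in-memory database only to keep the sketch short.

```python
import sqlite3


def build_edw():
    """Create a tiny stand-in for the EDW with a hypothetical sales fact table."""
    edw = sqlite3.connect(":memory:")
    edw.execute(
        "CREATE TABLE fact_sales (department TEXT, product TEXT, amount REAL)"
    )
    edw.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [
            ("Marketing", "Widget", 100.0),
            ("Marketing", "Gadget", 50.0),
            ("Marketing", "Widget", 25.0),
            ("Finance", "Widget", 75.0),
        ],
    )
    return edw


def build_mart(edw, department):
    """Derive a department-scoped, summarized table from the EDW fact table."""
    # Table names cannot be bound as parameters, so restrict the input first.
    if not department.isalpha():
        raise ValueError("department must be alphabetic")
    table = "mart_" + department.lower()
    edw.execute(f"CREATE TABLE {table} (product TEXT, total_amount REAL)")
    edw.execute(
        f"INSERT INTO {table} "
        "SELECT product, SUM(amount) FROM fact_sales "
        "WHERE department = ? GROUP BY product",
        (department,),
    )
    return table
```

The limited scope is what keeps the mart small: it holds only one department's rows, already summarized to the grain that department asks about.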
OLAP
ROLAP
• Uses a Relational Database Management System.
• Data design is the Star Schema.
• Built on well-known relational concepts.
• Found in the EDW.
MOLAP
• Uses a Multi-Dimensional Database Management System.
• Data design is the Cube.
• Highly flexible.
• Found in Data Marts.
Typical implementations have the ROLAP star schema feed the MOLAP cube.
ROLAP – Star Schema
• Stored in a relational DBMS.
• The fact table is a many-to-many (M-M) relationship among the dimensions.
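A minimal star schema can be sketched in SQLite to show how the fact table sits between the dimensions. The table names (`dim_product`, `dim_date`, `fact_sales`), the keys, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_star():
    """Create two dimension tables and one fact table that references both."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
        -- Each fact row pairs one product with one date, so the fact table
        -- resolves the many-to-many relationship between the two dimensions.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product (product_key),
            date_key INTEGER REFERENCES dim_date (date_key),
            quantity INTEGER
        );
        INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
        INSERT INTO dim_date VALUES (20230101, 2023), (20240101, 2024);
        INSERT INTO fact_sales VALUES
            (1, 20230101, 5), (1, 20240101, 7), (2, 20240101, 3), (1, 20240101, 1);
        """
    )
    return db


def quantity_by_product_and_year(db):
    """Join the fact table to its dimensions and aggregate the measure."""
    return db.execute(
        """
        SELECT p.name, d.year, SUM(f.quantity)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.name, d.year
        ORDER BY p.name, d.year
        """
    ).fetchall()
```

One product appears on many dates and one date covers many products; the fact rows are where those pairings and their measures live.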
MOLAP - Cube
• Stored in a Multi-Dimensional DBMS.
• Facts are pre-aggregated across all dimensions for improved performance.
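The idea of pre-aggregating across all dimensions can be sketched in plain Python: compute the total for every combination of "keep this dimension" and "roll this dimension up" once, then answer queries by lookup. This is a toy model, not how any particular multi-dimensional DBMS stores its data; the fact rows, the dimension names, and the `*` roll-up marker are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (product, year, region, quantity).
FACTS = [
    ("Widget", 2023, "East", 5),
    ("Widget", 2024, "East", 7),
    ("Gadget", 2024, "West", 3),
    ("Widget", 2024, "West", 1),
]
DIMENSIONS = ("product", "year", "region")
ALL = "*"  # marker meaning "aggregated over every member of this dimension"


def build_cube(facts):
    """Pre-compute the total quantity for every combination of dimensions."""
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for *members, quantity in facts:
        # Choose which dimensions keep their value; the rest roll up to ALL.
        for size in range(n + 1):
            for kept in combinations(range(n), size):
                key = tuple(members[i] if i in kept else ALL for i in range(n))
                cube[key] += quantity
    return dict(cube)


def lookup(cube, product=ALL, year=ALL, region=ALL):
    """Answer an aggregate query with a dictionary lookup, not a scan of the facts."""
    return cube.get((product, year, region), 0)
```

The trade-off is storage for speed: each fact row contributes to 2^3 = 8 cells here, but a question such as "total for 2024 in the West" becomes a single lookup.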
DSS Applications
Decision-Support Systems
• Business Intelligence.
• Front-ends to ROLAP and MOLAP engines.
• Help us explore and visualize information at a high level.
In Summary…
• The CIF is a reference architecture for building out an information ecosystem.
• Applications from the external world are inputs into the CIF.
• The Integration & Transformation Layer transforms transactional data into corporate data.
• The Operational Data Store contains consolidated, non-historical data.
• The Enterprise Data Warehouse contains consolidated historical data.
• Data marts are tailored to the informational needs of a department or business process.