ISQS 6339, Business Intelligence Data Warehousing
![Page 1: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/1.jpg)
ISQS 6339, Business Intelligence
Data Warehousing
Zhangxi Lin
Texas Tech University
![Page 2: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/2.jpg)
Outline
So far students have learned
◦ Basic concepts of business intelligence
◦ The definition and importance of a data warehouse
In this lecture, the following topics will be covered
◦ SQL Server 2008 data mart case study
  How to access data in a network directory
  How to access SQL Server 2008 on the Citrix Server
  How to load data from an Excel file to a database
◦ Data warehouse overview
◦ Data warehouse architecture
◦ Data integration
ISQS 6347, Data & Text Mining 2
![Page 3: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/3.jpg)
Data Warehousing Definitions and Concepts
Data warehouse
A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format
◦ Video – Overview of data warehouse 2’38”
Benefits of data warehouse 3’18”
![Page 4: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/4.jpg)
Data mart
Definition
A localized data warehouse that stores only the data relevant to a department or even an individual
◦ Dependent data mart
  A subset that is created directly from a data warehouse
◦ Independent data mart
  A small data warehouse designed for a strategic business unit or a department
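A dependent data mart can be pictured as a filtered copy of warehouse rows. The following is a minimal Python sketch using the standard `sqlite3` module as a stand-in for a real warehouse; the table names (`warehouse_listing`, `mart_marketing`), the columns, and the sample rows are hypothetical and are not taken from the IMW case.

```python
import sqlite3

# In-memory database standing in for the enterprise data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_listing (
    listing_id  INTEGER PRIMARY KEY,
    department  TEXT NOT NULL,
    city        TEXT NOT NULL,
    listing_fee REAL NOT NULL
);
CREATE TABLE mart_marketing (
    listing_id  INTEGER PRIMARY KEY,
    city        TEXT NOT NULL,
    listing_fee REAL NOT NULL
);
""")
conn.executemany(
    "INSERT INTO warehouse_listing VALUES (?, ?, ?, ?)",
    [
        (1, "marketing", "Austin", 50.0),
        (2, "finance", "Dallas", 75.0),
        (3, "marketing", "Lubbock", 40.0),
    ],
)


def load_dependent_mart(conn, department):
    """Refresh the mart with only the warehouse rows of one department."""
    conn.execute("DELETE FROM mart_marketing")
    conn.execute(
        "INSERT INTO mart_marketing "
        "SELECT listing_id, city, listing_fee "
        "FROM warehouse_listing WHERE department = ?",
        (department,),
    )
    conn.commit()
    # Return the number of rows now held by the mart.
    return conn.execute("SELECT COUNT(*) FROM mart_marketing").fetchone()[0]


loaded = load_dependent_mart(conn, "marketing")
```

An independent data mart would skip the warehouse table and be loaded straight from its own sources.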
![Page 5: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/5.jpg)
Data Mart - The IMW Case
IMW, standing for Internet Media Works!, is an ASP (application service provider) in real estate information services. It is headquartered in Austin, Texas. Its CEO is Gary Anderson. Web page: http://www.inetworks.com
![Page 6: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/6.jpg)
About IMW
Based in Austin, Texas, IMW (Internet Media Works!) is an ASP specializing mainly in web-based application development, database integration, and web development and hosting for all kinds of businesses.
IMW has been more successful in selling its e-business services for commercial real estate. Its services include lead generation, real estate transaction management, property listing, realtor membership management, real estate indices, real estate auctions, etc., with COMMREX as a complete e-business solution.
IMW used to have up to 6 full-time employees and a few part-time employees.
![Page 7: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/7.jpg)
IMW’s Services
[Diagram. Labels: Internet Service Provider’s Services; IMW’s Web-Based Application Services; Website Hosting Services; Core Membership Database Services; Core Property Listing Database Services; Optional Website Hosting Services; Optional Membership Database Services; Optional Property Listing Database Services; Public User Application Services; Networking and System Operation Services; Public User Support]
![Page 8: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/8.jpg)
Why Do We Need a Data Mart?
A data mart complements centralized data warehousing based on the UDM model, for situations where UDM cannot be used
◦ Legacy databases
◦ Data are from nondatabase sources
◦ No physical connection to the centralized data warehouse
◦ Data are not clean
![Page 9: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/9.jpg)
Data Mart Structures
Fact tables
◦ Measures
Dimension tables
◦ Dimensions and Hierarchies
◦ Attributes (or columns)
Dimensional modeling – Stars and Snowflakes
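To make the star layout concrete, here is a small Python sketch using the standard `sqlite3` module: one fact table holding a measure and two dimension tables it points to. All table names, column names, and rows (`fact_listing`, `dim_property_type`, `dim_date`, `listing_fee`) are illustrative assumptions, not the schema used in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to slice the facts.
CREATE TABLE dim_property_type (
    prop_type_id INTEGER PRIMARY KEY,
    type_name    TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER NOT NULL,
    quarter INTEGER NOT NULL,
    month   INTEGER NOT NULL
);
-- Fact table: one row per listing, with foreign keys to each dimension
-- and a numeric measure (listing_fee).
CREATE TABLE fact_listing (
    listing_id   INTEGER PRIMARY KEY,
    prop_type_id INTEGER NOT NULL REFERENCES dim_property_type(prop_type_id),
    date_id      INTEGER NOT NULL REFERENCES dim_date(date_id),
    listing_fee  REAL NOT NULL
);
""")
conn.executemany("INSERT INTO dim_property_type VALUES (?, ?)",
                 [(1, "Office"), (2, "Retail")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                 [(20110115, 2011, 1, 1), (20110420, 2011, 2, 4)])
conn.executemany("INSERT INTO fact_listing VALUES (?, ?, ?, ?)",
                 [(1, 1, 20110115, 50.0),
                  (2, 1, 20110420, 75.0),
                  (3, 2, 20110420, 40.0)])


def fee_by_type_and_quarter(conn):
    """Join the fact table to its dimensions and total the measure."""
    return conn.execute("""
        SELECT t.type_name, d.year, d.quarter, SUM(f.listing_fee)
        FROM fact_listing AS f
        JOIN dim_property_type AS t ON t.prop_type_id = f.prop_type_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY t.type_name, d.year, d.quarter
        ORDER BY t.type_name, d.year, d.quarter
    """).fetchall()


totals = fee_by_type_and_quarter(conn)
```

In a snowflake variant, a dimension such as property type would itself be split into further normalized tables (for example, a separate subtype table) rather than kept as a single table.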
![Page 10: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/10.jpg)
Measures
A numeric quantity expressing some aspect of the organization’s performance. The information represented by this quantity is used to support or evaluate the decision making and performance of the organization.
A measure is also called a fact
The table holding measure information is called a fact table
Dimensions vs. Measures 2’38”
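The relationship between a measure and the dimensions it is analyzed by can be sketched in a few lines of Python. The rows below are made up for illustration; `listing_fee` plays the role of the measure, and `city`, `year`, and `month` play the role of dimension attributes.

```python
from collections import defaultdict

# Hypothetical fact rows: dimension attributes plus one numeric measure.
fact_rows = [
    {"city": "Austin", "year": 2011, "month": 1, "listing_fee": 50.0},
    {"city": "Austin", "year": 2011, "month": 2, "listing_fee": 75.0},
    {"city": "Lubbock", "year": 2011, "month": 1, "listing_fee": 40.0},
]


def roll_up(rows, dimensions, measure):
    """Sum a measure for every combination of the chosen dimension values."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_city = roll_up(fact_rows, ["city"], "listing_fee")
by_month = roll_up(fact_rows, ["year", "month"], "listing_fee")
```

The same measure gives different answers depending on which dimensions are chosen: `by_city` totals fees per city, while `by_month` totals them per year and month.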
![Page 11: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/11.jpg)
Commrex Real Estate Operational Database
Users: property listors, webmaster, marketing manager of IMW
Objective: Encourage realtors to use the online ASP services with the best information services to increase IMW’s revenue.
Value Chain
◦ Listors create their account
◦ Listors post their real estate properties to the web-based database services and pay listing fees
◦ Property buyers search the website-based database and buy properties from listors. This is the incentive for listors to use the ASP services
Business Processes
◦ Listor sign up
◦ Listor account management
◦ Property data posting
◦ Property search
◦ Property database maintenance
![Page 12: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/12.jpg)
ISQS 6339, Data Mgmt & BI, Zhangxi Lin 12
IMW’s Database ERD Model
[ERD diagram. Databases: Property Listing Database, Membership Database. Attribute labels: Property ID, Listor ID, Address, Property Type, City, Company ID, Chapter, Functions, Specializations, Comp Name, Telephone #, Listor Name, UpdateDate, Feature, Type Name, Subtype 1, Subtype 2, Subtype n, TransactionID, PropID, UserID. Relationship labels: M:1, M:M. Legend: Primary Key, Secondary Key, Link to a table]
![Page 13: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/13.jpg)
Commrex Data Warehousing
Users: CEO of IMW, IMW business analyst, IMW marketing manager
Analytic themes
◦ Fast retrieval of business key performance indicators (KPIs)
◦ Decision making on business promotions
Applications
◦ Geographic distribution of property listings
◦ Scorecard for main performance indicators
◦ Dashboard
Questions
◦ How to model the data warehouse?
◦ What is required in data transformation and preprocessing?
◦ Is any dimension missing for data warehousing?
◦ How to perform routine data warehouse updates – frequency, timing, etc.?
![Page 14: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/14.jpg)
IMW’s Data Warehouse Dimensional Model
[Diagram. Tables: Property Listing Fact, Membership Dimension, Company Dimension, Property Type Dimension. Attribute labels: Property ID, Listor ID, Address, PropType, City, Company ID, Chapter, Functions, Specializations, Telephone #, Listor Name, UpdateDate, Features, …SubName, Comp Name, Year, Quarter, Month, Date. Legend: Primary Key, Secondary Key, Link to a table]
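The Year, Quarter, Month, and Date labels in the dimensional model suggest a time hierarchy derived from a single date such as UpdateDate. A minimal Python sketch of that derivation follows; the function name, the dictionary keys, and the integer `date_key` format are assumptions made for illustration, not part of the IMW model.

```python
from datetime import date


def date_dimension_row(d):
    """Derive time-hierarchy attributes from a single calendar date."""
    return {
        # Integer surrogate key in YYYYMMDD form (an assumed convention).
        "date_key": d.year * 10000 + d.month * 100 + d.day,
        "year": d.year,
        # Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on.
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "date": d.isoformat(),
    }


row = date_dimension_row(date(2011, 11, 5))
```

Storing these attributes once per date lets queries group facts by year, quarter, or month without recomputing them from the raw date.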
![Page 15: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/15.jpg)
Data Warehouse Overview
![Page 16: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/16.jpg)
Data Warehousing Characteristics
Basic characteristics of data warehousing
◦ Subject oriented
◦ Integrated
◦ Time variant (time series)
◦ Nonvolatile (not allowed to change)
Others
◦ Web based
◦ Relational/multidimensional
◦ Client/server
◦ Real-time
◦ Include metadata
![Page 17: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/17.jpg)
Data Warehousing Process Overview
Data in the DW are constantly accumulated.
◦ Organizations continuously collect data, information, and knowledge at an increasingly accelerated rate and store them in computerized systems
The number of users is constantly increasing.
◦ The number of users needing to access the information continues to increase as a result of improved reliability and availability of network access, especially the Internet
Organizations using a data warehouse rely on it more and more
![Page 18: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/18.jpg)
Data Warehousing More Concepts
Operational data stores (ODS)
A type of database often used as an interim area for a data warehouse, especially for customer information files
Enterprise data warehouse (EDW)
A large-scale data warehouse used across the enterprise for decision support. It integrates different sources of information into a consolidated information system.
Metadata (Video 1’41”)
Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its use
◦ Syntactic metadata, structural metadata, and semantic metadata
![Page 19: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/19.jpg)
Data Warehousing Process Overview
![Page 20: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/20.jpg)
Data Warehousing Process Overview
The major components of a data warehousing process
◦ Data sources
◦ Data extraction
◦ Data loading
◦ Comprehensive database
◦ Metadata
◦ Middleware tools
![Page 21: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/21.jpg)
Data Warehouse Architectures
![Page 22: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/22.jpg)
Three Parts of Data Warehouse
The data warehouse that contains the data and associated software
Data acquisition (back-end) software that extracts data from legacy systems and external sources, consolidates and summarizes them, and loads them into the data warehouse
Client (front-end) software that allows users to access and analyze data from the warehouse
![Page 23: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/23.jpg)
Three-Tier Data Warehouse
![Page 24: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/24.jpg)
Alternative Data Warehouse Architectures (1)
![Page 25: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/25.jpg)
Alternative Data Warehouse Architectures (2)
![Page 26: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/26.jpg)
Alternative Data Warehouse Architectures (3)
![Page 27: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/27.jpg)
Alternative Data Warehouse Architectures (4)
![Page 28: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/28.jpg)
Alternative Data Warehouse Architectures (5)
![Page 29: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/29.jpg)
Architectures Comparison
![Page 30: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/30.jpg)
Teradata’s EDW
![Page 31: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/31.jpg)
Hadoop – for BI in the Cloudera
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes.
Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant.
![Page 32: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/32.jpg)
Apache Hadoop
The Apache Hadoop framework is composed of the following modules:
◦ Hadoop Common - contains libraries and utilities needed by other Hadoop modules
◦ Hadoop Distributed File System (HDFS).
◦ Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
◦ Hadoop MapReduce - a programming model for large scale data processing.
![Page 33: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/33.jpg)
MapReduce
MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster or a grid.
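The map, shuffle, and reduce steps can be illustrated with a word count written in plain Python. This is only a single-process sketch of the programming model; it does not use Hadoop, and the function names are chosen for illustration. In a real cluster, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different node; here they run in turn.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["data mart", "data warehouse data"])
```

Because each document is mapped independently and each key is reduced independently, the work splits naturally across many machines, which is what makes the problem parallelizable.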
![Page 34: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/34.jpg)
How Hadoop Operates
![Page 35: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/35.jpg)
Cloudera’s Hadoop System
![Page 36: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/36.jpg)
Hadoop 2: Big data's big leap forward
The new Hadoop is the Apache Foundation's attempt to create a whole new general framework for the way big data can be stored, mined, and processed.
The biggest constraint on scale has been Hadoop’s job handling. In the original Hadoop, all jobs are run as batch processes through a single daemon called JobTracker, which creates a scalability and processing-speed bottleneck.
Hadoop 2 uses an entirely new job-processing framework built using two daemons: ResourceManager, which governs all jobs in the system, and NodeManager, which runs on each Hadoop node and keeps the ResourceManager informed about what's happening on that node.
![Page 37: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/37.jpg)
MapReduce 2.0 – YARN (Yet Another Resource Negotiator)
![Page 38: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/38.jpg)
Teradata Big Data Platform
2013-12-02 Zhangxi Lin (林漳希) @ Tsinghua University (清华大学)
![Page 39: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/39.jpg)
Dell representation of the Hadoop ecosystem
![Page 40: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/40.jpg)
Nokia’s Big Data Architecture
![Page 41: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/41.jpg)
Comparison between big data platform and traditional BI platform
![Page 42: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/42.jpg)
Resolving legacy problem – Dual platform
![Page 43: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/43.jpg)
Ten factors that potentially affect the architecture selection decision
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors
![Page 44: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/44.jpg)
Data Integration
![Page 45: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/45.jpg)
Data Integration
Integration that comprises three major processes:
◦ data access,
◦ data federation, and
◦ change capture.
When these three processes are correctly implemented, data can be accessed and made accessible to an array of ETL and analysis tools and data warehousing environments
ETL Tools 4’56”
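Of the three processes, change capture is the easiest to sketch in code. The Python example below compares two snapshots of a source table keyed by primary key and reports what was inserted, updated, or deleted. Snapshot comparison is only one possible approach and is assumed here for simplicity (other approaches rely on database logs, triggers, or timestamps); the function name and sample records are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots (dicts keyed by primary key) and list the changes."""
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    updated = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical snapshots of a property table: key -> (city, listing fee).
yesterday = {1: ("Austin", 50.0), 2: ("Dallas", 75.0), 3: ("Lubbock", 40.0)}
today = {1: ("Austin", 55.0), 3: ("Lubbock", 40.0), 4: ("Houston", 60.0)}

changes = capture_changes(yesterday, today)
```

Only the rows in `changes` need to be sent on to the warehouse, which keeps routine updates small compared with reloading the whole source.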
![Page 46: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/46.jpg)
Data Integration
Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data from source systems into a data warehouse, including application functionality integration. Recently, service-oriented architecture (SOA) has been applied
Enterprise information integration (EII)
An evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases
Extraction, transformation, and load (ETL)
A data warehousing process that consists of extraction (i.e., reading data from a database), transformation (i.e., converting the extracted data from its previous form into the form in which it needs to be so that it can be placed into a data warehouse or simply another database), and load (i.e., putting the data into the data warehouse)
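The three ETL steps can be sketched end to end in Python using only the standard library. In this illustration the source is a small CSV string rather than a database, the target is an in-memory SQLite table, and the cleaning rules (trim and title-case the city, skip rows with a missing fee) are assumptions chosen for the example; none of the names or data come from the Commrex files.

```python
import csv
import io
import sqlite3

# Hypothetical source data with untidy values and one missing fee.
SOURCE_CSV = """property_id,city,listing_fee
1, austin ,50
2,LUBBOCK,
3,Austin,75.5
"""


def extract(text):
    """Extraction: read the raw source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: clean values and convert them to the target types."""
    cleaned = []
    for row in rows:
        fee = row["listing_fee"].strip()
        if not fee:
            continue  # skip rows whose measure is missing
        cleaned.append(
            (int(row["property_id"]), row["city"].strip().title(), float(fee))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listing ("
        "property_id INTEGER PRIMARY KEY, city TEXT, listing_fee REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO listing VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded_rows = conn.execute(
    "SELECT property_id, city, listing_fee FROM listing ORDER BY property_id"
).fetchall()
```

Keeping the three steps as separate functions mirrors the definition above and makes it easy to swap the source or the target without touching the cleaning rules.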
![Page 47: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/47.jpg)
Transformation Tools: To Purchase or to Build In-House
Issues affecting whether an organization will purchase data transformation tools or build the transformation process itself
◦ Data transformation tools are expensive
◦ Data transformation tools may have a long learning curve
◦ It is difficult to measure how the IT organization is doing until it has learned to use the data transformation tools
Important criteria in selecting an ETL tool
◦ Ability to read from and write to an unlimited number of data source architectures
◦ Automatic capturing and delivery of metadata
◦ A history of conforming to open standards
◦ An easy-to-use interface for the developer and the functional user
![Page 48: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/48.jpg)
Open Source Software for Big Data
Oracle VM VirtualBox
Cloudera Hadoop - Get Started With Enterprise Hadoop
Hortonworks Data Platform - Hortonworks.com
Google Hadoop Solutions - google.com
Hadoop on Google Cloud Platform
Hadoop & NoSQL - MarkLogic.com
![Page 49: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/49.jpg)
Structure and Components of Business Intelligence
[Diagram. Labels: MS SQL Server 2008, BIDS, SSMS, SSIS, SSAS, SSRS, SAS EM, SAS EG]
![Page 50: ISQS 6339, Business Intelligence Data Warehousing](https://reader036.vdocuments.site/reader036/viewer/2022062310/56815d51550346895dcb5ac8/html5/thumbnails/50.jpg)
Exercise 1 – Walk through data warehousing process
Learning Objectives
◦ To gain a general impression of how to use SQL Server 2008 to implement a data mart
Tasks
◦ Create your database with SSMS, named ISQS6339_lastname
◦ Import data from Commrex_2011.xls
◦ Use SSMS to create an ERD diagram
◦ Create an SSAS project using BIDS
◦ Define data source, data source view, and cube
Deliverable:
◦ One-page printout of the screenshot of the cube diagram