Decision Support and Data Warehouse
Jingyi Lu
Outline
– Decision Support System
– OLAP vs. OLTP
– What is a Data Warehouse?
– Dimensional Modeling
– Extract, Transform, and Load (ETL)
Decision Support System
Information technology that helps the knowledge worker (executive, manager, analyst) make faster and better decisions.
– What were the sales volumes by region and product category for the last year?
– Which orders should we fill to maximize revenues?
– Will a 10% discount increase sales volume sufficiently?
Decision Support Systems
Created to facilitate the decision making process
So much information that it is difficult to extract it all from a traditional database
Need for a more comprehensive data storage facility
Data Warehouse
Decision Support Systems
Extract Information from data to use as the basis for decision making
– Used at all levels of the organization
– Tailored to specific business areas
– Ad hoc queries to retrieve and display information
– Combines historical operational data with business activities
Decision Support Systems
OLAP vs. OLTP
OLTP (On-line Transaction Processing): characterized by a large number of short on-line transactions. → Operational database
OLAP (On-line Analytical Processing): characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. → Data Warehouse
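The contrast can be sketched with Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical, chosen only to illustrate the two workloads; a real warehouse would keep analytical data in a separate database.

```python
import sqlite3

# Hypothetical single-table sales schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, "
    "category TEXT, amount REAL)"
)
rows = [
    (1, "East", "Books", 20.0),
    (2, "East", "Games", 35.0),
    (3, "West", "Books", 15.0),
    (4, "West", "Books", 10.0),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# OLTP-style work: a short transaction that touches one row by primary key.
with conn:
    conn.execute("UPDATE orders SET amount = 40.0 WHERE order_id = 2")

# OLAP-style work: a scan over many rows with grouping and aggregation.
sales_by_region_category = conn.execute(
    "SELECT region, category, SUM(amount) FROM orders "
    "GROUP BY region, category ORDER BY region, category"
).fetchall()
print(sales_by_region_category)
```

The update is the kind of statement an operational database runs thousands of times a day; the grouped query is the kind a knowledge worker runs occasionally over far more rows.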
OLAP vs. OLTP
| | OLTP | OLAP |
|---|---|---|
| users | clerk, IT professional | knowledge worker |
| function | day-to-day operations | decision support |
| DB design | application-oriented | subject-oriented |
| data | current, up-to-date, detailed, flat relational, isolated | historical, summarized, multidimensional, integrated, consolidated |
| usage | repetitive | ad hoc |
| access | read/write, index/hash on primary key | lots of scans |
| unit of work | short, simple transaction | complex query |
| # records accessed | tens | millions |
| # users | thousands | hundreds |
| DB size | 100 MB-GB | 100 GB-TB |
| metric | transaction throughput | query throughput, response |
What is a Data Warehouse
The repository for the DSS is the DATA WAREHOUSE
Definition: Integrated, Subject-Oriented, Time-Variant, Nonvolatile database that provides support for decision making.
Integrated
The data warehouse is a centralized, consolidated database that integrates data derived from the entire organization
– Multiple sources
– Diverse sources
– Diverse formats
Subject-Oriented
Data is arranged and optimized to provide answers to questions from diverse functional areas
Data is organized and summarized by topic
– Sales / Marketing / Finance / Distribution / etc.
Time-Variant
The Data Warehouse represents the flow of data through time
Can contain projected data from statistical models
Data is periodically uploaded, and then time-dependent data is recomputed
Nonvolatile
Once data is entered it is NEVER removed
– Represents the company’s entire history
– Near-term history is continually added to it
– Always growing
– Must support terabyte databases and multiprocessors
Read-Only database for data analysis and query processing
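The time-variant and nonvolatile properties can be illustrated together with an append-only store of dated snapshots. This is a minimal sketch: the `warehouse` list, the `load_snapshot` and `history` helpers, and the account balances are all hypothetical.

```python
from datetime import date

# Hypothetical append-only warehouse table: a list of dated snapshot rows.
warehouse = []

def load_snapshot(snapshot_date, balances):
    """Append one dated row per account; earlier rows are never changed."""
    for account, balance in sorted(balances.items()):
        warehouse.append((snapshot_date, account, balance))

# Periodic uploads add near-term history without touching older rows.
load_snapshot(date(2024, 1, 31), {"A": 100, "B": 50})
load_snapshot(date(2024, 2, 29), {"A": 120, "B": 50})

def history(account):
    """Return the balance of one account over time, oldest first."""
    return [(d, bal) for d, acct, bal in warehouse if acct == account]

print(history("A"))
```

Because each load only appends, the store keeps growing and every past state stays available to read-only queries.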
Dimensional Modeling
Dimension
– A dimension is a data element that categorizes each item in a data set into non-overlapping regions
Facts
– A fact is a value or measurement that represents a fact about the managed entity or system
– Typically numeric values that can be aggregated
Dimensional Modeling
A database is a set of facts (points) in a multidimensional space
Fact tables
– Contain business facts or measures and foreign keys that refer to primary keys in the dimension tables
Dimension tables
– Each dimension table has a set of attributes, e.g., Day, Month, Year of Date
– Attributes of a dimension may be related by a partial order
– Hierarchy: e.g., Day → Month → Year (finer levels roll up into coarser ones)
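Aggregating along a dimension hierarchy can be sketched in plain Python. The `daily_sales` facts and the `roll_up` helper below are hypothetical and only show how the same numeric measure is summed at the day, month, or year level.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (date, amount).
daily_sales = [
    (date(2023, 1, 5), 10),
    (date(2023, 1, 20), 5),
    (date(2023, 2, 1), 7),
    (date(2024, 1, 3), 4),
]

def roll_up(facts, level):
    """Sum amounts at the 'day', 'month', or 'year' level of the hierarchy."""
    keys = {
        "day": lambda d: (d.year, d.month, d.day),
        "month": lambda d: (d.year, d.month),
        "year": lambda d: (d.year,),
    }
    totals = defaultdict(int)
    for d, amount in facts:
        totals[keys[level](d)] += amount
    return dict(totals)

by_month = roll_up(daily_sales, "month")
by_year = roll_up(daily_sales, "year")
print(by_month)
print(by_year)
```

Each coarser level is computable from the finer one, which is what makes the attributes a hierarchy rather than unrelated columns.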
Example of Star Schema
Example of Snowflake Schema
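Since the schema figures are not reproduced in this transcript, here is a minimal star schema sketched with Python's sqlite3 module. The table names (`fact_sales`, `dim_date`, `dim_product`), columns, and rows are assumptions made for illustration, not the schema from the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, descriptive attributes only.
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT, category TEXT
);
-- Fact table: numeric measures plus foreign keys into each dimension.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_date VALUES (1, 15, 1, 2024), (2, 16, 1, 2024), (3, 1, 2, 2024);
INSERT INTO dim_product VALUES (1, 'Novel', 'Books'), (2, 'Puzzle', 'Games');
INSERT INTO fact_sales VALUES
    (1, 1, 2, 30.0), (2, 1, 1, 15.0), (2, 2, 1, 25.0), (3, 2, 2, 50.0);
""")

# A typical star join: facts joined to dimensions, grouped by attributes.
revenue_by_month_category = conn.execute("""
    SELECT d.year, d.month, p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.year, d.month, p.category
    ORDER BY d.year, d.month, p.category
""").fetchall()
print(revenue_by_month_category)
```

In a snowflake variant of this sketch, `category` would move out of `dim_product` into its own table referenced by a key, normalizing the dimension at the cost of one more join.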
ETL
Extraction, Transformation, Loading (ETL)
– Gets data out of the sources and loads it into the data warehouse; in the simplest case, a process of copying data from one database to another
Data is extracted from an OLTP database, transformed to match the data warehouse schema and loaded into the data warehouse database
Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems, and spreadsheets; such data also requires extraction, transformation, and loading
When defining ETL for a data warehouse, it is important to think of ETL as a process, not a physical implementation
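The three steps can be sketched as separate functions, which keeps the process independent of any particular tool. Everything here is hypothetical: the `orders` source table, the `fact_orders` target, and the cleaning rules are assumptions for illustration, with in-memory SQLite standing in for both databases.

```python
import sqlite3

# Hypothetical OLTP source with one orders table.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER, ordered_on TEXT
);
INSERT INTO orders VALUES
    (1, ' alice ', 1250, '2024-01-15'), (2, 'BOB', 990, '2024-01-16');
""")

# Hypothetical warehouse table with a different shape than the source.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, "
    "amount REAL, year INTEGER, month INTEGER)"
)

def extract(conn):
    """Read raw rows from the source system."""
    return conn.execute(
        "SELECT id, customer, amount_cents, ordered_on FROM orders ORDER BY id"
    ).fetchall()

def transform(rows):
    """Clean names, convert cents to currency units, split the date."""
    out = []
    for order_id, customer, cents, ordered_on in rows:
        year, month, _day = (int(part) for part in ordered_on.split("-"))
        out.append((order_id, customer.strip().title(), cents / 100, year, month))
    return out

def load(conn, rows):
    """Insert the transformed rows into the warehouse in one transaction."""
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?)", rows)

load(warehouse, transform(extract(source)))
loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping `extract`, `transform`, and `load` as separate steps means a text file or spreadsheet source only needs a different `extract`.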
ETL
ETL is often a complex combination of process and technology that consumes a significant portion of the data warehouse development effort and requires the skills of business analysts, database designers, and application developers
It is not a one-time event, as new data is added to the data warehouse periodically: monthly, daily, hourly
Because ETL is an integral, ongoing, and recurring part of a data warehouse, it should be
– Automated
– Well documented
– Easily changeable
ETL Staging Database
ETL operations should be performed on a relational database server separate from the source databases and the data warehouse database
Creates a logical and physical separation between the source systems and the data warehouse
Minimizes the impact of the intense periodic ETL activity on source and data warehouse databases
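A staging step can be sketched with three separate connections. This is an illustration under assumptions: the `sales`, `raw_sales`, and `sales_by_region` tables are hypothetical, and in-memory SQLite databases stand in for what would be three separate servers.

```python
import sqlite3

# Three separate databases: source, staging, and warehouse.
source = sqlite3.connect(":memory:")
staging = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.executescript("""
CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO sales VALUES (1, 'east', 10.0), (2, 'EAST', 5.0), (3, 'west', 7.0);
""")
staging.execute("CREATE TABLE raw_sales (id INTEGER, region TEXT, amount REAL)")
warehouse.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")

# Step 1: one quick read from the source, copied as-is into staging.
raw = source.execute("SELECT id, region, amount FROM sales").fetchall()
with staging:
    staging.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw)

# Step 2: the expensive cleaning and aggregation runs only on staging.
summary = staging.execute(
    "SELECT UPPER(region), SUM(amount) FROM raw_sales "
    "GROUP BY UPPER(region) ORDER BY 1"
).fetchall()

# Step 3: one short load of finished rows into the warehouse.
with warehouse:
    warehouse.executemany("INSERT INTO sales_by_region VALUES (?, ?)", summary)

result = warehouse.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(result)
```

The source is read once and the warehouse is written once; all intermediate work happens in staging, which is the separation the slide describes.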