BUSINESS ANALYTICS AND
DATA VISUALIZATION
ITM-761 Business Intelligence
ดร. สลล บญพราหมณ
"Doing good is difficult and slow to show results, but it must be done, because otherwise evil, which is easy to do, will take its place and pile up rapidly before one is aware of it. Each person must therefore be resolute and persevere to the utmost in building up and accumulating goodness."
Royal address of His Majesty the King, delivered to graduates of the Royal Police Cadet Academy
at the New Building (Arkhan Mai), Suan Amphorn, 14 August B.E. 2525 (1982)
Overview: The Business Analytics (BA)
• The use of analytical methods, either manually or automatically, to derive relationships from data
• Business analytics (BA) includes the access, reporting, and analysis of data supported by software to drive business performance and decision making
The Business Analytics (BA) Field: An Overview
• MicroStrategy’s classification of BA tools: the five styles of BI
  1. Enterprise reporting
  2. Cube analysis
  3. Ad hoc querying and analysis
  4. Statistical analysis and data mining
  5. Report delivery and alerting
• SAP’s classification of strategic enterprise management
  • Three levels of support
    1. Operational
    2. Managerial
    3. Strategic
• Executive information and support systems
  • Executive information systems (EIS)
    Provide rapid access to timely and relevant information, aiding in monitoring an organization’s performance
  • Executive support systems (ESS)
    Also provide analysis support, communications, office automation, and intelligence support
Online Analytical Processing (OLAP)
• On-Line Analytical Processing (OLAP) is a decision support tool that allows users to analyze different dimensions of multidimensional data.
• Designed for executives looking to make sense of their information, OLAP structures data hierarchically to reflect the real dimensionality of the enterprise as understood by the users.
• Users can pivot, filter, drill down, and drill up data and generate a number of views with simple mouse manipulations.
• The OLAP structure created from the operational data is called an OLAP cube.
• The cube holds data more like a 3D spreadsheet than a relational database, allowing different views of the data to be quickly displayed
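The "3D spreadsheet" idea can be sketched in a few lines of plain Python. This is only an illustration under assumed data: the dimension names (product, region, quarter), the sample figures, and the helper `view` are hypothetical and do not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter) -> units sold.
cube = {
    ("laptop", "north", "Q1"): 10,
    ("laptop", "south", "Q1"): 4,
    ("laptop", "north", "Q2"): 7,
    ("phone", "north", "Q1"): 20,
    ("phone", "south", "Q2"): 15,
}
DIMS = ("product", "region", "quarter")

def view(cube, row_dim, col_dim):
    """Build a 2D view by summing over every dimension that is not displayed."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(int)
    for key, units in cube.items():
        table[(key[r], key[c])] += units
    return dict(table)

# Two different views of the same cube, as a user would get by pivoting.
by_product_region = view(cube, "product", "region")
by_region_quarter = view(cube, "region", "quarter")
```

Each call returns one two-dimensional "sheet" of the cube; pivoting amounts to choosing a different pair of dimensions.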
• In multidimensional OLAP (MOLAP) databases, cubes are created and stored physically, whereas in relational OLAP (ROLAP) databases, cubes are virtually created, based on a star or snowflake schema
• Star and snowflake schemas
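A minimal sketch of a star schema and of a "virtual" cube built from it, using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`) and the sample rows are assumptions made for illustration. In a snowflake variant, a column such as `category` would be moved out of `dim_product` into its own normalized table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);
-- The central fact table references each dimension table (the points of the star).
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'laptop', 'computers'), (2, 'phone', 'mobile');
INSERT INTO dim_store   VALUES (1, 'Bangkok', 'central'), (2, 'Phuket', 'south');
INSERT INTO fact_sales  VALUES (1, 1, 900.0), (1, 2, 850.0), (2, 1, 400.0), (2, 1, 420.0);
""")

# The cube is not stored: it is computed on demand by joining facts to dimensions.
rows = con.execute("""
    SELECT p.category, s.region, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
""").fetchall()
```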
The OLAP Report
• One of the most internationally authoritative sources of information on OLAP products and applications, defines OLAP in five keywords: Fast Analysis of Shared Multidimensional Information, or FASMI for short
• Fast
  • The system is targeted to deliver most responses to users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds
• Analysis
  • The system can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user
• Shared
  • The system implements all the security requirements for confidentiality and, if multiple write access is needed, concurrent update locking at an appropriate level.
  • Not all applications need users to write data back, but for the growing number that do, the system should be able to handle multiple updates in a timely, secure manner
• Multidimensional
  • The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies
• Information
  • The capacity of various products is measured in terms of how much input data they can handle, not how many gigabytes they take to store it
OLAP versus OLTP
• OLTP concentrates on processing repetitive transactions in large quantities and conducting simple manipulations
• OLAP involves examining many data items in complex relationships
• OLAP may analyze relationships and look for patterns, trends, and exceptions
• OLAP is a direct decision support method
• OLTP (on-line transaction processing)
  • Major task of traditional relational DBMS
  • Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
• OLAP (on-line analytical processing)
  • Major task of data warehouse system
  • Data analysis and decision making
• Distinct features (OLTP vs. OLAP):
  • User and system orientation: customer vs. market
  • Data contents: current, detailed vs. historical, consolidated
  • Database design: ER + application vs. star + subject
  • View: current, local vs. evolutionary, integrated
  • Access patterns: update vs. read-only but complex queries
OLTP vs. OLAP
                    OLTP                                  OLAP
users               clerk, IT professional                knowledge worker
function            day-to-day operations                 decision support
DB design           application-oriented                  subject-oriented
data                current, up-to-date, detailed,        historical, summarized, multidimensional,
                    flat relational, isolated             integrated, consolidated
usage               repetitive                            ad hoc
access              read/write, index/hash on prim. key   lots of scans
unit of work        short, simple transaction             complex query
# records accessed  tens                                  millions
# users             thousands                             hundreds
DB size             100 MB-GB                             100 GB-TB
metric              transaction throughput                query throughput, response
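The "unit of work" and "access" rows can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `account` table and its rows are hypothetical; the point is only the contrast between a short keyed update and a read-only scan that summarizes many rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
con.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "north", 100.0), (2, "north", 250.0), (3, "south", 75.0)],
)

# OLTP-style unit of work: a short read/write transaction touching one row,
# located through the primary key.
with con:  # commits on success, rolls back if an exception is raised
    con.execute("UPDATE account SET balance = balance - 30 WHERE id = ?", (1,))

# OLAP-style unit of work: a read-only query that scans all rows and summarizes them.
totals = dict(con.execute("SELECT branch, SUM(balance) FROM account GROUP BY branch"))
```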
Codd’s Rules for OLAP Systems
• In 1993, E.F. Codd formulated twelve rules as the basis for selecting OLAP tools.
• Multi-dimensional conceptual view
  • Supports EIS (Executive Information System) slice and dice operations and is usually required in financial modeling.
• Transparency
  • Is part of an open system that supports heterogeneous data sources. Furthermore, the end user should not be concerned about the details of data access or conversions.
• Accessibility
  • Presents the user with a single logical schema of the data. OLAP engines act as middleware, sitting between heterogeneous data sources and an OLAP front-end.
• Consistent reporting performance
  • Performance should not degrade as the number of dimensions in the model increases.
• Client-server architecture
  • Requires open, modular systems. Not only should the product be client/server, but the server component of an OLAP product should also allow various clients to be attached with minimal effort and programming for integration
• Generic dimensionality
  • Not limited to 3D and not biased toward any particular dimension. A function applied to one dimension should also be able to be applied to another
• Dynamic sparse matrix handling (null values)
  • Related both to the idea of nulls in relational databases and to the notion of compressing large files, a sparse matrix is one in which not every cell contains data. OLAP systems should accommodate varying storage and data-handling options
• Multi-user support
  • Supports multiple concurrent users, including their individual views or slices of a common database
• Unrestricted cross-dimensional operations
  • All dimensions are created equal, so all forms of calculation must be allowed across all dimensions, not just the measures dimension
• Intuitive data manipulation (slicing and dicing (pivoting), drill-down, consolidation (drill-up), etc.)
  • Users shouldn't have to use menus or perform complex multiple step operations when an intuitive drag and drop action will do
• Flexible reporting
  • Users should be able to print just what they need, and any changes to the underlying model should be automatically reflected in reports.
• Unlimited dimensions and aggregation levels
  • Supports at least 15, and preferably 20, dimensions
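The rule on dynamic sparse matrix handling can be made concrete with a rough sketch. It assumes a hypothetical cube of 100 products, 50 stores, and 365 days, and stores only the non-empty cells in a dictionary; real OLAP servers use their own storage and compression schemes, which are not shown here.

```python
# Hypothetical dimension sizes for a product x store x day cube.
products = [f"p{i}" for i in range(100)]
stores = [f"s{i}" for i in range(50)]
days = [f"d{i}" for i in range(365)]

# Dense storage would reserve one cell for every combination of members.
dense_cells = len(products) * len(stores) * len(days)

# Sparse storage keeps only the cells that actually hold a value.
sparse_cube = {
    ("p0", "s0", "d0"): 12.0,
    ("p3", "s7", "d200"): 5.5,
}

def cell(cube, key, default=None):
    """Return the stored value, or a null marker for an empty cell."""
    return cube.get(key, default)

# Fraction of the dense cube that is actually populated.
fill_ratio = len(sparse_cube) / dense_cells
```

With these assumed sizes the dense cube would need 1,825,000 cells, while the sparse representation holds two entries and treats everything else as null.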
• There are proposals to redefine or extend the rules, for example to also include:
  • Comprehensive database management tools
  • Ability to drill down to detail (source record) level
  • Incremental database refresh
  • SQL interface to the existing enterprise environment
OLAP operations
• Roll-up
  • Takes the current aggregation level of fact values and does a further aggregation on one or more of the dimensions.
  • Equivalent to doing a GROUP BY on this dimension using the attribute hierarchy.
  • Decreases the number of dimensions - removes row headers
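A small sketch of roll-up in plain Python. The facts, the month-to-quarter hierarchy, and the function names are assumptions made for illustration; the first function climbs one level of the time hierarchy, the second aggregates the time dimension away altogether.

```python
from collections import defaultdict

# Hypothetical facts at the (city, month) level: units sold.
facts = {
    ("Bangkok", "2024-01"): 5,
    ("Bangkok", "2024-02"): 7,
    ("Phuket", "2024-01"): 3,
    ("Phuket", "2024-04"): 4,
}

def quarter_of(month):
    """Assumed attribute hierarchy for time: month -> quarter."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"

def roll_up_time(facts):
    """Aggregate month-level facts to quarter level (like GROUP BY city, quarter)."""
    out = defaultdict(int)
    for (city, month), units in facts.items():
        out[(city, quarter_of(month))] += units
    return dict(out)

def roll_up_all_time(facts):
    """Remove the time dimension entirely, leaving one total per city."""
    out = defaultdict(int)
    for (city, _month), units in facts.items():
        out[city] += units
    return dict(out)
```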
• Drill-down
  • Opposite of roll-up.
  • Summarizes data at a lower level of a dimension hierarchy, thereby viewing data at a more specialized level within a dimension.
  • Increases the number of dimensions - adds new headers
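Drill-down can be sketched as the reverse step: starting from a quarterly total and expanding it into the months beneath it. The data and the `QUARTER` mapping below are hypothetical.

```python
# Hypothetical month-level facts; the quarterly view is derived from them.
monthly = {"2024-01": 5, "2024-02": 7, "2024-03": 2, "2024-04": 9}

# Assumed hierarchy for the time dimension: month -> quarter.
QUARTER = {
    "2024-01": "2024-Q1",
    "2024-02": "2024-Q1",
    "2024-03": "2024-Q1",
    "2024-04": "2024-Q2",
}

def quarterly_view(monthly):
    """The summarized (rolled-up) level that the user starts from."""
    totals = {}
    for month, units in monthly.items():
        q = QUARTER[month]
        totals[q] = totals.get(q, 0) + units
    return totals

def drill_down(monthly, quarter):
    """Expand one quarter into its months, one level lower in the hierarchy."""
    return {m: u for m, u in monthly.items() if QUARTER[m] == quarter}
```

The detail rows returned by `drill_down` always add up to the quarterly figure they were expanded from.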
• Slice
  • Performs a selection on one dimension of the given cube, resulting in a sub-cube.
  • Reduces the dimensionality of the cubes.
  • Sets one or more dimensions to specific values and keeps a subset of dimensions for selected values
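A minimal sketch of a slice on a dictionary-based cube. The cube contents and the helper `slice_cube` are assumptions for illustration; fixing the quarter to one value leaves a two-dimensional sub-cube.

```python
# Hypothetical cube: (product, region, quarter) -> units sold.
cube = {
    ("laptop", "north", "Q1"): 10,
    ("laptop", "south", "Q1"): 4,
    ("laptop", "north", "Q2"): 7,
    ("phone", "north", "Q1"): 20,
}
QUARTER = 2  # position of the quarter dimension in each key

def slice_cube(cube, dim, member):
    """Fix dimension `dim` to `member` and drop it from the keys of the sub-cube."""
    return {
        key[:dim] + key[dim + 1:]: value
        for key, value in cube.items()
        if key[dim] == member
    }

q1 = slice_cube(cube, QUARTER, "Q1")
```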
• Dice
  • Defines a sub-cube by performing a selection on one or more dimensions.
  • Refers to a range select condition on one dimension, or to a select condition on more than one dimension.
  • Reduces the number of member values of one or more dimensions
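A comparable sketch of a dice, again on a hypothetical dictionary-based cube. Unlike a slice, the result keeps all three dimensions and only restricts which members remain on the selected ones.

```python
# Hypothetical cube: (product, region, quarter) -> units sold.
cube = {
    ("laptop", "north", "Q1"): 10,
    ("laptop", "south", "Q1"): 4,
    ("laptop", "north", "Q3"): 7,
    ("phone", "north", "Q1"): 20,
}

def dice(cube, selection):
    """Keep cells whose member is allowed on every dimension named in `selection`.

    `selection` maps a dimension position to the set of members to keep.
    """
    return {
        key: value
        for key, value in cube.items()
        if all(key[dim] in members for dim, members in selection.items())
    }

# Select condition on two dimensions: product = laptop and quarter in {Q1, Q2}.
sub = dice(cube, {0: {"laptop"}, 2: {"Q1", "Q2"}})
```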
Categories of OLAP Tools
• OLAP tools are categorized according to the architecture used to store and process multi-dimensional data.
• There are four main categories:
  • Multi-dimensional OLAP (MOLAP)
  • Relational OLAP (ROLAP)
  • Hybrid OLAP (HOLAP)
  • Desktop OLAP (DOLAP)
1) Multi-dimensional OLAP (MOLAP)
• Uses specialized data structures and multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data.
• Data is typically aggregated and stored according to predicted usage to enhance query performance.
• This allows users to view different aspects of data aggregates such as sales by time period, geography, or product. The storage is not in a relational database
• Uses array technology and efficient storage techniques that minimize the disk space requirements through sparse data management.
• Provides excellent performance when data is used as designed, and the focus is on data for a specific decision-support application.
• Traditionally, requires a tight coupling with the application layer and presentation layer.
MOLAP Available tools
• Hyperion,
• Executive Viewer,
• CFO Vision,
• BI/Analyze,
• PowerPlay,
• Business Objects,
• Genita,
• Holos,
• MS OLAP Services,
• Pilot,
• ProCube
Typical Architecture for MOLAP Tools
• MOLAP utilizes a proprietary multidimensional database to provide OLAP analyses. The main premise of this architecture is that data must be stored multidimensionally to be viewed multidimensionally
• Data from various operational systems is loaded into a multidimensional database through a series of batch routines.
• Once this atomic data has been loaded into the multidimensional database, the general approach is to perform a series of calculations in batch to aggregate along the dimensions and fill the multidimensional array structures.
• Then indices are created, and hashing algorithms are used to improve query access time
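The batch aggregation step can be sketched as follows. This is a simplified illustration, not how any particular MOLAP product works: the atomic facts are hypothetical, `"*"` is an assumed marker for "aggregated over this dimension", and a Python dictionary (a hash table) stands in for the indexed multidimensional store.

```python
from itertools import combinations

ALL = "*"  # assumed marker meaning "aggregated over this dimension"

def build_molap_cube(facts, n_dims):
    """Batch step: pre-compute the total for every combination of kept dimensions."""
    cube = {}
    for keep in range(n_dims + 1):
        for kept_dims in combinations(range(n_dims), keep):
            for key, value in facts.items():
                # Replace every dimension that is not kept with the ALL marker.
                agg_key = tuple(key[i] if i in kept_dims else ALL for i in range(n_dims))
                cube[agg_key] = cube.get(agg_key, 0) + value
    return cube

# Hypothetical atomic data: (product, region, quarter) -> units sold.
facts = {
    ("laptop", "north", "Q1"): 10,
    ("laptop", "south", "Q1"): 4,
    ("phone", "north", "Q2"): 20,
}
cube = build_molap_cube(facts, 3)
```

After the build, any aggregate such as `cube[("laptop", ALL, ALL)]` is answered by a single hash lookup, which is the source of MOLAP's query speed and also of its build-time and storage cost.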
• MOLAP is a two-tier, client/server architecture. The multidimensional database serves as both the database layer and the application logic layer. In the database layer, it is responsible for all data storage, access, and retrieval processes.
• In the application logic layer, it is responsible for the execution of all OLAP requests. The presentation layer integrates with the application logic layer and provides an interface through which the users view and request OLAP analyses.
• The client/server architecture allows multiple users to access the same multidimensional database
• MOLAP Advantages
  • Excellent performance since pre-aggregation provides quicker response time
  • Availability of extensive libraries of complex functions for OLAP analyses
  • Optimal for slice and dice operations
  • Performs better than ROLAP when data is dense
• MOLAP Disadvantages
  • Usually more than 90% of cells are empty - issue with sparsity
  • Limited in the amount of data it can handle, since all calculations are performed when the cube is built. Therefore, it is not commonly used above 20-50 GB - scalability problem
  • Difficult to change dimension without re-aggregation
  • Data must be copied and moved into data stores
  • Originated from query tools, thereby lacking the architecture
  • Requires additional investment since cube technology is often proprietary and does not already exist in organizations
  • Lacks security and administration features which RDBMSs can bring
2) Relational OLAP (ROLAP)
• Fastest-growing style of OLAP technology due to requirements to analyze ever increasing amounts of data and the realization that users cannot store all the data they require in MOLAP databases.
• The traditional OLAP's slice and dice functionality is equivalent to adding a WHERE clause in the SQL statement. The design may be structured in the form of a star or its variations
• ROLAP performs dynamic multidimensional analysis of data stored in a relational database, rather than in a multidimensional database
• Supports RDBMS products using a metadata layer - avoids the need to create a static multi-dimensional data structure - facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.
• A typical use of ROLAP is for large data sets that are infrequently queried, such as historical data
• To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
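The statement that slice and dice correspond to a WHERE clause can be sketched with Python's built-in `sqlite3` module. The single denormalized `sales` table and its rows are hypothetical and stand in for a warehouse table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (product TEXT, region TEXT, quarter TEXT, units INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("laptop", "north", "Q1", 10),
        ("laptop", "south", "Q1", 4),
        ("laptop", "north", "Q2", 7),
        ("phone", "north", "Q1", 20),
    ],
)

# Slice: fix quarter = 'Q1' with a WHERE clause, then aggregate the other dimensions.
slice_q1 = con.execute(
    "SELECT product, region, SUM(units) FROM sales "
    "WHERE quarter = ? GROUP BY product, region ORDER BY product, region",
    ("Q1",),
).fetchall()

# Dice: restrict several dimensions at once.
diced = con.execute(
    "SELECT product, region, quarter, units FROM sales "
    "WHERE region = ? AND quarter IN (?, ?) ORDER BY product, quarter",
    ("north", "Q1", "Q2"),
).fetchall()
```

Each report is computed at query time from the relational rows, which is why ROLAP needs no pre-built cube but pays for every request with a SQL query.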
ROLAP Available tools
• Discoverer 3 from Oracle,
• DSS Agent from MicroStrategy,
• MetaCube from IBM Informix,
• Platinum Beacon from Platinum,
• Brio,
• Business Objects,
• Cognos PowerPlay
Typical Architecture for ROLAP Tools
• ROLAP accesses data stored in a data warehouse (relational database) to provide OLAP analyses
• ROLAP is a three-tier, client/server architecture. The database layer utilizes relational databases for data storage, access, and retrieval processes.
• The application logic layer is the ROLAP engine, which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of presentation layers, through which users perform OLAP analyses
• ROLAP Advantages
  • Well-known environments (relational database)
  • Can leverage functionality that comes with the relational database through ROLAP technologies
  • Can be used with data warehouse and OLTP systems
  • No pre-aggregation is needed - avoids the data explosion effect that some MOLAP implementations incur with large-scale models
  • Can handle large amounts of data - the limitation is the data size of the underlying relational database. ROLAP itself has no limitation on data amount
  • Full security and administration is provided through the RDBMS
  • Performs better than MOLAP when the data is sparse
  • Performance is getting better by adding more OLAP functions and employing various storage and query optimization techniques
• ROLAP Disadvantages
  • Performance can be slow, since each ROLAP report is a SQL query in the relational database
  • Does not have complex functions that are provided by OLAP tools
  • Limited by SQL functionality
  • Hard to maintain aggregate tables in the data warehouse
3) Hybrid OLAP (HOLAP)
• Hybrid On-Line Analytic Processing (HOLAP) is a mixture of MOLAP and ROLAP technologies.
• For summary type queries, HOLAP leverages cube technology for faster performance. When detail information is needed, it can drill through from the cube into the underlying relational database.
• Cubes stored as HOLAP are smaller than equivalent MOLAP cubes and respond quicker than ROLAP cubes for queries involving summary data.
• HOLAP storage is generally suitable for cubes that require rapid query response for summaries based on a large amount of base data
• In order to deliver the combined strengths of MOLAP and ROLAP technologies, HOLAP systems must comply with the following rules
  • Fast access at all levels of aggregation (MOLAP requirement)
  • Easy aggregate maintenance (MOLAP requirement)
  • Compact aggregate storage (MOLAP requirement) - for high-level aggregates in order to economize disk space
  • Dynamically updated dimensions (ROLAP requirement) - real time access to the data itself and to rapidly changing structures
  • Multidimensional view based on RDBMS metadata (ROLAP requirement) - should point to the appropriate RDBMS tables and automatically generate required SQL statements when modifying the multidimensional view. It reduces development time and maintenance
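The hybrid idea (summaries from a cube, detail by drill-through, SQL generated from metadata) can be sketched with Python's built-in `sqlite3` module. Everything named here is an assumption for illustration: the `sales` table, the `METADATA` mapping, and the helper functions do not describe any real HOLAP server.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("laptop", "north", 10), ("laptop", "south", 4), ("phone", "north", 20)],
)

# Hypothetical metadata layer: maps the multidimensional view onto relational
# table and column names. These names are trusted configuration, not user input.
METADATA = {
    "table": "sales",
    "dimensions": {"product": "product", "region": "region"},
    "measure": "units",
}

# MOLAP side: a small pre-aggregated summary kept in memory.
summary = dict(con.execute("SELECT product, SUM(units) FROM sales GROUP BY product"))

def drill_through_sql(filters):
    """Generate the SQL that fetches detail rows for the given dimension filters."""
    cols = list(METADATA["dimensions"].values()) + [METADATA["measure"]]
    where = " AND ".join(f"{METADATA['dimensions'][d]} = ?" for d in filters)
    sql = f"SELECT {', '.join(cols)} FROM {METADATA['table']} WHERE {where}"
    return sql, tuple(filters.values())

def query(product, detail=False):
    if not detail:
        return summary[product]  # summary query: answered from the cube
    sql, params = drill_through_sql({"product": product})
    return sorted(con.execute(sql, params).fetchall())  # detail: answered by the RDBMS
```

Filter values are passed as bound parameters; only the column and table names from the metadata are interpolated into the SQL text.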
HOLAP Available tools
• Express from Oracle,
• IBM DB2 OLAP Server,
• Microsoft OLAP Services,
• Sagent Holos
Typical Architecture for HOLAP Tools
• HOLAP Advantages
  • Combined advantages of both MOLAP and ROLAP (for a full list, look at the MOLAP and ROLAP sections)
  • Can combine the ROLAP technology for sparse regions and MOLAP for dense regions. Also ROLAP for storing the detailed data and MOLAP for higher-level summary data
• HOLAP Disadvantages
  • Complex - HOLAP server must support both MOLAP and ROLAP engines and tools to combine both storage engines and operations
  • Functionality overlap - between storage and optimization techniques in ROLAP and MOLAP engines
4) Desktop OLAP (DOLAP)
• Desktop On-Line Analytic Processing (DOLAP) is single-tier, desktop-based OLAP technology.
• It is able to download a relatively small hypercube from a central point, usually from a data mart or data warehouse, and perform multidimensional analyses while disconnected from the source.
• Data sets are limited to the boundaries defined by the user, with no access to granular data.
• In general, cubes contain summarized data, organized in a fixed structure of dimensions. Therefore, it is ideal for well-understood, recurring analytic questions and reporting
• As with multi-dimensional databases on the server, OLAP data may be held on disk or in RAM; however, some DOLAP products allow only read access.
• Most vendors of DOLAP exploit the power of the desktop PC to perform some, if not most, multi-dimensional calculations.
Available tools
• Cognos,
• Business Objects,
• Brio,
• Crystal Decisions,
• Hummingbird,
• Oracle
Typical Architecture for DOLAP Tools
• DOLAP Advantages
  • User friendly - user can pivot and manipulate data locally from the returned result set stored on the desktop
  • Excellent query performance - it collects, aggregates, and calculates data in advance of the analysis
  • Low cost per seat and maintenance
  • Useful for mobile users who cannot always connect to the data warehouse
  • Easiest to deploy among all OLAP approaches.
• DOLAP Disadvantage
  • Limited functionality and data capacity
Reports and Queries
• Reports
  • Routine reports
  • Ad hoc (or on-demand) reports
  • Multilingual support
  • Scorecards and dashboards
  • Report delivery and alerting
    • Report distribution through any touchpoint
    • Self-subscription as well as administrator-based distribution
    • Delivery on-demand, on-schedule, or on-event
    • Automatic content personalization
• Ad hoc query
  A query that cannot be determined prior to the moment the query is issued
• Structured Query Language (SQL)
  A data definition and management language for relational databases. SQL front ends most relational DBMS
Multidimensionality
• Multidimensionality
  The ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)
• Multidimensional presentation
  • Dimensions
  • Measures
  • Time
• Multidimensional database
  A database in which the data are organized specifically to support easy and quick multidimensional analysis
• Data cube
  A two-dimensional, three-dimensional, or higher-dimensional object in which each dimension of the data represents a measure of interest
• Cube
  A subset of highly interrelated data that is organized to allow users to combine any attributes in a cube (e.g., stores, products, customers, suppliers) with any metrics in the cube (e.g., sales, profit, units, age) to create various two-dimensional views, or slices, that can be displayed on a computer screen
• Multidimensional tools and vendors
  • Tools with multidimensional capabilities often work in conjunction with database query systems and other OLAP tools
• Limitations of dimensionality
  • The multidimensional database can take up significantly more computer storage room than a summarized relational database
  • Multidimensional products cost significantly more than standard relational products
  • Database loading consumes significant system resources and time, depending on data volume and the number of dimensions
  • Interfaces and maintenance are more complex in multidimensional databases than in relational databases
Advanced BA
• Data mining and predictive analysis
  • Data mining
  • Predictive analysis
    Use of tools that help determine the probable future outcome for an event or the likelihood of a situation occurring. These tools also identify relationships and patterns
Data Visualization
• Data visualization
  A graphical, animation, or video presentation of data and the results of data analysis
  • The ability to quickly identify important trends in corporate and market data can provide competitive advantage
  • Check the magnitude of trends by using predictive models that provide significant business advantages in applications that drive content, transactions, or processes
• New directions in data visualization
  • In the 1990s, data visualization moved into:
    • Mainstream computing, where it is integrated with decision support tools and applications
    • Intelligent visualization, which includes data (information) interpretation
  • Dashboards and scorecards
  • Visual analysis
  • Financial data visualization
[Figures: Housing and poverty; Traffic in Madrid]
Geographic Information Systems (GIS)
An information system that uses spatial data, such as digitized maps. A GIS is a combination of text, graphics, icons, and symbols on maps
• As GIS tools become increasingly sophisticated and affordable, they help more companies and governments understand:
  • Precisely where their trucks, workers, and resources are located
  • Where they need to go to service a customer
  • The best way to get from here to there
• GIS and decision making
  • GIS applications are used to improve decision making in the public and private sectors, including:
    • Dispatch of emergency vehicles
    • Transit management
    • Facility site selection
    • Drought risk management
    • Wildlife management
  • Local governments use GIS applications for mapping and other decision-making applications
• GIS combined with GPS
  • Global positioning systems (GPS)
    Wireless devices that use satellites to enable users to detect the position on earth of items (e.g., cars or people) the devices are attached to, with reasonable precision
• GIS and the Internet/intranets
  • Most major GIS software vendors provide Web access that hooks directly to their software
  • GIS can help the manager of a retail operation determine where to locate retail outlets
  • Some firms are deploying GIS on the Internet for internal use or for use by their customers (locate the closest store location)
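A "locate the closest store" feature can be sketched with the haversine great-circle formula. This is an assumption about one simple way to do it, not a description of any GIS product: the store list and coordinates are hypothetical and approximate, and a real system would typically use road distance or travel time instead of straight-line distance.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical store locations (approximate city-centre coordinates).
STORES = {
    "Bangkok": (13.7563, 100.5018),
    "Chiang Mai": (18.7883, 98.9853),
    "Phuket": (7.8804, 98.3923),
}

def closest_store(lat, lon):
    """Return the name of the store nearest to the customer's position."""
    return min(STORES, key=lambda name: haversine_km(lat, lon, *STORES[name]))
```

The customer's position would come from a GPS reading or a geocoded address; `closest_store(lat, lon)` then returns the key of the nearest entry in `STORES`.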