CHAPTER 5 BUSINESS INTELLIGENCE: DATA WAREHOUSING, DATA ACQUISITION, DATA MINING, BUSINESS ANALYTICS, AND VISUALIZATION

Upload: dylan-taylor

Post on 26-Dec-2015


Data, Information, Knowledge


Data Collection, Problems, and Quality

Data collection can be done manually or by instruments and sensors.

Data collection methods include surveys (using questionnaires), observations (e.g., using video cameras), and collecting information from experts (e.g., using interviews). In addition, sensors and scanners are used for automatic data collection.

Suggest a reliable method of data collection to be used to identify a customer's buying patterns.

Data Collection, Problems, and Quality (cont.)

Data Problems

The major DSS data problems are summarized in the following table, along with some possible solutions.

Data Collection, Problems, and Quality (cont.)

Data quality determines the usefulness of data as well as the quality of the decisions based on them.

Data quality problems are divided into the following four categories and dimensions:
• Contextual data quality
• Intrinsic data quality
• Accessibility data quality
• Representation data quality

Data quality is often neglected or casually handled.
Problems are exposed when data is summarized.

Data Integrity

Data integrity assures the accuracy and consistency of data.

Data integrity is one of the major issues of data quality (DQ).

Data integrity issues:
• Uniformity
• Version
• Completeness check
• Conformity check
• Genealogy or drill-down
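
The completeness and conformity checks listed above can be illustrated with a minimal Python sketch. The record layout, the field names, and the date pattern are hypothetical, chosen only for illustration; real checks depend on the rules an organization defines.

```python
import re

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "name": "Ana", "signup": "2014-11-13"},
    {"id": 2, "name": "", "signup": "2014-11-14"},      # missing name
    {"id": 3, "name": "Raj", "signup": "13/11/2014"},   # wrong date format
]

REQUIRED = ("id", "name", "signup")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed standard: YYYY-MM-DD


def completeness_errors(rows):
    """Return ids of rows where a required field is missing or empty."""
    return [r["id"] for r in rows
            if any(r.get(f) in (None, "") for f in REQUIRED)]


def conformity_errors(rows):
    """Return ids of rows whose signup date does not match the pattern."""
    return [r["id"] for r in rows
            if not DATE_PATTERN.fullmatch(r["signup"])]


print(completeness_errors(records))  # [2]
print(conformity_errors(records))    # [3]
```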


Data Access and Integration

• Recognize what to access
• Integrate disparate and heterogeneous databases to develop enterprise-wide systems
• XML is becoming the standard language for database integration and data transfer

Database Management Systems

A DBMS is a software program for managing a database. It:
• Manages data (i.e., updates, deletes, inserts, sorts, manipulates, and retrieves data)
• Generates reports
• Provides better data security
• Combines with a modeling language for the construction of DSS
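
The data management operations named above (insert, update, delete, sort, retrieve) can be tried with the SQLite engine that ships with Python. This is only a sketch; the `product` table and its contents are hypothetical.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Insert
conn.executemany(
    "INSERT INTO product (id, name, price) VALUES (?, ?, ?)",
    [(1, "tent", 120.0), (2, "stove", 45.5), (3, "lantern", 20.0)],
)
# Update
conn.execute("UPDATE product SET price = ? WHERE id = ?", (110.0, 1))
# Delete
conn.execute("DELETE FROM product WHERE id = ?", (3,))
# Retrieve and sort
rows = conn.execute("SELECT name, price FROM product ORDER BY price").fetchall()
print(rows)  # [('stove', 45.5), ('tent', 110.0)]
conn.close()
```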


Database Models

Relational
• Flat, two-dimensional tables with multiple access queries
• Simple for the user to learn and easily expanded or altered
• Can be accessed in a number of formats not anticipated at the time of the initial design and development of the database
• Can support large amounts of data

Hierarchical
• Top down, like a tree
• Fields have only one “parent”; each “parent” can have multiple “children”
• Quick and useful mainly in transaction processing

Network
• Relationships created through linked lists, using pointers
• “Children” can have multiple “parents”
• Can save storage space through the sharing of some items

Database Models (cont.)

Object oriented
• Data analyzed at conceptual level
• Inheritance, abstraction, encapsulation

Multimedia based
• Multiple data formats like JPEG, GIF, bitmap, PNG, sound, video, virtual reality
• Requires specific hardware for full feature availability

Document based
• Document storage and management

Intelligent
• Intelligent agents and ANN
• Inference engines

Data Warehouse

© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang

A data warehouse is a comprehensive database that supports all decision analysis required by an organization by providing summarized and detailed information.

It has access to all information relevant to the organization, which may come from many different sources, both internal and external.

Data Warehouse (cont.)

• Data extraction: get data from sources
• Data cleaning: detect errors in the data and rectify them when possible
• Data transformation: convert data from host format to warehouse format, check integrity
• Load: sort, summarize, consolidate, compute views, and build indices and partitions
• Refresh: propagate the updates from the data sources to the warehouse
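
The extraction, cleaning, transformation, and load steps above can be sketched as a small Python pipeline. The source rows, the date formats, and the summarization rule are all hypothetical assumptions made for illustration, not a description of any particular warehouse product.

```python
# Hypothetical source rows; names and formats are illustrative only.
source_rows = [
    {"customer": " Alice ", "amount": "100.50", "date": "13/11/2014"},
    {"customer": "Bob", "amount": "n/a", "date": "14/11/2014"},   # bad amount
    {"customer": "alice", "amount": "20", "date": "15/11/2014"},
]


def extract():
    """Extraction: get data from the source (here, an in-memory list)."""
    return list(source_rows)


def clean(rows):
    """Cleaning: drop rows whose amount cannot be parsed as a number."""
    good = []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        good.append(row)
    return good


def transform(rows):
    """Transformation: convert host format to the assumed warehouse format."""
    out = []
    for row in rows:
        day, month, year = row["date"].split("/")
        out.append({
            "customer": row["customer"].strip().lower(),
            "amount": float(row["amount"]),
            "date": f"{year}-{month}-{day}",  # ISO date, an assumed convention
        })
    return out


def load(rows):
    """Load: sort by date and summarize the amount per customer."""
    totals = {}
    for row in sorted(rows, key=lambda r: r["date"]):
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


warehouse = load(transform(clean(extract())))
print(warehouse)  # {'alice': 120.5}
```

A refresh step would rerun the same pipeline on rows that changed in the sources since the last load.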


Data warehouse characteristics

• Subject oriented
• Data from both internal and external sources is presented
• Scrubbed so that data from heterogeneous sources are standardized
• Time-variant
• Nonvolatile
• Read only
• Not normalized; may be redundant
• Metadata included

Characteristics of Data Warehouses - Subject oriented

• Organized around major subjects, such as product and sales.
• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
• Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision process.

Characteristics of Data Warehouses - Integrated

• Constructed by integrating multiple, heterogeneous data sources.
• Data cleaning and data integration techniques are applied.
• Ensures consistency in:
  - naming conventions (e.g., LastName and FamilyName in DB1 and DB2 have the same meaning)
  - encoding structures (e.g., attribute User_Id is a long int in DB1 and a string in DB2)
  - attribute measures (e.g., cm vs. inch)
  - …
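
The three kinds of inconsistency above (names, encodings, units) can be resolved by mapping each source onto one common layout. The sketch below assumes a hypothetical warehouse layout (`last_name`, `user_id`, `height_cm`) and made-up sample rows; only the attribute names LastName, FamilyName, and User_Id come from the slide.

```python
# Two hypothetical source records describing people in DB1 and DB2.
db1_row = {"LastName": "Taylor", "User_Id": 1042, "height_cm": 180.0}
db2_row = {"FamilyName": "Nguyen", "User_Id": "2077", "height_inch": 65.0}

CM_PER_INCH = 2.54


def from_db1(row):
    """Map a DB1 record onto the assumed warehouse layout."""
    return {
        "last_name": row["LastName"],
        "user_id": int(row["User_Id"]),
        "height_cm": float(row["height_cm"]),
    }


def from_db2(row):
    """Map a DB2 record: rename the field, parse the id, convert inches."""
    return {
        "last_name": row["FamilyName"],
        "user_id": int(row["User_Id"]),
        "height_cm": round(row["height_inch"] * CM_PER_INCH, 2),
    }


integrated = [from_db1(db1_row), from_db2(db2_row)]
for record in integrated:
    print(record)
```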


Characteristics of Data Warehouses - Time Variant

• Data warehouse data provide information from a historical perspective (e.g., past 5-10 years).
• Every piece of data in the data warehouse contains an element of time.

Characteristics of Data Warehouses - Nonvolatile

• Operational update of data doesn't occur in the data warehouse environment.
• Doesn't require transaction processing, recovery, and concurrency control mechanisms.
• Requires only two operations in data accessing: initial loading of data and querying.

Characteristics of Data Warehouses - Metadata included

• Metadata refers to data about data.
• The primary purpose of metadata should be to provide context to the data; that is, enriching information leading to knowledge.
• Plays a vital role in explaining how, why, and where data can be found, retrieved, stored, and used efficiently in an information system.

Data Warehouse vs. Heterogeneous DBMS

Traditional heterogeneous DB integration:
• Build wrappers/mediators on top of heterogeneous databases
• Query-driven approach
• A query posed to a client site is transformed into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set

Data warehouse:
• Update-driven
• Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis
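
The difference between the two approaches can be sketched in a few lines of Python. Both sources, their layouts, and the function names are hypothetical; the point is only where the integration work happens (at query time versus in advance).

```python
# Two hypothetical heterogeneous sources with different layouts.
source_a = [{"sku": "tent", "sold": 3}, {"sku": "stove", "sold": 5}]
source_b = [("tent", 4), ("lantern", 2)]  # (item, units) tuples


def query_driven(item):
    """Mediator: translate the query for each source at query time."""
    from_a = sum(r["sold"] for r in source_a if r["sku"] == item)
    from_b = sum(units for name, units in source_b if name == item)
    return from_a + from_b  # integrate the partial results


def build_warehouse():
    """Update-driven: integrate all sources once, in advance."""
    warehouse = {}
    for r in source_a:
        warehouse[r["sku"]] = warehouse.get(r["sku"], 0) + r["sold"]
    for name, units in source_b:
        warehouse[name] = warehouse.get(name, 0) + units
    return warehouse


warehouse = build_warehouse()
print(query_driven("tent"))  # 7
print(warehouse["tent"])     # 7
```

Both paths give the same answer; the warehouse answers later queries with a direct lookup, while the mediator revisits every source each time.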


Data Warehouse vs. operational databases

DW: Large amount of data from multiple sources that may include different DB models or files acquired from independent systems and platforms.
Traditional DB: A transactional database (relational, object-oriented).

DW: Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Optimized for retrieval.
Traditional DB: Focuses on daily operations or transaction processing. Optimized for routine transaction processing.

DW: Provides information from a historical perspective (e.g., past 5-10 years).
Traditional DB: Current value data.

DW: It is nonvolatile.
Traditional DB: Transactions are the agent of change to the database.

DW: Supports DSS, data mining, and OLAP.
Traditional DB: Supports OLTP.

From tables to Data cubes

• A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions:
  - Dimension tables contain descriptions about the subject of the business, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
  - The fact table contains factual or quantitative data: measures (such as dollars_sold) and keys to each of the related dimension tables.
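
As a rough illustration, the fact and dimension tables described above can be built in SQLite from Python. The table names (`item_dim`, `time_dim`, `sales_fact`), the key columns, and all sample rows are assumptions made for this sketch; only the attribute names item_name, brand, type, day, month, quarter, year, and dollars_sold come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    time_key INTEGER REFERENCES time_dim(time_key),
    dollars_sold REAL
);
INSERT INTO item_dim VALUES (1, 'tent', 'Acme', 'outdoor'), (2, 'rice', 'Acme', 'food');
INSERT INTO time_dim VALUES (1, 13, 11, 4, 2014), (2, 2, 2, 1, 2015);
INSERT INTO sales_fact VALUES (1, 1, 200.0), (1, 2, 150.0), (2, 1, 30.0);
""")

# View the dollars_sold measure along the item and year dimensions.
rows = conn.execute("""
    SELECT i.item_name, t.year, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON i.item_key = f.item_key
    JOIN time_dim AS t ON t.time_key = f.time_key
    GROUP BY i.item_name, t.year
    ORDER BY i.item_name, t.year
""").fetchall()
print(rows)  # [('rice', 2014, 30.0), ('tent', 2014, 200.0), ('tent', 2015, 150.0)]
conn.close()
```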


From tables to Data cubes (cont.)

Relational representation of pivot table


From tables to Data cubes (cont.)

2-D view of sales cross-tabulation (pivot table)


From tables to Data cubes (cont.)


Conceptual Modeling of Data Warehouses

Modeling data warehouses: dimensions & measures
• Star schema: a fact table in the middle connected to a set of dimension tables.
• Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
• Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars.

Example of Star Schema

Example of Snowflake Schema


Example of Fact Constellation


Multidimensional Data

Dimensions are: product, month, region

Measure is sales_amount


Data Marts

A data mart is a subset of a data warehouse, typically consisting of a single subject area.

Dependent
• Created from warehouse
• Replicated
• Functional subset of warehouse

Independent
• Scaled down, less expensive version of data warehouse
• Designed for a department or SBU
• Organization may have multiple data marts
• Difficult to integrate

OLAP

OLAP refers to a variety of activities usually performed by end users in online systems.

There is no agreement on what activities are considered OLAP. However, an OLAP tool includes such activities as:
• Requesting ad hoc reports and graphs
• Conducting statistical analysis
• Modeling and visualization capabilities
• Building DSS

OLAP Tools

• Known as business intelligence, business analytics, decision support, data access, database front ends
• OLAP vs. OLTP tools
• Codd’s 12 rules for OLAP tools:
  1. Multidimensional conceptual view
  2. Transparency
  3. Accessibility
  4. Consistent reporting performance
  5. Client-server architecture
  6. Generic dimensionality
  7. Dynamic sparse matrix handling
  8. Multi-user support
  9. Unrestricted cross-dimensional operations
  10. Intuitive data manipulation
  11. Flexible reporting
  12. Unlimited dimensions and aggregation levels

OLTP vs. OLAP

                     OLTP                               OLAP
                     (On Line Transaction Processing)   (On Line Analytical Processing)
User                 Anyone                             Decision makers, analysts
Function             Day-to-day operations              Decision support
DB design            Application-oriented (E-R based)   Subject-oriented (star, snowflake)
Data                 Current                            Historical
View                 Detailed                           Summarized
Access               Read/write                         Read mostly
# Records accessed   Tens                               Millions
# Users              Thousands                          Hundreds
DB size              100 MB-GB                          100 GB-TB

Typical OLAP operations

• Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction.
• Drill down (roll down): the reverse of roll-up; move from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions.
• Slice and dice: project and select.
  - Slice: performs a selection on one dimension of the given cube, resulting in a sub-cube. Reduces the dimensionality of the cube.
  - Dice: refers to a range select condition on one dimension, or to a select condition on more than one dimension. Reduces the number of member values of one or more dimensions.
• Pivot (rotate): reorient the cube for visualization, e.g., 3D to a series of 2D planes.

OLAP - Roll up (drill-up)

Before roll-up (by country):

Country          Food Line   Outdoor Line   CATEGORY_total
Canada            29,116.5         69,310         98,426.5
Mexico            12,743.5         24,284         37,027.5
United States    102,561.5        232,679        335,240.5

After roll-up (by region):

Region           Food Line   Outdoor Line   CATEGORY_total
North America    144,421.5        326,273        470,694.5

OLAP - Drill down (roll down)

Before drill-down (by region):

Region           Food Line   Outdoor Line   CATEGORY_total
Asia                59,728        151,174          210,902

After drill-down (by country):

Country          Food Line   Outdoor Line   CATEGORY_total
Malaysia               618          9,418           10,036
China             33,198.5         74,165        107,363.5
India                6,918              0            6,918
Japan             13,871.5         34,965         48,836.5
Singapore            5,122         32,626           37,748
Belgium            7,797.5         21,125         28,922.5

Note: the Asia totals equal the sum of the Malaysia, China, India, Japan, and Singapore rows; the Belgium row is not part of them.

OLAP - Slice

Full view (by region):

Region           Food Line   Outdoor Line   CATEGORY_total
Asia                59,728        151,174          210,902
Europe            97,580.5        213,304        310,884.5
North America    144,421.5        326,273        470,694.5
REGION_total       301,730        690,751          992,481

After slice (North America):

Region           Food Line   Outdoor Line   CATEGORY_total
North America    144,421.5        326,273        470,694.5

OLAP - Dice

Before dice:

Country          Food Line   Outdoor Line   CATEGORY_total
Canada            29,116.5         69,310         98,426.5
Mexico            12,743.5         24,284         37,027.5
United States    102,561.5        232,679        335,240.5

After dice (Mexico and United States; Food Line and Outdoor Line):

Country          Food Line   Outdoor Line
Mexico            12,743.5         24,284
United States    102,561.5        232,679
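
The roll-up, slice, dice, and pivot operations can be sketched over a small in-memory cube. The figures below are the Canada, Mexico, and United States values from the roll-up example; the dictionary layout and the function names are assumptions made for illustration, not how an OLAP server stores or exposes data.

```python
# Cube cells keyed by (region, country, category); figures are the
# North America rows from the roll-up example.
cube = {
    ("North America", "Canada", "Food Line"): 29116.5,
    ("North America", "Canada", "Outdoor Line"): 69310.0,
    ("North America", "Mexico", "Food Line"): 12743.5,
    ("North America", "Mexico", "Outdoor Line"): 24284.0,
    ("North America", "United States", "Food Line"): 102561.5,
    ("North America", "United States", "Outdoor Line"): 232679.0,
}


def roll_up(cells):
    """Roll up: climb the hierarchy from country to region."""
    out = {}
    for (region, _country, category), value in cells.items():
        out[(region, category)] = out.get((region, category), 0.0) + value
    return out


def slice_(cells, category):
    """Slice: select one value on the category dimension, then drop it."""
    return {(r, c): v for (r, c, cat), v in cells.items() if cat == category}


def dice(cells, countries):
    """Dice: keep only cells whose country is in the selected set."""
    return {k: v for k, v in cells.items() if k[1] in countries}


def pivot(table):
    """Pivot: swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in table.items()}


by_region = roll_up(cube)
print(by_region[("North America", "Food Line")])     # 144421.5
print(by_region[("North America", "Outdoor Line")])  # 326273.0
print(len(dice(cube, {"Mexico", "United States"})))  # 4
```

The rolled-up values match the North America row shown in the roll-up example.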


Data Mining

• A process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases
• Automatic and quick data analysis
• Data mining includes tasks/activities known as:
  - Knowledge extraction
  - Data archaeology
  - Data exploration
  - Data pattern processing
  - Data dredging
  - Information harvesting

How Data Mining Works

Three types of methods are used to identify patterns in data:
• Simple models (SQL-based query, OLAP, human judgment)
• Intermediate models (regression, decision trees, clustering)
• Complex models (neural networks, other rule induction)

Data mining application classes:
• Classification
• Clustering
• Association
• Sequencing
• Regression
• Forecasting
• Others

Hypothesis vs. Discovery Driven Data Mining

• Hypothesis-driven data mining begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition. For example, a marketing manager may begin with the proposition, "Are DVD player sales related to sales of television sets?"
• Discovery-driven data mining finds patterns, associations, and relationships among the data. It can uncover facts that were previously unknown.
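
One simple way to test the marketing manager's proposition is to compute a correlation coefficient between the two sales series. The sketch below uses the Pearson coefficient, which is just one possible choice, and the monthly figures are made up for illustration.

```python
from math import sqrt

# Hypothetical monthly unit sales; the numbers are made up for illustration.
tv_sales = [10, 12, 15, 18, 20]
dvd_sales = [5, 6, 8, 9, 10]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(tv_sales, dvd_sales)
print(f"correlation = {r:.3f}")
```

A value close to 1 would support the proposition for this sample; it does not by itself show that one kind of sale causes the other.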


Tools and Techniques

Data mining tools and techniques:
• Statistical methods (association, regression, and clustering)
• Decision trees (classification, clustering)
• Case-based reasoning (pattern detection)
• Neural computing (pattern detection)
• Intelligent agents
• Genetic algorithms

Text Mining

Text mining is the application of data mining to nonstructured or less structured text files.

It helps the organization to:
• Find the "hidden" content of documents, including additional useful relationships.
• Relate documents across previously unnoticed divisions; for example, discover that customers in two different product divisions have the same characteristics.
• Group documents by common themes; for example, all the customers of an insurance firm who have similar complaints and cancel their policies.
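
Grouping documents by common themes can be approximated, very roughly, by keyword overlap. The complaint texts, the theme names, and the keyword sets below are hypothetical; real text mining tools use far richer techniques than this sketch.

```python
import re
from collections import defaultdict

# Hypothetical complaint texts and theme keywords, for illustration only.
complaints = {
    "c1": "Claim payment was late and the agent was rude",
    "c2": "Premium increased without notice",
    "c3": "Late payment of my claim, very slow",
    "c4": "Unexpected premium increase this year",
}
themes = {
    "slow claims": {"claim", "late", "slow"},
    "pricing": {"premium", "increase", "increased"},
}


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def group_by_theme(docs):
    """Assign each document to the theme sharing the most keywords."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        words = tokenize(text)
        best = max(themes, key=lambda t: len(words & themes[t]))
        if words & themes[best]:
            groups[best].append(doc_id)
    return dict(groups)


print(group_by_theme(complaints))
# {'slow claims': ['c1', 'c3'], 'pricing': ['c2', 'c4']}
```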


Multidimensionality

• Multidimensionality is an efficient way to organize data in different ways for analysis and presentation.
• Its major advantage is that the data will be organized according to managers' needs, not analysts'.
• Three factors are considered in multidimensionality: dimensions, measures, and time. Here are some examples:
  - Dimensions: products, salespeople, market segments, business units, geographic locations, distribution channels, countries, industries
  - Measures: money, sales volume, head count, inventory profit, actual vs. forecasted
  - Time: daily, weekly, monthly, quarterly, yearly

Data Visualization

• Technologies supporting visualization and interpretation: digital imaging, GIS, GUI, tables, multidimensions, graphs, VR, 3D, animation
• Identify relationships and trends
• Data manipulation allows a real-time look at performance data

Multidimensionality

Multidimensionality has some limitations:
• The multidimensional database can take up significantly more computer storage.
• Multidimensional products cost significantly more.
• Database loading consumes system resources and time, depending on data volume and number of dimensions.
• Interfaces and maintenance are more complex than in relational databases.

Geographic Information System (GIS)

• Computerized system for managing and manipulating data with digitized maps
• Geographically oriented
• Geographic spreadsheet for models
• Software allows web access to maps
• Used for modeling and simulations

GIS (cont.)
