ponniam1.ppt


Upload: harshali-y-patil

Post on 29-Dec-2015


Page 1: Ponniam1.ppt

04/19/23 Girish Tere, Lecturer (CS), TCSC

M. Sc. (CS/IT) Part I, Paper IV

Data Warehousing and Mining

Text Books:
Paulraj Ponniah, “Data Warehousing Fundamentals”, John Wiley
W.H. Inmon, “Building the Data Warehouse”, Wiley Dreamtech
Ralph Kimball, “The Data Warehouse Toolkit”, John Wiley
Ralph Kimball, “The Data Warehouse Lifecycle Toolkit”, John Wiley

Page 2: Ponniam1.ppt

The need for DW

Understand the desperate need for strategic information
Recognize the information crisis at every enterprise
Distinguish between operational and informational systems
Past attempts to provide strategic information
The solution – Data Warehousing

Page 3: Ponniam1.ppt

Introduction

What is your role in IT?
Your IT experience
Applications to run the business
What do they do?
What do they provide?
What do executives require?
Where is the strategic information required?

Page 4: Ponniam1.ppt

Organizations’ use of DW

Retail
  Customer Loyalty
  Market Planning
Financial
  Risk Management
  Fraud Detection
Airlines
  Route Profitability
  Yield Management
Manufacturing
  Cost Reduction
  Logistics Management
Utilities
  Asset Management
  Resource Management
Government
  Manpower Planning
  Cost Control

Page 5: Ponniam1.ppt

Understand the desperate need for strategic information

Who needs strategic information in an enterprise?
What is strategic information?
Examples of business objectives
  Retain the present customer base
  Increase the customer base by 15% over the next 5 years
  Gain market share by 10% in the next 3 years

Page 6: Ponniam1.ppt

Examples of Business Objectives (cont…)

Improve product quality levels in the top five product groups
Enhance customer service level in shipments
Bring three new products to market in 2 years
Increase sales by 15% in the North East Division

Page 7: Ponniam1.ppt

Strategic Information (SI)

Is it for running the day-to-day operations of the business?
What is SI?
Characteristics of SI

Page 8: Ponniam1.ppt

Characteristics of SI

Integrated: Must have a single, enterprise-wide view
Data Integrity: Information must be accurate and must conform to business rules
Accessible: Easily accessible with intuitive access paths, and responsive for analysis
Credible: Every business factor must have one and only one value
Timely: Information must be available within the stipulated time period

Page 9: Ponniam1.ppt

The Information Crisis

How much data is stored and available?
Where is all this data?
On which platforms?
On one PC or across the network?
The facts are:
  Organizations have lots of data
  IT resources and systems are not effective at turning this data into SI

Page 10: Ponniam1.ppt

Real Problem

Most companies face an information crisis not because of a lack of sufficient data, but because the available data is not readily usable for strategic decision making.

Why is this so?
  We need information integrated from all systems.
  Operational data is event driven.
  Operational data is not directly suitable for review from different viewpoints.

Page 11: Ponniam1.ppt

Technology Trends

Name of the computer department in a company: “DP”, “MIS”, “IS”, “IT”
Phenomenal growth of IT in areas like
  Computing Technology
  Human/Machine Interface
  Processing Options
What technology does SI need?

Page 12: Ponniam1.ppt

Technology Trends (cont…)

The user will ask a question and get the results…
This interactive process continues
Why is providing SI feasible now?

Page 13: Ponniam1.ppt

Opportunities and Risks

What opportunities are available to companies from the possible use of SI?
What threats and risks result from a lack of SI in companies?

Page 14: Ponniam1.ppt

Some Opportunities …

SI required for Reliance Telecommunication industry
SI required for ICICI Bank
SI required for Mediclaim companies
SI required for Apna Bazar
A community-based pharmacy company

Page 15: Ponniam1.ppt

Some Risks …

A car rental company (fleet management)
A multinational company - supplier of systems and components to the automobile industry (inconsistent data)

Page 16: Ponniam1.ppt

Failures of past DSS

Example – A Chennai Branch is not … …
You have to gather the data from multiple applications and start from scratch.
In order to understand the reasons for the failures of IT to provide SI in the past, we need to consider how IT was attempting to do this all these years.

Page 17: Ponniam1.ppt

Past DSSs

Ad hoc reports
Special extract programs
Small applications
Information centers
DSS
EIS (only programmed screens and reports were available)

Page 18: Ponniam1.ppt

Inability to provide information (Figure 1.4)

IT receives too many ad hoc requests, resulting in a large overload.
Requests keep changing
Users ask for more and more reports
Users have to depend on IT to provide the information
You need a very flexible and conducive environment for providing information for making strategic decisions. IT has been unable to provide such an environment.

Page 19: Ponniam1.ppt

Operational vs DSS

What is the basic reason for the failure of all the previous attempts by IT to provide SI?
Do we need different types of systems?

Page 20: Ponniam1.ppt

Making the Wheels of Business Turn

OLTP systems
Used to run the day-to-day core business of the company

Page 21: Ponniam1.ppt

Get the data in
Making the wheels of business turn

Take an order
Process a claim
Make a shipment
Generate an invoice
Receive cash
Reserve an airline ticket

Page 22: Ponniam1.ppt

Get the information out
Watching the wheels of business turn

Show me the top-selling products
Show me the problem regions
Tell me why (drill down)
Let me see other data (drill across)
Show me the highest margins
Alert me when a district sells below target
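The requests on this slide correspond to simple aggregations over sales data. The following is a minimal sketch in Python, assuming a hypothetical list of sales records; the product names, districts, figures, and function names are illustrative and do not come from the lecture.

```python
from collections import defaultdict

# Hypothetical sales records: (product, district, units_sold, target_units)
SALES = [
    ("Widget", "North", 120, 100),
    ("Widget", "South", 40, 80),
    ("Gadget", "North", 75, 70),
    ("Gadget", "South", 30, 60),
    ("Gizmo", "North", 20, 25),
]

def top_selling_products(sales, n=2):
    """'Show me the top-selling products': total units per product, highest first."""
    totals = defaultdict(int)
    for product, _district, units, _target in sales:
        totals[product] += units
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

def districts_below_target(sales):
    """'Alert me when a district sells below target': compare totals per district."""
    units = defaultdict(int)
    target = defaultdict(int)
    for _product, district, sold, planned in sales:
        units[district] += sold
        target[district] += planned
    return sorted(d for d in units if units[d] < target[d])

def drill_down(sales, district):
    """'Tell me why': break one district down by product as (units, target)."""
    return {p: (u, t) for p, d, u, t in sales if d == district}

if __name__ == "__main__":
    print(top_selling_products(SALES))
    for district in districts_below_target(SALES):
        print(district, drill_down(SALES, district))
```

Each function reads many records and changes none, which is the read-only, summarizing access pattern these questions call for.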

Page 23: Ponniam1.ppt

We need to design and build informational systems

That serve different purposes
Whose scopes are different
Whose data content is different
Where the data usage patterns are different
Where the data access types are different

Page 36: Ponniam1.ppt

Operational and Informational Systems

                   Operational                  Informational
Data Content       Current values               Archived, derived, summarized
Data Structure     Optimized for transactions   Optimized for complex queries
Access Frequency   High                         Medium to low
Access Type        Read, update, delete         Read
Usage              Predictable, repetitive      Ad hoc, random, heuristic
Response Time      Milliseconds                 Many seconds
Users              Large numbers                Relatively small numbers

Page 37: Ponniam1.ppt

DW – The correct solution

We need different types of DSS to provide SI
Information required for strategic decision making is not available in operational systems
A new environment is required for analysis, identifying trends and monitoring performance

Page 38: Ponniam1.ppt

Features of the new environment:

Database designed for analytical tasks
Data from multiple applications
Easy to use and conducive to long interactive sessions by users
Read-intensive data usage
Direct interaction with the system by the users without help from IT staff
Content updated periodically and stable
Content to include current and historical data
Ability for users to run queries and get results online
Ability for users to make reports

Page 39: Ponniam1.ppt

Processing requirements in the new environment (analytical processing requirements)

Running of simple queries and reports against current and historical data
Ability to perform “what if” analysis
Ability to query, analyze and query again, repeating this process as many times as required
Identify historical trends and mistakes, and apply or correct them for future results
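A “what if” analysis can be as simple as re-running a projection under different assumptions. The sketch below is only an illustration: the historical figures, scenario names, and function names are hypothetical, and a constant growth rate is an assumption rather than anything prescribed by the lecture.

```python
def average_growth(history):
    """Mean period-over-period growth rate observed in the historical figures."""
    rates = [later / earlier - 1 for earlier, later in zip(history, history[1:])]
    return sum(rates) / len(rates)

def project_sales(history, growth_rate, periods):
    """Project future sales from the last known figure at a constant growth rate."""
    value = history[-1]
    projection = []
    for _ in range(periods):
        value *= 1 + growth_rate
        projection.append(round(value, 2))
    return projection

def what_if(history, scenarios, periods):
    """Run the same projection once per scenario so the results can be compared."""
    return {name: project_sales(history, rate, periods)
            for name, rate in scenarios.items()}

# Hypothetical historical sales for three past periods.
HISTORY = [100.0, 110.0, 121.0]
TREND = average_growth(HISTORY)
SCENARIOS = {"trend continues": TREND, "growth halves": TREND / 2}

if __name__ == "__main__":
    for name, values in what_if(HISTORY, SCENARIOS, periods=2).items():
        print(name, values)
```

The historical data supplies the trend, and the scenarios vary it, which is why this kind of analysis needs both current and historical content.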

Page 40: Ponniam1.ppt

BI at DW

The needed environment is the DW
It is kept separate from the system environment supporting the day-to-day operations
The DW contains BI.

Page 41: Ponniam1.ppt

[Figure: Operational Systems (basic business process) -> Extraction, cleansing, aggregation (data transformation) -> Data Warehouse (key measurements, business dimensions)]

Page 42: Ponniam1.ppt

E.g. of BI at DW

DW containing units of sales stored along business dimensions
Important: Data staging area
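One way to picture units of sales stored along business dimensions is a set of facts keyed by dimension values, which can then be summed along any subset of the dimensions. This is a minimal sketch with hypothetical dimensions (product, region, month) and made-up figures; a real warehouse would hold this in database tables rather than a Python dictionary.

```python
from collections import defaultdict

# Assumed business dimensions for this illustration.
DIMENSIONS = ("product", "region", "month")

# Units sold, keyed by one value per dimension (hypothetical data).
FACTS = {
    ("Widget", "East", "2023-01"): 50,
    ("Widget", "East", "2023-02"): 60,
    ("Widget", "West", "2023-01"): 30,
    ("Gadget", "East", "2023-01"): 20,
}

def roll_up(facts, keep):
    """Sum units over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, units in facts.items():
        totals[tuple(key[i] for i in positions)] += units
    return dict(totals)

if __name__ == "__main__":
    print(roll_up(FACTS, ["product"]))          # units by product
    print(roll_up(FACTS, ["region", "month"]))  # units by region and month
    print(roll_up(FACTS, []))                   # grand total
```

The same measurement (units sold) answers different questions depending on which dimensions are kept, which is the point of storing it along business dimensions.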

Page 43: Ponniam1.ppt

Definition of DW

A DW is an informational environment that
  Provides an integrated and total view of the enterprise
  Makes the enterprise’s current and historical information easily available for decision making
  Makes decision-support transactions possible without burdening operational systems
  Renders the organization’s information consistent
  Presents a flexible and interactive source of strategic information

Page 44: Ponniam1.ppt

DW concept

Is not to generate fresh data
Is to make use of the large volumes of existing data and to transform it into forms suitable for providing SI
Take all the data you already have in the organization, clean and transform it, and then use it to provide SI

Page 45: Ponniam1.ppt

DW – An Environment, Not a Product

It is a user-centric and user-driven environment
An ideal environment for data analysis and decision support
Constantly changing, flexible and interactive
Useful for the ask-answer-ask-again pattern
Provides the ability to discover answers to complex, unpredictable questions

Page 46: Ponniam1.ppt

The basic concept of DW is:

Take all the data from the operational systems
Where necessary, include relevant data from outside, such as industry benchmark indicators
Integrate all the data from the various sources
Remove inconsistencies and transform the data
Store the data in formats suitable for easy access for decision making
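The steps above can be sketched as a small extract, transform, and load pipeline. Everything here is an assumption made for illustration: the two source systems, their field names, the state-code lookup, and the in-memory "warehouse" dictionary are hypothetical stand-ins, not a description of any particular tool.

```python
# Two hypothetical operational systems that describe the same customers
# with different field names and inconsistent codes.
ORDERS_SYSTEM = [
    {"cust": "C001", "state": "ny", "amount": "100.50"},
    {"cust": "C002", "state": "New York", "amount": "20"},
]
BILLING_SYSTEM = [
    {"customer_id": "c001", "st": "NY", "amt": 100.5},
    {"customer_id": "C003", "st": "CA", "amt": 75.0},
]

# Assumed lookup used to remove inconsistent state spellings.
STATE_CODES = {"ny": "NY", "new york": "NY", "ca": "CA", "california": "CA"}

def extract():
    """Take the data from each source and map it onto one common layout."""
    rows = []
    for r in ORDERS_SYSTEM:
        rows.append({"source": "orders", "customer_id": r["cust"],
                     "state": r["state"], "amount": r["amount"]})
    for r in BILLING_SYSTEM:
        rows.append({"source": "billing", "customer_id": r["customer_id"],
                     "state": r["st"], "amount": r["amt"]})
    return rows

def transform(rows):
    """Remove inconsistencies: uniform IDs, standard state codes, numeric amounts."""
    return [{
        "source": r["source"],
        "customer_id": str(r["customer_id"]).upper(),
        "state": STATE_CODES[str(r["state"]).strip().lower()],
        "amount": float(r["amount"]),
    } for r in rows]

def load(rows, warehouse):
    """Store the integrated rows grouped by customer for easy access."""
    for r in rows:
        warehouse.setdefault(r["customer_id"], []).append(r)
    return warehouse

if __name__ == "__main__":
    warehouse = load(transform(extract()), {})
    for customer, records in sorted(warehouse.items()):
        print(customer, records)
```

After loading, customer C001 has one consistent record from each source, even though the sources spelled the ID and state differently.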

Page 47: Ponniam1.ppt

DW involves the following functions

Data extraction
Loading the data
Transforming the data
Storing the data
Providing UI

Page 48: Ponniam1.ppt

Technologies used in DW

Data Quality
Data Modeling
Data Acquisition
Data Management
Metadata Management
Administration
Analysis
Applications
Development Tools
Storage Management

Page 49: Ponniam1.ppt

Match the columns

1. information crisis
2. SI
3. operational systems
4. information center
5. DW
6. order processing
7. EIS
8. data staging area
9. extract programs
10. IT

A. OLTP application
B. Produce ad hoc reports
C. explosive growth
D. despite lots of data
E. data cleaned and transformed
F. users go to get information
G. used for decision making
H. environment, not product
I. for day-to-day operations
J. Simple, easy to use

Page 50: Ponniam1.ppt

Class Test

1. What do you mean by SI? For a commercial bank, name five types of strategic objectives.
2. Do you agree that a typical retail store collects huge volumes of data through its operational systems? Name three types of transaction data likely to be collected by a retail store in large volumes during its daily operations.
3. Why were all the past attempts by IT to provide SI failures? List three concrete reasons and explain.
4. Differentiate between operational systems and informational systems.
5. List characteristics of the computing environment needed to provide SI.
6. What types of processing take place in a DW?
7. A DW is an environment, not a product. Discuss.

Page 51: Ponniam1.ppt

Class Test (cont…)

8. You are the IT Director of a nationwide insurance company. Write a memo to the VP explaining the types of opportunities that can be realized with SI.
9. For an airline company, how can SI increase the number of frequent flyers? Discuss, giving specific details.