Chapter 11 Data Management: Warehousing, Analyzing, Mining & Visualization


Post on 20-Dec-2015


Page 1: Chapter 11 Data Management: Warehousing, Analyzing, Mining & Visualization

Chapter 11
Data Management: Warehousing, Analyzing, Mining & Visualization

Difficulties of Managing Data

• The amount of data increases exponentially.
• Data are scattered throughout organizations and are collected by many individuals using several methods and devices.
• Only small portions of an organization’s data are relevant for any specific decision.
• An ever-increasing amount of external data needs to be considered in making organizational decisions.
• Data are frequently stored in several servers and locations in an organization.

Difficulties, cont’d.

• Raw data may be stored in different computing systems, databases, formats, and human and computer languages.
• Legal requirements relating to data differ among countries and change frequently.
• Selecting data management tools can be a major problem because of the huge number of products available.
• Data security, quality, and integrity are critical yet are easily jeopardized.

Data Sources and Collection

• Internal Data: An organization’s internal data are about people, products, services, and processes.
• Personal Data: IS users or other corporate employees may document their own expertise by creating personal data.
• External Data: There are many sources for external data, ranging from commercial databases to sensors and satellites.
• The Internet & Commercial Database Services: Some external data flow to an organization through electronic data interchange (EDI), through other company-to-company channels, or the Internet.

Data Quality

• Data quality (DQ) is an extremely important issue, since quality determines the data’s usefulness as well as the quality of the decisions based on the data.

Data Quality Problems (Strong et al., 1997)

• Intrinsic DQ: Accuracy, objectivity, believability, and reputation.
• Accessibility DQ: Accessibility and access security.
• Contextual DQ: Relevancy, value added, timeliness, completeness, amount of data.
• Representation DQ: Interpretability, ease of understanding, concise representation, consistent representation.
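
A few of these dimensions can be turned into simple automated checks. The sketch below is only illustrative: the record layout, field names, and thresholds are assumptions, not part of the Strong et al. framework. It measures completeness and timeliness (contextual DQ) and flags inconsistent codes (representation DQ).

```python
from datetime import date

# Hypothetical customer records; the field names are illustrative only.
records = [
    {"id": 1, "name": "Ann Lee", "country": "US", "updated": date(2015, 11, 2)},
    {"id": 2, "name": "", "country": "usa", "updated": date(2012, 1, 15)},
    {"id": 3, "name": "Raj Patel", "country": "IN", "updated": date(2015, 12, 1)},
]

def completeness(rows, field):
    """Contextual DQ: fraction of rows with a non-empty value for `field`."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def inconsistent_codes(rows, field, allowed):
    """Representation DQ: ids of rows whose code is not in the agreed set."""
    return [r["id"] for r in rows if r[field] not in allowed]

def stale(rows, as_of, max_age_days):
    """Contextual DQ (timeliness): ids of rows older than `max_age_days`."""
    return [r["id"] for r in rows if (as_of - r["updated"]).days > max_age_days]
```

Intrinsic dimensions such as believability and reputation are harder to automate and usually need human judgment.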

Object-Oriented Databases

• Last time we discussed hierarchical, network, and relational databases.
• An object-oriented database is a part of the object-oriented paradigm, which also includes object-oriented programming, operating systems, and modeling.
• Object-oriented databases are sometimes referred to as multimedia databases and are managed by special multimedia database management systems.

Document Management

• Document management is the automated control of electronic documents, page images, spreadsheets, word processing documents, and complex, compound documents through their entire life cycle within an organization, from initial creation to final archiving.
• Benefits of document management:
  – Greater control over production, storage, and distribution of documents
  – Greater efficiency in the reuse of information
  – Control of a document through a workflow process
  – Reduction of product cycle times

Data Processing

• Data processing in organizations can be viewed as either transactional or analytical.
• Transactional
  – The data in TPS are organized mainly in a hierarchical structure and are centrally processed.
  – Databases and processing systems are known as operational systems.
• Analytical
  – Analytical processing involves analysis of accumulated data, mainly by end-users.
  – Includes DSS, EIS, Web applications, and other end-user activities.
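
The contrast between the two kinds of processing can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the `orders` table and its columns are hypothetical, and a small relational store stands in for the hierarchical TPS structure the slide mentions. A transactional operation writes one record at a time; an analytical one reads and summarizes the accumulated records.

```python
import sqlite3

# In-memory database standing in for an operational system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

def record_order(region, amount):
    """Transactional processing: write a single order atomically."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )

def sales_by_region():
    """Analytical processing: summarize the accumulated orders."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())

record_order("East", 100.0)
record_order("West", 75.0)
record_order("East", 50.0)
```

Calling `sales_by_region()` after the three inserts returns the total per region rather than any individual transaction.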

Delivery Systems

• A good data delivery system should be able to support:
  – Easy data access by the end-users.
  – A quick decision-making process.
  – Accurate and effective decision making.
  – Flexible decision making.

Data Warehouses

• The purpose of a data warehouse is to establish a data repository that makes operational data accessible in a form readily acceptable for analytical processing activities (e.g., decision support, EIS).
• Data warehouses include a companion called metadata, meaning data about data.

Benefits of Data Warehousing

• The ability to reach data quickly, as they are located in one place.
• The ability to reach data easily, frequently by end-users themselves, using Web browsers.

Characteristics of Data Warehousing

• Organization: Data are organized by detailed subjects.
• Consistency: Data in different operational databases may be encoded differently. In the warehouse they will be coded in a consistent manner.
• Time variant: The data are kept for 5 to 10 years so they can be used for trends, forecasting, and comparisons over time.
• Non-volatile: Once entered into the warehouse, data are not updated.
• Relational: The data warehouse uses a relational structure.
• Client/Server: The data warehouse uses a client/server architecture to give end users easy access to its data.
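
The consistency and non-volatility characteristics can be sketched as a small load step. Everything named here is hypothetical: the source systems, the gender encodings, and the row layout are assumptions chosen only to show differently encoded operational data being recoded once and then appended, never updated.

```python
from datetime import date

# Hypothetical encodings used by different operational systems.
GENDER_MAP = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}

def to_warehouse_row(source, record, load_date):
    """Consistency: recode a source value into the warehouse's single coding."""
    raw = str(record["gender"]).strip().lower()
    return {
        "source": source,
        "customer_id": record["id"],
        "gender": GENDER_MAP.get(raw, "U"),  # "U" marks an unknown encoding
        "load_date": load_date,  # time variant: each row is dated
    }

warehouse = []  # non-volatile: rows are only ever appended

def load(source, records, load_date):
    """Append recoded rows; existing rows are never modified."""
    warehouse.extend(to_warehouse_row(source, r, load_date) for r in records)

load("billing", [{"id": 1, "gender": "male"}], date(2015, 1, 1))
load("crm", [{"id": 1, "gender": 1}, {"id": 2, "gender": "x"}], date(2015, 2, 1))
```

After both loads, the two systems' different spellings of the same value share one code, and the unrecognized value is flagged rather than guessed.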

Data Warehouse Framework

Data Warehouse Suitability

• Data warehousing is most appropriate for organizations in which some of the following apply:
  – Large amounts of data need to be accessed by end-users.
  – The operational data are stored in different systems.
  – An information-based approach to management is in use.
  – There is a large, diverse customer base.
  – The same data are represented differently in different systems.
  – Data are stored in highly technical formats that are difficult to decipher.
  – Extensive end-user computing is performed.

Data Mart

• An alternative to data warehousing used by many smaller firms is the creation of a lower cost, scaled-down version of a data warehouse, called a data mart. A data mart refers to a small warehouse designed for a strategic business unit (SBU) or a department.

Data Mart Types

• Replicated (dependent) Data Marts: Sometimes it is easier to work with a subset of the data warehouse. In such cases one can replicate functional subsets of the data warehouse in smaller databases.
• Stand-Alone Data Marts: A company can have one or more independent data marts without having a data warehouse.
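
A replicated data mart can be sketched as copying one department's subset of a warehouse table into its own smaller table. The example uses Python's built-in `sqlite3`; the `warehouse_sales` table, its columns, and the `mart_<dept>` naming scheme are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table holding data for every department.
conn.execute(
    "CREATE TABLE warehouse_sales (dept TEXT, product TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("marketing", "widget", 2014, 120.0),
        ("marketing", "gadget", 2015, 80.0),
        ("finance", "widget", 2015, 45.0),
    ],
)

def build_data_mart(dept):
    """Replicate one department's rows into a smaller, separate table."""
    if not dept.isidentifier():
        # Table names cannot be bound as parameters, so validate before formatting.
        raise ValueError("department name must be a simple identifier")
    table = f"mart_{dept}"
    with conn:
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} (product TEXT, year INTEGER, amount REAL)")
        conn.execute(
            f"INSERT INTO {table} SELECT product, year, amount "
            "FROM warehouse_sales WHERE dept = ?",
            (dept,),
        )
    return table
```

A stand-alone data mart would skip the copy step and be loaded directly from operational sources instead.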

Knowledge Discovery in Databases (KDD)

• KDD is the process of extracting useful knowledge from volumes of data.
  – It is the subject of extensive research.
• KDD’s objective is to identify valid, novel, potentially useful, and ultimately understandable patterns in data.
• KDD is useful because it is supported by three technologies that are now sufficiently mature:
  – Massive data collection
  – Powerful multiprocessor computers
  – Data mining algorithms

Tools and Techniques of KDD

• Ad-hoc queries allow users to request, in real time, information from the computer that is not available in the periodic reports.
• Online analytical processing (OLAP) refers to such end-user activities as DSS modeling using spreadsheets and graphics, which are done online.
• Ready-made Web-based analysis. Many vendors provide ready-made analytical tools, mostly in finance, marketing, and operations.

Data Mining

• Data mining derives its name from the similarities between searching for valuable business information in a large database, and mining a mountain for valuable ore.
• Data mining technology can generate new business opportunities by providing these capabilities:
  – Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases.
  – Automated discovery of previously unknown patterns. Data mining tools identify previously hidden patterns in one step.
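
Pattern discovery can be illustrated with a very small co-occurrence count over market baskets. This is a simplified sketch, not a full data mining algorithm: the baskets are invented, and the approach (counting item pairs and keeping those above a support threshold) is only the first step of association-rule methods.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs bought together in at least `min_support` of baskets."""
    counts = Counter()
    for items in transactions:
        # Sort so each pair is always counted under the same key.
        counts.update(combinations(sorted(items), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

With a support threshold of 0.5, only the bread and butter pair survives, since it appears in three of the four baskets.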

Applications of Data Mining

• Retailing & Sales
• Banking
• Manufacturing & Production
• Brokerage & Securities trading
• Computer hardware & software
• Insurance
• Police work
• Government & Defense
• Airlines
• Health care
• Broadcasting
• Marketing

Text Mining

• Text mining is the application of data mining to non-structured or less structured text files.
• Text mining helps organizations to do the following:
  – Find the 'hidden' content of documents, including additional useful relationships.
  – Group documents by common themes.
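
Grouping documents by theme can be sketched with plain keyword matching. This is a deliberate simplification and an assumption on our part: real text mining tools use richer statistical or linguistic methods, and the documents, themes, and stopword list below are invented for illustration.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "is", "on"}

def terms(text):
    """Lowercase a text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def group_by_theme(docs, themes):
    """Assign each document to every theme whose keywords it mentions."""
    groups = defaultdict(list)
    for name, text in docs.items():
        words = terms(text)
        for theme, keywords in themes.items():
            if words & keywords:
                groups[theme].append(name)
    return dict(groups)

# Hypothetical documents and themes.
docs = {
    "memo1": "Refund policy for damaged goods",
    "memo2": "Quarterly sales forecast",
    "memo3": "Customer refund request and sales receipt",
}
themes = {"returns": {"refund", "damaged"}, "sales": {"sales", "forecast"}}
```

A document can land in more than one group, as `memo3` does here, which is one way the "hidden" relationships between documents show up.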

Web Mining

• Web mining refers to mining tools used to analyze large amounts of data on the Web, such as what customers are doing on the Web; that is, to analyze clickstream data.
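
Clickstream analysis can be sketched by counting which page-to-page moves occur most often across visitor sessions. The log format (session id, sequence number, page) and the sample data are assumptions; real Web server logs need parsing and session reconstruction first.

```python
from collections import Counter

# Hypothetical clickstream: (session id, click sequence number, page).
clicks = [
    ("s1", 1, "/home"), ("s1", 2, "/catalog"), ("s1", 3, "/cart"),
    ("s2", 1, "/home"), ("s2", 2, "/catalog"),
    ("s3", 1, "/home"), ("s3", 2, "/search"),
]

def transitions(events):
    """Count page-to-page moves within each session."""
    by_session = {}
    # Sorting tuples orders clicks by session, then by sequence number.
    for session, _seq, page in sorted(events):
        by_session.setdefault(session, []).append(page)
    counts = Counter()
    for pages in by_session.values():
        counts.update(zip(pages, pages[1:]))
    return counts
```

`transitions(clicks).most_common(1)` then reports the single most frequent move, which in this sample is from the home page to the catalog.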

Data Visualization

• Data visualization refers to the presentation of data by technologies such as digital images, geographical information systems, graphical user interfaces, multidimensional tables and graphs, virtual reality, three-dimensional presentations, and animation.

Multidimensionality

• Modern data and information may have several dimensions.
  – e.g., management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store.
• It is important to provide the user with a technology that allows her to add, replace, or change dimensions quickly and easily in a table and/or graphical presentation.
• The technology of slicing, dicing, and similar manipulations of data is called multidimensionality.

Multidimensionality, cont’d.

• Three factors are considered in multidimensionality:
  – Dimensions
  – Measures
  – Time

Examples of Dimensions

• Products
• Salespeople
• Market segments
• Business units
• Geographical locations
• Distribution channels
• Countries
• Industries

Examples of Measures

• Money
• Sales volume
• Head count
• Inventory profit
• Actual vs. forecasted results

Examples of Time

• Daily
• Weekly
• Monthly
• Quarterly
• Yearly
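
The three factors can be put together in a small slice-and-roll-up sketch. The fact rows are invented, and the helper names `slice_` and `rollup` are hypothetical: each row carries dimensions (product, city), a time period (quarter), and one measure (sales). Slicing fixes a dimension value; rolling up totals the measure by whichever dimensions the user picks.

```python
from collections import defaultdict

# Hypothetical sales facts: dimensions, a time period, and one measure.
facts = [
    {"product": "widget", "city": "Boston", "quarter": "Q1", "sales": 100},
    {"product": "widget", "city": "Boston", "quarter": "Q2", "sales": 120},
    {"product": "gadget", "city": "Boston", "quarter": "Q1", "sales": 60},
    {"product": "widget", "city": "Denver", "quarter": "Q1", "sales": 90},
]

def slice_(rows, **fixed):
    """Slice: keep only rows matching the fixed dimension values."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

def rollup(rows, by, measure="sales"):
    """Total the measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in by)] += r[measure]
    return dict(totals)
```

For example, `rollup(slice_(facts, city="Boston"), by=["product"])` views Boston sales by product, and changing `by` to `["quarter"]` swaps the dimension without touching the data.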

Advantages of Multidimensionality

• Data can be presented and navigated with relative ease.
• Multidimensional databases are easier to maintain.
• *Multidimensional databases are significantly faster than relational databases as a result of the additional dimensions and the anticipation of how the data will be accessed by users.*

Geographical Information Systems (GIS)

• A geographical information system (GIS) is a computer-based system for capturing, storing, checking, integrating, manipulating, and displaying data using digitized maps.
• Every record or digital object has an identified geographical location.

Example of GIS in Action

• Banks are using GIS for plotting the following:
  – Branch and ATM locations
  – Customer demographics
  – Volume and traffic patterns of business activities
  – Geographical area served by each branch
  – Market potential for banking activities
  – Strengths and weaknesses against the competition
  – Branch performance
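
Because every GIS record carries a location, questions such as "which branch serves this customer?" reduce to distance calculations. The sketch below assumes hypothetical branch names and coordinates and uses the haversine great-circle formula with a mean Earth radius of 6371 km; a production GIS would use road networks and proper map projections instead.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Hypothetical branch locations as (latitude, longitude).
branches = {
    "Downtown": (42.3601, -71.0589),
    "Airport": (42.3656, -71.0096),
}

def nearest_branch(lat, lon):
    """Return the name of the branch closest to the given customer location."""
    return min(branches, key=lambda name: haversine_km(lat, lon, *branches[name]))
```

Assigning every customer to their nearest branch in this way gives a first approximation of the geographical area each branch serves.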

GIS, cont’d.

• GIS software varies in its capabilities, from simple computerized mapping systems to enterprise-wide tools for decision support data analysis.
• GIS data are available from a wide variety of sources. Government sources (via the Internet and CD-ROM) provide some data, while vendors provide diversified commercial data as well.
• GIS & decision making: The graphical format of GIS makes it easy for managers to visualize the data & make decisions.
• GIS and the Internet or intranet: Most major GIS software vendors are providing Web access, such as embedded browsers, or a Web/Internet/intranet server that hooks directly into their software.

Visual Interactive Modeling (VIM)

• Visual interactive modeling (VIM) uses computer graphic displays to represent the impact of different management decisions on goals such as profit or market share.
  – A VIM can be used both for supporting decisions & training.
  – It can represent a static or a dynamic system.

Visual Interactive Simulation (VIS)

• Visual interactive simulation (VIS) is one of the most developed areas in VIM.
  – It is a decision simulation in which the end-user watches the progress of the simulation model in an animated form using graphics terminals.

Virtual Reality (VR)

• Virtual reality (VR) is interactive, computer-generated, three-dimensional graphics.
• VR applications to date have been used to support decision making indirectly.
  – Boeing has developed a virtual aircraft mock-up to test designs.
  – At Volvo, VR is used to test virtual cars in virtual accidents.
• Data visualization helps financial decision makers by using visual, spatial & aural immersion virtual systems.
  – Some stock brokerages have a VR application in which users surf over a landscape of stock futures, with color, hue, and intensity.

Data Mining and Warehousing Implementation Examples

• Alamo Rent-a-Car discovered that German tourists liked bigger cars. So now, when Alamo advertises its rental business in Germany, the ads include information about its larger models.
• Au Bon Pain Company discovered that they were not selling as much cream cheese as planned. When they analyzed point-of-sale data, they found that customers preferred small, one-serving packaging.
• AT&T and MCI sift through terabytes of customer phone data to fine-tune marketing campaigns and determine new discount calling plans.

CASE: Data Mining Powers Wal-Mart (p. 510)

• An interesting case study exploring how Wal-Mart uses data warehousing and data mining to get the right product on the appropriate shelf at the lowest cost.

Web-Based Data Management Systems

• Business intelligence activities, from data acquisition through warehousing to mining, can be performed with Web tools or are interrelated with Web technologies and e-commerce.
• E-commerce software vendors are providing Web tools that connect the data warehouse with EC ordering and cataloging systems.
  – e.g., Tradelink, a product of Hitachi
• Data warehousing and decision support vendors are connecting their products with Web technologies and EC.
  – e.g., Comshare’s DecisionWeb, Web Intelligence from Business Objects, and Cognos’s DataMerchant.

Managerial Issues

• Cost–benefit issues & justification. A cost–benefit analysis must be undertaken before any commitment to new technologies.
• Where to store data physically. Should data be distributed close to their sources? Or should data be centralized for easier control?
• Legal issues. Data mining gives rise to a variety of legal issues.
• The legacy data problem. What should be done with masses of information already stored in a variety of formats, often known as the legacy data acquisition problem?

Managerial Issues, cont’d.

• Disaster recovery. How well can an organization’s business processes recover after an information system disaster?
• Internal or external? Should a firm store & maintain its databases internally or externally?
• Data security and ethics. Are the company’s competitive data safe from external snooping or sabotage?
• Ethics. Should people have to pay for use of online data?

Managerial Issues, cont’d.

• Privacy. Collecting data in a warehouse and conducting data mining may result in the invasion of individual privacy.
• Data purging. When is it beneficial to “clean house” and purge information systems of obsolete or non–cost-effective data?
• Data delivery. A problem regarding how to move data efficiently around an enterprise also exists.