Data Component
Decision Support Systems Course, Dr. Aref Rashad, February 2013
Decision Support System Course
Dr. Aref Rashad
Part 3
Values Matrix
A values matrix helps DSS designers decide what information to include.
Characteristics of Useful Information
• Timeliness
• Sufficiency
• Level of Detail and Aggregation
• Redundancy
• Understandability
• Freedom from Bias
• Reliability
• Decision Relevance
• Cost Efficiency
• Comparability
• Quantifiability
• Appropriateness of Format
Timeliness of Data
Timeliness addresses whether the information is available to the decision maker soon enough for it to be meaningful.
Sufficiency
Sufficiency addresses whether the data are adequate to support the decision under consideration.
Level of Detail
The aggregation level of the data is also an important factor for determining the usefulness of information in a DSS
Understandability
The key is to simplify the representation in the database without losing the meaning of the data.
Freedom from Bias
It is not appropriate for the designer to bias the analyses if it can be avoided. Bias can be caused by a wide variety of problems in the data, such as nonrepresentativeness with regard to time horizon, variables, comparability, or sampling procedures.
Decision Relevance
Perhaps the most obvious issue to consider when building a database is the relevance of the information to the choices under consideration.
Comparability
When deciding whether data are valuable, we need to assess whether they can be compared to other relevant data. Comparable means that, in important ways, measurement conditions have been held constant.
Reliability
Decision makers will assume that the data are correct if they are included in the database; designers therefore need to ensure that they are accurate. They should verify the input of data and the integrity of the database.
Redundancy
In a perfect world, the less information is repeated, the less storage is used. This goal is laudable as long as it does not limit the user's ability to link data from multiple sources.
Cost Efficiency
The benefit of improved decision-making capability must outweigh the cost of providing it, or there is no advantage in the improvement. Said differently, data are cost efficient in a database only if the value of the changed decision behavior that results from acting on them remains positive after the cost of obtaining them is subtracted.
Quantifiability
Quantifiability does not assume that all valuable measures are quantified. Rather, it means the data are quantified at the appropriate level and that only appropriate operations can be performed on them. The level of quantification, referred to as the scale, dictates the types of meaningful mathematical operations that can be performed with the data.
Appropriateness of Format
The final determinant of the value of information is whether it is displayed in an appropriate fashion. This refers to the medium of presentation, the order in which data are presented to the decision maker, and the amount of graphics used.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang
Data, Information, Knowledge
• Data
  – Items that are the most elementary descriptions of things, events, activities, and transactions
  – May be internal or external
• Information
  – Organized data that has meaning and value
• Knowledge
  – Processed data or information that conveys understanding or learning applicable to a problem or activity
Data
• Raw data collected manually or by instruments
• Quality is critical
  – Quality determines usefulness
    • Contextual data quality
    • Intrinsic data quality
    • Accessibility data quality
    • Representation data quality
  – Often neglected or casually handled
  – Problems exposed when data is summarized
Data Sources
• Access needed to multiple sources
  – Often enterprise-wide
  – Disparate and heterogeneous databases
  – XML becoming language standard
• Web
  – Intelligent agents
  – Document management systems
  – Content management systems
• Commercial databases
  – Sell access to specialized databases
Databases
Databases are collections of interrelated data. The goal behind the database concept is to store related data together in a format independent of the DSS.
These data are linked together so that information from different physical locations on the storage medium can be joined together for transmission to the users' screens with a minimum amount of trouble.
Evolution of Users’ Needs and DSS Capabilities
Database Management Systems
• Software program
• Supplements operating system
• Manages data
• Queries data and generates reports
• Data security
• Combines with modeling language for construction of DSS
The DBMS serves as a buffer between the needs of the applications and the physical storage of the data. It captures and extracts data from the appropriate physical location and feeds it to the application program in the manner requested.
Database Models
• Hierarchical
  – Top down, like inverted tree
  – Fields have only one “parent”, each “parent” can have multiple “children”
  – Fast
• Network
  – Relationships created through linked lists, using pointers
  – “Children” can have multiple “parents”
  – Greater flexibility, substantial overhead
• Relational
  – Flat, two-dimensional tables with multiple access queries
  – Examines relations between multiple tables
  – Flexible, quick, and extendable with data independence
• Object oriented
  – Data analyzed at conceptual level
  – Inheritance, abstraction, encapsulation
Enterprise Data Model
Data Warehouse
The goal of the data warehouse is to bring together data from a variety of sources and merge them in a way that makes them useful for decision makers.
A data warehouse is a database management system that exists separately from the operational systems. It is subject oriented, time variant, and integrated, as are the operational data. It is nonvolatile and hence able to support a variety of analyses consistently.
The difficult steps in building the data warehouse are deciding:
• What data are relevant to particular decisions
• How the data should be represented and blended
• How to ensure they are meaningful, consistent, and accurate
Data Warehouse
• Subject oriented
• Scrubbed so that data from heterogeneous sources are standardized
• Time series; no current status
• Nonvolatile
  – Read only
• Summarized
• Not normalized; may be redundant
• Data from both internal and external sources are present
• Metadata included
  – Data about data
    • Business metadata
    • Semantic metadata
Process of Building a Data Warehouse
Migrating Data
• Business rules
  – Stored in metadata repository
  – Applied to data warehouse centrally
• Data extracted from all relevant sources
  – Loaded through data-transformation tools or programs
  – Separate operation and decision support environments
• Correct problems in quality before data stored
  – Cleanse and organize in consistent manner
Data Scrubbing
The first step in building the data warehouse is to load data from the disparate source databases. The next step is to scrub or clean the data:
• Eliminate problems of misspelling, transposition of letters, variations in spelling, and typographical errors.
• Identify records not using corporate standards for coding
• Identify poorly documented data.
• Remove duplicate records
• Remove obsolete data
• Remove spurious and invalid records
• Validate data (especially with external databases).
• Merge third-party information.
• Enrich data with attributes.
• Identify missing or inconsistent data.
• Identify and tag similar records suspected to be duplicates.
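A few of the scrubbing steps listed above can be sketched in Python. This is only an illustration under stated assumptions, not the course's method: the record fields (`name`, `state`), the `STATE_CODES` lookup standing in for a corporate coding standard, and the 0.85 similarity threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical corporate standard: two-letter state codes.
STATE_CODES = {"new york": "NY", "ny": "NY", "california": "CA", "ca": "CA"}

def normalize(record):
    """Collapse whitespace, unify case, and map state spellings to the standard code."""
    name = " ".join(record["name"].split()).title()
    state_key = record["state"].strip().lower()
    # None marks a value that does not follow the coding standard.
    return {"name": name, "state": STATE_CODES.get(state_key)}

def scrub(records, threshold=0.85):
    """Return (clean, nonstandard, suspected_duplicates)."""
    clean, nonstandard, suspects = [], [], []
    for raw in records:
        rec = normalize(raw)
        if rec["state"] is None:
            nonstandard.append(raw)           # identify, do not silently fix
            continue
        if rec in clean:
            continue                          # remove exact duplicates
        for kept in clean:
            ratio = SequenceMatcher(None, kept["name"], rec["name"]).ratio()
            if kept["state"] == rec["state"] and ratio >= threshold:
                suspects.append((kept, rec))  # tag similar records for review
        clean.append(rec)
    return clean, nonstandard, suspects

records = [
    {"name": "Jon  Smith", "state": "New York"},
    {"name": "jon smith", "state": "NY"},
    {"name": "John Smith", "state": "ny"},
    {"name": "Ann Lee", "state": "Calif."},
]
clean, nonstandard, suspects = scrub(records)
```

Records that are merely similar are tagged rather than deleted, which mirrors the distinction the list draws between removing duplicates and flagging suspected ones.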
Data Adjustment
The goal of the data warehouse is to give users a nonvolatile view of the organization. This means that we need to know not only the data at any given point in time but also the relative data at any given point in time.
• Currency is one of the factors that need to be consistent in the data warehouse.
• Time is another important factor that needs to be included in the data warehouse.
• Adjustment also includes provision of additional dimensions to the data that might make analyses richer.
The goal across all of these adjustments is to provide the best picture of the organization; its customers, suppliers, and competitors; and as many other outside influences as possible, so that the analyses are as reliable as possible.
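As a rough sketch of what such adjustments might look like, the following Python converts amounts to a single currency and derives a quarter as an additional time dimension. The exchange rates, field names, and the choice of USD as the common currency are assumptions made for illustration; a real warehouse would load dated rates from a trusted source.

```python
from decimal import Decimal

# Hypothetical exchange rates to USD (illustrative values only).
RATES_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.10"), "GBP": Decimal("1.25")}

def adjust(sale):
    """Convert the amount to USD and add a derived time dimension (quarter)."""
    year, month, _day = (int(part) for part in sale["date"].split("-"))
    amount_usd = (sale["amount"] * RATES_TO_USD[sale["currency"]]).quantize(Decimal("0.01"))
    quarter = f"{year}-Q{(month - 1) // 3 + 1}"
    return {**sale, "amount_usd": amount_usd, "quarter": quarter}

sales = [
    {"date": "2013-02-14", "amount": Decimal("200.00"), "currency": "EUR"},
    {"date": "2013-07-01", "amount": Decimal("80.00"), "currency": "GBP"},
]
adjusted = [adjust(s) for s in sales]
```

The original amount and currency are kept alongside the adjusted value so that analyses can still be traced back to the source data.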
Data Warehouse Tasks
Architecture
• May have one or more tiers
• Determined by warehouse, data acquisition (back end), and client (front end)
• One tier, where all run on same platform, is rare
• Two tier usually combines DSS engine (client) with warehouse
  – More economical
• Three tier separates these functional parts
Online Analytical Processing (OLAP)
Interactive analysis of data, allowing data to be summarized and viewed in different ways online.
Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.
– Measure attributes
  • measure some value
  • can be aggregated upon
  • e.g., the attribute number of the sales relation
– Dimension attributes
  • define the dimensions on which measure attributes (or aggregates thereof) are viewed
  • e.g., the attributes item_name, color, and size of the sales relation
Dimensions: Time, Product, Store
Attributes: Product (upc, price, …), Store (…), …
Hierarchies:
• Product → Brand → …
• Day → Week → Quarter
• Store → Region → Country
Online Analytical Processing
• Pivoting: changing the dimensions used in a cross-tabulation
• Dicing: defining dimension increments
• Slicing: creating a cross-tab for fixed values only
• Rollup: moving from finer-granularity data to a coarser granularity
• Drill down: the opposite operation, moving from coarser-granularity data to finer-granularity data
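The operations above can be illustrated with a small Python sketch over a sales relation with dimension attributes item_name, color, and size and the measure attribute number. The sample rows and the helper names (`cross_tab`, `slice_`, `rollup`) are made up for illustration and are not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (item_name, color, size, number)
sales = [
    ("skirt", "dark", "small", 2),
    ("skirt", "pastel", "small", 11),
    ("dress", "dark", "medium", 6),
    ("dress", "pastel", "medium", 4),
    ("dress", "dark", "small", 3),
]
DIMS = {"item_name": 0, "color": 1, "size": 2}

def cross_tab(facts, row_dim, col_dim):
    """Sum the measure `number` for each (row, column) pair."""
    table = defaultdict(int)
    for fact in facts:
        table[(fact[DIMS[row_dim]], fact[DIMS[col_dim]])] += fact[3]
    return dict(table)

def slice_(facts, dim, value):
    """Slicing: keep only facts where one dimension has a fixed value."""
    return [f for f in facts if f[DIMS[dim]] == value]

def rollup(facts, dim):
    """Rollup: aggregate away every dimension except `dim`."""
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[DIMS[dim]]] += fact[3]
    return dict(totals)

by_item_color = cross_tab(sales, "item_name", "color")
by_item_size = cross_tab(sales, "item_name", "size")   # pivot: color replaced by size
small_only = cross_tab(slice_(sales, "size", "small"), "item_name", "color")
by_item = rollup(sales, "item_name")                    # coarser granularity
```

Drill down is the reverse direction: starting from `by_item` and returning to the finer `by_item_color` table.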
OLAP Implementation
• OLAP implementations using only relational database features are called relational OLAP (ROLAP) systems
• OLAP systems that use multidimensional arrays in memory to store data cubes are referred to as multidimensional OLAP (MOLAP) systems.
• Hybrid systems, which store some summaries in memory and store the base data and other summaries in a relational database, are called hybrid OLAP (HOLAP) systems.
Star Schema (in RDBMS)
Star Schema Example
Star Schema with Sample Data
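The star schema figures did not survive in this transcript. As a stand-in, here is a minimal sketch of a star schema using Python's built-in `sqlite3` module: one fact table (`sales`) surrounded by dimension tables (`product`, `store`, `day`). All table names, column names, and sample rows are assumptions chosen for illustration, not the schema shown on the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE day     (day_id     INTEGER PRIMARY KEY, weekday TEXT, week INTEGER);
-- Fact table: one row per product, store, and day, keyed to each dimension.
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    store_id   INTEGER REFERENCES store(store_id),
    day_id     INTEGER REFERENCES day(day_id),
    units      INTEGER
);
INSERT INTO product VALUES (1, 'Bread', 'BrandA'), (2, 'Milk', 'BrandB');
INSERT INTO store   VALUES (1, 'LA', 'West'), (2, 'SF', 'West'), (3, 'NY', 'East');
INSERT INTO day     VALUES (1, 'M', 1), (2, 'T', 1);
INSERT INTO sales   VALUES (1, 1, 1, 56), (1, 2, 1, 12), (2, 3, 1, 34), (2, 1, 2, 10);
""")

# Roll up units from store to region by joining the fact table to its dimensions.
rows = conn.execute("""
    SELECT s.region, p.name, SUM(f.units)
    FROM sales f
    JOIN store s   ON s.store_id = f.store_id
    JOIN product p ON p.product_id = f.product_id
    GROUP BY s.region, p.name
    ORDER BY s.region, p.name
""").fetchall()
```

Each query touches the fact table once and joins outward to only the dimensions it needs, which is the property that gives the schema its star shape.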
Points to be noticed about ROLAP
• Defines complex, multi-dimensional data with simple model
• Reduces the number of joins a query has to process
• Allows the data warehouse to evolve with relatively low maintenance
• Can contain both detailed and summarized data.
• ROLAP is based on familiar, proven, and already selected technologies.
MOLAP: Dimensional Modeling Using the Multi Dimensional Model
• MDDB: a special-purpose data model
• Facts stored in multi-dimensional arrays
• Dimensions used to index array
• Sometimes on top of relational DB
• Products
  – Pilot, Arbor Essbase, Gentia
MOLAP
Data Cube
[Figure: data cube with three axes. Store: NY, SF, LA. Product: Juice, Milk, Coke, Cream, Soap, Bread. Time: M, T, W, Th, F, S, S. Sample cell values shown: 10, 34, 56, 32, 12, 56. Callout: 56 units of bread sold in LA on M.]
Dimensions: Time, Product, Store
Attributes: Product (upc, price, …), Store (…), …
Hierarchies (roll-ups shown in the figure):
• Product → Brand → … (roll-up to brand)
• Day → Week → Quarter (roll-up to week)
• Store → Region → Country (roll-up to region)
Can have n dimensions; tables can be used as views on a data cube.
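The idea of storing facts in a multidimensional array indexed by dimensions can be sketched with nested Python lists. Only the cell "56 units of bread sold in LA on M" comes from the slide; the other values, the helper names, and the day labels `Sa` and `Su` (used to tell the two weekend days apart) are assumptions.

```python
PRODUCTS = ["Juice", "Milk", "Coke", "Cream", "Soap", "Bread"]
STORES = ["NY", "SF", "LA"]
DAYS = ["M", "T", "W", "Th", "F", "Sa", "Su"]

# cube[p][s][d] holds units sold; the dimensions index the array.
cube = [[[0] * len(DAYS) for _ in STORES] for _ in PRODUCTS]

def put(product, store, day, units):
    cube[PRODUCTS.index(product)][STORES.index(store)][DAYS.index(day)] = units

def get(product, store, day):
    return cube[PRODUCTS.index(product)][STORES.index(store)][DAYS.index(day)]

put("Bread", "LA", "M", 56)   # the cell called out on the slide
put("Bread", "LA", "T", 10)   # hypothetical value
put("Bread", "NY", "M", 34)   # hypothetical value

def rollup_to_week(product, store):
    """Roll up the time dimension from day to week by summing over all days."""
    return sum(cube[PRODUCTS.index(product)][STORES.index(store)])

def table_view(day):
    """A two-dimensional table (product by store) as a view on the cube."""
    d = DAYS.index(day)
    return [[cube[p][s][d] for s in range(len(STORES))] for p in range(len(PRODUCTS))]
```

Lookups are direct index operations with no joins, which hints at why MOLAP is fast; the array also reserves a cell for every combination, sold or not, which hints at its storage overhead.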
Dicing & slicing
Points to be noticed about MOLAP
• Pre-calculating or pre-consolidating transactional data improves speed.
• BUT, when fully pre-consolidating incoming data, MDDs require an enormous amount of overhead both in processing time and in storage. An input file of 200 MB can easily expand to 5 GB.
• MDDs are great candidates for department data marts under 50 GB.
• Rolling up and drilling down through aggregate data.
HOLAP : Hybrid OLAP
• HOLAP = Hybrid OLAP:
– Best of both worlds
– Storing detailed data in RDBMS
– Storing aggregated data in MDBMS
– User access via MOLAP tools
Data Flow in HOLAP
[Figure: components shown are a client (multidimensional viewer, relational viewer), an MDBMS server (multidimensional data), and an RDBMS server (user data, metadata, derived data). Link labels: multidimensional access, SQL-Read, SQL-Reach Through.]
When deciding which technology to go for, consider:
1) Performance:
• How fast will the system appear to the end user?
• MDD server vendors believe this is a key point in their favor.
2) Data volume and scalability:
• While MDD servers can handle up to 50 GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.
What-if analysis
IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You’re developing a general-purpose application for inventory movement or assets management
THEN consider an MDD/MOLAP solution for your data mart.

IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN consider an RDBMS/ROLAP solution for your data mart.

IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN consider a HOLAP solution for your data mart.
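One rough reading of these IF/THEN rules, reduced to four inputs, is sketched below. The function name, the parameters, and the order in which the rules are tested are assumptions; the slides do not say how to combine the criteria or what to do between 50 GB and 100 GB, so the sketch returns "undetermined" there.

```python
def recommend_olap(data_gb, needs_write_access, needs_detail, needs_aggregates):
    """Simplified reading of the IF/THEN rules; real choices weigh more factors."""
    if needs_detail and needs_aggregates:
        return "HOLAP"   # OLAP on aggregated and detailed data
    if data_gb > 100 or (needs_detail and not needs_write_access):
        return "ROLAP"   # large volumes, read-only detailed access
    if data_gb < 50 and needs_aggregates:
        return "MOLAP"   # small, aggregated data mart
    return "undetermined"

small_budgeting_mart = recommend_olap(20, True, False, True)
call_record_archive = recommend_olap(500, False, True, False)
mixed_user_groups = recommend_olap(80, False, True, True)
unclear_case = recommend_olap(70, True, False, False)
```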
Examples
• ROLAP
  – Telecommunication startup: call data records (CDRs)
  – E-Commerce Site
  – Credit Card Company
• MOLAP
  – Analysis and budgeting in a financial department
  – Sales analysis
• HOLAP
  – Sales department of a multi-national company
  – Banks and Financial Service Providers
Tools available
• ROLAP:
  – ORACLE 8i
  – ORACLE Reports; ORACLE Discoverer
  – ORACLE Warehouse Builder
  – Arbor Software’s Essbase
• MOLAP:
  – ORACLE Express Server
  – ORACLE Express Clients (C/S and Web)
  – MicroStrategy’s DSS server
  – Platinum Technologies’ Platinum InfoBeacon
• HOLAP:
  – ORACLE 8i
  – ORACLE Express Server
  – ORACLE Relational Access Manager
  – ORACLE Express Clients (C/S and Web)
Conclusion
• ROLAP: RDBMS -> star/snowflake schema
• MOLAP: MDD -> Cube structures
• ROLAP or MOLAP: Data models used play major role in performance differences
• MOLAP: for summarized and relatively smaller volumes of data (10-50 GB)
• ROLAP: for detailed and larger volumes of data
• Both storage methods have strengths and weaknesses
• The choice is requirement specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP.
Data Mining vs OLAP
Data Mining
• Data mining is the process of semi-automatically analyzing large databases to find useful patterns
• Prediction based on past history
  – Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ...) and past history
  – Predict if a pattern of phone calling card usage is likely to be fraudulent
• Some examples of prediction mechanisms:
  – Classification
    • Given a new item whose class is unknown, predict to which class it belongs
  – Regression formulae
    • Given a set of mappings for an unknown function, predict the function result for a new parameter value
  – Associations
    • Find books that are often bought by “similar” customers. If a new such customer buys one such book, suggest the others too.
    • Associations may be used as a first step in detecting causation, e.g., association between exposure to chemical X and cancer
  – Clusters
    • e.g., typhoid cases were clustered in an area surrounding a contaminated well
    • Detection of clusters remains important in detecting epidemics
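The book-suggestion example of associations can be sketched by counting how often titles appear in the same basket. The baskets, the function names, and the 0.6 confidence threshold are hypothetical, and the sketch ignores refinements such as minimum support that practical association mining would add.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of book titles per customer.
baskets = [
    {"DSS", "Databases", "OLAP"},
    {"DSS", "Databases"},
    {"DSS", "OLAP"},
    {"Databases", "Statistics"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

def suggest(book, min_confidence=0.6):
    """Suggest other books often bought together with `book`."""
    others = (b for b in item_counts if b != book)
    return sorted(b for b in others if confidence(book, b) >= min_confidence)

for_dss_buyers = suggest("DSS")
for_olap_buyers = suggest("OLAP")
```

As the slide cautions, a high confidence only shows that the titles co-occur; it is at most a first step toward any causal claim.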
Other Types of Mining
• Text mining: application of data mining to textual documents
  – cluster Web pages to find related pages
  – cluster pages a user has visited to organize their visit history
  – classify Web pages automatically into a Web directory
• Data visualization systems help users examine large volumes of data and detect patterns visually
  – Can visually encode large amounts of information on a single screen
  – Humans are very good at detecting visual patterns
Data Mining Applications
Data mining application classes of problems:
– Classification
– Clustering
– Association
– Sequencing
– Regression
– Forecasting
– Hypothesis or discovery driven
– …
Tools and Techniques
• Data mining
  – Statistical methods
  – Decision trees
  – Case based reasoning
  – Neural computing
  – Intelligent agents
  – Genetic algorithms
• Text Mining
  – Hidden content
  – Group by themes
  – Determine relationships