MI0036 Set-1 & Set-2
Transcript, 12/12/2011
SIKKIM MANIPAL UNIVERSITY
BUSINESS INTELLIGENCE TOOLS – 4 CREDITS
SUBJECT CODE – MI0036
ASSIGNMENT SET – 1
Q.1 Define the term business intelligence tools. Discuss the roles in a Business Intelligence project.
Business Intelligence (BI) is a generic term for leveraging an organization's internal and
external data and information to make the best possible business decisions. The field of
Business Intelligence is very diverse and comprises the tools and technologies used to
access and analyze various types of business information. These tools gather and store
data and allow the user to view and analyze the information from a wide variety of
dimensions, thereby assisting decision-makers in making better business decisions. Thus,
Business Intelligence (BI) systems and tools play a vital role in helping organizations
make improved decisions in today's cut-throat competitive scenario. In simple terms,
Business Intelligence is an environment in which business users receive reliable,
consistent, meaningful and timely information. This data enables business users to
conduct analyses that yield an overall understanding of how the business has been, how it
is now and how it will be in the near future. The BI tools also monitor the financial and
operational health of the organization through the generation of various types of reports,
alerts, alarms, key performance indicators and dashboards. Business intelligence tools are
a type of application software designed to help in making better business decisions.
These tools aid in the analysis and presentation of data in a more meaningful way and so
play a key role in the strategic planning process of an organization. They illustrate
business intelligence in areas such as market research and segmentation, customer
profiling, customer support, profitability, and inventory and distribution analysis, to
name a few. Various types of BI systems, viz. Decision Support Systems, Executive
Information Systems (EIS), Multidimensional
SANTOSH GOWDA.H Reg No.: 5210757283rd semester, Disha institute of management and technology Mobile No.: 9986840143
Analysis software or OLAP (On-Line Analytical Processing) tools, and data mining tools
are discussed further below. Whatever the type, the Business Intelligence capability of
the system is to let its users slice and dice the information from their organization's
numerous databases without having to wait for their IT departments to develop complex
queries and elicit answers.
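As a minimal sketch of this slice-and-dice capability (all data here is hypothetical and for illustration only):

```python
import sqlite3

# Hypothetical sales records: (region, product, year, amount)
rows = [
    ("North", "Widget", 2010, 120.0),
    ("North", "Gadget", 2011, 200.0),
    ("South", "Widget", 2011, 150.0),
    ("South", "Gadget", 2011, 90.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# "Slice": fix one dimension to a single value (year = 2011),
# then "dice" the remaining data by region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE year = 2011 GROUP BY region ORDER BY region"
).fetchall()
print(by_region)  # [('North', 200.0), ('South', 240.0)]
```

The same table can be regrouped by product or by year without any change to the underlying data, which is the essence of viewing information "from a wide variety of dimensions".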
Although it is possible to build BI systems without the benefit of a data warehouse, in
practice most such systems are an integral part of the user-facing end of the data
warehouse. In fact, a data warehouse is rarely built without BI systems. That is the
reason the terms ‘data warehousing’ and ‘business intelligence’ are sometimes used
interchangeably.
The figure below depicts how data at one end gets transformed into information at the
other end for business use.
Roles in Business Intelligence project:
A typical BI project consists of the following roles; the responsibilities of each of
these roles are detailed below:
Project Manager: Monitors the progress on a continual basis and is responsible for
the success of the project.
Technical Architect: Develops and implements the overall technical architecture
of the BI system, from the backend hardware/software to the client desktop
configurations.
Database Administrator (DBA): Keeps the database available for the applications
to run smoothly, and is also involved in planning and executing a backup/recovery
plan, as well as performance tuning.
ETL Developer: Plans, develops, and deploys the extraction, transformation, and
loading routines for the data warehouse from the legacy systems.
Front End Developer: Develops the front-end, whether it be client-server or over
the web.
OLAP Developer: Develops the OLAP cubes.
Data Modeler: Is responsible for taking the data structure that exists in the
enterprise and modeling it into a schema that is suitable for OLAP analysis.
QA Group: Ensures the correctness of the data in the data warehouse.
Trainer: Works with the end users to make them familiar with how the front end
is set up so that the end users can get the most benefit out of the system.
Q.2 What do you mean by a data warehouse? What are the major concepts and
terminology used in the study of data warehouses?
In computing, a data warehouse (DW) is a database used for reporting and analysis. The
data stored in the warehouse is uploaded from the operational systems. The data may pass
through an operational data store for additional operations before it is used in the DW for
reporting.
A data warehouse maintains its functions in three layers: staging, integration, and
access. Staging is used to store raw data for use by developers. The integration layer is
used to integrate data and to have a level of abstraction from users. The access layer is for
getting data out for users.
Data warehouses can be subdivided into data marts. Data marts store subsets of data from
a warehouse.
This definition of the data warehouse focuses on data storage. The main source of the
data is cleaned, transformed, catalogued and made available for use by managers and
other business professionals for data mining, online analytical processing, market
research and decision support (Marakas & O'Brien 2009). However, the means to retrieve
and analyze data, to extract, transform and load data, and to manage the data
dictionary are also considered essential components of a data warehousing system. Many
references to data warehousing use this broader context. Thus, an expanded definition for
data warehousing includes business intelligence tools, tools to extract, transform and
load data into the repository, and tools to manage and retrieve metadata.
A common way of introducing data warehousing is to refer to the characteristics of a data
warehouse as set forth by William Inmon:
Subject Oriented
Integrated
Nonvolatile
Time Variant
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about
your company's sales data, you can build a warehouse that concentrates on sales. Using
this warehouse, you can answer questions like "Who was our best customer for this item
last year?" This ability to define a data warehouse by subject matter, sales in this case,
makes the data warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming
conflicts and inconsistencies among units of measure. When they achieve this, they are
said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is
logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very
much in contrast to online transaction processing (OLTP) systems, where performance
requirements demand that historical data be moved to an archive. A data warehouse's
focus on change over time is what is meant by the term time variant.
DATA WAREHOUSE TERMINOLOGY
Bruce W. Johnson, M.S.
Ad Hoc Query:
A database search that is designed to extract specific information from a database. It is
ad hoc if it is designed at the point of execution as opposed to being a “canned” report.
Most ad hoc query software uses the structured query language (SQL).
Aggregation:
The process of summarizing or combining data.
Catalog:
A component of a data dictionary that describes and organizes the various aspects of a
database such as its folders, dimensions, measures, prompts, functions, queries and other
database objects. It is used to create queries, reports, analyses and cubes.
Cross Tab:
A type of multi-dimensional report that displays values or measures in cells created by
the intersection of two or more dimensions in a table format.
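A cross tab can be sketched in plain Python; the region and product values below are hypothetical:

```python
# Measures at the intersection of two dimensions (region x product).
sales = [
    ("North", "Widget", 120.0),
    ("North", "Gadget", 200.0),
    ("South", "Widget", 150.0),
    ("South", "Gadget", 90.0),
]

crosstab = {}
for region, product, amount in sales:
    # Each cell of the table is one (region, product) intersection.
    cell = crosstab.setdefault(region, {})
    cell[product] = cell.get(product, 0.0) + amount

print(crosstab["North"]["Gadget"])  # 200.0
```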
Dashboard:
A data visualization method and workflow management tool that brings together useful
information on a series of screens and/or web pages. The information contained on a
dashboard may include reports, web links, calendars, news, tasks, e-mail, etc. When
incorporated into a DSS or EIS, key performance indicators may be represented as
graphics that are linked to various hyperlinks, graphs, tables and other reports. The
dashboard draws its information from multiple sources: applications, office products,
databases, the Internet, etc.
Cube:
A multi-dimensional matrix of data that has multiple dimensions (independent variables)
and measures (dependent variables) that are created by an Online Analytical Processing
System (OLAP). Each dimension may be organized into a hierarchy with multiple levels.
The intersection of two or more dimensional categories is referred to as a cell.
Data-based Knowledge:
Factual information used in the decision making process that is derived from data marts
or warehouses using business intelligence tools. Data warehousing organizes information
into a format so that it represents an organization's knowledge with respect to a particular
subject area, e.g. finance or clinical outcomes.
Data Cleansing:
The process of cleaning or removing errors, redundancies and inconsistencies in the data
that is being imported into a data mart or data warehouse. It is part of the quality
assurance process.
Data Mart:
A database that is similar in structure to a data warehouse, but is typically smaller and is
focused on a more limited area. Multiple, integrated data marts are sometimes referred to
as an Integrated Data Warehouse. Data marts may be used in place of a larger data
warehouse or in conjunction with it. They are typically less expensive to develop and
faster to deploy and are therefore becoming more popular with smaller organizations.
Data Migration:
The transfer of data from one platform to another. This may include conversion from one
language, file structure and/or operating environment to another.
Data Mining:
The process of researching data marts and data warehouses to detect specific patterns in
the data sets. Data mining may be performed on databases and multi-dimensional data
cubes with ad hoc query tools and OLAP software. The queries and reports are typically
designed to answer specific questions to uncover trends or hidden relationships in the
data.
Data Scrubbing:
See Data Cleansing
Data Transformation:
The modification of transaction data extracted from one or more data sources before it is
loaded into the data mart or warehouse. The modifications may include data cleansing,
translation of data into a common format so that it can be aggregated and compared,
summarizing the data, etc.
Data Warehouse:
An integrated, non-volatile database of historical information that is designed around
specific content areas and is used to answer questions regarding an organization's
operations and environment.
Database Management System:
The software that is used to create data warehouses and data marts. For the purposes of
data warehousing, they typically include relational database management systems and
multi-dimensional database management systems. Both types of database management
systems create the database structures, store and retrieve the data and include various
administrative functions.
Decision Support System (DSS):
A set of queries, reports, rule-based analyses, tables and charts that are designed to aid
management with their decision-making responsibilities. These functions are typically
“wrapped around” a data mart or data warehouse. The DSS tends to employ more
detailed level data than an EIS.
Dimension:
A variable, perspective or general category of information that is used to organize and
analyze information in a multi-dimensional data cube.
Drill Down:
The ability of a data-mining tool to move down into increasing levels of detail in a data
mart, data warehouse or multi-dimensional data cube.
Drill Up:
The ability of a data-mining tool to move back up into higher levels of data in a data
mart, data warehouse or multi-dimensional data cube.
Executive Information System (EIS):
A type of decision support system designed for executive management that reports
summary-level information, as opposed to the greater detail provided by a decision
support system.
Extraction, Transformation and Loading (ETL) Tool:
Software that is used to extract data from a data source such as an operational system or
data warehouse, modify the data and then load it into a data mart, data warehouse or
multi-dimensional data cube.
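The extract-transform-load flow can be sketched end to end; the source tables and transformation rules below are assumed for illustration:

```python
import sqlite3

# Extract-Transform-Load sketch with hypothetical "legacy" source data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount TEXT, region TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "10.50", " north "), (2, "3.25", "South")])

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (id INTEGER, amount REAL, region TEXT)")

# Extract: pull the raw rows from the source system.
rows = source.execute("SELECT id, amount, region FROM orders").fetchall()

# Transform: cast text amounts to numbers, standardize region names.
clean = [(oid, float(amount), region.strip().title())
         for oid, amount, region in rows]

# Load: append the cleaned rows into the warehouse table.
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", clean)
loaded = warehouse.execute(
    "SELECT * FROM fact_orders ORDER BY id").fetchall()
print(loaded)  # [(1, 10.5, 'North'), (2, 3.25, 'South')]
```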
Granularity:
The level of detail in a data store or report.
Hierarchy:
The organization of data, e.g. a dimension, into an outline or logical tree structure. The
strata of a hierarchy are referred to as levels. The individual elements within a level are
referred to as categories. The next lower level in a hierarchy is the child; the next higher
level containing the children is their parent.
Legacy System:
Older systems developed on platforms that tend to be one or more generations behind
current state-of-the-art applications. Data marts and warehouses were developed in large
part due to the difficulty in extracting data from these systems and the inconsistencies
and incompatibilities among them.
Level:
A tier or stratum in a dimensional hierarchy. Each lower level represents an increasing
degree of detail. Levels in a location dimension might include country, region, state,
county, city, zip code, etc.
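Using a location dimension like the one above, moving between levels can be sketched as follows (the figures are hypothetical; level 0 is country, level 1 is state):

```python
# Each fact row: (country, state, city, measure)
facts = [
    ("USA", "CA", "San Jose", 40.0),
    ("USA", "CA", "Fresno", 10.0),
    ("USA", "TX", "Austin", 25.0),
]

def roll_up(level):
    """Aggregate the measure at a given hierarchy level (0=country, 1=state)."""
    totals = {}
    for row in facts:
        key = row[level]
        totals[key] = totals.get(key, 0.0) + row[3]
    return totals

print(roll_up(0))  # {'USA': 75.0}
print(roll_up(1))  # {'CA': 50.0, 'TX': 25.0}
```

Moving from `roll_up(0)` to `roll_up(1)` is a drill down (increasing detail); the reverse direction is a drill up.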
Measure:
A quantifiable variable or value stored in a multi-dimensional OLAP cube. It is a value
in the cell at the intersection of two or more dimensions.
Member:
One of the data points for a level of a dimension.
Meta Data:
Information in a data mart or warehouse that describes the tables, fields, data types,
attributes and other objects in the data warehouse and how they map to their data sources.
Meta data is contained in database catalogs and data dictionaries.
Multi-Dimensional Online Analytical Processing (MOLAP):
Software that creates and analyzes multi-dimensional cubes to store its information.
Non-Volatile Data:
Data that is static or that does not change. In transaction processing systems the data is
updated on a continual, regular basis. In a data warehouse the database is added to or
appended, but the existing data seldom changes.
Normalization:
The process of eliminating duplicate information in a database by creating a separate
table that stores the redundant information. For example, it would be highly inefficient to
re-enter the address of an insurance company with every claim. Instead, the database
uses a key field to link the claims table to the address table. Operational or transaction
processing systems are typically “normalized”. On the other hand, some data warehouses
find it advantageous to de-normalize the data allowing for some degree of redundancy.
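The insurance example can be sketched with two normalized tables linked by a key field (the company name, address and amounts are illustrative):

```python
import sqlite3

# Normalization: the insurer's address is stored once and linked to each
# claim via a key field, rather than re-entered with every claim.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE insurer (insurer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE claim (claim_id INTEGER PRIMARY KEY, insurer_id INTEGER,
                    amount REAL,
                    FOREIGN KEY (insurer_id) REFERENCES insurer (insurer_id));
""")
db.execute("INSERT INTO insurer VALUES (1, 'Acme Insurance', '12 High St')")
db.executemany("INSERT INTO claim VALUES (?, ?, ?)",
               [(100, 1, 250.0), (101, 1, 75.0)])

# The key field joins every claim back to the single stored address.
rows = db.execute("""
    SELECT claim.claim_id, insurer.address
    FROM claim JOIN insurer ON claim.insurer_id = insurer.insurer_id
    ORDER BY claim.claim_id
""").fetchall()
print(rows)  # [(100, '12 High St'), (101, '12 High St')]
```

A de-normalized warehouse design would instead copy the address into every claim row, trading redundancy for simpler, faster queries.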
Online Analytical Processing (OLAP):
The process employed by multi-dimensional analysis software to analyze the data
resident in data cubes. There are different types of OLAP systems named for the type of
database employed to create them and the data structures produced.
Open Database Connectivity (ODBC):
A database standard developed by Microsoft and the SQL Access Group Consortium that
defines the “rules” for accessing or retrieving data from a database.
Relational Database Management System:
Database management systems that have the ability to link tables of data through a
common or key field. Most databases today use relational technologies and support a
standard programming language called Structured Query Language (SQL).
Relational Online Analytical Processing (ROLAP):
OLAP software that employs a relational strategy to organize and store the data in its
database.
Replication:
The process of copying data from one database table to another.
Scalable:
The attribute or capability of a database to significantly expand the number of records
that it can manage. It also refers to hardware systems and their ability to be expanded or
upgraded to increase their processing speed and handle larger volumes of data.
Structured Query Language (SQL):
A standard programming language used by contemporary relational database
management systems.
Synchronization:
The process by which the data in two or more separate databases is synchronized so that
the records contain the same information. If the fields and records are updated in one
database the same fields and records are updated in the other.
Q.3 What are the data modeling techniques used in a data warehousing environment?
Two data modeling techniques that are relevant in a data warehousing environment are
ER modeling and dimensional modeling.
ER modeling produces a data model of the specific area of interest, using two basic
concepts: entities and the relationships between those entities. Detailed
ER models also contain attributes, which can be properties of either the entities or the
relationships. The ER model is an abstraction tool because it can be used to understand
and simplify the ambiguous data relationships in the business world and complex systems
environments.
Dimensional modeling uses three basic concepts: measures, facts, and dimensions.
Dimensional modeling is powerful in representing the requirements of the business user
in the context of database tables.
Both ER and dimensional modeling can be used to create an abstract model of a specific
subject. However, each has its own limited set of modeling concepts and associated
notation conventions. Consequently, the techniques look different, and they are indeed
different in terms of semantic representation. The following sections describe the
modeling concepts and notation conventions for both ER modeling and dimensional
modeling that will be used throughout this book.
ER Modeling
A prerequisite for reading this book is a basic knowledge of ER modeling.
Therefore we do not focus on that traditional technique. We simply define the necessary
terms to form some consensus and present notation conventions used in the rest of this
book.
Figure 12. A Sample ER Model. Entity, relationship, and attributes in an ER diagram.
Basic Concepts
An ER model is represented by an ER diagram, which uses three basic graphic symbols
to conceptualize the data: entity, relationship, and attribute.
Entity
An entity is defined to be a person, place, thing, or event of interest to the business or the
organization. An entity represents a class of objects, which are things in the real world
that can be observed and classified by their properties and characteristics. In some books
on IE, the term entity type is used to represent classes of objects and entity for an instance
of an entity type. In this book, we will use them interchangeably.
Relationship
A relationship is represented with lines drawn between entities. It depicts the structural
interaction and association among the entities in a model. A relationship is designated
grammatically by a verb, such as owns, belongs, and has. The relationship between two
entities can be defined in terms of the cardinality. This is the maximum number of
instances of one entity that are related to a single instance in another table and vice versa.
The possible cardinalities are: one-to-one (1:1), one-to-many (1:M), and many-to-many
(M:M).
In a detailed (normalized) ER model, any M:M relationship is not shown because it is
resolved to an associative entity.
Attributes
Attributes describe the characteristics or properties of the entities. In Figure 12,
Product ID, Description, and Picture are attributes of the PRODUCT entity. For
clarification, attribute naming conventions are very important. An attribute name should
be unique in an entity and should be self-explanatory. For example, simply saying date1
or date2 is not allowed; we must clearly define each. They could, for instance, be defined
as the order date and the delivery date.
Dimensional Modeling
In some respects, dimensional modeling is simpler, more expressive, and easier to
understand than ER modeling. But dimensional modeling is a relatively new concept and
not yet firmly defined in detail, especially when compared to ER modeling techniques.
This section presents the terminology that we use in this book as we discuss dimensional
modeling.
Basic Concepts
Dimensional modeling is a technique for conceptualizing and visualizing data models as
a set of measures that are described by common aspects of the business. It is especially
useful for summarizing and rearranging the data and presenting views of the data to
support data analysis. Dimensional modeling focuses on numeric data, such as values,
counts, weights, balances, and occurrences.
Dimensional modeling has several basic concepts:
· Facts
· Dimensions
· Measures (variables)
Fact
A fact is a collection of related data items, consisting of measures and context data. Each
fact typically represents a business item, a business transaction, or an event that can be
used in analyzing the business or business processes.
In a data warehouse, facts are implemented in the core tables in which all of the numeric
data is stored.
Dimension
A dimension is a collection of members or units of the same type of view. In a diagram,
a dimension is usually represented by an axis. In a dimensional model, every data point in
the fact table is associated with one and only one member from each of the multiple
dimensions. That is, dimensions determine the contextual background for the facts. Many
analytical processes are used to quantify the impact of dimensions on the facts.
Dimensions are the parameters over which we want to perform Online Analytical
Processing (OLAP).
Measure
A measure is a numeric attribute of a fact, representing the performance or behavior of
the business relative to the dimensions. The actual numbers are called variables. For
example, measures are the sales in money, the sales volume, the quantity supplied, the
supply cost, the transaction amount, and so forth. A measure is determined by
combinations of the members of the dimensions and is located on facts.
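These three concepts can be sketched together: a fact table of numeric measures keyed to members of two dimensions (all names and figures below are hypothetical):

```python
product_dim = {1: "Widget", 2: "Gadget"}   # dimension: product
region_dim = {10: "North", 20: "South"}    # dimension: region

# Fact rows hold the measures: (product_key, region_key, sales_amount, quantity)
fact_sales = [
    (1, 10, 120.0, 3),
    (2, 10, 200.0, 5),
    (1, 20, 150.0, 4),
]

def measure(product_name, region_name):
    """A measure is determined by a combination of dimension members."""
    total = 0.0
    for pk, rk, amount, qty in fact_sales:
        if product_dim[pk] == product_name and region_dim[rk] == region_name:
            total += amount
    return total

print(measure("Widget", "North"))  # 120.0
```

Every fact row is associated with exactly one member from each dimension, which is what lets the dimensions supply the contextual background for the facts.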
Q.4 Discuss the categories into which data is divided before structuring it into a data
warehouse.
A Data Warehouse is not an individual repository product. Rather, it is an overall
strategy, or process, for building decision support systems and a knowledge-based
applications architecture and environment that supports both everyday tactical decision
making and long-term business strategizing. The Data Warehouse environment positions
a business to utilize an enterprise-wide data store to link information from diverse
sources and make the information accessible for a variety of user purposes, most notably,
strategic analysis. Business analysts must be able to use the Warehouse for such strategic
purposes as trend identification, forecasting, competitive analysis, and targeted market
research.
Data Warehouses and Data Warehouse applications are designed primarily to support
executives, senior managers, and business analysts in making complex business
decisions. Data Warehouse applications provide the business community with access to
accurate, consolidated information from various internal and external sources.
The primary objective of Data Warehousing is to bring together information from
disparate sources and put the information into a format that is conducive to making
business decisions. This objective necessitates a set of activities that are far more
complex than just collecting data and reporting against it. Data Warehousing requires
both business and technical expertise and involves the following activities:
· Accurately identifying the business information that must be contained in the
Warehouse
· Identifying and prioritizing subject areas to be included in the Data Warehouse
· Managing the scope of each subject area, which will be implemented into the
Warehouse on an iterative basis
· Developing a scalable architecture to serve as the Warehouse’s technical and
application foundation, and identifying and selecting the
hardware/software/middleware components to implement it
· Extracting, cleansing, aggregating, transforming and validating the data to ensure
accuracy and consistency
· Defining the correct level of summarization to support business decision making
· Establishing a refresh program that is consistent with business needs, timing and
cycles
· Providing user-friendly, powerful tools at the desktop to access the data in the
Warehouse
· Educating the business community about the realm of possibilities that are
available to them through Data Warehousing
· Establishing a Data Warehouse Help Desk and training users to effectively utilize
the desktop tools
· Establishing processes for maintaining, enhancing, and ensuring the ongoing
success and applicability of the Warehouse
Until the advent of Data Warehouses, enterprise databases were expected to serve
multiple purposes, including online transaction processing, batch processing, reporting,
and analytical processing. In most cases, the primary focus of computing resources was
on satisfying operational needs and requirements. Information reporting and analysis
needs were secondary considerations. As the use of PCs, relational databases, 4GL
technology and end-user computing grew and changed the complexion of information
processing, more and more business users demanded that their needs for information be
addressed. Data Warehousing has evolved to meet those needs without disrupting
operational processing.
In the Data Warehouse model, operational databases are not accessed directly to perform
information processing. Rather, they act as the source of data for the Data Warehouse,
which is the information repository and point of access for information processing. There
are sound reasons for separating operational and informational databases, as described
below.
· The users of informational and operational data are different. Users of
informational data are generally managers and analysts; users of operational data
tend to be clerical, operational and administrative staff.
· Operational data differs from informational data in context and currency.
Informational data contains an historical perspective that is not generally used by
operational systems.
· The technology used for operational processing frequently differs from the
technology required to support informational needs.
· The processing characteristics for the operational environment and the
informational environment are fundamentally different.
The Data Warehouse functions as a Decision Support System (DSS) and an Executive
Information System (EIS), meaning that it supports informational and analytical needs by
providing integrated and transformed enterprise-wide historical data from which to do
management analysis. A variety of sophisticated tools are readily available in the
marketplace to provide user-friendly access to the information stored in the Data
Warehouse.
Data Warehouses can be defined as subject-oriented, integrated, time-variant, non-
volatile collections of data used to support analytical decision making. The data in the
Warehouse comes from the operational environment and external sources. Data
Warehouses are physically separated from operational systems, even though the
operational systems feed the Warehouse with source data.
Subject Orientation
Data Warehouses are designed around the major subject areas of the enterprise; the
operational environment is designed around applications and functions. This difference in
orientation (data vs. process) is evident in the content of the database. Data Warehouses
do not contain information that will not be used for informational or analytical
processing; operational databases contain detailed data that is needed to satisfy
processing requirements but which has no relevance to management or analysis.
Integration and Transformation
The data within the Data Warehouse is integrated. This means that there is consistency
among naming conventions, measurements of variables, encoding structures, physical
attributes, and other salient data characteristics. An example of this integration is the
treatment of codes such as gender codes. Within a single corporation, various
applications may represent gender codes in different ways: male vs. female, m vs. f, and
1 vs. 0, etc. In the Data Warehouse, gender is always represented in a consistent way,
regardless of the many ways by which it may be encoded and stored in the source data.
As the data is moved to the Warehouse, it is transformed into a consistent representation
as required.
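The gender-code transformation described above can be sketched as a simple mapping applied during the Warehouse load (the exact source codes and target representation are illustrative):

```python
# Map every source-system variant of the gender code to one
# consistent warehouse representation.
GENDER_MAP = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}

def integrate_gender(raw):
    """Normalize a raw gender code from any source application."""
    return GENDER_MAP[str(raw).strip().lower()]

codes = [integrate_gender(v) for v in ("male", "f", 1, "0")]
print(codes)  # ['M', 'F', 'M', 'F']
```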
Time Variance
All data in the Data Warehouse is accurate as of some moment in time, providing an
historical perspective. This differs from the operational environment in which data is
intended to be accurate as of the moment of access. The data in the Data Warehouse is, in
effect, a series of snapshots. Once the data is loaded into the enterprise data store and data
marts, it cannot be updated. It is refreshed on a periodic basis, as determined by the
business need. The operational data store, if included in the Warehouse architecture, may
be updated.
Non-Volatility
Data in the Warehouse is static, not dynamic. The only operations that occur in Data
Warehouse applications are the initial loading of data, access of data, and refresh of data.
For these reasons, the physical design of a Data Warehouse optimizes the access of data,
rather than focusing on the requirements of data update and delete processing.
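The time-variant, non-volatile behaviour just described can be sketched as an append-only snapshot store, where the only operations are load, access, and refresh. The class and sample data are assumptions for illustration:

```python
# Sketch of a non-volatile, time-variant store: the only operations are
# load (initial or refresh) and read-only access -- no in-place updates
# or deletes. All names and sample figures are illustrative.
from datetime import date

class SnapshotStore:
    def __init__(self):
        self._snapshots = {}  # snapshot date -> rows, frozen once loaded

    def load(self, snapshot_date: date, rows: list) -> None:
        """Initial load or periodic refresh: append a new snapshot."""
        if snapshot_date in self._snapshots:
            raise ValueError("snapshot already loaded; data is not updated in place")
        self._snapshots[snapshot_date] = tuple(rows)  # freeze the rows

    def access(self, snapshot_date: date):
        """Read-only access to the data as of a moment in time."""
        return self._snapshots[snapshot_date]

store = SnapshotStore()
store.load(date(2011, 11, 1), [{"region": "East", "sales": 120}])
store.load(date(2011, 12, 1), [{"region": "East", "sales": 150}])
# History is preserved: both snapshots remain available for trend analysis.
print(store.access(date(2011, 11, 1))[0]["sales"])  # 120
```

Because old snapshots are never overwritten, the history of a data element remains available for analysis over time.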
Data Warehouse Configurations
A Data Warehouse configuration, also known as the logical architecture, includes the
following components:
One Enterprise Data Store (EDS) - a central repository which supplies atomic
(detail level) integrated information to the whole organization.
(optional) One Operational Data Store - a snapshot of the enterprise-wide data as of
a moment in time
(optional) one or more individual Data Mart(s) - summarized subset of the
enterprise's data specific to a functional area or department, geographical region,
or time period
One or more Metadata Store(s) or Repository(ies) - catalog(s) of reference
information about the primary data. Metadata is divided into two categories:
information for technical use, and information for business end-users.
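The four components of the logical architecture listed above can be represented as a small configuration sketch; the class and store names are assumptions, not part of any standard:

```python
# A minimal, assumed representation of the logical architecture:
# one required EDS, an optional ODS, optional Data Marts, and
# metadata stores. Names are illustrative.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class WarehouseConfiguration:
    enterprise_data_store: str                      # required central repository
    operational_data_store: Optional[str] = None    # optional point-in-time snapshot
    data_marts: List[str] = field(default_factory=list)       # optional functional subsets
    metadata_stores: List[str] = field(default_factory=list)  # technical + business catalogs

config = WarehouseConfiguration(
    enterprise_data_store="EDS",
    data_marts=["Sales Mart", "Finance Mart"],
    metadata_stores=["Technical Metadata", "Business Metadata"],
)
print(len(config.data_marts))  # 2
```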
The EDS is the cornerstone of the Data Warehouse. It can be accessed for both
immediate informational needs and for analytical processing in support of strategic
decision making, and can be used for drill-down support for the Data Marts which
contain only summarized data. It is fed by the existing subject area operational systems
and may also contain data from external sources. The EDS in turn feeds individual Data
Marts that are accessed by end-user query tools at the user's desktop. It is used to
consolidate related data from multiple sources into a single source, while the Data Marts
are used to physically distribute the consolidated data into logical categories of data, such
as business functional departments or geographical regions. The EDS is a collection of
daily "snapshots" of enterprise-wide data taken over an extended time period, and thus
retains and makes available for tracking purposes the history of changes to a given data
element over time. This creates an optimum environment for strategic analysis. However,
access to the EDS can be slow, due to the volume of data it contains, which is a good
reason for using Data Marts to filter, condense and summarize information for specific
business areas. In the absence of the Data Mart layer, users can access the EDS directly.
Metadata is "data about data," a catalog of information about the primary data that
defines access to the Warehouse. It is the key to providing users and developers with a
road map to the information in the Warehouse. Metadata comes in two different forms:
end-user and transformational. End-user metadata serves a business purpose; it translates
a cryptic name code that represents a data element into a meaningful description of the
data element so that end-users can recognize and use the data. For example, metadata
would clarify that the data element "ACCT_CD" represents "Account Code for Small
Business." Transformational metadata serves a technical purpose for development and
maintenance of the Warehouse. It maps the data element from its source system to the
Data Warehouse, identifying it by source field name, destination field code,
transformation routine, business rules for usage and derivation, format, key, size, index
and other relevant transformational and structural information. Each type of metadata is
kept in one or more repositories that service the Enterprise Data Store.
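The two kinds of metadata can be sketched as simple catalogs. The ACCT_CD entry follows the example in the text; the source field names, transformation rules, and formats are invented for illustration:

```python
# Illustrative sketch of the two categories of metadata described above.
# Only the ACCT_CD description comes from the text; everything else is
# an assumed example.

# End-user metadata: translates cryptic codes into business descriptions.
end_user_metadata = {
    "ACCT_CD": "Account Code for Small Business",
}

# Transformational metadata: maps each element from source to Warehouse.
transformational_metadata = {
    "ACCT_CD": {
        "source_field": "LEGACY.ACCT",          # assumed source name
        "destination_field": "DW.ACCOUNT_CODE", # assumed destination name
        "transformation": "uppercase; strip leading zeros",
        "format": "CHAR(8)",
        "is_key": True,
    },
}

def describe(element: str) -> str:
    """Give an end-user a readable description of a data element."""
    return end_user_metadata.get(element, f"{element}: no description on file")

print(describe("ACCT_CD"))  # Account Code for Small Business
```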
While an Enterprise Data Store and Metadata Store(s) are always included in a sound
Data Warehouse design, the specific number of Data Marts (if any) and the need for an
Operational Data Store are judgment calls. Potential Data Warehouse configurations
should be evaluated and a logical architecture determined according to business
requirements.
The Data Warehouse Process
The James Martin + Co Data Warehouse Process does not encompass the analysis and
identification of organizational value streams, strategic initiatives, and related business
goals, but it is a prescription for achieving such goals through a specific architecture. The
Process is conducted in an iterative fashion after the initial business requirements and
architectural foundations have been developed with the emphasis on populating the Data
Warehouse with "chunks" of functional subject-area information each iteration. The
Process guides the development team through identifying the business requirements,
developing the business plan and Warehouse solution to business requirements, and
implementing the configuration, technical, and application architecture for the overall
Data Warehouse. It then specifies the iterative activities for the cyclical planning, design,
construction, and deployment of each population project. The following is a description
of each stage in the Data Warehouse Process. (Note: The Data Warehouse Process also
includes conventional project management, startup, and wrap-up activities which are
detailed in the Plan, Activate, Control and End stages, not described here.)
Business Case Development
A variety of kinds of strategic analysis, including Value Stream Assessment, have likely
already been done by the customer organization at the point when it is necessary to
develop a Business Case. The Business Case Development stage launches the Data
Warehouse development in response to previously identified strategic business initiatives
and "predator" (key) value streams of the organization. The organization will likely have
identified more than one important value stream. In the long term it is possible to
implement Data Warehouse solutions that address multiple value streams, but it is the
predator value stream or highest priority strategic initiative that usually becomes the
focus of the short-term strategy and first run population projects resulting in a Data
Warehouse.
At the conclusion of the relevant business reengineering, strategic visioning, and/or value
stream assessment activities conducted by the organization, a Business Case can be built
to justify the use of the Data Warehouse architecture and implementation approach to
solve key business issues directed at the most important goals. The Business Case
outlines the activities, costs, benefits, and critical success factors for a multi-generation
implementation plan that results in a Data Warehouse framework for an information
storage/access system. The Warehouse is an iteratively designed, developed, and refined
solution to the tactical and strategic business requirements. The Business Case addresses both the
short-term and long-term Warehouse strategies (how multiple data stores will work
together to fulfill primary and secondary business goals) and identifies both immediate
and extended costs so that the organization is better able to plan its short and long-term
budget appropriation.
Business Question Assessment
Once a Business Case has been developed, the short-term strategy for implementing the
Data Warehouse is mapped out by means of the Business Question Assessment (BQA)
stage. The purpose of BQA is to:
Establish the scope of the Warehouse and its intended use
Define and prioritize the business requirements and the subsequent information
(data) needs the Warehouse will address
Identify the business directions and objectives that may influence the required
data and application architectures
Determine which business subject areas provide the most needed information;
prioritize and sequence implementation projects accordingly
Derive the logical data model that will drive the physical implementation
model
Measure the quality, availability, and related costs of needed source data at a high
level
Define the iterative population projects based on business needs and data
validation
The prioritized predator value stream or most important strategic initiative is analyzed to
determine the specific business questions that need to be answered through a Warehouse
implementation. Each business question is assessed to determine its overall importance to
the organization, and a high-level analysis of the data needed to provide the answers is
undertaken. The data is assessed for quality, availability, and cost associated with
bringing it into the Data Warehouse. The business questions are then revisited and
prioritized based upon their relative importance and the cost and feasibility of acquiring
the associated data. The prioritized list of business questions is used to determine the
scope of the first and subsequent iterations of the Data Warehouse, in the form of
population projects. Iteration scoping is dependent on source data acquisition issues and
is guided by determining how many business questions can be answered in a three to six
month implementation time frame. A "business question" is a question deemed by the
business to provide useful information in determining strategic direction. A business
question can be answered through objective analysis of the data that is available.
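The prioritization step described above can be sketched as a simple scoring exercise. The questions, scores, and weights below are invented examples; a real BQA would weigh data quality, availability, and cost separately:

```python
# Hypothetical sketch of BQA prioritization: each business question is
# scored on importance and on the feasibility (quality/availability/cost)
# of acquiring its source data. All questions and weights are invented.

questions = [
    # (question, importance 1-10, data feasibility 1-10: higher = easier)
    ("Which regions drive margin decline?", 9, 4),
    ("What is weekly sales volume by product?", 6, 9),
    ("Which customers are at risk of churn?", 8, 3),
]

def priority(importance: int, feasibility: int) -> float:
    # Simple weighted score; the 60/40 split is an assumption.
    return 0.6 * importance + 0.4 * feasibility

ranked = sorted(questions, key=lambda q: priority(q[1], q[2]), reverse=True)
for q, imp, feas in ranked:
    print(f"{priority(imp, feas):.1f}  {q}")
```

The ranked list then determines the scope of the first and subsequent population projects.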
Architecture Review and Design
The Architecture is the logical and physical foundation on which the Data Warehouse
will be built. The Architecture Review and Design stage, as the name implies, is both a
requirements analysis and a gap analysis activity. It is important to assess what pieces of
the architecture already exist in the organization (and in what form) and to assess what
pieces are missing which are needed to build the complete Data Warehouse architecture.
During the Architecture Review and Design stage, the logical Data Warehouse
architecture is developed. The logical architecture is a configuration map of the necessary
data stores that make up the Warehouse; it includes a central Enterprise Data Store, an
optional Operational Data Store, one or more (optional) individual business area Data
Marts, and one or more Metadata stores. In the metadata store(s) are two different kinds
of metadata that catalog reference information about the primary data.
Once the logical configuration is defined, the Data, Application, Technical and Support
Architectures are designed to physically implement it. Requirements of these four
architectures are carefully analyzed so that the Data Warehouse can be optimized to serve
the users. Gap analysis is conducted to determine which components of each architecture
already exist in the organization and can be reused, and which components must be
developed (or purchased) and configured for the Data Warehouse.
The Data Architecture organizes the sources and stores of business information and
defines the quality and management standards for data and metadata.
The Application Architecture is the software framework that guides the overall
implementation of business functionality within the Warehouse environment; it controls
the movement of data from source to user, including the functions of data extraction, data
cleansing, data transformation, data loading, data refresh, and data access (reporting,
querying).
The Technical Architecture provides the underlying computing infrastructure that enables
the data and application architectures. It includes platform/server, network,
communications and connectivity hardware/software/middleware, DBMS, client/server
2-tier vs. 3-tier approach, and end-user workstation hardware/software. Technical
architecture design must address the requirements of scalability, capacity and volume
handling (including sizing and partitioning of tables), performance, availability, stability,
chargeback, and security.
The Support Architecture includes the software components (e.g., tools and structures for
backup/recovery, disaster recovery, performance monitoring, reliability/stability
compliance reporting, data archiving, and version control/configuration management) and
organizational functions necessary to effectively manage the technology investment.
Architecture Review and Design applies to the long-term strategy for development and
refinement of the overall Data Warehouse, and is not conducted merely for a single
iteration. This stage develops the blueprint of an encompassing data and technical
structure, software application configuration, and organizational support structure for the
Warehouse. It forms a foundation that drives the iterative Detail Design activities. Where
Detail Design tells you what to do, Architecture Review and Design tells you what pieces
you need in order to do it.
The Architecture Review and Design stage can be conducted as a separate project that
runs mostly in parallel with the Business Question Assessment stage, because the technical,
data, application, and support infrastructure that enables and supports the storage and
access of information is generally independent of the business requirements that determine
which data is needed to drive the Warehouse. However, the data architecture depends on
input from certain BQA activities (data source system identification and data
modeling), so the BQA stage must conclude before the Architecture stage can conclude.
The Architecture will be developed based on the organization's long-term Data
Warehouse strategy, so that future iterations of the Warehouse will have been provided
for and will fit within the overall architecture.
Tool Selection
The purpose of this stage is to identify the candidate tools for developing and
implementing the Data Warehouse data and application architectures, and for performing
technical and support architecture functions where appropriate. Select the candidate tools
that best meet the business and technical requirements as defined by the Data Warehouse
architecture, and recommend the selections to the customer organization. Procure the
tools upon approval from the organization.
It is important to note that the process of selecting tools often depends on the existing
technical infrastructure of the organization. Many organizations feel strongly, for various
reasons, about using tools they already have in their "arsenal" for Data Warehouse
applications, and are reluctant to purchase new application packages. It is recommended
that a thorough evaluation of existing tools and the feasibility of their reuse be done in the
context of all tool evaluation activities. In some cases, existing tools can be form-fitted to
the Data Warehouse; in other cases, the customer organization may need to be convinced
that new tools would better serve their needs.
It may even be appropriate to skip this series of activities altogether, if the
organization insists that particular tools be used (leaving no room for negotiation), or if tools
have already been assessed and selected in anticipation of the Data Warehouse project.
Tools may be categorized according to the following data, technical, application, or
support functions:
Source Data Extraction and Transformation
Data Cleansing
Data Load
Data Refresh
Data Access
Security Enforcement
Version Control/Configuration Management
Backup and Recovery
Disaster Recovery
Performance Monitoring
Database Management
Platform
Data Modeling
Metadata Management
Iteration Project Planning
The Data Warehouse is implemented (populated) one subject area at a time, driven by
specific business questions to be answered by each implementation cycle. The first and
subsequent implementation cycles of the Data Warehouse are determined during the
BQA stage. At this point in the Process, the first (or next) subject area
implementation project is planned. The business requirements discovered in BQA and, to
a lesser extent, the technical requirements of the Architecture Design stage are now
refined through user interviews and focus sessions to the subject area level. The results
are further analyzed to yield the detail needed to design and implement a single
population project, whether initial or follow-on. The Data Warehouse project team is
expanded to include the members needed to construct and deploy the Warehouse, and a
detailed work plan for the design and implementation of the iteration project is developed
and presented to the customer organization for approval.
Detail Design
In the Detail Design stage, the physical Data Warehouse model (database schema) is
developed, the metadata is defined, and the source data inventory is updated and
expanded to include all of the necessary information needed for the subject area
implementation project, and is validated with users. Finally, the detailed design of all
procedures for the implementation project is completed and documented. Procedures to
achieve the following activities are designed:
Warehouse Capacity Growth
Data Extraction/Transformation/Cleansing
Data Load
Security
Data Refresh
Data Access
Backup and Recovery
Disaster Recovery
Data Archiving
Configuration Management
Testing
Transition to Production
User Training
Help Desk
Change Management
Implementation
Once the Planning and Design stages are complete, the project to implement the current
Data Warehouse iteration can proceed quickly. Necessary hardware, software and
middleware components are purchased and installed, the development and test
environment is established, and the configuration management processes are
implemented. Programs are developed to extract, cleanse, transform and load the source
data and to periodically refresh the existing data in the Warehouse, and the programs are
individually unit tested against a test database with sample source data. Metrics are
captured for the load process. The metadata repository is loaded with transformational
and business user metadata. Canned production reports are developed and sample ad-hoc
queries are run against the test database, and the validity of the output is measured. User
access to the data in the Warehouse is established. Once the programs have been
developed and unit tested and the components are in place, system functionality and user
acceptance testing is conducted for the complete integrated Data Warehouse system.
System support processes of database security, system backup and recovery, system
disaster recovery, and data archiving are implemented and tested as the system is
prepared for deployment. The final step is to conduct the Production Readiness Review
prior to transitioning the Data Warehouse system into production. During this review, the
system is evaluated for acceptance by the customer organization.
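The extract/cleanse/transform/load programs described above can be sketched as a minimal pipeline that is unit-testable against a test database with sample source data. The source format, cleansing rules, and naming conventions are all assumptions for illustration:

```python
# Minimal sketch of an extract/cleanse/transform/load pipeline of the
# kind the Implementation stage describes. Field names and rules are
# invented examples.

def extract(source_rows):
    """Pull raw records from the (simulated) operational source."""
    return list(source_rows)

def cleanse(rows):
    """Drop records missing required fields."""
    return [r for r in rows if r.get("customer_id") and r.get("amount") is not None]

def transform(rows):
    """Apply Warehouse conventions: consistent codes and derived fields."""
    return [
        {"customer_id": r["customer_id"].strip().upper(),
         "amount_cents": int(round(float(r["amount"]) * 100))}
        for r in rows
    ]

def load(rows, warehouse):
    """Append-only load into the target table; return a load metric."""
    warehouse.extend(rows)
    return len(rows)  # metric captured for the load process

source = [
    {"customer_id": " c001 ", "amount": "12.50"},
    {"customer_id": None, "amount": "3.00"},   # rejected by cleansing
]
warehouse = []
loaded = load(transform(cleanse(extract(source))), warehouse)
print(loaded, warehouse[0]["amount_cents"])  # 1 1250
```

Unit testing each stage against sample data in this way, and capturing metrics at load time, mirrors the testing sequence the stage prescribes.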
Transition to Production
The Transition to Production stage moves the Data Warehouse development project into
the production environment. The production database is created, and the
extraction/cleanse/transformation routines are run on the operational system source data.
The development team works with the Operations staff to perform the initial load of this
data to the Warehouse and execute the first refresh cycle. The Operations staff is trained,
and the Data Warehouse programs and processes are moved into the production libraries
and catalogs. Rollout presentations and tool demonstrations are given to the entire
customer community, and end-user training is scheduled and conducted. The Help Desk
is established and put into operation. A Service Level Agreement is developed and
approved by the customer organization. Finally, the new system is positioned for ongoing
maintenance through the establishment of a Change Management Board and the
implementation of change control procedures for future development cycles.
Q.5 Discuss the purpose of executive information system in an organization?
Implementing an Executive Information System (EIS)
An EIS is a tool that provides direct on-line access to relevant information about aspects
of a business that are of particular interest to the senior manager.
Introduction
Many senior managers find that direct on-line access to organizational data is
helpful. For example, Paul Frech, president of Lockheed-Georgia, monitored employee
contributions to company-sponsored programs (United Way, blood drives) as a surrogate
measure of employee morale (Houdeshel and Watson, 1987). C. Robert Kidder, CEO of
Duracell, found that productivity problems were due to salespeople in Germany wasting
time calling on small stores and took corrective action (Main, 1989).
Information systems have long been used to gather and store information, to
produce specific reports for workers, and to produce aggregate reports for managers.
However, senior managers rarely use these systems directly, and often find the aggregate
information to be of little use without the ability to explore underlying details (Watson &
Rainer, 1991, Crockett, 1992).
An Executive Information System (EIS) is a tool that provides direct on-line
access to relevant information in a useful and navigable format. Relevant information is
timely, accurate, and actionable information about aspects of a business that are of
particular interest to the senior manager. The useful and navigable format of the system
means that it is specifically designed to be used by individuals with limited time, limited
keyboarding skills, and little direct experience with computers. An EIS is easy to
navigate so that managers can identify broad strategic issues, and then explore the
information to find the root causes of those issues.
Executive Information Systems differ from traditional information systems in the
following ways:
They are specifically tailored to executives' information needs.
They are able to access data about specific issues and problems as well as
aggregate reports.
They provide extensive on-line analysis tools, including trend analysis, exception
reporting and "drill-down" capability.
They access a broad range of internal and external data.
They are particularly easy to use (typically mouse or touchscreen driven).
They are used directly by executives without assistance.
They present information in graphical form.
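The "drill-down" capability in the list above can be sketched as navigating from an aggregate figure to its underlying detail. The data and the region-to-branch hierarchy are invented for illustration:

```python
# Sketch of EIS drill-down: start from an aggregate view, then explore
# the detail behind a broad figure. Data and hierarchy are invented.

sales = [
    {"region": "West", "branch": "Seattle",  "revenue": 400},
    {"region": "West", "branch": "Portland", "revenue": 250},
    {"region": "East", "branch": "Boston",   "revenue": 300},
]

def rollup(rows, level):
    """Aggregate revenue at the requested level of the hierarchy."""
    totals = {}
    for r in rows:
        totals[r[level]] = totals.get(r[level], 0) + r["revenue"]
    return totals

# Executive view: regional totals first...
print(rollup(sales, "region"))  # {'West': 650, 'East': 300}
# ...then drill down into the West region for root-cause detail.
west = [r for r in sales if r["region"] == "West"]
print(rollup(west, "branch"))   # {'Seattle': 400, 'Portland': 250}
```

This is the navigation pattern that lets a manager move from a broad strategic issue to its root causes.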
Purpose of EIS
The primary purpose of an Executive Information System is to support
managerial learning about an organization, its work processes, and its interaction with the
external environment. Informed managers can ask better questions and make better
decisions. Vandenbosch and Huff (1992) from the University of Western Ontario found
that Canadian firms using an EIS achieved better business results if their EIS promoted
managerial learning. Firms with an EIS designed to maintain managers' "mental models"
were less effective than firms with an EIS designed to build or enhance managers'
knowledge.
This distinction is supported by Peter Senge in The Fifth Discipline. He
illustrates the benefits of learning about the behaviour of systems versus simply learning
more about their states. Learning more about the state of a system leads to reactive
management fixes. Typically these reactions feed into the underlying system behaviour
and contribute to a downward spiral. Learning more about system behaviour and how
various system inputs and actions interrelate will allow managers to make more proactive
changes to create long-term improvement.
A secondary purpose for an EIS is to allow timely access to information. All of
the information contained in an EIS can typically be obtained by a manager through
traditional methods. However, the resources and time required to manually compile
information in a wide variety of formats, in response to ever-changing and ever more
specific questions, usually inhibit managers from obtaining this information. Often, by the
time a useful report can be compiled, the strategic issues facing the manager have
changed, and the report is never fully utilized.
Timely access also influences learning. When a manager obtains the answer to a
question, that answer typically sparks other related questions in the manager's mind. If
those questions can be posed immediately, and the next answer retrieved, the learning
cycle continues unbroken. Using traditional methods, by the time the answer is produced,
the context of the question may be lost, and the learning cycle will not continue. An
executive in Rockart & Treacy's 1982 study noted that:
Your staff really can't help you think. The problem with giving a question to the
staff is that they provide you with the answer. You learn the nature of the real question
you should have asked when you muck around in the data (p. 9).
A third purpose of an EIS is commonly misperceived. An EIS has a powerful
ability to direct management attention to specific areas of the organization or specific
business problems. Some managers see this as an opportunity to discipline subordinates.
Some subordinates fear the directive nature of the system and spend a great deal of time
trying to outwit or discredit it. Neither of these behaviours is appropriate or productive.
Rather, managers and subordinates can work together to determine the root causes of
issues highlighted by the EIS.
The powerful focus of an EIS is due to the maxim "what gets measured gets
done." Managers are particularly attentive to concrete information about their
performance when it is available to their superiors. This focus is very valuable to an
organization if the information reported is actually important and represents a balanced
view of the organization's objectives.
Misaligned reporting systems can result in inordinate management attention to
things that are not important or to things which are important but to the exclusion of other
equally important things. For example, a production reporting system might lead
managers to emphasize volume of work done rather than quality of work. Worse yet,
productivity might have little to do with the organization's overriding customer service
objectives.
Contents of EIS
A general answer to the question of what data is appropriate for inclusion in an
Executive Information System is "whatever is interesting to executives." While this
advice is rather simplistic, it does reflect the variety of systems currently in use.
Executive Information Systems in government have been constructed to track data about
Ministerial correspondence, case management, worker productivity, finances, and human
resources to name only a few. Other sectors use EIS implementations to monitor
information about competitors in the news media and databases of public information in
addition to the traditional revenue, cost, volume, sales, market share and quality
applications.
Frequently, EIS implementations begin with just a few measures that are clearly
of interest to senior managers, and then expand in response to questions asked by those
managers as they use the system. Over time, the presentation of this information becomes
stale, and the information diverges from what is strategically important for the
organization. A "Critical Success Factors" approach is recommended by many
management theorists (Daniel, 1961, Crockett, 1992, Watson and Frolick, 1992).
Practitioners such as Vandenbosch (1993) found that:
While our efforts usually met with initial success, we often found that after six
months to a year, executives were almost as bored with the new information as they had
been with the old. A strategy we developed to rectify this problem required organizations
to create a report of the month. That is, in addition to the regular information provided for
management committee meetings, the CEO was charged with selecting a different
indicator to focus on each month (Vandenbosch, 1993, pp. 8-9).
While the above indicates that selection of data for inclusion in an EIS is difficult,
there are several guidelines that help to make that assessment. A practical set of
principles to guide the design of measures and indicators to be included in an EIS is
presented below (Kelly, 1992b). For a more detailed discussion of methods for selecting
measures that reflect organizational objectives, see the section "EIS and Organizational
Objectives."
EIS measures must be easy to understand and collect. Wherever possible, data
should be collected naturally as part of the process of work. An EIS should not add
substantially to the workload of managers or staff.
EIS measures must be based on a balanced view of the organization's objectives.
Data in the system should reflect the objectives of the organization in the areas of
productivity, resource management, quality and customer service.
Performance indicators in an EIS must reflect everyone's contribution in a fair and
consistent manner. Indicators should be as independent as possible from variables outside
the control of managers.
EIS measures must encourage management and staff to share ownership of the
organization's objectives. Performance indicators must promote both teamwork and
friendly competition. Measures must be meaningful for all staff; people must feel that
they, as individuals, can contribute to improving the performance of the organization.
EIS information must be available to everyone in the organization. The objective
is to provide everyone with useful information about the organization's performance.
Information that must remain confidential should not be part of the EIS or the
management system of the organization.
EIS measures must evolve to meet the changing needs of the organization.
Barriers to Effectiveness
There are many ways in which an EIS can fail. Dozens of high profile, high cost
EIS projects have been cancelled, implemented and rarely used, or implemented and used
with negative results. An EIS is a high risk project precisely because it is intended for use
by the most powerful people in an organization. Senior managers can easily misuse the
information in the system with strongly detrimental effects on the organization. Senior
managers can refuse to use a system if it does not respond to their immediate personal
needs or is too difficult to learn and use.
Unproductive Organizational Behaviour Norms
Issues of organizational behaviour and culture are perhaps the most deadly
barriers to effective Executive Information Systems. Because an EIS is typically
positioned at the top of an organization, it can create powerful learning experiences and
lead to drastic changes in organizational direction. However, there is also great potential
for misuse of the information. Grant, Higgins and Irving (1988) found that performance
monitoring can promote bureaucratic and unproductive behaviour, can unduly focus
organizational attention to the point where other important aspects are ignored, and can
have a strongly negative impact on morale.
The key barrier to EIS effectiveness, therefore, is the way in which the
organization uses the information in the system. Managers must be aware of the dangers
of statistical data, and be skilled at interpreting and using data in an effective way. Even
more important is the manager's ability to communicate with others about statistical data
in a non-defensive, trustworthy, and constructive manner. Argyris (1991) suggests a
universal human tendency towards strategies that avoid embarrassment or threat, and
towards feelings of vulnerability or incompetence. These strategies include:
Stating criticism of others in a way that you feel is valid, but that prevents
others from deciding for themselves
Failing to include any data that others could use to objectively evaluate your
criticism
Stating your conclusions in ways that disguise their logical implications and
denying those implications if they are suggested
To make effective use of an EIS, managers must have the self-confidence to accept
negative results and focus on the resolution of problems rather than on denial and blame.
Since organizations with limited exposure to planning and targeting, data-based decision-
making, statistical process control, and team-based work models may not have dealt with
these behavioural issues in the past, they are more likely to react defensively and reject an
EIS.
Technical Excellence
An interesting result from the Vandenbosch & Huff (1988) study was that the
technical excellence of an EIS has an inverse relationship with effectiveness. Systems
that are technical masterpieces tend to be inflexible, and thus discourage innovation,
experimentation and mental model development.
Flexibility is important because an EIS has such a powerful ability to direct
attention to specific issues in an organization. A technical masterpiece may accurately
direct management attention when it is first implemented, but on its first anniversary it
may still be directing attention to issues that were important a year earlier. There is
substantial danger that the exploration of issues necessary for managerial learning will be
limited to those subjects that were important when the EIS was first developed. Managers
must understand that as the organization and its work changes, an EIS must continually
be updated to address the strategic issues of the day.
A number of explanations as to why technical masterpieces tend to be less
flexible are possible. Developers who create a masterpiece EIS may become attached to
the system and consciously or unconsciously dissuade managers from asking for changes.
Managers who are uncertain that the benefits outweigh the initial cost of a masterpiece
EIS may not want to spend more on system maintenance and improvements. The time
required to create a masterpiece EIS may mean that it is outdated before it is
implemented.
While usability and response time are important factors in determining whether
executives will use a system, cost and flexibility are paramount. A senior manager will be
more accepting of an inexpensive system that provides 20% of the needed information
within a month or two than of an expensive system that provides 80% of the needed
information after a year of development. The manager may also find that the inexpensive
system is easier to change and adapt to the evolving needs of the business. Changing a
large system would involve throwing away parts of a substantial investment. Changing
the inexpensive system means losing a few weeks of work. As a result, fast, cheap,
incremental approaches to developing an EIS increase the chance of success.
Technical Problems
Paradoxically, technical problems are also frequently reported as a significant
barrier to EIS success. The most difficult technical problem -- that of integrating data
from a wide range of data sources both inside and outside the organization -- is also one
of the most critical issues for EIS users. A marketing vice-president, who had spent
several hundred thousand dollars on an EIS, attended a final briefing on the system. The
technical experts demonstrated the many graphs and charts of sales results, market share
and profitability. However, when the vice-president asked for a graph of market share
and advertising expense over the past ten years, the system was unable to access
historical data. The project was cancelled in that meeting.
The ability to integrate data from many different systems is important because it
allows managerial learning that is unavailable in other ways. The president of a
manufacturing company can easily get information about sales and manufacturing from
the relevant VPs. Unfortunately, the information the president receives will likely be
incompatible, and learning about the ways in which sales and manufacturing processes
influence each other will not be easy. An EIS will be particularly effective if it can
overcome this challenge, allowing executives to learn about business processes that cross
organizational boundaries and to compare business results in disparate functions.
Another technical problem that can kill EIS projects is usability. Senior managers
can simply stop using a system if they find it too difficult to learn or use.
They have very little time to invest in learning the system, a low tolerance for errors, and
initially may have very little incentive to use it. Even if the information in the system is
useful, a difficult interface will quickly result in the manager assigning an analyst to
manipulate the system and print out the required reports. This is counter-productive
because managerial learning is enhanced by the immediacy of the question - answer
learning cycle provided by an EIS. If an analyst is interacting with the system, the analyst
will acquire more learning than the manager, but will not be in a position to put that
learning to its most effective use.
Usability of Executive Information Systems can be enhanced through the use of
prototyping and usability evaluation methods. These methods ensure that clear
communication occurs between the developers of the system and its users. Managers
have an opportunity to interact with systems that closely resemble the functionality of the
final system and thus can offer more constructive criticism than they might be able to
after reading an abstract specification document. Systems developers also are in a
position to listen more openly to criticisms of a system since a prototype is expected to be
disposable. Several evaluation protocols are available including observation and
monitoring, software logging, experiments and benchmarking, etc. (Preece et al, 1994).
The most appropriate methods for EIS design are those with an ethnographic flavour
because the experience base of system developers is typically so different from that of
their user population (senior executives).
Misalignment Between Objectives & EIS
A final barrier to EIS effectiveness was mentioned earlier in the section on
purpose. As noted there, the powerful ability of an EIS to direct organizational attention
can be destructive if the system directs attention to the wrong variables. There are many
examples of this sort of destructive reporting. Grant, Higgins and Irving (1988) report the
account of an employee working under a misaligned reporting system.
"I like the challenge of solving customer problems, but they get in the way of
hitting my quota. I'd like to get rid of the telephone work. If (the company) thought
dealing with customers was important, I'd keep it; but if it's just going to be production
that matters, I'd gladly give all the calls to somebody else."
Traditional cost accounting systems are also often misaligned with organizational
objectives, and placing these measures in an EIS will continue to draw attention to the
wrong things. Cost accounting allocates overhead costs to direct labour hours. In some
cases the overhead burden on each direct labour hour is as much as 1000%. A manager
operating under this system might decide to sub-contract 100 hours of direct labor at $20
per hour. On the books, this $2,000 saving is accompanied by $20,000 of savings in
overhead. If the sub-contractor charges $5,000 for the work, the book savings are $2,000
+ $20,000 - $5,000 = $17,000. In reality, however, the overhead costs for an idle machine
in a factory do not go down much at all. The sub-contract actually ends up costing $5,000
- $2,000 = $3,000. (Peters, 1987)
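The arithmetic in the Peters (1987) example above can be checked in a few lines. This is only a sketch of the book-versus-reality comparison, using the figures given in the text: 100 direct labour hours at $20 per hour, a 1000% overhead allocation rate, and a $5,000 sub-contractor charge.

```python
DIRECT_HOURS = 100
RATE = 20             # dollars per direct labour hour
OVERHEAD_RATE = 10.0  # 1000% overhead burden allocated to direct labour
SUBCONTRACT_FEE = 5_000

labour_saving = DIRECT_HOURS * RATE                    # $2,000 of direct labour
book_overhead_saving = labour_saving * OVERHEAD_RATE   # $20,000, but only on paper

# On the books, overhead "disappears" along with the labour hours.
book_saving = labour_saving + book_overhead_saving - SUBCONTRACT_FEE

# In reality, fixed overhead barely moves when work is sub-contracted.
actual_result = labour_saving - SUBCONTRACT_FEE        # negative means a real loss

print(book_saving)    # 17000
print(actual_result)  # -3000
```

The $17,000 paper saving and the $3,000 real loss reproduce the figures in the text, illustrating how an EIS built on such measures would reward the wrong decision.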
Characteristics of Successful EIS Implementations
Find an Appropriate Executive Champion
EIS projects that succeed do so because at least one member of the senior
management team agrees to champion the project. The executive champion need not fully
understand the technical issues, but must be a person who works closely with all of the
senior management team and understands their needs, work styles and their current
methods of obtaining organizational information. The champion's commitment must
include a willingness to set aside time for reviewing prototypes and implementation
plans, influencing and coaching other members of the senior management team, and
suggesting modifications and enhancements to the system.
Deliver a Simple Prototype Quickly
Executives judge a new EIS on the basis of how easy it is to use and how relevant
the information in the system is to the current strategic issues in the organization. As a
result, the best EIS projects begin as a simple prototype, delivered quickly, that provides
data about at least one critical issue. If the information delivered is worth the hassle of
learning the system, a flurry of requirements will shortly be generated by executives who
like what they see, but want more. These requests are the best way to plan an EIS that
truly supports the organization, and are more valuable than months of planning by a
consultant or analyst.
One caveat concerning the simple prototype approach is that executive requests
will quickly scatter to questions of curiosity rather than strategy in an organization where
strategic direction and objectives are not clearly defined. A number of methods are
available to support executives in defining business objectives and linking them to
performance monitors in an EIS. These are discussed further in the section on EIS and
Organizational Objectives below.
Involve Your Information Systems Department
In some organizations, the motivation for an EIS project arises in the business
units quite apart from the traditional information systems (IS) organization. Consultants
may be called in, or managers and analysts in the business units may take the project on
without consulting or involving IS. This is a serious mistake. Executive Information
Systems rely entirely on the information contained in the systems created and maintained
by this department. IS professionals know best what information is available in an
organization's systems and how to get it. They must be involved in the team. Involvement
in such a project can also be beneficial to IS by giving them a more strategic perspective
on how their work influences the organization.
Communicate & Train to Overcome Resistance
A final characteristic of successful EIS implementations is that of communication.
Executive Information Systems have the potential to drastically alter the prevailing
patterns of organizational communication and thus will typically be met with resistance.
Some of this resistance is simply a matter of a lack of knowledge. Training on how to use
statistics and performance measures can help. However, resistance can also be rooted in
the feelings of fear, insecurity and cynicism experienced by individuals throughout the
organization. These attitudes can only be influenced by a strong and vocal executive
champion who consistently reinforces the purpose of the system and directs the attention
of the executive group away from unproductive and punitive behaviours.
EIS and Organizational Culture
Henry Mintzberg (1972) has argued that impersonal statistical data is irrelevant to
managers. John Dearden (1966) argued that the promise of real-time management
information systems was a myth and would never be of use to top managers. Grant,
Higgins, and Irving (1988) argue that computerized performance monitors undermine
trust, reduce autonomy and fail to illuminate the most important issues.
Many of these arguments against EISs have objective merit. Managers really do
value the tangible tidbits of detail they encounter in their daily interactions more highly
than abstract numerical reports. Rumours suggest a future, while numbers describe a past.
Conversations are rich in detail and continuously probe the reasons for the situation,
while statistics are vague approximations of reality. When these vague approximations
are used to intimidate or control behaviour rather than to guide learning, they really do
have a negative impact on the organization.
Yet both of these objections point to a deeper set of problems -- the assumptions,
beliefs, values and behaviours that people in the organization hold and use to respond to
their environment. Perhaps senior managers find statistical data to be irrelevant because
they have found too many errors in previous reports? Perhaps people in the organization
prefer to assign blame rather than discover the true root cause of problems? The culture
of an organization can have a dramatic influence on the adoption and use of an Executive
Information System. The following cultural characteristics will contribute directly to the
success or failure of an EIS project.
Learning vs Blaming
A learning organization is one that seeks first to understand why a problem
occurred, and not who is to blame. It is a common and natural response for managers to
try to deflect responsibility for a problem on to someone else. An EIS can help to do this
by indicating very specifically who failed to meet a statistical target, and by how much. A
senior manager, armed with EIS data, can intimidate and blame the appropriate person.
The blamed person can respond by questioning the integrity of the system, blaming
someone else, or even reacting in frustration by slowing work down further.
In a learning organization, any unusual result is seen as an opportunity to learn
more about the business and its processes. Managers who find an unusual statistic explore
it further, breaking it down to understand its components and comparing it with other
numbers to establish cause and effect relationships. Together as a team, management uses
numerical results to focus learning and improve business processes across the
organization. An EIS facilitates this approach by allowing instant exploration of a
number, its components and its relationship to other numbers.
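The drill-down behaviour described above, breaking a headline number into its components along some dimension, can be sketched in miniature. The records, field names, and figures below are hypothetical, invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical transaction-level records behind a single EIS summary number.
sales = [
    {"region": "East", "product": "A", "amount": 120},
    {"region": "East", "product": "B", "amount": 40},
    {"region": "West", "product": "A", "amount": 95},
    {"region": "West", "product": "B", "amount": 210},
]

def drill_down(records, dimension):
    """Break a total into its components along one dimension."""
    totals = defaultdict(int)
    for record in records:
        totals[record[dimension]] += record["amount"]
    return dict(totals)

grand_total = sum(r["amount"] for r in sales)   # the headline number
by_region = drill_down(sales, "region")         # one way to decompose it
by_product = drill_down(sales, "product")       # an alternative cut

print(grand_total)  # 465
print(by_region)    # {'East': 160, 'West': 305}
```

A manager who finds the West region's total unusual can cut the same records by product, establishing where the anomaly sits before drawing conclusions.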
Continuous Improvement vs Crisis Management
Some organizations find themselves constantly reacting to crises, with little time
for any proactive measures. Others have managed to respond to each individual crisis
with an approach that prevents other similar problems in the future. They are engaged in
a continual cycle of improving business practices and finding ways to avoid crisis.
Crises in government are frequently caused by questions about organizational
performance raised by an auditor, the Minister, or members of the Opposition. An EIS
can be helpful in responding to this sort of crisis by providing instant data about the
actual facts of the situation. However, this use of the EIS does little to prevent future
crises.
An organizational culture in which continual improvement is the norm can use the
EIS as an early warning system pointing to issues that have not yet reached the crisis
point, but are perhaps the most important areas on which to focus management attention
and learning. Organizations with a culture of continuous improvement already have an
appetite for the sort of data an EIS can provide, and thus will exhibit less resistance.
Team Work vs Hierarchy
An EIS has the potential to substantially disrupt an organization that relies upon
adherence to a strict chain of command. The EIS provides senior managers with the
ability to micro-manage details at the lowest levels in the organization. A senior manager
with an EIS report who is surprised at the individual results of a front-line worker might
call that person directly to understand why the result is unusual. This could be very
threatening for the managers between the senior manager and the front-line worker. An
EIS can also provide lower level managers with access to information about peer
performance and even the performance of their superiors.
Organizations that are familiar with work teams, matrix managed projects and
other forms of interaction outside the chain of command will find an EIS less disruptive.
Senior managers in these organizations have learned when micro-management is
appropriate and when it is not. Middle managers have learned that most interactions
between their superiors and their staff are not threatening to their position. Workers are
more comfortable interacting with senior managers when the need arises, and know what
their supervisor expects from them in such an interaction.
Data-based Decisions vs Decisions in a Vacuum
The total quality movement, popular in many organizations today, emphasizes a
set of tools referred to as Statistical Process Control (SPC). These analytical tools provide
managers and workers with methods of understanding a problem and finding solutions
rather than allocating blame and passing the buck. Organizations with training and
exposure to SPC and analytical tools will be more open to an EIS than those who are
suspicious of numerical measures and the motives of those who use them.
It should be noted that data-based decision making does not deny the role of
intuition, experience, or negotiation amongst a group. Rather, it encourages decision-
makers to probe the facts of a situation further before coming to a decision. Even if the
final decision contradicts the data, chances are that an exploration of the data will help
the decision-maker to understand the situation better before a decision is reached. An EIS
can help with this decision-making process.
Information Sharing vs Information Hoarding
Information is power in many organizations, and managers are motivated to hoard
information rather than to share it widely. For example, managers may hide information
about their own organizational performance, but jump at any chance to see information
about performance of their peers.
A properly designed EIS promotes information sharing throughout the organization.
Peers have access to information about each other's domain; junior managers have
information about how their performance contributes to overall organizational
performance. An organization that is comfortable with information sharing will have
developed a set of "good manners" for dealing with this broad access to information.
These behavioural norms are key to the success of an EIS.
Specific Objectives vs Vague Directions
An organization that has experience developing and working toward Specific,
Measurable, Achievable and Consistent (SMAC) objectives will also find an EIS to be
less threatening. Many organizations are uncomfortable with specific performance
measures and targets because they believe their work to be too specialized or
unpredictable. Managers in these organizations tend to adopt vague generalizations and
statements of the exceedingly obvious in place of SMAC objectives that actually focus
and direct organizational performance. In a few cases, it may actually be true that
numerical measures are completely inappropriate for a few aspects of the business. In
most cases, managers with this attitude have a poor understanding of the purpose of
objective and target-setting exercises. Some business processes are more difficult to
measure and set targets for than others. Yet almost all business processes have at least a
few characteristics that can be measured and improved through conscientious objective
setting. (See the following section on EIS and Organizational Objectives.)
EIS and Organizational Objectives
A number of writers have discovered that one of the major difficulties with EIS
implementations is that the information contained in the EIS either does not meet
executive requirements, or meets executive requirements, but fails to guide the
organization towards its objectives. As discussed earlier, organizations that are
comfortable in establishing and working towards Specific, Measurable, Achievable, and
Consistent (SMAC) objectives will find it easier to create an EIS that actually drives
organizational performance. Yet even these organizations may have difficulty because
their stated objectives do not represent all of the things that are important.
Crockett (1992) suggests a four step process for developing EIS information
requirements based on a broader understanding of organizational objectives. The steps
are: (1) identify critical success factors and stakeholder expectations, (2) document
performance measures that monitor the critical success factors and stakeholder
expectations, (3) determine reporting formats and frequency, and (4) outline information
flows and how information can be used. Crockett begins with stakeholders to ensure that
all relevant objectives and critical success factors are reflected in the EIS.
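The output of Crockett's four steps can be recorded as a simple requirements structure, one entry per critical success factor. The success factor, stakeholder, measure, and system names below are invented for illustration; this is only a sketch of how the four steps might be captured.

```python
# One hypothetical entry produced by Crockett's four-step process.
eis_requirements = [
    {
        "critical_success_factor": "On-time delivery",    # step 1
        "stakeholders": ["customers", "operations VP"],   # step 1
        "measures": [                                     # step 2
            {
                "name": "orders shipped on schedule (%)",
                "format": "trend chart",                  # step 3: reporting format
                "frequency": "weekly",                    # step 3: frequency
                "feeds": ["order entry system"],          # step 4: information flow
            }
        ],
    }
]

# Before entering the EIS, every success factor should carry at least one
# measure with a defined reporting frequency.
for req in eis_requirements:
    assert req["measures"], req["critical_success_factor"]
    for measure in req["measures"]:
        assert measure["frequency"] in {"daily", "weekly", "monthly", "quarterly"}

print(len(eis_requirements))  # 1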
Kaplan and Norton (1992) suggest that goals and measures need to be developed
from each of four perspectives: financial, customer, internal business, and innovation and
learning. These perspectives help managers to achieve a balance in setting objectives, and
presenting them in a unified report exposes the tough tradeoffs in any management
system. An EIS built on this basis will not promote productivity while ignoring quality,
or customer satisfaction while ignoring cost.
Meyer (1994) raises several questions that should be asked about measurement
systems for teams. Four are appropriate for evaluating objectives and measures
represented in an EIS. They are:
Are all critical organizational outcomes tracked?
Are all "out-of-bounds" conditions tracked? (Conditions that are serious enough to trigger
a management review.)
Are all the critical variables required to reach each outcome tracked?
Is there any measure that would not cause the organization to change its behaviour?
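Meyer's last question can even be mechanized as a review over a candidate measure set: any measure that would never cause the organization to change its behaviour is flagged for removal. The measure names and flags below are hypothetical, and only the fourth question is encoded here.

```python
# Hypothetical candidate measures for an EIS, each tagged with whether an
# unusual value would actually change organizational behaviour.
measures = [
    {"name": "customer complaints", "drives_action": True},
    {"name": "orders shipped on schedule", "drives_action": True},
    {"name": "pages of reports printed", "drives_action": False},
]

def review_measures(measure_set):
    """Return the names of measures failing Meyer's fourth test:
    a measure that would not change behaviour does not belong in the EIS."""
    return [m["name"] for m in measure_set if not m["drives_action"]]

print(review_measures(measures))  # ['pages of reports printed']
```

The other three questions require judgment about outcomes and causal variables, but recording each measure's rationale in this form at least makes the review repeatable.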
In summary, proper definition of organizational objectives and measures is a helpful
precondition for reducing organizational resistance to an EIS and is the root of effective
EIS use. The benefits of an EIS will be fully realized only when it helps to focus
management attention on issues of true importance to the organization.
Methodology
Implementation of an effective EIS requires clear consensus on the objectives and
measures to be monitored in the system and a plan for obtaining the data on which those
measures are based. The sections below outline a methodology for achieving these two
results. As noted earlier, successful EIS implementations generally begin with a simple
prototype rather than a detailed planning process. For that reason, the proposed planning
methodologies are as simple and scope-limited as possible.
Q.6 Discuss the challenges involved in data integration and coordination process?
Data Integration Primer
Challenges to Data Integration
One of the most fundamental challenges in the process of data integration is
setting realistic expectations. The term data integration conjures a perfect coordination of
diversified databases, software, equipment, and personnel into a smoothly functioning
alliance, free of the persistent headaches that mark less comprehensive systems of
information management. Think again.
The requirements analysis stage offers one of the best opportunities in the process
to recognize and digest the full scope of complexity of the data integration task.
Thorough attention to this analysis is possibly the most important ingredient in creating a
system that will live to see adoption and maximum use.
As the field of data integration progresses, however, other common impediments
and compensatory solutions will be easily identified. Current integration practices have
already highlighted a few familiar challenges as well as strategies to address them, as
outlined below.
Heterogeneous Data
Challenges
For most transportation agencies, data integration involves synchronizing huge
quantities of variable, heterogeneous data resulting from internal legacy systems that vary
in data format. Legacy systems may have been created around flat file, network, or
hierarchical databases, unlike newer generations of databases which use relational data.
Data in different formats from external sources continue to be added to the legacy
databases to improve the value of the information. Each generation, product, and home-
grown system has unique demands to fulfill in order to store or extract data. So data
integration can involve various strategies for coping with heterogeneity. In some cases,
the effort becomes a major exercise in data homogenization, which may not enhance the
quality of the data offered.
Strategies
A detailed analysis of the characteristics and uses of data is necessary to mitigate
issues with heterogeneous data. First, a model is chosen, either a federated or a data
warehouse environment, that serves the requirements of the business applications
and other uses of the data. Then the database developer will need to ensure that
various applications can use this format or, alternatively, that standard operating
procedures are adopted to convert the data to another format.
Bringing disparate data together in a database system or migrating and fusing
highly incompatible databases is painstaking work that can sometimes feel like an
overwhelming challenge. Thankfully, software technology has advanced to
minimize obstacles through a series of data access routines that allow structured
query languages to access nearly all DBMS and data file systems, relational or
non-relational.
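The homogenization step described above can be sketched with two tiny, invented legacy formats funnelled into one common record layout. The field names, column positions, and figures are all hypothetical; real legacy layouts would of course need their own parsers.

```python
import csv
import io

# Source 1: a fixed-width flat file (route in cols 0-6, year 6-10, cost 10-18).
flat_file = "RT-01 2003  125000\nRT-02 2004   98000\n"

# Source 2: a delimited export from a newer system.
csv_file = "route,year,cost\nRT-03,2004,143000\n"

def from_fixed_width(text):
    """Parse fixed-width legacy lines into the common record layout."""
    for line in text.splitlines():
        yield {"route": line[0:6].strip(),
               "year": int(line[6:10]),
               "cost": int(line[10:18])}

def from_csv(text):
    """Parse a delimited export into the same common layout."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"route": row["route"],
               "year": int(row["year"]),
               "cost": int(row["cost"])}

# Once homogenized, records from either source can be queried uniformly.
records = list(from_fixed_width(flat_file)) + list(from_csv(csv_file))
print(len(records))         # 3
print(records[0]["route"])  # RT-01
```

The point of the sketch is the shape of the work: one converter per legacy format, all emitting a single agreed layout, so downstream applications never need to know where a record came from.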
Bad Data
Challenges
Data quality is a top concern in any data integration strategy. Legacy data must be
cleaned up prior to conversion and integration, or an agency will almost certainly face
serious data problems later. Legacy data impurities have a compounding effect; by
nature, they tend to concentrate around high volume data users.
If this information is corrupt, so, too, will be the decisions made from it. It is not
unusual for undiscovered data quality problems to emerge in the process of cleaning
information for use by the integrated system. The issue of bad data leads to procedures
for regularly auditing the quality of information used. But who holds the ultimate
responsibility for this job is not always clear.
Strategies
The issue of data quality exists throughout the life of any data integration system.
So it is best to establish both practices and responsibilities right from the start, and
make provisions for each to continue in perpetuity.
The best processes result when developers and users work together to determine
the quality controls that will be put in place in both the development phase and
the ongoing use of the system.
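A recurring audit of the kind described above can be sketched as a set of named rules run over the data, with each rule reporting the records it rejects so that violation counts can be tracked from one run to the next. The record layout and rules here are invented for illustration.

```python
# Hypothetical records with two deliberately planted quality problems.
records = [
    {"id": 1, "mileage": 14.2, "surface": "asphalt"},
    {"id": 2, "mileage": -3.0, "surface": "asphalt"},  # impossible value
    {"id": 3, "mileage": 8.7,  "surface": None},       # missing attribute
]

# Each rule is a predicate that a clean record must satisfy.
rules = {
    "non_negative_mileage": lambda r: r["mileage"] >= 0,
    "surface_present":      lambda r: r["surface"] is not None,
}

def audit(data, checks):
    """Map each rule name to the ids of the records that violate it."""
    return {name: [r["id"] for r in data if not ok(r)]
            for name, ok in checks.items()}

print(audit(records, rules))
# {'non_negative_mileage': [2], 'surface_present': [3]}
```

Keeping the rules in one place, agreed between developers and users, also settles the responsibility question: whoever owns the rule set owns the audit.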
Lack of Storage Capacity
Challenges
The unanticipated need for additional performance and capacity is one of the most
common challenges to data integration, particularly in data warehousing. Two
storage-related requirements generally come into play: extensibility and scalability. In
an environment where the need for storage can grow exponentially once a system is
initiated, anticipating the extent of that growth drives fears that storage costs will
exceed the benefit of data integration. Introducing such massive quantities of data can push the limits
of hardware and software. This may force developers to instigate costly fixes if an
architecture for processing much larger amounts of data must be retrofitted into the
planned system.
Strategies
Alternative storage is becoming routine for data warehouses that are likely to
grow in size. Planning for such options helps keep expanding databases
affordable.
The cost per gigabyte of storage on disk drives continues to decline as technology
improves. From 2000 to 2004, for instance, the cost of data storage declined ten-
fold. High-performance storage disks are expected to follow the downward
pricing spiral.
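The ten-fold decline cited above for 2000 to 2004 can be restated as a compound annual rate, a quick arithmetic check rather than a precise market figure:

```python
# A ten-fold cost decline over four years implies the cost shrinks by a
# constant factor of 10**(1/4) each year, i.e. roughly 44% per year.
years = 4
yearly_factor = 10 ** (1 / years)
annual_decline = 1 - 1 / yearly_factor

print(round(annual_decline * 100, 1))  # 43.8
```

At that rate, even a database that doubles in size each year becomes cheaper to store over time, which is why planning for alternative storage keeps expanding warehouses affordable.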
Unanticipated Costs
Challenges
Data integration costs are fueled largely by items that are difficult for the
uninitiated to quantify, and thus predict. These might include:
Labor costs for initial planning, evaluation, programming and additional data
acquisition
Software and hardware purchases
Unanticipated technology changes/advances
Both labor and the direct costs of data storage and maintenance
It is important to note that, regardless of efforts to streamline maintenance, the
realities of a fully functioning data integration system may demand a great deal more
maintenance than could be anticipated.
Unrealistic estimating can be driven by an overly optimistic budget, particularly in
these times of budget shortfall and doing more with less. More users, more analysis needs
and more complex requirements may drive performance and capacity problems. Limited
resources may cause project timelines to be extended, without commensurate funding.
Unanticipated issues, or new issues, may call for expensive consulting help. And the
dynamic atmosphere of today's transportation agency must be taken into account, in
which lack of staff, changes in business processes, problems with hardware and software,
and shifting leadership can drive additional expense.
The investment in time and labor required to extract, clean, load, and maintain data
can creep if the quality of the data presented is weak. It is not unusual for this to produce
unanticipated labor costs that are rather alarmingly out of proportion to the total project
budget.
Strategies
The approach to estimating project costs must be both far-sighted and realistic.
This requires an investment in experienced analysts, as well as cooperation, where
possible, among sister agencies on lessons learned.
Special effort should be made to identify items that may seem unlikely but could
dramatically impact total project cost.
Extraordinary care in planning, investing in expertise, obtaining stakeholder buy-
in and participation, and managing the process will each help ensure that cost
overruns are minimized and, when encountered, can be most effectively resolved.
Data integration is a fluid process in which such overruns may occur at each step
along the way, so trained personnel with vigilant oversight are likely to return
dividends instead of adding to cost.
A viable data integration approach must recognize that the better data integration
works for users, the more fundamental it will become to business processes. This
level of use must be supported by consistent maintenance. It might be tempting to
think that a well designed system will, by nature, function without much upkeep
or tweaking. In fact, the best systems and processes tend to thrive on the routine
care and support of well-trained personnel, a fact that wise managers generously
anticipate in the data integration plan and budget.
Lack of Cooperation from Staff
Challenges
User groups within an agency may have developed databases on their own,
sometimes independently from information systems staff, that are highly responsive to
the users' particular needs. It is natural that owners of these functioning standalone units
might be skeptical that the new system would support their needs as effectively.
Other proprietary interests may come into play. For example, division staff may
not want the data they collect and track to be at all times transparently visible to
headquarters staff without the opportunity to address the nuances of what the data appear
to show. Owners or users may fear that higher-ups without an appreciation of the
peculiarities of a given method of operation will gain more control over how data is
collected and accessed organization-wide.
In some agencies, the level of personnel, consultants, and financial support
emanating from the highest echelons of management may be insufficient to dispel these
fears and gain cooperation. Top management must be fully invested in the project;
otherwise, the strategic data integration plan and the resources associated with it are
less likely to be approved. The additional support required to engage everyone in the
agency and convey the need for and benefits of data integration is unlikely to flow
from leaders who lack awareness of, or commitment to, those benefits.
Strategies
Any large-scale data integration project, regardless of model, demands that
executive management be fully on board. Without that support, the initiative is,
quite simply, likely to fail.
Informing and involving the diversity of players during the crucial requirements
analysis stage, and then in each subsequent phase and step, is probably the single
most effective way to gain buy-in, trust, and cooperation. Collecting and
addressing each user's concerns may be a daunting proposition, particularly for
knowledgeable information professionals who prefer to "cut to the chase."
However, without a personal stake in the process and a sense of ownership of the
final product, the long-term health of this major investment is likely to be
compromised by users who feel that change has been imposed upon them rather
than designed to advance their interests.
Incremental education, another benefit of stakeholder involvement, is easier to
impart than after-the-fact training, particularly since it addresses both the
capabilities and limitations of the system, helping to calibrate appropriate
expectations along the way.
Since so much of the project's success is dependent upon understanding and
conveying both human and technical issues, skilled communicators are a logical
component of any data integration team. Whether staff or consultants,
professional communications personnel are most effective as core participants,
rather than occasional or outside contributors. They are trained to recognize and
ameliorate gaps in understanding and motivation. Their skills also help maximize
the conditions for cooperation and enthusiastic adoption. In many transportation
agencies, public information personnel actually focus a significant amount of their
time and budget on internal audiences rather than external customers. This makes
them well attuned to the operational realities of a variety of internal stakeholders.
Peer Perspectives...
At least three conditions were required for the success of Virginia DOT's
development effort:
Upper management had to support the business objectives of the project and the
creation of a new system to meet the objectives
Project managers had to receive the budget, staff, and IT resources necessary to
initiate and complete the process
All stakeholders and eventual system users from the agency's districts and
headquarters had to cooperate with the project team throughout the process (22)
Lack of Data Management Expertise
Challenges
As more transportation agencies nationwide undertake the integration of data, the
availability of experienced personnel increases. However, since data integration is a
multi-year, highly complex proposition, even these leaders may not have the kind of
expertise that evolves over a full project life-cycle. Common problems develop at
different stages of the process and these can better be anticipated and addressed when key
personnel have managed the typical variables of each project phase.
Also, the process of transferring historical data from its independent source to the
integrated system may benefit from the knowledge of the manager who originally
captured and stored the information. High turnover in such positions, along with early
retirements and other personnel shifts driven by a historically tight budget environment,
may complicate the mining and preparation of this data for convergence with the new
system.
Strategies
A seasoned and highly knowledgeable data integration project leader and a data
manager with state of the practice experience are the minimum required to design
a viable approach to integration. Choosing this expertise very carefully can help
ensure that the resulting architecture is sufficiently modular, can be maintained,
and is robust enough to support a wide range of owner and user needs while
remaining flexible enough to accommodate changing transportation decision-
support requirements over a period of years.
Perception of Data Integration as an Overwhelming Effort
Challenges
When transportation agencies consider data integration, one pervasive notion is
that the analysis of existing information needs and infrastructure, much less the
organization of data into viable channels for integration, requires a monumental initial
commitment of resources and staff. Resource-scarce agencies identify this perceived
major upfront overhaul as "unachievable" and "disruptive." In addition, uncertainties
about funding priorities and potential shortfalls can further hamper efforts to move forward.
Strategies
Methodical planning is essential in data integration. Setting incremental (or
phased) goals helps ensure that each phase can be understood, achieved, and
funded adequately. This approach also allows the integration process to be
flexible and agile, minimizing risks associated with funding and other resource
uncertainties and priority shifts. In addition, the smaller, more accurate goals will
help sustain the integration effort and make it less disruptive to those using and
providing data.
BUSINESS INTELLIGENCE TOOLS – 4 CREDITS
SUBJECT CODE – MI0036
ASSIGNMENT SET – 2
Q.1 Explain business development life cycle in detail?
Business Life Cycle
Your business is changing. With the passage of time, your company will go
through various stages of the business life cycle. Learn what upcoming focuses,
challenges and financing sources you will need to succeed.
A business goes through stages of development similar to the human life cycle.
Parenting strategies that work for your toddler cannot be applied to your teenager.
The same goes for your small business: it will face different stages throughout its
life, and what you focus on today will change and require different approaches to be
successful.
The 7 Stages of the Business Life Cycle
Seed
The seed stage of your business life cycle is when your business is just a thought or an
idea. This is the very conception or birth of a new business.
Challenge: Most seed stage companies will have to overcome the challenge of
market acceptance and pursue one niche opportunity. Do not spread money and time
resources too thin.
Focus: At this stage of the business the focus is on matching the business opportunity
with your skills, experience and passions. Other focal points include: deciding on a
business ownership structure, finding professional advisors, and business planning.
Money Sources: Early in the business life cycle, with no proven market or customers,
the business will rely on cash from owners, friends and family. Other potential
sources include suppliers, customers, government grants and banks.
WNB products to consider: Classic Checking Account / Business Savings Account /
SBA Resources / Minnesota SBDC / Minnesota Community Capital Fund
Start-Up
Your business is born and now exists legally. Products or services are in production and
you have your first customers.
Challenge: If your business is in the start-up life cycle stage, it is likely you have
underestimated money needs and the time to market. The main challenge is not to
burn through what little cash you have. You need to learn what profitable needs
your clients have and do a reality check to see if your business is on the right track.
Focus: Start-ups require establishing a customer base and market presence along with
tracking and conserving cash flow.
Money Sources: Owner, friends, family, suppliers, customers, grants, and banks.
WNB products to consider: Seed Stage Products / Working Capital Loan / Line of
Credit / Equipment Financing / Business Internet Banking / Bill Payer / Credit Card
Processing
Growth
Your business has made it through the toddler years and is now a child. Revenues and
customers are increasing with many new opportunities and issues. Profits are strong, but
competition is surfacing.
Challenge: The biggest challenge growth companies face is dealing with the constant
range of issues bidding for more time and money. Effective management is required,
along with a possible new business plan. Learn how to train and delegate to conquer
this stage of development.
Focus: Growth life cycle businesses are focused on running the business in a more
formal fashion to deal with the increased sales and customers. Better accounting and
management systems will have to be set up. New employees will have to be hired to
deal with the influx of business.
Money Sources: Banks, profits, partnerships, grants and leasing options.
WNB products to consider: Line of Credit / Equipment Financing / Construction
Loan / Commercial Real Estate Loan / Health Savings Account / Remote Deposit /
Cash Management / Business Credit Card
Established
Your business has now matured into a thriving company with a place in the market and
loyal customers. Sales growth is not explosive but manageable. Business life has become
more routine.
Challenge: It is far too easy to rest on your laurels during this life stage. You have
worked hard and have earned a rest, but the marketplace is relentless and
competitive. Stay focused on the bigger picture. Issues like the economy,
competitors or changing customer tastes can quickly end all you have worked for.
Focus: An established life cycle company will be focused on improvement and
productivity. To compete in an established market, you will require better business
practices along with automation and outsourcing to improve productivity.
Money Sources: Profits, banks, investors and government.
WNB products to consider: Premium Checking Account / Business Money Fund
Account / Sweep Account / Private Financial / 401K Planning / Investment
Brokerage / Health Savings Account / Remote Deposit / Cash Management /
Business Credit Card / Line of Credit
Expansion
This life cycle is characterized by a new period of growth into new markets and
distribution channels. This stage is often the choice of the business owner to gain a larger
market share and find new revenue and profit channels.
Challenge: Moving into new markets requires the planning and research of a seed or
start-up stage business. Focus should be on businesses that complement your
existing experience and capabilities. Moving into unrelated businesses can be
disastrous.
Focus: Add new products or services to existing markets or expand existing business
into new markets and customer types.
Money Sources: Joint ventures, banks, licensing, new investors and partners.
WNB products to consider: Acquisition Financing / Private Financial / Line of
Credit / Equipment Financing / Construction Loan / Commercial Real Estate Loan /
Investment Brokerage
Mature
Year-over-year sales and profits tend to be stable; however, competition remains fierce.
Eventually sales start to fall off, and a decision is needed on whether to expand or exit
the company.
Challenge: Businesses in the mature stage of the life cycle will be challenged with
dropping sales, profits, and negative cash flow. The biggest issue is how long the
business can support a negative cash flow. Ask yourself: is it time to move back to
the expansion stage or move on to the final life cycle stage...exit?
Focus: Search for new opportunities and business ventures. Cutting costs and finding
ways to sustain cash flow are vital for the mature stage.
Money Sources: Suppliers, customers, owners, and banks.
WNB products to consider: Private Financial / 401K Planning / Employee Stock
Ownership Plans (ESOP) / Investment Brokerage / Health Savings Account /
Remote Deposit / Cash Management / Line of Credit
Exit
This is the big opportunity for your business to cash out on all the effort and years of hard
work. Or it can mean shutting down the business.
Challenge: Selling a business requires a realistic valuation. It may have taken
years of hard work to build the company, but what is its real value in the current
marketplace? If you decide to close your business, the challenge is to deal with the
financial and psychological aspects of a business loss.
Focus: Get a proper valuation on your company. Look at your business operations,
management and competitive barriers to make the company worth more to the
buyer. Set up legal buy-sell agreements along with a business transition plan.
Money Sources: Find a business valuation partner. Consult with your accountant and
financial advisors for the best tax strategy to sell or close down the business.
WNB products to consider: Acquisition Financing / Employee Stock Ownership
Plans (ESOP) / Investment Brokerage / Trust
Q.2. Discuss the various components of data ware house?
Components of a Data Warehouse
Overall Architecture
The data warehouse architecture is based on a relational database management
system server that functions as the central repository for informational data. Operational
data and processing is completely separated from data warehouse processing. This central
information repository is surrounded by a number of key components designed to make
the entire environment functional, manageable and accessible by both the operational
systems that source data into the warehouse and by end-user query and analysis tools.
Typically, the source data for the warehouse is coming from the operational
applications. As the data enters the warehouse, it is cleaned up and transformed into an
integrated structure and format. The transformation process may involve conversion,
summarization, filtering and condensation of data. Because the data contains a historical
component, the warehouse must be capable of holding and managing large volumes of
data as well as different data structures for the same database over time.
The next sections look at the seven major components of data warehousing:
Data Warehouse Database
The central data warehouse database is the cornerstone of the data warehousing
environment. This database is almost always implemented on relational database
management system (RDBMS) technology. However, this kind of implementation is
often constrained by the fact that traditional RDBMS products are optimized for
transactional database processing. Certain data warehouse attributes, such as very large
database size, ad hoc query processing and the need for flexible user view creation
including aggregates, multi-table joins and drill-downs, have become drivers for different
technological approaches to the data warehouse database. These approaches include:
Parallel relational database designs for scalability that include shared-memory,
shared disk, or shared-nothing models implemented on various multiprocessor
configurations (symmetric multiprocessors or SMP, massively parallel processors
or MPP, and/or clusters of uni- or multiprocessors).
An innovative approach to speed up a traditional RDBMS by using new index
structures to bypass relational table scans.
Multidimensional databases (MDDBs) that are based on proprietary database
technology; conversely, a dimensional data model can be implemented using a
familiar RDBMS. Multi-dimensional databases are designed to overcome any
limitations placed on the warehouse by the nature of the relational data model.
MDDBs enable on-line analytical processing (OLAP) tools that architecturally
belong to a group of data warehousing components jointly categorized as the data
query, reporting, analysis and mining tools.
Sourcing, Acquisition, Cleanup and Transformation Tools
A significant portion of the implementation effort is spent extracting data from
operational systems and putting it in a format suitable for informational applications that
run off the data warehouse.
The data sourcing, cleanup, transformation and migration tools perform all of the
conversions, summarizations, key changes, structural changes and condensations needed
to transform disparate data into information that can be used by the decision support tool.
They produce the programs and control statements, including the COBOL programs,
MVS job-control language (JCL), UNIX scripts, and SQL data definition language
(DDL), needed to move data into the data warehouse from multiple operational systems.
These tools also maintain the meta data. The functionality includes:
Removing unwanted data from operational databases
Converting to common data names and definitions
Establishing defaults for missing data
Accommodating source data definition changes
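The cleanup functions listed above can be sketched as a small routine. This is an illustrative example only; the field names, rename map, and defaults are hypothetical, not taken from any particular tool.

```python
# Minimal sketch of ETL cleanup: remove unwanted operational fields,
# convert to common data names, and establish defaults for missing data.
# All field names here are hypothetical.

FIELD_MAP = {"cust_nm": "customer_name", "cust_no": "customer_id"}  # common names
DEFAULTS = {"country": "US"}          # defaults for missing data
DROP = {"internal_flag"}              # unwanted operational fields

def clean_record(raw: dict) -> dict:
    # Drop unwanted fields and rename the rest to warehouse-standard names.
    rec = {FIELD_MAP.get(k, k): v for k, v in raw.items() if k not in DROP}
    # Fill in defaults for any data missing from the source record.
    for field, default in DEFAULTS.items():
        rec.setdefault(field, default)
    return rec
```

In practice such routines are generated or configured by the sourcing and transformation tools themselves; hand-written versions like this are usually needed only for the more complicated extracts mentioned below.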
The data sourcing, cleanup, extract, transformation and migration tools have to deal with
some significant issues including:
Database heterogeneity. DBMSs are very different in data models, data access
language, data navigation, operations, concurrency, integrity, recovery etc.
Data heterogeneity. This is the difference in the way data is defined and used in
different models - homonyms, synonyms, unit compatibility (U.S. vs metric),
different attributes for the same entity and different ways of modeling the same
fact.
These tools can save a considerable amount of time and effort. However, significant
shortcomings do exist. For example, many available tools are generally useful for simpler
data extracts. Frequently, customized extract routines need to be developed for the more
complicated data extraction procedures.
Meta data
Meta data is data about data that describes the data warehouse. It is used for
building, maintaining, managing and using the data warehouse. Meta data can be
classified into:
Technical meta data, which contains information about warehouse data for use by
warehouse designers and administrators when carrying out warehouse
development and management tasks.
Business meta data, which contains information that gives users an easy-to-
understand perspective of the information stored in the data warehouse.
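A repository entry can carry both kinds of meta data side by side. The sketch below is a simplified illustration, assuming a hypothetical warehouse column and mapping; real repositories store far richer structures.

```python
# Hypothetical meta data repository entry: technical meta data (the
# source-to-target mapping used by designers and administrators) plus
# business meta data (a plain-language description for end users).

META_REPOSITORY = {
    "monthly_revenue": {
        "technical": {
            "source": "orders.order_total",
            "target": "dw.fact_sales.amount",
            "transform": "SUM grouped by calendar month",
        },
        "business": "Total value of all orders booked in a calendar month.",
    },
}

def describe(column: str) -> str:
    # Give users the easy-to-understand business view of a warehouse column.
    return META_REPOSITORY[column]["business"]
```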
Equally important, meta data provides interactive access to users to help
understand content and find data. One of the issues dealing with meta data relates to the
fact that the capabilities of many data extraction tools to gather meta data remain fairly
immature. Therefore, there is often the need to create a meta data interface for users,
which may involve some duplication of effort.
Meta data management is provided via a meta data repository and accompanying
software. Meta data repository management software, which typically runs on a
workstation, can be used to map the source data to the target database; generate code for
data transformations; integrate and transform the data; and control moving data to the
warehouse.
As users' interactions with the data warehouse increase, their approaches to
reviewing the results of their requests for information can be expected to evolve from
relatively simple manual analysis for trends and exceptions to agent-driven initiation of
the analysis based on user-defined thresholds. The definition of these thresholds,
configuration parameters for the software agents using them, and the information
directory indicating where the appropriate sources for the information can be found are
all stored in the meta data repository as well.
Access Tools
The principal purpose of data warehousing is to provide information to business
users for strategic decision-making. These users interact with the data warehouse using
front-end tools. Many of these tools require an information specialist, although many end
users develop expertise in the tools. Tools fall into four main categories: query and
reporting tools, application development tools, online analytical processing tools, and
data mining tools.
Query and Reporting tools can be divided into two groups: reporting tools and
managed query tools. Reporting tools can be further divided into production reporting
tools and report writers. Production reporting tools let companies generate regular
operational reports or support high-volume batch jobs such as calculating and printing
paychecks. Report writers, on the other hand, are inexpensive desktop tools designed for
end-users.
Managed query tools shield end users from the complexities of SQL and database
structures by inserting a metalayer between users and the database. These tools are
designed for easy-to-use, point-and-click operations that either accept SQL or generate
SQL database queries.
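The idea of a metalayer can be sketched in a few lines: business terms are mapped to tables and column expressions, and a point-and-click selection is turned into SQL behind the scenes. All names here are hypothetical, and real managed query tools are of course far more elaborate.

```python
# Illustrative metalayer: each business term maps to a (table, expression)
# pair, so end users never write SQL themselves.

METALAYER = {
    "Revenue": ("sales_fact", "SUM(amount)"),   # a measure
    "Region":  ("sales_fact", "region"),        # a dimension
}

def build_query(measure: str, dimension: str) -> str:
    # Generate the SQL the user's point-and-click selection implies.
    table, expr = METALAYER[measure]
    _, dim_col = METALAYER[dimension]
    return f"SELECT {dim_col}, {expr} FROM {table} GROUP BY {dim_col}"
```

For example, picking "Revenue by Region" would generate `SELECT region, SUM(amount) FROM sales_fact GROUP BY region`.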
Often, the analytical needs of the data warehouse user community exceed the
built-in capabilities of query and reporting tools. In these cases, organizations will often
rely on the tried-and-true approach of in-house application development using graphical
development environments such as PowerBuilder, Visual Basic and Forte. These
application development platforms integrate well with popular OLAP tools and access all
major database systems including Oracle, Sybase, and Informix.
OLAP tools are based on the concepts of dimensional data models and
corresponding databases, and allow users to analyze the data using elaborate,
multidimensional views. Typical business applications include product performance and
profitability, effectiveness of a sales program or marketing campaign, sales forecasting
and capacity planning. These tools assume that the data is organized in a
multidimensional model.
A critical success factor for any business today is the ability to use information
effectively. Data mining is the process of discovering meaningful new correlations,
patterns and trends by digging into large amounts of data stored in the warehouse using
artificial intelligence, statistical and mathematical techniques.
Data Marts
The concept of a data mart is causing a lot of excitement and attracts much
attention in the data warehouse industry. Mostly, data marts are presented as an
alternative to a data warehouse that takes significantly less time and money to build.
However, the term data mart means different things to different people. A rigorous
definition of this term is a data store that is subsidiary to a data warehouse of integrated
data. The data mart is directed at a partition of data (often called a subject area) that is
created for the use of a dedicated group of users. A data mart might, in fact, be a set of
denormalized, summarized, or aggregated data. Sometimes, such a set could be placed on
the data warehouse rather than a physically separate store of data. In most instances,
however, the data mart is a physically separate store of data, resident on a separate
database server, often on a local area network serving a dedicated user group. Sometimes
the data mart simply comprises relational OLAP technology, which creates a highly
denormalized dimensional model (e.g., a star schema) implemented on a relational
database. The resulting hypercubes of data are used for analysis by groups of users with a
common interest in a limited portion of the database.
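A star schema of the kind such a data mart would use can be sketched with an in-memory database: one central fact table of measurements surrounded by denormalized dimension tables, queried by joining fact to dimensions. The table and column names are illustrative only.

```python
# Minimal star schema sketch: a sales fact table joined to product and
# date dimensions, aggregated the way a dependent data mart would serve it.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product,
    date_id    INTEGER REFERENCES dim_date,
    amount     REAL);
INSERT INTO dim_product VALUES (1, 'Widget');
INSERT INTO dim_date VALUES (10, '2011-01');
INSERT INTO fact_sales VALUES (1, 10, 250.0), (1, 10, 150.0);
""")

# The typical star-schema query: join the fact table to its dimensions
# and aggregate the measure.
rows = cur.execute("""
SELECT p.name, d.month, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date    d ON d.date_id    = f.date_id
GROUP BY p.name, d.month
""").fetchall()
```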
These types of data marts, called dependent data marts because their data is
sourced from the data warehouse, have a high value because no matter how they are
deployed and how many different enabling technologies are used, different users are all
accessing the information views derived from the single integrated version of the data.
Unfortunately, the misleading statements about the simplicity and low cost of data
marts sometimes result in organizations or vendors incorrectly positioning them as an
alternative to the data warehouse. This viewpoint defines independent data marts that in
fact, represent fragmented point solutions to a range of business problems in the
enterprise. This type of implementation should rarely be deployed in the context of an
overall technology or applications architecture. Indeed, it is missing the ingredient that is
at the heart of the data warehousing concept -- that of data integration. Each independent
data mart makes its own assumptions about how to consolidate the data, and the data
across several data marts may not be consistent.
Moreover, the concept of an independent data mart is dangerous -- as soon as the
first data mart is created, other organizations, groups, and subject areas within the
enterprise embark on the task of building their own data marts. As a result, you create an
environment where multiple operational systems feed multiple non-integrated data marts
that are often overlapping in data content, job scheduling, connectivity and management.
In other words, you have transformed a complex many-to-one problem of building a data
warehouse from operational and external data sources to a many-to-many sourcing and
management nightmare.
Data Warehouse Administration and Management
Data warehouses tend to be as much as four times as large as related operational
databases, reaching terabytes in size depending on how much history needs to be saved.
They are not synchronized in real time with the associated operational data but are
updated as often as once a day, if the application requires it.
In addition, almost all data warehouse products include gateways to transparently
access multiple enterprise data sources without having to rewrite applications to interpret
and utilize the data. Furthermore, in a heterogeneous data warehouse environment, the
various databases reside on disparate systems, thus requiring inter-networking tools. The
need to manage this environment is obvious.
Managing data warehouses includes security and priority management;
monitoring updates from the multiple sources; data quality checks; managing and
updating meta data; auditing and reporting data warehouse usage and status; purging
data; replicating, subsetting and distributing data; backup and recovery and data
warehouse storage management.
Q.3. Discuss data extraction process? What are the various methods being used for
data extraction?
Overview of Extraction in Data Warehouses
Extraction is the operation of extracting data from a source system for further use
in a data warehouse environment. This is the first step of the ETL process. After the
extraction, this data can be transformed and loaded into the data warehouse.
The source systems for a data warehouse are typically transaction processing
applications. For example, one of the source systems for a sales analysis data warehouse
might be an order entry system that records all of the current order activities.
Designing and creating the extraction process is often one of the most time-
consuming tasks in the ETL process and, indeed, in the entire data warehousing process.
The source systems might be very complex and poorly documented, and thus determining
which data needs to be extracted can be difficult. Normally, the data has to be extracted
not only once, but several times in a periodic manner to supply all changed data to the
data warehouse and keep it up-to-date. Moreover, the source system typically cannot be
modified, nor can its performance or availability be adjusted, to accommodate the needs
of the data warehouse extraction process.
These are important considerations for extraction and ETL in general. This
chapter, however, focuses on the technical considerations of having different kinds of
sources and extraction methods. It assumes that the data warehouse team has already
identified the data that will be extracted, and discusses common techniques used for
extracting data from source databases.
Designing this process means making decisions about the following two main
aspects:
Which extraction method do I choose?
This influences the source system, the transportation process, and the time needed
for refreshing the warehouse.
How do I provide the extracted data for further processing?
This influences the transportation method, and the need for cleaning and
transforming the data.
Introduction to Extraction Methods in Data Warehouses
The extraction method you should choose depends heavily on the source
system and on the business needs in the target data warehouse environment. Very
often, it is not possible to add logic to the source systems to support
incremental extraction of data, because of the performance impact or the increased
workload on these systems. Sometimes the customer is not even allowed to add anything
to an out-of-the-box application system.
The estimated amount of the data to be extracted and the stage in the ETL process
(initial load or maintenance of data) may also impact the decision of how to extract, from
a logical and a physical perspective. Basically, you have to decide how to extract data
logically and physically.
Logical Extraction Methods
There are two types of logical extraction:
Full Extraction
Incremental Extraction
Full Extraction
The data is extracted completely from the source system. Because this extraction
reflects all the data currently available on the source system, there's no need to keep track
of changes to the data source since the last successful extraction. The source data will be
provided as-is and no additional logical information (for example, timestamps) is
necessary on the source site. An example for a full extraction may be an export file of a
distinct table or a remote SQL statement scanning the complete source table.
Incremental Extraction
At a specific point in time, only the data that has changed since a well-defined
event back in history will be extracted. This event may be the last time of extraction or a
more complex business event, such as the last booking day of a fiscal period. To identify
this delta, it must be possible to identify all the information that has changed since
that specific event. This information can be provided either by the source data itself,
such as an application column reflecting the last-changed timestamp, or by a change table
in which an additional mechanism records the changes alongside the originating
transactions. In most cases, using the latter method means adding extraction logic to the
source system.
Many data warehouses do not use any change-capture techniques as part of the
extraction process. Instead, entire tables from the source systems are extracted to the data
warehouse or staging area, and these tables are compared with a previous extract from the
source system to identify the changed data. This approach may not have significant
impact on the source systems, but it clearly can place a considerable burden on the data
warehouse processes, particularly if the data volumes are large.
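The snapshot-comparison approach described above can be sketched in Python. This is an illustrative sketch only, not an actual warehouse utility; the table content and keys are hypothetical:

```python
def diff_snapshots(previous, current):
    """Compare two full extracts keyed by primary key and return the delta.

    previous/current: dict mapping primary key -> row tuple.
    Returns (inserted, updated, deleted) key sets.
    """
    prev_keys, curr_keys = set(previous), set(current)
    inserted = curr_keys - prev_keys          # keys only in the new extract
    deleted = prev_keys - curr_keys           # keys that vanished from the source
    updated = {k for k in prev_keys & curr_keys if previous[k] != current[k]}
    return inserted, updated, deleted

# Two extracts of a hypothetical orders table: key -> (status, amount)
prev = {1: ("pending", 100), 2: ("shipped", 250)}
curr = {1: ("shipped", 100), 3: ("pending", 75)}
ins, upd, dele = diff_snapshots(prev, curr)
# ins == {3}, upd == {1}, dele == {2}
```

Note that, exactly as the text warns, both full snapshots must be materialized and compared, which is cheap here but becomes the dominant cost when the tables are large.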
Oracle's Change Data Capture mechanism can extract and maintain such delta
information. See Chapter 16, " Change Data Capture" for further details about the Change
Data Capture framework.
Physical Extraction Methods
Depending on the chosen logical extraction method and the capabilities and
restrictions on the source side, the extracted data can be physically extracted by two
mechanisms. The data can either be extracted online from the source system or from an
offline structure. Such an offline structure might already exist or it might be generated by
an extraction routine.
There are the following methods of physical extraction:
Online Extraction
Offline Extraction
Online Extraction
The data is extracted directly from the source system itself. The extraction process can
connect directly to the source system to access the source tables themselves or to an
intermediate system that stores the data in a preconfigured manner (for example, snapshot
logs or change tables). Note that the intermediate system is not necessarily physically
different from the source system.
With online extractions, you need to consider whether the distributed transactions are
using original source objects or prepared source objects.
Offline Extraction
The data is not extracted directly from the source system but is staged explicitly outside
the original source system. The data already has an existing structure (for example, redo
logs, archive logs or transportable tablespaces) or was created by an extraction routine.
You should consider the following structures:
Flat files
Data in a defined, generic format. Additional information about the source
object is necessary for further processing.
Dump files
Oracle-specific format. Information about the containing objects may or
may not be included, depending on the chosen utility.
Redo and archive logs
Information is in a special, additional dump file.
Transportable tablespaces
A powerful way to extract and move large volumes of data between
Oracle databases. A more detailed example of using this feature to extract
and transport data is provided in Chapter 13, " Transportation in Data
Warehouses". Oracle Corporation recommends that you use transportable
tablespaces whenever possible, because they can provide considerable
advantages in performance and manageability over other extraction
techniques.
See Oracle Database Utilities for more information on using export/import.
Change Data Capture
An important consideration for extraction is incremental extraction, also called
Change Data Capture. If a data warehouse extracts data from an operational system on a
nightly basis, then the data warehouse requires only the data that has changed since the
last extraction (that is, the data that has been modified in the past 24 hours). Change Data
Capture is also the key-enabling technology for providing near real-time, or on-time, data
warehousing.
When it is possible to efficiently identify and extract only the most recently
changed data, the extraction process (as well as all downstream operations in the ETL
process) can be much more efficient, because it must extract a much smaller volume of
data. Unfortunately, for many source systems, identifying the recently modified data may
be difficult or intrusive to the operation of the system. Change Data Capture is typically
the most challenging technical issue in data extraction.
Because change data capture is often desirable as part of the extraction process and it
might not be possible to use the Change Data Capture mechanism, this section describes
several techniques for implementing a self-developed change capture on Oracle Database
source systems:
Timestamps
Partitioning
Triggers
These techniques are based upon the characteristics of the source systems, or may require
modifications to the source systems. Thus, each of these techniques must be carefully
evaluated by the owners of the source system prior to implementation.
Each of these techniques can work in conjunction with the data extraction technique
discussed previously. For example, timestamps can be used whether the data is being
unloaded to a file or accessed through a distributed query. See Chapter 16, " Change Data
Capture" for further details.
Timestamps
The tables in some operational systems have timestamp columns. The timestamp
specifies the time and date that a given row was last modified. If the tables in an
operational system have columns containing timestamps, then the latest data can easily be
identified using the timestamp columns. For example, the following query might be
useful for extracting today's data from an orders table:
SELECT * FROM orders
WHERE TRUNC(order_date) = TRUNC(SYSDATE);
If the timestamp information is not available in an operational source system, you
will not always be able to modify the system to include timestamps. Such modification
would require, first, modifying the operational system's tables to include a new
timestamp column and then creating a trigger to update the timestamp column following
every operation that modifies a given row.
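The timestamp-based selection can be sketched in Python. This is illustrative only; the row layout and the cutoff value are hypothetical, not from the source:

```python
from datetime import datetime

def extract_incremental(rows, last_extraction_time):
    """Return only rows modified since the last successful extraction.

    rows: iterable of (row_id, data, last_modified) tuples, where
    last_modified plays the role of the source system's timestamp column.
    """
    return [r for r in rows if r[2] > last_extraction_time]

rows = [
    (1, "old order", datetime(2011, 12, 10, 9, 0)),
    (2, "new order", datetime(2011, 12, 12, 8, 30)),
]
# Only rows changed after the last extraction (here, Dec 11 midnight) survive.
delta = extract_incremental(rows, datetime(2011, 12, 11, 0, 0))
# delta contains only row 2
```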
Partitioning
Some source systems might use range partitioning, such that the source tables are
partitioned along a date key, which allows for easy identification of new data. For
example, if you are extracting from an orders table, and the orders table is partitioned by
week, then it is easy to identify the current week's data.
Triggers
Triggers can be created in operational systems to keep track of recently updated
records. They can then be used in conjunction with timestamp columns to identify the
exact time and date when a given row was last modified. You do this by creating a trigger
on each source table that requires change data capture. Following each DML statement
that is executed on the source table, this trigger updates the timestamp column with the
current time. Thus, the timestamp column provides the exact time and date when a given
row was last modified.
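A minimal Python sketch of this trigger-based stamping, assuming a simple key-value table; this simulates the idea only and is not Oracle's actual trigger mechanism:

```python
from datetime import datetime

class TriggeredTable:
    """Toy table whose 'trigger' stamps each row on every modification,
    so the timestamp column records when the row last changed."""

    def __init__(self):
        self.rows = {}  # primary key -> (data, last_modified)

    def dml(self, key, data):
        # The "trigger": fires after each insert/update and records the time.
        self.rows[key] = (data, datetime.now())

    def changed_since(self, since):
        # Change data capture: rows stamped at or after the given time.
        return [k for k, (_, ts) in self.rows.items() if ts >= since]
```

A warehouse extraction would then call `changed_since(last_extraction_time)` instead of scanning the whole table.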
A similar internalized trigger-based technique is used for Oracle materialized
view logs. These logs are used by materialized views to identify changed data, and these
logs are accessible to end users. However, the format of the materialized view logs is not
documented and might change over time.
If you want to use a trigger-based mechanism, use synchronous Change Data
Capture: CDC provides an externalized interface for accessing the change information and
a framework for maintaining the distribution of this information to various clients.
Materialized view logs rely on triggers, but they provide an advantage in that the
creation and maintenance of this change-data system is largely managed by the database.
Trigger-based techniques might affect performance on the source systems, and
this impact should be carefully considered prior to implementation on a production
source system.
Data Warehousing Extraction Examples
You can extract data in two ways:
Extraction Using Data Files
Extraction Through Distributed Operations
Extraction Using Data Files
Most database systems provide mechanisms for exporting or unloading data from
the internal database format into flat files. Extracts from mainframe systems often use
COBOL programs, but many databases, as well as third-party software vendors, provide
export or unload utilities.
Data extraction does not necessarily mean that entire database structures are
unloaded in flat files. In many cases, it may be appropriate to unload entire database
tables or objects. In other cases, it may be more appropriate to unload only a subset of a
given table such as the changes on the source system since the last extraction or the
results of joining multiple tables together. Different extraction techniques vary in their
capabilities to support these two scenarios.
When the source system is an Oracle database, several alternatives are available for
extracting data into files:
Extracting into Flat Files Using SQL*Plus
Extracting into Flat Files Using OCI or Pro*C Programs
Exporting into Export Files Using the Export Utility
Extracting into Export Files Using External Tables
Extracting into Flat Files Using SQL*Plus
The most basic technique for extracting data is to execute a SQL query in
SQL*Plus and direct the output of the query to a file. For example, to extract a flat file,
country_city.log, with the pipe sign as delimiter between column values, containing a list
of the cities in the US in the tables countries and customers, the following SQL script
could be run:
SET echo off
SET pagesize 0
SPOOL country_city.log
SELECT distinct t1.country_name ||'|'|| t2.cust_city
FROM countries t1, customers t2
WHERE t1.country_id = t2.country_id
AND t1.country_name = 'United States of America';
SPOOL off
The exact format of the output file can be specified using SQL*Plus system
variables.
This extraction technique offers the advantage of storing the result in a
customized format. Note that using the external table data pump unload facility, you can
also extract the result of an arbitrary SQL operation. The previous example extracts the
results of a join.
This extraction technique can be parallelized by initiating multiple, concurrent
SQL*Plus sessions, each session running a separate query representing a different portion
of the data to be extracted. For example, suppose that you wish to extract data from an
orders table, and that the orders table has been range partitioned by month, with partitions
orders_jan1998, orders_feb1998, and so on. To extract a single year of data from the
orders table, you could initiate 12 concurrent SQL*Plus sessions, each extracting a single
partition. The SQL script for one such session could be:
SPOOL order_jan.dat
SELECT * FROM orders PARTITION (orders_jan1998);
SPOOL OFF
These 12 SQL*Plus processes would concurrently spool data to 12 separate files.
You can then concatenate them if necessary (using operating system utilities) following
the extraction. If you are planning to use SQL*Loader for loading into the target, these 12
files can be used as is for a parallel load with 12 SQL*Loader sessions. See Chapter 13, "
Transportation in Data Warehouses" for an example.
Even if the orders table is not partitioned, it is still possible to parallelize the
extraction either based on logical or physical criteria. The logical method is based on
logical ranges of column values, for example:
SELECT ... WHERE order_date
BETWEEN TO_DATE('01-JAN-99') AND TO_DATE('31-JAN-99');
The physical method is based on a range of values. By viewing the data
dictionary, it is possible to identify the Oracle Database data blocks that make up the
orders table. Using this information, you could then derive a set of rowid-range queries
for extracting data from the orders table:
SELECT * FROM orders WHERE rowid BETWEEN value1 and value2;
Parallelizing the extraction of complex SQL queries is sometimes possible,
although the process of breaking a single complex query into multiple components can be
challenging. In particular, the coordination of independent processes to guarantee a
globally consistent view can be difficult. Unlike the SQL*Plus approach, using the new
external table data pump unload functionality provides transparent parallel capabilities.
Note that all parallel techniques can use considerably more CPU and I/O
resources on the source system, and the impact on the source system should be evaluated
before parallelizing any extraction technique.
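The partition-per-session pattern above can be sketched in Python. Threads stand in for concurrent SQL*Plus sessions, and the partition names and rows are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical month partitions, mirroring orders_jan1998, orders_feb1998, ...
PARTITIONS = {
    "orders_jan1998": [(1, 100), (2, 250)],
    "orders_feb1998": [(3, 80)],
}

def extract_partition(name):
    """Stand-in for one session spooling a single partition to its own file."""
    return name, list(PARTITIONS[name])

# One concurrent "session" per partition, as in the 12-session example above.
with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
    results = dict(pool.map(extract_partition, PARTITIONS))

total_rows = sum(len(v) for v in results.values())  # 3 rows across 2 partitions
```

Each worker produces one output independently, so the per-partition files can later be loaded in parallel or concatenated, just as described for the SQL*Plus spool files.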
Extracting into Flat Files Using OCI or Pro*C Programs
OCI programs (or other programs using Oracle call interfaces, such as Pro*C
programs), can also be used to extract data. These techniques typically provide improved
performance over the SQL*Plus approach, although they also require additional
programming. Like the SQL*Plus approach, an OCI program can extract the results of
any SQL query. Furthermore, the parallelization techniques described for the SQL*Plus
approach can be readily applied to OCI programs as well.
When using OCI or SQL*Plus for extraction, you need additional information
besides the data itself. At minimum, you need information about the extracted columns. It
is also helpful to know the extraction format, which might be the separator between
distinct columns.
Exporting into Export Files Using the Export Utility
The Export utility allows tables (including data) to be exported into Oracle Database
export files. Unlike the SQL*Plus and OCI approaches, which describe the extraction of
the results of a SQL statement, Export provides a mechanism for extracting database
objects. Thus, Export differs from the previous approaches in several important ways:
The export files contain metadata as well as data. An export file contains not only
the raw data of a table, but also information on how to re-create the table,
potentially including any indexes, constraints, grants, and other attributes
associated with that table.
A single export file may contain a subset of a single object, many database
objects, or even an entire schema.
Export cannot be directly used to export the results of a complex SQL query.
Export can be used only to extract subsets of distinct database objects.
The output of the Export utility must be processed using the Import utility.
Oracle provides the original Export and Import utilities for backward compatibility
and the Data Pump export/import infrastructure for high-performance, scalable, and
parallel extraction. See Oracle Database Utilities for further details.
Extracting into Export Files Using External Tables
In addition to the Export utility, you can use external tables to extract the results
of any SELECT operation. The data is stored in the platform-independent, Oracle-internal
data pump format and can be processed as a regular external table on the target
system. The following example extracts the result of a join operation in parallel into the
four specified files. The only allowed external table type for extracting data is the Oracle-
internal format ORACLE_DATAPUMP.
CREATE DIRECTORY def_dir AS '/net/dlsun48/private/hbaer/WORK/FEATURES/et';
DROP TABLE extract_cust;
CREATE TABLE extract_cust
ORGANIZATION EXTERNAL
(TYPE ORACLE_DATAPUMP DEFAULT DIRECTORY def_dir ACCESS
PARAMETERS
(NOBADFILE NOLOGFILE)
LOCATION ('extract_cust1.exp', 'extract_cust2.exp', 'extract_cust3.exp',
'extract_cust4.exp'))
PARALLEL 4 REJECT LIMIT UNLIMITED AS
SELECT c.*, co.country_name, co.country_subregion, co.country_region
FROM customers c, countries co where co.country_id=c.country_id;
The total number of extraction files specified limits the maximum degree of
parallelism for the write operation. Note that the parallelizing of the extraction does not
automatically parallelize the SELECT portion of the statement.
Unlike using any kind of export/import, the metadata for the external table is not
part of the created files when using the external table data pump unload. To extract the
appropriate metadata for the external table, use the DBMS_METADATA package, as
illustrated in the following statement:
SET LONG 2000
SELECT DBMS_METADATA.GET_DDL('TABLE','EXTRACT_CUST') FROM
DUAL;
Extraction Through Distributed Operations
Using distributed-query technology, one Oracle database can directly query tables
located in various different source systems, such as another Oracle database or a legacy
system connected with the Oracle gateway technology. Specifically, a data warehouse or
staging database can directly access tables and data located in a connected source system.
Gateways are another form of distributed-query technology. Gateways allow an Oracle
database (such as a data warehouse) to access database tables stored in remote, non-
Oracle databases. This is the simplest method for moving data between two Oracle
databases because it combines the extraction and transformation into a single step, and
requires minimal programming. However, this is not always feasible.
Suppose that you want to extract a list of US cities, together with their country
name, from a source database and store this data in the data warehouse. Using an
Oracle Net connection and distributed-query technology, this can be achieved with a
single SQL statement:
CREATE TABLE country_city AS SELECT distinct t1.country_name, t2.cust_city
FROM countries@source_db t1, customers@source_db t2
WHERE t1.country_id = t2.country_id
AND t1.country_name='United States of America';
This statement creates a local table in a data mart, country_city, and populates it
with data from the countries and customers tables on the source system.
This technique is ideal for moving small volumes of data. However, the data is
transported from the source system to the data warehouse through a single Oracle Net
connection. Thus, the scalability of this technique is limited. For larger data volumes,
file-based data extraction and transportation techniques are often more scalable and thus
more appropriate.
Q.4 Discuss the needs of developing OLAP tools in detail?
MOLAP or ROLAP
OLAP tools take you a step beyond query and reporting tools. Via OLAP tools,
data is represented using a multidimensional model rather than the more traditional
tabular data model. The traditional model defines a database schema that focuses on
modeling a process or function, and the information is viewed as a set of transactions,
each of which occurred at a single point in time. The multidimensional model usually
defines a star schema, viewing data not as a single event but rather as the cumulative
effect of events over some period of time, such as weeks, then months, then years. With
OLAP tools, the user generally views the data in grids or crosstabs that can be pivoted to
offer different perspectives on the data. OLAP also enables interactive querying of the
data. For example, a user can look at information at one aggregation (such as a sales
region) and then drill down to more detail information, such as sales by state, then city,
then store.
OLAP tools do not indicate how the data is actually stored. Given that, it’s not
surprising that there are multiple ways to store the data, including storing the data in a
dedicated multidimensional database (also referred to as MOLAP or MDD). Examples
include Arbor Software's Essbase and Oracle Express Server. The other choice involves
storing the data in relational databases and having an OLAP tool work directly against the
data, referred to as relational OLAP (also referred to as ROLAP or RDBMS). Examples
include MicroStrategy’s DSS server and related products, Informix’s Informix-
MetaCube, Information Advantage’s Decision Suite, and Platinum Technologies’
Platinum InfoBeacon. (Some also include Red Brick's Warehouse in this category, but it
isn't really an OLAP tool. Rather, it is a relational database optimized for performing the
types of operations that ROLAP tools need.)
ROLAP versus MOLAP

Relational OLAP (ROLAP)                 Multidimensional OLAP (MOLAP)
Scale to terabytes                      Under 50 GB capacity
Managing of summary tables/indexes      Instant response
Platform portability                    Easier to implement
SMP and MPP                             SMP only
Secure                                  Integrated meta data
Proven technology
Data modeling required
Data warehouses can be implemented on standard or extended relational DBMSs,
called relational OLAP (ROLAP) servers. These servers assume that data is stored in
relational databases, and they support extensions to SQL and special access and
implementation methods to efficiently implement the multidimensional data model and
operations. In contrast, multidimensional OLAP (MOLAP) servers are servers that
directly store multidimensional data in special data structures (such as arrays or cubes)
and implement OLAP operations over these data in free-form fashion (free-form within the
framework of the DBMS that holds the multidimensional data). MOLAP servers have
sparsely populated matrices, numeric data, and a rigid structure of data once the data
enters the MOLAP DBMS framework.
Relational Databases
ROLAP servers contain both numeric and textual data, serving a much wider
purpose than their MOLAP counterparts. Unlike MOLAP DBMSs (supported by
specialized database management systems), ROLAP DBMSs (or RDBMSs) are
supported by relational technology. RDBMSs support numeric, textual, spatial, audio,
graphic, and video data, general-purpose DSS analysis, freely structured data, numerous
indexes, and star schemas. ROLAP servers can have both disciplined and ad hoc usage
and can contain both detailed and summarized data.
ROLAP supports large databases while enabling good performance, platform
portability, exploitation of hardware advances such as parallel processing, robust
security, multi-user concurrent access (including read-write with locking), recognized
standards, and openness to multiple vendor’s tools. ROLAP is based on familiar, proven,
and already selected technologies.
ROLAP tools take advantage of parallel RDBMSs for those parts of the
application processed using SQL (SQL not being a multidimensional access or
processing language). So, although it is always possible to store multidimensional data in
a number of relational tables (the star schema), SQL does not, by itself, support
multidimensional manipulation or calculations. Therefore, ROLAP products must do
these calculations either in the client software or in an intermediate server engine. Note,
however, that Informix has integrated the ROLAP calculation engine into the RDBMS,
effectively mitigating the above disadvantage.
Multidimensional Databases
MDDs deliver impressive query performance by pre-calculating or pre-
consolidating transactional data rather than calculating on-the-fly. (MDDs pre-calculate
and store every measure at every hierarchy summary level at load time and store them in
efficiently indexed cells for immediate retrieval.) However, to fully preconsolidate
incoming data, MDDs require an enormous amount of overhead both in processing time
and in storage. An input file of 200MB can easily expand to 5GB; obviously, a file this
size takes many minutes to load and consolidate. As a result, MDDs do not scale, making
them a lackluster choice for the enterprise atomic-level data in the data warehouse.
However, MDDs are great candidates for the <50GB department data marts.
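The pre-consolidation described above can be sketched in Python with a toy month-to-quarter-to-year hierarchy; all names and figures are hypothetical:

```python
# Hypothetical calendar hierarchy mapping months into quarters.
MONTH_TO_QUARTER = {"jan": "Q1", "feb": "Q1", "mar": "Q1", "apr": "Q2"}

def load_and_consolidate(monthly_sales):
    """Pre-compute the measure at every summary level at load time,
    so queries read pre-calculated cells instead of aggregating on the fly.

    monthly_sales: dict month -> sales for one year.
    Returns dict (level, member) -> value.
    """
    cells = {("month", m): v for m, v in monthly_sales.items()}
    for m, v in monthly_sales.items():
        q = MONTH_TO_QUARTER[m]
        cells[("quarter", q)] = cells.get(("quarter", q), 0) + v
    cells[("year", "1998")] = sum(monthly_sales.values())
    return cells

cube = load_and_consolidate({"jan": 10, "feb": 20, "apr": 5})
# cube[("quarter", "Q1")] == 30; retrieval is a lookup, with no aggregation at query time
```

The trade-off the text describes is visible even here: every extra hierarchy level multiplies the number of stored cells, which is why fully pre-consolidated MDDs expand so much on load.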
To manage large amounts of data, MDD servers aggregate data along hierarchies.
Not only do hierarchies provide a mechanism for aggregating data, they also provide a
technique for navigation. The ability to navigate data by zooming in and out of detail is
key. With MDDs, application design is essentially the definition of dimensions and
calculation rules, while the RDBMS requires that the database schema be a star or
snowflake. With MDDs, for example, it is common to see the structure of time separated
from the repetition of time. One dimension may be the structure of a year: month, quarter,
half-year, and year. A separate dimension might be different years: 1996, 1997, and so
on. Adding a new year to the MDD simply means adding a new member to the calendar
dimension. Adding a new year to an RDBMS usually requires that each month, quarter,
half-year and year also be added.
In General
Usually, a scalable, parallel database is used for the large, atomic, organizationally
structured data warehouse, and subsets or summarized data from the warehouse are
extracted and replicated to proprietary MDDs. Because MDD vendors have enabled drill-
through features, when a user reaches the limit of what is actually stored in the MDD and
seeks more detail data, he/she can drill through to the detail stored in the enterprise
database. However, the drill-through functionality usually requires creating views for
every possible query.
As relational database vendors incorporate sophisticated analytical
multidimensional features into their core database technology, the resulting capacity for
higher performance, scalability, and parallelism will enable more sophisticated analysis.
Proprietary database and nonintegrated relational OLAP query tool vendors will find it
difficult to compete with this integrated ROLAP solution.
Both storage methods have strengths and weaknesses -- the weaknesses, however,
are being rapidly addressed by the respective vendors. Currently, data warehouses are
predominantly built using RDBMSs. If you have a warehouse built on a relational
database and you want to perform OLAP analysis against it, ROLAP is a natural fit. This
isn’t to say that MDDs can’t be a part of your data warehouse solution. It’s just that
MDDs aren’t currently well-suited for large volumes of data (10-50GB is fine, but
anything over 50GB is stretching their capabilities). If you really want the functionality
benefits that come with MDDs, consider subsetting the data into smaller MDD-based data
marts.
When deciding which technology to go for, consider:
1) Performance: How fast will the system appear to the end-user? MDD server vendors
believe this is a key point in their favor. MDD server databases typically contain indexes
that provide direct access to the data, making MDD servers quicker when trying to solve
a multidimensional business problem. However, MDDs have significant performance
differences due to the differing ability of data models to be held in memory, sparsity
handling, and use of data compression. And, the relational database vendors argue that
they have developed performance improvement techniques, such as IBM’s DB2 Starburst
optimizer and Red Brick’s Warehouse VPT STARindex capabilities. (Before you use
performance as an objective measure for selecting an OLAP server, remember that OLAP
systems are about effectiveness (how to make better decisions), not efficiency (how to
make faster decisions).)
2) Data volume and scalability: While MDD servers can handle up to 50GB of storage,
RDBMS servers can handle hundreds of gigabytes and terabytes. And, although MDD
servers can require up to 50% less disk space than relational databases to store the same
amount of data (because of relational indexes and overhead), relational databases have
more capacity. MDD advocates believe that you should perform multidimensional
modeling on summary, not detail, information, thus mitigating the need for large
databases.
In addition to performance, data volume, and scalability, you should consider which
architecture better supports systems management and data distribution, which vendors
have a better user interface and functionality, which architecture is easier to understand,
which architecture better handles aggregation and complex calculations, and your
perception of open versus proprietary architectures. Besides these issues, you must also
consider which architecture will be a more strategic technology. In fact, MDD servers
and RDBMS products can be used together -- one for fast responses, the other for access to
large databases.
IF:
A. You require write access for "what if?" analysis
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. You don't have DBA or data modeler personnel
E. You're developing a general-purpose application for inventory movement or asset management
THEN: Consider an MDD solution for your data mart (such as Oracle Express, Arbor's Essbase, or Pilot's Lightship).
IF:
A. Your data is over 100 GB
B. You have a "read-only" requirement
THEN: Consider an RDBMS for your data mart.
IF:
A. Your data is over 1 TB
B. You need data mining at a detail level
THEN: Consider an MPP hardware platform, such as IBM's SP, with the DB2 RDBMS.
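The rules of thumb above can be sketched as a small decision function. This is an illustrative encoding only; the thresholds and product names come from the text, but the function itself is not part of any vendor's tooling.

```python
# A rough sketch encoding the rule-of-thumb decision guide above.
# Thresholds (50 GB, 100 GB, 1 TB) are those quoted in the text.

def recommend_platform(data_gb: float, needs_write: bool = False,
                       needs_detail_mining: bool = False) -> str:
    """Suggest a storage technology for a data mart by data volume."""
    if data_gb > 1000 or needs_detail_mining:
        return "MPP hardware with an RDBMS (e.g. IBM's SP with DB2)"
    if data_gb > 100 and not needs_write:
        return "RDBMS-based data mart (ROLAP)"
    if data_gb <= 50 and needs_write:
        return "MDD server (MOLAP), e.g. Essbase or Oracle Express"
    return "Either; weigh MDD functionality against RDBMS capacity"

print(recommend_platform(30, needs_write=True))
```

A 30 GB mart needing "what if?" write access lands on the MDD branch, while anything past a terabyte or needing detail-level mining falls through to the MPP recommendation.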
If you’ve decided to build a data mart using an MDD, you don’t need a data
modeler. Rather, you need an MDD data mart application builder who will design the
business model (identifying dimensions and defining business measures based on the
source systems identified).
Prior to building separate stovepipe data marts, understand that at some point you
will need to: 1) integrate and consolidate these data marts at the detail enterprise level; 2)
load the MDD data marts; and 3) drill through from the data marts to the detail. Note that
your data mart may outgrow the storage limitations of an MDD, creating the need for an
RDBMS (in turn, requiring data modeling similar to constructing the detailed, atomic
enterprise-level RDBMS).
Q.5 What do you understand by the term statistical analysis? Discuss the most
important statistical techniques.
Data mining is a relatively new data analysis technique. It is very different from
query and reporting and multidimensional analysis in that it uses what is called a
discovery technique. That is, you do not ask a particular question of the data but rather
use specific algorithms that analyze the data and report what they have discovered.
Unlike query and reporting and multidimensional analysis where the user has to create
and execute queries based on hypotheses, data mining searches for answers to questions
that may have not been previously asked. This discovery could take the form of finding
significance in relationships between certain data elements, a clustering together of
specific data elements, or other patterns in the usage of specific sets of data elements.
After finding these patterns, the algorithms can infer rules. These rules can then be used
to generate a model that can predict a desired behavior, identify relationships among the
data, discover patterns, and group clusters of records with similar attributes.
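The "group clusters of records with similar attributes" idea can be illustrated with a toy example. This is a sketch only -- the data and the naive one-dimensional k-means below are invented for illustration; real data mining tools use far richer algorithms.

```python
# Toy illustration of discovery-style clustering: the algorithm is
# not asked a question, it simply groups similar records together.
# Naive 1-D k-means; invented monthly-spend figures.
import random

def kmeans_1d(values, k=2, iters=20):
    """Cluster scalar values into k groups by nearest centroid."""
    centroids = random.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Recompute each centroid as its cluster's mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# Customer monthly spend: two obvious behavioural groups emerge
spend = [20, 22, 25, 19, 480, 510, 495]
random.seed(1)
groups = kmeans_1d(spend, k=2)
```

Without being told what to look for, the algorithm separates low spenders from high spenders -- the kind of discovered grouping from which a mining tool would then infer rules.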
Data mining is most typically used for statistical data analysis and knowledge discovery.
Statistical data analysis detects unusual patterns in data and applies statistical and
mathematical modeling techniques to explain the patterns. The models are then used to
forecast and predict. Types of statistical data analysis techniques include linear and
nonlinear analysis, regression analysis, multivariate analysis, and time series analysis.
Knowledge discovery extracts implicit, previously unknown information from the data.
This often results in uncovering unknown business facts.
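One of the techniques named above, regression analysis, can be sketched in a few lines. The sales figures are invented for illustration; this is ordinary least squares fitted by hand, not a production forecasting method.

```python
# Minimal sketch of simple linear regression by ordinary least
# squares, then used to forecast the next period. Data is invented.
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

quarters = [1, 2, 3, 4]
sales = [100, 110, 120, 130]     # perfectly linear toy series
slope, intercept = fit_line(quarters, sales)
forecast_q5 = slope * 5 + intercept  # -> 140.0
```

Fitting a model to observed history and extrapolating it forward is exactly the "explain the patterns, then forecast and predict" cycle described above.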
Data mining is data driven (see Figure 4 on page 13). There is a high level of complexity
in stored data and data interrelations in the data warehouse that are difficult to discover
without data mining. Data mining offers new insights into the business that may not be
discovered with query and reporting or multidimensional analysis. Data mining can help
discover new insights about the business by giving us answers to questions we might
never have thought to ask.
Even within the scope of your data warehouse project, when mining data you want to
define a data scope, or possibly multiple data scopes. Because patterns are based on
various forms of statistical analysis, you must define a scope in which a statistically
significant pattern is likely to emerge. For example, buying patterns that show different
products being purchased together may differ greatly in different geographical locations.
To simply lump all of the data together may hide all of the patterns that exist in each
location. Of course, by imposing such a scope you are defining some, though not all, of
the business rules. It is therefore important that data scoping be done in concert with
someone knowledgeable in both the business and in statistical analysis so that artificial
patterns are not imposed and real patterns are not lost.
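The scoping point -- that lumping all locations together can hide a pattern that is strong within one location -- can be shown with a toy market-basket measure. The baskets and counts below are invented for illustration.

```python
# Sketch of the data-scoping point above: an association that looks
# weak in aggregate can be strong within one location. Invented data.
baskets = [
    ("east", {"tea", "biscuits"}),
    ("east", {"tea", "biscuits"}),
    ("east", {"coffee"}),
    ("west", {"coffee", "cream"}),
    ("west", {"coffee", "cream"}),
    ("west", {"tea"}),
]

def support(pairs, item_a, item_b):
    """Fraction of baskets containing both items."""
    hits = sum(1 for _, b in pairs if item_a in b and item_b in b)
    return hits / len(pairs)

overall = support(baskets, "tea", "biscuits")       # 2/6, looks weak
east_only = support([p for p in baskets if p[0] == "east"],
                    "tea", "biscuits")              # 2/3, clearly strong
```

Scoping the analysis to the east region raises the tea-and-biscuits support from about a third to two-thirds -- the pattern was there all along but diluted by the aggregate.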
Data architecture modeling and advanced modeling techniques, such as those suitable for
multimedia databases and statistical databases, are beyond the scope of this discussion.
Q.6 What are the methods for determining executive needs?
Implementing an Executive Information System (EIS)
An EIS is a tool that provides direct on-line access to relevant information about aspects
of a business that are of particular interest to the senior manager.
Contents of EIS
A general answer to the question of what data is appropriate for inclusion in an
Executive Information System is "whatever is interesting to executives." While this
advice is rather simplistic, it does reflect the variety of systems currently in use.
Executive Information Systems in government have been constructed to track data about
Ministerial correspondence, case management, worker productivity, finances, and human
resources to name only a few. Other sectors use EIS implementations to monitor
information about competitors in the news media and databases of public information in
addition to the traditional revenue, cost, volume, sales, market share and quality
applications.
Frequently, EIS implementations begin with just a few measures that are clearly
of interest to senior managers, and then expand in response to questions asked by those
managers as they use the system. Over time, the presentation of this information becomes
stale, and the information diverges from what is strategically important for the
organization. A "Critical Success Factors" approach is recommended by many
management theorists (Daniel, 1961; Crockett, 1992; Watson and Frolick, 1992).
Practitioners such as Vandenbosch (1993) found that:
While our efforts usually met with initial success, we often found that after six
months to a year, executives were almost as bored with the new information as they had
been with the old. A strategy we developed to rectify this problem required organizations
to create a report of the month. That is, in addition to the regular information provided for
management committee meetings, the CEO was charged with selecting a different
indicator to focus on each month (Vandenbosch, 1993, pp. 8-9).
While the above indicates that selection of data for inclusion in an EIS is difficult,
there are several guidelines that help to make that assessment. A practical set of
principles to guide the design of measures and indicators to be included in an EIS is
presented below (Kelly, 1992b). For a more detailed discussion of methods for selecting
measures that reflect organizational objectives, see the section "EIS and Organizational
Objectives."
EIS measures must be easy to understand and collect. Wherever possible, data
should be collected naturally as part of the process of work. An EIS should not add
substantially to the workload of managers or staff.
EIS measures must be based on a balanced view of the organization's objectives.
Data in the system should reflect the objectives of the organization in the areas of
productivity, resource management, quality and customer service.
Performance indicators in an EIS must reflect everyone's contribution in a fair and
consistent manner. Indicators should be as independent as possible from variables outside
the control of managers.
EIS measures must encourage management and staff to share ownership of the
organization's objectives. Performance indicators must promote both teamwork and
friendly competition. Measures must be meaningful for all staff; people must feel that
they, as individuals, can contribute to improving the performance of the organization.
EIS information must be available to everyone in the organization. The objective
is to provide everyone with useful information about the organization's performance.
Information that must remain confidential should not be part of the EIS or the
management system of the organization.
EIS measures must evolve to meet the changing needs of the organization.
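The measure principles above might be captured as a small data structure so that each indicator records its objective area, owner, and target. This is a hypothetical sketch with invented names, not part of any actual EIS product.

```python
# Hypothetical sketch: one EIS measure as a record, reflecting the
# principles above (balanced objective areas, clear ownership,
# a target against which progress is judged). Names are invented.
from dataclasses import dataclass

@dataclass
class EISMeasure:
    name: str
    objective_area: str   # productivity, resources, quality, service
    target: float
    owner: str            # who naturally collects the data

    def status(self, actual: float) -> str:
        """Flag the measure for the executive dashboard."""
        return "on track" if actual >= self.target else "needs attention"

delivery = EISMeasure("On-time delivery %", "customer service",
                      95.0, "Operations")
print(delivery.status(92.5))
```

Because the measure is data rather than code, it can be revised or replaced as organizational priorities shift, in line with the final principle that EIS measures must evolve.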
Barriers to Effectiveness
There are many ways in which an EIS can fail. Dozens of high profile, high cost
EIS projects have been cancelled, implemented and rarely used, or implemented and used
with negative results. An EIS is a high risk project precisely because it is intended for use
by the most powerful people in an organization. Senior managers can easily misuse the
information in the system with strongly detrimental effects on the organization. Senior
managers can refuse to use a system if it does not respond to their immediate personal
needs or is too difficult to learn and use.
Unproductive Organizational Behaviour Norms
Issues of organizational behaviour and culture are perhaps the most deadly
barriers to effective Executive Information Systems. Because an EIS is typically
positioned at the top of an organization, it can create powerful learning experiences and
lead to drastic changes in organizational direction. However, there is also great potential
for misuse of the information. Green, Higgins and Irving (1988) found that performance
monitoring can promote bureaucratic and unproductive behaviour, can unduly focus
organizational attention to the point where other important aspects are ignored, and can
have a strongly negative impact on morale.
Technical Excellence
An interesting result from the Vandenbosch & Huff (1988) study was that the
technical excellence of an EIS has an inverse relationship with effectiveness. Systems
that are technical masterpieces tend to be inflexible, and thus discourage innovation,
experimentation and mental model development.
Flexibility is important because an EIS has such a powerful ability to direct
attention to specific issues in an organization. A technical masterpiece may accurately
direct management attention when the system is first implemented, but continue to direct
attention, on its first anniversary, to issues that were important a year earlier. There is
substantial danger that the exploration of issues necessary for managerial learning will be
limited to those subjects that were important when the EIS was first developed. Managers
must understand that as the organization and its work changes, an EIS must continually
be updated to address the strategic issues of the day.
A number of explanations as to why technical masterpieces tend to be less
flexible are possible. Developers who create a masterpiece EIS may become attached to
the system and consciously or unconsciously dissuade managers from asking for changes.
Managers who are uncertain that the benefits outweigh the initial cost of a masterpiece
EIS may not want to spend more on system maintenance and improvements. The time
required to create a masterpiece EIS may mean that it is outdated before it is
implemented.
While usability and response time are important factors in determining whether
executives will use a system, cost and flexibility are paramount. A senior manager will be
more accepting of an inexpensive system that provides 20% of the needed information
within a month or two than of an expensive system that provides 80% of the needed
information after a year of development. The manager may also find that the inexpensive
system is easier to change and adapt to the evolving needs of the business. Changing a
large system would involve throwing away parts of a substantial investment. Changing
the inexpensive system means losing a few weeks of work. As a result, fast, cheap,
incremental approaches to developing an EIS increase the chance of success.
Methodology
Implementation of an effective EIS requires clear consensus on the objectives and
measures to be monitored in the system and a plan for obtaining the data on which those
measures are based. The sections below outline a methodology for achieving these two
results. As noted earlier, successful EIS implementations generally begin with a simple
prototype rather than a detailed planning process. For that reason, the proposed planning
methodologies are as simple and scope-limited as possible.
EIS Project Team
The process of establishing organizational objectives and measures is intimately
linked with the task of locating relevant data in existing computer systems to support
those measures. Objectives must be specific and measurable, and data availability is
critical to measuring progress against objectives.
Since there is little use in defining measures for which data is not available, it is
recommended that an EIS project team including technical staff be established at the
outset. This cross-functional team can provide early warning if data is not available to
support objectives or if senior managers' expectations for the system are impractical.
A preliminary EIS project team might consist of as few as three people. An EIS
Project Leader organizes and directs the project. An Executive Sponsor promotes the
project in the organization, contributes senior management requirements on behalf of the
senior management team, and reviews project progress regularly. A Technical Leader
participates in requirements gathering, reviewing plans, and ensuring technical feasibility
of all proposals during EIS definition.
As the focus of the project becomes more technical, the EIS project team may be
complemented by additional technical staff who will be directly involved in extracting
data from legacy systems and constructing the EIS data repository and user interface.
Establishing Measures & EIS Requirements
Most organizations have a number of high-level objectives and direction
statements that help to shape organizational behaviour and priorities. In many cases,
however, these direction statements have not yet been linked to performance measures
and targets. As well, senior managers may have other critical information requirements
that would not be reflected in a simple analysis of existing direction statements.
Therefore it is essential that EIS requirements be derived directly from interaction with
the senior managers who will use the systems. It is also essential that practical measures
of progress towards organizational objectives be established during these interactions.
Measures and EIS requirements are best established through a three-stage process.
First, the EIS team solicits the input of the most senior executives in the organization in
order to establish a broad, top-down perspective on EIS requirements. Second, interviews
are conducted with the managers who will be most directly involved in the collection,
analysis, and monitoring of data in the system to assess bottom-up requirements. Third, a
summary of results and recommendations is presented to senior executives and
operational managers in a workshop where final decisions are made.
Interview Format
The focus of the interviews would be to establish all of the measures managers
require in the EIS. Questions would include the following:
What are the five most important pieces of information you need to do your job?
What expectations does the Board of Directors have for you?
What results do you think the general public expects you to accomplish?
On what basis would consumers and customers judge your effectiveness?
What expectations do other stakeholders impose on you?
What is it that you have to accomplish in your current position?
Senior Management Workshop