Data Warehouse

SOFT COPY OF THE SEMINAR TOPIC ON “DATA WAREHOUSE” SUBMITTED BY: IQxplorer

Upload: nayakslideshare

Post on 11-May-2015



Page 1: Data Warehouse


Page 2: Data Warehouse
Page 3: Data Warehouse

What is a Data Warehouse?

A data warehouse is a repository of information gathered from multiple sources, stored under a unified schema at a single site.

The data warehouse is a relational database organised to hold information in a structure that best supports reporting and analysis.

Page 4: Data Warehouse

Characteristics of a Data Warehouse:

The concept of a data warehouse given by Bill Inmon, the father of data warehousing, is depicted in the figure below:

Subject Orientation.

Time variance.

Non-Volatile.

Integrated.

Page 5: Data Warehouse
Page 6: Data Warehouse

Architecture:

A Data Warehouse Architecture (DWA) is a way of representing the overall structure of data, communication, processing and presentation that exists for end-user computing within the enterprise.

The architecture of a data warehouse is as follows:

Page 7: Data Warehouse
Page 8: Data Warehouse

Load Manager:

Data flows into the data warehouse through the “load manager”. The data is extracted from the operational databases and supplemented by data imported from external sources.

Page 9: Data Warehouse

Query Manager:

It provides an interface between the warehouse and its users. It performs tasks such as directing queries to the appropriate tables, monitoring the effectiveness of the indexes and summary data, and scheduling queries.
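As a rough illustration of "directing queries to the appropriate tables", the sketch below sends a grouped query to a precomputed summary table when one matches the requested grouping, and to the detail table otherwise. The table names and the `route_query` function are hypothetical and not taken from the slides; a real query manager would be considerably more involved.

```python
# Hypothetical catalogue of summary tables, keyed by the set of columns
# they are pre-aggregated on. All names here are illustrative assumptions.
SUMMARY_TABLES = {
    frozenset({"region", "month"}): "sales_by_region_month",
    frozenset({"product", "month"}): "sales_by_product_month",
}
DETAIL_TABLE = "sales_detail"


def route_query(group_by):
    """Return the name of the table a query grouped by `group_by` should read."""
    # Use a summary table only when its grain matches the request exactly;
    # otherwise fall back to the detail rows.
    return SUMMARY_TABLES.get(frozenset(group_by), DETAIL_TABLE)
```

Calling `route_query(["month", "region"])` picks the region/month summary table, while a grouping with no matching summary, such as `["customer"]`, falls back to the detail table.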

Page 10: Data Warehouse

The load manager primarily performs an extract, transform, load (ETL) operation:

Data extraction.

Data transformation.

Data loading.
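The three steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the load manager of any particular product: the source rows, the `orders` table and the cleaning rules are all invented for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw rows, standing in for an export from an operational system.
SOURCE_ROWS = [
    {"order_id": 1, "customer": " alice ", "amount": "19.90"},
    {"order_id": 2, "customer": "BOB", "amount": "5.00"},
    {"order_id": 3, "customer": "", "amount": "7.50"},  # missing customer
]


def extract():
    """Data extraction: read the raw rows from the source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Data transformation: drop incomplete rows and standardise the rest."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip()
        if not name:
            continue  # reject rows without a customer name
        cleaned.append((row["order_id"], name.title(), float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Data loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn
```

Keeping the three steps as separate functions mirrors the list above and lets each step be tested on its own.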

Page 11: Data Warehouse

Components of a Data Warehouse:

The primary components of a data warehouse are:

Data Sources

Data Transformation

Reporting

Metadata

Operations

Optional Components

Page 12: Data Warehouse
Page 13: Data Warehouse

Data Sources:

Data sources are any electronic repositories of information from which data is passed to the data warehouse, either on a transaction-by-transaction basis (for real-time data warehouses) or on a regular cycle.

Data Transformation:

The Data Transformation layer receives data from the data sources, cleans and standardizes it, and loads it into the data repository.

Data Warehouse:

The data warehouse is a relational database organized to hold information in a structure that best supports reporting and analysis.

Page 14: Data Warehouse

Reporting:

The data in the data warehouse must be available to all the users if the data warehouse is to be useful.

Metadata:

Metadata, or "data about data", is used to inform users of the data warehouse about its status and the information held within it.

Operations:

Data warehouse operations comprise the processes of loading, manipulating and extracting data from the data warehouse. Operations also cover user management, security, capacity management and related functions.

Page 15: Data Warehouse

Optional Components:

In addition, the following components also exist in some data warehouses:

1. Dependent Data Marts: A dependent data mart is a physical database (either on the same hardware as the data warehouse or on a separate hardware platform) that receives all its information from the data warehouse.

2. Logical Data Marts: A logical data mart is a filtered view of the main data warehouse but does not physically exist as a separate data copy.

3. Operational Data Store: An ODS is an integrated database of operational data. Its sources include legacy systems, and it contains current or near-term data.
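The logical data mart in item 2 maps naturally onto a database view. The sketch below assumes a hypothetical `sales` table and a view named `north_sales_mart`, with SQLite standing in for the warehouse; it only illustrates the idea that the mart filters the warehouse without copying any rows.

```python
import sqlite3


def build_warehouse():
    """Create a tiny warehouse table plus a logical data mart defined as a view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("north", "widget", 10.0),
            ("south", "widget", 4.0),
            ("north", "gadget", 6.0),
        ],
    )
    # The mart is only a stored query over `sales`; no data is duplicated.
    conn.execute(
        "CREATE VIEW north_sales_mart AS "
        "SELECT product, amount FROM sales WHERE region = 'north'"
    )
    return conn
```

Because the view is evaluated against the warehouse table at query time, rows loaded into `sales` later appear in the mart immediately. A dependent data mart, by contrast, would need its own physical table and a refresh step.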

Page 16: Data Warehouse

Design of a Data Warehouse:

The key considerations involved in the design of a data warehouse are:

Time span.

Granularity.

Dimensionality.

Aggregations.

Partitioning.

Page 17: Data Warehouse

Methods of Storing Data in a Data Warehouse:

The general principle used in the majority of data warehouses is that data is stored at its most elemental level for use in reporting and information analysis.

There are two primary approaches to organising the data in a data warehouse:

Dimensional approach: Here, information is stored as "facts", which are numeric or text data that capture specific data about a single transaction or event, and "dimensions", which contain reference information that allows each transaction or event to be classified in various ways.
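A small star schema makes the fact/dimension split concrete. The sketch below is an assumption-laden example rather than a schema from the slides: the tables `fact_sales`, `dim_product` and `dim_date`, and all the sample rows, are invented, and SQLite is used only because it ships with Python.

```python
import sqlite3


def build_star_schema():
    """Create one fact table and two dimension tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            quantity INTEGER,
            revenue REAL);
        INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware');
        INSERT INTO dim_product VALUES (2, 'Manual', 'Books');
        INSERT INTO dim_date VALUES (1, 2015, 4);
        INSERT INTO dim_date VALUES (2, 2015, 5);
        INSERT INTO fact_sales VALUES (1, 1, 2, 20.0);
        INSERT INTO fact_sales VALUES (1, 2, 1, 10.0);
        INSERT INTO fact_sales VALUES (2, 2, 3, 15.0);
        """
    )
    return conn


def revenue_by_category(conn):
    """Classify each sale by product category and total the revenue."""
    return conn.execute(
        "SELECT p.category, SUM(f.revenue) FROM fact_sales f "
        "JOIN dim_product p ON p.product_id = f.product_id "
        "GROUP BY p.category ORDER BY p.category"
    ).fetchall()
```

Each row of `fact_sales` records one event, and the same facts can be regrouped by month or year simply by joining to `dim_date` instead of `dim_product`.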

Page 18: Data Warehouse

Database normalization: In this style, the data in the data warehouse is stored in third normal form.

The main advantage of this approach is that it is quite straightforward to add new information into the database, while the primary disadvantage of this approach is that it can be quite slow to produce information and reports.
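The trade-off can be seen in a toy normalized schema. The tables and rows below are hypothetical and chosen only to show the shape of the problem: a change such as renaming a city touches a single row, while even a simple report has to join several tables.

```python
import sqlite3


def build_normalized():
    """Create a small normalized schema with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY, name TEXT,
            city_id INTEGER REFERENCES city(city_id));
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id));
        CREATE TABLE order_line (
            order_id INTEGER REFERENCES orders(order_id), amount REAL);
        INSERT INTO city VALUES (1, 'Pune');
        INSERT INTO city VALUES (2, 'Delhi');
        INSERT INTO customer VALUES (1, 'Asha', 1);
        INSERT INTO customer VALUES (2, 'Ravi', 2);
        INSERT INTO orders VALUES (10, 1);
        INSERT INTO orders VALUES (11, 2);
        INSERT INTO orders VALUES (12, 1);
        INSERT INTO order_line VALUES (10, 5.0);
        INSERT INTO order_line VALUES (10, 7.0);
        INSERT INTO order_line VALUES (11, 3.0);
        INSERT INTO order_line VALUES (12, 1.0);
        """
    )
    return conn


def revenue_by_city(conn):
    """A single report that needs three joins across the normalized tables."""
    return conn.execute(
        "SELECT c.name, SUM(l.amount) FROM order_line l "
        "JOIN orders o ON o.order_id = l.order_id "
        "JOIN customer cu ON cu.customer_id = o.customer_id "
        "JOIN city c ON c.city_id = cu.city_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
```

On a handful of rows the joins cost nothing, but on warehouse-sized tables this join depth is a common reason why reports over normalized data run slowly.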

Page 19: Data Warehouse

Advantages of Using a Data Warehouse:

Enhances end-user access to a wide variety of data.

Increases data consistency.

Increases productivity and decreases computing costs.

Is able to combine data from different sources, in one place.

It provides an infrastructure that could support changes to data and replication of the changed data back into the operational systems.

Page 20: Data Warehouse

Concerns in Using a Data Warehouse:

Extracting, cleaning and loading data could be time consuming.

Compatibility problems with systems already in place, e.g. transaction processing systems.

Training must be provided to end-users, who may still end up not using the data warehouse.

Security could develop into a serious issue, especially if the data warehouse is web accessible.

Page 21: Data Warehouse

Future Developments:

Data Warehousing is such a new field that it is difficult to estimate what new developments are likely to most affect it. Clearly, the development of parallel DB servers with improved query engines is likely to be one of the most important. Parallel servers will make it possible to access huge databases in much less time.

Page 22: Data Warehouse

Conclusion:

Data Warehousing is not a new phenomenon. All large organizations already have data warehouses, but they are just not managing them. Over the next few years, the growth of data warehousing is going to be enormous with new products and technologies coming out frequently. In order to get the most out of this period, it is going to be important that data warehouse planners and developers have a clear idea of what they are looking for and then choose strategies and methods that will provide them with performance today and flexibility for tomorrow.

Page 23: Data Warehouse