data warehouse

Post on 12-Jul-2015

DATA WAREHOUSE

Data Warehouse

• Pool of data to support decision making

• Structured to be available in ready-to-use form

• Subject oriented

• Integrated

• Time-variant

• Nonvolatile

• Additional characteristics:

1. Web based

2. Relational/multidimensional

3. Client/server

4. Real time

5. Includes metadata

Types of Data Warehouse

Data Mart

• Dependent

– Created from warehouse

– Replicated

– Functional subset of warehouse

• Independent

– Scaled down, less expensive version of data warehouse

– Designed for a department or SBU

– Organization may have multiple data marts

– Difficult to integrate

• Operational data stores: Provide a fairly recent form of customer information file (CIF)

• Enterprise data warehouses: Used across the enterprise for decision support

• Metadata: Describe the structure and meaning of data, contributing to their effective use
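The dependent data mart described above (created from the warehouse as a replicated functional subset) can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the table names `warehouse_sales` and `marketing_mart`, the columns, and the sample rows are all hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database standing in for the enterprise warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales ("
    "sale_id INTEGER PRIMARY KEY, department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        (1, "marketing", "banner ad", 500.0),
        (2, "finance", "audit", 1200.0),
        (3, "marketing", "email campaign", 300.0),
    ],
)

# Dependent data mart: a replicated copy of one department's subset,
# created directly from the warehouse table.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT sale_id, product, amount FROM warehouse_sales "
    "WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT sale_id, product, amount FROM marketing_mart ORDER BY sale_id"
).fetchall()
print(mart_rows)
```

Because the mart is derived from the warehouse, its definitions stay consistent with it. An independent data mart would instead be loaded straight from source systems, which is one reason multiple independent marts are difficult to integrate.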

Data warehousing process overview

Major components

• Data sources

• Data extraction

• Data loading

• Comprehensive database

• Metadata

• Middleware tools

Data Warehousing Architectures

• May have one or more tiers

– Determined by warehouse, data acquisition (back end), and client (front end)

• One tier, where all run on same platform, is rare

• Two tier usually combines DSS engine (client) with warehouse

– More economical

• Three tier separates these functional parts

Architecture considerations

• Which DBMS to use?

• Parallel processing

• Partitioning

• Which data migration tools should be used?

• What tools for data retrieval and analysis?

Alternative Architectures for data warehousing

Architecture Selection Factors

• Information interdependence

• Senior management information needs

• Urgency for a DW

• Nature of end-user tasks

• Constraints on resources

• Strategic view

• Compatibility with existing systems

• Ability of in-house IT staff

• Technical and political factors

Enterprise Data Warehouse

Data Integration, Extraction And Load process

1. Data Integration

Comprises three major processes

• Data Access: ability to access & extract data from any data source

• Data federation: Integration of business views across multiple data stores

• Change capture: Based on the identification, capture, and delivery of the changes made to enterprise data sources
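Change capture can be illustrated with a small sketch. The version below assumes the simplest possible technique, comparing two snapshots of a source table keyed by primary key; real tools often read database logs or triggers instead. The function name `capture_changes` and the sample customer rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def capture_changes(
    previous: Dict[int, Row], current: Dict[int, Row]
) -> List[Tuple[str, int, Row]]:
    """Identify inserts, updates, and deletes between two snapshots."""
    changes: List[Tuple[str, int, Row]] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key, row in previous.items():
        if key not in current:
            changes.append(("delete", key, row))
    return changes


# Hypothetical snapshots of a customer table, keyed by customer id.
previous_snapshot = {
    1: {"name": "Asha", "city": "Pune"},
    2: {"name": "Ben", "city": "Leeds"},
    3: {"name": "Chen", "city": "Wuhan"},
}
current_snapshot = {
    1: {"name": "Asha", "city": "Mumbai"},
    2: {"name": "Ben", "city": "Leeds"},
    4: {"name": "Dana", "city": "Oslo"},
}

changes = capture_changes(previous_snapshot, current_snapshot)
for operation, key, row in changes:
    print(operation, key, row)
```

Only the three changed rows would need to be delivered downstream; the unchanged row for customer 2 is never moved.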

2. Extraction, Transformation and Load (ETL)

• An integral component of any data-centric project.

• ETL consists of:

– Extraction: reading data from all relevant sources

– Transformation: converting the extracted data into a form that can be placed in the data warehouse or another database

– Load: inserting the data into the data warehouse
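The three ETL steps can be sketched end to end in a few lines. This is a minimal illustration, not a production pipeline: the CSV text stands in for a source system, an in-memory SQLite database stands in for the warehouse, and the `orders` table, its columns, and the cleansing rule (skip rows with no amount) are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source would be a file or a database.
SOURCE_CSV = """order_id,customer,amount,order_date
1, alice ,120.50,2015-07-01
2,BOB,80,2015-07-02
3,carol,,2015-07-03
"""


def extract(text):
    """Extraction: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: cleanse and convert rows to the warehouse format."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # cleansing rule (assumed): drop rows with no amount
        cleaned.append(
            (
                int(row["order_id"]),
                row["customer"].strip().title(),
                float(row["amount"]),
                row["order_date"],
            )
        )
    return cleaned


def load(conn, rows):
    """Load: insert the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the separation in the list above and lets each step be tested on its own.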

ETL Process

[Figure: ETL process. Recoverable labels: packaged application, legacy system, other internal applications; transient data source; Extract, Transform, Cleanse, Load; data warehouse, data mart.]

Benefits of Data Warehouse

• Allows extensive analysis in numerous ways.

• A consolidated view of corporate data.

• Better and more timely information.

• Enhances system performance.

• Simplification of data access.

• Enhances business knowledge, improves customer service and satisfaction, and facilitates decision making.

Assignment

• Data warehousing vendors?

• Data warehousing case study found on the internet.

Data Warehouse development Approaches

The Inmon Model: The EDW Approach

• Emphasizes top-down development

• Employs established database development methodologies and tools

The Kimball Model: The Data Mart Approach

• Plan big, build small

• Subject oriented or department oriented

• Focus on the requests of a specific department.

Data Warehouse Structure (The Star Schema)

Star Schema

• Most important means of implementation of dimensional analysis

• Central fact table surrounded by dimension tables

• Grain – highest level of detail that is supported.

• Drill down – probing beyond a summarised value
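A star schema and a drill-down query can be sketched with Python's built-in sqlite3 module. Everything named here is an assumption for illustration: the fact table `fact_sales`, the dimension tables `dim_date` and `dim_product`, and the sample figures. The grain in this sketch is one fact row per product per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the context of each fact.
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Central fact table: one row per product per month (the grain).
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        units INTEGER,
        revenue REAL);

    INSERT INTO dim_date VALUES (1, 2015, 6), (2, 2015, 7);
    INSERT INTO dim_product VALUES
        (10, 'Laptop', 'Hardware'), (11, 'Mouse', 'Hardware');
    INSERT INTO fact_sales VALUES
        (1, 10, 2, 2000.0), (1, 11, 5, 100.0),
        (2, 10, 1, 1000.0), (2, 11, 4, 80.0);
    """
)

# Summarised value: total revenue per year.
yearly = conn.execute(
    "SELECT d.year, SUM(f.revenue) "
    "FROM fact_sales f JOIN dim_date d ON f.date_id = d.date_id "
    "GROUP BY d.year"
).fetchall()

# Drill down: the same measure broken out by month.
monthly = conn.execute(
    "SELECT d.year, d.month, SUM(f.revenue) "
    "FROM fact_sales f JOIN dim_date d ON f.date_id = d.date_id "
    "GROUP BY d.year, d.month ORDER BY d.year, d.month"
).fetchall()

print(yearly)
print(monthly)
```

Drilling down cannot go below the grain: with monthly fact rows, a daily breakdown is not available unless the fact table is rebuilt at a finer grain.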

DW – Implementation Issues

• Establishment of service-level agreements and data-refresh requirements.

• Identification of data sources and their governance policies.

• Data quality planning & model designing.

• ETL tool selection.

• Relational database software and platform selection.

• Data transport and data conversion.

• Reconciliation process

• End-user support

Issues in implementation of data warehouse

• Starting with the wrong sponsorship chain.

• Setting expectations that you cannot meet and frustrating executives at the moment of truth.

• Engaging in politically naive behavior.

• Loading the warehouse with information just because it is available.

• Believing that data warehousing database design is the same as transactional database design.

• Choosing a data warehouse manager who is technology oriented rather than user oriented

• Focusing on traditional internal record-oriented data and ignoring the value of external data and of text, images, and perhaps sound and video.

• Delivering data with overlapping and confusing definitions.

• Believing promises of performance, capacity, and scalability.

• Believing that your problems are over when the data warehouse is up and running.

Risks in Data Warehouse Projects

• No mission or objective

• Quality of source data unknown

• Skills not in place

• Inadequate budget

• Lack of supporting software

• Source data not understood

• Weak sponsor

• Users not computer literate

• Geographically distributed environment

• Unrealistic user expectations

• Architectural and design risks

• Scope creep and changing requirements

• Vendors out of control

• Multiple platforms

• Key people leaving project

• Loss of the sponsor

• Too much new technology

• Having to fix an operational system

• Team geography and language culture

Massive Data Warehouse And Scalability

• A data warehouse needs to be scalable.

• Good scalability means that the cost of queries and other data access functions grows (ideally linearly) with the size of the warehouse.

• Specialized methods have been developed to create scalable data warehouses.

• Scalability is difficult when managing hundreds of terabytes.

Issues pertaining to scalability

• The amount of data in the warehouse.

• How quickly the warehouse is expected to grow.

• The number of concurrent users.

• The complexity of user queries.

Real-Time Data warehousing

• Also known as active data warehousing.

• Process of loading and providing data via the data warehouse as they become available.

• Evolved from the EDW (Enterprise Data Warehousing) concept.

• Puts information-based decision making at users' fingertips.

• Positively affects almost all aspects of customer service, supply chain management (SCM), and logistics.

Comparison between Traditional And Active Data Warehousing Environment

Traditional Data Warehouse Environment

• Strategic decisions only

• Result sometimes hard to measure

• Moderate user concurrency

• Highly restrictive reporting used to confirm or check existing processes and patterns.

• Power users, knowledge workers, internal users.

Active Data Warehouse Environment

• Strategic and tactical decisions

• Results measured with operations

• High number of users accessing simultaneously

• Flexible ad hoc reporting, as well as machine-assisted modeling to discover new hypotheses.

• Operational staff, call centers, external users.

Data Warehouse Administration

• Because of its huge size, a data warehouse requires strong monitoring.

• A data warehouse administrator (DWA) should have the following qualities:

1. Should be familiar with high-performance software, hardware, and networking technologies.

2. Should be familiar with the decision-making process.

3. Must maintain the existing requirements and capabilities of the data warehouse.

4. Must possess excellent communication skills.

Data Warehouse Security issues

• Security and privacy of information is a significant concern.

• Companies must create effective and flexible security procedures.

• Effective security in a data warehouse focuses on:

1. Establishing effective corporate and security policies and procedures.

2. Implementing logical security procedures and techniques to restrict access.

3. Limiting physical access to the data center environment.

4. Establishing an effective internal control review process with an emphasis on security and privacy.
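One common form of logical security is restricting which columns each role may see. The sketch below is a hypothetical illustration of that idea only; the role names, column names, and the `restrict` helper are assumptions, and a real warehouse would normally enforce such rules inside the database (for example with views or grants) rather than in application code.

```python
# Hypothetical mapping from role to the columns that role may read.
ROLE_COLUMNS = {
    "analyst": {"region", "revenue"},
    "call_center": {"customer_name", "region"},
}


def restrict(row, role):
    """Return only the columns the given role is allowed to see.

    Unknown roles get an empty result (deny by default).
    """
    allowed = ROLE_COLUMNS.get(role, set())
    return {column: value for column, value in row.items() if column in allowed}


record = {
    "customer_name": "Asha",
    "region": "West",
    "revenue": 1200.0,
    "card_number": "placeholder-value",
}

analyst_view = restrict(record, "analyst")
guest_view = restrict(record, "guest")
print(analyst_view)
print(guest_view)
```

Denying by default means a newly added column or role exposes nothing until someone explicitly grants access, which fits the emphasis on policy and review in the list above.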
