data warehousing - synthesis.ipi.ac.rusynthesis.ipi.ac.ru/synthesis/student/bigdata/lectures/di-dw 8...

Data Warehousing ETL

Upload: hoangcong

Post on 21-May-2018

Outline

The ETL Process

General ETL issues

Building dimensions

Building fact tables

Extract

Transformations/cleansing

Load

IBM InfoSphere DataStage

ETL

When should we ETL?

Periodically (e.g., every night, every week) or after significant events

Refresh policy set by administrator based on user needs and traffic

Possibly different policies for different sources
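
A minimal sketch of per-source refresh policies, assuming each source simply gets its own refresh interval. The source names, the intervals, and the helpers `RefreshPolicy`, `is_due`, and `sources_to_refresh` are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RefreshPolicy:
    """How often one source is re-extracted (hypothetical structure)."""
    source: str
    interval: timedelta


# Hypothetical sources and intervals, chosen only for illustration.
POLICIES = {
    "orders_db": RefreshPolicy("orders_db", timedelta(days=1)),   # every night
    "hr_files": RefreshPolicy("hr_files", timedelta(weeks=1)),    # every week
}


def is_due(policy: RefreshPolicy, last_run: datetime, now: datetime) -> bool:
    """Return True when the source's refresh interval has elapsed."""
    return now - last_run >= policy.interval


def sources_to_refresh(last_runs: dict, now: datetime) -> list:
    """List (alphabetically) the sources whose policy calls for a new ETL run."""
    return sorted(
        name for name, policy in POLICIES.items()
        if is_due(policy, last_runs[name], now)
    )
```

An event-driven refresh ("after significant events") could be modeled by calling the ETL run directly from the event handler instead of waiting for `is_due`.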

ETL is used to integrate heterogeneous systems

With different DBMSs, operating systems, hardware, and communication protocols

ETL challenges

Getting the data from the source to target as fast as possible

Allowing recovery from failure without restarting the whole process
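
One common way to recover without restarting the whole process is to checkpoint completed steps. The sketch below assumes the ETL run is a list of named steps that can each be safely skipped once finished. The function `run_with_checkpoint` and the JSON checkpoint file are hypothetical and are not taken from the lecture or from any particular ETL tool.

```python
import json
import os


def run_with_checkpoint(steps, checkpoint_path):
    """Run (name, function) steps in order, skipping those already completed.

    The names of finished steps are written to a JSON file after each step,
    so a rerun after a failure resumes at the step that failed.
    Returns the names of the steps executed in this call.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    executed = []
    for name, func in steps:
        if name in done:
            continue  # completed in an earlier run
        func()  # an exception here leaves the last checkpoint untouched
        done.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
        executed.append(name)

    # The whole run succeeded, so the next run should start from scratch.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed
```

If the second of three steps raises an exception, the first step stays recorded in the checkpoint file, and calling `run_with_checkpoint` again with the same steps and path executes only the second and third steps.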

Data Integration

Schema Integration

Schema conflicts

IBM InfoSphere DataStage

data extractions (reads), data flows, data combinations, data transformations, data constraints, data aggregations, and data loads (writes)
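
The kinds of steps listed above can be sketched in plain Python as a chain of small stages. This is not DataStage code and does not use any DataStage API. The rows, the function names, and the dictionary standing in for the target table are all made up for illustration, and data combinations (joins) are left out to keep the sketch short.

```python
from collections import defaultdict

# Made-up source rows standing in for a real source system.
SOURCE_ROWS = [
    {"store": "north", "amount": "10.50"},
    {"store": "north", "amount": "4.50"},
    {"store": "south", "amount": "-3.00"},  # violates the constraint below
    {"store": "south", "amount": "7.00"},
]


def extract():
    """Extraction (read): yield raw rows from the source."""
    yield from SOURCE_ROWS


def transform(rows):
    """Transformation: convert the amount from text to a number."""
    for row in rows:
        yield {"store": row["store"], "amount": float(row["amount"])}


def constrain(rows):
    """Constraint: keep only rows with a non-negative amount."""
    return (row for row in rows if row["amount"] >= 0)


def aggregate(rows):
    """Aggregation: total amount per store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


def load(totals, target):
    """Load (write): store the aggregated totals in the target."""
    target.update(totals)
    return target


# The data flow: each stage consumes the output of the previous one.
warehouse = load(aggregate(constrain(transform(extract()))), {})
```

Here `warehouse` ends up holding one total per store, with the negative row filtered out by the constraint stage.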
