Post on 16-Jul-2015

Data Compression for Large Multidimensional Data Warehouses

Supervisor:

Dr. K.M. Azharul Hasan

Associate Professor, Head of the Department, Department of CSE, KUET

Presented by:

Abdullah Al Mahmud, Roll: 0507006

Md. Mushfiqur Rahman, Roll: 0507029

This slide is prepared by Muhammad Mushfiqur Rahman & Abdullah Al Mahmud for the presentation of Thesis

Presentation Layout

Objectives

Existing Compression Schemes

Traditional Extendible Array

Proposed Compression Scheme: EXCS (Extendible Array Based Compression Scheme)

Comparative Analysis

Conclusion

Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh

Objectives

Data compression technology:

reduces the effective price of logical data storage capacity

improves query performance

Multidimensional arrays are widely used in a large number of scientific research applications.

Efficient compression of multidimensional arrays makes it possible to handle the large multidimensional data sets of data warehouses.

Existing Compression Schemes (1/ 3)

Bitmap compression

Run Length Encoding

Header compression

Compressed Column Storage

Compressed Row Storage

Existing Compression Schemes (2/ 3)

Figure: (a) A sparse array. (b) The CRS scheme
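Since the figure itself is not reproduced in this transcript, the following is a minimal sketch of Compressed Row Storage on a small two-dimensional sparse array. The names RO, CO and VL follow the ones used later in these slides; the 0-based offsets, the function names and the sample matrix are assumptions made for illustration, not the authors' implementation.

```python
def crs_compress(matrix):
    """Compress a 2-D list into CRS form (RO, CO, VL), using 0-based offsets."""
    ro = [0]   # RO[i] .. RO[i+1] delimits the non-zero entries of row i
    co = []    # column index of each non-zero value
    vl = []    # the non-zero values, stored row by row
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:
                co.append(j)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl


def crs_get(ro, co, vl, i, j):
    """Return the element at (i, j); absent entries are zero."""
    for k in range(ro[i], ro[i + 1]):
        if co[k] == j:
            return vl[k]
    return 0


# Hypothetical 4 x 4 sparse array used only for this example.
sparse = [
    [0, 0, 5, 0],
    [3, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 7, 0, 9],
]
ro, co, vl = crs_compress(sparse)
print(ro, co, vl)                 # [0, 1, 2, 2, 4] [2, 0, 1, 3] [5, 3, 7, 9]
print(crs_get(ro, co, vl, 3, 3))  # 9
```

Only the non-zero values and their positions are stored, so the saving grows as the array becomes sparser; an empty row costs one repeated offset in RO.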

Existing Compression Schemes (3/ 3)

Classical methods cannot support updates without completely readjusting runs.

The schemes that compress sparse arrays do not support extendibility.

Traditional Extendible Array

TEA supports dynamic extension of dimension size.

Figure 1: TEA Construction And Access (a two-dimensional extendible array holding cells 0 to 11, with a history table and an address table for each dimension and a history counter running from 0 to 5)

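Example: accessing position <1,3>

H1[1] < H2[3], so the cell belongs to the sub-array that was added later, the one recorded at index 3

Address of cell = Address1[3] + 1 = 10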
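The access rule in this example can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the class and method names are hypothetical, the array starts as a single cell, and the tables are kept per dimension (index 0 for the first subscript, index 1 for the second). With the extension order assumed below, the tables come out as history values 0, 2, 4 and 0, 1, 3, 5 and first addresses 0, 2, 6 and 0, 1, 4, 9, which match the numbers shown in Figure 1; the table the slide labels Address1 is taken here to be the address table of the dimension indexed by 3.

```python
class ExtendibleArray2D:
    """Sketch of a two-dimensional extendible array (illustrative names)."""

    def __init__(self):
        self.counter = 0           # history counter
        self.history = [[0], [0]]  # history[d][i]: counter value when index i of dimension d was added
        self.address = [[0], [0]]  # address[d][i]: first cell address of that sub-array
        self.size = [1, 1]         # current length of each dimension
        self.cells = 1             # number of cells allocated so far

    def extend(self, dim):
        """Grow dimension `dim` by one; existing cells are never moved."""
        other = 1 - dim
        self.counter += 1
        self.history[dim].append(self.counter)
        self.address[dim].append(self.cells)
        self.cells += self.size[other]  # the new sub-array spans the other dimension
        self.size[dim] += 1

    def locate(self, i, j):
        """Return the linear address of cell <i, j>."""
        if self.history[0][i] > self.history[1][j]:
            # The sub-array added for index i is the newer one; offset by j.
            return self.address[0][i] + j
        # Otherwise the sub-array added for index j holds the cell; offset by i.
        return self.address[1][j] + i


tea = ExtendibleArray2D()
for dim in (1, 0, 1, 0, 1):  # assumed extension order: alternate the two dimensions
    tea.extend(dim)

print(tea.history)           # [[0, 2, 4], [0, 1, 3, 5]]
print(tea.address)           # [[0, 2, 6], [0, 1, 4, 9]]
print(tea.locate(1, 3))      # 10
```

Each extension only appends a sub-array at the end of storage, which is why the dimension sizes can grow without relocating the cells already written.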

Proposed Compression Scheme

Multidimensional arrays are important for sparse array operations.

Multidimensional arrays also need to be extendible.

A compression technique is needed that can work on a multidimensional extendible array.

Our proposed compression scheme is EXCS (Extendible array based Compression Scheme).

Extendible array based Compression Scheme (EXCS) 1/3

We implemented the multidimensional extendible array in secondary memory.

We considered three dimensions (n = 3) in our experiments.

The sub-arrays are kept distinct so that each one can be stored individually in secondary memory.

Extendible array based Compression Scheme (EXCS) 2/3

Each sub-array has n - 1 (= 2) dimensions.

A large number of sub-arrays are generated for compression.

Sub-arrays are taken as input dynamically.

Only the maximum number of sub-arrays has to be specified.

Extendible array based Compression Scheme (EXCS) 3/3

Each sub-array is compressed individually

The compression technique used is similar to CRS

The compressed elements are written in the secondary memory as RO, CO, VL of subarray_1, subarray_2, … … subarray_N
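The per-sub-array write-out described on this slide could look roughly like the sketch below. The slides only say that the technique is similar to CRS and that RO, CO and VL are written sub-array by sub-array; the binary layout (two length fields followed by 32-bit integers), the function names and the use of `io.BytesIO` as a stand-in for a file in secondary memory are all assumptions made for illustration.

```python
import io
import struct


def crs_compress(sub):
    """CRS-style compression of one 2-D sub-array into (RO, CO, VL)."""
    ro, co, vl = [0], [], []
    for row in sub:
        for j, value in enumerate(row):
            if value != 0:
                co.append(j)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl


def write_subarrays(stream, subarrays):
    """Write RO, CO, VL of subarray_1, subarray_2, ... one after another."""
    for sub in subarrays:
        ro, co, vl = crs_compress(sub)
        # Assumed record header: number of RO entries, number of non-zero values.
        stream.write(struct.pack("<II", len(ro), len(vl)))
        stream.write(struct.pack(f"<{len(ro)}I", *ro))
        stream.write(struct.pack(f"<{len(co)}I", *co))
        stream.write(struct.pack(f"<{len(vl)}i", *vl))


def read_subarrays(stream):
    """Read back every (RO, CO, VL) record until the stream is exhausted."""
    records = []
    while True:
        header = stream.read(8)
        if not header:
            break
        n_ro, n_vl = struct.unpack("<II", header)
        ro = list(struct.unpack(f"<{n_ro}I", stream.read(4 * n_ro)))
        co = list(struct.unpack(f"<{n_vl}I", stream.read(4 * n_vl)))
        vl = list(struct.unpack(f"<{n_vl}i", stream.read(4 * n_vl)))
        records.append((ro, co, vl))
    return records


# Two hypothetical sub-arrays of different shapes, as produced while the array grows.
subarrays = [
    [[0, 4], [0, 0]],
    [[1, 0, 0], [0, 0, 2]],
]
buffer = io.BytesIO()  # stands in for a file in secondary memory
write_subarrays(buffer, subarrays)
buffer.seek(0)
restored = read_subarrays(buffer)
print(restored)  # [([0, 1, 1], [1], [4]), ([0, 1, 2], [0, 2], [1, 2])]
```

Because each sub-array is a self-contained record, a newly added sub-array can be compressed and appended without touching the records already written.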

Performance Measurement

Performance is measured against two key factors of the compression schemes:

Data density

Length of dimension / number of data

compression ratio = (compressed data / original data)

space savings = 1 - compression ratio

We report space savings in percent.
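As a small worked example of the two formulas, the sketch below computes the compression ratio and the space savings in percent. The function names and the sample sizes are illustrative assumptions; sizes can be in any unit as long as both use the same one.

```python
def compression_ratio(compressed_size, original_size):
    """compression ratio = compressed data / original data."""
    return compressed_size / original_size


def space_savings_percent(compressed_size, original_size):
    """space savings = 1 - compression ratio, expressed in percent."""
    return (1 - compression_ratio(compressed_size, original_size)) * 100


# Hypothetical sizes: 64 original cells compressed down to 16 stored elements.
savings_smaller = space_savings_percent(16, 64)
print(savings_smaller)  # 75.0

# If the compressed form needs more space than the original, savings are negative.
savings_larger = space_savings_percent(80, 64)
print(savings_larger)   # -25.0
```

A negative value therefore means the scheme's bookkeeping outweighs what it removes, which can happen when the data is dense.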

Comparative Analysis (1/4)

Figure: Comparison with fixed density = 20% (y-axis: space savings; x-axis: no. of data at 64, 729, 4096, 15625 and 46656; series: Header, Bitmap, CRS, EACRS, Offset)

Comparative Analysis (2/4)

Figure: Comparison with fixed density = 25% (y-axis: space savings; x-axis: no. of data at 64, 729, 4096, 15625 and 46656; series: Header, Bitmap, CRS, EACRS, Offset)

Comparative Analysis (3/4)

Figure: Comparison with fixed no. of data = 64 (y-axis: compression ratio; x-axis: density of data at 10, 20, 30, 40 and 50; series: Header, Bitmap, CRS, EACRS, Offset)

Comparative Analysis (4/4)

Figure: Comparison with fixed no. of data = 4096 (y-axis: compression ratio; x-axis: density of data at 10, 20, 30, 40 and 50; series: Header, Bitmap, CRS, EACRS, Offset)

Performance Measurement

Extendibility of arrays

Using multidimensional arrays

Extendibility toward any dimension

EXCS allows dynamic extension of arrays.

In the analysis, the data can be extended up to n dimensions.

Performance is good for a large number of data.

Conclusion

Our proposed compression scheme has been tested experimentally on data of up to 3 dimensions.

In future, the experiments can be extended to compressing n-dimensional data.

EXCS is effective for large multidimensional data warehouses.