Data Compression By, Keerthi Gundapaneni

Upload: belinda-lawson

Post on 31-Dec-2015

Page 1: Data Compression By, Keerthi Gundapaneni. Introduction Data Compression is an very effective means to save storage space and network bandwidth. A large

Data Compression

By, Keerthi Gundapaneni

Introduction

• Data compression is a very effective means of saving storage space and network bandwidth.

• A large number of compression schemes currently on the market are based on character encoding or on the detection of repeated strings.

• Many of these schemes reduce English text to about 2.3-2.5 bits per character.

Introduction

• Database performance depends strongly on the amount of available memory.

• It is therefore important to use the available memory as efficiently as possible.

Current Schemes

• Text compression schemes based on letter frequency (pioneered by Huffman).

• Schemes based on string matching.

• Schemes based on fast implementations of algorithms, parallel algorithms and VLSI implementations.

• Many database systems use prefix and postfix truncation to save space and increase the fan-out of nodes, e.g. Starburst.
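
Prefix truncation can be sketched as follows. This is a minimal illustration only, not how Starburst or any particular system implements it; the function names `truncate_prefix` and `restore_keys` and the sample keys are hypothetical. The idea is that the keys stored in one sorted index node often share a common prefix, which can be stored once per node.

```python
import os


def truncate_prefix(keys):
    """Store the common prefix of a node's keys once, plus the remaining suffixes."""
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]


def restore_keys(prefix, suffixes):
    """Rebuild the original keys from the shared prefix and the suffixes."""
    return [prefix + s for s in suffixes]


# Hypothetical keys stored in one index node.
keys = ["compress", "compression", "compressor", "computer"]
prefix, suffixes = truncate_prefix(keys)

# Characters stored before and after truncation.
original_size = sum(len(k) for k in keys)
truncated_size = len(prefix) + sum(len(s) for s in suffixes)
```

Because each key takes less space after truncation, more keys fit into a node of fixed size, which is what raises the fan-out.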

Using various schemes

• The compression rate of a dataset depends on the attribute types and value distributions.

• It is difficult to compress binary floating point numbers but relatively easy to compress English text by a factor of 2 or 3.

• Optimal performance can only be obtained by judicious decisions about which attributes to compress and which compression method to use.
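
One way to inform such decisions is to measure how well each attribute compresses before committing to a scheme. The sketch below is an assumption-laden illustration: it uses Python's `zlib` as a stand-in for whatever compression method a database would use, and both columns are synthetic (randomly chosen words in place of real English text, random doubles in place of real measurements). The helper name `compression_factor` is hypothetical.

```python
import random
import struct
import zlib


def compression_factor(raw: bytes) -> float:
    """Return uncompressed size divided by compressed size (higher is better)."""
    return len(raw) / len(zlib.compress(raw, 9))


random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic text attribute: values drawn from a small vocabulary.
words = ["data", "compression", "saves", "disk", "space", "and", "bandwidth"]
text_column = " ".join(random.choice(words) for _ in range(5000)).encode("ascii")

# Synthetic numeric attribute: random binary 64-bit floating point values.
float_column = b"".join(struct.pack("<d", random.random()) for _ in range(5000))

text_factor = compression_factor(text_column)
float_factor = compression_factor(float_column)
```

Under these assumptions the text column compresses far better than the floating point column, so a system guided by such a measurement would compress the former and might leave the latter uncompressed.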

Advantages of Compression

• Reduces the disk space required.

• Seek distances and seek times are reduced.

• More data fits into each disk page, track and cylinder, allowing more intelligent clustering of related objects into physically near locations.

• Unused disk space can be used for shadowing to increase reliability.

Advantages of Compression

• Compressed data can be transferred faster to and from disk.

• Data compression increases the effective disk bandwidth.

• Because of the higher information density, the I/O load decreases, which relieves the I/O bottleneck.

• Faster transfer rates across the network.

Advantages of Compression

• Retaining data in compressed form in the I/O buffer allows more records to remain in the buffer, thus increasing the buffer hit rate and reducing the number of I/Os.

• Log records can become shorter.
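
The effect on the buffer hit rate can be illustrated with a small simulation. Everything here is an assumption made for the sketch: an LRU replacement policy, a synthetic skewed workload, a buffer that holds 100 uncompressed records, and a compression factor of 2. The cost of decompressing records is ignored.

```python
import random
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Simulate an LRU buffer holding `capacity` records; return the hit fraction."""
    buffer = OrderedDict()
    hits = 0
    for record in accesses:
        if record in buffer:
            hits += 1
            buffer.move_to_end(record)      # mark as most recently used
        else:
            buffer[record] = True
            if len(buffer) > capacity:
                buffer.popitem(last=False)  # evict the least recently used record
    return hits / len(accesses)


random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical workload: 20,000 accesses over at most 1,000 records, skewed toward low ids.
accesses = [int(random.paretovariate(1.2)) % 1000 for _ in range(20000)]

uncompressed_capacity = 100  # records that fit in the buffer uncompressed (assumed)
compression_factor = 2       # assumed compression factor
compressed_capacity = uncompressed_capacity * compression_factor

rate_plain = hit_rate(accesses, uncompressed_capacity)
rate_compressed = hit_rate(accesses, compressed_capacity)
```

With LRU replacement, a larger buffer never has a lower hit rate on the same access sequence, so `rate_compressed` is at least `rate_plain`; how large the gain is depends entirely on the workload.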

Types of compression

• For a given table of “parts”, the attribute “color” can be replaced by a small integer. The encoding is saved in a separate relation, and the larger table is joined with the relatively small encoding table for queries that require string-valued output of the color attribute. Since such encoding tables are typically small, e.g. a few kilobytes, efficient hash-based algorithms can be used for the join.
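
The encoding scheme just described can be sketched in a few lines. The table contents, column names and the function `join_colors` are hypothetical and chosen only for illustration; a real database would do this inside its storage and query layers.

```python
# Hypothetical "parts" table with a string-valued color attribute.
parts = [
    {"part_id": 1, "name": "bolt", "color": "red"},
    {"part_id": 2, "name": "nut", "color": "blue"},
    {"part_id": 3, "name": "screw", "color": "red"},
    {"part_id": 4, "name": "washer", "color": "green"},
]

# Build the encoding: each distinct color gets a small integer code.
color_to_code = {}
for row in parts:
    color_to_code.setdefault(row["color"], len(color_to_code))

# The separate encoding relation: (code, color) pairs.
color_codes = [(code, color) for color, code in color_to_code.items()]

# The compressed parts table stores the code instead of the string.
encoded_parts = [
    {"part_id": r["part_id"], "name": r["name"], "color_code": color_to_code[r["color"]]}
    for r in parts
]


def join_colors(encoded_rows, encoding_relation):
    """Hash join: build a hash table on the small relation, probe it with the large one."""
    lookup = dict(encoding_relation)  # build phase on the small encoding table
    return [
        {"part_id": r["part_id"], "name": r["name"], "color": lookup[r["color_code"]]}
        for r in encoded_rows         # probe phase over the large table
    ]


decoded_parts = join_colors(encoded_parts, color_codes)
```

Queries that only filter or group by color can work on the integer codes directly; the join is needed only when the string value itself has to be output.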

Huffman code example

Symbol : A B C D E

Frequency: 24 12 10 8 8

Total: 186 bits (62 symbols at 3 bits per code word)

Huffman code example

Results

Symbol  Frequency  Code  Code Length  Total Length
A       24         0     1            24
B       12         100   3            36
C       10         101   3            30
D       8          110   3            24
E       8          111   3            24

Initial: 186 bits (fixed 3-bit code). Final: 138 bits (Huffman code).
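
The totals in this example can be reproduced with a short sketch of Huffman's construction. It computes only the code lengths, not the bit patterns, since the exact 0/1 assignment depends on arbitrary choices at each merge; the function name `huffman_code_lengths` is hypothetical.

```python
import heapq
from itertools import count


def huffman_code_lengths(frequencies):
    """Return {symbol: code length in bits} for a Huffman code."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(freq, next(tiebreak), {sym: 0}) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; each symbol in them moves one level deeper.
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


frequencies = {"A": 24, "B": 12, "C": 10, "D": 8, "E": 8}
lengths = huffman_code_lengths(frequencies)

# Fixed-length coding needs 3 bits for each of the 62 symbol occurrences.
fixed_bits = 3 * sum(frequencies.values())
# Huffman coding weights each code length by the symbol's frequency.
huffman_bits = sum(frequencies[s] * lengths[s] for s in frequencies)
```

For these frequencies the sketch gives a 1-bit code for A and 3-bit codes for B, C, D and E, so `fixed_bits` is 186 and `huffman_bits` is 138, matching the example.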
