compression


Upload: lisandra-church

Post on 31-Dec-2015


TRANSCRIPT

Page 1: Compression

Compression

Figure: JPG compression (original, 10:1 compression, 45:1 compression). Source: http://www.dspguide.com/datacomp.htm

Page 2: Compression

Content

• Introduction

• Techniques for compression
  – Run-length
  – Lempel-Ziv
  – Huffman

• MPEG-4

• Conclusion

Page 3: Compression

• In nature, science, and human affairs, where do we see compression and decompression?


Page 4: Compression

Motivation for Compression

Compression is especially important in video, voice and fax applications, where very large amounts of data are transmitted. Data compression can increase the throughput considerably.

Example: If there are 40,000 picture elements (pixels) per square inch, then an 8.5" x 11" page holds 3,740,000 pixels, or 3,740,000 bits at one bit per pixel. Over a 56 Kbps line, transmitting the page would take about 67 seconds. If the data is compressed by a factor of 10, the transmission time is reduced to about 6.7 seconds per page.
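The arithmetic in this example can be checked with a short sketch. It assumes one bit per pixel and reads 56 Kbps as 56,000 bits per second; the function name and parameters are illustrative, not from the slides.

```python
def transmission_seconds(width_in, height_in, pixels_per_sq_in,
                         line_bps, ratio=1.0, bits_per_pixel=1):
    """Seconds needed to send one page over a line of line_bps bits/second.

    ratio is the compression factor (1.0 means uncompressed).
    bits_per_pixel=1 is an assumption (black-and-white fax-style page).
    """
    page_bits = width_in * height_in * pixels_per_sq_in * bits_per_pixel
    return page_bits / ratio / line_bps


uncompressed = transmission_seconds(8.5, 11, 40_000, 56_000)
compressed = transmission_seconds(8.5, 11, 40_000, 56_000, ratio=10)
print(round(uncompressed), "seconds uncompressed")
print(round(compressed, 1), "seconds at 10:1 compression")
```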

These days, data compression is commonly used by modems, fax machines, video conferencing equipment, your TiVo, etc.

Page 5: Compression

Practical applications of data compression

Realize cost savings in the design of a system.

Examples:

• Modems, analog fax, compressed voice for cellular radio

• Digital voice

• Compressed video, CD music, iPod

Without compression, these applications would not be feasible.

Figure: Device 1, Bottleneck, Device 2

Page 6: Compression

Principles behind Compression

Types of techniques:

1. Redundancy reduction: Remove redundancy from the message.
• Usually lossless.

2. Reduce information content: Reduce the total amount of information in the message. Leads to sacrifice of quality.
• Usually lossy.

Page 7: Compression

Categories of compression

1. Data compression: Used for data files and program files. Lossless. E.g., WinZip, gzip, compress.

2. Audio compression: Compresses digitized voice (e.g., cellular) and music. Lossy for voice, lossless for hi-fi music. E.g., RealAudio.

3. Image compression: Removes redundancy within the frame. Formats differ: BMP (bitmap file) is lossless but creates large files; GIF uses lossless compression but is limited to 256 colors; JPEG is lossy.

4. Video compression: Removes intra- and inter-frame redundancy. Lossy. Examples: MPEG, QuickTime, RealVideo.

Page 8: Compression

Compressibility of different data patterns

0 - CLOUDY DAY
1 - SUNNY DAY

SET 1: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

SET 2: 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0

SET 3: 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0

SET 4: 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0

SET 5: 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 1

In which set is the information content the highest? How will you store these patterns of information in the most economical way?

Page 9: Compression

Compression Techniques

Common compression techniques

•“Seinfeld” method: yada, yada, yada...

•Run-length encoding

•Lempel-Ziv method

•Huffman coding

Marcy: Speaking of ex's, my old boyfriend came over late last night, and, yada yada yada, anyway. I'm really tired today.


Page 10: Compression

Spot the difference…

That’s it. Image compressed 48 times while you watched.

Page 11: Compression

RUN-LENGTH ENCODING

Source: NY Times, June 18, 1998.

Page 12: Compression

RUN-LENGTH ENCODING

• Look for sequences of repeating characters

• Replace a sequence of repeating characters with a 3-char code:
  – special character that indicates suppression
  – character to be suppressed
  – frequency (count of number of characters)

• Example: $******55.72 becomes $S*655.72
  GunsbbbbbbbbbButter becomes GunsSb9Butter
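A minimal sketch of this 3-character scheme follows. It assumes "S" is the special suppression character (as in the example), that the count is a single digit (so runs are capped at 9), and that the input never contains "S" itself. The threshold of 4 is also an assumption: shorter runs would not shrink when replaced by a 3-character code.

```python
def rle_encode(text, marker="S", min_run=4):
    """Replace runs of a repeated character with marker + char + count."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        j = i
        # Extend the run, capped at 9 so the count fits in one digit.
        while j < len(text) and text[j] == ch and j - i < 9:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(f"{marker}{ch}{run}")
        else:
            out.append(text[i:j])  # too short to be worth encoding
        i = j
    return "".join(out)


def rle_decode(coded, marker="S"):
    """Expand every marker + char + count triple back into a run."""
    out = []
    i = 0
    while i < len(coded):
        if coded[i] == marker:
            out.append(coded[i + 1] * int(coded[i + 2]))
            i += 3
        else:
            out.append(coded[i])
            i += 1
    return "".join(out)


print(rle_encode("$******55.72"))                  # $S*655.72
print(rle_encode("Guns" + "b" * 9 + "Butter"))     # GunsSb9Butter
```

The gain depends entirely on how many long runs the data contains: text with no run of four or more identical characters passes through unchanged.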

What does the efficiency of this method depend on?

Page 13: Compression

Lempel-Ziv

Page 14: Compression

Lempel-Ziv Algorithm

This algorithm looks for repeated patterns in a message and replaces each one with a token that points back to its most recent occurrence.

The rain in Spain falls mainly on the plain.

Token [a,b] means: go back a characters and copy b characters from there.

The rain [3,3]Spain falls mainly on the plain.

Page 15: Compression

Lempel-Ziv Algorithm

The rain [3,3]Sp[9,4]falls mainly on the plain.

Page 16: Compression

Lempel-Ziv Algorithm

The rain [3,3]Sp[9,4]falls m[11,3]ly on the plain.

Page 17: Compression

Lempel-Ziv Algorithm

The rain [3,3]Sp[9,4]falls m[11,3]ly on [34,4]plain.

Page 18: Compression

Lempel-Ziv Algorithm

The rain [3,3]Sp[9,4]falls m[11,3]ly on [34,4]pl[15,3].

This encoded message contains 27 literal characters and 5 tokens. Each token needs 2 bytes, so the space required is 27 + 5 x 2 = 37 bytes, versus 44 bytes for the original. (Note: Since each token takes two bytes, a replacement is made only if the repeating pattern is more than two bytes long.)
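The token rule can be sketched as a small decoder. It assumes the original sentence is the 44-character "The rain in Spain falls mainly on the plain." and that tokens are written literally as [a,b] in the text, which is a notation for the slides rather than a real 2-byte format. Note that the [34,4] token copies "The " from the start of the message, so the example only round-trips if upper and lower case are treated as the same.

```python
import re

TOKEN = re.compile(r"\[(\d+),(\d+)\]")


def lz_decode(encoded):
    """Expand [a,b] tokens: go back a characters, copy b characters."""
    out = []
    pos = 0
    for match in TOKEN.finditer(encoded):
        out.extend(encoded[pos:match.start()])   # literal characters
        back, length = int(match.group(1)), int(match.group(2))
        start = len(out) - back
        for k in range(length):
            out.append(out[start + k])
        pos = match.end()
    out.extend(encoded[pos:])
    return "".join(out)


def encoded_size(encoded, token_bytes=2):
    """Bytes needed: one per literal character plus token_bytes per token."""
    literals = len(TOKEN.sub("", encoded))
    tokens = len(TOKEN.findall(encoded))
    return literals + tokens * token_bytes


message = "The rain [3,3]Sp[9,4]falls m[11,3]ly on [34,4]pl[15,3]."
print(lz_decode(message))
print(encoded_size(message), "bytes encoded vs", len(lz_decode(message)), "original")
```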

Page 19: Compression

Huffman coding

Consider a language with only 4 characters, T, E, L, K.

Here is a pattern in this language: T E E E L E E E K E

Probability of T = 0.1
Probability of E = 0.7
Probability of L = 0.1
Probability of K = 0.1

If we use 2-bit codes for each character, say, 00 - T; 01 - E; 10 - L; 11 - K, then we need 20 bits to store this pattern.

Question: Can we do better, i.e., store the pattern in fewer bits?

Page 20: Compression

HUFFMAN CODING Algorithm

Page 21: Compression

HUFFMAN CODING EXAMPLE

1. Treat each character or symbol as a leaf node in a tree (ordered by probability and occurrence).

2. Merge the two lowest-probability nodes into a node whose probability is the sum of the two merged nodes.

3. Repeat this process until no unmerged nodes remain. The final node is the root of a tree.

4. Label each pair of branches, starting from the root, with 0 and 1.

5. The code word for a symbol is the string of labels from the root node to the original symbol.

Tree: leaves T (0.1), L (0.1), K (0.1), E (0.7). T and L merge into a node of 0.2; that node and K merge into a node of 0.3; that node and E merge into the root, 1.0. Each pair of branches is labeled 0 and 1.

Codes:
T: 000
L: 001
K: 01
E: 1
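The merge steps can be sketched with a priority queue. This is an illustrative sketch, not the slide author's code: it uses symbol counts from the pattern T E E E L E E E K E instead of probabilities, and it breaks ties by order of first appearance. Because T, L and K are tied, the exact bit patterns (and which of the three gets the 2-bit code) depend on tie-breaking and may differ from the slide, but the code lengths come out as 1, 2, 3 and 3 bits either way.

```python
import heapq
from collections import Counter


def huffman_codes(message):
    """Build a {symbol: bit string} table by repeatedly merging the two
    lowest-weight nodes, as in steps 1 to 5."""
    counts = Counter(message)
    # Heap entries: (weight, tie_breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, low = heapq.heappop(heap)
        w1, _, high = heapq.heappop(heap)
        # Prefix 0 onto one branch and 1 onto the other.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


pattern = "TEEELEEEKE"
codes = huffman_codes(pattern)
total_bits = sum(len(codes[s]) for s in pattern)
print(codes)
print(total_bits, "bits")
```

A message with only one distinct symbol would get an empty code in this sketch; a real encoder needs to handle that case separately.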

Page 22: Compression

Decoding a Message (start from left)

Codes:
T: 000
L: 001
K: 01
E: 1

Bits: 0 1 1 1 1 0 1 0 0 1 0 0 0 1 1 1 0 1

Decoded: K E E E K L T E E E K
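Decoding from the left works because no code word is a prefix of another, so the first code word that matches is always the right one. A minimal sketch using the code table from the slide (the function name is illustrative):

```python
CODES = {"T": "000", "L": "001", "K": "01", "E": "1"}


def huffman_decode(bits, codes=CODES):
    """Read bits left to right, emitting a symbol whenever the bits
    collected so far match a code word."""
    lookup = {code: sym for sym, code in codes.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)


print(huffman_decode("011110100100011101"))
```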

Page 23: Compression

SAVINGS FROM HUFFMAN CODING

Original string had 10 characters, each 2 bits long. Total length = 20 bits.

Modified string:
T once -----> 1 x 3 = 3 bits
L once -----> 1 x 3 = 3 bits
K once -----> 1 x 2 = 2 bits
E 7 times -----> 7 x 1 = 7 bits
Total = 15 bits

Savings = (20 - 15) / 20 = 25%
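The totals can be confirmed by encoding the pattern T E E E L E E E K E with the code table from the example (T: 000, L: 001, K: 01, E: 1). The variable names below are illustrative.

```python
CODES = {"T": "000", "L": "001", "K": "01", "E": "1"}
PATTERN = "TEEELEEEKE"


def encode(message, codes):
    """Concatenate the code word of each symbol in the message."""
    return "".join(codes[s] for s in message)


fixed_bits = 2 * len(PATTERN)               # 2-bit fixed-length code
huffman_bits = len(encode(PATTERN, CODES))  # variable-length code
savings = (fixed_bits - huffman_bits) / fixed_bits

print(encode(PATTERN, CODES))
print(f"{fixed_bits} bits -> {huffman_bits} bits, savings = {savings:.0%}")
```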

Page 24: Compression
Page 25: Compression

Applications and Standards

MNP Class 5 is a modem standard which uses run-length encoding.

V.42 bis is a newer modem standard for high-speed modems. These modems use the Lempel-Ziv compression method and can compress by a factor of 3.5 to 4.

Image and video standards: H.261, JPEG, MPEG-1 (for rates up to 1.5 Mbps), MPEG-2 (for rates up to 40 Mbps).

Audio compression standards: ADPCM, LPC (Linear Predictive Coding), MPEG Audio (e.g., MP3).

In general, the compression ratio depends upon the nature of the data.

Page 26: Compression

MPEG-4

• The “bane” of DVD?

• A standard for transmitting video and sound

• Meshes existing MPEG-2 inter- and intra-frame advancements with VRML

• What about MPEG-7?

Page 27: Compression

MPEG-4

Page 28: Compression
Page 29: Compression
Page 30: Compression

Conclusion

Anything can be compressed more…

…but can the original form be recreated?

Big Bang: The ultimate decompression!

Image source: http://www.esa.int/esaKIDSen/SEMSZ5WJD1E_OurUniverse_0.html