Understanding JPEG
MIT-CETI Xi’an ‘99
Lecture 10
Ben Walter, Lan Chen, Wei Hu
What is JPEG?
• JPEG is a method for compressing image data so it takes less space to store or transmit across a network.
• JPEG is very efficient. A file that was 1 MB in size could be compressed to as little as 25 KB (1:40)!
• JPEG achieves such good compression ratios because it is lossy - but at reasonable quality settings the loss is barely perceptible.
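The 1:40 figure above is simple arithmetic. A minimal sketch, assuming 1 MB = 1000 KB for round numbers (the function name `compression_ratio` is made up for illustration):

```python
def compression_ratio(original_bytes, compressed_bytes):
    """Return how many times smaller the compressed file is."""
    return original_bytes / compressed_bytes

# Sizes from the slide; 1 MB is taken as 1000 KB here (an assumption).
ratio = compression_ratio(1000 * 1000, 25 * 1000)
print(f"1:{ratio:.0f}")
```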
Overview
• Images contain different frequencies; low frequencies correspond to slowly varying colors, high frequencies correspond to fine detail.
• The low frequencies are much more important than the high frequencies; we can throw away some high frequencies to compress our data!
[Figure: three plots over pixel positions 0 to 7, each with values ranging from -1 to 1]
Overview
• Note that we aren’t talking about the frequencies of light, but of the light and dark areas in the image!
• We need a way to go from the color of pixels, which is essentially a number, to frequencies…
• This way is called the Discrete Cosine Transform (DCT).
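To make the step from pixel values to frequencies concrete, here is a minimal sketch of a one-dimensional DCT in plain Python. It uses the common orthonormal DCT-II form; the 16-pixel line is made up for illustration, and a real encoder would use a faster algorithm.

```python
import math

def dct_1d(pixels):
    """Orthonormal DCT-II: pixel values in, frequency coefficients out."""
    n = len(pixels)
    coeffs = []
    for k in range(n):
        # The k = 0 wave is flat, so it gets a different scale factor.
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(p * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, p in enumerate(pixels))
        coeffs.append(scale * total)
    return coeffs

# A hypothetical 16-pixel line that brightens slowly from left to right.
line = [100 + 5 * i for i in range(16)]
coeffs = dct_1d(line)

# Energy (sum of squares) held by the four lowest frequencies.
low = sum(c * c for c in coeffs[:4])
total = sum(c * c for c in coeffs)
print(f"share of energy in the 4 lowest frequencies: {low / total:.4f}")
```

For a slowly varying line like this one, almost all of the energy ends up in the first few coefficients, which is what makes the later stages effective.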
A JPEG Encoder
[Diagram: DCT → Quantizer → Entropy Encoder]
The Discrete Cosine Transform
[Figure: a 16-pixel line, plotted as color of pixel versus X position of pixel, and its DCT, plotted as intensity versus frequency]
[Figure: the same line shown as equal to a set of 16 waves, with the intensity versus frequency plot]
The Discrete Cosine Transform
[Figure: the 16-pixel line (color of pixel versus X position of pixel) written as a weighted sum of waves: line = x1 · wave 1 + x2 · wave 2 + … + x15 · wave 15 + x16 · wave 16. The weights are shown in the intensity versus frequency plot.]
The Discrete Cosine Transform
[Figure: grid of 16 plots, each with axes running from 0 to 20]
The 2D DCT
• So far we’ve been talking about one-dimensional images, just one line of the picture… but an image has two dimensions.
• We can talk about frequencies in two dimensions, although it’s much harder to visualize.
Basis
• Remember we saw that every 16-pixel line can be written as the sum of 16 different waves?
• Those 16 waves formed a basis for the set of 16-pixel lines.
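The idea of a basis can be checked numerically. The sketch below builds 16 cosine waves (assuming the orthonormal DCT-II waves, which is the usual choice), finds the weight of each wave in a made-up 16-pixel line, and then rebuilds the line as the weighted sum of the waves.

```python
import math

N = 16

def basis_wave(k, n=N):
    """The k-th cosine wave, sampled at n pixel positions."""
    scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)]

waves = [basis_wave(k) for k in range(N)]

def weights(line):
    # Because the waves are orthonormal, each weight is just a dot product.
    return [sum(p * w for p, w in zip(line, wave)) for wave in waves]

def rebuild(ws):
    # Weighted sum of the 16 waves, evaluated pixel by pixel.
    return [sum(ws[k] * waves[k][i] for k in range(N)) for i in range(N)]

# Hypothetical pixel values for one 16-pixel line.
line = [12, 14, 15, 15, 13, 10, 8, 7, 7, 8, 10, 11, 11, 10, 9, 9]
x = weights(line)
restored = rebuild(x)
print(max(abs(a - b) for a, b in zip(line, restored)))  # tiny rounding error
```

Nothing is lost at this point: with all 16 weights the line comes back unchanged, up to floating-point rounding.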
[Figure: the 16 basis waves]
Basis
• When we are compressing a JPEG, we work in blocks of 8x8 pixels. That’s 64 numbers, so there are 64 different basis images.
• This means we can describe any 8x8 image as a combination (a sum) of those 64 images.
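As a rough illustration, the 64 basis images can be generated by multiplying a vertical wave by a horizontal wave. This sketch assumes the orthonormal DCT-II waves; the names `wave`, `basis_image`, and `dot` are made up here.

```python
import math

N = 8

def wave(k, i):
    """Value of the k-th cosine wave at pixel position i."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * N))

def basis_image(u, v):
    """8x8 image with vertical frequency u and horizontal frequency v."""
    return [[wave(u, row) * wave(v, col) for col in range(N)]
            for row in range(N)]

# One basis image for every (u, v) pair: 8 * 8 = 64 in total.
basis = {(u, v): basis_image(u, v) for u in range(N) for v in range(N)}

def dot(a, b):
    """Multiply two 8x8 images pixel by pixel and add everything up."""
    return sum(a[r][c] * b[r][c] for r in range(N) for c in range(N))

print(len(basis))
print(dot(basis[(2, 3)], basis[(2, 3)]))  # close to 1
print(dot(basis[(0, 0)], basis[(1, 0)]))  # close to 0
```

The two dot products show why the images form a basis: each one has unit length, and different images do not overlap.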
Basis
The 2D DCT
[Figure: an 8x8 block plotted as a surface (values from 0 to 400), and its 2D DCT plotted as a surface (values from -500 to 1500)]
The 2D DCT
Summary
• The Discrete Cosine Transform (DCT) allows us to determine what frequencies make up an image.
• The input to this stage is an 8x8 block of numbers: the value of each pixel.
• The output of this stage is an 8x8 block of numbers that represent how much of each frequency (or how much of each basis image) is in the block.
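A minimal sketch of this stage, assuming the orthonormal DCT-II and a made-up smooth 8x8 block. The 2D transform is done by applying the 1D transform to every row and then to every column.

```python
import math

N = 8

def dct_1d(values):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(values)))
    return out

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def dct_2d(block):
    """8x8 pixel values in, 8x8 frequency coefficients out."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)

# Hypothetical block: brightness rises gently down and to the right.
block = [[50 + 10 * r + 5 * c for c in range(N)] for r in range(N)]
coeffs = dct_2d(block)

for row in coeffs[:3]:
    print([round(value, 1) for value in row[:3]])
```

For a smooth block like this one, the top-left coefficient (the average brightness) is by far the largest and most of the others are zero or close to it.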
A JPEG Encoder
[Diagram: DCT → Quantizer → Entropy Encoder]
Quantization
• So we still have 64 numbers to work with - we haven’t reduced the size at all!
• The reason we wanted the numbers as frequencies was because some frequencies are more important than others.
• The low frequencies are the most important, the high frequencies are not very important (think back to building up the image).
Quantization
• Before quantization, each frequency coefficient can take a wide range of values; to keep things simple, say 0 to 255.
• To quantize, we divide each frequency by a number and round the result, so that the range is reduced. For example, it might become 0 to 31. For high frequencies we divide by a larger number.
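The divide-and-round step might look like the sketch below. The rule in `step_for` is hypothetical and only mimics the idea of larger divisors for higher frequencies; real JPEG encoders read the 64 divisors from a quantization table. The input coefficients are also made up.

```python
N = 8

def step_for(u, v, base=8, growth=4):
    """Hypothetical divisor: grows as the frequencies u and v get higher."""
    return base + growth * (u + v)

def quantize(coeffs):
    """Divide each coefficient by its step and round to the nearest integer."""
    return [[round(coeffs[u][v] / step_for(u, v)) for v in range(N)]
            for u in range(N)]

def dequantize(quantized):
    """Multiply back; the result is only close to the original values."""
    return [[quantized[u][v] * step_for(u, v) for v in range(N)]
            for u in range(N)]

# Made-up coefficients: large at low frequencies, small at high ones.
coeffs = [[800 / (1 + u + v) ** 2 for v in range(N)] for u in range(N)]
q = quantize(coeffs)
approx = dequantize(q)

print(q[0])  # first row of quantized values
print(q[7])  # last row: the high frequencies
```

Multiplying back never recovers the exact values; each one can be off by up to half of its divisor, and that rounding is where the information is thrown away.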
Quantization
• Before we had, say: 134, 113, 145, 117, 32, 11, 17, 5, …, 4.
• After quantization, we might have: 116, 55, 55, 30, 1, 0, 0, …, 0.
Quantization
[Figure: grid of example coefficient values]
Quantization
• Original: 300 kB
• Medium Quality JPEG: 75 kB
Quantization
• Original: 300 kB
• Low Quality JPEG: 35 kB
Summary
• The degree of quantization dictates the amount of information “thrown away”.
• If you throw away more information, you will get better compression, but the picture will start to look bad.
• When you adjust the quality of a JPEG save from Photoshop, you are changing the quantization!
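The trade-off can be seen by counting how many of the 64 numbers become zero as the divisors grow. In this sketch the `strength` parameter and the divisor rule are hypothetical stand-ins for a quality setting, not the formula used by Photoshop or any particular encoder, and the coefficients are made up.

```python
N = 8

# Made-up coefficients: large at low frequencies, small at high ones.
coeffs = [[600 / (1 + u + v) ** 2 for v in range(N)] for u in range(N)]

def quantize_with_strength(block, strength):
    """Larger strength means larger divisors, which means lower quality."""
    return [[round(value / (strength * (1 + u + v)))
             for v, value in enumerate(row)]
            for u, row in enumerate(block)]

def count_zeros(strength):
    quantized = quantize_with_strength(coeffs, strength)
    return sum(value == 0 for row in quantized for value in row)

for strength in (1, 4, 16):
    print(f"strength {strength:2d}: {count_zeros(strength)} of 64 values are zero")
```

More zeros means more information thrown away and better compression in the next stage, at the cost of a worse-looking picture.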
A JPEG Encoder
[Diagram: DCT → Quantizer → Entropy Encoder]
Entropy Encoding
• Entropy encoding is another stage of compression that relies on statistical properties of the data, e.g. which numbers occur most frequently, or long runs of the same number in a row.
• So we take the 64 numbers, do Run Length Encoding, then follow that with Huffman Coding! (Remember yesterday?)
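Here is a simplified sketch of the two steps. It is not the exact scheme in the JPEG standard, which scans each block in zigzag order and codes runs of zeros in a specific format. The first five numbers below come from the quantization example; padding the rest of the 64 with zeros is an assumption.

```python
import heapq
from collections import Counter
from itertools import groupby

def run_length_encode(values):
    """Collapse runs of equal values into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def run_length_decode(pairs):
    return [value for value, count in pairs for _ in range(count)]

def huffman_codes(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie-breaker, {symbol: code so far}).
    heap = [(weight, i, {symbol: ""})
            for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# 64 quantized numbers; everything after the first five is assumed to be zero.
quantized = [116, 55, 55, 30, 1] + [0] * 59
runs = run_length_encode(quantized)
codes = huffman_codes(runs)
bits = "".join(codes[pair] for pair in runs)

print(runs)
print(f"{len(quantized)} numbers became {len(runs)} pairs and {len(bits)} code bits")
```

The bit count leaves out the cost of storing the code table itself, so it flatters the result, but it shows how a long run of zeros collapses into a single symbol.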
Entropy Encoding
• These compression schemes now work very well, because quantization turns numbers like 132, 117, 78 into numbers more like 31, 31, 15.
• After quantization, the range of numbers is smaller, and there are often long runs of the same number (especially zeros) - so it can be highly compressed!
• This is the stage where the data actually shrinks; the DCT and quantizer only prepare the numbers for it!
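One way to put a number on "compresses better" is the Shannon entropy, a lower bound on the average bits per value for a coder that treats each value independently. The sketch below uses made-up values and a single divisor of 16, both assumptions for illustration.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Average number of bits per value needed by an ideal symbol coder."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical frequency values before quantization (all different).
before = [132, 117, 78, 61, 41, 33, 21, 18, 12, 9, 7, 6, 5, 4, 3, 2]
# Quantize with one divisor for simplicity.
after = [round(value / 16) for value in before]

print(after)
print(f"before: {entropy_bits(before):.2f} bits per value")
print(f"after:  {entropy_bits(after):.2f} bits per value")
```

Quantization can only merge values together, never split them apart, so the entropy after quantization is never higher than before.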
Summary
[Diagram: DCT → Quantizer → Entropy Encoder]
Summary
• We break up the image into 8x8 blocks.
• We calculate the frequencies in each block; this allows us to identify the important and less important data.
• We throw away some less important data.
• We compress the resulting data.
• The result: ~ 1:40 compression!
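The first three steps can be strung together in a short sketch. Everything specific here is an assumption for illustration: the 16x16 gradient image, the divisor rule in `quantize`, and the orthonormal DCT-II. The final entropy coding step is left out, so the sketch only counts how many numbers remain nonzero.

```python
import math

N = 8

def split_into_blocks(image):
    """Yield 8x8 blocks; assumes width and height are multiples of 8."""
    for top in range(0, len(image), N):
        for left in range(0, len(image[0]), N):
            yield [row[left:left + N] for row in image[top:top + N]]

def dct_1d(values):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(values)))
    return out

def dct_2d(block):
    """Apply the 1D transform to every row, then to every column."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(list(col)) for col in zip(*rows_done)]
    return [list(row) for row in zip(*cols_done)]

def quantize(coeffs, base=16, growth=8):
    """Hypothetical divisors that grow with frequency (not a real JPEG table)."""
    return [[round(coeffs[u][v] / (base + growth * (u + v)))
             for v in range(N)] for u in range(N)]

# Hypothetical 16x16 grayscale image: a smooth diagonal gradient.
image = [[4 * r + 2 * c for c in range(16)] for r in range(16)]

blocks = list(split_into_blocks(image))
encoded = [quantize(dct_2d(block)) for block in blocks]
nonzero = sum(value != 0 for blk in encoded for row in blk for value in row)

print(f"{len(blocks)} blocks, {nonzero} of {64 * len(blocks)} numbers are nonzero")
```

On a smooth image like this one, only a handful of numbers per block survive quantization, which is what lets the entropy encoder shrink the data so much.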