quiz on ch.2 convert 20102 3 to decimal. chapter 03 data representation
TRANSCRIPT
Data and Computers
Computers are multimedia devices,dealing with a vast array of information categoriesComputers store, present, and help us modify
• Numbers• Text• Audio• Images and graphics• Video
3
Data compression
Reduction in the amount of space (memory) needed to store the dataMeasured by the Compression ratio = The size of the compressed data divided by the size of the original data
Example: Two files are compressed with the ZIP utility:• One is originally 200 MB, and becomes 150 MB after compression• The other is originally 15 MB, and 11 MB after compression
Which file is better compressed?
4
Quiz
A video file is originally 3.5 GB long.We compress it to 490 MB. What is the compression ratio?
5
Data compression The Compression ratio is always between 0 and 1 (between 0% and 100%)
Compression techniques can beLossless → the data can be retrieved without any
loss of the original informationLossy → some information may be lost in the
process (but it doesn’t matter for the purposes of the intended application)
6
Analog vs. DigitalMany quantities of interest in the real-world are infinite and continuous• Zeno’s paradox: “That which is in locomotion must arrive at the half-way stage before it arrives at the goal. ” — Aristotle, Physics VI:9
But computers are finite and discrete! How do we represent an infinite and continuous quantity in a computer?
Answer: We approximate → represent only enough to satisfy our computational needs and our senses of sight and sound.
7
Information can be represented in one of two ways: analog or digital
Analog data A continuous representation, similar to the
actual information it represents
Digital data A discrete representation, breaking the
information up into separate elements
8
Analog and Digital InformationComputers cannot work well with analog data, so we digitize the data Digitizing = Breaking data into pieces and representing those pieces separately, by using a finite number of binary digits
Fine distinction: there are two operations performed:• one in time (a.k.a. sampling)• the other in amplitude (a.k.a. quantizing) Explain on thermometer, using a waveform example!
9
Quiz
A digital thermometer has a scale from 50 to 100 degrees (F). The temperature is represented on 7 bits. What is the smallest temperature difference that it can measure?
10
Analog and Digital Information
Why do we use binary to represent digitized data?• Price: transistors are cheap to produce
–Remember Babbage!• Reliability: transistors don’t get jammed
–Remember Babbage!
11
Electronic SignalsImportant facts about electronic signals• An analog signal continually fluctuates in voltage up
and down • A digital signal has only a high or low state,
corresponding to the two binary digits
12
Figure 3.2 An analog and a digital signal
• All electronic signals (both analog and digital) degrade due to absorption in transmission lines
• The amplitude (voltage) of electronic signals (both analog and digital) fluctuates due to environmental effects, a.k.a. noise
13
Figure 3.3 Degradation of analog and digital signals
The difference is that digital signals can be
regenerated!
Binary Representations
One bit can be either 0 or 1 •One bit can represent two things
Two bits can represent four things (Why?)
How many things can three bits represent?
How many things can four bits represent?
14
Binary Representations
How many things can n bits represent?
What happens every time you increase the number of bits by one?
16EoL
Binary Representations
How many things can n bits represent?
Reversing the problem: How many bits are needed to represent N things?• All desktops in this lab
17
3.2 Representing Numeric Data
Negative integers
Signed-magnitude representation The sign represents the ordering, and the digits represent the magnitude of the number
19
Negative Integers
There is a problem with the sign-magnitude representation: plus zero and minus zero.• More complex hardware is required!
Solution: Let’s not represent the sign explicitly!“Complement” representation
20
Ten’s complement
Using two decimal digits, represent 100 numbers• If unsigned, the range would be 0…?
let 1 through 49 represent 1 … 49 let 50 through 99 represent -50 … -1
21
Ten’s complement
22
Top: representations (the “label on the jar)
Bottom: the actual numbers that are being
represented (the “content of the jar)
QUIZ
Given the following representations, find in each case what actual number is being represented:• 51• 52• 96 • 47
23
Why the “complement” in ten’s complement?
100 – 50 = 50100 – 49 = 51…………………….. 100 – 1 = 99
In general:
100 – I = representation of – I
25
Let’s use ten’s complement!To perform addition, add the numbers and discard any carry
27
Now you try it
48 (signed-magnitude)- 147
How does it work inthe new scheme?
Solve in notebook for next time:
• End of chapter 27 through 31• All examples and exercises presented in the
slides so far (text up to p.61, incl.) – Make sure to ask next time if you have questions!
29
Two’s complementAddition and subtraction are the same as in unsigned:
-127 1000 0001
+ 1 0000 0001
-126 1000 0010
Ignore any Carry out of the MSB:
-1 1111 1111 +
-1 1111 1111
-2 1111 1110
32
Two’s complementFormula to compute the negative of a number on k digits:• for ten’s comp: Negative(I) → 10k - I• for two’s comp: Negative(I) → 2k - I
Practice: find the 8-bit two’s comp. representations of:
-2
-51+ 128 (trick question!)- 129 (trick question!)0
33
Warning: The example on p.62 has a typo:
28 is 256, not 128.
“Fast” two’s complementEasier way to change the sign of a number:Flip all bits, then add 1
Try it out! Find the negatives of the followingtwo’s complement numbers:
0000 00111000 00001000 00011000 00111001 01101111 1111
35
This is how subtraction is
implemented in computer hardware!
A – B = A + (-B)
What happens if the computed value won't fit in the given number of bits k?
Overflow
If k = 8 bits, adding 127 to 3 overflows 1111 1111
+ 0000 0011 0 1000 0010
… but adding -1 to 3 doesn’t!
Conclusion: overflow is specific to the representation (unsigned, sign-mag., two’s comp., floating point etc.)
37
Representing Real Numbers
Real numbers A number with a whole part and a fractional part
104.32, 0.999999, 357.0, and 3.14159
Positions to the right of the decimal point are the fractional part (tenths, hundredths, thousandths etc.): 10-1, 10-2 , 10-3 …
41
Binary fractions
Same rules apply in binary as in decimalDecimal point is actually the radix point Positions to the right of the radix point in binary are
2-1 (one half) = 0.52-2 (one quarter)= 0.252-3 (one eighth) = 0.125…
42
Converting fractions from binary to decimal
Easy! Just multiply with the powers of 2, as we did for unsigned binary. Only difference is that now the powers are negative.
Example: .10012 = 0. 10
43
Converting fractions from decimal to binary
Remember the repeated division algorithm?We apply it for the integer part of the number.
To covert the fractional part, we use the repeated multiplication algorithm!
Example: 0.43510 = 0. 2
45
QUIZ
Finite decimal fractions may have infinite binary representation!
0.310 = 0. 0100110011 2
47
Stop after 8 bits!
Representing Real Numbers
48
In general, we use “sign-mantissa-exponent”R = mantissa * 10exp
R = mantissa * 2exp
Depending on the form of the mantissa, we have:• Floating-point notation• Scientific notation
Floating point
49
This representation is called floating point because, although the number of digits is fixed (5 in the example above),the radix point floats (according to the exponent)
In the mantissa, the radix point is
always at the extreme right!
Floating point in binary?
50
It does exist - remember floating point numbers in Python!
It can be implemented either through software, or directly inthe hardware of the computer.
Until 1989, all desktops had an optional chip, separate fromthe main CPU, called FPU (a.k.a. math coprocessor).The Intel 80486, introduced that year, had the first integratedCPU + FPU.
Not in text!
80486 was also the first chip with over 1 million
transistors!
Floating point in binary?
51
Motorola launched their first integrated CPU+FPU, the68040 the next year, in 1990.
The detailed binary implementation of floating point isbeyond the scope of our class.
If you want to learn more:•Search for “IEEE 754” on the web•Take CS 343 or CS 344
Not in text!
Scientific notation
Particular form of floating point, in which the decimal point is kept to the right of the leftmost digit.
12001.32708 is represented 1.200132708 E+4
52
3.3 Representing Text
Basic idea:There are finite number of characters to represent, so list them all and assign each a (binary) number, a.k.a. code.
Character set A list of characters and the codes used to represent each one Computer manufacturers (eventually) agreed to standardize
– Read “Character Set Maze” on p.6755
The ASCII Character Set
ASCII = American Standard Code for Information Interchange
ASCII originally used seven bits to represent each character, allowing for 128 unique characters
Later extended ASCII evolved so that all eight bits were used
• How many characters can be represented?
56
8-bit ASCII Character Set
58
Extended ASCII is a superset of 7-bit ASCII: The first 128 characters correspond exactly to 7-bit ASCII
The ASCII Character Set
The first 32 characters in the ASCII character chart do not have a simple character representation to print to the screen.
They are called control characters
61
The Unicode Character Set
Extended ASCII is not enough for international use
Unicode uses 16 bits per character How many characters can UNICODE represent?
Unicode is a superset of ASCII: The first 256 characters correspond exactly to the extended ASCII character set
63
Miscellaneous Characters in Unicode
66
See more online at the official Unicode site
Text Compression
Sometimes, assigning 8 or 16 bits to each character in a document uses too much memory
We need ways to store and transmit text efficiently
Text compression techniques:– keyword encoding– run-length encoding– Huffman encoding
67
Keyword EncodingReplace frequently used words with a single character, for example here’s a substitution chart:
68
Keyword EncodingGiven the following paragraph:
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness.
69
Keyword EncodingThe encoded paragraph is:
We hold # truths to be self-evident, $ all men are created equal, $ ~y are endowed by ~ir Creator with certain unalienable Rights, $ among # are Life, Liberty + ~ pursuit of Happiness. $ to secure # rights, Governments are instituted among Men, deriving ~ir just powers from ~ consent of ~ governed, $ whenever any Form of Government becomes destructive of # ends, it is ~ Right of ~ People to alter or to abolish it, + to institute new Government, laying its foundation on such principles + organizing its powers in such form, ^ to ~m shall seem most likely to effect ~ir Safety + Happiness.
70
Keyword EncodingHow much did we compress?
Original paragraph 656 characters
Encoded paragraph 596 characters
Characters saved60 characters
Compression ratio596/656 = 0.9085
Could we use this substitution chart for all text?
71
Keyword EncodingMore advanced idea:
Encode parts of words, like common prefixes and suffixes, e.g. “pre”, “ex”, “ing”, “tion”
72
Run-Length EncodingA single character may be repeated over and over again in a long sequence.Replace a repeated sequence with
– a flag character– repeated character– number of repetitions
*n8– * is the flag character– n is the repeated character– 8 is the number of times n is repeated
73
Run-Length EncodingEncoding example:Original text is
bbbbbbbbjjjkllqqqqqq+++++Encoded text is
*b8jjjkll*q6*+5 (Why isn't l encoded? J?)The compression ratio is 15/25 = .6
Q: This type of repetition doesn’t occur in English text; can you think of a situation where it is very likely to occur? 74
Run-Length EncodingMore advanced idea (not required for exam):
The number of repeated characters cannot be 0, 1, 2 or 3 (Why?)
We can shift the range 0 – 255 to represent run lengths between 4 - 259
Perform the encoding and decoding in the prev. examples with the convention above!
76
Huffman CodesConclusion: each language and each topic have specific frequencies of characters and groups of characters (digraphs, trigraphs etc.)
Why should the characters “X" or "z" take up the same number of bits as "e" or "t"?
Huffman codes use variable-length bit strings to represent each character. More frequently-used letters have shorter strings to represent them, and vice-versa!
78
Huffman encoding example
“ballboard” would be1010001001001010110001111011
compression ratio28/56
QUIZ: Encode “roadbed”
79
Huffman decoding
In Huffman encoding no character's bit string is the prefix of any other character's bit string. Codes with this property are called prefix codes.
To decodelook for match left to right, bit by bitrecord letter when the first match is foundcontinue where you left off, going left to right
80
3.4 Representing Audio Data
83
We perceive sound when a series of air waves cause
to vibrate a membrane in our ear (eardrum), which
sends signals to our brain.
Analog Audio
Record players and stereos send analog signals to speakers to produce sound.These signals are analog representations of the sound waves.The voltage in the signal varies in direct proportion to the amplitude of the sound wave.
84
From Analog to Digital Audio
Digitize the signal by sampling and quantizing – periodically measure the voltage – record the numeric value
How often should we sample?
A sampling rate of about 40,000 times per second is enough to create a reasonable sound reproduction
85
44,000 for audio CD, to be exact
Digital Audio on a CD
On the surface of the CD are microscopic pits and lands that represent binary digits
A low intensity laser is pointed as the disc. The
laser light reflects strongly if the surface is
smooth and poorly if the surface is pitted ???
88
90
Both halves of the laser beam reflect off pit or both halves off land.The two halves are “in phase”.
Half of the laser beam reflects off pit and half off land.The 2 halves are “out of phase”.
Audio Formats
Audio Formats– WAV, AU, AIFF, VQF, and MP3
MP3 (MPEG-2, audio layer 3 file) is dominant – analyzes the frequency spread and discards information
that can’t be heard by humans (>16 kHz) – bit stream is compressed using a form of Huffman
encoding to achieve additional compression
Is this a lossy or lossless compression?
91
3.5 Representing Images and Graphics
Color
Perception of the frequencies of light that reach the retinas of our eyes
Retinas have three types of color photoreceptor cone cells that correspond to the colors of red, green, and blue
92
Color is expressed as an RGB (red-green-blue) value = three numbers that indicate the relative contribution of each of these three primary colors
An RGB value of (255, 255, 0) maximizes the contribution of red and green, and minimizes the contribution of blue, which results in a bright yellow.
93
94
Look at the snow and the barn!
Source: Wikipedia – RGB color model
Representing Images and Graphics
Can you understand this HTML code?
<font color="#FF0000">
Blah blah …
</font>
95
RGB Color Chart in hex
Representing Images and Graphics
color depthThe amount of data that is used to represent a color HiColor A 16-bit color depth: five bits used for each number in an RGB value with the extra bit sometimes used to represent transparency TrueColorA 24-bit color depth: eight bits used for each number in an RGB value
98
Extra-credit question
TrueColorA 24-bit color depth: eight bits used for each number in an RGB value
How many different colors can be represented in TrueColor? Please show your work.
99
Indexed Color
A browser may support only a certain number of specific colors, creating a palette from which to choose
101
Figure 3.11 The Netscape color palette
EOL
Extra-credit question
How many bits are needed to represent this palette? Please show your work.
102
How to digitize a picture •Sample it → Represent it as a collection of individual dots called pixels•Quantize it → Represent each pixel as one of 224 possible colors (TrueColor)
Resolution = The # of pixels used to represent a picture
103
Digitized Images and Graphics
104
Figure 3.12 A digitized picture composed of many individual pixels
Wholepicture
Digitized Images and Graphics
105
Figure 3.12 A digitized picture composed of many individual pixels
Magnified portionof the picture
See the pixels?
Hands-on: paste the high-res image fromthe previous slide inPaint, then chooseZOOM = 800
Raster Graphics = Storage of data on a pixel-by-pixel basis
•Bitmap (BMP), GIF, JPEG, and PNG are the most popular raster-graphics formats
GIF formatEach image is made up of only 256 colors (indexed color)• But they can be a different 256 for each image!Supports animation! ExampleOptimal for line art
PNG format (“ping” = Portable Network Graphics) Like GIF but achieves greater compression with wider range of color depthNo animations
106
Bitmap formatContains the pixel color values of the image from left to right and from top to bottom• Great candidate for run-length compression!• Losless
JPEG formatAverages color hues over short distances• Lossy compressionOptimal for color photographs
107
Vector Graphics
A format that describes an image in terms of lines and geometric shapes
A vector graphic is a series of commands that describe a line’s direction, thickness, and color
The file sizes tend to be smaller because not every pixel is described
Example: Flash
108
Vector Graphics
The good side:Vector graphics can be resized mathematically and changes can be calculated dynamically as needed
The bad side:Vector graphics are not good for representing real-world images
109
3.6 Representing VideoThe problem: huge amount of data!
Example: In HDTV, the Frame size is defined as the number of horizontal pixels × number of vertical pixels:• 1280 × 720 • 1920 × 1080
Calculate: 1] Data rate (bits per second) for 25 fps 2] Size (bytes) of 2-hour movie
110
Video codec = COmpressor/DECompressor Methods used to shrink the size of a movie to allow it to be played on a computer or over a network
Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video
111
Compressing Video
Temporal compression A technique based on differences between consecutive frames: If most of an image in two frames hasn’t changed, why should we waste space to duplicate all of the similar information?Spatial compression A technique based on removing redundant information within a frame: This problem is essentially the same as that faced when compressing still images.
112
Ethical Issues
MGM Studios, Inc. v. Grokster, Ltd.Describe the background for this
lawsuitWhat role did Napster play in this case?What was the decision in this case?Has this case affected you personally?
114
Do you know?
116
What is JamBayes?
How many computer character sets existedin 1960?
Who described the telegraph as a kind of very long cat?
Chapter Review Questions
• Distinguish between analog and digital information• Explain data compression and calculate compression
ratios• Explain the binary formats for negative and floating-
point values• Describe the characteristics of the ASCII and Unicode
character sets
• Perform various types of text compression
117
Chapter Review Questions
• Explain the nature of sound and its representation• Explain how RGB values define a color• Distinguish between raster and vector graphics• Explain temporal and spatial video compression
118