data representation - mcgill...

Data Representation Real-world information Analog vs. Digital (binary) representation

Upload: nguyentram

Post on 19-Apr-2018


Page 1: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

Data Representation

● Real-world information

● Analog vs. Digital (binary) representation

Analog vs. Digital data/signals

[Figure: four panels titled "Digital Signal", "Analog Signal", "Digital Signal Degradation", and "Analog Signal Degradation"; the digital signal is labelled with 1 and 0 bit values]

Analog vs. Digital data/signals

● Storage and processing units for digital (binary) data are
  ● Reliable
  ● Cheap
  ● Simple

● Beware: analog data has infinite precision and a wide range; digital data is a finite approximation with a limited range (real number vs. real/float/double/...)


How to digitally (binary) represent/encode data?

● Numbers (12, -134, 12.34, ...)

● Characters/Strings ('a', '©', 'ش', 'א')

● Images

● Sound

...

Binary representation

● in computing and telecommunications
● a bit is a basic unit of information storage
● “binary digit”
● the maximum amount of information that can be stored in only two distinct states: 0 or 1

Bit sequences: (bit) “string” (vs. char string)

Pattern             Size      Name
0                   1 bit     bit
0111                4 bits    nibble
01100001            8 bits    byte
0101010110110111    16 bits   half-word
...                 32 bits   word
...                 64 bits   word

A word is a natural unit of data used by a particular processor design (8, 16, 24, 32, or 64 bits)

Binary representation ++

With 1 bit, can represent 2 distinct entities

With 2 bits, can represent 4 distinct entities

...

With N bits, can represent $2^N$ distinct entities

Example: {Red, Green, Blue} encoded as {00, 01, 10}
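The counting argument can be tried out directly. The sketch below enumerates all bit strings of a given length and hands the first few to a list of entities; the helper names `bit_patterns` and `encode_entities` are made up for this illustration.

```python
from itertools import product

def bit_patterns(n):
    """Return all 2**n bit strings of length n, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def encode_entities(entities):
    """Assign each entity a fixed-width bit pattern (hypothetical helper)."""
    # Smallest n with 2**n >= number of entities (at least 1 bit).
    n = max(1, (len(entities) - 1).bit_length())
    return dict(zip(entities, bit_patterns(n)))

print(len(bit_patterns(3)))                       # 8 patterns for 3 bits
print(encode_entities(["Red", "Green", "Blue"]))  # pattern 11 stays unused
```

Three entities need 2 bits, which leaves one of the four patterns unassigned.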

Binary representation of numbers

● Binary Coded Decimal (BCD) representation of Unsigned Integer/Real numbers

● Binary representation of Unsigned Integers

● Binary representation of Signed Integer Numbers
  ● Signed magnitude
  ● One's complement
  ● Two's complement

● Fixed-Point binary approximation/representation of Real numbers

● Floating Point binary approximation/representation of Real numbers

Binary representation/encoding of Unsigned Integers

n-bit string “$x_{n-1} x_{n-2} \ldots x_1 x_0$” has/encodes value x:

$x = x_{n-1} \cdot 2^{n-1} + x_{n-2} \cdot 2^{n-2} + \cdots + x_1 \cdot 2^1 + x_0 \cdot 2^0$

$x_{n-1}$: Most Significant Bit (MSB)
$x_0$: Least Significant Bit (LSB)

Range: 0 to $+2^n - 1$

Example:

$0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 1011_2$

$= 0 + \ldots + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$

$= 0 + \ldots + 8 + 0 + 2 + 1 = 11_{10}$
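As a minimal sketch of the weighted sum, the function below evaluates a bit string position by position. The name `unsigned_value` is an assumption made for this example; Python's built-in `int(bits, 2)` gives the same result.

```python
def unsigned_value(bits):
    """Value of a bit string read as an unsigned integer (MSB first)."""
    bits = bits.replace(" ", "")
    n = len(bits)
    # The bit at string position pos has weight 2**(n - 1 - pos);
    # the rightmost bit (LSB) has weight 2**0.
    return sum(int(b) * 2 ** (n - 1 - pos) for pos, b in enumerate(bits))

example = "0000 0000 0000 0000 0000 0000 0000 1011"
print(unsigned_value(example))                 # 11
print(unsigned_value("1" * 32) == 2**32 - 1)   # True: top of the 32-bit range
```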

Powers of 2

 n    2^n
 0    1
 1    2
 2    4
 3    8
 4    16
 5    32
 6    64
 7    128
 8    256
 9    512
10    1,024
11    2,048
12    4,096
13    8,192
14    16,384
15    32,768
16    65,536
...
32    4,294,967,296

Binary Coded Decimal (BCD) representation of unsigned int/real

Decimal: 0    1    2    3    4    5    6    7    8    9
BCD:     0000 0001 0010 0011 0100 0101 0110 0111 1000 1001

Every Decimal digit gets represented by 4 bits (Binary digits)

Decimal: 127          : 1    2    7
BCD:     000100100111 : 0001 0010 0111

Used mostly in

● Mainframes (financial applications)
● Embedded microcontrollers/small processors (computation <<<)
● Only digital logic (no processor), display (7-segment)

Word size often 10 or 12 decimal digits (e.g., in calculators)

Binary Coded Decimal (BCD)

Advantages:
● Easy to print, display, ... thanks to per-digit conversion
● Numbers such as decimal 0.2 have an infinite place-value representation in binary (0.001100110011...) but have a finite place-value representation in binary-coded decimal (0000.0010)
● Scaling by factors of 10 by shifting (clever compiler ...)
● 10-based rounding is easy

Disadvantages:
● More complex to implement +,-,*,/ (+: 15-20% more circuitry)
● Slower
● More storage space required (15: 1111 vs. 00010101)
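The per-digit conversion and the storage overhead can both be seen in a few lines. This is an illustrative sketch for non-negative integers only; `to_bcd` and `from_bcd` are names chosen here.

```python
def to_bcd(number):
    """Encode a non-negative integer as BCD: 4 bits per decimal digit."""
    return " ".join(format(int(digit), "04b") for digit in str(number))

def from_bcd(bcd):
    """Decode a space-separated BCD string back to an integer."""
    return int("".join(str(int(nibble, 2)) for nibble in bcd.split()))

print(to_bcd(127))                    # 0001 0010 0111
print(to_bcd(15), format(15, "b"))    # 0001 0101 1111
```

The second line shows the storage cost: 8 bits in BCD against 4 bits in plain binary for the value 15.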



(intermezzo) Binary addition

Weighted Position Code

The base, or radix, of a number system defines the range of possible values that a digit may have: 0 – 9 for decimal; 0 – 1 for binary; 0 – (N-1) for base N. The general form for determining the value of a number with digits $d_{n-1} \ldots d_1 d_0 . d_{-1} \ldots d_{-m}$ in base N is given by:

$\text{value} = \sum_{i=-m}^{n-1} d_i \times N^i$

Example:

$541.25_{10} = 5 \times 10^2 + 4 \times 10^1 + 1 \times 10^0 + 2 \times 10^{-1} + 5 \times 10^{-2}$

$= (500)_{10} + (40)_{10} + (1)_{10} + (2/10)_{10} + (5/100)_{10}$

$= (541.25)_{10}$

More later: “fixed point” encoding of Rational number as Approximation of a Real number


Base 2,8,10,16 Number Systems


Simple base N operations ...

Multiply by powers of N by ...

Divide by powers of N by ...

Base conversion (binary “encoding”): remainder method

Example: Convert $23.375_{10}$ to base 2.

Start by converting the integer portion:

23 = 2*11 + 1

= 2*(2*5 + 1) + 1

= 2*(2*(2*2 + 1) + 1) + 1

= 2*(2*(2*(2*1 + 0) + 1) + 1) + 1

= 2*(2*(2*(2*(2*0 + 1) + 0) + 1) + 1) + 1

Base conversion: multiplication method

• Now, convert the fraction:

0.375 = 1/2*(0 + 0.75)

0.375 = 1/2*(0 + 1/2*(1 + 0.5))

0.375 = 1/2*(0 + 1/2*(1 + 1/2*(1 + 0)))

• Putting it all together, $23.375_{10} = 10111.011_2$
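Both methods are short loops. The sketch below is one possible rendering; the function names and the `max_digits` cut-off are assumptions added so that a non-terminating fraction cannot loop forever.

```python
def int_to_base2(n):
    """Remainder method: repeatedly divide by 2 and collect the remainders."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, 2)
        digits.append(str(r))
    # The first remainder is the least significant bit, so reverse.
    return "".join(reversed(digits))

def frac_to_base2(frac, max_digits=20):
    """Multiplication method: repeatedly multiply by 2 and collect the integer parts."""
    digits = []
    while frac > 0 and len(digits) < max_digits:
        frac *= 2
        bit = int(frac)
        digits.append(str(bit))
        frac -= bit
    return "".join(digits)

print(int_to_base2(23) + "." + frac_to_base2(0.375))  # 10111.011
```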

Non-terminating Base 2 Fraction

We can’t always convert a terminating base 10 fraction into an equivalent terminating base 2 fraction. For example:

$0.2_{10} = 0.001100110011\ldots_2$
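Whether a decimal fraction terminates in base 2 can be checked exactly with rational arithmetic: a reduced fraction has a finite base 2 expansion only if its denominator is a power of 2. The sketch below uses Python's `fractions.Fraction`; the helper names are chosen for this example.

```python
from fractions import Fraction

def terminates_in_base2(decimal_text):
    """True if the decimal fraction has a finite base 2 expansion."""
    denominator = Fraction(decimal_text).denominator
    # Powers of 2 have exactly one bit set, so d & (d - 1) is 0.
    return denominator & (denominator - 1) == 0

def base2_digits(decimal_text, count):
    """First `count` base 2 digits after the radix point, computed exactly."""
    frac = Fraction(decimal_text) % 1
    digits = []
    for _ in range(count):
        frac *= 2
        digits.append(str(int(frac)))
        frac %= 1
    return "".join(digits)

print(terminates_in_base2("0.375"))  # True  (3/8)
print(terminates_in_base2("0.2"))    # False (1/5)
print(base2_digits("0.2", 12))       # 001100110011
```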

Binary representation of Signed Integer numbers

● Signed magnitude

● One's complement

● Two's complement

● Biased (Excess)

Signed Magnitude

• Also known as “sign and magnitude”: the leftmost bit is the sign (0 = positive, 1 = negative) and the remaining bits are the magnitude.

• Example:

$+25_{10} = 00011001_2$

$-25_{10} = 10011001_2$

• Two representations for zero: $+0 = 00000000_2$, $-0 = 10000000_2$.

• Largest number is $+127_{10}$, smallest number is $-127_{10}$, using an 8-bit representation.

One's complement

• The leftmost bit is the sign (0 = positive, 1 = negative). Negative of a number is obtained by complementing each bit from 0 to 1 or from 1 to 0. This goes both ways: converting positive numbers to negative numbers, and converting negative numbers to positive numbers.

• Example:

$+25_{10} = 00011001_2$

$-25_{10} = 11100110_2$

• Two representations for zero: $+0 = 00000000_2$, $-0 = 11111111_2$.

• Largest number is $+127_{10}$, smallest number is $-127_{10}$, using an 8-bit representation.
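The two encodings seen so far can be compared side by side. The following is a sketch with function names chosen for illustration. Python integers have no negative zero, so the second zero pattern of each scheme cannot be produced from an integer input here.

```python
def signed_magnitude(value, bits=8):
    """Sign bit followed by the magnitude in bits-1 bits."""
    if abs(value) > 2 ** (bits - 1) - 1:
        raise ValueError("value out of range")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), "0{}b".format(bits - 1))

def ones_complement(value, bits=8):
    """Positive values as plain binary; negatives by flipping every bit."""
    if abs(value) > 2 ** (bits - 1) - 1:
        raise ValueError("value out of range")
    pattern = format(abs(value), "0{}b".format(bits))
    if value < 0:
        pattern = "".join("1" if b == "0" else "0" for b in pattern)
    return pattern

print(signed_magnitude(25), signed_magnitude(-25))  # 00011001 10011001
print(ones_complement(25), ones_complement(-25))    # 00011001 11100110
```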

2's complement

n-bit string “$x_{n-1} x_{n-2} \ldots x_1 x_0$” has value x:

$x = -x_{n-1} \cdot 2^{n-1} + x_{n-2} \cdot 2^{n-2} + \cdots + x_1 \cdot 2^1 + x_0 \cdot 2^0$

bits   unsigned   2's complement
000    0           0
001    1           1
010    2           2
011    3           3
100    4          -4
101    5          -3
110    6          -2
111    7          -1

Ordered by 2's complement value:

bits   value
100    -4
101    -3
110    -2
111    -1
000     0
001     1
010     2
011     3

2's complement arithmetic

$x = -x_{n-1} \cdot 2^{n-1} + x_{n-2} \cdot 2^{n-2} + \cdots + x_1 \cdot 2^1 + x_0 \cdot 2^0$

101: -4 + 0 + 1 = -3

bits   value
100    -4
101    -3
110    -2
111    -1
000     0
001     1
010     2
011     3

[Figure: the 3-bit patterns 000 to 111 shown with their values -4 to 3; steps are labelled +1]

2's complement

● Given an n-bit number

$x = -x_{n-1} \cdot 2^{n-1} + x_{n-2} \cdot 2^{n-2} + \cdots + x_1 \cdot 2^1 + x_0 \cdot 2^0$

Range: $-2^{n-1}$ to $+2^{n-1} - 1$

Example:

$1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1100_2$

$= -1 \times 2^{31} + 1 \times 2^{30} + \ldots + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0$

$= -2{,}147{,}483{,}648 + 2{,}147{,}483{,}644 = -4_{10}$

Using 32 bits: –2,147,483,648 to +2,147,483,647

2's complement

● Bit 31 is the sign bit in a 32-bit word (easy to test for sign)
  ● 1 for negative numbers
  ● 0 for non-negative numbers

● $+2^{n-1}$ cannot be represented ($-2^{n-1}$ can)
● Non-negative numbers have the same unsigned and 2's-complement representation
● One representation for zero: $+0 = 00000000_2$, $-0 = 00000000_2$

● Some specific numbers
  ● 0: 0000 0000 … 0000
  ● –1: 1111 1111 … 1111
  ● Most-negative: 1000 0000 … 0000
  ● Most-positive: 0111 1111 … 1111
  ● 8-bit: largest number is $+127_{10}$, smallest number is $-128_{10}$
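A minimal sketch of decoding and encoding in 2's complement, following the weighted sum in which only the most significant bit carries a negative weight. The function names are assumptions made for this example.

```python
def twos_complement_value(bits):
    """Decode a bit string: the MSB has weight -2**(n-1), all others are positive."""
    bits = bits.replace(" ", "")
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)
    for pos, b in enumerate(bits[1:], start=1):
        value += int(b) * 2 ** (n - 1 - pos)
    return value

def twos_complement_bits(value, n):
    """Encode value as an n-bit 2's complement string."""
    if not -(2 ** (n - 1)) <= value <= 2 ** (n - 1) - 1:
        raise ValueError("value out of range")
    # Reducing modulo 2**n maps negative values onto the upper half of the patterns.
    return format(value % 2 ** n, "0{}b".format(n))

print(twos_complement_value("101"))                                      # -3
print(twos_complement_value("1111 1111 1111 1111 1111 1111 1111 1100"))  # -4
print(twos_complement_bits(-128, 8), twos_complement_bits(127, 8))       # 10000000 01111111
```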

Negation for 2's complement

● Complement and add 1
  ● Complement means 1 → 0, 0 → 1

$x + \bar{x} = 1111\ldots111_2 = -1$

$\bar{x} + 1 = -x$

Example: negate +2

$+2 = 0000\ 0000\ \ldots\ 0010_2$

$-2 = 1111\ 1111\ \ldots\ 1101_2 + 1 = 1111\ 1111\ \ldots\ 1110_2$
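The complement-and-add-1 rule can be sketched on bit strings as follows. The function name `negate` is chosen for this example, and the carry out of the most significant bit is simply discarded.

```python
def negate(bits):
    """Negate an n-bit 2's complement pattern: complement every bit, then add 1."""
    n = len(bits)
    complement = "".join("1" if b == "0" else "0" for b in bits)
    # Keep only the low n bits; a carry out of the MSB is dropped.
    return format((int(complement, 2) + 1) % 2 ** n, "0{}b".format(n))

print(negate("00000010"))  # 11111110, i.e. -2
print(negate("11111110"))  # 00000010, back to +2
print(negate("10000000"))  # 10000000: -128 has no positive counterpart in 8 bits
```

The last line illustrates the asymmetric range: negating the most negative number yields the same pattern again.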

Biased (Excess B) representation

● Fast comparison through non-negative number comparison (of exponents of numbers in Floating Point representation): in 2's complement, a negative number looks larger than a positive one
● add bias B to make >= 0

value   2's compl   value + B (B=4)   binary
 -4     100         -4+4 = 0          000
 -3     101         -3+4 = 1          001
 -2     110         -2+4 = 2          010
 -1     111         -1+4 = 3          011
  0     000          0+4 = 4          100
  1     001          1+4 = 5          101
  2     010          2+4 = 6          110
  3     011          3+4 = 7          111

Excess B representation maps “$0^N$” (N zeros) to $-B$, and “$1^N$” (N ones) to $-B + 2^N - 1$.
To cover the full range, $-B + 2^N - 1$ should be $= B - 1$

⇒ B should be $2^{N-1}$

(e.g., N=3 B=4; N=8 B=128)

Can we use unsigned arithmetic (e.g., -2+1)?
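A small experiment with the 3-bit, B=4 table: the sketch below encodes values in excess-B form, checks that the bit patterns sort in the same order as the values, and then adds two raw codes for the example -2+1. The helper names are assumptions; the last step is one way to look at the closing question, not the only possible answer.

```python
def to_excess(value, n, bias=None):
    """Encode value as an n-bit excess-B pattern (default B = 2**(n-1))."""
    if bias is None:
        bias = 2 ** (n - 1)
    code = value + bias
    if not 0 <= code < 2 ** n:
        raise ValueError("value out of range")
    return format(code, "0{}b".format(n))

def from_excess(bits, bias=None):
    """Decode an excess-B pattern back to its value."""
    if bias is None:
        bias = 2 ** (len(bits) - 1)
    return int(bits, 2) - bias

# The ordering of the bit patterns matches the ordering of the values.
values = [-4, -3, -2, -1, 0, 1, 2, 3]
codes = [to_excess(v, 3) for v in values]
print(codes == sorted(codes))                # True

# Adding raw codes counts the bias twice, so it has to be subtracted once.
raw = int(to_excess(-2, 3), 2) + int(to_excess(1, 3), 2)
print(raw, from_excess(format(raw - 4, "03b")))   # 7 -1
```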

Biased (Excess B) representation

In normalized Floating Point exponent “$0^N$” and “$1^N$” are reserved

● add bias B to make > 0
  ● 3 bit: bias 3
  ● 8 bit: bias 127
  ● 11 bit: bias 1023

value   value + B (B=3)   binary
 -3     -3+3 = 0          000
 -2     -2+3 = 1          001
 -1     -1+3 = 2          010
  0      0+3 = 3          011
  1      1+3 = 4          100
  2      2+3 = 5          101
  3      3+3 = 6          110
  4      4+3 = 7          111

FP exponent bias: B is $2^{N-1} - 1$ (e.g., N=3 B=3)


3-bit representations

Fixed-Point Representation of Real Numbers

A fixed-point real number is an integer that is scaled by a certain factor.

E.g., 1.23 = 123/100, scaling factor 100

The scaling factor is the same for all numbers of a certain fixed-point type. Floating-point types, on the other hand, store the scaling factor as part of individual numbers.

Upper bound of a fixed-point type = upper bound of underlying integer type / scaling factor

Lower bound of a fixed-point type = lower bound of underlying integer type / scaling factor

E.g., binary fixed-point type in two's complement format, with f fractional bits and a total of b bits (iii.fffff):

Smallest representable number: $-2^{b-1} / 2^f$

Largest representable number: $(2^{b-1} - 1) / 2^f$

Calculating with Real Numbers in Fixed-Point representation

To add or subtract two fixed-point numbers (of the same fixed-point type):
● add or subtract the underlying integers

To multiply or divide two fixed-point numbers:
● multiply or divide the underlying integers
● need to re-scale the result
  for multiplication: result needs to be divided by the scaling factor
  for division: result needs to be multiplied by the scaling factor

Example: multiply two fixed-point numbers $a_{fp}$ and $b_{fp}$, stored as underlying integers $a_i$ and $b_i$ with scaling factor S:

$a_{fp} \cdot b_{fp} = a_i/S \cdot b_i/S = (a_i \cdot b_i) / S^2$

If we construct $a_i \cdot b_i$ and read it as a fixed-point number, its value is $(a_i \cdot b_i) / S$, so we need to divide this by S to get the correct value.
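The re-scaling rules can be sketched with plain integers. The scaling factor S = 100 follows the 1.23 = 123/100 example; the function names are hypothetical, and the sketch assumes non-negative operands so that integer division truncates.

```python
S = 100  # scaling factor: the integer 123 stands for 1.23

def fx_add(a_i, b_i):
    """Addition needs no re-scaling."""
    return a_i + b_i

def fx_mul(a_i, b_i):
    # a_i * b_i carries a factor S**2; divide once by S to restore the scale.
    return (a_i * b_i) // S

def fx_div(a_i, b_i):
    # The plain quotient loses the factor S; multiply by S first to keep digits.
    return (a_i * S) // b_i

a_i, b_i = 123, 250        # 1.23 and 2.50
print(fx_add(a_i, b_i))    # 373 -> 3.73
print(fx_mul(a_i, b_i))    # 307 -> 3.07 (exact product 3.075, last digit truncated)
print(fx_div(a_i, b_i))    # 49  -> 0.49 (exact quotient 0.492, last digit truncated)
```

Truncation is only one possible rounding choice; having that choice under explicit control is part of the appeal of fixed-point arithmetic.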

Real Numbers in Fixed Point representation

Advantages:

● Needed if no Floating Point Unit (FPU) available
● Less hardware needed (embedded)
● Less power consumed (embedded)
● More control over error/rounding (cfr. BCD) than floating point
● All representable numbers “equidistant” (what's the distance?)
● Smallest/largest number that can be represented?

Base 10 Floating Point Numbers

• Floating point numbers allow very large and very small numbers to be represented using only a few digits, at the expense of precision. The precision is primarily determined by the number of digits in the fraction (or significand, which has integer and fractional parts), and the range is primarily determined by the number of digits in the exponent.

• Example: $+6.023 \times 10^{23}$

Normalization

• The base 10 number 254 can be represented in floating point form as $254 \times 10^0$, or equivalently as $25.4 \times 10^1$, or $2.54 \times 10^2$, or $.254 \times 10^3$, or $.0254 \times 10^4$, or in infinitely many other ways. Having many representations of the same number creates problems when making comparisons.

Hence, choose a canonical representation, the unique representative of the set of mathematically equivalent numbers.

• Floating point numbers are usually normalized, in which the radix point is located in only one possible position for a given number.

• Typically, the normalized representation places the radix point
  ● (A) immediately to the left of the leftmost nonzero digit in the fraction, as in $.254 \times 10^3$, or
  ● (B) immediately to the right of the leftmost nonzero digit in the fraction, as in $2.54 \times 10^2$

Floating Point Example

• Represent $.254 \times 10^3$ in 16 bit Floating Point, in a normalized, base 8 floating point format, with a sign bit, followed by a 3-bit 2's complement exponent, followed by four base 8 digits (possible values: 0-7). The full Floating Point number must be represented as a 16 bit binary string.

• Step #1: Convert to the target base.

$.254 \times 10^3 = 254_{10}$.

Using the remainder method, we find that $254_{10} = 376_8$:

254/8 = 31 R 6

31/8 = 3 R 7

3/8 = 0 R 3

• Step #2: Normalize (choice (A)): $376_8 = .376 \times 8^3$

• Step #3: Fill in the bit fields, with a positive sign (sign bit = 0), an exponent of 3 (in 2's complement), and 4-digit fraction = $.3760_8$:

0 011 . 011 111 110 000

Representations

• binary representation

0011011111110000

• octal representation

000 011 011 111 110 000

0   3   3   7   6   0

• hexadecimal representation

0011 0111 1111 0000

3    7    F    0

Range, Precision/Error (~ gap size)

• In the previous example, we have the base b = 8, the number of significant digits (not bits!) in the fraction s = 4, the largest exponent value (not bit pattern) M = 3, and the smallest exponent value m = -4.

• In the previous example, there is no explicit representation of 0, but there needs to be a special bit pattern reserved for 0, otherwise there would be no way to represent 0 without violating the normalization rule (choices (A) and (B)). We will assume a bit pattern of 0 000 000 000 000 000 represents 0. Note that 0 110 000 000 000 000 etc. are not normalized numbers!

Hence, not all bit combinations are used!

• Using b, s, M, and m, we would like to characterize this floating point representation in terms of the largest positive representable number, the range of representable numbers, the smallest (nonzero) positive representable number, the smallest gap between two successive numbers, the largest gap between two successive numbers, and the total number of numbers that can be represented (not all bit combinations are used!).

Range, Precision/Error (~ gap size) using choice (A)

• Largest representable number: $b^M \times (1 - b^{-s}) = 8^3 \times (1 - 8^{-4}) = 511.875$

• Range = $[-b^M \times (1 - b^{-s}),\ b^M \times (1 - b^{-s})]$

• Smallest (nonzero) positive representable number: $b^m \times b^{-1} = 8^{-4-1} = 8^{-5} = 0.000030517578125$

• Largest gap: $b^M \times b^{-s} = 8^{3-4} = 8^{-1} = 0.125$

• Smallest gap: $b^m \times b^{-s} = 8^{-4-4} = 8^{-8} \approx 0.0000000596046447754$

• Number of representable numbers: There are 5 components: (A) sign bit; for each number except 0 for this case, there is both a positive and negative version; (B) M - (m - 1) exponents; (C) b - 1 values for the first digit (0 is disallowed for the first normalized digit); (D) $b^{s-1}$ values for the s-1 remaining digits, plus (E) a special representation for 0.

For this example: $2 \times (3 - (-4 - 1)) \times (8 - 1) \times 8^{4-1} + 1 = 57345$ numbers can be represented. Notice how this number is smaller than the number of bit patterns that are possible in 16 bits, which is $2^{16}$ (= 65536).

Not all bit combinations are used

Example Floating Point Format

• Number of bits = 1 + 2 + 3 = 6. Number of possible bit patterns = 64

• Largest non-zero positive number = $b^M \times (1 - b^{-s}) = 7/4$

• Range = $[-b^M \times (1 - b^{-s}),\ b^M \times (1 - b^{-s})] = [-7/4, 7/4]$

• Smallest non-zero positive number = $b^m \times b^{-1} = 1/8$

• Largest gap = $b^M \times b^{-s} = 1/4$

• Smallest gap = $b^m \times b^{-s} = 1/32$

• Number of representable numbers = $2 \times ((M - m) + 1) \times (b - 1) \times b^{s-1} + 1 = 33$

● Note for later: fill the gap around 0: de-normalized
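The characteristic quantities of a choice (A) format follow mechanically from b, s, M, and m. The sketch below evaluates them with exact fractions. The function name is made up, and the parameters b=2, s=3, M=1, m=-2 for the 6-bit format are inferred here from the results listed above; the slide does not state them.

```python
from fractions import Fraction

def fp_characteristics(b, s, M, m):
    """Choice (A) format: base b, s fraction digits, exponent values m..M."""
    base = Fraction(b)
    return {
        "largest": base ** M * (1 - base ** -s),
        "smallest": base ** m * base ** -1,
        "largest_gap": base ** M * base ** -s,
        "smallest_gap": base ** m * base ** -s,
        "count": 2 * ((M - m) + 1) * (b - 1) * b ** (s - 1) + 1,
    }

octal = fp_characteristics(b=8, s=4, M=3, m=-4)
print(float(octal["largest"]), octal["count"])   # 511.875 57345

# Inferred parameter set for the 6-bit example format (an assumption).
small = fp_characteristics(b=2, s=3, M=1, m=-2)
print(small["largest"], small["smallest"], small["count"])  # 7/4 1/8 33
```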

Gap Size Follows Exponent Size

• The relative error (remember: error ~ gap size) is approximately the same for all numbers.

• If we take the ratio of a large gap to a large number, and compare that to the ratio of a small gap to a small number, then the ratios are the same.
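The claim can be checked with the choice (A) expressions from the earlier slides. In this sketch the "large number" is taken to be $b^M \times (1 - b^{-s})$ and the "small number" $b^m \times (1 - b^{-s})$, i.e. the largest value at the largest and at the smallest exponent; which values are being compared is an assumption.

```latex
\begin{align*}
\frac{\text{large gap}}{\text{large number}}
  &= \frac{b^{M-s}}{b^{M}\,(1 - b^{-s})}
   = \frac{b^{-s}}{1 - b^{-s}}
   = \frac{1}{b^{s} - 1} \\
\frac{\text{small gap}}{\text{small number}}
  &= \frac{b^{m-s}}{b^{m}\,(1 - b^{-s})}
   = \frac{b^{-s}}{1 - b^{-s}}
   = \frac{1}{b^{s} - 1}
\end{align*}
```

The exponent cancels in both cases, so the ratio depends only on b and s. For b = 8 and s = 4 it is 1/4095.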

Conversion Example

• Example: Convert $(9.375 \times 10^{-2})_{10}$ to 8 bit, base 2, 4 bit excess 7 exponent normalized scientific notation (using choice (B): 1 non-0 digit before the radix point)

• Start by expanding the -2 exponent in base 10: .09375.

• Next, convert from base 10 fixed point to base 2:

.09375 x 2 = 0.1875

.1875 x 2 = 0.375

.375 x 2 = 0.75

.75 x 2 = 1.5

.5 x 2 = 1.0

• Thus, $(.09375)_{10} = (.00011)_2$.

• Finally, convert to normalized base 2 Floating Point scientific notation:

$.00011 = .00011 \times 2^0 = 1.1 \times 2^{-4}$

0 0011 100

(sign bit 0; exponent -4 + 7 = 3 = 0011 in excess 7; fraction 100, with the leading 1 before the radix point not stored)

(“engineering” notation: exp. multiple of 3)

IEEE-754 Floating Point Formats

Notes:
● Single precision exponent: bias 127
● Double precision exponent: bias 1023

http://dx.doi.org/10.1109/IEEESTD.2008.4610935

IEEE-754 Conversion Example

• Represent $-12.625_{10}$ in single precision IEEE-754 format.

• Step #1: Convert to target base. $-12.625_{10} = -1100.101_2$

• Step #2: Normalize. $-1100.101_2 = -1.100101_2 \times 2^3$

• Step #3: Fill in bit fields.

Sign is negative, so sign bit is 1. Exponent is in excess 127 (not excess 128!), so exponent is represented as the unsigned integer 3 + 127 = 130. Leading 1 of significand is hidden, so final bit pattern is:

1 1000 0010 . 1001 0100 0000 0000 0000 000
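The bit pattern can be cross-checked with Python's standard `struct` module, which packs a float into its IEEE-754 single precision bytes. The helper name `float32_fields` is chosen for this sketch.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, fraction) bit strings of x as IEEE-754 single precision."""
    # Pack as big-endian float32, then reinterpret the 4 bytes as an unsigned integer.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(word, "032b")
    return bits[0], bits[1:9], bits[9:]

sign, exponent, fraction = float32_fields(-12.625)
print(sign, exponent, fraction)   # 1 10000010 10010100000000000000000
print(int(exponent, 2) - 127)     # 3
```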

IEEE-754 Floating Point Formats

[Figure: format table with annotations “$1^8$” and “$1^{11}$”]

Denormalized number (exponent 0), aka “subnormal” numbers (with less precision, i.e., fewer digits):
● Single precision: fraction x $2^{-126}$
● Double precision: fraction x $2^{-1022}$

IEEE-754 Examples

Notes:
● (g): de-normalized
● (g) vs. (i)

Beware of Limited Precision!

of elements of R (math) as single/double precision Floating Point (computer)

>>> A = 500.0
>>> B = A + 1e-15
>>> C = B - 500.0
>>> print(C)
0.0

>>> A = 500.0
>>> B = A - 500.0
>>> C = B + 1e-15
>>> print(C)
1e-15

Limited Precision (ctd.)

1/49 * 49 = 1/51 * 51 = 1 (in R)

>>> v1 = 1/49.0 * 49
>>> v2 = 1/51.0 * 51
>>> print('%.16f %.16f' % (v1, v2))
0.9999999999999999 1.0000000000000000

Remember also that some numbers have an exact decimal representation, but no exact binary representation (i.e., need an infinite number of bits)

Base 10: 0.2
Base 2: 0.001100110011...

Comparing two FP numbers

if number1 == number2: …

versus

if abs(number1 - number2) < epsilon: …

>>> from sys import float_info
>>> float_info
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

>>> epsilon = float_info[8]
>>> epsilon
2.220446049250313e-16
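A short sketch of the two comparison styles, reusing the 1/49 example from the previous slide. The tolerance 1e-9 is an arbitrary choice for illustration. `math.isclose` is part of the standard library (Python 3.5 and later) and compares with a relative tolerance by default.

```python
import math
import sys

v1 = 1 / 49.0 * 49

print(v1 == 1.0)               # False: exact comparison fails
print(abs(v1 - 1.0) < 1e-9)    # True: absolute tolerance
print(math.isclose(v1, 1.0))   # True: relative tolerance (default rel_tol=1e-09)

# A fixed absolute epsilon is not meaningful for large magnitudes:
big = 1e16
neighbour = big + 2.0          # the next representable double after 1e16
print(abs(neighbour - big) < sys.float_info.epsilon)  # False
print(math.isclose(neighbour, big))                   # True
```

`float_info.epsilon` is the gap between 1.0 and the next larger double, so it is a sensible absolute tolerance only for values near 1; a relative tolerance scales with the operands.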

Beyond BCD: Decimal Fixed Point and Floating Point arithmetic, including Arbitrary Precision

http://docs.python.org/3.3/library/decimal.html
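A brief sketch of what the `decimal` module offers, using only `Decimal` and `getcontext` from the standard library. The precision of 50 digits is an arbitrary value picked for the example.

```python
from decimal import Decimal, getcontext

print(0.1 + 0.2 == 0.3)                                    # False in binary floating point
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True in decimal arithmetic

getcontext().prec = 50            # arbitrary precision: 50 significant digits
seventh = Decimal(1) / Decimal(7)
print(seventh)
```

Note that the decimals are constructed from strings: `Decimal(0.1)` would first go through the binary float and inherit its representation error.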


1994 Pentium FDIV bug in the Intel P5 Pentium floating point unit (FPU)


Effect of Loss of Precision

According to the General Accounting Office of the U.S. Government, a loss of precision in converting 24-bit integers into 24-bit floating point numbers was responsible for the failure of a Patriot anti-missile battery.

Representation of text
Alphanumeric Data

3 standards for the representation of letters (alpha) and digits (numerical)

➢ ASCII – American Standard Code for Information Interchange

➢ EBCDIC – Extended Binary-Coded Decimal Interchange Code – IBM mainframes

➢ Unicode

Representation of text
Alphanumeric Data

ASCII = American Standard Code for Information Interchange.

• ASCII-code originally used only 7 bits to represent a character.

• $2^7 = 128$ unique characters.

• 8th bit unused (parity bit for error checking)

• ASCII evolved to use all 8 bits. Able to represent 256 unique characters.

ASCII Character Code

Columns: 3 most significant bits. Rows: 4 least significant bits.

       000   001   010   011   100   101   110   111
0000   NULL  DLE   SP    0     @     P     `     p
0001   SOH   DC1   !     1     A     Q     a     q
0010   STX   DC2   "     2     B     R     b     r
0011   ETX   DC3   #     3     C     S     c     s
0100   EOT   DC4   $     4     D     T     d     t
0101   ENQ   NAK   %     5     E     U     e     u
0110   ACK   SYN   &     6     F     V     f     v
0111   BEL   ETB   '     7     G     W     g     w
1000   BS    CAN   (     8     H     X     h     x
1001   HT    EM    )     9     I     Y     i     y
1010   LF    SUB   *     :     J     Z     j     z
1011   VT    ESC   +     ;     K     [     k     {
1100   FF    FS    ,     <     L     \     l     |
1101   CR    GS    -     =     M     ]     m     }
1110   SO    RS    .     >     N     ^     n     ~
1111   SI    US    /     ?     O     _     o     DEL

ASCII Character Code

• ASCII is a 7-bit code, commonly stored in 8-bit bytes.

• 'A' is at $41_{16}$. To convert upper case letters to lower case letters, add $20_{16}$. Thus 'a' is at $41_{16} + 20_{16} = 61_{16}$.

• The character '5' at position $35_{16}$ is different from the number 5! To convert character-numbers into number-numbers, subtract $30_{16}$: $35_{16} - 30_{16} = 5$.
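Both offsets can be tried with Python's built-in `ord` and `chr`. The function names below are made up for this sketch, and only plain ASCII input is considered.

```python
def to_lower(ch):
    """Upper case ASCII letter to lower case by adding 0x20; other characters unchanged."""
    return chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch

def digit_value(ch):
    """ASCII digit character to its numeric value by subtracting 0x30."""
    if not "0" <= ch <= "9":
        raise ValueError("not an ASCII digit")
    return ord(ch) - 0x30

print(hex(ord("A")), to_lower("A"), hex(ord("a")))  # 0x41 a 0x61
print(digit_value("5"))                              # 5
```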

ASCII Character Code

‘a’ = $1100001_2 = 141_8 = 61_{16} = 97_{10}$

ASCII Character Code

33 Control codes (columns 000 and 001 of the ASCII table, plus DEL)


CR & LF

Page 62: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

95 displayable codes

Columns 010 through 111 of the ASCII table, without DEL (codes 32 to 126: space through ~).

Page 63: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

Alphabetic codes

Upper case A to Z occupy 41₁₆ to 5A₁₆ (columns 100 and 101); lower case a to z occupy 61₁₆ to 7A₁₆ (columns 110 and 111).

Page 64: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

“Hello, world”

Char     Decimal  Hexadecimal  Binary
H        72       48           01001000
e        101      65           01100101
l        108      6C           01101100
l        108      6C           01101100
o        111      6F           01101111
,        44       2C           00101100
(space)  32       20           00100000
w        119      77           01110111
o        111      6F           01101111
r        114      72           01110010
l        108      6C           01101100
d        100      64           01100100

Note: 12 characters require 12 bytes. Each character requires 1 byte (8 bits).
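
The decimal, hexadecimal and binary views of an ASCII string can be reproduced with a few lines of Python. This is a sketch; the function name ascii_dump is hypothetical.

```python
def ascii_dump(text):
    """Return (character, decimal, hexadecimal, binary) for each ASCII character."""
    rows = []
    for byte in text.encode("ascii"):  # one byte per ASCII character
        rows.append((chr(byte), byte, format(byte, "02X"), format(byte, "08b")))
    return rows


for ch, dec, hexa, bits in ascii_dump("Hello, world"):
    print(repr(ch), dec, hexa, bits)
```

Each character yields exactly one row, so the 12-character string produces 12 bytes.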

Page 65: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

“Hello, world”

Intermezzo: representing character strings

● C approach

● Java approach
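
The slide names the two approaches without spelling them out. A minimal sketch, assuming that the C approach means a byte array closed by a zero byte, and that the Java approach means a string object that stores its length next to the character data. The 4-byte length prefix below is purely illustrative and is not Java's actual memory layout.

```python
import struct

text = "Hello, world"
payload = text.encode("ascii")

# Assumed C approach: the characters followed by a terminating zero byte.
c_style = payload + b"\x00"


def c_strlen(buffer):
    """Find the length by scanning for the terminator, as C's strlen() does."""
    return buffer.index(0)


# Assumed Java-like approach: the length is stored explicitly.
# Hypothetical layout: 4-byte big-endian length, then the characters.
length_prefixed = struct.pack(">I", len(payload)) + payload


def prefixed_length(buffer):
    """Read the stored length without scanning the characters."""
    return struct.unpack(">I", buffer[:4])[0]


print(len(c_style), c_strlen(c_style))                         # 13 12
print(len(length_prefixed), prefixed_length(length_prefixed))  # 16 12
```

With a terminator, finding the length costs a scan over the string and the string cannot contain a zero byte; with a stored length, the length is available immediately.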

Page 66: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

Numeric codes

The digits '0' to '9' occupy 30₁₆ to 39₁₆ (column 011, rows 0000 to 1001).

Page 67: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

“4+15” (char string, ASCII)

Char     Decimal  Hexadecimal  Binary
4        52       34           00110100
+        43       2B           00101011
1        49       31           00110001
5        53       35           00110101
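
The string “4+15” is four character codes, not a sum. A minimal sketch of turning the digit characters back into numbers, using the "subtract 30₁₆" rule; the helper name parse_unsigned is made up for this example.

```python
def parse_unsigned(digits):
    """Build a number from ASCII digit characters, most significant digit first."""
    value = 0
    for ch in digits:
        if not "0" <= ch <= "9":
            raise ValueError("not an ASCII digit: " + repr(ch))
        value = value * 10 + (ord(ch) - 0x30)
    return value


expression = "4+15"
left, right = expression.split("+")

print(list(expression.encode("ascii")))              # [52, 43, 49, 53]
print(parse_unsigned(left) + parse_unsigned(right))  # 19
```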

Page 68: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

Punctuation, etc.

The remaining displayable codes: space and ! to / (20₁₆ to 2F₁₆), : to @ (3A₁₆ to 40₁₆), [ to ` (5B₁₆ to 60₁₆) and { to ~ (7B₁₆ to 7E₁₆).

Page 69: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

• EBCDIC is an 8-bit code.

Page 70: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

import sys

def lower2Upper(character):
    # distance between an upper case letter and its lower case counterpart
    distanceLower2Upper = ord("A") - ord("a")
    if ord("a") <= ord(character) <= ord("z"):
        upperOrd = ord(character) + distanceLower2Upper
        return chr(upperOrd)
    else:
        # does not catch the "hole" in EBCDIC encoding
        print("lower2Upper only accepts lower case alphabet letters")
        sys.exit(1)

print(ord("a"))  # gives different results for ASCII and EBCDIC
print(chr(100))  # gives different results for ASCII and EBCDIC

print(lower2Upper("c"))
print(lower2Upper("C"))

Output:

97
d
C
lower2Upper only accepts lower case alphabet letters

# for all lower case characters c: invariant upper2Lower(lower2Upper(c)) == c
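
The “hole” mentioned in the comment can be made visible with Python's cp037 codec. This is a sketch that assumes code page 037 as a representative EBCDIC variant (the slides do not name one); ebcdic_ord is a hypothetical helper.

```python
import string


def ebcdic_ord(ch):
    """Code of ch in EBCDIC code page 037 (one of several EBCDIC variants)."""
    return ch.encode("cp037")[0]


print(hex(ebcdic_ord("a")), hex(ebcdic_ord("A")))  # 0x81 0xc1
print(ebcdic_ord("j") - ebcdic_ord("i"))           # 8, not 1: a hole after 'i'

# Count how many codes between 'a' and 'z' are really lower case letters.
first, last = ebcdic_ord("a"), ebcdic_ord("z")
letters = [code for code in range(first, last + 1)
           if bytes([code]).decode("cp037") in string.ascii_lowercase]
print(last - first + 1, len(letters))              # 41 26
```

A range test such as ord("a") <= ord(character) <= ord("z") therefore accepts 15 codes that are not lower case letters in this code page.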

Page 71: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

Extended (8-bit) ASCII

Page 72: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

Unicode Character Code

A character is the smallest possible component of a text (e.g., ‘A’, ‘B’, ‘È’ and ‘Í’) that has semantic value.

Even the extended (8 bit) version of ASCII is not enough for international use.

The Unicode standard (http://www.unicode.org/) describes how characters are represented by unique code points. A code point is an integer value, usually denoted in base 16. Values range from 0 through 0x10FFFF (1,114,111 decimal).

The notation U+12CA is used to denote the character with value 0x12ca (4,810 decimal).
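
A small Python sketch of the code point range and the U+ notation; the helper name u_notation is made up for this example.

```python
MAX_CODE_POINT = 0x10FFFF


def u_notation(ch):
    """Format the code point of a character in U+XXXX notation."""
    return "U+{:04X}".format(ord(ch))


print(MAX_CODE_POINT)      # 1114111
print(0x12CA)              # 4810
print(u_notation("a"))     # U+0061

# Values above the maximum are not code points; chr() rejects them.
try:
    chr(MAX_CODE_POINT + 1)
except ValueError as error:
    print("rejected:", error)
```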

The Unicode standard contains tables listing characters and their corresponding code points:

0061 'a'; LATIN SMALL LETTER A
0062 'b'; LATIN SMALL LETTER B
0063 'c'; LATIN SMALL LETTER C
...
007B '{'; LEFT CURLY BRACKET

Unicode was designed to be a superset of ASCII: the first 128 code points are identical to the 7-bit ASCII code, and the first 256 are identical to ISO-8859-1 (Latin-1), one of the extended (8-bit) ASCII codes.

http://www.unicode.org/charts/

Page 73: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

U+211D

in Python(3):

>>> print("\N{DOUBLE-STRUCK CAPITAL R}")
ℝ
>>> print("\u211D")
ℝ

>>> ord("\u211D")
8477
>>> chr(8477)
'ℝ'

Page 74: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

>>> ord('€')
8364

>>> hex(ord('€'))
'0x20ac'

>>> chr(8364)
'€'

>>> import unicodedata
>>> unicodedata.name('€')
'EURO SIGN'

>>> unicodedata.lookup('EURO SIGN')
'€'

>>> unicodedata.category('€')  # http://www.fileformat.info/info/unicode/category/index.htm
'Sc'  # [S]ymbol [c]urrency

Unicode code points

Page 77: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

A Unicode code point represents a character

Characters are defined by their meaning in a language; glyphs are defined by their appearance.

A text-to-speech reader should pronounce “a 339 Ω resistor” as “a three hundred and thirty-nine ohm resistor” and not as “a three hundred and thirty-nine uppercase omega resistor”.

The glyph Ω is represented by Unicode character
● U+03A9 when it represents the Greek letter omega
● U+2126 when it represents ohms, the unit of electrical resistance.

The glyph M is represented by Unicode character
● U+004D when it represents a Latin letter
● U+216F when it represents the Roman numeral for 1,000.
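
The distinction can be checked with the unicodedata module used earlier. This sketch prints the official name of each of the four code points; the glyph pairs look alike, but the characters compare as different.

```python
import unicodedata

for ch in ("\u03A9", "\u2126", "\u004D", "\u216F"):
    print("U+{:04X}".format(ord(ch)), ch, unicodedata.name(ch))

print("\u03A9" == "\u2126")  # False: same glyph, different characters
print("\u004D" == "\u216F")  # False
```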

Glyphs are handled by font renderers

Page 78: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

http://designrfix.com/fonts/arial-helvetica

anatomy of a glyph

Page 79: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

typeface vs. font

Back in the good old days of analog printing, every page was laboriously set out in frames with metal letters. That was rolled in ink, and then it was pressed down onto a clean piece of paper. That was a page layout. Printers needed thousands of physical metal blocks, each with the character it was meant to represent set out in relief (the typeface). If you wanted to print Garamond, for example, you needed different blocks for every different size (10 point, 12 point, 14 point, and so on) and weight (bold, light, medium).

A typeface (also known as font family) is a set of one or more fonts each composed of glyphs that share common design features. Each font of a typeface has a specific weight, style, condensation, width, slant, italicization, ornamentation, and designer or foundry (and formerly size, in metal fonts).

A font described a subset of blocks in a typeface—but each font embodied a particular size and weight. For example, bolded Garamond in 12 point was considered a different font than normal Garamond in 8 point, and italicized Times New Roman at 24 point would be considered a different font than italicized Times New Roman at 28 point.

http://whsdesignandphoto.weebly.com/typefaces.html

Page 80: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

A scalable font is a font that is created in the required point size when needed for display or printing. The dot patterns (bitmaps) are generated from a set of outline fonts, or base fonts, which contain a mathematical representation of the typeface. The two major scalable fonts are Adobe's Type 1 PostScript and Apple/Microsoft's TrueType.

A bitmapped font designed from scratch for a particular font size always looks the best. Scalable fonts, however, eliminate the need to store hundreds of different sizes of fonts on disk. In most cases, only the trained eye can tell the difference. Scaling does not always retain all properties.

http://www.pcmag.com/encyclopedia/term/50836/scalable-font

scalable vs. bitmap font

Page 81: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

character vs. glyph ligatures

A ligature glyph is the joining together of two or more glyphs into one continuous glyph. The ligature for aesthetically combining fi is one glyph, but two characters.

A ligature character (Unicode standard): "The existing ligatures exist basically for compatibility and round-tripping with non-Unicode character sets. Their use is discouraged."
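
As an illustration, Unicode does contain a ligature character for fi, U+FB01. The sketch below shows that it is a single character and that compatibility normalization (NFKD, a mechanism the slides do not discuss) maps it back to the two characters f and i.

```python
import unicodedata

ligature = "\uFB01"

print(unicodedata.name(ligature))  # LATIN SMALL LIGATURE FI
print(len(ligature))               # 1: one character

decomposed = unicodedata.normalize("NFKD", ligature)
print(decomposed, len(decomposed))  # fi 2: two characters
```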

alif lām

Page 83: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

Unicode string encodings

A Unicode string is a sequence of code points (each representing a character).

This sequence needs to be represented as a sequence of bytes (unsigned integer values from 0 through 255) in memory. The rules for translating a Unicode string into a sequence of bytes are called an encoding.

Encodings don’t have to handle every possible Unicode character, and most encodings don’t.

ASCII encoding:

If a code point is < 128, each byte is the same as the value of the code point. If a code point is >= 128, the Unicode string cannot be represented in this encoding.

Latin-1, also known as ISO-8859-1 encoding:

Unicode code points 0–255 are identical to the Latin-1 values, so converting to this encoding simply requires converting code points to byte values; if a code point larger than 255 is encountered, the string can’t be encoded into Latin-1.

>>> ord('a'.encode('ASCII'))
97

>>> '€'.encode('ASCII')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 0: ordinal not in range(128)

>>> ord('a'.encode('Latin-1'))
97

>>> '€'.encode('Latin-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 0: ordinal not in range(256)
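
The rule behind both encodings fits in a few lines. The sketch below is a hand-written illustration, not how Python's codecs are implemented; encode_single_byte is a hypothetical helper whose limit is 128 for ASCII and 256 for Latin-1.

```python
def encode_single_byte(text, limit, name):
    """Encode each code point below `limit` as one byte with the same value."""
    out = bytearray()
    for position, ch in enumerate(text):
        code_point = ord(ch)
        if code_point >= limit:
            raise ValueError("{} cannot encode U+{:04X} at position {}".format(
                name, code_point, position))
        out.append(code_point)
    return bytes(out)


print(encode_single_byte("abc", 128, "ASCII"))  # b'abc'
print(encode_single_byte("È", 256, "Latin-1"))  # b'\xc8'

try:
    encode_single_byte("€", 256, "Latin-1")
except ValueError as error:
    print(error)  # Latin-1 cannot encode U+20AC at position 0
```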

Page 84: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that (one to four) 8-bit numbers are used in the encoding (i.e., a “variable length encoding”).

UTF-8 has several convenient properties:

● It can handle any Unicode code point.

● A Unicode string is turned into a sequence of bytes that contains no embedded zero bytes, except where they encode the null character U+0000 itself. Hence, UTF-8 strings can be processed by C functions such as strcpy() and sent through (e.g., network) protocols that can’t handle zero bytes.

● A string of ASCII text is also valid UTF-8 text.

● UTF-8 is fairly compact: most commonly used characters can be represented with one or two bytes.

● If bytes are corrupted or lost, it’s possible to determine the start of the next UTF-8-encoded code point and resynchronize. It’s also unlikely that random 8-bit data will look like valid UTF-8.
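
The properties above follow from the UTF-8 bit layout: a lead byte announces the length, and every following byte has the form 10xxxxxx. A minimal hand-written sketch of that layout follows; utf8_encode and resync are illustrative names, and in practice str.encode('UTF-8') does the job.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as 1 to 4 UTF-8 bytes."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if code_point < 0x80:            # 0xxxxxxx (same as ASCII)
        return bytes([code_point])
    if code_point < 0x800:           # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # 11110xxx + 3 continuation bytes
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def resync(data):
    """Skip leading continuation bytes (10xxxxxx) to find the next character start."""
    start = 0
    while start < len(data) and (data[start] & 0xC0) == 0x80:
        start += 1
    return data[start:]


print(utf8_encode(ord("a")))   # b'a'
print(utf8_encode(0x20AC))     # b'\xe2\x82\xac'

damaged = "a€b".encode("UTF-8")[2:]      # cut in the middle of '€'
print(resync(damaged).decode("UTF-8"))   # b
```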

Unicode string encodings

Page 85: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

>>> ord('a'.encode('UTF-8'))
97

>>> '€'.encode('UTF-8')
b'\xe2\x82\xac'

>>> '€'.encode('UTF-16')  # byte order mark ff fe, then ac 20 (byte 0x20 prints as a space)
b'\xff\xfe\xac '

>>> '€'.encode('UTF-32')  # byte order mark ff fe 00 00, then ac 20 00 00
b'\xff\xfe\x00\x00\xac \x00\x00'

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'

>>> b'\xff\xfe\xac '.decode('UTF-16')
'€'

>>> b'\xff\xfe\x00\x00\xac \x00\x00'.decode('UTF-32')
'€'
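
The UTF-16 and UTF-32 results above start with a byte order mark, and the byte order shown is the little-endian one. A short sketch comparing the explicit little-endian and big-endian codec variants, which write no byte order mark; encoded_hex is a hypothetical helper.

```python
import codecs


def encoded_hex(text, encoding):
    """Return the encoded bytes as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")


for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    print(name, len("€".encode(name)), "bytes:", encoded_hex("€", name))

print(codecs.BOM_UTF16_LE.hex(" "))  # ff fe
```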

Unicode string (en/de)coding

Page 86: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

In-browser UTF-8 test: http://www.fileformat.info/info/unicode/utf8test.htm
UTF-8 format description: http://www.fileformat.info/info/unicode/utf8.htm

Page 87: Data Representation - McGill Universitymsdl.cs.mcgill.ca/.../lectures/presentation.dataRepresentation.pdf · Analog vs. Digital data/signals Storage and processing units for digital

The big picture