ITNP023 - Autumn 2016

ITNP23: Foundations of Information Technology

Data representation

Prof Leslie Smith

aims

§  understand the basic mechanism for representing information

§  understand the binary number system and binary arithmetic

§  understand the representation of numerical (integers and real numbers) and textual (characters and strings) data in memory

machine data

§  All data handled by modern computers are held in bits: building-blocks that are either
   •  1 (“on”, “high”, “true”, “present”, “yes”, ...), or
   •  0 (“off”, “low”, “false”, “absent”, “no”, ...)

§  This is true of all data: numbers, text, pictures, sounds, programs …

§  The physical representation is usually electrical
   •  Maybe 0 V for 0 and 3 V for 1 (as in conventional RAM)
   •  but this is not essential: e.g. a CD-ROM/DVD is optical

§  Binary representation seems initially unpromising (can we really store, say, music in on/off form?); however, it has conquered the world: e.g. an audio CD and digital TV are binary.

machine data: bytes

§  We usually consider bits in groups: bytes and words

§  Earlier machines had a number of different character and word lengths, but now we invariably think of a byte as being eight bits and a word as two, four or eight bytes (16, 32 or 64 bits)
   •  A half-byte (four bits) is sometimes called a nybble.

§  How numbers are stored: think of an odometer having only 0 and 1 on each wheel:

   0   0000
   1   0001
   2   0010
   3   0011
   4   0100
   5   0101

understanding decimal numbers

§  Think first about decimal (“base 10”) numbers. Consider the decimal number 543. We can view this as a sum:

        3 = 3 x 1
   +   40 = 4 x 10
   +  500 = 5 x 100

§  But 1 = 10^0 (x^0 is always 1), 10 = 10^1, 100 = 10^2, etc.

§  So we could write the sum as

        3 = 3 x 10^0
   +   40 = 4 x 10^1
   +  500 = 5 x 10^2

understanding binary numbers

§  We can view numbers in any base in this way. Consider the binary (“base 2”) number 1101:

         1 = 1 x 2^0 = 1 x 1
   +    00 = 0 x 2^1 = 0 x 2
   +   100 = 1 x 2^2 = 1 x 4
   +  1000 = 1 x 2^3 = 1 x 8

   So 1101 binary is (8 + 4 + 1) decimal, i.e. 13

§  A useful trick is to write the positions above the number: consider the binary number 10010100

   7 6 5 4 3 2 1 0
   1 0 0 1 0 1 0 0

   So this binary number is 2^7 + 2^4 + 2^2 = 128 + 16 + 4 = 148 decimal

powers of 2, for reference (and memorisation?)

    n   2^n
    0   1
    1   2
    2   4
    3   8
    4   16
    5   32
    6   64
    7   128
    8   256
    9   512
   10   1024

In computing, 1K is 2^10 = 1,024 (not 1,000)
and 1M is 2^20 = 1,048,576 (not 1,000,000)
1G is 2^30 = 1,073,741,824, about 1,074 million (not 1,000 million)

binary (continued)

§  Binary numbers are represented as a series of 1’s and 0’s

§  Depending on the storage available, we can store 4-bit, 8-bit, 16-bit, 32-bit binary numbers, etc.

§  “X-bit” refers to the total number of 1’s and 0’s we can have. This fixes the number of different patterns we can make (X bits give 2^X patterns), where a pattern might represent a number, a character, or a coloured pixel on a screen.

§  E.g. 8-bit: eight 1’s and 0’s
   •  11110000
   •  10101010
   •  01010111
   •  etc.
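
The link between bit width and number of patterns can be checked with a short sketch. Python is used here purely for illustration (the slides do not prescribe a language), and the function name `all_patterns` is made up for this example.

```python
from itertools import product


def all_patterns(n_bits):
    """Return every pattern of n_bits binary digits, as strings, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


# Each extra bit doubles the number of patterns: n bits give 2**n of them.
for n in (4, 8, 16):
    print(n, "bits ->", 2 ** n, "patterns")

# Small enough to list in full: 00, 01, 10, 11
print(all_patterns(2))
```

Listing the patterns explicitly is only practical for small widths; for larger widths the count 2**n is what matters.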

binary (continued)

§  Usually we think of numbers in their base 10 representation
   •  456_10 = (4 * 100’s) + (5 * 10’s) + (6 * 1’s)

§  In binary, we think of numbers in their base 2 representation
   •  110_2 = (1 * 4) + (1 * 2) + (0 * 1)

§  In base 10, the largest digit we can have in any position (i.e. in the 100’s column, the 10’s column, the 1’s column etc.) is 9 (one less than the base)

§  In base 2, the largest digit we can have in any position (i.e. in the 1’s column, the 2’s column, the 4’s column etc.) is 1 (one less than the base)

the columns in base 10

In base 10, the columns under which we put the numbers are:

   …  1000   100    10     1
   …     2     3     5     7   = 2357_10

i.e.

   …  10^3  10^2  10^1  10^0   (as we saw before)
   …     2     3     5     7

the columns in base 2

In base 2, the columns under which we put the numbers are:

   …     8     4     2     1
   …     1     1     0     1   = 1101_2

i.e.

   …   2^3   2^2   2^1   2^0
   …     1     1     0     1

binary

111001 = (1 * 2^5) + (1 * 2^4) + (1 * 2^3) + (0 * 2^2) + (0 * 2^1) + (1 * 2^0)
       = 32 + 16 + 8 + 0 + 0 + 1
       = 57

Try:

   000111
   110101

converting decimal to another base

§  To convert decimal to another base:
   •  Repeatedly divide the number by the base, noting the remainder each time, until the quotient is zero
   •  Then write down the remainders in reverse order

§  For example, convert 13 to binary:

   13 / 2 -> 6 remainder 1
    6 / 2 -> 3 remainder 0
    3 / 2 -> 1 remainder 1
    1 / 2 -> 0 remainder 1

§  So 13 decimal is 1101 binary
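
The repeated-division method can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `to_base` is an assumption of this example, and it handles non-negative integers and bases from 2 to 10.

```python
def to_base(number, base):
    """Convert a non-negative integer to a digit string in `base` (2 to 10)."""
    if not 2 <= base <= 10:
        raise ValueError("this sketch only handles bases 2 to 10")
    if number < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        # divmod gives the quotient and the remainder in one step
        number, remainder = divmod(number, base)
        remainders.append(str(remainder))
    # The remainders come out least-significant digit first, so reverse them.
    return "".join(reversed(remainders))


print(to_base(13, 2))
```

Calling `to_base(13, 2)` follows exactly the four divisions in the worked example, collecting the remainders 1, 0, 1, 1 and reading them back in reverse.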

converting decimal to binary

§  Steps

§  1. Take the decimal number and find the highest power of two (2^x) that is not larger than it.

§  2. Place a 1 in this column and calculate the remainder (the number minus 2^x).

§  3. Repeat step 1 with the remainder until you are left with zero. Any column that was skipped gets a 0.

Converting decimal to binary (the “proof”)

Convert 14_10 to an 8-bit binary number

 128  64  32  16   8   4   2   1
                   1                 (remainder 6)
                       1             (remainder 2)
                           1         (remainder 0)
   0   0   0   0   1   1   1   0     = 00001110_2
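
The column-filling method can also be written as a small Python sketch. The function name `to_binary_by_columns` and the default width of 8 are assumptions made for this example; the idea is simply to walk the columns from the largest power of two downwards.

```python
def to_binary_by_columns(number, width=8):
    """Fill `width` binary columns, left to right, by subtracting powers of two."""
    if not 0 <= number < 2 ** width:
        raise ValueError("number does not fit in the given width")
    bits = []
    for position in range(width - 1, -1, -1):
        column_value = 2 ** position      # 128, 64, ..., 2, 1 when width is 8
        if column_value <= number:
            bits.append("1")
            number -= column_value        # keep working with the remainder
        else:
            bits.append("0")              # skipped columns get a 0
    return "".join(bits)


print(to_binary_by_columns(14))
```

For 14 the columns 128, 64, 32 and 16 are all too large, so they receive 0; the 8, 4 and 2 columns receive 1 with remainders 6, 2 and 0, matching the table.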

converting binary (base 2) to decimal (base 10)

§  Steps:

§  1. Given a binary number, position the 1’s and 0’s under their corresponding columns.

§  2. Add together the values of all columns which have a 1 in them.

converting binary to decimal (example)

§  Convert 11010101 to its base 10 equivalent

§  1.  128  64  32  16   8   4   2   1
         1   1   0   1   0   1   0   1

§  2. 128 + 64 + 16 + 4 + 1 = 213

§  3. 213_10 (which is equivalent to 11010101_2)
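
A minimal Python sketch of the same two steps follows. The function name `binary_to_decimal` is made up for this example; Python's built-in `int(text, 2)` does the same job and is used here only as a cross-check.

```python
def binary_to_decimal(bits):
    """Add up the column values (1, 2, 4, 8, ...) wherever the bit is 1."""
    total = 0
    column_value = 1                      # value of the right-most column
    for bit in reversed(bits):
        if bit == "1":
            total += column_value
        elif bit != "0":
            raise ValueError("a binary number may only contain 0 and 1")
        column_value *= 2                 # each column to the left is worth double
    return total


print(binary_to_decimal("11010101"))
print(binary_to_decimal("11010101") == int("11010101", 2))
```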

binary to decimal conversion

   binary   decimal      binary   decimal
   0        0            10000    16
   1        1            10001    17
   10       2            10010    18
   11       3            10011    19
   100      4            10100    20
   101      5            10101    21
   …and so on

adding numbers in binary

§  Addition just follows the familiar rules

       0010 1010     42
   +   0001 1011     27
       ---------
       0100 0101     69

§  But what happens if the result is too large? E.g. suppose that we are holding numbers in one byte each, and we add (e.g.)

       1000 0000    128
   +   1000 0000    128
       ---------
     1 0000 0000    256

   •  the result is too big to hold in one byte.
   •  This is called integer overflow (see later).

adding 2 binary numbers

The rules:

§  0 + 0 (+ 0 + 0 + ….. + 0) = 0
§  1 + 0 (+ 0 + 0 + ….. + 0) = 1
§  0 + 1 (+ 0 + 0 + ….. + 0) = 1
§  1 + 1 (+ 0 + 0 + ….. + 0) = 0 (carry 1)
§  1 + 1 + 1 (+ 0 + …. + 0) = 1 (carry 1)
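
These rules amount to adding one column at a time, right to left, and passing the carry along. A minimal Python sketch, assuming the numbers are given as strings of 0s and 1s (the function name `add_binary` is made up for this example):

```python
def add_binary(a, b):
    """Add two binary strings column by column, right to left, with carries."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)   # pad the shorter number with 0s
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_sum = int(bit_a) + int(bit_b) + carry   # 0, 1, 2 or 3
        result.append(str(column_sum % 2))             # digit written in this column
        carry = column_sum // 2                        # 1 if the column sum was 2 or 3
    if carry:
        result.append("1")                             # final carry needs an extra column
    return "".join(reversed(result))


print(add_binary("00101010", "00011011"))
```

A column sum of 2 writes 0 and carries 1; a column sum of 3 writes 1 and carries 1, which are the last two rules in the list.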

adding 2 binary numbers: some examples

     0000       0101       0001       0011
   + 0001     + 0000     + 0001     + 0011
     ____       ____       ____       ____

integer overflow

§  This is an attempt to represent an integer that exceeds the maximum allowable value

§  Surprisingly, the machine does not treat integer overflow as an error, and just stores the truncated result.

§  But the machine sets the carry flag (a particular bit on one of the registers) so that the programmer can test for integer overflow if they wish.
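
The behaviour described here can be imitated in Python, whose own integers never overflow. The sketch below assumes unsigned one-byte storage and models the carry flag as a plain boolean; the function name `add_bytes` is made up for this example and does not correspond to any real machine instruction.

```python
def add_bytes(a, b):
    """Add two unsigned 8-bit values; return (stored result, carry flag)."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("both operands must fit in one byte")
    total = a + b
    stored = total & 0xFF      # keep only the low 8 bits: the truncated result
    carry = total > 0xFF       # True when a ninth bit would have been needed
    return stored, carry


print(add_bytes(42, 27))
print(add_bytes(128, 128))
```

For 128 + 128 the stored byte is 0 and the carry is set, which is the truncation a program has to test for explicitly.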

operations on bits and bytes

§  We have seen how to do binary addition. (This is how the machine instruction ADD operates.)

§  Multiplication can be done by repeated addition. Subtraction is done by adding the negative of the number to be subtracted. All of these can be implemented in terms of other, more basic instructions. We shall not look at the details of how this is done.

§  Next lecture we shall briefly look at some other basic instructions that the CPU can perform.

§  … but now we will look at character representation

data representation

§  We have seen how to store unsigned integers (cardinals)

§  But what about
   •  characters and text
   •  signed whole numbers (integers)
   •  numbers with fractional parts (reals like 1/2, 1/9, √2 or π)
   •  programs
   •  sounds and pictures?

§  We shall not cover all these, just characters

§  Characters used to be straightforward (not quite so true now)

§  The system assigns a different number (pattern) to each character (and when it prints them, remembers to print the character, not the number)

storing characters: as it used to be

§  How many characters are there?
   •  A .. Z, a .. z, 0 .. 9 gives us 62
   •  … and then there are some punctuation characters

§  So we need somewhere between 64 and 128 different codes

§  So we can hold them all in seven bits (2^7 = 128 patterns)

§  There are a few main systems: the American Standard Code for Information Interchange (ASCII), Extended Binary Coded Decimal Interchange Code (EBCDIC) and Unicode. We will look at ASCII ...

collating order

§  ASCII is a 7-bit code. Extended ASCII is an 8-bit code that includes strange characters like the corners of boxes. This tends to be the version of ASCII in use these days.

§  ASCII is an international standard for representing textual information in the majority of computers.

§  ASCII has the following desirable properties:
   •  ‘A’ before ‘B’ before ... ‘Z’
   •  ‘a’ before ‘b’ before ... ‘z’
   •  ‘0’ before ‘1’ before ... ‘9’ (notice that the character ‘0’ is not the same thing as the number zero!)

§  Example
   •  ‘A’ is 65, 0100 0001 binary
   •  ‘a’ is 97, 0110 0001 binary
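
These codes are easy to inspect in Python, where `ord` gives a character's code and `chr` goes the other way. The helper name `ascii_bits` is made up for this sketch.

```python
def ascii_bits(ch):
    """Return the 8-bit binary pattern for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "08b")


for ch in "Aa0":
    print(repr(ch), ord(ch), ascii_bits(ch))

# Collating order follows the codes, so comparing characters compares numbers.
print("A" < "B" < "Z", "a" < "b" < "z", "0" < "1" < "9")

# The character '0' has code 48; it is not the number zero.
print(ord("0"), chr(48))
```

One consequence of the ordering is that every upper-case letter sorts before every lower-case letter, since 'Z' is 90 and 'a' is 97.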

Extended ASCII

        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
    0   (control characters)
   16   (control characters)
   32   sp  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
   48   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
   64   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
   80   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
   96   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o
  112   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL
  128   Ç   ü   é   â   ä   à   å   ç   ê   ë   è   ï   î   ì   Ä   Å
  144   É   æ   Æ   ô   ö   ò   û   ù   ÿ   Ö   Ü   ø   £   Ø   ×   ƒ
  160   á   í   ó   ú   ñ   Ñ   ª   º   ¿   ®   ¬   ½   ¼   ¡   «   »
  176   _   _   _   ¦   ¦   Á   Â   À   ©   ¦   ¦   +   +   ¢   ¥   +
  192   +   -   -   +   -   +   ã   Ã   +   +   -   -   ¦   -   +   ¤
  208   ð   Ð   Ê   Ë   È   i   Í   Î   Ï   +   +   _   _   ¦   Ì   _
  224   Ó   ß   Ô   Ò   õ   Õ   µ   þ   Þ   Ú   Û   Ù   ý   Ý   ¯   ´
  240   -   ±   _   ¾   ¶   §   ÷   ¸   °   ¨   ·   ¹   ³   ²   _

characters … concluded

§  The ASCII characters in the range 0 .. 31 are “non-printing”: they are “control characters”

§  If you hold down the Ctrl key and a letter key together, you get a character whose code is the (upper-case) letter’s code minus 64
   •  ‘A’ is 65, ^A is 1
   •  ‘Z’ is 90, ^Z is 26

§  These characters have names like CR, LF, TAB, EOF, EOT, BEL
   •  CR is 13, ^M
   •  LF is 10, ^J
   •  TAB is 9, ^I
   •  BEL is 7, ^G
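
The "minus 64" rule can be checked against the escape sequences that Python already provides for these control characters. The function name `control_code` is made up for this sketch.

```python
def control_code(letter):
    """Code produced by Ctrl + letter: the upper-case letter's ASCII code minus 64."""
    code = ord(letter.upper())
    if not ord("A") <= code <= ord("Z"):
        raise ValueError("expected a single letter A to Z")
    return code - 64


for name, letter, escape in (("CR", "M", "\r"), ("LF", "J", "\n"),
                             ("TAB", "I", "\t"), ("BEL", "G", "\a")):
    # The last column confirms that the escape sequence has the same code.
    print(name, "^" + letter, control_code(letter), ord(escape))
```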

strings of characters

§  Text is just characters that are stored in adjacent locations

§  Suppose we have a 32-bit (4-byte) value in memory, with binary representation

   01000110 01110010 01100101 01100100

§  This could be a single integer value (a big one!), or it could be a series of characters
   •  01000110 is 70, the code for ‘F’
   •  01110010 is 114, the code for ‘r’
   •  01100101 is 101, the code for ‘e’
   •  01100100 is 100, the code for ‘d’

§  Strings are usually either counted (stored together with their length) or null-terminated (ended by a character with code 0)
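
The same four bytes can be read both ways in Python. This is a sketch under two assumptions that the slide does not state: the integer is read with the first byte as the most significant (big-endian), and the counted form uses a single length byte in front of the text.

```python
raw = bytes([0b01000110, 0b01110010, 0b01100101, 0b01100100])

# Interpretation 1: one unsigned 32-bit integer (big-endian byte order assumed).
as_integer = int.from_bytes(raw, byteorder="big")

# Interpretation 2: four ASCII characters stored in adjacent locations.
as_text = raw.decode("ascii")

print(as_integer, as_text)

# Two common ways of marking where a string ends:
counted = bytes([len(raw)]) + raw      # length byte first, then the characters
null_terminated = raw + b"\x00"        # characters, then a byte with code 0

print(list(counted))
print(list(null_terminated))
```

Nothing in the bytes themselves says which interpretation is intended; the program reading them decides.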

What’s wrong with this?

§  How about symbols in other European languages?
   •  Ä, á, à, â, æ, ã, å, ā for example

§  Or symbols from other languages altogether?
   •  Πβγδ
   •  Or Arabic or Hebrew, or Cyrillic…
   •  Or pictographic languages like Chinese?

§  The solution is to produce a character coding which can cope with this
   •  Which ASCII cannot!
   •  Unicode.

Unicode

§  As we have seen, there are many more symbols than just those used in English. Many European languages have additional marks (e.g. őçäéè etc.). And many languages use a different script (Greek, Russian, Arabic, Korean, Japanese, Hebrew). In addition, many languages use pictograms (e.g. Chinese)

§  Unicode is an international standard designed to represent all of the characters used in the world’s major languages:
   •  see http://www.unicode.org

   “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” (from http://www.unicode.org)

§  More or less universally adopted
   •  See http://www.unicode.org/consortium/members.html

§  A major task: look at Unicode version 8, for example.
   •  http://www.unicode.org/versions/Unicode8.0.0/

Unicode continued

§  It was originally a 16-bit code (so 2^16 = 65,536 symbols possible)
   •  Is this enough?

§  Now exists in three forms: UTF-8, UTF-16, UTF-32
   •  In UTF-32, each code point is embedded in a 32-bit word.
      §  Unicode code space is 0 to 10FFFF_16
      §  Simplest, but not space efficient
   •  In UTF-16, code points in the basic multilingual plane (codes from 0 to FFFF_16) are represented in 16 bits
      §  Supplementary characters use two 16-bit elements
   •  In UTF-8 a variable-width coding is used that maintains transparency with ASCII

§  e.g. the code points
   •  05D0_16   20AC_16

§  Java uses Unicode to represent characters; C does not. See http://www.unicode.org/
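
The three encoding forms can be compared directly in Python, whose strings are sequences of Unicode code points. This sketch uses the two code points from the example (U+05D0 is the Hebrew letter alef and U+20AC is the euro sign), plus 'A' and one supplementary code point chosen for illustration. The function name `encodings` is made up, and the big-endian codec variants are picked only so that no byte-order mark is added.

```python
def encodings(ch):
    """Encode one character in the three Unicode encoding forms."""
    return {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}


for ch in ("A", "\u05d0", "\u20ac", "\U00010000"):
    print(f"U+{ord(ch):04X}")
    for name, data in encodings(ch).items():
        print(f"   {name:10} {data.hex()}  ({len(data)} bytes)")
```

'A' takes a single byte in UTF-8, identical to its ASCII code, which is what transparency with ASCII means. U+10000 lies outside the basic multilingual plane, so UTF-16 needs two 16-bit elements for it.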

programs are data too

§  Programs are simply another kind of data

§  Source code (which cannot be executed directly) is usually just a set of text files.

§  Object code (also called executables or binaries) is a sequence of machine code instructions.

§  Programs are represented as bits and stored in main memory just like other forms of data.

§  Just like other forms of data, programs can be processed or modified by other programs.

§  In fact, it is possible to write programs which modify themselves!

end of lecture
