ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Upload: thomasine-poole

Post on 23-Dec-2015


Page 1: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

ITEC 1011 Introduction to Information Technologies

2. Data Formats

Chapt. 3

Page 2: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Introduction

• Examples

pp. 59-61

[Diagram: real-world data → input device → computer data]

– “Dear Mom:” → keyboard → 10110010…
– (picture) → digital camera → 10110010…

Page 3: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Format must be appropriate

• The internal representation must be appropriate for the type of processing to take place (e.g., text, images, sound)

Page 4: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Rules/Conventions

• Proprietary formats
– Unique to a product or company
– E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes

• Standards
– Evolve two ways:
  • Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
  • A committee is struck to solve a problem (e.g., the Moving Picture Experts Group, MPEG)

pp. 61-62

Page 5: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Standards Organizations

• ISO – International Organization for Standardization

• CSA – Canadian Standards Association

• ANSI – American National Standards Institute

• IEEE – Institute of Electrical and Electronics Engineers

• Etc.

Page 6: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Examples of Standards

Type of data             Standards
Alphanumeric             ASCII, EBCDIC, Unicode
Image                    JPEG, GIF, PCX, TIFF
Motion picture           MPEG-2, QuickTime
Sound                    Sound Blaster, WAV, AU
Outline graphics/fonts   PostScript, TrueType, PDF

Page 7: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Why Standards?

• Standards are “arbitrary”

• They exist because they are
– Convenient
– Efficient
– Flexible
– Appropriate
– Etc.

Page 8: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Alphanumeric Data

• Problem: Distinguishing between the number 123 (one hundred and twenty-three) and the characters “123” (one, two, three)

• Four standards for representing letters (alpha) and numbers
– BCD – Binary-coded decimal
– ASCII – American standard code for information interchange
– EBCDIC – Extended binary-coded decimal interchange code
– Unicode

pp. 63-69
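
The distinction between the number 123 and the characters “123” can be made concrete with a minimal Python sketch. The variable names are illustrative and do not come from the slides. As a binary integer the number fits in one byte, while the text needs one ASCII code per character.

```python
# The number 123 and the characters "123" are different data.
number = 123
text = "123"

# As a binary integer, 123 fits in a single byte.
number_bits = format(number, "08b")

# As ASCII text, "123" takes three bytes, one per character.
text_bits = [format(code, "08b") for code in text.encode("ascii")]

print(number_bits)  # 01111011
print(text_bits)    # ['00110001', '00110010', '00110011']
```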

Page 9: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Next 2 slides

Standard Alphanumeric Formats

• BCD

• ASCII

• EBCDIC

• Unicode

Page 10: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Binary-Coded Decimal (BCD)

• Four bits per digit

Digit   Bit pattern
0       0000
1       0001
2       0010
3       0011
4       0100
5       0101
6       0110
7       0111
8       1000
9       1001

Note: the following bit patterns are not used: 1010, 1011, 1100, 1101, 1110, 1111

Page 11: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Example

• 7093₁₀ = ? (in BCD)

   7    0    9    3
0111 0000 1001 0011
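
The digit-by-digit conversion in this example can be sketched in Python as follows. The function name `to_bcd` is hypothetical, and the sketch assumes non-negative integers only.

```python
def to_bcd(n: int) -> str:
    """Return the BCD bit pattern of n, four bits per decimal digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Convert each decimal digit independently to a 4-bit pattern.
    return " ".join(format(int(digit), "04b") for digit in str(n))


print(to_bcd(7093))  # 0111 0000 1001 0011
```

Because each digit is encoded on its own, the six patterns 1010 through 1111 never appear in the output.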

Page 12: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Next 22 slides

Standard Alphanumeric Formats

• BCD

• ASCII

• EBCDIC

• Unicode

Page 13: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

The Problem

• Representing text strings, such as “Hello, world”, in a computer

Page 14: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Codes and Characters

• Each character is coded as a byte

• Most common coding system is ASCII (Pronounced ass-key)

• ASCII = American Standard Code for Information Interchange

• Defined in ANSI document X3.4-1977

Page 15: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

ASCII Features

• 7-bit code

• 8th bit is unused (or used for a parity bit)

• 2⁷ = 128 codes

• Two general types of codes:

– 95 are “Graphic” codes (displayable on a console)

– 33 are “Control” codes (control features of the console or communications channel)
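
A short Python sketch that counts the two kinds of codes and shows one way the 8th bit can carry a parity bit. The slides do not say which parity convention is meant, so even parity is an assumption here, and the helper name `with_even_parity` is hypothetical.

```python
# Graphic codes run from space (20 hex) through tilde (7E hex);
# every other 7-bit code is a control code.
graphic = [code for code in range(128) if 0x20 <= code <= 0x7E]
control = [code for code in range(128) if code not in graphic]

print(len(graphic), len(control))  # 95 33


def with_even_parity(code: int) -> int:
    """Set the 8th bit so that the total number of 1 bits is even."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit code")
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


# 'a' = 1100001 has three 1 bits, so the parity bit is set.
print(format(with_even_parity(ord("a")), "08b"))  # 11100001
```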

Page 16: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

ASCII Chart

      000   001   010   011   100   101   110   111
0000  NULL  DLE   SP    0     @     P     `     p
0001  SOH   DC1   !     1     A     Q     a     q
0010  STX   DC2   "     2     B     R     b     r
0011  ETX   DC3   #     3     C     S     c     s
0100  EOT   DC4   $     4     D     T     d     t
0101  ENQ   NAK   %     5     E     U     e     u
0110  ACK   SYN   &     6     F     V     f     v
0111  BEL   ETB   '     7     G     W     g     w
1000  BS    CAN   (     8     H     X     h     x
1001  HT    EM    )     9     I     Y     i     y
1010  LF    SUB   *     :     J     Z     j     z
1011  VT    ESC   +     ;     K     [     k     {
1100  FF    FS    ,     <     L     \     l     |
1101  CR    GS    -     =     M     ]     m     }
1110  SO    RS    .     >     N     ^     n     ~
1111  SI    US    /     ?     O     _     o     DEL

(SP = space)

Page 18: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

In the ASCII chart, the column heading gives the three most significant bits and the row heading gives the four least significant bits.

Page 19: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

e.g., ‘a’ = 1100001 (column 110, row 0001 of the ASCII chart)
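
Reading a character's position in the chart amounts to splitting its 7-bit code into the three most significant and four least significant bits. A minimal Python sketch, with the hypothetical helper name `chart_position`:

```python
def chart_position(ch: str) -> tuple[str, str]:
    """Return (column, row) of an ASCII character in the 8-column chart."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    bits = format(code, "07b")
    # First three bits select the column, last four select the row.
    return bits[:3], bits[3:]


print(chart_position("a"))  # ('110', '0001')
```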

Page 20: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

95 Graphic codes: space (0100000) through ~ (1111110), i.e., columns 010 to 111 of the ASCII chart except DEL

Page 21: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

33 Control codes: columns 000 and 001 of the ASCII chart (NULL through US), plus DEL (1111111)

Page 22: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Alphabetic codes: A to Z = 1000001 to 1011010; a to z = 1100001 to 1111010

Page 23: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Numeric codes: 0 to 9 = 0110000 to 0111001

Page 24: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Punctuation, etc.: the remaining graphic codes in the ASCII chart (space and symbols such as ! " # $ % & ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~)

Page 25: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

“Hello, world” Example

Character  Decimal  Hexadecimal  Binary
H          72       48           01001000
e          101      65           01100101
l          108      6C           01101100
l          108      6C           01101100
o          111      6F           01101111
,          44       2C           00101100
(space)    32       20           00100000
w          119      77           01110111
o          111      6F           01101111
r          114      72           01110010
l          108      6C           01101100
d          100      64           01100100
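
The three representations of “Hello, world” can be checked with a few lines of Python. This is a sketch using only built-in string and byte operations; the variable names are illustrative.

```python
text = "Hello, world"

# One ASCII code per character.
codes = list(text.encode("ascii"))

decimal = " ".join(str(code) for code in codes)
hexadecimal = " ".join(format(code, "02X") for code in codes)
binary = " ".join(format(code, "08b") for code in codes)

print(decimal)      # 72 101 108 108 111 44 32 119 111 114 108 100
print(hexadecimal)  # 48 65 6C 6C 6F 2C 20 77 6F 72 6C 64
print(binary)
```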

Page 26: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Common Control Codes

Code   Hexadecimal code   Meaning
CR     0D                 carriage return
LF     0A                 line feed
HT     09                 horizontal tab
DEL    7F                 delete
NULL   00                 null
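
Several of these control codes have their own string escapes in programming languages. A small Python sketch, where the dictionary name `CONTROL_CODES` is an illustrative assumption:

```python
# Hexadecimal values from the slide, keyed by the slide's abbreviations.
CONTROL_CODES = {"CR": 0x0D, "LF": 0x0A, "HT": 0x09, "DEL": 0x7F, "NULL": 0x00}

# Python's escapes \r, \n, \t and \0 produce CR, LF, HT and NULL.
escapes = {"CR": "\r", "LF": "\n", "HT": "\t", "NULL": "\0"}
matches = all(ord(ch) == CONTROL_CODES[name] for name, ch in escapes.items())
print(matches)  # True

# Print each code in hexadecimal and as a 7-bit pattern.
for name, code in CONTROL_CODES.items():
    print(f"{name:4} {code:02X} {code:07b}")
```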

Page 28: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Terminology

• Learn the names of the special symbols
– [ ] brackets
– { } braces
– ( ) parentheses
– @ commercial ‘at’ sign
– & ampersand
– ~ tilde

Page 30: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Escape Sequences

• Extend the capability of the ASCII code set

• For controlling terminals and formatting output

• Defined by ANSI in documents X3.41-1974 and X3.64-1977

• The escape code is ESC = 1B₁₆

• An escape sequence begins with two codes:

ESC    [
1B₁₆   5B₁₆

Page 31: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Examples

• Erase display: ESC [ 2 J

• Erase line: ESC [ K
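
The two examples can be written out as strings and inspected code by code. This is a minimal Python sketch; the constant and function names are hypothetical, and whether a given terminal actually honours the sequences depends on the terminal.

```python
import sys

ESC = "\x1b"     # the escape code, 1B hex
CSI = ESC + "["  # the two codes that begin a sequence: 1B 5B

ERASE_DISPLAY = CSI + "2J"
ERASE_LINE = CSI + "K"


def hex_codes(sequence: str) -> str:
    """Show the ASCII codes of a sequence in hexadecimal."""
    return " ".join(format(b, "02X") for b in sequence.encode("ascii"))


def erase_display(stream=sys.stdout) -> None:
    """Write the erase-display sequence to a terminal stream."""
    stream.write(ERASE_DISPLAY)
    stream.flush()


print(hex_codes(ERASE_DISPLAY))  # 1B 5B 32 4A
print(hex_codes(ERASE_LINE))     # 1B 5B 4B
```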

Page 32: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Next slide

Standard Alphanumeric Formats

• BCD

• ASCII

• EBCDIC

• Unicode

Page 33: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

EBCDIC

• Extended BCD Interchange Code (pronounced ebb’-se-dick)

• 8-bit code

• Developed by IBM

• Rarely used today

• IBM mainframes only

Page 34: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Next 2 slides

Standard Alphanumeric Formats

• BCD

• ASCII

• EBCDIC

• Unicode

Page 35: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Unicode

• Originally a 16-bit standard (the code space has since been extended beyond 16 bits)

• Developed by a consortium (the Unicode Consortium)

• Intended to supersede older 7- and 8-bit codes

Page 36: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Unicode Version 2.1

• 1998

• Improves on version 2.0

• Includes the Euro sign (20AC₁₆ = €)

• From the standard: “…contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.”

http://www.unicode.org
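
The Euro sign shows why a code wider than 7 or 8 bits is needed. In this Python sketch the variable names are illustrative, and UTF-16 (big-endian) is chosen as the 16-bit encoding form for illustration; the slides do not name an encoding form.

```python
euro = chr(0x20AC)  # the Euro sign's code point

code_point = format(ord(euro), "04X")
utf16 = euro.encode("utf-16-be")  # two bytes: 20 AC

# The character has no 7-bit ASCII code.
try:
    euro.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(code_point)                                # 20AC
print(" ".join(format(b, "02X") for b in utf16)) # 20 AC
print(fits_in_ascii)                             # False
```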

Page 37: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Keyboard Input

• Key (“scan”) codes are converted to ASCII

• ASCII code sent to host computer

• Received by the host as a “stream” of data

• Stored in buffer

• Processed

• Etc.

p. 69

Page 38: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Shift Key

• Inhibits (clears) bit 5 in the ASCII code

Key(s)      ASCII code (bits 6 5 4 3 2 1 0)   Character
a           1 1 0 0 0 0 1                     a
Shift + a   1 0 0 0 0 0 1                     A
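
The effect of the Shift key on a letter can be sketched as a bit mask in Python. The names `SHIFT_MASK` and `shift` are hypothetical, and the sketch covers only the lowercase letters a to z; real keyboards handle digits and punctuation differently.

```python
SHIFT_MASK = 0b0100000  # bit 5 of the 7-bit code


def shift(letter: str) -> str:
    """Clear bit 5 of a lowercase ASCII letter, giving the uppercase letter."""
    if len(letter) != 1 or not ("a" <= letter <= "z"):
        raise ValueError("this sketch covers the lowercase letters a-z only")
    return chr(ord(letter) & ~SHIFT_MASK)


print(format(ord("a"), "07b"))         # 1100001
print(format(ord(shift("a")), "07b"))  # 1000001
print(shift("a"))                      # A
```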

Page 39: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Control Key

• Inhibits (clears) bits 5 and 6 in the ASCII code

Key(s)     ASCII code (bits 6 5 4 3 2 1 0)   Character
c          1 1 0 0 0 1 1                     c
Ctrl + c   0 0 0 0 0 1 1                     ETX (control code)
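
The Control key can be sketched the same way, as a mask that keeps only bits 4 to 0. The names `CTRL_MASK` and `ctrl` are hypothetical.

```python
CTRL_MASK = 0b0011111  # keeps bits 4..0, clears bits 5 and 6


def ctrl(key: str) -> int:
    """Return the control code produced by holding Ctrl with an ASCII key."""
    code = ord(key)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    return code & CTRL_MASK


print(format(ctrl("c"), "07b"))  # 0000011 (ETX)
print(format(ctrl("m"), "02X"))  # 0D (CR)
print(format(ctrl("j"), "02X"))  # 0A (LF)
```

This is why Ctrl + m and Ctrl + j produce the carriage return and line feed codes.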

Page 40: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Other Input

• OCR – optical character recognition

• Bar code readers

• Voice/audio input

• Punched cards

• Images / objects

• Pointing devices

pp. 69-86

Page 41: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

OCR

[Diagram: page of text (“Hello, world”) → optical scan → 10110110… → computer file]

Page 43: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Bar Codes

• An automatic identification (Auto ID) technology that streamlines identification and data collection

• See http://www.digital.net/barcoder/barcode.html

Page 45: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Voice/audio Input

• Input device: microphone

• Audio input is “digitized” and stored

• Processed in two ways
– As is (no recognition)
– Recognized and converted to alphanumeric data (ASCII)

[Diagram: audio → digitize → 10110010…]

Page 47: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Punched Cards

• Invented by Herman Hollerith, whose Tabulating Machine Company later became part of IBM

• Each card holds 80 characters

Page 49: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Images

• Typically images are pictures that are optically scanned and saved as a “bit map” or in some other format

• Many formats
– GIF, JPEG, …

Page 50: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Typical “Save As” Dialog

Page 51: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Objects

• Images made of geometrically definable shapes

• Offer efficiency, flexibility, small size, etc.

Page 53: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Pointing Devices

• Originally used for specifying coordinates (x, y) for graphical input

• Today used as a general-purpose device for “graphical user interfaces” (GUIs)

Page 54: ITEC 1011 Introduction to Information Technologies 2. Data Formats Chapt. 3

Thank you