ITEC 1011 Introduction to Information Technologies
2. Data Formats
Chapt. 3
Introduction
• Examples (pp. 59-61): real-world data reaches the computer through an input device and is stored as binary data
  – Text ("Dear Mom:") → keyboard → 10110010…
  – Image → digital camera → 10110010…
Format must be appropriate
• The internal representation must be appropriate for the type of processing to take place (e.g., text, images, sound)
Rules/Conventions
• Proprietary formats
  – Unique to a product or company
  – E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
• Standards
  – Evolve in two ways:
    • A proprietary format becomes a de facto standard (e.g., Adobe PostScript, Apple QuickTime)
    • A committee is struck to solve a problem (e.g., the Moving Picture Experts Group, MPEG)
pp. 61-62
Standards Organizations
• ISO – International Organization for Standardization
• CSA – Canadian Standards Association
• ANSI – American National Standards Institute
• IEEE – Institute of Electrical and Electronics Engineers
• Etc.
Examples of Standards
Type of data              Standards
Alphanumeric              ASCII, EBCDIC, Unicode
Image                     JPEG, GIF, PCX, TIFF
Motion picture            MPEG-2, QuickTime
Sound                     Sound Blaster, WAV, AU
Outline graphics/fonts    PostScript, TrueType, PDF
Why Standards?
• Standards are “arbitrary”
• They exist because they are
  – Convenient
  – Efficient
  – Flexible
  – Appropriate
  – Etc.
Alphanumeric Data
• Problem: distinguishing between the number 123 (one hundred and twenty-three) and the characters “123” (one, two, three)
• Four standards for representing letters (alpha) and numbers
  – BCD – Binary-Coded Decimal
  – ASCII – American Standard Code for Information Interchange
  – EBCDIC – Extended Binary-Coded Decimal Interchange Code
  – Unicode
pp. 63-69
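To make the problem concrete, here is a small Python sketch (Python is used only for illustration; the slides do not prescribe a language). The integer 123 is a single binary value, while the string “123” is three separate character codes, one per digit.

```python
number = 123      # stored as one binary integer
text = "123"      # stored as three character codes

# As a binary integer, 123 fits in one byte
print(format(number, "08b"))               # 01111011

# As text, each digit has its own ASCII code (30-39 hex for '0'-'9')
codes = text.encode("ascii")
print([format(b, "08b") for b in codes])   # ['00110001', '00110010', '00110011']

# Moving between the two forms needs explicit conversion
print(int(text) == number, str(number) == text)   # True True
```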
Standard Alphanumeric Formats
• BCD
• ASCII
• EBCDIC
• Unicode
Binary-Coded Decimal (BCD)
• Four bits per digit
  Digit   Bit pattern
  0       0000
  1       0001
  2       0010
  3       0011
  4       0100
  5       0101
  6       0110
  7       0111
  8       1000
  9       1001
Note: the following bit patterns are not used:
  1010, 1011, 1100, 1101, 1110, 1111
Example
• 7093₁₀ = ? (in BCD)
  7    0    9    3
  0111 0000 1001 0011
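The same conversion can be sketched in Python. The function names `to_bcd` and `from_bcd` are hypothetical, chosen for this illustration; the decoder also rejects the six unused patterns listed earlier.

```python
def to_bcd(n: int) -> str:
    """Encode a non-negative integer as BCD: four bits per decimal digit."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(d), "04b") for d in str(n))

def from_bcd(bits: str) -> int:
    """Decode space-separated 4-bit groups back to an integer."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if value > 9:  # 1010 through 1111 are not valid BCD digits
            raise ValueError(f"invalid BCD group: {group}")
        digits.append(str(value))
    return int("".join(digits))

print(to_bcd(7093))                     # 0111 0000 1001 0011
print(from_bcd("0111 0000 1001 0011"))  # 7093
```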
The Problem
• Representing text strings, such as “Hello, world”, in a computer
Codes and Characters
• Each character is coded as a byte
• The most common coding system is ASCII (pronounced “ass-key”)
• ASCII = American National Standard Code for Information Interchange
• Defined in ANSI document X3.4-1977
ASCII Features
• 7-bit code
• 8th bit is unused (or used for a parity bit)
• 2⁷ = 128 codes
• Two general types of codes:
  – 95 are “graphic” codes (displayable on a console)
  – 33 are “control” codes (control features of the console or communications channel)
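The parity bit mentioned above can be sketched as follows. This assumes even parity (the 8th bit is chosen so that each byte contains an even number of 1 bits); odd parity is the other common choice, and `add_even_parity` and `parity_ok` are hypothetical helper names.

```python
def add_even_parity(code: int) -> int:
    """Place an even-parity bit in bit 7 of a 7-bit ASCII code."""
    if not 0 <= code < 128:
        raise ValueError("not a 7-bit ASCII code")
    parity = bin(code).count("1") % 2   # 1 if the count of 1 bits is odd
    return code | (parity << 7)

def parity_ok(byte: int) -> bool:
    """Even parity holds if the whole byte has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0

for ch in "ac":
    b = add_even_parity(ord(ch))
    print(ch, format(b, "08b"), parity_ok(b))
# a 11100001 True
# c 01100011 True
```

A receiver that finds `parity_ok` false knows that an odd number of bits changed in transit, although it cannot tell which ones.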
ASCII Chart
        000   001   010   011   100   101   110   111
0000    NULL  DLE   SP    0     @     P     `     p
0001    SOH   DC1   !     1     A     Q     a     q
0010    STX   DC2   "     2     B     R     b     r
0011    ETX   DC3   #     3     C     S     c     s
0100    EOT   DC4   $     4     D     T     d     t
0101    ENQ   NAK   %     5     E     U     e     u
0110    ACK   SYN   &     6     F     V     f     v
0111    BEL   ETB   '     7     G     W     g     w
1000    BS    CAN   (     8     H     X     h     x
1001    HT    EM    )     9     I     Y     i     y
1010    LF    SUB   *     :     J     Z     j     z
1011    VT    ESC   +     ;     K     [     k     {
1100    FF    FS    ,     <     L     \     l     |
1101    CR    GS    -     =     M     ]     m     }
1110    SO    RS    .     >     N     ^     n     ~
1111    SI    US    /     ?     O     _     o     DEL
(SP = space character)
Column headings give the three most significant bits (bits 6 to 4); row headings give the four least significant bits (bits 3 to 0).
e.g., ‘a’ = 110 0001 (column 110, row 0001), or 61₁₆
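Reading the chart amounts to splitting a 7-bit code into its top three bits (the column) and bottom four bits (the row). A Python sketch, with a hypothetical helper `chart_position`:

```python
def chart_position(ch: str) -> tuple[str, str]:
    """Return (column, row) of a character in the 8 x 16 ASCII chart."""
    if len(ch) != 1 or ord(ch) > 0x7F:
        raise ValueError("expected a single ASCII character")
    code = ord(ch)
    column = format(code >> 4, "03b")   # three most significant bits (6-4)
    row = format(code & 0x0F, "04b")    # four least significant bits (3-0)
    return column, row

col, row = chart_position("a")
print(col, row, col + row)   # 110 0001 1100001
```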
95 graphic codes: 20₁₆ (space) through 7E₁₆ (~), i.e., columns 010 to 111 except DEL
33 control codes: 00₁₆ to 1F₁₆ (columns 000 and 001) plus DEL (7F₁₆)
Alphabetic codes: A to Z are 41₁₆ to 5A₁₆ (columns 100 and 101); a to z are 61₁₆ to 7A₁₆ (columns 110 and 111)
Numeric codes: 0 to 9 are 30₁₆ to 39₁₆ (column 011, rows 0000 to 1001)
Punctuation, etc.: the remaining 33 graphic codes (the space and symbols such as ! , . ? @ [ ] { } ~)
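As a check on the counts above, this Python sketch sorts all 128 codes into the chart's groups. The group names and the `classify` helper are our own labels, not part of the ASCII standard; the space character is counted with punctuation here.

```python
from collections import Counter

def classify(code: int) -> str:
    """Sort a 7-bit ASCII code into one of the chart's groups."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    if code < 0x20 or code == 0x7F:   # columns 000 and 001, plus DEL
        return "control"
    ch = chr(code)
    if ch.isdigit():                  # '0'-'9' (column 011)
        return "numeric"
    if ch.isalpha():                  # 'A'-'Z' and 'a'-'z'
        return "alphabetic"
    return "punctuation"              # everything else, including space

counts = Counter(classify(c) for c in range(128))
print(dict(counts))
```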
“Hello, world” Example
Character   Binary     Hexadecimal   Decimal
H           01001000   48            72
e           01100101   65            101
l           01101100   6C            108
l           01101100   6C            108
o           01101111   6F            111
,           00101100   2C            44
(space)     00100000   20            32
w           01110111   77            119
o           01101111   6F            111
r           01110010   72            114
l           01101100   6C            108
d           01100100   64            100
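The conversion in the table can be reproduced with Python's built-in `str.encode` and `bytes.hex`; this is a sketch for checking the arithmetic, not part of the course material.

```python
text = "Hello, world"
data = text.encode("ascii")   # one byte per character

print(" ".join(format(b, "08b") for b in data))   # binary, one byte per group
print(data.hex().upper())     # 48656C6C6F2C20776F726C64
print(list(data))             # [72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100]
print(data.decode("ascii"))   # Hello, world
```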
Common Control Codes
• CR 0D carriage return
• LF 0A line feed
• HT 09 horizontal tab
• DEL 7F delete
• NULL 00 null
Codes are shown in hexadecimal.
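Many programming languages give these control codes escape notations. The Python sketch below maps the names above to Python string escapes (the `controls` dictionary is just an illustration) and shows the CR LF pair that ends lines in Windows text files and in many Internet protocols.

```python
# Names from the list above, written in Python escape notation
controls = {"CR": "\r", "LF": "\n", "HT": "\t", "DEL": "\x7f", "NULL": "\x00"}

for name, ch in controls.items():
    print(f"{name:5}{ord(ch):02X}")   # name, then the hexadecimal code

# A CR LF pair ends each line in Windows text files
line_ending = "\r\n".encode("ascii")
print(line_ending.hex(" "))           # 0d 0a
```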
Terminology
• Learn the names of the special symbols
  – [ ] brackets
  – { } braces
  – ( ) parentheses
  – @ commercial ‘at’ sign
  – & ampersand
  – ~ tilde
Escape Sequences
• Extend the capability of the ASCII code set
• Used for controlling terminals and formatting output
• Defined by ANSI in documents X3.41-1974 and X3.64-1977
• The escape code is ESC = 1B₁₆
• An ANSI control sequence begins with two codes:
  ESC    [
  1B₁₆   5B₁₆
Examples
• Erase display: ESC [ 2 J
• Erase line: ESC [ K
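Here is a sketch of building these sequences as strings in Python. `ESC` and `CSI` (the standard's name for the two-code prefix, Control Sequence Introducer) are constants we define here, the function names are hypothetical, and whether the screen actually clears depends on the terminal supporting ANSI sequences.

```python
ESC = "\x1b"      # the escape code, 1B hex
CSI = ESC + "["   # the two-code prefix ESC [

def erase_display() -> str:
    """ESC [ 2 J: erase the whole display."""
    return CSI + "2J"

def erase_line() -> str:
    """ESC [ K: erase from the cursor to the end of the line."""
    return CSI + "K"

# Show the bytes instead of sending them to the terminal
print(erase_display().encode("ascii").hex(" "))   # 1b 5b 32 4a
```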
EBCDIC
• Extended Binary-Coded Decimal Interchange Code (pronounced “ebb-see-dick”)
• 8-bit code
• Developed by IBM
• Rarely used today outside IBM mainframes
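Python ships codecs for several EBCDIC code pages; this sketch assumes `cp037` (the US/Canada code page) to compare the EBCDIC and ASCII codes for the same characters. Other EBCDIC code pages assign some symbols differently.

```python
text = "A1"
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")   # cp037: one EBCDIC code page (US/Canada)

print(ascii_bytes.hex(" "))           # 41 31
print(ebcdic_bytes.hex(" "))          # c1 f1
print(ebcdic_bytes.decode("cp037"))   # A1
```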
Unicode
• Originally a 16-bit standard (the code space now extends to 10FFFF₁₆, beyond 16 bits)
• Developed by a consortium (the Unicode Consortium)
• Intended to supersede older 7- and 8-bit codes
Unicode Version 2.1
• 1998
• Improves on version 2.0
• Includes the Euro sign (20AC₁₆ = €)
• From the standard: “…contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.”
http://www.unicode.org
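The Python sketch below encodes the euro sign. UTF-16 and UTF-8 are two standard ways of storing Unicode code points as bytes (they are not covered on the slides); the last lines show a character beyond 16 bits, which UTF-16 stores as two 16-bit units (a surrogate pair).

```python
euro = "\u20ac"                          # the euro sign, code point U+20AC
print(f"U+{ord(euro):04X}")              # U+20AC
print(euro.encode("utf-16-be").hex())    # 20ac (one 16-bit unit)
print(euro.encode("utf-8").hex(" "))     # e2 82 ac (three bytes)

# ASCII characters keep their 7-bit codes in UTF-8
print("A".encode("utf-8").hex())         # 41

# A code point above FFFF needs two 16-bit units in UTF-16
smiley = "\U0001F600"
print(smiley.encode("utf-16-be").hex())  # d83dde00
```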
Keyboard Input
• Key (“scan”) codes are converted to ASCII
• ASCII code sent to host computer
• Received by the host as a “stream” of data
• Stored in buffer
• Processed
• Etc.
p. 69
Shift Key
• Clears (inhibits) bit 5 of the ASCII code
Key(s)      ASCII code (bits 6 5 4 3 2 1 0)   Character
a           1 1 0 0 0 0 1                     a
Shift + a   1 0 0 0 0 0 1                     A
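Modelled in Python, the Shift behaviour for letters is a single bit operation. The `shift` function is hypothetical and ignores the keyboard's scan codes; it covers letters only, since Shift on digits and punctuation does not follow this bit pattern on common keyboard layouts.

```python
BIT5 = 1 << 5   # 0100000: the only bit that differs between 'a' and 'A'

def shift(ch: str) -> str:
    """Model Shift on a letter key: clear bit 5 of the lowercase code."""
    if len(ch) != 1 or not "a" <= ch <= "z":
        return ch   # this sketch models single lowercase letters only
    return chr(ord(ch) & ~BIT5)

print(format(ord("a"), "07b"))          # 1100001
print(format(ord(shift("a")), "07b"))   # 1000001
print(shift("a"))                       # A
```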
Control Key
• Clears (inhibits) bits 5 and 6 of the ASCII code
Key(s)      ASCII code (bits 6 5 4 3 2 1 0)   Character
c           1 1 0 0 0 1 1                     c
Ctrl + c    0 0 0 0 0 1 1                     ETX (a control code)
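Likewise, Ctrl can be modelled as masking off bits 5 and 6, leaving the low five bits. In this hypothetical `ctrl` helper, Ctrl+c gives 3 (ETX) and Ctrl+m gives 0D₁₆ (CR), which is why Ctrl+M acts like Enter in many terminals.

```python
CTRL_MASK = 0b0011111   # keep bits 4-0; clear bits 6 and 5

def ctrl(ch: str) -> int:
    """Model Ctrl on a letter key: clear bits 5 and 6 of its ASCII code."""
    if len(ch) != 1 or not ch.isascii() or not ch.isalpha():
        raise ValueError("this sketch models single ASCII letters only")
    return ord(ch) & CTRL_MASK

code = ctrl("c")
print(format(code, "07b"), hex(code))   # 0000011 0x3
```

Because bit 5 is cleared either way, Ctrl+c and Ctrl+C produce the same code.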
Other Input
• OCR – optical character recognition
• Bar code readers
• Voice/audio input
• Punched cards
• Images / objects
• Pointing devices
pp. 69-86
OCR
Page of text (“Hello, world”) → optical scan → computer file (10110110…)
Bar Codes
• An automatic identification (Auto ID) technology that streamlines identification and data collection
• See http://www.digital.net/barcoder/barcode.html
Voice/audio Input
• Input device: microphone
• Audio input is “digitized” and stored
• Processed in two ways
  – As is (no recognition)
  – Recognized and converted to alphanumeric data (ASCII)
Microphone → digitize → 10110010…
Punched Cards
• Punched-card tabulation was invented by Herman Hollerith, whose Tabulating Machine Company later became part of IBM
• Each standard card holds 80 characters, one per column
Images
• Typically images are pictures that are optically scanned and saved as a “bit map” or in some other format
• Many formats – GIF, JPEG, …
[Figure: typical “Save As” dialog]
Objects
• Images made of geometrically definable shapes
• Offer efficiency, flexibility, small size, etc.
Pointing Devices
• Originally used for specifying coordinates (x, y) for graphical input
• Today used as general purpose device for “graphical user interfaces” (GUIs)
Thank you