coa unit 1 notes


Page 1: COA UNIT 1 NOTES


UNIT 1

BASIC STRUCTURE OF COMPUTERS

INTRODUCTION

COMPUTER A computer is a machine which accepts input information in digitized form, processes the input according to a set of stored instructions, and produces the resulting output information.

PROGRAM AND DATA The set of stored instructions written to solve a task is called a program, and the input and output information is called data.

The internal storage where programs are stored is called memory.

Characteristics of computer 1 Speed: Computers perform various operations at a very high speed.

2 Accuracy: Computers are very accurate and do not make mistakes in calculations.

3 Reliability: Computers give correct and consistent results, even when used in adverse conditions. Errors are most often caused by human intervention, not by the computer. Computer output is reliable, subject to the condition that the input data and the instructions (programs) are correct. Incorrect input data and unreliable programs give wrong results.

4 Storage Capacity: A computer can store a large amount of data, which can be retrieved at any time in fractions of a second. This data can be stored in permanent storage devices such as hard disks, CDs, etc.

Page 2: COA UNIT 1 NOTES

2

5 Versatility: Computers can do a variety of jobs based on the instructions given to them. They are used in almost every field, making tasks easier.

Limitations of a computer: 1) Not intelligent 2) Inactive (they cannot act on their own)

Computer = Hardware + Software

Hardware:

• Hardware is the physical aspect of computers, telecommunications, and other devices.

• Hardware implies permanence and invariability

• The components include keyboard, floppy drive, hard disk, monitor, CPU, printer, wires,

transistors, circuits etc.

Software: It is a set of programs used to perform certain tasks. A program is a set of instructions to carry out a particular task.

Hardware and Software

Hardware                                     Software
The physical components making up            A set of programs used to perform
the system are termed hardware.              certain tasks (the logical component).
The components include the keyboard,         Software includes compilers, loaders,
floppy drive, hard disk, monitor, CPU,       banking s/w, library s/w, payroll
printer, wires, transistors, circuits, etc.  s/w, etc.
Hardware works based on instructions.        Software tells the hardware what to do.

TYPES OF COMPUTER

Computers are classified according to size, cost, power of the processor, and type of usage.

Some types of computers are

Personal computers(PC)

Widely used in homes, schools, and business offices.

Notebook computers

It is a compact version of the PC, with all the components packed together into a single unit.

Workstations

It has high-resolution graphics input/output capabilities.

Desktop Computers

They have processing and storage units, a visual display, and audio output units.

Enterprise systems or mainframes and Servers

Mainframes are used for business data processing in medium and large corporations that

require more computing power and storage capacity.

Servers:

Servers contain sizable database storage units and are capable of handling large volumes of requests to access the data.

Page 4: COA UNIT 1 NOTES

4

The requests and responses are transported over Internet communication facilities.

Supercomputers

They are used for the large-scale numerical calculations required in applications such as weather forecasting and aircraft design and simulation.

I. FUNCTIONAL UNITS A computer consists of five functionally independent main parts. They are:

Input

Memory

Arithmetic and logic

Output

Control unit

Basic functional units of a computer


[Figure: block diagram of the basic functional units — input unit, output unit, memory unit (main and secondary), ALU, and control unit (CU)]

The operation of a computer can be summarized as follows:

The computer accepts programs and the data through an input and stores them in the memory.

The stored data are processed by the arithmetic and logic unit under program control.

The processed data is delivered through the output unit.

All the above activities are directed by the control unit.

The information is stored either in the computer’s memory for later use or

immediately used by ALU to perform the desired operations.

Instructions are explicit commands that

Manage the transfer of information within a computer as well as between the computer

and its I/O devices.



Specify the arithmetic and logic operations to be performed.

To execute a program, the processor fetches the instructions one after another, and performs the

desired operations.

The processor accepts only machine language programs.

To get the machine language program, a compiler is used.

Note: A compiler is software (a translator) which converts a high-level language program (source program) into a machine language program (object program).

Input unit

The computer accepts coded information through the input unit. The input can come from human operators, from electromechanical devices such as keyboards, or from other computers over communication lines.

Examples of input devices are

Keyboards, joysticks, trackballs, and mice are used as graphic input devices in conjunction with displays.

Microphones can be used to capture audio input which is then sampled and converted

into digital code for storage and processing.

Keyboard

• It is a common input device.

• Whenever a key is pressed, the corresponding letter or digit is automatically translated into its

corresponding binary code and transmitted over cable to the memory of the computer.

Memory unit

Memory unit is used to store programs as well as data.


Memory is classified into primary and secondary storage.

Primary storage

It is also called main memory.

It operates at high speed and it is expensive.

It is made up of a large number of semiconductor storage cells, each capable of storing one bit of information.

These cells are grouped together in a fixed size called a word. This facilitates reading and writing the content of one word (n bits) in a single basic operation, instead of reading and writing one bit per operation.

Each word is associated with a distinct address that identifies the word's location. A given word is accessed by specifying its address.

Word length

The number of bits in each word is called word length of the computer.

Typical word lengths range from 16 to 64 bits.

Programs must reside in the primary memory during execution.

RAM

It stands for Random Access Memory. Memory in which any location can be reached in

a short and fixed amount of time by specifying its address is called random-access

memory.

Memory access time

• Time required to access one word is called Memory access time.

• This time is fixed and independent of the word being accessed.


• It typically ranges from a few nanoseconds (ns) to about 100 ns.

Caches

They are small and fast RAM units.

They are tightly coupled with the processor.

They are often contained on the same integrated circuit (IC) chip as the processor to achieve high performance.

Secondary storage

It is slower than primary storage.

It is cheaper than primary memory.

Its capacity is high.

It is used to store information that is not accessed frequently.

Various secondary storage devices are magnetic tapes and disks, optical disks (CD-ROMs), floppy disks, etc.

Arithmetic and logic unit

Arithmetic and logic unit (ALU) and control unit together form a processor.

Actual execution of most computer operations takes place in arithmetic and logic unit of the processor.

Example:

Suppose two numbers located in the memory are to be added. They are brought into the

processor, and the actual addition is carried out by the ALU.

Registers:

Registers are high speed storage elements available in the processor.

Each register can store one word of data.

When operands are brought into the processor for any operation, they are stored in the registers.

Accessing data from a register is faster than accessing it from memory.


Output unit

The function of the output unit is to present the processed results to the outside world in human-understandable form.

Examples of output devices are graphical displays and printers (inkjet, laser, dot matrix, and so on). Laser printers work faster.

Control unit

The control unit coordinates the operation of the memory, arithmetic and logic unit, input unit, and output unit in a proper way. The control unit sends control signals to other units and senses their states.

Example:

Data transfers between the processor and the memory are controlled by the control unit

through timing signals.

Timing signals are the signals that determine when a given action is to take place.

The control unit is a well-defined, physically separate unit that interacts with other parts of the machine.

A set of control lines carries the signals used for timing and synchronization of events in all units.

Differences between:

Primary Memory                           Secondary Memory
Also called main memory.                 Also called auxiliary memory.
Accessing the data is faster.            Accessing the data is slower.
The CPU can access it directly.          The CPU cannot access it directly.
Semiconductor memory.                    Magnetic memory.
Data storage capacity is less.           Data storage capacity is huge.
Expensive.                               Not expensive.
It is internal memory.                   It is external memory.
Examples: RAM, ROM.                      Examples: hard disk, floppy disk,
                                         magnetic tape, etc.

RAM                                      ROM
Random Access Memory.                    Read Only Memory.
Volatile memory: the contents of RAM     Non-volatile memory: the contents of
are lost when power is turned off.       ROM are not lost when power is
                                         turned off.
Temporary storage medium.                Permanent storage medium.
The data can be read and written.        The data can only be read, not
                                         written.
Programs are brought into RAM just       BIOS and monitor programs are stored
before execution.                        in ROM.


Categories of Software:

System software                          Application software
A collection of programs written by      A collection of programs written by
expert programmers/manufacturers.        users (programmers).
Used to control the computer system.     Written to perform a particular task.
Helps in executing other programs.       Not used for executing other
                                         programs.
Examples include compilers, loaders,     Examples include banking s/w, library
operating systems, etc.                  s/w, payroll s/w, etc.

II. BASIC OPERATIONAL CONCEPTS

To perform a given task on computer, an appropriate program is to be stored in the memory.

Individual instructions are brought from the memory into the processor, which executes the specified

operations. Data to be used as operands are also stored in the memory.

Consider an instruction

Add LOCA, R0

This instruction adds the operand at memory location LOCA to the operand in register R0 in the processor, and the result is stored back in register R0.


The original content of LOCA is preserved, whereas the content of R0 is overwritten.

This instruction requires the following steps

1) The instruction is fetched from memory into the processor.

2) The operand at LOCA is fetched and added to the content of R0.

3) Resulting sum is stored in register R0.

The above Add instruction combines a memory access operation with an ALU operation. The same task can be performed using a two-instruction sequence:

Load LOCA, R1

ADD R1, R0

• Here the first instruction transfers the contents of memory location LOCA into register R1.

• The second instruction adds the contents of R1 and R0 and places the sum into R0.

• The first instruction overwrites the former contents of R1 but preserves the value in LOCA; the second instruction overwrites the former contents of R0.
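The preserve/overwrite behavior of this sequence can be sketched in Python, with dictionaries standing in for the memory and the register file (the value stored at LOCA and the initial register contents are invented for illustration):

```python
# Sketch of the Load/Add sequence. One dict models memory (keyed by the
# symbolic address LOCA), another models the processor registers.
memory = {"LOCA": 25}        # value at LOCA is an assumption
regs = {"R0": 10, "R1": 99}  # initial register contents, also assumed

# Load LOCA, R1 -> overwrites R1, preserves LOCA
regs["R1"] = memory["LOCA"]

# Add R1, R0 -> overwrites R0 with the sum
regs["R0"] = regs["R1"] + regs["R0"]

print(regs["R0"], memory["LOCA"])  # 35 25  (LOCA is unchanged)
```

Running this shows that R0 ends up holding the sum while the memory word at LOCA keeps its original value.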

Connection between memory and the processor

Transfers between the memory and the processor are started by sending the address of the memory location to be accessed to the memory unit and issuing the appropriate control signals. The data are then transferred to or from the memory.


The figure below shows how the memory and the processor can be connected.

The processor contains a number of registers, in addition to the ALU and the control unit, for different purposes. The various registers are:

Instruction register(IR)

Program counter(PC)

Memory address register(MAR)

Memory data register(MDR)

General purpose registers (R0 to Rn-1 )

[Figure: the processor (containing the MAR, MDR, control unit, PC, IR, ALU, and n general-purpose registers R0 to Rn-1) connected to the memory]


Instruction register (IR): The IR holds the instruction that is currently being executed by the processor. Its output is available to the control circuits, which generate the timing signals that control the various processing elements involved in executing the instruction.

Program counter (PC): It is a special purpose register that contains the address of the next instruction

to be fetched and executed.

During the execution of one instruction, the PC is updated to point to the address of the next instruction to be fetched and executed. It keeps track of the execution of a program.

Memory address register (MAR): The MAR holds the address of the memory location to be

accessed.

Memory data register (MDR): The MDR contains the data to be written into or read from the memory location that is pointed to by the MAR. These two registers, MAR and MDR, facilitate communication between the memory and the processor.

Operating steps

Initially the program resides in the memory (usually placed there through the input unit), and the PC is set to point to the first instruction of the program.

The contents of the PC are transferred to the MAR, and a Read control signal is sent to the memory. The addressed word (in this case the first instruction of the program) is read out of the memory and loaded into the MDR register.

Next, the contents of MDR are transferred to the IR. At this point the instruction is ready to be

decoded and executed.

If the instruction involves an operation to be performed by the ALU, it is necessary to obtain the required operands. If an operand resides in the memory (it could also be in a general-purpose register in the processor), it is fetched from the memory into the MDR by sending its address to the MAR and initiating a read cycle. The fetched operands are then transferred to the ALU. After one or more operands are fetched in this way, the ALU can perform the desired operation.

If the result of this operation is to be stored in the memory, then the result is sent to the MDR, the address of the memory location where the result is to be stored is sent to the MAR, and a write cycle is initiated.

During the execution of the current instruction, the contents of the PC are incremented to point to the next instruction to be executed. Thus, as soon as the execution of the current instruction is completed, a new instruction fetch can be started.
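The operating steps above can be sketched as a tiny fetch-decode-execute loop. This is a toy model, not a real instruction set: the encoding (an "ADD addr" instruction that adds a memory word into an accumulator) and all addresses and values are invented for illustration.

```python
# PC -> MAR -> memory read -> MDR -> IR, then decode, execute, and
# increment the PC. A dict models memory; small integers are addresses.
memory = {0: ("ADD", 100), 1: ("ADD", 101), 100: 7, 101: 5}
PC, ACC = 0, 0   # program counter and a single accumulator register

while PC in memory and isinstance(memory[PC], tuple):
    MAR = PC                  # address of the next instruction
    MDR = memory[MAR]         # word read out of memory
    IR = MDR                  # instruction ready to be decoded
    PC += 1                   # PC updated during execution
    op, addr = IR
    if op == "ADD":           # operand fetch: addr -> MAR, read -> MDR
        ACC += memory[addr]

print(ACC)  # 12
```

The loop stops when the PC no longer points at an instruction word, mirroring the idea that the next fetch can begin as soon as the current instruction completes.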

Note: In addition to transferring data between the memory and the processor, the computer accepts data from input devices and sends data to output devices. Thus some machine instructions with the ability to handle I/O transfers are provided.

Interruption

Normal execution of the program may be interrupted if some other device requires urgent

service of the processor. For example, a monitoring device in a computer controlled industrial process

may detect a dangerous condition. In order to deal with that situation immediately, the normal execution

of the current program must be interrupted. To do this the device raises an interrupt signal. An interrupt

is the request from an I/O device for the service by the processor. The processor provides the requested

service by executing an appropriate interrupt service routine.


When the interrupt service routine is completed, the execution of the interrupted program is continued by the processor. Because servicing the interrupt may alter the internal state of the processor, that state must be saved in memory before the interrupt is serviced. Normally the contents of the PC are among the state saved.

III. BUS STRUCTURES A bus is a group of lines that serves as a connection path for several individual parts of a computer to transfer data between them.

To achieve a reasonable speed of operation, a computer must be organized so that all its units can handle one full word of data at a given time.

When a word of data is transferred in a bus, all its bits are transferred in parallel, that is, the

bits are transferred simultaneously over many wires, or lines, one bit per

line.

[Figure: Single bus structure — input, processor, memory, and output units attached to a common bus]

A group of lines that serves as a connecting path for several devices is called a bus. The bus must have separate lines for carrying data, address, and control signals. A single bus is used to interconnect all the units as shown above; hence the bus can be used for only one transfer at a time, and only two units can actively use the bus at any given time.


The advantages of using a single bus structure are its low cost and its flexibility for attaching peripheral devices.

Multiple Bus structure

Systems using multiple buses achieve concurrency, since they allow two or more transfers at the same time. This leads to higher performance, but at increased cost.

The use of Buffer registers

The speeds of operation of the various devices connected to a common bus vary. Input and output devices such as keyboards and printers are relatively slow compared to the processor and storage devices such as optical disks.

Consider, as an example, the transfer of an encoded character from a processor to a character printer. The efficiency of the processor would be reduced because of the difference in their speeds. To overcome this problem, a buffer register is used.

Buffer registers

A buffer register is an electronic register included with a device to hold information during transfer.

When the processor sends a set of characters to a printer, the contents are transferred to the printer buffer (the buffer register of the printer). Once the printer buffer is loaded, the processor and the bus are no longer needed, and the processor can be released for other activity.

Purpose of Buffer Register:

Buffer registers prevent a high-speed processor from being locked to a slow I/O device.

Buffer registers smooth out timing differences among slow and fast devices.


It allows the processor to switch rapidly from one device to another.
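The role of the printer buffer can be sketched as a simple FIFO queue: the fast processor deposits a whole message at bus speed and returns immediately, while the slow device drains the buffer at its own pace. This is a toy model; the function names and the message are invented for illustration.

```python
from collections import deque

printer_buffer = deque()   # the buffer register(s), modeled as a FIFO

def processor_send(text):
    """Fast side: load the buffer and return immediately."""
    for ch in text:
        printer_buffer.append(ch)

def printer_drain():
    """Slow side: the printer empties the buffer at its own pace."""
    out = []
    while printer_buffer:
        out.append(printer_buffer.popleft())
    return "".join(out)

processor_send("HELLO")    # processor is now free for other activity
print(printer_drain())     # HELLO
```

After `processor_send` returns, the processor no longer needs the bus, which is exactly the decoupling the buffer register provides.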

IV. PERFORMANCE AND METRICS

The performance of a computer is measured by the speed with which it can execute programs.

The speed of the computer is affected by

the hardware design

the machine language instructions of the computer (because programs are usually written in a high-level language)

the compiler, which translates the high-level language into machine language.

For best performance, it is necessary to design the compiler, the machine instruction set, and the hardware in a coordinated way.

Consider a time line diagram describing how the operating system overlaps processing, disk transfers, and printing for several programs to make the best possible use of the resources available. The total time required to execute the program is t5 - t0. This is called the elapsed time, and it is a measure of the performance of the entire computer system.

It is affected by the speed of the processor, the disk, and the printer. To discuss the performance of the processor, we should consider only the periods during which the processor is active.


[Figure: user program and OS routines sharing the processor — printer, disk, OS routine, and program activity marked on a time line from t0 to t5]

The elapsed time for the execution of the program depends on the hardware involved in the execution of the program. This hardware includes the processor and the memory, which are usually connected by a bus (as shown in the bus structure diagram).

[Figure: The processor cache — the processor and cache memory on one chip, connected to the main memory over the bus]

When the execution of the program starts, all program instructions and the required data are stored

in the main memory. As execution proceeds, instructions are fetched from the main memory one by one

by the processor, and a copy is placed in the cache. When execution of the instruction calls for the data

located in the main memory, the data are fetched and a copy is placed in the cache. If the same

instruction or data is needed again later, it is read directly from the cache. The processor and a small cache memory are often fabricated on a single IC chip. The internal speed of such a chip is much faster than the speed at which instructions and data can be fetched from the main memory. A program can be executed faster if the movement of instructions and data between the main memory and the processor is minimized, which is achieved by using the cache.

To evaluate the performance, we can discuss about,

Processor clock

Basic performance equation

Pipelining and Superscalar operation

Clock Rate

Instruction Set: CISC and RISC

Compiler

Performance Measurement

Processor clock


Processor circuits are controlled by a timing signal called a clock. The clock defines regular time intervals, called clock cycles. To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps, such that each step can be completed in one clock cycle.

The length of one clock cycle is P, and this parameter affects processor performance. It is inversely proportional to the clock rate

R = 1/P

which is measured in cycles per second.

Processors used in today's personal computers and workstations have clock rates from a few hundred million to over a billion cycles per second. One cycle per second is called a hertz (Hz). The term "million" is denoted by the prefix mega (M) and "billion" by the prefix giga (G). Hence, 500 million cycles per second is usually abbreviated to 500 megahertz (MHz), and 1250 million cycles per second is abbreviated to 1.25 gigahertz (GHz). The corresponding clock periods are 2 and 0.8 nanoseconds (ns), respectively.
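The relation R = 1/P can be checked for the two clock rates quoted above with a one-line helper:

```python
# Clock period P = 1/R, expressed in nanoseconds for a rate in Hz.
def period_ns(rate_hz):
    """Return the clock period in ns for a given clock rate in Hz."""
    return 1e9 / rate_hz

print(period_ns(500e6))    # 500 MHz -> 2.0 ns
print(period_ns(1.25e9))   # 1.25 GHz -> 0.8 ns
```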

Basic performance equation

Let T be the time required for the processor to execute a program in high level language. The

compiler generates machine language object program corresponding to the source program.

Assume that complete execution of the program requires the execution of N machine language

instructions.

Assume that average number of basic steps needed to execute one machine instruction is S,

where each basic step is completed in one clock cycle.

If the clock rate is R cycles per second, the program execution time is given by

T = (N x S) / R

This is often called Basic performance equation.

Page 22: COA UNIT 1 NOTES

22

To achieve high performance, the performance parameter T should be reduced. The value of T can be reduced by reducing N and S, and by increasing R.

The value of N is reduced if the source program is compiled into a smaller number of machine instructions.

The value of S is reduced if instructions have a smaller number of basic steps to perform, or if the execution of instructions is overlapped.

The value of R can be increased by using a higher-frequency clock, i.e., by reducing the time required to complete a basic execution step.

N, S, and R are interdependent factors; changing one may affect another.
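The basic performance equation can be evaluated directly. The numbers below (N, S, and R) are illustrative assumptions, not values from the text:

```python
# Basic performance equation: T = (N * S) / R.
def exec_time(N, S, R):
    """N instructions, S basic steps per instruction, R cycles/second."""
    return N * S / R

# e.g. 100 million instructions, 3 steps each, on a 500 MHz clock:
print(exec_time(100e6, 3, 500e6))   # 0.6 seconds
```

Doubling R (or halving N or S) halves T, which is the trade-off the following sections explore.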

Pipelining and Superscalar operation

Pipelining

It is a technique of overlapping the execution of successive instructions. This technique

improves performance.

Consider the instruction

Add R1, R2, R3

The above instruction adds the contents of registers R1 and R2, and places the sum in R3. The contents of R1 and R2 are first transferred to the inputs of the ALU. After the addition is performed, the result is transferred from the ALU to register R3.

The processor can read the next instruction to be executed while performing the addition operation of the current instruction, and while the result of the addition is being transferred to R3, the operands required for the next instruction can be transferred to the processor. This process of overlapping instruction execution is called pipelining.


If all the instructions are overlapped to the maximum degree, the effective value of S is 1. This is not always possible.

Individual instructions still require several clock cycles to complete, but for the purpose of computing T, the effective value of S is 1.
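The effect of pipelining on the performance equation can be illustrated numerically: the same program, with S reduced from 3 basic steps to an effective value of 1. N and R below are assumed values for illustration only.

```python
# Effect of pipelining under T = N*S/R (illustrative numbers).
N, R = 100e6, 500e6          # instructions and clock rate (assumptions)

T_unpipelined = N * 3 / R    # S = 3 basic steps per instruction
T_pipelined = N * 1 / R      # fully overlapped: effective S = 1

print(T_unpipelined / T_pipelined)   # close to 3.0 (3x speedup)
```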

Superscalar operation

A higher degree of concurrency can be achieved if multiple instruction pipelines are implemented in the processor. This means that multiple functional units are used, creating parallel paths through which different instructions can be executed in parallel. With such an arrangement, it becomes possible to start the execution of several instructions in every clock cycle. This mode of operation is called superscalar execution. It opens the possibility of reducing the value of S to less than 1.

Parallel execution must preserve the logical correctness of programs; that is, the results produced must be the same as those produced by serial execution of the program.

Clock Rate

There are two possibilities for increasing the clock rate, R.

First, improving the integrated-circuit (IC) technology makes logic circuits faster, which reduces

the time needed to complete a basic step. This allows the clock period, P, to be reduced and the

clock rate, R, to be increased.

Second, reducing the amount of processing done in one basic step also makes it possible to

reduce the clock period, P. However, if the actions that have to be performed by an instruction

remain the same, the number of basic steps needed may increase.

Increases in the value of R achieved by improvements in IC technology affect all aspects of the processor's operation equally, with the exception of the time it takes to access the main memory. In the presence of a cache, the percentage of accesses to the main memory is small. Hence, much of the performance improvement can be realized.

The value of T will be reduced by the same factor as R is increased because S and N are not

affected.

Instruction Set: CISC and RISC

CISC: Complex Instruction Set Computers

RISC: Reduced Instruction Set Computers

Simple instructions require a small number of basic steps to execute.

Complex instructions involve a large number of steps.

For a processor that has only simple instructions, a large number of instructions may be needed to

perform a given programming task. This could lead to a large value for N and a small value for

S.

On the other hand, if individual instructions perform more complex operations, fewer instructions

will be needed, leading to a lower value of N and a larger value of S. It is not obvious if one

choice is better than the other.

Processors with simple instructions are called Reduced Instruction Set Computers (RISC) and

processors with more complex instructions are referred to as Complex Instruction Set Computers

(CISC)

The decision between the two styles of instruction set is influenced by the use of pipelining, because pipelining keeps the effective value of S close to 1.
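The N-S trade-off described above can be made concrete with the performance equation. All numbers below are invented for illustration; they are not measurements of any real processor.

```python
# RISC-style code (larger N, smaller S) vs CISC-style code
# (smaller N, larger S) under T = N*S/R, with an assumed common clock.
R = 500e6                      # clock rate, cycles/second (assumption)

T_risc = 120e6 * 1.2 / R       # more instructions, ~1.2 steps each
T_cisc = 80e6 * 2.5 / R        # fewer instructions, ~2.5 steps each

print(T_risc, T_cisc)          # neither choice is obviously better
```

With these particular numbers the simple-instruction code happens to win, but shifting either N or S tips the balance, which is why the text says the choice is not obvious.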

Compiler

A compiler translates a high-level language program into a sequence of machine instructions. To

reduce N, we need to have a suitable machine instruction set and a compiler that makes good use of it.


the product N x S, which is the total number of clock cycles needed to execute a program. The number

of cycles is dependent not only on the choice of instructions, but also on the order in which they appear

in the program. The compiler may rearrange program instructions to achieve better performance

without changing the logic of the program.

The compiler and the processor must be closely linked in their architecture. They should be designed at the same time.

Performance Measurement

The computer community adopted the idea of measuring computer performance using

benchmark programs. To make comparisons possible, standardized programs must be used. The

performance measure is the time it takes a computer to execute a given benchmark program.

A nonprofit organization called the Standard Performance Evaluation Corporation (SPEC) selects and publishes such benchmark programs.

SPEC rating = (Running time on the reference computer) / (Running time on the computer under test)

The test is repeated for all the programs in the SPEC suite, and the geometric means of the

results are computed. Let SPECi be the rating for program i in the suite. The overall SPEC rating for the

computer is given by

SPEC rating = ( SPEC1 × SPEC2 × ... × SPECn )^(1/n)

where n is the number of programs in the suite.
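The geometric mean of the per-program ratings can be computed directly. The running times below are invented for illustration; each rating is the reference time divided by the time on the machine under test:

```python
# SPEC rating = geometric mean of per-program ratings.
ref_times = [500.0, 800.0, 300.0]     # reference computer, seconds (assumed)
test_times = [250.0, 400.0, 150.0]    # computer under test, seconds (assumed)

ratings = [r / t for r, t in zip(ref_times, test_times)]  # each is 2.0 here
n = len(ratings)

spec = 1.0
for x in ratings:
    spec *= x                          # product of the n ratings
spec **= 1.0 / n                       # nth root: (product)^(1/n)

print(spec)   # ~2.0: the machine under test is about twice as fast
```

A geometric mean is used (rather than an arithmetic mean) so that no single benchmark dominates the overall rating.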


V. INSTRUCTION AND INSTRUCTION SEQUENCING A computer must have instructions capable of performing four types of basic operations:

Data transfer between the memory and the processor registers.

Arithmetic and logic operation on data

Program sequencing and control

I/O transfers

To understand the first two types of instruction, we need some notation.

Register Transfer Notation (RTN)

Data transfers can be represented by the standard notation given below. Processor registers are represented by names such as R0, R1, R2, ...; addresses of memory locations by names such as LOC, PLACE, MEM; and I/O registers by names such as DATAIN and DATAOUT. The contents of a memory location or register are denoted by placing square brackets around its name.

Example 1: R1 ← [LOC]

This expression states that the contents of memory location LOC are transferred into the

processor register R1.

Example 2: R3 ← [R1] + [R2]


This expression states that the contents of processor registers R1 and R2 are added and the result

is stored into the processor register R3.

This type of notation is known as Register Transfer Notation (RTN).

Note that the right-hand side of an RTN expression always denotes a value, and the left-hand side is the name of a location where the value is to be placed, overwriting the old contents of that location.
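The two RTN examples can be simulated with a dictionary mapping location names (registers and memory labels) to their contents; the square brackets of RTN correspond to dictionary lookups. The initial values are invented for illustration:

```python
# Locations (registers R1-R3 and memory label LOC) and their contents.
env = {"LOC": 42, "R1": 0, "R2": 8, "R3": 0}

# R1 <- [LOC] : left side names a location, right side is a value
env["R1"] = env["LOC"]

# R3 <- [R1] + [R2]
env["R3"] = env["R1"] + env["R2"]

print(env["R3"])   # 50; LOC still holds 42
```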

Assembly Language Notation

To represent machine instructions, assembly language uses statements as shown below

To transfer the data from memory location LOC to processor register R1

Move LOC,R1

To add two numbers in registers R1 and R2 and to place their sum in register R3

Add R1,R2,R3

BASIC INSTRUCTION TYPES

The operation of addition of two numbers is a fundamental capability in any computer. The

statement

C= A + B

in a high-level language program is a command to the computer to add the current values of the two

variables called A and B, and to assign the sum to a third variable, C.

When the program containing this statement is compiled, the three variables, A,B,C are assigned

to distinct location in the memory.

Hence the above high-level language statement requires the action

C ← [A] + [B]

to take place in the computer. Here [A] and [B] represent the contents of A and B, respectively.

To carry out this action, the contents of memory locations A and B are fetched from the memory and

transferred into the processor where their sum is computed. This result is then sent back to the memory

and stored in location C.

An instruction for such a basic operation can be represented in several address formats:

• 3-address instruction

• 2 -address instruction

• 1-address instruction

• 0-address instruction

Let us first assume that this action is to be accomplished by a single machine instruction.

Furthermore, assume that this instruction contains the memory addresses of the three operands - A, B,

and C. This three-address instruction can be represented symbolically as

Add A,B,C

Operands A and B are called the source operands, C is called the destination operand, and Add

is the operation to be performed on the operands. A general instruction of this type has the format

Operation Source1,Source2,Destination

If k bits are needed to specify the memory address of each operand, the encoded form of the

above instruction must contain 3k bits for addressing purposes in addition to the bits needed to

denote the Add operation.

For a modern processor with a 32-bit address space, a 3-address instruction is too large to fit in

one word for a reasonable word length. Thus, a format that allows multiple words to be used for

a single instruction would be needed to represent an instruction of this type.

An alternative approach is to use a sequence of simpler instructions to perform the same task,

with each instruction having only one or two operands. Suppose that two-address instructions of

the form are available.

Operation Source,Destination

An Add instruction of this type is

Add A,B

which performs the operation B ← [A] + [B].

When the sum is calculated, the result is sent to the memory and stored in location B, replacing

the original contents of this location. This means that operand B is both a source and a

destination.

A single two-address instruction cannot be used to solve our original problem, which is to add

the contents of locations A and B, without destroying either of them, and to place the sum in

location C.

The problem can be solved by using another two address instruction that copies the contents of

one memory location into another. Such an instruction is

Move B,C

which performs the operation C ← [B], leaving the contents of location B unchanged. The word

"Move" is a misnomer here; it should be "Copy." However, this instruction name is deeply

entrenched in computer nomenclature. The operation C ← [A] + [B] can now be performed by the

two-instruction sequence

Move B,C

Add A,C

In all the instructions given above, the source operands are specified first, followed by the

destination. This order is used in the assembly language expressions for machine instructions in

many computers.

But there are also many computers in which the order of the source and destination operands is

reversed. It is unfortunate that no single convention has been adopted by all manufacturers.

In fact, even for a particular computer, its assembly language may use a different order for

different instructions. We have defined three- and two-address instructions. But, even two-

address instructions will not normally fit into one word for usual word lengths and address sizes.

Another possibility is to have machine instructions that specify only one memory operand.

When a second operand is needed, as in the case of an Add instruction, it is understood

implicitly to be in a unique location. A processor register, usually called the accumulator, may

be used for this purpose. Thus, the one-address instruction

Add A

means the following: Add the contents of memory location A to the contents of the accumulator

register and place the sum back into the accumulator. Let us also introduce the one-address

instructions

Load A

and

Store A

The Load instruction copies the contents of memory location A into the accumulator, and the

Store instruction copies the contents of the accumulator into memory location A. Using only

one-address instructions, the operation C ← [A] + [B] can be performed by executing the

sequence of instructions

Load A

Add B

Store C

Note that the operand specified in the instruction may be a source or a destination, depending on

the instruction.

In the Load instruction, address A specifies the source operand, and the destination location, the

accumulator, is implied.

On the other hand, C denotes the destination location in the Store instruction, whereas the

source, the accumulator, is implied.
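The one-address sequence Load A, Add B, Store C can be traced with an ordinary variable standing in for the accumulator. The memory contents here are made-up values for illustration:

```python
mem = {"A": 3, "B": 4, "C": 0}
acc = 0  # the accumulator: implied operand of every one-address instruction

acc = mem["A"]        # Load A  - A is the source, accumulator is implied destination
acc = acc + mem["B"]  # Add B   - sum of accumulator and [B] goes back to accumulator
mem["C"] = acc        # Store C - C is the destination, accumulator is implied source

print(mem["C"])  # 7
```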

Some early computers were designed around a single accumulator structure. Most modern

computers have a number of general-purpose processor registers - typically 8 to 32, and even

considerably more in some cases.

Access to data in these registers is much faster than to data stored in memory locations because

the registers are inside the processor. Because the number of registers is relatively small, only a

few bits are needed to specify which register takes part in an operation. For example, for 32

registers, only 5 bits are needed.

This is much less than the number of bits needed to give the address of a location in the

memory. Because the use of registers allows faster processing and results in shorter instructions,

registers are used to store data temporarily in the processor during processing.
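The claim that 5 bits suffice to select one of 32 registers is just the ceiling of log2 of the register count. A quick sketch:

```python
import math

# Number of bits needed to name one of n_registers registers.
def bits_to_select(n_registers):
    return math.ceil(math.log2(n_registers))

print(bits_to_select(32))  # 5: far fewer bits than a full memory address
print(bits_to_select(8))   # 3
```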

Let Ri represent a general-purpose register. The instructions

Load A,Ri

Store Ri,A

and

Add A,Ri

are generalizations of the Load, Store, and Add instructions for the single-accumulator case, in which

register Ri performs the function of the accumulator.

Even in these cases, when only one memory address is directly specified in an instruction, the

instruction may not fit into one word.

When a processor has several general-purpose registers, many instructions involve only

operands that are in the registers. In fact, in many modern processors, computations can be

performed directly only on data held in processor registers. Instructions such as

Add Ri,Rj

or

Add Ri,Rj,Rk

are of this type.

In both of these instructions, the source operands are the contents of registers Ri and Rj. In the

first instruction, Rj also serves as the destination register, whereas in the second instruction, a

third register, Rk, is used as the destination. Such instructions, where only register names are

contained in the instruction, will normally fit into one word.

It is often necessary to transfer data between different locations. This is achieved with the

instruction

Move Source,Destination

which places a copy of the contents of Source into Destination.

When data are moved to or from a processor register, the Move instruction can be used rather

than the Load or Store instructions because the order of the source and destination operands

determines which operation is intended. Thus,

Move A,Ri

is the same as

Load A,Ri

and

Move Ri,A

is the same as

Store Ri,A

In processors where arithmetic operations are allowed only on operands that are in processor

registers, the C = A + B task can be performed by the instruction sequence

Move A,Ri

Move B,Rj

Add Ri,Rj

Move Rj,C

In processors where one operand may be in the memory but the other must be in a register, an

instruction sequence for the required task would be

Move A,Ri

Add B,Ri

Move Ri,C
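The register-based sequence for C = A + B can be traced in Python as below. This is a toy model: the initial memory contents are arbitrary, and dictionary lookups stand in for memory and register accesses:

```python
mem = {"A": 5, "B": 6, "C": 0}
R = {"Ri": 0, "Rj": 0}

# Load/store style: operands must be brought into registers before the Add.
R["Ri"] = mem["A"]           # Move A,Ri
R["Rj"] = mem["B"]           # Move B,Rj
R["Rj"] = R["Ri"] + R["Rj"]  # Add Ri,Rj  (Rj is both source and destination)
mem["C"] = R["Rj"]           # Move Rj,C

print(mem["C"])  # 11
```

Only the first, second, and fourth steps touch memory; the Add itself runs entirely on registers, which is exactly why this style is fast.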

The speed with which a given task is carried out depends on the time it takes to transfer

instructions from memory into the processor and to access the operands referenced by these

instructions.

Transfers that involve the memory are much slower than transfers within the processor. Hence, a

substantial increase in speed is achieved when several operations are performed in succession on

data in processor registers without the need to copy data to or from the memory.

When machine language programs are generated by compilers from high-level languages, it is

important to minimize the frequency with which data is moved back and forth between the

memory and processor registers.

We used the task C ← [A] + [B] as an example above. The diagram shows a possible

program segment for this task as it appears in the memory of a computer. We have assumed that the

computer allows one memory operand per instruction and has a number of processor registers. We

assume that the word length is 32 bits and the memory is byte addressable. The three instructions of the

program are in successive word locations, starting at location i. Since each instruction is 4 bytes long,

the second and third instructions start at addresses i + 4 and i + 8.

For simplicity, we also assume that a full memory address can be directly specified in

a single-word instruction, although this is not usually possible for address space sizes

and word lengths of current processors.

Fig: A program for C ← [A] + [B]

Execution steps of the above program:

The processor contains a register called the program counter (PC), which holds the address of

the instruction to be executed next.

To begin executing a program, the address of its first instruction (i in our example) must be

placed into the PC.

Then, the processor control circuits use the information in the PC to fetch and execute

instructions, one at a time, in the order of increasing addresses. This is called straight-line

sequencing.

During the execution of each instruction, the PC is incremented by 4 to point to the next

instruction.

Thus, after the Move instruction at location i + 8 is executed, the PC contains the value i + 12,

which is the address of the first instruction of the next program segment.

Executing a given instruction is a two-phase procedure.

In the first phase, called instruction fetch, the instruction is fetched from the memory location

whose address is in the PC. This instruction is placed in the instruction register (IR) in the

processor.

At the start of the second phase, called instruction execute, the instruction in IR is examined to

determine which operation is to be performed.

The specified operation is then performed by the processor. This often involves fetching

operands from the memory or from processor registers, performing an arithmetic or logic

operation, and storing the result in the destination location.

At some point during this two-phase procedure, the contents of the PC are advanced to point to

the next instruction. When the execute phase of an instruction is completed, the PC contains the

address of the next instruction, and a new instruction fetch phase can begin.

In most processors, the execute phase itself is divided into a small number of distinct phases

corresponding to fetching operands, performing the operation, and storing the result.
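The fetch/execute procedure can be sketched as a toy interpreter. Everything here (the starting address 100, the three-instruction program, and the decode logic) is a simplified assumption for illustration, not a real machine:

```python
# A toy straight-line machine: each instruction is 4 bytes, so the PC
# advances by 4 after every fetch.
mem_data = {"A": 2, "B": 3, "C": 0}
program = {  # instruction memory, keyed by byte address i, i+4, i+8
    100: ("Move", "A", "R0"),
    104: ("Add", "B", "R0"),
    108: ("Move", "R0", "C"),
}
regs = {"R0": 0}

pc = 100
while pc in program:
    ir = program[pc]   # instruction fetch: instruction placed in IR
    pc += 4            # PC advanced to point at the next instruction
    op, src, dst = ir  # instruction execute: examine IR, then perform the operation
    if op == "Move":
        val = regs[src] if src in regs else mem_data[src]
    else:  # Add: second operand comes from register or memory
        val = regs[dst] + (regs[src] if src in regs else mem_data[src])
    if dst in regs:
        regs[dst] = val
    else:
        mem_data[dst] = val

print(mem_data["C"], pc)  # 5 112
```

After the last instruction the PC holds 112, i.e. i + 12, matching the description above.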

BRANCHING

Consider the task of adding a list of n numbers. The addresses of the memory locations

containing the n numbers are symbolically given as NUM1, NUM2, . . . , NUMn, and a separate Add

instruction is used to add each number to the contents of register R0. After all the numbers have been

added, the result is placed in memory location SUM. Instead of using a long list of Add instructions, it

is possible to place a single Add instruction in a program loop. The loop is a straight-line sequence of

instructions executed as many times as needed. It starts at location LOOP and ends at the instruction

Branch>0. During each pass through this loop, the address of the next list entry is determined, and that

entry is fetched and added to R0. Now, we concentrate on how to create and control a program loop.

Assume that the number of entries in the list, n, is stored in memory location N. Register R1 is used as a

counter to determine the number of times the loop is executed. Hence, the contents of location N are

loaded into register R1 at the beginning of the program. Then, within the body of the loop, the

instruction

Decrement R1

reduces the contents of R1 by 1 each time through the loop. (A similar type of operation is performed

by an Increment instruction, which adds 1 to its operand.) Execution of the loop is repeated as long as

the result of the decrement operation is greater than zero.

Fig: A straight-line program for adding n numbers.

Fig: Using a loop to add n numbers

We now introduce branch instructions. This type of instruction loads a new value into the program

counter. As a result, the processor fetches and executes the instruction at this new address, called the

branch target, instead of the instruction at the location that follows the branch instruction in sequential

address order. A conditional branch instruction causes a branch only if a specified condition is

satisfied. If the condition is not satisfied, the PC is incremented in the normal way, and the next

instruction in sequential address order is fetched and executed.

In the above program , the instruction,

Branch>0 LOOP

(branch if greater than 0) is a conditional branch instruction that causes a branch to

location LOOP if the result of the immediately preceding instruction, which is the

decremented value in register R1, is greater than zero. This means that the loop is

repeated as long as there are entries in the list that are yet to be added to R0. At the

end of the nth pass through the loop, the Decrement instruction produces a value of

zero, and, hence, branching does not occur. Instead, the Move instruction is fetched

and executed. It moves the final result from R0 into memory location SUM.

The capability to test conditions and subsequently choose one of a set of alternative

ways to continue computation has many more applications than just loop control. Such

a capability is found in the instruction sets of all computers and is fundamental to the

programming of most nontrivial tasks.
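The counter-controlled loop described above behaves like the following sketch, where R0, R1, and the Branch>0 test are modeled directly. The list contents are made-up values:

```python
NUM = [10, 20, 30, 40]  # the list NUM1..NUMn in memory
n = len(NUM)

R0 = 0  # running sum
R1 = n  # loop counter, loaded from memory location N
i = 0   # stands in for the address of the next list entry
while True:
    R0 += NUM[i]    # fetch the next entry and add it to R0
    i += 1
    R1 -= 1         # Decrement R1
    if not R1 > 0:  # Branch>0 LOOP: repeat while the decremented value is positive
        break
SUM = R0            # Move R0,SUM

print(SUM)  # 100
```

On the n-th pass the Decrement produces zero, the branch is not taken, and the Move stores the final result, exactly as described above.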

CONDITION CODES

The processor keeps track of information about the results of various operations for use by

subsequent conditional branch instructions. This is accomplished by recording the required information

in individual bits, often called condition code flags. These flags

are usually grouped together in a special processor register called the condition code

register or status register. Individual condition code flags are set to 1 or cleared to 0,

depending on the outcome of the operation performed.

Four commonly used flags are

N (negative) Set to 1 if the result is negative; otherwise, cleared to 0

Z (zero) Set to 1 if the result is 0; otherwise, cleared to 0

V (overflow) Set to 1 if arithmetic overflow occurs; otherwise, cleared to 0

C (carry) Set to 1 if a carry-out results from the operation; otherwise, cleared to 0

• The N and Z flags indicate whether the result of an arithmetic or logic operation is negative or

zero.

• The N and Z flags may also be affected by instructions that transfer data, such as Move, Load,

or Store.

• This makes it possible for a later conditional branch instruction to cause a branch based on the

sign and value of the operand that was moved.

• Some computers also provide a special Test instruction that examines a value in a register or in

the memory and sets or clears the N and Z flags accordingly.

• The V flag indicates whether overflow has taken place. Overflow occurs when the result of an

arithmetic operation is outside the range of values that can be represented by the number of bits

available for the operands.

• The processor sets the V flag to allow the programmer to test whether overflow has occurred

and branch to an appropriate routine that corrects the problem.

• Instructions such as BranchIfOverflow are provided for this purpose. A program interrupt may

occur automatically as a result of the V bit being set, and the operating system will resolve what

to do.

• The C flag is set to 1 if a carry occurs from the most significant bit position during an

arithmetic operation. This flag makes it possible to perform arithmetic operations on operands

that are longer than the word length of the processor. Such operations are used in multiple-

precision arithmetic.

• The instruction Branch>0 is an example of a branch instruction that tests one or more of the

condition flags.

• It causes a branch if the value tested is neither negative nor equal to zero. That is, the branch is

taken if neither N nor Z is 1.

• Many other conditional branch instructions are provided to enable a variety of conditions to be

tested. The conditions are given as logic expressions involving the condition code flags.

• In some computers, the condition code flags are affected automatically by instructions that

perform arithmetic or logic operations. However, this is not always the case.

• A number of computers have two versions of an Add instruction, for example. One version,

Add, does not affect the flags, but a second version, AddSetCC, does.

• This provides the programmer—and the compiler—with more flexibility when preparing

programs for pipelined execution.
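The four flags can be illustrated for an 8-bit addition. This is a sketch of one common convention (overflow detected as "like-signed operands, differently-signed result"); real processors differ in details:

```python
# Set N, Z, C, V for an 8-bit two's-complement addition.
def add_with_flags(a, b, bits=8):
    mask = (1 << bits) - 1
    raw = (a & mask) + (b & mask)
    result = raw & mask
    N = (result >> (bits - 1)) & 1       # sign bit of the result
    Z = 1 if result == 0 else 0
    C = 1 if raw > mask else 0           # carry-out from the most significant bit
    sa = (a >> (bits - 1)) & 1
    sb = (b >> (bits - 1)) & 1
    V = 1 if (sa == sb and N != sa) else 0  # overflow: like signs, unlike result
    return result, {"N": N, "Z": Z, "C": C, "V": V}

print(add_with_flags(0x7F, 0x01))  # adding 1 to +127 sets N and V: overflow
```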

GENERATING MEMORY ADDRESSES

The purpose of the instruction block at LOOP is to add a different number from the list during

each pass through the loop. Hence, the Add instruction in that block must refer to a different address

during each pass. How are the addresses to be specified? The memory operand address cannot be given

directly in a single Add instruction in the loop. Otherwise, it would need to be modified on each pass

through the loop. As one possibility, suppose that a processor register, Ri, is used to hold the memory

address of an operand. If it is initially loaded with the address NUM1 before the loop is entered and is

then incremented by 4 on each pass through the loop, it can provide the needed capability.

This situation, and many others like it, gives rise to the need for flexible ways to

specify the address of an operand. The instruction set of a computer typically provides

a number of such methods, called addressing modes. While the details differ from one

computer to another, the underlying concepts are the same.
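Using a register to hold the operand address and stepping it by 4 on each pass, as just described, looks like this in a toy model. The base address 1000 and the list values are assumptions for illustration:

```python
# Four 32-bit words NUM1.. at successive byte addresses 1000, 1004, 1008, 1012.
memory = {1000 + 4 * k: v for k, v in enumerate([5, 7, 11, 13])}

Ri = 1000  # initially loaded with the address NUM1
R0 = 0
for _ in range(4):
    R0 += memory[Ri]  # the Add in the loop uses the address held in Ri
    Ri += 4           # increment by 4: point at the next word

print(R0)  # 36
```

The same single Add instruction reaches a different operand on every pass, which is the essence of this addressing mode.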

VI. HARDWARE

The traffic-light controller is a very simple special-purpose computer system requiring only a

few of the physical hardware components that constitute a general-purpose computer system. The four

major hardware blocks of a general purpose computer system are its memory unit (MU), arithmetic and

logic unit (ALU), input/output unit (IOU), and control unit (CU). Input/output (I/O) devices input

and output data into and out of the memory unit. In some systems, I/O devices send and receive data

into and from the ALU rather than the MU. Programs reside in the memory unit. The ALU processes

the data taken from the memory unit (or the IOU) and stores the processed data back in the memory

unit (or the IOU). The control unit coordinates the activities of the other three units. It retrieves

instructions from programs resident in the MU, decodes these instructions, and directs the ALU to

perform corresponding processing steps. It also oversees I/O operations. A keyboard and a mouse are

the most common input devices nowadays. A video display and a printer are the most common output

devices (see Figure: Typical computer system). Scanners are used to input data from hardcopy sources. Magnetic

tapes and disks are used as I/O devices. These devices are also used as memory devices to increase the

capacity of the MU. The console is a special-purpose I/O device that permits the system operator to

interact with the computer system. In modern-day computer systems, the console is typically a

dedicated terminal.

VII. SOFTWARE

The hardware components of a computer system are electronic devices in which the basic unit of

information is either a 0 or a 1, corresponding to two states of an electronic signal. For instance, in one

of the popular hardware technologies a 0 is represented by 0V while a 1 is represented by 5 V.

Programs and data must therefore be expressed using this binary alphabet consisting of 0 and 1.

Programs written using only these binary digits are machine language programs. At this level of

programming, operations such as ADD and SUBTRACT are each represented by a unique pattern of 0s

and 1s, and the computer hardware is designed to interpret these sequences. Programming at this level

is tedious since the programmer has to work with sequences of 0s and 1s and needs to have very

detailed knowledge of the computer structure. The tedium of machine language programming is

partially alleviated by using symbols such as ADD and SUB rather than patterns of 0s and 1s for these

operations. Programming at the symbolic level is called assembly language programming. An assembly

language programmer also is required to have a detailed knowledge of the machine structure, because

the operations permitted in the assembly language are primitive and the instruction format and

capabilities depend on the hardware organization of the machine. An assembler program is used to

translate assembly language programs into machine language. Use of high-level programming

languages such as FORTRAN, COBOL, C, and JAVA further reduces the requirement of an intimate

knowledge of the machine organization. A compiler program is needed to translate a high-level

language program into the machine language. A separate compiler is needed for each high-level

language used in programming the computer system. Note that the assembler and the compiler are also

programs written in one of those languages and can translate an assembly or high-level language

program, respectively, into the machine language.

The below figure shows the sequence of operations that occurs once a program is developed. A

program written in either the assembly language or a high-level language is called a source program. An

assembly language source program is translated by the assembler into the machine language program.

This machine language program is the object code. A compiler converts a high-level language source

into object code. The object code ordinarily resides on an intermediate device such as a magnetic disk

or tape. A loader program loads the object code from the intermediate device into the memory unit. The

data required by the program will be either available in the memory or supplied by an input device

during the execution of the program. The effect of program execution is the production of processed

data or results.

Figure: Program translation and execution.

System Operation

Operations such as selecting the appropriate compiler for translating the source into object code;

loading the object code into the memory unit; and starting, stopping, and accounting for the computer

system usage are automatically done by the system. A set of supervisory programs that permit such

automatic operation is usually provided by the computer system manufacturer. This set, called the

operating system, receives the information it needs through a set of command language statements from

the user and manages the overall operation of the computer system. Operating system and other utility

programs used in the system may reside in a memory block that is typically read-only. Special devices

are needed to write these programs into read-only memory. Such programs and commonly used data are

termed firmware. The below Figure is a simple rendering of the complete hardware– software

environment of a general-purpose computer system.

Figure: Hardware and software components.

Definition:

Software is a collection of programs written to solve problems using a computer. Software is of

two types.

• System software

• Applications software

The differences between System software and Applications software

System software:

• A collection of programs responsible for the coordination of all activities in a computing system.

• Purely machine dependent.

• Examples: compiler, assembler, linker, debugger, text editor, loader, OS, and so on.

Application software:

• A collection of programs that focus on the particular application (problem) to be solved.

• Machine independent.

• Examples: MS Office, accounting systems, ticket reservation systems, etc.

System software performs the following functions:

Receiving and interpreting user commands.

Entering and editing application programs and storing them as files in secondary storage

devices, e.g., text editors.

Managing the storage and retrieval of files in secondary storage devices.

Running standard application program such as word processor or spreadsheet, with data

supplied by the user.

Controlling I/O units to receive input and produce output.

Translating source programs into object programs, e.g., a compiler.

Linking and running user written programs.

Compiler

A compiler is system software that translates a high-level language program (source program),

such as one written in C or C++, into a machine language program (object program).

Text editor

It is used for entering and editing application programs. The user can use its commands to enter

and correct the statements of a source program and save them as a file in secondary storage. A file

can be referred to by a name chosen by the user.

Operating System

Operating system is a large program with a collection of routines. It is used to control the

sharing of and interaction among various computer units as they execute application programs.

Other tasks of OS are,

To assign memory and magnetic disk space to program and data files.

To move data between memory and disk units

To handle I/O operations

Steps involved in running an application program

1) Transfer the program to be executed from secondary storage into main memory.

2) Start executing the program.

3) Read the required data for program from memory and perform the specified computation on

the data.

4) Print the result.

Role of operating system in running the program

1) When the executing program requires some data from memory, it sends a request to the

operating system. The operating system fetches the requested data and passes control back to the

program, which then proceeds to perform the required computation.

2) When the computation is completed and the results are ready to be printed, the program again

sends a request to the operating system. An OS routine then causes the printer to print the result.

The time line diagram below illustrates the sharing of processor execution time. During the

period t0 to t1, the OS initiates loading of the application program from disk into main memory,

waits until loading is complete, and then passes execution control to the application program.

Similar OS activity occurs during the periods t2 to t3 and t4 to t5. During t1 to t2 and t3 to t4 the

processor performs the actual execution of the program.

Figure: User program and OS routine sharing of the processor (time line from t0 to t5).

From t4 to t5 the OS transfers the output from main memory to the printer. During this

period the processor is free and can execute the next program until printing is completed. In this way

the operating system manages the concurrent execution of several programs to make the best possible

use of computer resources; this is called multiprogramming or multitasking.

MEMORY LOCATIONS AND ADDRESS

Computer memory consists of millions of storage cells. Each cell can store a bit (0 or 1) of

information. Usually n bits are grouped, so that such group of bits can be stored and retrieved in a single

basic operation. Each group of n bits is called a word of information, and n is called word length. Thus

memory of a computer can be schematically represented as a collection of words.

Figure : Memory words

Characteristics of word length:

• Word length of modern computers ranges from 16 to 64 bits.

• If the word length of a computer is 32 bits, then a single word can store a 32-bit 2’s-complement

number or four ASCII characters, each occupying 8 bits (a unit of 8 bits is called a byte).

Machine instructions may require one or more words for their representation.

The format for encoding the machine instructions into memory word

Address and name representations to store an information:

Accessing the memory to store or retrieve a single item of information, either a word or a byte,

requires distinct names or addresses for each item location.

Normally, numbers from 0 through 2^k - 1, for some suitable value of k, are used as the

addresses of successive locations in the memory. The 2^k addresses constitute the address space of

the computer, and the memory can have up to 2^k addressable locations.

For example:

A 24-bit address generates an address space of 2^24 (16,777,216) locations. This number is

usually written as 16M (16 mega), where 1M is the number 2^20 (1,048,576).

A 32-bit address creates an address space of 2^32 or 4G (4 giga) locations, where 1G is 2^30.

Other notational conventions that are commonly used are K (kilo) for the number 2^10 (1,024),

and T (tera) for the number 2^40.
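These address-space sizes can be checked directly:

```python
# Address-space size for a k-bit address, with the K/M/G/T conventions
# used in the text.
def address_space(k):
    return 2 ** k

K, M, G, T = 2 ** 10, 2 ** 20, 2 ** 30, 2 ** 40
print(address_space(24), 16 * M)  # both 16777216: a 24-bit address gives 16M locations
print(address_space(32), 4 * G)   # both 4294967296: a 32-bit address gives 4G locations
```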

Byte Addressability

A byte is always 8 bits, but the word length typically ranges from 16 to 64 bits. It is impractical

to assign distinct addresses to individual bit locations in the memory. The practical assignment is to

have successive addresses refer to successive byte locations in the memory.

Byte-addressable memory is one in which successive addresses refer to successive byte locations in

the memory. Thus each byte in the memory is addressed as 0, 1, 2, ..., and if the word length of the machine

is 32 bits, successive words are located at addresses 0, 4, 8, ..., with each word consisting of four bytes.

Big-Endian And Little Endian Assignments

There are two ways of assigning byte addresses. They are

Big-endian assignment

Little-endian assignment

Big-endian

Big-endian is used when lower byte addresses are used for the more significant bytes (the left

most bytes) of the word.

Little-endian

Little-endian is used when lower byte addresses are used for the less significant bytes (the

rightmost bytes) of the word.

In both big-endian and little-endian assignments, byte addresses 0, 4, 8, ... are taken as the addresses of

successive words in the memory (i.e., when the word length is 4 bytes), and these are the addresses used when

specifying memory read and write operations for words.
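The two assignments are easy to see with Python's int.to_bytes, which lays a word out at successive byte positions; the word value 0x12345678 is an arbitrary example:

```python
word = 0x12345678

# Big-endian: most significant byte at the lowest byte address.
big = word.to_bytes(4, "big")
# Little-endian: least significant byte at the lowest byte address.
little = word.to_bytes(4, "little")

print(big.hex())     # 12345678
print(little.hex())  # 78563412
```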

Word alignment

There are two kinds of addresses:

Aligned address

Unaligned address

Aligned address

Words are said to be aligned in memory if they begin at a byte address that is a multiple of the number of bytes in a word.

For example,

• In a 32-bit (4-byte) word length machine, the number of bytes in a word is 4. In this case words have aligned addresses if they begin at addresses 0, 4, 8, 12, ..., i.e., multiples of the number of bytes in a word.

• Similarly, if the word length is 16 bits (2 bytes), aligned words begin at byte addresses 0, 2, 4, ...

Unaligned address

Words are said to be unaligned in memory if they do not begin at a byte address that is a multiple of the number of bytes in a word.

Accessing numbers, characters, and character strings

A number occupies one word. It can be accessed in the memory by specifying its word address.

A Character occupies one Byte. It can be accessed in the memory by specifying its Byte address.

Accessing Strings

The beginning of the string is indicated by giving the byte address of its first character.

Successive byte locations contain successive characters of a string.

There are two ways to indicate the length of the string


• A special control character with the meaning “end of string” can be used as the

last character in the string.

• A separate memory word location or processor register can contain a number

indicating the length of the string in bytes.
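Both length conventions can be sketched with Python byte strings; the data value is ours, chosen only for illustration:

```python
# Two conventions for recording a string's length in byte-addressable memory.
text = b"HELLO"

# 1) Terminator convention: a special "end of string" byte (NUL here)
#    is stored as the last character.
terminated = text + b"\x00"
length1 = terminated.index(0)   # scan until the terminator is found

# 2) Length-prefix convention: a separate location holds the byte count.
prefixed = bytes([len(text)]) + text
length2 = prefixed[0]           # read the stored count directly

print(length1, length2)  # 5 5
```

The terminator scheme needs no extra storage for the count but requires scanning to find the length; the prefix scheme gives the length in one access.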

MEMORY OPERATIONS

To execute an instruction, the processor control circuits must cause the word containing the

instruction to be transferred from the memory to the processor. Operands and results must also be

moved between the memory and the processor.

Thus, two basic memory operations are needed, they are

Load

Store

Load

o The load operation transfers a copy of the content of a specific memory location to the

processor.

o To start a load operation, the processor sends the address of the desired location to the

memory.

o The memory reads the data stored at that address and sends them to the processor.

Fig: The Load (Read) operation. The processor sends the Address and a Read request to the Memory; the Memory returns the Data.

Store

• The store operation transfers an item of information from the processor register to a specific

memory location.

• The processor sends the address of the desired memory location to the memory, together with

the data to be written into that location.

Fig: The Store (Write) operation. The processor sends the Address, a Write request, and the Data to the Memory.

An information item of either one word or one byte can be transferred between the processor and the

memory in a single operation. A processor register can hold one word of information at a time.
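The two basic memory operations can be mimicked with a toy byte-addressable memory in Python. The names `load` and `store`, and the little-endian byte order, are our choices for the sketch:

```python
# Toy byte-addressable memory with word-sized (4-byte) Load and Store,
# mimicking the processor sending an address plus a Read or Write request.
memory = bytearray(64)

def store(addr, value):
    """Store: write a 32-bit value into memory at the given byte address."""
    memory[addr:addr + 4] = value.to_bytes(4, "little")

def load(addr):
    """Load: read back the 32-bit value at the given byte address."""
    return int.from_bytes(memory[addr:addr + 4], "little")

store(8, 1000)
print(load(8))  # 1000
```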

VIII. INSTRUCTION SET ARCHITECTURE

The instruction set architecture is the interface between the high-level language and the machine language.

It has the following parts:

• Instruction set

• Addressing modes

• Instruction formats

• Instruction representation

Instructions

Logical instructions

AND, OR, XOR, Shift

Arithmetic instructions

Data types

Integers: Unsigned, Signed, Byte, Short, Long

Real numbers: Single precision (float), Double precision (double)

Operations

Addition, Subtraction, Multiplication, Division

Data transfer instructions

Register transfer: Move

Memory transfer: Load, Store

I/O transfer: In, Out

Page 61: COA UNIT 1 NOTES

61

Control transfer instructions

• Unconditional branch

• Conditional branch

• Procedure call

• Return

Addressing modes

Specification of operands in instructions

Different addressing modes:

• Register direct: Value of operand in a register

• Register indirect: Address of operand in a register

• Immediate: Value of operand

• Memory direct: Address of operand

• Indexed: Base register, Index register

• Relative: Base register, Displacement

• Indexed relative: Base register, Index register,

Instruction formats

3-operand instructions:

ADD op1, op2, op3 ; op1 ← op2 + op3

2-operand instructions:

ADD op1, op2 ; op1 ← op1 + op2

1-operand instructions:

INC op1 ; op1 ← op1 + 1

Types of operands:

Register operands

Memory operands specified using addressing modes

Effect of instruction format:

Page 62: COA UNIT 1 NOTES

62

– Instruction length

– Number of instructions for a program

– Complexity of instruction decoding (control unit)

Complex Instruction Set Computer (CISC) processors: 2-operand and 1-operand instructions; any instruction can use memory operands; many addressing modes; complex instruction formats with varying-length instructions; microprogrammed control unit.

Reduced Instruction Set Computer (RISC) processors: 3-operand, 2-operand, and 1-operand instructions. Load/Store Architecture (LSA) processors:

• Only memory-transfer instructions (Load and Store) can use memory operands.

• All other instructions can use register operands only.

– A few addressing modes

– Simple instruction formats: fixed-length instructions

– Hardwired control unit

IX. ADDRESSING MODES

In general, a program operates on data that reside in the computer’s memory. These data can be

organized in a variety of ways. If we want to keep track of students’ names, we can write them in a list.

If we want to associate information with each name, for example to record telephone numbers or marks

in various courses, we may organize this information in the form of a table. Programmers use

organizations called data structures to represent the data used in computations. These include lists,

linked lists, arrays, queues, and so on.


Programs are normally written in a high-level language, which enables the programmer to use

constants, local and global variables, pointers, and arrays. When translating a high-level language

program into assembly language, the compiler must be able to implement these constructs using the

facilities provided in the instruction set of the computer in which the program will be run. The different

ways in which the location of an operand is specified in an instruction are referred to as addressing

modes.


1. IMPLEMENTATION OF VARIABLES AND CONSTANTS

Variables and constants are the simplest data types and are found in almost every Computer

program. In assembly language, a variable is represented by allocating a register or a memory location

to hold its value. Thus, the value can be changed as needed using appropriate instructions.

We accessed an operand by specifying the name of the register or the address of the memory

location where the operand is located.

Register mode

The operand is the contents of a processor register; the name (address) of the register is given in the instruction. It is used to access the variables in the program.

Absolute mode

The operand is in a memory location; the address of this location is given explicitly in the instruction. It is also called Direct mode. It is also used to access the variables in the program.

Example instruction for register and absolute mode:

Move LOC, R2

uses the register and absolute modes. The processor registers are used as temporary storage locations

where the data in a register are accessed using the Register mode. The Absolute mode can represent

global variables in a program. A declaration such as

Integer A, B;

in a high-level language program will cause the compiler to allocate a memory location to each of the

variables A and B. Absolute mode can be used to access the variables in the program.

Immediate mode


Address and data constants can be represented in assembly language using the Immediate

mode. The operand is given explicitly in the instruction.

For example, the instruction

Move 200_immediate, R0

places the value 200 in register R0. Clearly, the Immediate mode is only used to specify the value of a

source operand. Using a subscript to denote the Immediate mode is not appropriate in assembly

languages. A common convention is to use the sharp sign (#) in front of the value to indicate that this

value is to be used as an immediate operand.

Hence, we write the instruction above in the form

Move #200, R0

Constant values are used frequently in high-level language programs. For example,

the statement

A = B + 6

contains the constant 6. Assuming that A and B have been declared earlier as variables

and may be accessed using the Absolute mode, this statement may be compiled as

follows:

Move B, R1

Add #6, R1

Move R1, A

Constants are also used in assembly language to increment a counter, test for some bit pattern, and so

on.


2. INDIRECTION AND POINTERS

In the addressing modes that follow, the instruction does not give the operand or its address

explicitly. Instead, it provides information from which the memory address of the operand can be

determined. We refer to this address as the effective address (EA) of the operand.

Indirect mode

The effective address of the operand is the contents of a register or memory location whose

address appears in the instruction. We denote indirection by placing the name of the register or the

memory address given in the instruction in parentheses.

To execute the Add instruction the processor uses the value B, which is in register R1, as the effective

address of the operand. It requests a read operation from the memory to read the contents of location B.

The value read is the desired operand, which the processor adds to the contents of register R0. Indirect

addressing through a memory location is also possible. In this case, the processor first reads the contents of memory location A, then requests a second read operation using the value B as an address to obtain the operand.

Fig: Indirect addressing.

Fig: Use of indirect addressing in the program.

The register or memory

location that contains the address of an operand is called a pointer. Consider the analogy of a treasure

hunt: In the instructions for the hunt you may be told to go to a house at a given address. Instead of

finding the treasure there, you find a note that gives you another address where you will find the

treasure. By changing the note, the location of the treasure can be changed, but the instructions for the

hunt remain the same. Changing the note is equivalent to changing the contents of a pointer in a

computer program. For example, by changing the contents of register R1 or location A, the same Add

instruction fetches different operands to add to register R0. Let us now return to the program for adding

a list of numbers. Indirect addressing can be used to access successive numbers in the list, resulting in

the program. Register R2 is used as a pointer to the numbers in the list, and the operands are accessed

indirectly through R2. The initialization section of the program loads the counter value n from memory

location N into R1 and uses the immediate addressing mode to place the address value NUM1, which is

the address of the first number in the list, into R2. Then it clears R0 to 0. The first two instructions in

the loop implement the unspecified instruction block starting at LOOP. The first time through the loop,

the instruction

Add (R2), R0

fetches the operand at location NUM1 and adds it to R0. The second Add instruction adds 4 to the

contents of the pointer R2, so that it will contain the address value NUM2 when the above instruction is

executed in the second pass through the loop.

Consider the C-language statement

A = *B;

where B is a pointer variable. This statement may be compiled into

Move B, R1


Move (R1), A

Using indirect addressing through memory, the same action can be achieved with

Move (B), A

Despite its apparent simplicity, indirect addressing through memory has proven to be of limited

usefulness as an addressing mode, and it is seldom found in modern computers. An instruction that

involves accessing the memory twice to get an operand is not well suited to pipelined execution.

Indirect addressing through registers is used extensively. The program shows the flexibility it provides.

Also, when absolute addressing is not available, indirect addressing through registers makes it possible

to access global variables by first loading the operand’s address in a register.
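Indirect addressing through a register can be sketched on a hypothetical toy machine in Python; the memory contents, the location numbered 100 standing in for B, and the register names are all assumptions for illustration:

```python
# Indirect addressing sketch: register R1 holds the *address* of the
# operand (a pointer), not the operand itself.
memory = {100: 42}   # location 100 (standing in for B) holds operand 42
R1 = 100             # Move B, R1  -- load the pointer into R1
R0 = 5

# Add (R1), R0 -- the effective address is the contents of R1
R0 = R0 + memory[R1]
print(R0)  # 47

# Changing the pointer makes the SAME instruction fetch a different
# operand, like changing the note in the treasure hunt.
memory[200] = 8
R1 = 200
print(memory[R1])  # 8
```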

3. INDEXING AND ARRAYS

It is useful in dealing with lists and arrays.

Index mode

The effective address of the operand is generated by adding a constant value to the contents of a

register. The register used may be either a special register provided for this purpose, or, more

commonly; it may be any one of a set of general-purpose registers in the processor. In either case, it is

referred to as an index register. We indicate the Index mode symbolically as

X(Ri)

where X denotes the constant value contained in the instruction and Ri is the name of the register

involved. The effective address of the operand is given by

EA = X + [Ri]


The contents of the index register are not changed in the process of generating the effective address. In

an assembly language program, the constant X may be given either as an explicit number or as a

symbolic name representing a numerical value. When the instruction is translated into machine code,

the constant X is given as a part of the instruction and is usually represented by fewer bits than the word

length of the computer. Since X is a signed integer, it must be sign-extended to the register length

before being added to the contents of the register. The index register, R1, contains the address of a

memory location, and the value X defines an offset (also called a displacement) from this address to the

location where the operand is found.

An alternative use: Constant X corresponds to a memory address, and the contents of the index

register define the offset to the operand. In either case, the effective address is the sum of two values;

one is given explicitly in the instruction, and the other is stored in a register.

Fig: Indexed addressing.


To see the usefulness of indexed addressing, consider a simple example involving a list of test

scores for students taking a given course. Assume that the list of scores begins at location LIST. A

four-word memory block comprises a record that stores the relevant information for each student. Each

record consists of the student’s identification number (ID), followed by the scores the student earned on

three tests. There are n students in the class, and the value n is stored in location N immediately in front

of the list. The addresses given in the figure for the student IDs and test scores assume that the memory

is byte addressable and that the word length is 32 bits.

We should note that the list represents a two-dimensional array having n rows and four columns.

Each row contains the entries for one student, and the columns give the IDs and test scores.

Fig: A list of students’ marks.


Suppose that we wish to compute the sum of all scores obtained on each of the tests and store

these three sums in memory locations SUM1, SUM2, and SUM3. In the body of the loop, the program

uses the Index addressing mode. To access each of the three scores in a student’s record, Register R0 is

used as the index register. Before the loop is entered, R0 is set to point to the ID location of the first

student record; thus, it contains the address LIST. On the first pass through the loop, test scores of the

first student are added to the running sums held in registers R1, R2, and R3, which are initially cleared

to 0. These scores are accessed using the Index addressing modes 4(R0), 8(R0), and 12(R0). The index

register R0 is then incremented by 16 to point to the ID location of the second student. Register R4,

initialized to contain the value n, is decremented by 1 at the end of each pass through the loop. When

the contents of R4 reach 0, all student records have been accessed, and the loop terminates. Until then,

the conditional branch instruction transfers control back to the start of the loop to process the next

record. The last three instructions transfer the accumulated sums from registers R1, R2, and R3, into

memory locations SUM1, SUM2, and SUM3, respectively. It should be emphasized that the contents of

the index register, R0, are not changed when it is used in the Index addressing mode to access the

scores. The contents of R0 are changed only by the last Add instruction in the loop, to move from one

student record to the next. In general, the Index mode facilitates access to an operand whose location is

defined relative to a reference point within the data structure in which the operand appears. In the

example just given, the ID locations of successive student records are the reference points, and the test

scores are the operands accessed by the Index addressing mode.
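The student-record loop can be sketched in Python over the layout just described: 16-byte records holding ID, Test1, Test2, Test3, with the scores reached via Index modes 4(R0), 8(R0), and 12(R0). The memory contents and the `ea` helper are our illustrative assumptions:

```python
def ea(x, r0):
    """Index mode X(R0): effective address = X + contents of R0."""
    return x + r0

# Toy memory keyed by byte address: two 16-byte student records.
memory = {
    0: 101, 4: 70, 8: 80, 12: 90,     # student 1: ID, three scores
    16: 102, 20: 60, 24: 65, 28: 75,  # student 2: ID, three scores
}

sums = [0, 0, 0]         # running sums, like R1, R2, R3
R0 = 0                   # index register points at the first record (LIST)
for _ in range(2):       # n = 2 students
    for i, offset in enumerate((4, 8, 12)):
        sums[i] += memory[ea(offset, R0)]   # 4(R0), 8(R0), 12(R0)
    R0 += 16             # advance to the next record's ID location

print(sums)  # [130, 145, 165]
```

Note that, as in the text, R0 changes only between records; the three offsets select the scores relative to the record's reference point.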


Fig: Indexed addressing used in accessing test scores in the list

We have introduced the most basic form of indexed addressing. Several variations of this basic form

provide for very efficient access to memory operands in practical programming situations. For example,

a second register may be used to contain the offset X, in which case we can write the Index mode as

(Ri, Rj)

The effective address is the sum of the contents of registers Ri and Rj . The second register is usually

called the base register. This form of indexed addressing provides more flexibility in accessing

operands, because both components of the effective address can be changed. As an example of where

this flexibility may be useful, consider again the student record data structure shown in Figure. In the

above program, we used different index values in the three Add instructions at the beginning of the loop


to access different test scores. Suppose each record contains a large number of items, many more than

the three test scores of that example. In this case, we would need the ability to replace the three Add

instructions with one instruction inside a second (nested) loop. Just as the successive starting locations

of the records (the reference points) are maintained in the pointer register R0, offsets to the individual

items relative to the contents of R0 could be maintained in another register. The contents of that register

would be incremented in successive passes through the inner loop.

Yet another version of the Index mode uses two registers plus a constant, which can be denoted as

X(Ri, Rj)

In this case, the effective address is the sum of the constant X and the contents of registers Ri and Rj.

This added flexibility is useful in accessing multiple components inside each item in a record, where the

beginning of an item is specified by the (Ri, Rj) part of the addressing mode. In other words, this mode

implements a three-dimensional array.

4. RELATIVE ADDRESSING

We have defined the Index mode using general-purpose processor registers. A useful version of

this mode is obtained if the program counter, PC, is used instead of a general purpose register. Then,

X(PC) can be used to address a memory location that is X bytes away from the location presently

pointed to by the program counter. Since the addressed location is identified “relative” to the program

counter, which always identifies the current execution point in a program, the name Relative mode is

associated with this type of addressing.

Relative mode


The effective address is determined by the Index mode using the program counter in place of

the general-purpose register Ri. This mode can be used to access data operands. But, its most common

use is to specify the target address in branch instructions. An instruction such as

Branch>0 LOOP

causes program execution to go to the branch target location identified by the name LOOP if the branch

condition is satisfied. This location can be computed by specifying it as an offset from the current value

of the program counter. Since the branch target may be either before or after the branch instruction, the

offset is given as a signed number.

Recall that during the execution of an instruction, the processor increments the PC to point to the next

instruction. Most computers use this updated value in computing the effective address in the Relative

mode. For example, suppose that the Relative mode

is used to generate the branch target address LOOP in the Branch instruction of the program in Figure

2.12. Assume that the four instructions of the loop body, starting at LOOP, are located at memory

locations 1000, 1004, 1008, and 1012. Hence, the updated contents of the PC at the time the branch

target address is generated will be 1016. To branch to location LOOP (1000), the offset value needed is

X = -16. Assembly languages allow branch instructions to be written using labels to denote the branch

target. When the assembler program processes such an instruction, it computes the required offset

value, -16 in this case, and generates the corresponding machine instruction using the addressing mode -16(PC).

5. ADDITIONAL MODES

We have given a number of common versions of the Index mode, not all of which may be

found in any one computer. Although these modes suffice for general computation, many computers


provide additional modes intended to aid certain programming tasks. The two modes described next are

useful for accessing data items in successive locations in the memory.

Autoincrement mode

The effective address of the operand is the contents of a register specified in the instruction.

After accessing the operand, the contents of this register are automatically incremented to point to the

next item in a list. We denote the Autoincrement mode by putting the specified register in parentheses,

to show that the contents of the register are used as the effective address, followed by a plus sign to

indicate that these contents are to be incremented after the operand is accessed. Thus, the

Autoincrement mode is written as

(Ri )+

Implicitly, the increment amount is 1 when the mode is given in this form. But in a byte addressable

memory, this mode would only be useful in accessing successive bytes of some list. To access

successive words in a byte-addressable memory with a 32-bit word length, the increment must be 4.

Computers that have the Autoincrement mode automatically increment the contents of the register by a

value that corresponds to the size of the accessed operand. Thus, the increment is 1 for byte-sized

operands, 2 for 16-bit operands, and 4 for 32-bit operands. Since the size of the operand is usually

specified as part of the operation code of an instruction, it is sufficient to indicate the

Autoincrement mode as (Ri)+.

If the Autoincrement mode is available, it can be used in the first Add instruction and the second Add

instruction can be eliminated. The modified program is shown in the figure below.

As a companion for the Autoincrement mode, another useful mode accesses the items of a list in the

reverse order:


Autodecrement mode

The contents of a register specified in the instruction are first automatically decremented and

are then used as the effective address of the operand. We denote the Autodecrement mode by putting

the specified register in parentheses, preceded by a minus sign to indicate that the contents of the

register are to be decremented before being used as the effective address. Thus, we write

-(Ri )

Fig: The Autoincrement addressing mode used in the program

In this mode, operands are accessed in descending address order. The reader may

wonder why the address is decremented before it is used in the Autodecrement mode

and incremented after it is used in the Autoincrement mode. The actions performed by the

Autoincrement and Autodecrement addressing modes can obviously be achieved by using two

instructions, one to access the operand and the other to increment or decrement the register that contains

the operand address. Combining the two operations in one instruction reduces the number of

instructions needed to perform the desired task.
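Both modes can be sketched on a toy machine in Python: access then post-increment for (Ri)+, pre-decrement then access for -(Ri), with the increment equal to the operand size (4 for 32-bit words). The memory contents and register names are illustrative assumptions:

```python
memory = {0: 10, 4: 20, 8: 30}  # three successive 32-bit words

def auto_inc(regs, r, size=4):
    """(Ri)+ : use the register contents as the EA, then increment it."""
    value = memory[regs[r]]
    regs[r] += size
    return value

def auto_dec(regs, r, size=4):
    """-(Ri) : decrement the register first, then use it as the EA."""
    regs[r] -= size
    return memory[regs[r]]

regs = {"R2": 0}
forward = [auto_inc(regs, "R2") for _ in range(3)]
print(forward)               # [10, 20, 30] -- ascending address order
print(auto_dec(regs, "R2"))  # 30 -- reverses back over the list
```

The asymmetry (increment after, decrement before) is what lets the two modes traverse the same list in opposite directions from its two ends.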


X. RISC

RISC

RISC stands for “Reduced Instruction Set Computer”.

IBM was the first company to define the RISC architecture, in the 1970s. This research was further developed by the universities of Berkeley and Stanford to give basic architectural models. RISC can be described as a philosophy with three basic levels:

(i) All instructions will be executed in a single cycle.

(ii) Memory will only be accessed via load and store instructions.

(iii) All execution units will be hardwired, with no microcoding.

The instruction set is the hardware “language” in which the software tells the processor what to do. The vacated area of the chip can be used in ways that accelerate the performance of more commonly used instructions, and it becomes easier to optimize the design. Basically the philosophy is that instructions are handled in parts:

• Fetch the instruction
• Get the arguments
• Perform the action
• Write back the result

which means, for example: r0 = r1 + r2

RISC CHARACTERISTICS

• Simple instruction set
• Same-length instructions
• 1 machine-cycle instructions


R4000 Internal Block Diagram



CPU Register Overview

The R4000 has 32 general-purpose registers, a program counter (PC) register, and 2 registers that hold the results of integer multiply and divide operations (HI and LO). The R4000 has no Program Status Word (PSW) register as such; this function is covered by the status and cause registers incorporated within the system control coprocessor (CP0).


CPU Instruction Set Overview

Each CPU instruction is 32 bits long.

There are three instruction formats:

• immediate (I-type)
• jump (J-type)
• register (R-type)


Memory Management Unit (MMU)

The MIPS R4000 processor provides a full-featured MMU which uses an on-chip translation lookaside buffer (TLB) to translate virtual addresses into physical addresses.


System Control Coprocessor(CP0)

CP0 translates virtual addresses into physical addresses and manages exceptions and transitions between kernel, supervisor, and user states. CP0 also controls the cache subsystem, as well as providing diagnostic control and error recovery facilities.


Floating Point Unit(FPU), CP1

The R4000 has an on-chip floating-point unit designated as CP1. The FPU extends the CPU instruction set to perform arithmetic operations

on floating-point values.

The FPU features include:

• Full 64-bit operation
• Load and store instruction set
• Tightly coupled coprocessor interface


XII. CISC

CISC, which stands for Complex Instruction Set Computer, is a philosophy for designing chips that are easy to program and which make efficient use of memory. Each instruction in a CISC instruction set might perform a series of operations inside the processor. This reduces the number of instructions required to implement a given program, and allows the programmer to learn a small but flexible set of instructions.

Since the earliest machines were programmed in assembly language and memory was slow and expensive, the CISC philosophy made sense, and was commonly implemented in such large computers as the PDP-11 and the DEC system 10 and 20 machines.

Most common microprocessor designs including the Intel(R) 80x86 and Motorola 68K series also follow the CISC philosophy. As we shall see, recent changes in software and hardware technology have forced a re-examination of CISC. But first, let's take a closer look at the decisions which led to CISC.

CISC philosophy 1:

Use Microcode

The earliest processor designs used dedicated (hardwired) logic to decode and execute each instruction in the processor's instruction set. This worked well for simple designs with few registers, but made more complex architectures hard to build, as control path logic can be hard to implement. So, designers switched tactics: they built some simple logic to control the data paths between the various elements of the processor, and used a simplified microcode instruction set to control the data path logic. This type of implementation is known as a microprogrammed implementation.

In a microprogrammed system, the main processor has some built-in memory (typically ROM) which contains groups of microcode instructions which correspond with each machine-language instruction. When a machine-language instruction arrives at the central processor, the processor executes the corresponding series of microcode instructions. Because instructions could be retrieved up to 10 times faster from a local ROM than from main memory, designers began to put as many instructions as possible into microcode. In fact, some processors could be ordered with custom microcode which would replace frequently used but slow routines in certain applications.

There are some real advantages to a microcoded implementation:

• Since the microcode memory can be much faster than main memory, an instruction set can be implemented in microcode without losing much speed over a purely hard-wired implementation.

• New chips are easier to implement and require fewer transistors than implementing the same instruction set with dedicated logic.

• A microprogrammed design can be modified to handle entirely new instruction sets quickly.


Using microcoded instruction sets, the IBM 360 series was able to offer the same programming model across a range of different hardware configurations.

Some machines were optimized for scientific computing, while others were optimized for business computing. However, since they all shared the same instruction set, programs could be moved from machine to machine without re-compilation (but with a possible increase or decrease in performance depending on the underlying hardware.) This kind of flexibility and power made microcoding the preferred way to build new computers for quite some time.

CISC philosophy 2:

Build "rich" instruction sets

One of the consequences of using a microprogrammed design is that designers could build more functionality into each instruction. This not only cut down on the total number of instructions required to implement a program, and therefore made more efficient use of a slow main memory, but it also made the assembly-language programmer's life simpler. Soon, designers were enhancing their instruction sets with instructions aimed specifically at the assembly-language programmer. Such enhancements included string manipulation operations, special looping constructs, and special addressing modes for indexing through tables in memory.

For example:

ABCD  Add Decimal with Extend
ADDA  Add Address
ADDX  Add with Extend
ASL   Arithmetic Shift Left
CAS   Compare and Swap Operands
NBCD  Negate Decimal with Extend
EORI  Logical Exclusive OR Immediate
TAS   Test Operand and Set

CISC philosophy 3:

Build high-level instruction sets

Once designers started building programmer-friendly instruction sets, the logical next step was to build instruction sets which map directly from high-level languages. Not only does this simplify the compiler writer's task, but it also allows compilers to emit fewer instructions per line of source code. Modern CISC microprocessors, such as the 68000, implement several such instructions, including routines for creating and removing stack frames with a single call.

For example:

DBcc  Test Condition, Decrement and Branch
ROXL  Rotate with Extend Left


RTR   Return and Restore Codes
SBCD  Subtract Decimal with Extend
SWAP  Swap Register Words
CMP2  Compare Register against Upper and Lower Bounds

The rise of CISC :

The three CISC design decisions (use microcode, build rich instruction sets, build high-level instruction sets), taken together, led to the CISC philosophy which drove all computer designs until the late 1980s, and is still in major use today. (Note that "CISC" didn't enter the computer designer's vocabulary until the advent of RISC; it was simply the way that everybody designed computers.) The next lesson discusses the common characteristics that all CISC designs share, and how those characteristics affect the operation of a CISC machine.

Characteristics of a CISC design

Introduction

While the chips that emerged from the 1970s and 1980s followed their own unique design paths, most were bound by what we are calling the "CISC Design Decisions". These chips all have similar instruction sets, and similar hardware architectures. In general terms, the instruction sets are designed for the convenience of the assembly-language programmer and the hardware designs are fairly complex.

Instruction sets

The design constraints that led to the development of CISC (small amounts of slow memory, and the fact that most early machines were programmed in assembly language) give CISC instruction sets some common characteristics:

A 2-operand format, where instructions have a source and a destination. For example, the add instruction "add #5, D0" would add the number 5 to the contents of register D0 and place the result in register D0.

Register-to-register, register-to-memory, and memory-to-register commands.

Multiple addressing modes for memory, including specialized modes for indexing through arrays.

Variable-length instructions, where the length often varies according to the addressing mode.

Hardware architectures

Most CISC hardware architectures have several characteristics in common:

Complex instruction-decoding logic, driven by the need for a single instruction to support multiple addressing modes.

A small number of general purpose registers. This is the direct result of having instructions which can operate directly on memory, and of the limited amount of chip space not dedicated to instruction decoding, execution, and microcode storage.

Several special purpose registers. Many CISC designs set aside special registers for the stack pointer, interrupt handling, and so on. This can simplify the hardware design somewhat, at the expense of making the instruction set more complex.

A "condition code" register which is set as a side-effect of most instructions. This register reflects whether the result of the last operation is less than, equal to, or greater than zero, and records if certain error conditions occur.

The ideal CISC machine

CISC processors were designed to execute each instruction completely before beginning the next instruction. Even so, most processors break the execution of an instruction into several definite stages; as soon as one stage is finished, the processor passes the result to the next stage:

1. An instruction is fetched from main memory.
2. The instruction is decoded: the controlling code from the microprogram identifies the type of operation to be performed, where to find the data on which to perform the operation, and where to put the result.
3. If necessary, the processor reads in additional information from memory.
4. The instruction is executed: the controlling code from the microprogram determines the circuitry/hardware that will perform the operation.
5. The results are written to memory.
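To make the staging concrete, the fetch-decode-execute sequence can be sketched as a toy interpreter loop. The three opcodes and register name here are invented for illustration only; they are not 68000 instructions or any real microarchitecture:

```python
# Toy fetch-decode-execute loop illustrating the stages described above.
# The instruction format and register names are hypothetical.
def run(program, memory, regs):
    pc = 0
    while pc < len(program):
        instr = program[pc]            # stage 1: fetch
        op, dst, src = instr           # stage 2: decode the fields
        if op == "LOAD":               # stage 3: read extra data from memory
            regs[dst] = memory[src]
        elif op == "ADD":              # stage 4: execute
            regs[dst] = regs[dst] + regs[src]
        elif op == "STORE":            # stage 5: write results to memory
            memory[dst] = regs[src]
        pc += 1
    return regs, memory

regs, memory = run(
    [("LOAD", "D0", 0), ("ADD", "D0", "D0"), ("STORE", 1, "D0")],
    {0: 21, 1: 0},
    {"D0": 0},
)
```

A real CISC processor would decode far more complex instruction formats, but the staged hand-off is the same idea.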

In an ideal CISC machine, each complete instruction would require only one clock cycle (which means that each stage would complete in a fraction of a cycle). In fact, this is the maximum possible speed for a machine that executes one instruction at a time.

A realistic CISC machine

In reality, some instructions may require more than one clock cycle per stage. However, a CISC design can tolerate this slowdown, since the idea behind CISC is to keep the total number of cycles small by having complicated things happen within each cycle.

The advantages of CISC

At the time of their initial development, CISC machines used available technologies to optimize computer performance.

Microprogramming is as easy as assembly language to implement, and much less expensive than hardwiring a control unit.

The ease of microcoding new instructions allowed designers to make CISC machines upwardly compatible: a new computer could run the same programs as earlier computers because the new computer would contain a superset of the instructions of the earlier computers. As each instruction became more capable, fewer instructions could be used to implement a given task. This made more efficient use of the relatively slow main memory.

Because microprogram instruction sets can be written to match the constructs of high-level languages, the compiler does not have to be as complicated.

The disadvantages of CISC


Earlier generations of a processor family were generally contained as a subset in every new version, so the instruction set and chip hardware became more complex with each generation of computers.

So that as many instructions as possible could be stored in memory with the least possible wasted space, individual instructions could be of almost any length. This means that different instructions take different amounts of clock time to execute, slowing down the overall performance of the machine.

Many specialized instructions aren't used frequently enough to justify their existence: approximately 20% of the available instructions are used in a typical program.

CISC instructions typically set the condition codes as a side effect of the instruction. Not only does setting the condition codes take time, but programmers have to remember to examine the condition code bits before a subsequent instruction changes them.

XII. ALU DESIGN

An Arithmetic and Logic Unit (ALU) is a combinational circuit that performs logic and arithmetic micro-operations on a pair of n-bit operands (e.g. A[3:0] and B[3:0]). The operations performed by an ALU are controlled by a set of function-select inputs. In this lab you will design a 4-bit ALU with three function-select inputs: mode M and select inputs S1 and S0. The mode input M selects between a Logic (M = 0) and an Arithmetic (M = 1) operation. The functions performed by the ALU are specified in Table 1.

Table 1: Functions of the ALU

M = 0 (Logic)

S1  S0  C0  Function     Operation (bitwise)
0   0   X   Ai · Bi      AND
0   1   X   Ai + Bi      OR
1   0   X   Ai ⊕ Bi      XOR
1   1   X   (Ai ⊕ Bi)'   XNOR

M = 1 (Arithmetic)

S1  S0  C0  Function    Operation
0   0   0   A           Transfer A
0   0   1   A + 1       Increment A by 1
0   1   0   A + B       Add A and B
0   1   1   A + B + 1   Increment the sum of A and B by 1
1   0   0   A + B'      A plus one's complement of B
1   0   1   A - B       Subtract B from A (i.e. B' + A + 1)
1   1   0   A' + B      B plus one's complement of A
1   1   1   B - A       B minus A (i.e. A' + B + 1)

A block diagram is given in Figure 1.

Figure 1: Block diagram of the 4-bit ALU.
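As a cross-check of Table 1, the function selection can be sketched in software. This is a behavioral model of the table, not the gate-level design you will build in the lab; inputs are assumed to be 4-bit values:

```python
# Behavioral model of the 4-bit ALU in Table 1. M selects logic (0) or
# arithmetic (1); S1, S0 (and C0 in arithmetic mode) select the function.
# Results are truncated to 4 bits, as in the hardware.
MASK = 0xF

def alu(m, s1, s0, c0, a, b):
    if m == 0:                        # logic operations (bitwise)
        return [a & b, a | b, a ^ b, (~(a ^ b)) & MASK][s1 * 2 + s0]
    sel = (s1 << 2) | (s0 << 1) | c0  # arithmetic operations
    ops = [
        a,                    # 000: transfer A
        a + 1,                # 001: increment A
        a + b,                # 010: add A and B
        a + b + 1,            # 011: add with carry in
        a + (~b & MASK),      # 100: A plus one's complement of B
        a + (~b & MASK) + 1,  # 101: A - B (two's complement subtract)
        (~a & MASK) + b,      # 110: B plus one's complement of A
        (~a & MASK) + b + 1,  # 111: B - A
    ]
    return ops[sel] & MASK

assert alu(0, 0, 0, 0, 0b1100, 0b1010) == 0b1000   # AND
assert alu(1, 1, 0, 1, 7, 5) == 2                  # 7 - 5
```

Such a model is handy for generating expected outputs when you simulate or test your hardware design.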

When doing arithmetic, you need to decide how to represent negative numbers. As is commonly done in digital systems, negative numbers are represented in two's complement. This has a number of advantages over the sign-and-magnitude representation, such as easy addition or subtraction of mixed positive and negative numbers. Also, the number zero has a unique representation in two's complement. The two's complement of an n-bit number N is defined as

2^n - N = ((2^n - 1) - N) + 1

The last representation gives us an easy way to find the two's complement: take the bitwise complement of the number and add 1 to it. As an example, to represent the number -5, we take the two's complement of 5 (= 0101) as follows:

  5:  0 1 0 1
      1 0 1 0   (bitwise complement)
    +       1
      1 0 1 1   (two's complement, i.e. -5)
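The complement-and-add-1 recipe can be sketched directly:

```python
# Two's complement of an n-bit number via the identity
# 2**n - N = ((2**n - 1) - N) + 1: bitwise complement, then add 1.
def twos_complement(n_value, bits=4):
    complement = (~n_value) & ((1 << bits) - 1)  # bitwise complement
    return (complement + 1) & ((1 << bits) - 1)  # add 1, keep n bits

assert twos_complement(0b0101) == 0b1011   # -5, as in the worked example
```

Applying the function twice returns the original value, which is a quick sanity check of the representation.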

Numbers represented in two's complement lie within the range -2^(n-1) to +2^(n-1) - 1. For a 4-bit number this means that the number is in the range -8 to +7. There is a potential problem we still need to be aware of when working with two's complement: over- and underflow, as illustrated in the examples below.

       0 1 0 0     (carries Ci)
  +5     0 1 0 1
  +4   + 0 1 0 0
  +9   0 1 0 0 1   = -7!

also,

       1 0 0 0     (carries Ci)
  -7     1 0 0 1
  -2   + 1 1 1 0
  -9   1 0 1 1 1   = +7!

Both calculations give the wrong results (-7 instead of +9 or +7 instead of -9) which is caused by the fact that the result +9 or -9 is out of the allowable range for a 4-bit two’s complement number. Whenever the result is larger than +7 or smaller than -8 there is an overflow or underflow and the result of the addition or subtraction is wrong. Overflow and underflow can be easily detected when the carry out of the most significant stage (i.e. C4 ) is different from the carry out of the previous stage (i.e. C3). You can assume that the inputs A and B are in two’s complement when they are presented to the input of the ALU.
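The detection rule (overflow when C4 differs from C3) can be sketched bit by bit. This is a behavioral model of a ripple adder, not the lab circuit itself:

```python
# 4-bit ripple addition that reports overflow when the carry out of the
# most significant stage (C4) differs from the carry into it (C3).
def add4(a, b, c0=0):
    carry, total = c0, 0
    carries = [c0]
    for i in range(4):                   # one full-adder stage per bit
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (carry & (ai ^ bi))
        carries.append(carry)            # carries[i+1] is C(i+1)
        total |= s << i
    overflow = carries[4] ^ carries[3]   # V = C4 xor C3
    return total, overflow

assert add4(0b0101, 0b0100) == (0b1001, 1)   # +5 + +4 overflows to -7
assert add4(0b0001, 0b0010) == (0b0011, 0)   # +1 + +2 is fine
```

Both failing examples from the text trip the overflow flag, while in-range sums do not.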

b. Design strategies

When designing the ALU we will follow the principle "Divide and Conquer" in order to use a modular design that consists of smaller, more manageable blocks, some of which can be re-used. Instead of designing the 4-bit ALU as one circuit we will first design a one-bit ALU, also called a bit-slice. These bit-slices can then be put together to make a 4-bit ALU.

There are different ways to design a bit-slice of the ALU. One method consists of writing the truth table for the one-bit ALU. This table has 6 inputs (M, S1, S0, C0, Ai and Bi) and two outputs Fi and Ci+1. This can be done but may be tedious when it has to be done by hand.

An alternative way is to split the ALU into two modules, one Logic and one Arithmetic module. Designing each module separately will be easier than designing a bit-slice as one unit. A possible block diagram of the ALU is shown in Figure 2. It consists of three modules: 2:1 MUX, a Logic unit and an Arithmetic unit.
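The modular structure of Figure 2 can also be sketched in software: a logic module, an arithmetic module, and an M-controlled 2:1 MUX, rippled over four bit-slices. The A/B input logic of the arithmetic unit is deliberately left as a plain adder here, since designing it is part of the pre-lab:

```python
# One bit-slice = logic module + arithmetic module + 2:1 MUX on M,
# composed four times into a 4-bit ALU ("divide and conquer").
def logic_slice(s1, s0, ai, bi):
    return [ai & bi, ai | bi, ai ^ bi, 1 ^ ai ^ bi][s1 * 2 + s0]

def arith_slice(s1, s0, ai, bi, cin):
    # Placeholder A/B input logic: only plain addition is wired up.
    # Designing Xi and Yi for the other functions is the pre-lab task.
    xi, yi = ai, bi
    s = xi ^ yi ^ cin
    cout = (xi & yi) | (cin & (xi ^ yi))
    return s, cout

def alu4(m, s1, s0, c0, a, b):
    result, carry = 0, c0
    for i in range(4):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if m == 0:
            fi = logic_slice(s1, s0, ai, bi)          # MUX selects logic
        else:
            fi, carry = arith_slice(s1, s0, ai, bi, carry)
        result |= fi << i
    return result
```

The point of the structure is reuse: each slice is identical, and only the carry chain connects neighbouring slices.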


Figure 2: Block diagram of a bit-slice ALU

c. Displaying the results

In order to easily see the output of the ALU, you will display the results on the seven-segment displays and the LEDs (LD).

1. The result of the logic operation can be displayed on the LEDs (LD). Also use one of these LEDs to display the overflow flag V.

2. Since you are working with a 4-bit representation for 2's complement numbers, the maximum positive number is +7 and the most negative number is –8. Thus a single seven-segment display can be used to show the magnitude of the number. Use another seven-segment display for the “-“ sign (e.g. use segment “g”).

3. There is one complication when using more than one of the seven-segment displays on the Digilab board, as can be seen from the connections of the LED segments of the displays. You will notice that the four seven-segment displays share the same cathodes (A, B, ..., G). This implies that one cannot directly connect the signals for the segments of the magnitude and sign to these terminals, since that would short the outputs of the gates, which would damage the FPGA! How could you solve this problem? Sketch a possible solution in your lab notebook. (Hint: You can alternate the signals applied to the cathodes between those of the Magnitude and Sign displays. If you do this faster than 30 times per second the eye will not notice the flickering. You will also need to alternate the anode signals.) What type of circuit will be needed to accomplish this? You can make use of an on-chip clock, called OSC4, that provides clock signals of 8 MHz, 500 KHz, 590 Hz and 15 Hz.

4. Figure 3 shows a schematic of the overall system, consisting of the ALU, Decoder and Switching circuit, and Displays on the Digital lab board.


Figure 3: Overall system, including the 4-bit ALU and display units.

d. Tasks:

Do the following tasks prior to coming to the lab. Write the answers to all questions in your lab notebook prior to coming to the lab. There is no on-line submission for the pre-lab. Ask the TA to sign the pre-lab section in your lab notebook at the start of the lab session. You will also need to include answers to the pre-lab questions in your lab report.

1. Design the MUX. You can choose to design the MUX with gates or by writing HDL (VHDL) code. Choose one of the two methods and write the design down in your lab notebook.

2. Design of the Logic unit. Here you also have several choices to design this unit:

a. Write the truth table, derive the K-map, and give the minimum gate implementation.
b. Use a 4:1 MUX and gates.
c. Write an HDL file.

As part of the pre-lab, you can choose any of the three methods. Briefly justify why you chose a particular design method. Explain the design procedure and give the logic diagram or the HDL file. In case you use a MUX, you also need to give the schematic or the HDL file for the MUX.

3. Design the arithmetic unit. Again, here you have different choices to design and implement the arithmetic unit. A particularly attractive method is one that makes use of previously designed modules, such as your Full Adder. The arithmetic unit basically performs additions on a set of inputs. By choosing the proper inputs, one can perform a range of operations. This approach is shown in Figure 4. The only blocks that need to be designed are the A Logic and B Logic circuits. You can make use of your previously designed full adder (MYFA).

Figure 4: Schematic block diagram of the arithmetic unit.

a. Give the truth tables for the Xi and Yi functions, with inputs S1, S0, Ai and S1, S0, Bi, respectively. Fill out the following tables. Notice that in definition Table 1 of the ALU, the variable C0 acts as the carry input. Depending on the value of C0, one performs the function on the odd or even entries of definition Table 1. As an example, the first entry is "transfer A" (for C0 = 0) while the second one is "A + 1" (for C0 = 1); similarly for A + B and A + B + 1, etc.

S1  S0  Ai  Xi (A Logic)      S1  S0  Bi  Yi (B Logic)
0   0   0   .                 0   0   0   .
0   0   1   .                 0   0   1   .
0   1   0   .                 0   1   0   .
0   1   1   .                 0   1   1   .
1   0   0   .                 1   0   0   .
1   0   1   .                 1   0   1   .
1   1   0   .                 1   1   0   .
1   1   1   .                 1   1   1   .

Table II: Truth tables for the A and B logic circuits.

b. Give the K-map for Xi and Yi functions. Find the minimum realization for Xi and Yi.

c. Draw the logic diagram for Xi and Yi.

d. Design the circuit that detects over- or underflow.

4. Design the decoder for the seven-segment displays. Remember that the segments of the display are active-low. The decoders should be designed in such a way that when the Logic mode (M = 0) is selected, only the LEDs are active, and when the Arithmetic mode (M = 1) is selected, only the seven-segment displays are active.


XIII. FIXED AND FLOATING-POINT OPERATION

Definition:

An arithmetic operation performed on floating-point numbers; "this computer can perform a million flops per second".

Floating point hardware was standard throughout the 7090/94 family. The 7090 had single precision (36-bit) floating point operations, while the 7094/7094 II machines also provided double precision (72-bit) floating point instructions. The fraction was considered normalized if Bit 9 (or Bit 18 in double precision) contained the first 1-bit of the fraction, so that the floating point word was positioned to have no leading zero bits.

The characteristic for single precision numbers consisted of eight bits (Bits 1-8) and defined the exponent of the number. Since the exponent could be either positive or negative, but the hardware sign bit was already allocated for the fraction, the exponent was algebraically signed in so-called excess form: the characteristic was formed by adding 128 to the exponent (e.g., an exponent of +12 would be coded as 140 and -30 would be coded as 98). The allowable range for the single precision exponent was -128 (decimal) to +127 (decimal), which yielded a floating point range between approximately 10^-39 and 10^+39 (decimal).

As an example, single precision floating point 10.00 (decimal) was represented as 204500000000 (octal), which yielded a sign bit of 0, a characteristic of 204 (octal), and a mantissa of 500000000 (octal). The zero sign bit indicated an algebraically positive number; the characteristic of 204 (octal), or 132 (decimal), indicated, after subtracting 128 (decimal), an exponent of 4; and the mantissa of 500000000 (octal) indicated a fraction of 2^-1 + 2^-3, or 0.625 (decimal). Therefore, the floating point number was 2^4 × 0.625, or 10.00.

Other floating point examples: 0.00390625 (decimal) was represented by 171400000000 (octal); 44.00 (decimal) was represented by 206540000000 (octal); and -20.00 (decimal) was represented by 605500000000 (octal).
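The decoding described above can be sketched from this description alone (the bit layout is taken from the text; `decode_7090` is a hypothetical helper name, not an actual 7090 tool):

```python
# Decode a single-precision 7090-style word, per the description above:
# bit 0 = sign, bits 1-8 = characteristic (exponent + 128),
# bits 9-35 = 27-bit fraction. Word given as a 12-digit octal string.
def decode_7090(octal_word):
    w = int(octal_word, 8)
    sign = -1 if (w >> 35) & 1 else 1
    characteristic = (w >> 27) & 0xFF          # excess-128 exponent
    fraction = (w & ((1 << 27) - 1)) / (1 << 27)
    return sign * fraction * 2 ** (characteristic - 128)

assert decode_7090("204500000000") == 10.0     # worked example in the text
```

The other examples from the text decode the same way, e.g. 171400000000 (octal) yields 0.00390625 and 605500000000 (octal) yields -20.00.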


IEEE STANDARD 754 FLOATING POINT NUMBERS

IEEE Standard 754 floating point is the most common representation today for real numbers on computers, including Intel-based PCs, Macintoshes, and most Unix platforms. This section gives a brief overview of IEEE floating point and its representation.

Floating Point Numbers

There are several ways to represent real numbers on computers. Fixed point places a radix point somewhere in the middle of the digits, and is equivalent to using integers that represent portions of some unit. For example, one might represent 1/100ths of a unit; if you have four decimal digits, you could represent 10.82, or 00.01. Another approach is to use rationals, and represent every number as the ratio of two integers.

Floating-point representation - the most common solution - basically represents reals in scientific notation. Scientific notation represents numbers as a base number and an exponent. For example, 123.456 could be represented as 1.23456 × 10^2. In hexadecimal, the number 123.abc might be represented as 1.23abc × 16^2.

Floating-point solves a number of representation problems. Fixed-point has a fixed window of representation, which limits it from representing very large or very small numbers. Also, fixed-point is prone to a loss of precision when two large numbers are divided. Floating-point, on the other hand, employs a sort of "sliding window" of precision appropriate to the scale of the number. This allows it to represent numbers from 1,000,000,000,000 to 0.0000000000000001 with ease.

Storage Layout

IEEE floating point numbers have three basic components: the sign, the exponent, and the mantissa. The mantissa is composed of the fraction and an implicit leading digit (explained below). The exponent base (2) is implicit and need not be stored.

The following table shows the layout for single (32-bit) and double (64-bit) precision floating-point values. The number of bits for each field is shown (bit ranges are in square brackets):

                  Sign    Exponent    Fraction     Bias
Single Precision  1 [31]  8 [30-23]   23 [22-00]   127
Double Precision  1 [63]  11 [62-52]  52 [51-00]   1023
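These field positions can be checked with a short sketch using Python's `struct` module to get at the raw single-precision bits:

```python
import struct

# Pull apart the three IEEE 754 single-precision fields using the
# bit positions from the table above.
def fields(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30-23, biased by 127
    fraction = bits & 0x7FFFFF       # bits 22-00
    return sign, exponent, fraction

assert fields(1.0) == (0, 127, 0)    # 1.0 = +1.0 x 2^(127-127)
assert fields(-2.0) == (1, 128, 0)   # -2.0 = -1.0 x 2^(128-127)
```

Packing as `>f` forces the value into single precision even though Python floats are doubles internally.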


The Sign Bit

The sign bit is as simple as it gets. 0 denotes a positive number; 1 denotes a negative number. Flipping the value of this bit flips the sign of the number.

The Exponent

The exponent field needs to represent both positive and negative exponents. To do this, a bias is added to the actual exponent in order to get the stored exponent. For IEEE single-precision floats, this value is 127. Thus, an exponent of zero means that 127 is stored in the exponent field. A stored value of 200 indicates an exponent of (200-127), or 73. For reasons discussed later, exponents of -127 (all 0s) and +128 (all 1s) are reserved for special numbers. For double precision, the exponent field is 11 bits, and has a bias of 1023.

The Mantissa

The mantissa, also known as the significand, represents the precision bits of the number. It is composed of an implicit leading bit and the fraction bits. To find out the value of the implicit leading bit, consider that any number can be expressed in scientific notation in many different ways. For example, the number five can be represented as any of these:

5.00 × 10^0
0.05 × 10^2
5000 × 10^-3

In order to maximize the quantity of representable numbers, floating-point numbers are typically stored in normalized form. This basically puts the radix point after the first non-zero digit. In normalized form, five is represented as 5.0 × 10^0.

A nice little optimization is available to us in base two, since the only possible non-zero digit is 1. Thus, we can just assume a leading digit of 1, and don't need to represent it explicitly. As a result, the mantissa has effectively 24 bits of resolution, by way of 23 fraction bits.
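Putting the implicit bit to work, a normalized single-precision value can be reconstructed from its fields (`value` is a hypothetical helper name; this covers the normalized case only):

```python
# Rebuild a normalized single-precision value from its fields: an
# implicit leading 1 gives the mantissa 24 bits of resolution from
# only 23 stored fraction bits.
def value(sign, exponent, fraction):
    mantissa = 1 + fraction / (1 << 23)          # implicit leading 1
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)

assert value(0, 129, 0x200000) == 5.0            # 1.25 x 2^2
```

Note that the stored fraction for 5.0 is 0x200000 (binary 01...0), i.e. the leading 1 of 1.01 is not stored.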

Putting it All Together

1. The sign bit is 0 for positive, 1 for negative.

2. The exponent's base is two.

3. The exponent field contains 127 plus the true exponent for single-precision, or 1023 plus the true exponent for double precision.


4. The stored mantissa is interpreted as 1.f, where f is the field of fraction bits (the leading 1 is implicit).

Ranges of Floating-Point Numbers

Let's consider single-precision floats for a second. Note that we're taking essentially a 32-bit number and re-jiggering the fields to cover a much broader range. Something has to give, and it's precision. For example, regular 32-bit integers, with all precision centered around zero, can precisely store integers with 32 bits of resolution. Single-precision floating-point, on the other hand, is unable to match this resolution with its 24 bits. It does, however, approximate this value by effectively truncating from the lower end. For example:

  11110000 11001100 10101010 00001111     // 32-bit integer
= +1.1110000 11001100 10101010 × 2^31     // Single-Precision Float
= 11110000 11001100 10101010 00000000     // Corresponding Value

This approximates the 32-bit value, but doesn't yield an exact representation. On the other hand, besides the ability to represent fractional components (which integers lack completely), the floating-point value can represent numbers around 2^127, compared to the 32-bit integer's maximum value of around 2^32.
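This loss of low-order bits can be reproduced by round-tripping the integer through single precision (Python floats are doubles, so `struct` is used to force the 32-bit format):

```python
import struct

# A 32-bit integer round-tripped through single precision keeps only
# 24 significant bits; the low-order bits are rounded away.
n = 0b11110000_11001100_10101010_00001111
f32 = struct.unpack(">f", struct.pack(">f", float(n)))[0]

assert int(f32) == 0b11110000_11001100_10101010_00000000
```

Here the discarded low byte is below the rounding threshold, so the result matches the example's truncated value exactly.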

The range of positive floating point numbers can be split into normalized numbers (which preserve the full precision of the mantissa), and denormalized numbers (discussed later) which use only a portion of the fraction's precision.

                  Denormalized                    Normalized                     Approximate Decimal
Single Precision  ±2^-149 to (1-2^-23)×2^-126    ±2^-126 to (2-2^-23)×2^127     ±~10^-44.85 to ~10^38.53
Double Precision  ±2^-1074 to (1-2^-52)×2^-1022  ±2^-1022 to (2-2^-52)×2^1023   ±~10^-323.3 to ~10^308.3

Since the sign of floating point numbers is given by a special leading bit, the range for negative numbers is given by the negation of the above values. There are five distinct numerical ranges that single-precision floating-point numbers are not able to represent:

1. Negative numbers less than -(2-2^-23) × 2^127 (negative overflow)

2. Negative numbers greater than -2^-149 (negative underflow)

3. Zero

4. Positive numbers less than 2^-149 (positive underflow)

5. Positive numbers greater than (2-2^-23) × 2^127 (positive overflow)

Overflow means that values have grown too large for the representation, much in the same way that you can overflow integers. Underflow is a less serious problem because it just denotes a loss of precision, which is guaranteed to be closely approximated by zero. Here's a table of the effective range (excluding infinite values) of IEEE floating-point numbers:

        Binary                Decimal
Single  ±(2-2^-23) × 2^127    ~ ±10^38.53
Double  ±(2-2^-52) × 2^1023   ~ ±10^308.25

Note that the extreme values occur (regardless of sign) when the exponent is at the maximum value for finite numbers (2^127 for single-precision, 2^1023 for double), and the mantissa is filled with 1s (including the normalizing 1 bit).

Special Values

IEEE reserves exponent field values of all 0s and all 1s to denote special values in the floating-point scheme.

Zero

As mentioned above, zero is not directly representable in the straight format, due to the assumption of a leading 1 (we'd need to specify a true zero mantissa to yield a value of zero). Zero is a special value denoted with an exponent field of zero and a fraction field of zero. Note that -0 and +0 are distinct values, though they both compare as equal.
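The equal-but-distinct behavior of ±0 is easy to observe with Python floats (which are IEEE 754 doubles):

```python
import math

# -0.0 and +0.0 have different bit patterns but compare as equal;
# the sign is still observable, e.g. via copysign.
assert -0.0 == 0.0
assert math.copysign(1.0, -0.0) == -1.0
assert math.copysign(1.0, 0.0) == 1.0
```

The distinct sign matters in a few corner cases, such as which infinity you get when dividing by a signed zero.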

Denormalized

If the exponent is all 0s, but the fraction is non-zero (else it would be interpreted as zero), then the value is a denormalized number, which does not have an assumed leading 1 before the binary point. Thus, this represents a number (-1)^s × 0.f × 2^-126, where s is the sign bit and f is the fraction. For double precision, denormalized numbers are of the form (-1)^s × 0.f × 2^-1022. From this you can interpret zero as a special type of denormalized number.
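For instance, the smallest positive denormal (exponent field all 0s, fraction 00..01) works out to 2^-23 × 2^-126 = 2^-149, which can be confirmed by building the bit pattern directly:

```python
import struct

# The smallest positive single-precision denormal: exponent field 0,
# fraction 00..01, i.e. 0.f x 2^-126 with f = 2^-23, giving 2^-149.
smallest = struct.unpack(">f", struct.pack(">I", 0x00000001))[0]

assert smallest == 2.0 ** -149
```

Denormals like this fill in the gap between zero and the smallest normalized value, 2^-126.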

Infinity

The values +infinity and -infinity are denoted with an exponent of all 1s and a fraction of all 0s. The sign bit distinguishes between negative infinity and positive infinity. Being able to denote infinity as a specific value is useful because it allows operations to continue past overflow situations. Operations with infinite values are well defined in IEEE floating point.

Not A Number

The value NaN (Not a Number) is used to represent a value that does not represent a real number. NaN's are represented by a bit pattern with an exponent of all 1s and a non-zero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN).

A QNaN is a NaN with the most significant fraction bit set. QNaN's propagate freely through most arithmetic operations. These values pop out of an operation when the result is not mathematically defined.

An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an exception when used in operations. SNaN's can be handy to assign to uninitialized variables to trap premature usage. Semantically, QNaN's denote indeterminate operations, while SNaN's denote invalid operations.

Special Operations

Operations on special numbers are well-defined by IEEE. In the simplest case, any operation with a NaN yields a NaN result. Other operations are as follows:

Operation                Result
n ÷ ±Infinity            0
±Infinity × ±Infinity    ±Infinity
±nonzero ÷ 0             ±Infinity
Infinity + Infinity      Infinity
±0 ÷ ±0                  NaN
Infinity - Infinity      NaN
±Infinity ÷ ±Infinity    NaN
±Infinity × 0            NaN
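Most of these cases can be reproduced with Python floats, which are IEEE 754 doubles (note one language-level difference: Python raises ZeroDivisionError for float division by zero rather than returning Infinity):

```python
import math

# The table's special cases, checked with IEEE 754 doubles.
inf = math.inf
assert 1.0 / inf == 0.0          # n / Infinity = 0
assert inf + inf == inf          # Infinity + Infinity = Infinity
assert math.isnan(inf - inf)     # Infinity - Infinity = NaN
assert math.isnan(inf * 0.0)     # Infinity x 0 = NaN
assert math.isnan(inf / inf)     # Infinity / Infinity = NaN
```

Also note that a NaN compares unequal to everything, including itself, which is why `math.isnan` is used instead of `==`.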

To sum up, the following are the corresponding values for a given representation:


Float Values (b = bias)

Sign  Exponent (e)      Fraction (f)      Value
0     00..00            00..00            +0
0     00..00            00..01 to 11..11  Positive Denormalized Real, 0.f × 2^(-b+1)
0     00..01 to 11..10  XX..XX            Positive Normalized Real, 1.f × 2^(e-b)
0     11..11            00..00            +Infinity
0     11..11            00..01 to 01..11  SNaN
0     11..11            10..00 to 11..11  QNaN
1     00..00            00..00            -0
1     00..00            00..01 to 11..11  Negative Denormalized Real, -0.f × 2^(-b+1)
1     00..01 to 11..10  XX..XX            Negative Normalized Real, -1.f × 2^(e-b)
1     11..11            00..00            -Infinity
1     11..11            00..01 to 01..11  SNaN
1     11..11            10..00 to 11..11  QNaN
