coa

Upload: nbpr

Post on 19-Oct-2015

  • FUNCTIONAL UNITS OF A COMPUTER SYSTEM
    For its operation, the computer system is divided into the following functional units: 1) arithmetic logic unit (ALU), 2) control unit (CU), 3) central processing unit (CPU), which combines the ALU and the CU, and 4) input/output unit.

    (a) Arithmetic Logic Unit (ALU)
    After data is entered through an input device, it is stored in the primary storage unit. The arithmetic logic unit performs the actual processing of data and instructions. The major operations performed by the ALU are addition, subtraction, multiplication, division, logic, and comparison. Data is transferred from the storage unit to the ALU when required. After processing, the output is returned to the storage unit for further processing or storage.

    (b) Control Unit (CU)
    The control unit acts like a supervisor that sees that things are done in the proper fashion. It determines the sequence in which computer programs and instructions are executed: it processes programs stored in main memory, interprets their instructions, and issues signals for the other units of the computer to execute them. It also acts as a switchboard operator when several users access the computer simultaneously, and thereby coordinates the activities of the computer's peripheral equipment as it performs input and output. It is therefore the manager of all the operations mentioned above.

    (c) Central Processing Unit (CPU)
    The ALU and the CU of a computer system are jointly known as the central processing unit. The CPU may be called the brain of a computer system. Like a human brain, it takes all major decisions, makes all sorts of calculations, and directs the different parts of the computer by activating and controlling their operations.

    (d) Input/Output Unit
    A computer must receive both data and program statements to function properly and to be able to solve problems. Data and programs are fed to a computer through an input device. Input devices read data from a source, such as magnetic disks, and translate that data into electronic impulses for transfer to the CPU. Some typical input devices are a keyboard, a mouse, and a scanner.
    The output unit sends processed results to the outside world. Examples: display screens, printers, plotters, modems, microfilms, synthesizers, high-tech blackboards, film recorders.

    computer organization and architecture
    Unit no: 1

  • Basic Operational Concepts of a Computer
    Most computer operations are executed in the ALU (arithmetic and logic unit) of a processor. Example: to add two numbers that are both located in memory, each number is brought into the processor, and the actual addition is carried out by the ALU. The sum may then be stored in memory or retained in the processor for immediate use.

    Registers
    When operands are brought into the processor, they are stored in high-speed storage elements (registers). A register can store one piece of data (8-bit registers, 16-bit registers, 32-bit registers, 64-bit registers, etc.). Access times to registers are faster than access times to the fastest cache unit in the memory hierarchy.

    Instructions
    Instructions for a processor are defined in the ISA (Instruction Set Architecture), Level 2. Typical instructions include:

    Mov BX, LocA
        Fetch the instruction
        Fetch the contents of memory location LocA
        Store the contents in general purpose register BX
    Add AX, BX
        Fetch the instruction
        Add the contents of registers BX and AX
        Place the sum in register AX

    How are instructions sent between memory and the processor?
    The program counter (PC) or instruction pointer (IP) contains the memory address of the next instruction to be fetched and executed. The processor sends the address of the memory location to be accessed to the memory unit and issues the appropriate control signals (memory read). The instruction register (IR) holds the instruction that is currently being executed. Timing is crucial and is handled by the control unit within the processor.
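The fetch and execute steps for the Mov and Add instructions can be sketched as a toy simulation. This is only an illustration: the tuple encoding of instructions, the function name `run`, and the initial register and memory values are assumptions made here, not part of any real instruction set.

```python
def run(program, memory, registers):
    """Run a list of (opcode, destination, source) tuples on a toy machine."""
    regs = dict(registers)
    pc = 0                          # program counter: index of the next instruction
    trace = []                      # (PC after fetch, contents of IR) for each step
    while pc < len(program):
        ir = program[pc]            # fetch: the instruction register receives the instruction
        pc += 1                     # PC now holds the address of the next instruction
        opcode, dst, src = ir       # decode
        if opcode == "MOV":         # Mov reg, loc: read memory, store in register
            regs[dst] = memory[src]
        elif opcode == "ADD":       # Add reg, reg: sum is placed in the first register
            regs[dst] = regs[dst] + regs[src]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        trace.append((pc, ir))
    return regs, trace


# Hypothetical starting state: LocA holds 7, AX holds 5.
memory = {"LocA": 7}
program = [("MOV", "BX", "LocA"), ("ADD", "AX", "BX")]
regs, trace = run(program, memory, {"AX": 5, "BX": 0})
print(regs)  # {'AX': 12, 'BX': 7}
```

The PC is advanced immediately after the fetch, so during execution it already points at the next instruction, while the IR still holds the current one.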

  • SINGLE BUS STRUCTURES
    Single bus and multiple bus structures are two types of bus organization in computing. A bus is basically a subsystem that transfers data between computer components, either within a computer or between two computers. It can connect several peripheral devices at the same time.

    - A multiple bus structure has multiple interconnected service integration buses, and for each bus the other buses are its foreign buses. A single bus structure is very simple and consists of a single server.

    - A bus cannot span multiple cells, and each cell can have more than one bus.
    - Published messages are printed on it. There is no messaging engine on a single bus structure.

    i) In a single bus structure all units are connected to the same bus, rather than to different buses as in a multiple bus structure.
    ii) The performance of a multiple bus structure is better than that of a single bus structure.
    iii) A single bus structure is cheaper than a multiple bus structure.

    Performance:

    Instruction formats
    The purpose of an instruction is to specify both an operation to be carried out by a CPU or other processor and the set of operands or data to be used in the operation. The operands include the input data or arguments of the operation and the results that are produced.

    Most instructions specify a register-transfer operation of the form
        X1 := op(X1, X2, ..., Xn)

    In the 680x0 family, simple instructions are assigned short formats, e.g. the add-register instruction
        ADD.L D1,D2
    denotes register-to-register addition of 32-bit operands, that is,
        D2 := D2 + D1
    where D1 and D2 are two of the 680x0's data registers.

    The corresponding instruction with a memory operand ADR1 specifies the memory-to-register addition operation

  • D2 := D2 + M(ADR1)
    Instruction format of the RISC 1

    Instruction types
    Instructions are divided into the following five types:

    1) Data-transfer instructions, which copy information from one location to another, either in the processor's internal register set or in the external main memory.
       Operations: MOVE, LOAD, STORE, SWAP, PUSH, POP

    2) Arithmetic instructions, which perform operations on numerical data.
       Operations: ADD, ADD WITH CARRY, SUBTRACT, MULTIPLY

    3) Logical instructions, which include Boolean and other nonnumerical operations.
       Operations: AND, OR, NOT, EXCLUSIVE OR, LOGICAL SHIFT

    4) Program control instructions, such as branch instructions, which change the sequence in which programs are executed.
       Operations: JUMP, RETURN, EXECUTE, SKIP CONDITIONAL, COMPARE, TEST, WAIT

    5) Input-output (IO) instructions, which cause information to be transferred between the processor or its main memory and external IO devices.
       Operations: INPUT, OUTPUT, START IO, TEST IO, HALT IO

    [Figure: instruction format with the fields opcode, source Rs, destination Rd, source S2, set condition code, and set immediate address]

  • Computer software
    Software is a general term used to describe the role that computer programs, procedures and documentation play in a computer system. The term includes:

    - Application software, such as word processors, which perform productive tasks for users.
    - Firmware, which is software programmed resident to electrically programmable memory devices on board mainboards or other types of integrated hardware carriers.
    - Middleware, which controls and co-ordinates distributed systems.
    - System software, such as operating systems, which interface with hardware to provide the necessary services for application software.
    - Software testing, which is a domain dependent on development and programming. It consists of various methods to test and declare a software product fit before it can be launched for use by either an individual or a group.
    - Testware, which is an umbrella term or container term for all utilities and application software that serve in combination for testing a software package but do not necessarily contribute to operational purposes. As such, testware is not a standing configuration but merely a working environment for application software or subsets thereof.

    Types of software

    System software
    System software helps run the computer hardware and computer system. It includes a combination of the following:

    - device drivers
    - operating systems
    - servers
    - utilities
    - windowing systems

  • The purpose of systems software is to unburden the applications programmer from the often complex details of the particular computer being used, including such accessories as communications devices, printers, device readers, displays and keyboards, and also to partition the computer's resources such as memory and processor time in a safe and stable manner. Examples are Windows XP, Linux and Mac OS.

    Programming software
    Programming software usually provides tools to assist a programmer in writing computer programs and software in different programming languages in a more convenient way. The tools include:

    - compilers
    - debuggers
    - interpreters
    - linkers
    - text editors

    Application software
    Application software allows end users to accomplish one or more specific (not directly computer development related) tasks. Typical applications include:

    - industrial automation
    - business software
    - computer games
    - quantum chemistry and solid state physics software
    - telecommunications (i.e., the internet and everything that flows on it)
    - databases
    - educational software
    - medical software
    - military software
    - molecular modeling software
    - image editing
    - spreadsheet
    - simulation software
    - word processing
    - decision making software

    Instruction Set Architecture (ISA)
    The Instruction Set Architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. The ISA serves as the boundary between software and hardware. We will briefly describe the instruction sets found in many of the microprocessors used today. The ISA of a processor can be described using 5 categories:

    Operand storage in the CPU
    Where are the operands kept other than in memory?

    Number of explicitly named operands

  • How many operands are named in a typical instruction?

    Operand location
    Can any ALU instruction operand be located in memory? Or must all operands be kept internally in the CPU?

    Operations
    What operations are provided in the ISA?

    Type and size of operands
    What is the type and size of each operand and how is it specified?

    Of all the above, the most distinguishing factor is the first.

    The 3 most common types of ISAs are:
    1. Stack - The operands are implicitly on top of the stack.
    2. Accumulator - One operand is implicitly the accumulator.
    3. General Purpose Register (GPR) - All operands are explicitly mentioned; they are either registers or memory locations.

    Let's look at the assembly code of
        C = A + B;

    Stack     Accumulator   GPR
    PUSH A    LOAD A        LOAD R1,A
    PUSH B    ADD B         ADD R1,B
    ADD       STORE C       STORE R1,C
    POP C     -             -

    Stack
    Advantages: Simple model of expression evaluation (reverse Polish). Short instructions.
    Disadvantages: A stack can't be randomly accessed, which makes it hard to generate efficient code. The stack itself is accessed on every operation and becomes a bottleneck.

    Accumulator
    Advantages: Short instructions.
    Disadvantages: The accumulator is only temporary storage, so memory traffic is the highest for this approach.

    GPR
    Advantages: Makes code generation easy. Data can be stored for long periods in registers.
    Disadvantages: All operands must be named, leading to longer instructions.

    Earlier CPUs were of the first 2 types, but in the last 15 years all CPUs made are GPR processors. There are 2 major reasons. The first is that registers are faster than memory: the more data that can be kept internally in the CPU, the faster the program will run. The other reason is that registers are easier for a compiler to use.
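The difference between the three ISA styles can be sketched by executing the three instruction sequences from the table above, which add A and B and store the result in C. This is a minimal illustration in Python: the function names and the dictionary used as "memory" are assumptions made here, and each Python statement stands for the one instruction named in its comment.

```python
def run_stack(mem):
    stack = []
    stack.append(mem["A"])                   # PUSH A
    stack.append(mem["B"])                   # PUSH B
    stack.append(stack.pop() + stack.pop())  # ADD: operands are implicitly the top of stack
    mem["C"] = stack.pop()                   # POP C


def run_accumulator(mem):
    acc = mem["A"]                           # LOAD A
    acc = acc + mem["B"]                     # ADD B: one operand is implicitly the accumulator
    mem["C"] = acc                           # STORE C


def run_gpr(mem):
    regs = {}
    regs["R1"] = mem["A"]                    # LOAD R1,A
    regs["R1"] = regs["R1"] + mem["B"]       # ADD R1,B: every operand is named explicitly
    mem["C"] = regs["R1"]                    # STORE R1,C


for machine in (run_stack, run_accumulator, run_gpr):
    mem = {"A": 2, "B": 3, "C": 0}
    machine(mem)
    print(machine.__name__, mem["C"])
```

All three produce the same result; what changes is how many operands each instruction has to name, which is the trade-off between instruction length and flexibility described above.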

  • superscalar processor -- can execute more than one instruction per cycle.
    cycle -- smallest unit of time in a processor.
    parallelism -- the ability to do more than one thing at once.
    pipelining -- overlapping parts of a large task to increase throughput without decreasing latency.

    Addressing Modes
    The addressing mode specifies a rule for interpreting or translating the address field of the instruction into the effective address from which the operand is actually referenced.

    Types of addressing modes are:

    Immediate Addressing:
    This is the simplest form of addressing. Here, the operand is given in the instruction itself. This mode is used to define a constant or to set initial values of variables. The advantage of this mode is that no memory reference other than the instruction fetch is required to obtain the operand. The disadvantage is that the size of the number is limited to the size of the address field, which in most instruction sets is small compared to the word length.

  • Direct Addressing:
    In direct addressing mode, the effective address of the operand is given in the address field of the instruction. It requires one memory reference to read the operand from the given location and provides only a limited address space. The length of the address field is usually less than the word length.
    Ex: Move P, R0 and Add Q, R0, where P and Q are the addresses of the operands.

    Indirect Addressing:
    In indirect addressing mode, the address field of the instruction refers to the address of a word in memory, which in turn contains the full-length address of the operand. The advantage of this mode is that for a word length of N, an address space of 2^N can be addressed. The disadvantage is that instruction execution requires two memory references to fetch the operand. Multilevel or cascaded indirect addressing can also be used.

    Register Addressing:
    Register addressing mode is similar to direct addressing. The only difference is that the address field of the instruction refers to a register rather than a memory location. 3 or 4 bits are used as the address field to reference 8 to 16 general purpose registers. The advantage of register addressing is that only a small address field is needed in the instruction.

    Register Indirect Addressing:
    This mode is similar to indirect addressing. The address field of the instruction refers to a register. The register contains the effective address of the operand. This mode uses one memory reference to obtain the operand. The address space is limited to the width of the registers available to store the effective address.

    Displacement Addressing:
    In displacement addressing mode there are 3 types of addressing mode. They are:
    1) Relative addressing
    2) Base register addressing
    3) Indexing addressing
    This is a combination of direct addressing and register indirect addressing. The value contained in one address field, A, is used directly, and the other address field refers to a register whose contents are added to A to produce the effective address.

    Stack Addressing:
    A stack is a linear array of locations, also referred to as a last-in first-out queue. The stack is a reserved block of locations, and items are appended or deleted only at the top of the stack. The stack pointer is a register which stores the address of the top-of-stack location. This mode of addressing is also known as implicit addressing.
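How each mode turns the address field into an operand can be sketched as follows. This is a hedged illustration only: the function `fetch_operand`, the mode names as strings, the register names R1 and SP, and the memory contents are all assumptions made for the example.

```python
def fetch_operand(mode, field, memory, regs):
    """Return the operand selected by `field` under the given addressing mode."""
    if mode == "immediate":
        return field                      # operand is in the instruction itself
    if mode == "direct":
        return memory[field]              # one memory reference
    if mode == "indirect":
        return memory[memory[field]]      # two memory references
    if mode == "register":
        return regs[field]                # no memory reference
    if mode == "register_indirect":
        return memory[regs[field]]        # register holds the effective address
    if mode == "displacement":
        reg, a = field                    # effective address = contents of register + A
        return memory[regs[reg] + a]
    if mode == "stack":
        return memory[regs["SP"]]         # operand is implicitly at the top of the stack
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state used only for this example.
memory = {10: 20, 20: 99, 24: 7, 30: 55}
regs = {"R1": 20, "SP": 30}

print(fetch_operand("immediate", 10, memory, regs))             # 10
print(fetch_operand("direct", 10, memory, regs))                # 20
print(fetch_operand("indirect", 10, memory, regs))              # 99
print(fetch_operand("register", "R1", memory, regs))            # 20
print(fetch_operand("register_indirect", "R1", memory, regs))   # 99
print(fetch_operand("displacement", ("R1", 4), memory, regs))   # 7
print(fetch_operand("stack", None, memory, regs))               # 55
```

Counting the `memory[...]` lookups in each branch gives the number of memory references the mode needs beyond the instruction fetch.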

  • Reduced Instruction Set Computer (RISC)
    As we mentioned before, most modern CPUs are of the GPR (General Purpose Register) type. A few examples of such CPUs are the IBM 360, DEC VAX, Intel 80x86 and Motorola 68xxx. But while these CPUs were clearly better than previous stack and accumulator based CPUs, they were still lacking in several areas:

    1. Instructions were of varying length, from 1 byte to 6-8 bytes. This causes problems with the pre-fetching and pipelining of instructions.
    2. ALU (Arithmetic Logic Unit) instructions could have operands that were memory locations. Because the number of cycles it takes to access memory varies, so does the duration of the whole instruction. This isn't good for compiler writers, pipelining and multiple issue.
    3. Most ALU instructions had only 2 operands, where one of the operands is also the destination. This means this operand is destroyed during the operation or it must be saved somewhere beforehand.

    Thus in the early 80's the idea of RISC was introduced. The RISC project, which later led to SPARC, was started at Berkeley and the MIPS project at Stanford. RISC stands for Reduced Instruction Set Computer. The ISA is composed of instructions that all have exactly the same size, usually 32 bits. Thus they can be pre-fetched and pipelined successfully. All ALU instructions have 3 operands, which are only registers. The only memory access is through explicit LOAD/STORE instructions.

    Thus C = A + B will be assembled as:
        LOAD R1,A
        LOAD R2,B
        ADD R3,R1,R2
        STORE C,R3

    Although it takes 4 instructions, we can reuse the values in the registers.

    Why is this architecture called RISC?
    The answer is that to make all instructions the same length, the number of bits that are used for the opcode is reduced. Thus fewer instructions are provided. The instructions that were thrown out are the less important string and BCD (binary-coded decimal) operations. In fact, now that memory access is restricted, there aren't several kinds of MOV or ADD instructions. The older architecture is therefore called CISC (Complex Instruction Set Computer). RISC architectures are also called LOAD/STORE architectures.
    The number of registers in RISC is usually 32 or more. The first RISC CPU, the MIPS R2000, has 32 GPRs, as opposed to 16 in the 68xxx architecture and 8 in the 80x86 architecture. The only disadvantage of RISC is its code size: usually more instructions are needed, and there is waste in short instructions (POP, PUSH).

  • The CISC Approach
    The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations. Consider the task of multiplying two numbers stored in memory locations 2:3 and 5:2 and storing the product back in location 2:3. For this particular task, a CISC processor would come prepared with a specific instruction (we'll call it "MULT"). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction:

        MULT 2:3, 5:2

    MULT is what is known as a "complex instruction." It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a higher level language. For instance, if we let "a" represent the value of 2:3 and "b" represent the value of 5:2, then this command is identical to the C statement "a = a * b."

    One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.

    CISC                                          RISC
    Emphasis on hardware                          Emphasis on software
    Includes multi-clock complex instructions     Single-clock, reduced instruction only
    Memory-to-memory: "LOAD" and "STORE"          Register to register: "LOAD" and "STORE"
      incorporated in instructions                  are independent instructions
    Small code sizes, high cycles per second      Low cycles per second, large code sizes
    Transistors used for storing complex          Spends more transistors on memory registers
      instructions

  • http://csetube.co.nr/

    BASIC PROCESSING UNIT
    The control unit has two major functions:

    - To control the sequencing of information-processing tasks performed by the machine
    - To guide and supervise each unit to make sure that each unit carries out every operation assigned at the proper time

    Control of a computer can be distributed or centralized. Early computers used distributed control and a lot of redundant hardware.

    computer organization and architecture
    Unit no: 2

    PROCESSING UNIT FEATURES

    Execution of a Complete Instruction
    o Add (R3), R1   /* R1 ← [R1] + [[R3]] */
    o Adds the contents of a memory location pointed to by R3 to register R1.
    o Sequence of control steps:
      1. PCout, MARin, Read, Select4, Add, Zin
      2. Zout, PCin, Yin, WMFC
      3. MDRout, IRin
      4. R3out, MARin, Read
      5. R1out, Yin, WMFC
      6. MDRout, SelectY, Add, Zin
      7. Zout, R1in, End

    Multiple bus architecture
    Single-bus structure: control sequences are long, as only one data item can be transferred over the bus in a clock cycle. The figure on the next slide shows a three-bus structure.

    All registers are combined into a single block called the register file, with three ports: 2 outputs, allowing 2 registers to be accessed simultaneously and have their contents put on buses A and B, and 1 input, allowing data on bus C to be loaded into a third register.

    Buses A and B are used to transfer source operands to the A and B inputs of the ALU, and the result is transferred to the destination over bus C.

    For the ALU, R=A (or R=B) means that its A (or B) input is passed unmodified to bus C.

    Add R4, R5, R6   /* R6 ← [R4] + [R5] */
    o Adds the contents of R4 and R5 and places the sum in R6.

    Sequence of control steps:
    1. PCout, R=B, MARin, Read, IncPC
    2. WMFC
    3. MDRoutB, R=B, IRin
    4. R4outA, R5outB, SelectA, Add, R6in, End

  • Hardwired control
    The control logic is implemented with gates, flip-flops, decoders, and other digital circuits.

    To execute instructions, a computer's processor must generate the control signals used to perform the processor's actions in the proper sequence. This sequence of actions can be executed either by another processor's software or in hardware.

    Hardware signals are generated either by hardwired control, in which the instruction bits directly generate the signals, or by microprogrammed control, in which the signals are read from a control memory.

    Hardwired control was usually implemented using discrete components, flip-chips, or even rotating discs or drums. It can generally be designed by two methods:

    - The classical method of sequential circuit design. It attempts to minimize the amount of hardware, in particular by using only log2 p flip-flops to realize a p-state circuit.
    - An approach that uses one flip-flop per state. While expensive in terms of flip-flops, this method simplifies controller unit design and debugging.

    Combinational logic
    - Determines the outputs at each state.
    - Determines the next state.

    Storage elements
    - Maintain the state representation.
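The flip-flop cost of the two design methods can be compared with a short calculation. This is only a sketch: the function names are made up for the example, and log2 p is rounded up to the next whole number because a circuit cannot use a fraction of a flip-flop.

```python
def flip_flops_classical(p):
    """Binary state encoding: ceil(log2 p) flip-flops for a p-state circuit."""
    return (p - 1).bit_length()


def flip_flops_one_per_state(p):
    """One flip-flop per state."""
    return p


for states in (4, 16, 17, 100):
    print(states, flip_flops_classical(states), flip_flops_one_per_state(states))
```

For 16 states the classical method needs 4 flip-flops against 16 for the one-per-state method, which is the saving that has to be weighed against the simpler design and debugging of the second method.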

  • Hardwired Implementation

    - The cycles (Fetch, Indirect, Execute, Interrupt) are constructed as a state machine
    - The individual instruction executions can be constructed as state machines

    [Figure: state machine, consisting of a combinational logic circuit with inputs and outputs, and clocked storage elements]

    - Common sections can be shared. There is a lot of similarity
    - One ALU is implemented. All instructions share it

  • Microprogrammed control

    A microprogrammed control unit is a control unit whose binary control variables are stored in memory (control memory). The control memory contains sequences of microinstructions that provide the control signals to execute instruction cycles, e.g. Fetch, Indirect, Execute, and Interrupt.

    Microinstruction: a control word in control memory. The microinstruction specifies one or more microoperations.

    Microprogram: a sequence of microinstructions.

    Dynamic microprogramming: control memory = RAM
    - RAM can be written to (to change a writable control memory)
    - The microprogram is loaded initially from an auxiliary memory such as a magnetic disk

    Static microprogramming: control memory = ROM
    - Control words in ROM are made permanent during hardware production.

    Microprogrammed control organization:

    1) Control Memory
    A memory that is part of the control unit and stores the microprogram. A computer that employs a microprogrammed control unit has two separate memories:
    -- Main memory: for storing the user program (machine instructions/data)
    -- Control memory: for storing the microprogram (microinstructions)

    2) Control Address Register
    Specifies the address of the microinstruction.

    3) Sequencer (= Next Address Generator)
    Determines the address sequence that is read from control memory. The address of the next microinstruction can be specified in several ways, depending on the sequencer inputs.

    4) Control Data Register (= Pipeline Register)
    Holds the microinstruction read from control memory. Allows the execution of the microoperations specified by the control word simultaneously with the generation of the next microinstruction.

  • Microprogrammed control -- Typical Microinstruction Formats

    Micro-instruction types
    - Each micro-instruction specifies a single (or few) micro-operations to be performed (vertical micro-programming)
    - Each micro-instruction specifies many different micro-operations to be performed in parallel (horizontal micro-programming)

    Vertical micro-programming
    - Width is narrow
    - n control signals encoded into log2 n bits
    - Limited ability to express parallelism
    - Considerable encoding of control information requires an external memory word decoder to identify the exact control line being manipulated

  • Horizontal micro-programming
    - Wide memory word
    - High degree of parallel operations possible
    - Little encoding of control information

    [Figure: vertical micro-instruction format with the fields Function Codes, Jump Condition, Micro-instruction Address]
    [Figure: horizontal micro-instruction format with the fields Internal CPU Control Signals, System Bus Control Signals, Jump Condition, Micro-instruction Address]

  • Nanoprogramming
    - Uses a 2-level control storage organization
    - The top level is a vertical format memory
      - The output of the top level memory drives the address register of the bottom (nano-level) memory
    - The nanomemory uses the horizontal format
      - It produces the actual control signal outputs
    - The advantage of this approach is a significant saving in control memory size (bits)
    - The disadvantage is more complexity and slower operation (2 memory accesses for each microinstruction)

    Example: Suppose that a system is being designed with 200 control points and 2048 microinstructions.
    - Assume that only 256 different combinations of control points are ever used
    - A single-level control memory would require 2048 x 200 = 409,600 storage bits
    - A nanoprogrammed system would use
      - a microstore of size 2048 x 8 = 16,384 bits (8 bits are enough to address one of the 256 nanowords)
      - a nanostore of size 256 x 200 = 51,200 bits
      - total size = 67,584 storage bits

    Nanoprogramming has been used in many CISC microprocessors.
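The storage arithmetic in this example can be checked with a few lines of Python. The function name and parameter names are chosen here for illustration; the only modelling assumption is that each microstore word holds just the address of a nanoword.

```python
def control_store_bits(n_microinstructions, n_control_points, n_distinct_words):
    """Compare single-level control memory with a two-level (nano) organization."""
    single_level = n_microinstructions * n_control_points
    # Bits needed to address one of the distinct nanowords (log2, rounded up).
    nano_address_bits = (n_distinct_words - 1).bit_length()
    microstore = n_microinstructions * nano_address_bits
    nanostore = n_distinct_words * n_control_points
    return single_level, microstore, nanostore, microstore + nanostore


single, micro, nano, total = control_store_bits(2048, 200, 256)
print(single)  # 409600
print(micro)   # 16384
print(nano)    # 51200
print(total)   # 67584
```

With these numbers the two-level organization needs roughly one sixth of the bits of the single-level control memory, at the price of a second memory access per microinstruction.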

    Nano programmed machine

  • Hazard (computer architecture)
    In computer architecture, a hazard is a potential problem that can happen in a pipelined processor. It refers to the possibility of erroneous computation when a CPU tries to simultaneously execute multiple instructions which exhibit data dependence. There are typically three types of hazards: data hazards, structural hazards, and branching hazards (control hazards).

    Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being executed, and instructions may not be completed in the desired order.

    Consider two instructions i and j, with i occurring before j. The possible data hazards are:

    RAW (read after write) - j tries to read a source before i writes it, so j incorrectly gets the old value. This is the most common type of hazard and the kind that forwarding is used to overcome.

    WAW (write after write) - j tries to write an operand before it is written by i. The writes end up being performed in the wrong order, leaving the value written by i rather than the value written by j in the destination. This hazard is present only in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled. The DLX integer pipeline writes a register only in WB and avoids this class of hazards.

    WAR (write after read) - j tries to write a destination before it is read by i, so i incorrectly gets the new value.

    WAW hazards would be possible if we made the following two changes to the DLX pipeline:

    All the data hazards discussed here involve registers within the CPU. By convention, the hazards are named by the ordering in the program that must be preserved by the pipeline.

    Unit no: 3

  • Data Hazards
    We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.

    Hazard occurs:
        A ← 3 + A
        B ← 4 × A

    No hazard:
        A ← 5 × C
        B ← 20 + C

    When two operations depend on each other, they must be executed sequentially in the correct order.

    Another example:
        Mul R2, R3, R4
        Add R5, R4, R6

  • Operand Forwarding
    Instead of reading it from the register file, the second instruction can get its data directly from the output of the ALU after the previous instruction is completed.

    A special arrangement needs to be made to forward the output of the ALU to the input of the ALU.

    Figure 8.7. Operand forwarding in a pipelined processor: (a) datapath, showing the register file, the source registers SRC1 and SRC2, the ALU, the result register RSLT, and the forwarding path; (b) position of the source and result registers in the processor pipeline, between the E: Execute (ALU) and W: Write (register file) stages.

  • Handling Data Hazards in Software
    Let the compiler detect and handle the hazard:

        I1: Mul R2, R3, R4
            NOP
            NOP
        I2: Add R5, R4, R6

    The compiler can reorder the instructions to perform some useful work during the NOP slots.
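A compiler pass of this kind can be sketched as follows. This is a simplified illustration, not a description of any real compiler: instructions are written as (opcode, source 1, source 2, destination) tuples, matching the operand order of the Mul/Add example, and the parameter `distance=3` encodes the assumption that a dependent instruction must come at least three slots after its producer, which yields the two NOPs shown above.

```python
NOP = ("NOP", None, None, None)


def insert_nops(program, distance=3):
    """Insert NOPs so that no instruction reads a register too soon after it is written."""
    out = []
    for instr in program:
        _, src1, src2, _ = instr
        needed = 0
        # Look back over the slots that are still "too close" to this instruction.
        window = out[-(distance - 1):] if distance > 1 else []
        for back, prev in enumerate(reversed(window), start=1):
            prev_dst = prev[3]
            if prev_dst is not None and prev_dst in (src1, src2):
                # Read-after-write dependency: pad until the producer is far enough back.
                needed = max(needed, distance - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [("Mul", "R2", "R3", "R4"), ("Add", "R5", "R4", "R6")]
for instr in insert_nops(program):
    print(instr[0])
```

If an independent instruction already sits between the producer and the consumer, the pass inserts one NOP fewer, which is the effect the compiler aims for when it reorders instructions to fill the NOP slots.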

    Side Effects
    - The previous example is explicit and easily detected.
    - Sometimes an instruction changes the contents of a register other than the one named as the destination.
    - When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect.
    - Example: condition code flags:
        Add R1, R3
        AddWithCarry R2, R4
    - Instructions designed for execution on pipelined hardware should have few side effects.

    Instruction Hazards
    Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline stalls. Causes:
    - Cache miss
    - Branch

  • Unconditional Branches

    Figure 8.8. An idle cycle caused by a branch instruction. [Timing diagram over clock cycles 1 to 6: I1 (F1, E1), I2 (Branch: F2, E2), I3 (F3, then discarded, X), Ik (Fk, Ek), Ik+1 (Fk+1, Ek+1); the execution unit is idle for one cycle.]

  • Branch Timing
    - Branch penalty
    - Reducing the penalty

  • Instruction Queue and Prefetching

    Conditional Branches
    - A conditional branch instruction introduces the added hazard caused by the dependency of the branch condition on the result of a preceding instruction.
    - The decision to branch cannot be made until the execution of that instruction has been completed.
    - Branch instructions represent about 20% of the dynamic instruction count of most programs.

  • Datapath and Control Considerations
    Datapath: the portion of the processor which contains the hardware necessary to perform all operations required by the computer (the brawn).
    Control: the portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain).

    The following operations can be performed independently in the processor:
    - Reading an instruction from the instruction cache
    - Incrementing the PC
    - Decoding an instruction
    - Reading from or writing into the data cache
    - Reading the contents of up to two registers from the register file
    - Writing into one register in the register file
    - Performing an ALU operation

    Performance Considerations
    - Performance = f(accuracy, cost of misprediction)
    - Branch History Table (BHT):
      - Lower bits of the PC are used to index a table of 1-bit values
      - Each entry says whether or not the branch was taken last time
      - No address check (unlike caches)
    - Problem: in a loop, a 1-bit BHT causes a double misprediction:
      - End-of-loop case, when it exits instead of looping
      - First loop pass next time, when it predicts exit instead of looping
      (The average loop does about 9 iterations before exit.)
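The double misprediction of a 1-bit history entry can be reproduced with a small simulation. This is a sketch under stated assumptions: a single table entry is modelled (no indexing by PC bits), the loop branch is assumed to be taken 9 times and then not taken once at the exit, and the entry is assumed to start in the "taken" state.

```python
def mispredictions_1bit(outcomes, initial_prediction=True):
    """Count mispredictions of a 1-bit predictor that repeats the last outcome."""
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
        prediction = taken      # the single history bit records only the last outcome
    return misses


one_loop_run = [True] * 9 + [False]     # taken 9 times, then falls through at the exit
print(mispredictions_1bit(one_loop_run))        # 1
print(mispredictions_1bit(one_loop_run * 3))    # 5
```

The first execution of the loop mispredicts only the exit. Every later execution mispredicts twice: once on its first pass, because the bit still says "not taken" from the previous exit, and once more at its own exit.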

    Exception Handling
    Exception Types
    -- I/O device request
    -- Breakpoint
    -- Integer arithmetic overflow
    -- FP arithmetic anomaly
    -- Page fault
    -- Misaligned memory accesses
    -- Memory-protection violation
    -- Undefined instruction
    -- Privilege violation
    -- Hardware and power failure

    Exception Requirements
    Synchronous vs. asynchronous
    -- I/O exceptions: asynchronous; allow completion of the current instruction
    -- Exceptions within an instruction: synchronous; harder to deal with
    User requested vs. coerced
    -- Requested exceptions are predictable and easier to handle
    User maskable vs. unmaskable
    Resume vs. terminate
    -- Easier to implement exceptions that terminate program execution

    Stopping & Restarting Execution
    Some exceptions require restart of the instruction, e.g. a page fault in the MEM stage.
    When an exception occurs, pipeline control can:
    -- Force a trap instruction into the next IF stage
    -- Until the trap is taken, turn off all writes for the faulting (and later) instructions
    -- The OS exception-handling routine saves the faulting instruction's PC

    Precise exceptions
    -- Instructions before the faulting one complete
    -- Instructions after it restart
    -- As if execution were serial
    Exception handling is complex if the faulting instruction can change state before the exception occurs.
    Precise exceptions simplify the OS and are required for demand paging.

  • Memory organization
    A RAM is composed of a large number (2^M) of addressable locations, each of which stores a w-bit word.
    A RAM operates as follows: first, the address of the target location to be accessed is transferred via the address bus to the RAM's address buffer.
    The address is then processed by the address decoder, which selects the required location in the storage cell unit.
    If a read operation is requested, the contents of the addressed location are transferred from the storage cell unit to the data buffer and from there to the data bus.
    If a write operation is requested, the word to be stored is transferred from the data bus to the selected location in the storage unit.
    The storage unit is made up of many identical 1-bit memory cells and their interconnections. In each line connected to the storage cell unit, we can expect to find a driver that acts as either an amplifier or a transducer of physical signals.

    Organization
    Assume that each word is stored in a single track and that each access results in the transfer of a block of words.
    The address of the data to be accessed is applied to the address decoder, whose output determines the track to be used and the location of the desired block of information within the track.
    The track address determines the particular read-write head to be selected. The selected head is moved into position to transfer data to or from the target track.
    A track position indicator generates the address of the block that is currently passing the read-write head.
    The generated address is compared with the block address produced by the address decoder.

    Computer Organization and Architecture, Unit no: 4

  • http://csetube.co.nr/

    The selected head is enabled and the data transfer between the storage track and the memory data buffer register begins.
    The read-write head is disabled when a complete block of information has been transferred.

    Static memories (RAM)
    -- Circuits capable of retaining their state as long as power is applied
    -- Static RAM (SRAM) is volatile

  • DRAMs: data is stored as charge on a capacitor, which needs refreshing


    Synchronous DRAMs
    -- Synchronized with a clock signal


    Memory system considerations
    -- Cost
    -- Speed
    -- Power dissipation
    -- Size of chip


    Principle of locality:

    -- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
    -- Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.
    -- Sequentiality (a subset of spatial locality).
    The principle of locality can be exploited by implementing the memory of a computer as a memory hierarchy, taking advantage of all types of memories.
    Method: the level closer to the processor (the fastest) is a subset of any level further away, and all the data is stored at the lowest level (the slowest).

  • Cache Memories
    -- The speed of the main memory is very low in comparison with the speed of the processor.
    -- For good performance, the processor cannot spend much of its time waiting to access instructions and data in main memory.
    -- It is important to devise a scheme that reduces the time needed to access the information.
    -- An efficient solution is to use a fast cache memory.
    -- When the cache is full and a memory word that is not in the cache is referenced, the cache control hardware must decide which block should be removed to create space for the new block that contains the referenced word.

  • The basics of Caches
    -- Caches are organized on the basis of blocks, the smallest amount of data which can be copied between two adjacent levels at a time.
    -- If the data requested by the processor is present in some block in the upper level, it is called a hit.
    -- If the data is not found in the upper level, the request is called a miss and the data is retrieved from the lower level in the hierarchy.
    -- The fraction of memory accesses found in the upper level is called the hit ratio.
    -- Storage which takes advantage of locality of accesses is called a cache.
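    The terms block, hit, miss and hit ratio can be illustrated with a toy simulation. This is a sketch under stated assumptions: it models a direct-mapped cache (a placement scheme the notes do not specify), and the function name hit_ratio and the default sizes are hypothetical.

```python
def hit_ratio(addresses, num_lines=4, block_size=4):
    """Fraction of accesses that hit in a toy direct-mapped cache.

    Direct mapping, num_lines and block_size are illustrative assumptions.
    """
    lines = [None] * num_lines  # each line remembers which block it holds
    hits = 0
    for addr in addresses:
        block = addr // block_size   # block containing this word
        index = block % num_lines    # the one line this block may occupy
        if lines[index] == block:
            hits += 1                # hit: block already in the upper level
        else:
            lines[index] = block     # miss: copy block from the lower level
    return hits / len(addresses)


# Sequential accesses show spatial locality: one miss per block, then hits.
print(hit_ratio(list(range(32))))
# Re-reading one word shows temporal locality: only the first access misses.
print(hit_ratio([100] * 10))
```

    With a block size of 4 words, a sequential scan misses once per block and hits on the remaining three words.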


    Performance of caches

  • Accessing a Cache

  • Virtual memory
    Virtual memory is a computer system technique which gives an application program the impression that it has contiguous working memory (an address space), while in fact it may be physically fragmented and may even overflow on to disk storage.
    Virtual memory provides two primary functions:
    1. Each process has its own address space, and therefore does not need to be relocated or to use relative addressing mode.
    2. Each process sees one contiguous block of free memory upon launch. Fragmentation is hidden.

    All implementations (excluding emulators) require hardware support. This is typically in the form of a Memory Management Unit built into the CPU.

    Systems that use this technique make programming of large applications easier and use real physical memory (e.g. RAM) more efficiently than those without virtual memory. Virtual memory differs significantly from memory virtualization in that virtual memory allows resources to be virtualized as memory for a specific system, as opposed to a large pool of memory being virtualized as smaller pools for many different systems.

    Note that "virtual memory" is more than just "using disk space to extend physical memory size"; that is merely the extension of the memory hierarchy to include hard disk drives. Extending memory to disk is a normal consequence of using virtual memory techniques, but could be done by other means such as overlays or swapping programs and their data completely out to disk while they are inactive. The definition of "virtual memory" is based on redefining the address space with contiguous virtual memory addresses to "trick" programs into thinking they are using large blocks of contiguous addresses.

  • Paged virtual memory

  • Compile time: If it is known in advance that a program will reside at a specific location of main memory, then the compiler may be told to build the object code with absolute addresses right away. For example, the boot sector of a bootable disk may be compiled with the starting point of the code set to 07C0:0000 (physical address 0x7C00).

    Load time: It is pretty rare that we know the location a program will be assigned ahead of its execution. In most cases, the compiler must generate relocatable code with logical addresses. Thus the address translation may be performed on the code during load time. Figure 3 shows that a program is loaded at location x. If the whole program resides in a monolithic block, then every memory reference may be translated to a physical address by adding x to it.
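    Load-time translation for a program held in one monolithic block amounts to adding the load address x to every logical address. A minimal sketch follows; the function name relocate and the sample addresses are hypothetical, chosen only to make the arithmetic visible.

```python
def relocate(logical_addresses, x):
    """Translate logical addresses to physical ones for a program
    loaded as a single contiguous block starting at location x."""
    return [x + logical for logical in logical_addresses]


x = 0x4000                            # hypothetical load address
references = [0x0000, 0x0010, 0x01FF]  # hypothetical logical addresses
print([hex(p) for p in relocate(references, x)])
```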

    Memory Management Requirements
    Partitioning Strategies: Fixed
    -- Fixed partitions divide memory into equal sized pieces (except for the OS)
    -- Degree of multiprogramming = number of partitions
    -- Simple policy to implement
    -- All processes must fit into the partition space
    -- Find any free partition and load the process

    Partitioning Strategies: Variable
    Idea: remove wasted memory that is not needed in each partition
    -- Memory is dynamically divided into partitions based on process needs
    Definitions:
    -- Hole: a block of free or available memory
    -- Holes are scattered throughout physical memory
    -- External fragmentation: memory that is in holes too small to be usable by any process

  • Memory Allocation Mechanism
    The MM system maintains data about free and allocated memory. Alternatives:
    -- Bit maps: 1 bit per allocation unit
    -- Linked lists: free list updated and coalesced when not allocated to a process
    At swap-in or process create:
    -- Find free memory that is large enough to hold the process
    -- Allocate part (or all) of the memory to the process and mark the remainder as free
    Compaction
    -- Moving things around so that holes can be consolidated
    -- Expensive in OS time

    Memory Management Policies
    -- First Fit: scan the free list and allocate the first hole that is large enough (fast)
    -- Next Fit: start the search from the end of the last allocation
    -- Best Fit: find the smallest hole that is adequate (slower, and lots of fragmentation)
    -- Worst Fit: find the largest hole
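    The placement policies differ only in which adequate hole they pick. The sketch below compares three of them on a list of hole sizes; the function names, the hole sizes and the request size are hypothetical, and a real allocator would also split the chosen hole and update the free list.

```python
def first_fit(holes, size):
    """Index of the first hole that is large enough, or None."""
    for i, hole in enumerate(holes):
        if hole >= size:
            return i
    return None


def best_fit(holes, size):
    """Index of the smallest hole that is large enough, or None."""
    candidates = [(hole, i) for i, hole in enumerate(holes) if hole >= size]
    return min(candidates)[1] if candidates else None


def worst_fit(holes, size):
    """Index of the largest hole that is large enough, or None."""
    candidates = [(hole, i) for i, hole in enumerate(holes) if hole >= size]
    return max(candidates)[1] if candidates else None


holes = [100, 500, 200, 300, 600]  # hypothetical free hole sizes, in KB
request = 212                      # hypothetical process size, in KB
for policy in (first_fit, best_fit, worst_fit):
    i = policy(holes, request)
    print(policy.__name__, "-> hole", i, "of size", holes[i])
```

    First fit stops at the 500 KB hole, best fit searches the whole list to find the 300 KB hole, and worst fit picks the 600 KB hole.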

    [Figure: three snapshots of memory. Initially: OS, process 1, process 2, process 3. After Process 2 terminates: OS, process 1, process 3. After Process 4 starts: OS, process 1, process 3, process 4.]

  • ASSOCIATIVE MEMORY

    A content addressable processor is a content addressable memory with the added capability to write in parallel (multi-write) into all those words indicating agreement as the result of a search.

    A typical associative memory has the following components: memory array, comparand register, mask register, match/mismatch (response) register, multiple match resolver, search logic, input/output register, and word select register.

    -- Memory Cell Array provides the storage and search medium for data.
    -- Comparand Register contains the data to be compared against the contents of the memory cell array.
    -- Mask Register is used to mask off portions of the data words which do not participate in the operations.

    [Figure: block diagram of an associative memory, showing the Comparand Reg., Mask Reg., Memory Cell Array, Input/Output Reg., Tag Reg., Multiple Match Resolver, Word Select Reg. and Some/None Bit.]

  • -- Word Select Register is used to mask off the memory words which do not participate in the operation.
    -- Match/Mismatch Register indicates the success or failure of a search operation.
    -- Input/Output Buffer acts as an interface between the associative memory and the outside world.
    -- Multiple Match Resolver narrows down the scope of the search to a specific location in the memory cell array in cases where more than one memory word satisfies the search condition(s).
    -- Some/None bit shows the overall search result.
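    How these registers cooperate in a search can be sketched in software. This is an illustrative model only: real associative memories compare all words in parallel in hardware, and the function name, the 8-bit word contents and the "first match wins" resolver policy are assumptions.

```python
def associative_search(memory, comparand, mask, word_select=None):
    """Masked equality search over all words.

    Returns (match_register, some_none_bit, resolved_index).
    The resolver policy (lowest matching index) is an assumption.
    """
    if word_select is None:
        word_select = [True] * len(memory)  # all words participate
    match_register = [
        selected and ((word ^ comparand) & mask) == 0  # compare unmasked bits
        for word, selected in zip(memory, word_select)
    ]
    some_none = any(match_register)
    resolved = match_register.index(True) if some_none else None
    return match_register, some_none, resolved


memory = [0b1010_0001, 0b1010_1111, 0b0110_0001, 0b1010_0000]
# Search on the upper four bits only; the mask hides the lower four.
print(associative_search(memory, comparand=0b1010_0000, mask=0b1111_0000))
# Exclude word 0 from the operation through the word select register.
print(associative_search(memory, 0b1010_0000, 0b1111_0000,
                         word_select=[False, True, True, True]))
```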

    SECONDARY STORAGE DEVICES

    Optical storage, typically the optical disc, stores information in deformities on the surface of a circular disc and reads this information by illuminating the surface with a laser diode and observing the reflection. Optical disc storage is non-volatile. The deformities may be permanent (read only media), formed once (write once media) or reversible (recordable or read/write media). The following forms are currently in common use:
    -- CD, CD-ROM, DVD, BD-ROM: read only storage, used for mass distribution of digital information (music, video, computer programs)
    -- CD-R, DVD-R, DVD+R, BD-R: write once storage, used for tertiary and off-line storage
    -- CD-RW, DVD-RW, DVD+RW, DVD-RAM, BD-RE: slow write, fast read storage, used for tertiary and off-line storage
    -- Ultra Density Optical (UDO): similar in capacity to BD-R or BD-RE; slow write, fast read storage used for tertiary and off-line storage

  • A Compact Disc (also known as a CD) is an optical disc used to store digital data. It was originally developed to store sound recordings exclusively, but later it also allowed the preservation of other types of data. Audio CDs have been commercially available since October 1982. As of 2009, they remain the standard physical storage medium for audio.

    Standard CDs have a diameter of 120 mm and can hold up to 80 minutes of uncompressed audio (700 MB of data). The Mini CD has various diameters ranging from 60 to 80 mm; Mini CDs are sometimes used for CD singles or device drivers, storing up to 24 minutes of audio.

    The technology was eventually adapted and expanded to encompass data storage (CD-ROM), write-once audio and data storage (CD-R), rewritable media (CD-RW), Video Compact Discs (VCD), Super Video Compact Discs (SVCD), PhotoCD, PictureCD, CD-i, and Enhanced CD.

    Magnetic disk memories
    -- A magnetic disk consists of 1-12 platters (metal or glass disks covered with magnetic recording material on both sides), with diameters between 1 and 3.5 inches.
    -- Each platter is comprised of concentric tracks (5-30K), and each track is divided into sectors (100-500 per track, each about 512 bytes).
    -- A movable arm holds the read/write heads for each disk surface and moves them all in tandem, so a cylinder of data is accessible at a time.
    -- To read/write data, the arm has to be placed on the correct track. This seek time usually takes 5 to 12 ms on average, and can take less if there is spatial locality.
    -- Rotational latency is the time taken to rotate the correct sector under the head. The average is half a rotation: about 2 ms at 15,000 RPM, and more for slower disks.
    -- Transfer time is the time taken to transfer a block of bits out of the disk; transfer rates are typically 3-65 MB/second.
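    The three components of a disk access can be added up in a short calculation. The sketch assumes that the average rotational latency is half a rotation and that 1 MB is 10^6 bytes; the function names and the sample figures (5 ms seek, 512-byte block, 65 MB/s) are hypothetical values taken from within the ranges quoted above.

```python
def average_rotational_latency_ms(rpm):
    """Half a rotation, in milliseconds (assumed average case)."""
    return 0.5 * 60_000 / rpm


def access_time_ms(seek_ms, rpm, block_bytes, rate_mb_per_s):
    """Seek time + average rotational latency + transfer time."""
    transfer_ms = block_bytes / (rate_mb_per_s * 1_000_000) * 1000
    return seek_ms + average_rotational_latency_ms(rpm) + transfer_ms


print(average_rotational_latency_ms(15_000))  # a 15,000 RPM disk
print(average_rotational_latency_ms(7_200))   # a slower disk waits longer
print(round(access_time_ms(5, 15_000, 512, 65), 3))
```

    For a single 512-byte sector the transfer itself is negligible; seek time and rotational latency dominate the access.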

    Magneto-optical disc storage is optical disc storage where the magnetic state on a ferromagnetic surface stores information. The information is read optically and written by combining magnetic and optical methods. Magneto-optical disc storage is non-volatile, sequential access, slow write, fast read storage used for tertiary and off-line storage.

  • Accessing I/O Devices
    -- Interface to CPU and memory
    -- Interface to one or more peripherals

    Generic Model of IO Module

    Computer Organization and Architecture, Unit no: 5

  • Interface for an IO Device:
    -- CPU checks I/O module device status
    -- I/O module returns status
    -- If ready, CPU requests data transfer
    -- I/O module gets data from device
    -- I/O module transfers data to CPU

    Programmed I/O
    CPU has direct control over I/O:
    -- Sensing status
    -- Read/write commands
    -- Transferring data
    CPU waits for the I/O module to complete the operation
    -- Wastes CPU time
    -- CPU requests I/O operation
    -- I/O module performs operation
    -- I/O module sets status bits
    -- CPU checks status bits periodically
    -- I/O module does not inform CPU directly
    -- I/O module does not interrupt CPU
    -- CPU may wait or come back later
    Under programmed I/O, data transfer is very like memory access (from the CPU viewpoint)
    -- Each device is given a unique identifier
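    The busy-wait at the heart of programmed I/O can be sketched with a simulated device. Everything here is hypothetical: the IOModule class, its status_ready and read_data methods and the delay counter stand in for real status and data registers.

```python
class IOModule:
    """Hypothetical I/O module that becomes ready after `delay` status checks."""

    def __init__(self, data, delay):
        self.data = data
        self.delay = delay

    def status_ready(self):
        # The module only sets status bits; it never notifies the CPU itself.
        if self.delay > 0:
            self.delay -= 1
            return False
        return True

    def read_data(self):
        return self.data


def programmed_read(module):
    """CPU side: poll the status until ready, then transfer the data."""
    wasted_polls = 0
    while not module.status_ready():
        wasted_polls += 1  # CPU time spent doing nothing useful
    return module.read_data(), wasted_polls


print(programmed_read(IOModule(data=0x41, delay=3)))
```

    The count of wasted polls is the cost that interrupt-driven I/O and DMA are designed to avoid.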

    IO interface circuits:
    -> The task of connecting an IO device to a computer system is greatly eased by the use of standard ICs known as IO interfacing circuits.
    -> The simplest interface circuit is a one-word, addressable register that serves as an I/O port.
    -> The most basic IO interface circuits are programmable circuits intended to act as serial or parallel ports.
    -> Serial ports accommodate many types of slow peripheral devices, ranging from secondary memory units to network connections.

    I/O Mapping
    Memory mapped I/O
    -- Devices and memory share an address space
    -- I/O looks just like memory read/write
    -- No special commands for I/O
    -- Large selection of memory access commands available
    Isolated I/O
    -- Separate address spaces
    -- Need I/O or memory select lines
    -- Special commands for I/O

    -- Interrupt driven and programmed I/O require active CPU intervention (all data must pass through the CPU)
    -- Transfer rate is limited by the processor's ability to service the device
    -- CPU is tied up managing the I/O transfer
    -- Additional module (hardware) on the bus
    -- DMA controller takes over the bus from the CPU for I/O
    -- Waiting for a time when the processor doesn't need the bus
    -- Cycle stealing: seizing the bus from the CPU

    Transfer of control through the use of interrupts

  • An interrupt is an asynchronous signal indicating the need for attention, or a synchronous event in software indicating the need for a change in execution.
    A hardware interrupt causes the processor to save its state of execution and begin execution of an interrupt handler. Software interrupts are usually implemented as instructions in the instruction set, which cause a context switch to an interrupt handler similar to a hardware interrupt.
    Interrupts are a commonly used technique for computer multitasking, especially in real-time computing. Such a system is said to be interrupt-driven.
    An act of interrupting is referred to as an interrupt request (IRQ).
    When there is an interrupt, continuing the execution of the current program is meaningless:
    -- Execution of the current program needs to be stopped
    In a computer system, there are many sources of interrupts:
    -- Interrupt processing routines need to be prepared for each source of interrupts
    -- The source of the interrupt needs to be identified
    -- Execution of the interrupt processing routine associated with the identified interrupt needs to be initiated
    When the interrupt is resolved, the interrupted program needs to be continued, for efficiency reasons:
    -- Execution of the interrupted program needs to be resumed

  • In a computer, a vectored interrupt is an I/O interrupt that tells the part of the computer that handles I/O interrupts at the hardware level that a request for attention from an I/O device has been received and also identifies the device that sent the request.
    A vectored interrupt is an alternative to a polled interrupt, which requires that the interrupt handler poll or send a signal to each device in turn in order to find out which one sent the interrupt request.

    PCI interrupts
    Devices are required to follow a protocol so that the interrupt lines can be shared. The PCI bus includes four interrupt lines, all of which are available to each device. However, they are not wired in parallel as are the other PCI bus lines.
    PCI bridges (between two PCI buses) map the four interrupt traces on each of their sides in varying ways. The result is that it can be impossible to determine how a PCI device's interrupts will appear to software.
    PCI interrupt lines are level-triggered. This was chosen over edge-triggering in order to gain an advantage when servicing a shared interrupt line, and for robustness: edge triggered interrupts are easy to miss.
    PCI Express does not have physical interrupt lines at all. It uses message-signaled interrupts exclusively.

    Pipeline interrupt
    Interrupt: hardware signal to switch the processor to a new instruction stream.
    When an interrupt occurs, the state of the interrupted process is saved, including PC, registers, and memory.
    An interrupt is precise if the following three conditions hold:
    -- All instructions preceding u have been executed, and have modified the state correctly
    -- All instructions following u are unexecuted, and have not modified the state
    -- If the interrupt was caused by an instruction, it was caused by instruction u, which is either completely executed (e.g. overflow) or completely unexecuted (e.g. VM page fault).

    Vectored interrupts
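    The difference between polled and vectored identification of the interrupt source can be sketched as follows. The device names, vector numbers and handler functions are hypothetical; in hardware the vector would be supplied by the interrupting device, not passed as a function argument.

```python
def polled_dispatch(devices, handlers):
    """Ask each device in turn whether it raised the request.

    devices: list of (name, is_requesting) pairs in polling order.
    Returns (handler result, number of devices checked).
    """
    checked = 0
    for name, is_requesting in devices:
        checked += 1
        if is_requesting:
            return handlers[name](), checked
    return None, checked


def vectored_dispatch(vector, vector_table):
    """The request itself identifies the device: one table lookup."""
    return vector_table[vector]()


handlers = {
    "keyboard": lambda: "keyboard handled",
    "disk": lambda: "disk handled",
    "network": lambda: "network handled",
}
devices = [("keyboard", False), ("disk", False), ("network", True)]
vector_table = {0: handlers["keyboard"], 1: handlers["disk"], 2: handlers["network"]}

print(polled_dispatch(devices, handlers))   # three devices checked
print(vectored_dispatch(2, vector_table))   # no search needed
```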

  • Precise interrupts are desirable if software is to fix up the error that caused the interrupt and execution has to be resumed
    -- Easy for external interrupts, could be complex and costly for internal ones
    -- Imperative for some interrupts (VM page faults, IEEE FP standard)

    Direct Memory Access (DMA)

    Polling or interrupt driven I/O incurs considerable overhead:
    -- Multiple program instructions
    -- Saving program state
    -- Incrementing memory addresses
    -- Keeping track of word count
    DMA transfers large amounts of data at high speed without continuous intervention by the processor
    -- Special control circuit required in the I/O device interface, called a DMA controller
    -- The DMA controller keeps track of memory locations and transfers directly to memory (via the bus) independent of the processor
    Single bus, detached DMA controller
    -- Each transfer uses the bus twice: I/O to DMA, then DMA to memory
    -- CPU is suspended twice
    Single bus, DMA controller integrated into I/O module
    -- Controller may support one or more devices
    -- Each transfer uses the bus once: DMA to memory
    -- CPU is suspended once

    Operation of a DMA transfer

  • Lecture plan

    DMA Controller
    -- Part of the I/O device interface
    -- DMA channels
    -- Performs functions that would normally be carried out by the processor
    -- Provides memory address
    -- Bus signals that control transfer
    -- Keeps track of number of transfers
    -- Under control of the processor

    Buses, Bus control, bus interfacing, Bus arbitration
    Buses:
    -- A bus is a subsystem that transfers data between components inside a computer, or between computers.
    -- Multiple devices communicate over a single set of wires
    -- Only one device can talk at a time or the message is garbled
    -- Each line or wire of a bus can at any one time contain a single binary digit. Over time, however, a sequence of binary digits may be transferred
    -- These lines may and often do send information in parallel
    -- A computer system may contain a number of different buses

  • Bus Interconnection
    -- A bus is a communication pathway connecting two or more devices.
    -- A key characteristic of a bus is that it is a shared transmission medium.
    -- A bus consists of multiple pathways or lines.
    -- Each line is capable of transmitting a signal representing a binary digit (1 or 0).
    -- A sequence of bits can be transmitted across a single line.
    -- Several lines can be used to transmit bits simultaneously (in parallel).
    -- A bus that connects major components (CPU, memory, I/O) is called the system bus.
    -- The most common computer interconnection structures are based on the use of one or more system buses.
    Bus interfacing
    -- Synchronous: occurrence of events on the bus is determined by a clock (Clock Cycle or Bus Cycle) which includes line upon
    -- Asynchronous: occurrence of one event follows and depends on the previous event
    Bus arbitration
    -- Centralized: a bus controller (arbiter), a hardware device, is responsible for allocating time on the bus (daisy chain)
    -- Distributed: access control logic in each module acts together to share the bus

    INTERFACE CIRCUITS

    Circuitry required to connect an I/O device to a computer bus:
    -- Provides a storage buffer for at least one word of data
    -- Contains status flags that can be accessed by the processor
    -- Contains address-decoding circuitry
    -- Generates the appropriate timing signals required by the bus control scheme
    -- Performs format conversions
    Ports
    -- Serial port
    -- Parallel port

  • Figure . An example of a computer system using different interface standards.

  • PCI (Peripheral Component Interconnect)
    -- PCI stands for Peripheral Component Interconnect
    -- Introduced in 1992
    -- It is a low-cost bus
    -- It is processor independent
    -- It has plug-and-play capability

    PCI bus transactions
    PCI bus traffic is made of a series of PCI bus transactions. Each transaction is made up of an address phase followed by one or more data phases. The direction of the data phases may be from initiator to target (write transaction) or vice-versa (read transaction), but all of the data phases must be in the same direction. Either party may pause or halt the data phases at any point. (One common example is a low-performance PCI device that does not support burst transactions, and always halts a transaction after the first data phase.)

    64-bit addressing is done using a 2-stage address phase. The initiator broadcasts the low 32 address bits, accompanied by a special "dual address cycle" command code. Devices which do not support 64-bit addressing can simply not respond to that command code. The next cycle, the initiator transmits the high 32 address bits, plus the real command code. The transaction operates identically from that point on. To ensure compatibility with 32-bit PCI devices, it is forbidden to use a dual address cycle if not necessary, i.e. if the high-order address bits are all zero.

    While the PCI bus transfers 32 bits per data phase, the initiator transmits a 4-bit byte mask indicating which 8-bit bytes are to be considered significant. In particular, a masked write must affect only the desired bytes in the target PCI device.
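    The effect of the byte mask on a write can be sketched as follows. This models only the logical behaviour at the target: the function name masked_write is hypothetical, and the sketch assumes that bit i of the mask set to 1 marks byte i as significant, without modelling how the mask is encoded electrically on the bus.

```python
def masked_write(old_word, new_word, byte_mask):
    """Return the 32-bit word stored after a masked write.

    Assumption: bit i of byte_mask == 1 means byte i of new_word is
    significant; all other bytes of old_word are left untouched.
    """
    result = old_word
    for byte in range(4):
        if byte_mask & (1 << byte):
            shift = 8 * byte
            lane = 0xFF << shift
            result = (result & ~lane) | (new_word & lane)
    return result & 0xFFFFFFFF


# Write only bytes 0 and 2 of the new word.
print(hex(masked_write(0x11223344, 0xAABBCCDD, 0b0101)))
```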

    -- Arbitration
    -- Address phase
    -- Address phase timing
    -- Data phases
    -- Ending transactions

    Table 4.3. Data transfer signals on the PCI bus.

    Any PCI device may initiate a transaction. First, it must request permission from a PCI bus arbiter on the motherboard. The arbiter grants permission to one of the requesting devices. The initiator begins the address phase by broadcasting a 32-bit address plus a 4-bit command code, then waits for a target to respond. All other devices examine this address and one of them responds a few cycles later.

  • SCSI Bus
    -- Defined by ANSI X3.131
    -- Small Computer System Interface
    -- 50, 68 or 80 pins
    -- Max. transfer rate: 160 MB/s, 320 MB/s
    SCSI Bus Signals

  • USB - Universal Serial Bus
    -- Speed: low-speed (1.5 Mb/s), full-speed (12 Mb/s), high-speed (480 Mb/s)
    -- Port limitation
    -- Device characteristics
    -- Plug-and-play

  • Universal Serial Bus Tree Structure

    USB (Universal Serial Bus) is a specification to establish communication between devices and a host controller (usually a personal computer). USB is intended to replace many varieties of serial and parallel ports. USB can connect computer peripherals such as mice, keyboards, digital cameras, printers, personal media players, flash drives, and external hard drives. For many of those devices, USB has become the standard connection method. USB was designed for personal computers, but it has become commonplace on other devices such as smartphones, PDAs and video game consoles, and as a power cord between a device and an AC adapter plugged into a wall plug for charging. As of 2008, there are about 2 billion USB devices sold per year, and approximately 6 billion total sold to date.
