Posted on 11-Jan-2016
Chapter 7: Processing Unit

Processing Unit
• Datapath
• Internal Bus Architecture
• Internal Processing
  - Hard-wired
  - Microinstruction method (briefly)
Next Lecture: Pipelining
Fundamental Concepts
• For simplicity, assume that each instruction occupies one memory word
• Instruction execution stages
  - Fetch stage
    - Fetch the contents of the memory location pointed to by PC and load them into IR: [IR] ← [[PC]]
    - Increment the contents of PC: [PC] ← [PC] + 4 (one 4-byte word, assuming byte addressing)
  - Execution stage
    - Carry out the instruction fetched
    - Access registers, memory, etc.
    - Perform computation using the ALU
    - Use internal and external resources
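The two stages can be sketched as a tiny interpreter loop. This is a minimal illustration, not a real instruction set: it assumes a byte-addressable memory held in a Python dict with one instruction per 4-byte word, and the tuple encoding, the `("MOV", src, dst)` operand order and the `run` helper are all hypothetical.

```python
# Minimal fetch/execute loop (illustrative sketch; the instruction tuples are hypothetical).
# Assumption: memory maps byte addresses to whole instructions, one per 4-byte word.

def run(memory, regs, pc=0, max_steps=100):
    """Repeat fetch ([IR] <- [[PC]], [PC] <- [PC] + 4) and execute until HALT."""
    for _ in range(max_steps):
        ir = memory[pc]            # fetch: IR <- [[PC]]
        pc = pc + 4                # increment: PC <- [PC] + 4
        op = ir[0]
        if op == "HALT":
            break
        if op == "ADD":            # ("ADD", src1, src2, dst): dst <- src1 + src2
            _, a, b, d = ir
            regs[d] = regs[a] + regs[b]
        elif op == "MOV":          # ("MOV", src, dst): dst <- src
            _, s, d = ir
            regs[d] = regs[s]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs, pc

program = {0: ("ADD", "R1", "R2", "R3"), 4: ("MOV", "R3", "R0"), 8: ("HALT",)}
regs, pc = run(program, {"R0": 0, "R1": 5, "R2": 7, "R3": 0})
print(regs["R0"], pc)  # 12 12
```

The PC is incremented before the instruction is decoded, which is also why branch offsets later in the chapter are measured from the address following the branch.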
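Datapath
[Figure: single-bus datapath. PC, MAR, MDR, Y, Z, TEMP, IR and R0 to R(n-1) sit on the internal processor bus; a MUX (Select: Y or constant 4) feeds ALU input A and the bus feeds input B; the ALU control lines select Add, Sub, XOR, etc., with a Carry-in input; MAR and MDR connect to the address and data lines of the external memory bus; the instruction decoder and control logic generate the control signals. Example instructions: ADD R1,R2,R3 and LDR R0, addr]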
Datapath with a single common bus
• The ALU and all registers are on a single common bus
  - The common bus is internal to the CPU (not to be confused with the external buses connecting the CPU to memory and I/O devices)
• The external memory bus connects to the CPU via MDR and MAR
• The number and function of registers R0 through R(n-1) vary from one CPU to another
  - Registers can be either general purpose or special purpose
• Registers Y, Z and TEMP are transparent to the program; they are used only by the CPU for temporary storage
• Datapath: the ALU, the registers, and the interconnecting bus
  - Assume all the registers have a clock input
Processing
• Most of the operations needed to execute an instruction can be carried out by performing one or more of the following functions:
  - Fetch the contents of a given memory location and load them into a CPU register (e.g., LDR R0, addr)
  - Store a word of data from a CPU register into a given location in memory (e.g., STO R0, addr)
  - Transfer a word of data from one CPU register to another or to the ALU (e.g., MOV R2,R3 or ADD R1,#1)
  - Perform an arithmetic or logical operation and store the result in a CPU register (e.g., ADD R1,R2,R3)
Register Transfer
• Registers need input and output gating
  - Riin: control signal for the input of Ri; when Riin = 1, the data available on the common bus is loaded into Ri
  - Riout: control signal for the output of Ri; when Riout = 1, the contents of Ri are placed on the bus
• Example: transfer the contents of R1 to R4
  - Enable the output of R1: R1out = 1
  - Enable the input of R4: R4in = 1
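A register transfer on the shared bus can be modelled as one clock step in which exactly one register drives the bus and any number of registers load from it. This is a sketch under the assumption that registers are a dict of name to value; the `bus_step` helper is hypothetical.

```python
# One clock step on a single shared bus (illustrative sketch).
# Assumption: registers are a dict of name -> int; enables are sets of register names.

def bus_step(regs, out_enables, in_enables):
    """Gate exactly one register onto the bus and load the bus value into every enabled input."""
    if len(out_enables) != 1:
        raise ValueError("exactly one register may drive the bus in a step")
    (src,) = out_enables
    bus = regs[src]                  # Riout = 1 places Ri's contents on the bus
    new_regs = dict(regs)
    for dst in in_enables:           # Riin = 1 loads the bus value into Ri
        new_regs[dst] = bus
    return new_regs

regs = {"R1": 42, "R4": 0}
regs = bus_step(regs, {"R1"}, {"R4"})   # R1out = 1, R4in = 1
print(regs["R4"])  # 42
```

Rejecting two simultaneous drivers mirrors the hardware rule that only one register output may be enabled on the bus at a time.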
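[Figure: input and output gating on the internal processor bus: Riin and Riout for register Ri, Yin for Y, Zin and Zout for Z; a MUX (Select: Y or constant 4) drives ALU input A and the bus drives input B]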
Arithmetic & Logical Operation
• The ALU is a combinational circuit that has no internal storage
• To add two numbers, both operands have to be available to the ALU simultaneously
  - Register Y holds one of the two numbers
  - The other number is gated onto the bus
  - The result is stored temporarily in Z
• Example: ADD R1, R2, R3 (R3 ← R1 + R2)
  - Step 1: R1out = 1 and Yin = 1
  - Step 2: R2out = 1, Add = 1, Zin = 1
  - Step 3: Zout = 1, R3in = 1: the contents of Z are transferred to R3
• Step 3 cannot be done concurrently with step 2, because only one register can be connected to the bus at any given time
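The three steps above can be traced with plain variables, one bus value per step. This is a minimal sketch assuming the register contents live in a dict; `add_single_bus` is a hypothetical helper.

```python
# Three control steps for ADD R1, R2, R3 (R3 <- R1 + R2) on a single-bus datapath (sketch).
# Y feeds ALU input A; the bus feeds ALU input B; Z latches the ALU output.

def add_single_bus(regs):
    r = dict(regs)
    # Step 1: R1out, Yin -> Y <- R1
    bus = r["R1"]
    r["Y"] = bus
    # Step 2: R2out, Add, Zin -> Z <- Y + bus
    bus = r["R2"]
    r["Z"] = r["Y"] + bus
    # Step 3: Zout, R3in -> R3 <- Z (a separate step: Z and R2 cannot share the bus)
    bus = r["Z"]
    r["R3"] = bus
    return r

out = add_single_bus({"R1": 3, "R2": 4, "R3": 0, "Y": 0, "Z": 0})
print(out["R3"])  # 7
```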
Register Gating and Timing of Data Transfers
• Each bit of a register consists of a flip-flop (FF)
• While Riin = 1, the state of each FF changes to the value of its corresponding data line on the bus
• The value captured at a clock edge while Riin = 1 (the data present immediately before the transition) is then locked in the FF until Riin = 1 again
• The output of the register can be disconnected from the bus (high impedance), or can place a 0 or a 1 on the bus: a tri-state output
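One register bit can be sketched as a D flip-flop behind a 2-to-1 multiplexer (selected by Riin) plus a tri-state output (enabled by Riout). This assumes the flip-flop captures its D input at each clock edge; `RegisterBit` is a hypothetical class, and `None` stands in for the high-impedance state.

```python
# One register bit: D flip-flop + input MUX (select = Riin) + tri-state output (Riout). Sketch only.

class RegisterBit:
    def __init__(self, q=0):
        self.q = q

    def clock_edge(self, bus_bit, ri_in):
        # MUX: Riin = 1 selects the bus; Riin = 0 feeds Q back, so the stored bit is held.
        d = bus_bit if ri_in else self.q
        self.q = d

    def drive(self, ri_out):
        # Tri-state output: None models the disconnected (high-impedance) state.
        return self.q if ri_out else None

bit = RegisterBit()
bit.clock_edge(bus_bit=1, ri_in=1)   # load 1 from the bus
bit.clock_edge(bus_bit=0, ri_in=0)   # Riin = 0: the value is held
print(bit.drive(ri_out=1), bit.drive(ri_out=0))  # 1 None
```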
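[Figure: gating for one register bit: a D flip-flop clocked by Clock, a 2-to-1 multiplexer (inputs 0 and 1, selected by Riin) choosing between the bus and the flip-flop output Q, and a tri-state gate controlled by Riout that drives the bus]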
Fetch Operation
• The CPU has to specify the address of the memory location and request a read operation (e.g., LDR R2, [R1])
1. Send an address to memory (MAR ← [R1])
   - The CPU transfers the address of the required word into MAR
2. Start a Read operation
   - The CPU uses the control lines of the memory bus to indicate that a Read operation is needed
3. Wait for the MFC (memory function complete) response
   - The CPU waits until it receives an answer from memory indicating that the Read has been completed
   - When MFC is set to 1, the specified location has been read and its contents are available on the data lines of the memory bus
   - The duration of this step depends on the speed of the memory
   - The overall execution time of an instruction can be reduced by doing useful work during this wait, for example incrementing the PC
4. R2 ← [MDR]
   - The information on the memory bus is first loaded into MDR
   - The contents of MDR are then moved into the destination register
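The four read steps can be simulated against a memory that raises MFC only after a delay, which is what makes step 3 take a variable amount of time. This is a sketch: `SlowMemory`, its `latency` parameter and the `load` helper are hypothetical, and one loop iteration stands for one clock cycle.

```python
# Read sequence with a memory that raises MFC after a variable delay (hypothetical model).

class SlowMemory:
    def __init__(self, words, latency):
        self.words, self.latency = words, latency
        self._addr, self._countdown = None, 0

    def start_read(self, addr):          # Read asserted with MAR on the address lines
        self._addr, self._countdown = addr, self.latency

    def tick(self):
        """Advance one cycle; return (MFC, value on the data lines)."""
        if self._countdown > 0:
            self._countdown -= 1
        done = self._countdown == 0
        return done, (self.words[self._addr] if done else None)

def load(mem, regs, dst, addr_reg):
    mar = regs[addr_reg]                 # 1. MAR <- [addr_reg]
    mem.start_read(mar)                  # 2. start Read
    waited = 0
    while True:                          # 3. WMFC: useful work (e.g., PC + 4) could go here
        waited += 1
        mfc, data = mem.tick()
        if mfc:
            break
    mdr = data                           # 4. MDR <- data lines, then dst <- [MDR]
    regs[dst] = mdr
    return waited

regs = {"R1": 0x100, "R2": 0}
cycles = load(SlowMemory({0x100: 99}, latency=3), regs, "R2", "R1")
print(regs["R2"], cycles)  # 99 3
```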
[Figure: Read timing over three clock cycles, showing Clock, MARin, Address, Read, MR, MFC, Data, MDRinE and MDRout]
Synchronous vs. Asynchronous Transfer
• Asynchronous transfer
  - One device initiates the transfer and waits until the other device responds
  - Enables the transfer of data between two independent devices that have different speeds of operation
• Synchronous transfer
  - One of the control lines of the bus carries pulses from a clock running continuously at a fixed frequency
  - These pulses provide common timing signals to the CPU and main memory
  - Simpler implementation
  - Cannot accommodate devices of widely varying speed, except by reducing the speed of all devices to that of the slowest one
• Mixed
Store Operation
STORE R2, [R1]
• Step 1: MAR ← [R1]
• Step 2: MDR ← [R2], Write
• Step 3: Wait for MFC
• Steps 1 and 2 can be carried out simultaneously if the architecture allows it
  - This is not possible with a single CPU bus
• Step 3 may be overlapped with other operations, provided that there is no conflict
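Whether steps 1 and 2 can share a step comes down to how many internal bus transfers fit in one step. The sketch below counts steps with a greedy list scheduler under an assumed model: each register transfer needs one bus slot, WMFC needs none, and an operation may start once its dependencies are done. The `count_steps` helper and the model itself are illustrative assumptions.

```python
# Counting control steps for STORE R2, [R1] under a simple, assumed scheduling model.

def count_steps(ops, deps, buses):
    """Greedy list scheduling; ops maps name -> True if the operation uses an internal bus."""
    done, steps = set(), 0
    pending = list(ops)
    while pending:
        ready = [op for op in pending if deps.get(op, set()) <= done]
        issued = [op for op in ready if ops[op]][:buses] + [op for op in ready if not ops[op]]
        if not issued:
            raise ValueError("no operation can be issued (dependency cycle?)")
        done.update(issued)
        pending = [op for op in pending if op not in issued]
        steps += 1
    return steps

ops = {"MAR <- [R1]": True, "MDR <- [R2], Write": True, "WMFC": False}
deps = {"WMFC": {"MAR <- [R1]", "MDR <- [R2], Write"}}
print(count_steps(ops, deps, buses=1), count_steps(ops, deps, buses=2))  # 3 2
```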
Execution of a Complete Instruction
Example: ADD (R3),R1 (R1 ← [[R3]] + [R1])
1. Instruction fetch
2. Fetch operand(s)
3. Perform the addition
4. Store the result into R1
Control sequence for ADD (R3),R1
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End
Steps 1, 2 and 3: Fetch and Increment the PC
• Step 1: PCout, MARin, Read, Select4, Add, Zin
  - Load the contents of the PC into MAR and send a read request (PCout, MARin, Read)
  - While waiting for the response, increment the PC: select the constant 4 in the MUX, while ALU input B receives the current value of the PC from the bus, and specify the Add operation; the result goes into Z (Zin)
• Step 2: move the updated value back into the PC and wait for MFC (Zout, PCin, WMFC)
• Step 3: load the word fetched from memory into IR (MDRout, IRin)
Steps 4, 5, 6 and 7
• Steps 4 and 5: fetch the first operand, the contents of the memory location pointed to by R3
  - Step 4: R3out, MARin, Read
  - Step 5: R1out, Yin, WMFC (R1 is copied into Y while waiting for memory)
• Step 6: perform the addition
  - MDRout, SelectY, Add, Zin
• Step 7: load the result into R1
  - Zout, R1in, End
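The whole seven-step sequence can be executed on a signal-level model of the single-bus datapath, where each step is just the set of active control signals. This is a sketch under stated assumptions not taken from the slides: memory is a dict keyed by byte address, MFC arrives within the WMFC step, Read is not modelled separately, only the registers used here exist, and step 2 includes Yin as in the version of the table that keeps steps 1 to 3 common to all instructions. `run_steps` and `ADD_R3_R1` are hypothetical names.

```python
# Signal-level sketch of the single-bus datapath executing ADD (R3),R1.

ADD_R3_R1 = [
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},   # 1
    {"Zout", "PCin", "Yin", "WMFC"},                       # 2
    {"MDRout", "IRin"},                                    # 3
    {"R3out", "MARin", "Read"},                            # 4
    {"R1out", "Yin", "WMFC"},                              # 5
    {"MDRout", "SelectY", "Add", "Zin"},                   # 6
    {"Zout", "R1in", "End"},                               # 7
]
REGS = ["PC", "MAR", "MDR", "IR", "Y", "Z", "R1", "R3"]

def run_steps(state, memory, steps):
    s = dict(state)
    for signals in steps:
        drivers = [r for r in REGS if r + "out" in signals]
        if len(drivers) > 1:
            raise ValueError("only one register may drive the bus in a step")
        bus = s[drivers[0]] if drivers else None
        updates = {}                                   # all loads take effect at the clock edge
        if "Add" in signals and "Zin" in signals:
            a = 4 if "Select4" in signals else s["Y"]  # MUX output -> ALU input A
            updates["Z"] = a + bus                     # bus -> ALU input B
        for r in REGS:
            if r != "Z" and r + "in" in signals:
                updates[r] = bus                       # Riin: load the bus value
        if "WMFC" in signals:
            updates["MDR"] = memory[s["MAR"]]          # word at the address held in MAR
        s.update(updates)
        if "End" in signals:
            break
    return s

memory = {0: "ADD (R3),R1", 0x200: 10}
start = {"PC": 0, "MAR": 0, "MDR": 0, "IR": None, "Y": 0, "Z": 0, "R1": 5, "R3": 0x200}
final = run_steps(start, memory, ADD_R3_R1)
print(final["R1"], final["PC"], final["IR"])  # 15 4 ADD (R3),R1
```

Collecting all register loads in `updates` before applying them models the clock edge: every value read in a step is the one present before that step's loads.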
Branch Instructions
Control sequence for an unconditional branch instruction:
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End
Steps of Unconditional Branching
• Branching: the branch address is obtained by adding an offset X (given in the branch instruction) to the current value of the PC
1. Fetch an instruction
   - PCout, MARin, Read, Select4, Add, Zin
   - Zout, PCin, Yin, WMFC
   - MDRout, IRin
2. Execute
   - Offset-field-of-IRout, Add, Zin
   - Zout, PCin, End
• The PC is incremented during the fetch phase, before the type of instruction being executed is known
• When the offset is added to the contents of the PC, the PC has already been updated to point to the instruction following the branch
• The offset is therefore the difference between the branch target address and the address immediately following the branch
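The offset rule can be checked with a few lines of arithmetic. This sketch assumes 4-byte instructions, as in the fetch stage above; `branch_target` and `offset_for` are hypothetical helpers.

```python
# Branch-offset arithmetic, assuming 4-byte instructions and a PC already incremented during fetch.
WORD = 4

def branch_target(branch_addr, offset):
    """Target = (address of the branch + 4) + offset."""
    updated_pc = branch_addr + WORD
    return updated_pc + offset

def offset_for(branch_addr, target):
    """Offset to encode = target - address immediately following the branch."""
    return target - (branch_addr + WORD)

print(offset_for(1000, 1040), branch_target(1000, 36))  # 36 1040
print(offset_for(1000, 1000))  # -4 (a branch to itself)
```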
Steps of Conditional Branching
• Check the status of the condition codes before loading the new value into the PC
  - Step 4: Offset-field-of-IRout, Add, Zin; if the condition is not met, then End
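For a branch-if-negative instruction, the effect of ending early in step 4 is that the PC keeps the fall-through address computed during fetch. The sketch below assumes 4-byte instructions and that Y holds the updated PC after fetch (Yin in step 2); `branch_on_negative` is a hypothetical helper returning the new PC and the number of control steps used.

```python
# Conditional branch (Branch<0) at the control-step level (sketch).

def branch_on_negative(branch_addr, offset, n_flag):
    pc = branch_addr + 4        # steps 1-2: PC <- [PC] + 4, also copied into Y
    y = pc
    z = y + offset              # step 4: Offset-field-of-IRout, Add, Zin
    if not n_flag:              # condition not met: End in step 4, PC unchanged
        return pc, 4
    pc = z                      # step 5: Zout, PCin, End
    return pc, 5

print(branch_on_negative(1000, 36, n_flag=True))   # (1040, 5)
print(branch_on_negative(1000, 36, n_flag=False))  # (1004, 4)
```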
Multi-Bus Structure
• All general-purpose registers are combined into a register file
  - The register file can be implemented in VLSI using an array of memory cells similar to those used in RAM chips
  - The register file has two outputs, allowing the contents of two registers to be placed on buses A and B simultaneously
• Compared to the single-bus organization, this organization requires fewer control steps (i.e., it is faster)
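[Figure: three-bus organization. The register file, PC, IR, MDR and MAR connect to buses A, B and C; a MUX selects the constant 4 or bus A as ALU input A, bus B drives ALU input B, and the ALU result R goes onto bus C; an incrementer updates the PC; MAR drives the address lines and MDR the data lines of the memory bus; the instruction decoder generates the controls]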
Multiple Bus Operation Example: Add R4,R5,R6
Control sequence for the instruction Add R4,R5,R6:
Step  Action
1     PCout, R=B, MARin, Read, IncPC
2     WMFC
3     MDRoutB, R=B, IRin
4     R4outA, R5outB, SelectA, Add, R6in, End
• Steps 1 to 3: instruction fetch (R=B: the ALU passes its B input through unchanged)
• Step 4: addition
• Buses A and B are used to transfer the source operands; bus C is used to transfer the destination
• The path from the source to the destination goes through the ALU (where the operation is performed)
  - Copies of one register to another also go through the ALU
• Temporary storage registers (Y, Z) are not needed
• Ensuring that a register can serve as both a source and a destination is not possible if the registers are simple latches: the register file must be implemented using edge-triggered master-slave flip-flops
• The three-bus architecture allows a register-to-register operation to be executed in a single clock cycle
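A three-bus register file can be modelled as two read ports and one write port used in the same cycle. Reading both sources before writing the destination mimics the edge-triggered behaviour that lets one register be both source and destination. `RegisterFile` and its `cycle` method are hypothetical names for this sketch.

```python
# Register file with two read ports (buses A, B) and one write port (bus C): a sketch.

class RegisterFile:
    def __init__(self, values):
        self.regs = dict(values)

    def cycle(self, src_a, src_b, dst, alu):
        bus_a = self.regs[src_a]        # RxoutA: old value read during the cycle
        bus_b = self.regs[src_b]        # RyoutB: old value read during the cycle
        bus_c = alu(bus_a, bus_b)       # ALU result R placed on bus C
        self.regs[dst] = bus_c          # Rzin: written at the clock edge, after both reads
        return bus_c

rf = RegisterFile({"R4": 2, "R5": 3, "R6": 0})
rf.cycle("R4", "R5", "R6", lambda a, b: a + b)   # Add R4,R5,R6 in one cycle
rf.cycle("R6", "R6", "R6", lambda a, b: a + b)   # R6 as both source and destination
print(rf.regs["R6"])  # 10
```

On the single-bus datapath the same addition needs three execution steps (through Y and Z), which is the saving the three-bus organization buys.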
Enhancements
• Overlap the fetch and execute phases
  - Instruction unit: fetches instructions and places them into a queue, ready for execution
  - It generates memory addresses based on the address of the last instruction fetched
  - It attempts to prefetch the correct instruction on branches, based on a history of previous branches: prefetching with branch prediction
• Include a fast cache on the same chip as the CPU
  - This hides the memory response time
  - If the desired data is found in the cache, it is a cache hit; otherwise it is a cache miss
  - If a cache miss occurs, it is necessary to go to main memory
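The slides do not specify how the branch history is kept. One of the simplest possibilities, shown here as an assumption rather than the scheme the course uses, is a 1-bit table that predicts each branch will repeat its last outcome; `OneBitPredictor` is a hypothetical class.

```python
# A 1-bit history predictor (one simple scheme, assumed for illustration).

class OneBitPredictor:
    def __init__(self):
        self.last = {}                        # branch address -> last outcome (True = taken)

    def predict(self, pc):
        return self.last.get(pc, False)       # no history yet: predict not taken

    def update(self, pc, taken):
        self.last[pc] = taken

p = OneBitPredictor()
outcomes = [True, True, True, False, True]    # e.g., a loop-closing branch at address 0x40
hits = 0
for taken in outcomes:
    hits += p.predict(0x40) == taken
    p.update(0x40, taken)
print(hits)  # 2
```

Each loop exit costs two mispredictions with a 1-bit history (the exit itself and the next entry), which is why real designs often keep more history bits.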
Generating Control Signals
• To execute an instruction, the CPU must generate the control signals corresponding to the current instruction
• Two types of approaches:
  - Hard-wired
  - Microprogrammed
Hard-wired Control
[Figure: control unit organization. CLK drives a control step counter; the decoder/encoder combines the control step, the current instruction in IR, external inputs (e.g., MFC) and condition codes (e.g., the result of a previous computation) to produce the control signals]
• An instruction needs many steps, as shown previously
• Several non-overlapping time slots (i.e., steps) are required to execute an instruction
  - Each time slot must be long enough for the functions specified in the step to be completed
  - Assume all time slots are equal
• The control unit may be based on a counter driven by CLK
• The required control signals are uniquely determined by:
  - the contents of the control step counter
  - the contents of the instruction register (i.e., the instruction fetched)
  - the contents of the condition codes and other status flags (e.g., the MFC status signal)
• The decoder/encoder is a combinational circuit that generates the required control outputs depending on the state of all its inputs
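Separation of Decoding and Encoding Functions
[Figure: the control step counter (inputs CLK, Reset, Run and End) feeds a step decoder with outputs T1, T2, ..., Tn; IR feeds an instruction decoder with outputs INS1, INS2, ..., INSm; the encoder combines these lines with external inputs and condition codes to produce the control signals]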
• This diagram separates the decoding and encoding functions
  - The step decoder provides a separate signal line for each step in the control sequence
  - The output of the instruction decoder consists of a separate line for each machine instruction
  - All input signals to the encoder block are combined to generate the individual control signals (e.g., Yin, PCout, Add, End)
• Examples: generation of the Zin and End signals
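The counter and the two decoders can be sketched as one-hot functions. This is a minimal model, assuming a hypothetical subset of instructions and passing the encoder's End logic in as a function; `step_decoder`, `instruction_decoder` and `run_counter` are illustrative names.

```python
# Hard-wired control skeleton (sketch): step counter, one-hot step decoder, one-hot instruction decoder.

INSTRUCTIONS = ["ADD", "BR", "BRN"]          # hypothetical subset

def step_decoder(count, n=8):
    """Exactly one of T1..Tn is 1: the line for the current counter value (T1 = count 0)."""
    return {f"T{i + 1}": int(i == count) for i in range(n)}

def instruction_decoder(opcode):
    """Exactly one INS line is 1: the line for the instruction held in IR."""
    return {ins: int(ins == opcode) for ins in INSTRUCTIONS}

def run_counter(end_signal, max_steps=8):
    """Clock the counter until the encoder's End output is 1 (hardware would then reset it to 0)."""
    trace = []
    for count in range(max_steps):
        t = step_decoder(count, max_steps)
        trace.append(count + 1)
        if end_signal(t):
            break
    return trace

ins = instruction_decoder("ADD")
trace = run_counter(lambda t: t["T7"] and ins["ADD"])   # End = T7 · ADD for this instruction
print(trace)  # [1, 2, 3, 4, 5, 6, 7]
```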
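Control Signals
[Figure: control signals acting on the single-bus datapath: Riin and Riout for each register Ri, Yin, Zin and Zout, the MUX Select input (Y or constant 4) for ALU input A, and the ALU control lines (Add, Sub, XOR, etc.)]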
Generation of the Zin Control Signal
• Example encoder structure: Zin = T1 + T6 · ADD + T4 · BR + ...
• Zin is turned on during:
  - slot T1 for all instructions
  - slot T6 for an ADD instruction (e.g., Add (R3),R1)
  - slot T4 for an unconditional branch
[Figure: OR gate with output Zin; its inputs are T1 and AND gates combining T6 with Add and T4 with Branch]
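The encoder term for Zin can be checked against the two control sequences given in this chapter: it should be true in exactly the steps where the tables list Zin. This sketch covers only ADD (R3),R1 and the unconditional branch; terms for other instructions are omitted, as the "+ ..." indicates.

```python
# Zin = T1 + T6 · ADD + T4 · BR, checked against the control sequences (sketch).

SEQUENCES = {
    "ADD": [{"PCout", "MARin", "Read", "Select4", "Add", "Zin"}, {"Zout", "PCin", "Yin", "WMFC"},
            {"MDRout", "IRin"}, {"R3out", "MARin", "Read"}, {"R1out", "Yin", "WMFC"},
            {"MDRout", "SelectY", "Add", "Zin"}, {"Zout", "R1in", "End"}],
    "BR": [{"PCout", "MARin", "Read", "Select4", "Add", "Zin"}, {"Zout", "PCin", "Yin", "WMFC"},
           {"MDRout", "IRin"}, {"Offset-field-of-IRout", "Add", "Zin"}, {"Zout", "PCin", "End"}],
}

def zin(t, ins):
    """Zin for time slot t of instruction ins."""
    return t == 1 or (t == 6 and ins == "ADD") or (t == 4 and ins == "BR")

for ins, seq in SEQUENCES.items():
    for t, signals in enumerate(seq, start=1):
        assert zin(t, ins) == ("Zin" in signals), (ins, t)
print("Zin equation matches both control sequences")
```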
Example: Add (R3),R1
Control sequence for instruction Add (R3),R1:
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End
(Yin in step 2 is there because steps 1 to 3 are common to all instructions; a branch instruction, for example, needs the updated PC in Y)
Generation of the End Control Signal
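• Example encoder structure: End = T7 · ADD + T5 · BR + (T5 · N + T4 · N') · BRN + ...
  - For BRN (branch if negative, the Branch<0 case), End is asserted at T5 when N = 1 (branch taken) and at T4 when N = 0 (branch not taken)
[Figure: OR gate with output End; its inputs are T7 · Add, T5 · Branch, and the Branch<0 case built from T5 · N and T4 · N']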
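The End equation determines how many control steps each instruction takes. The sketch evaluates it slot by slot; `end_signal` and `last_step` are hypothetical helpers, and instructions other than ADD, BR and BRN are omitted.

```python
# End = T7 · ADD + T5 · BR + (T5 · N + T4 · N') · BRN, evaluated per time slot (sketch).

def end_signal(t, ins, n=0):
    return bool(
        (t == 7 and ins == "ADD")
        or (t == 5 and ins == "BR")
        or (ins == "BRN" and ((t == 5 and n) or (t == 4 and not n)))
    )

def last_step(ins, n=0, max_steps=10):
    """First time slot in which End is asserted."""
    return next(t for t in range(1, max_steps + 1) if end_signal(t, ins, n))

print(last_step("ADD"), last_step("BR"), last_step("BRN", n=1), last_step("BRN", n=0))  # 7 5 5 4
```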
A Complete CPU
[Figure: processor containing an instruction unit, integer unit, floating-point unit, instruction cache, data cache and bus interface; the bus interface connects through the system bus to main memory and input/output]
• The instruction unit fetches instructions from an instruction cache, or from main memory on a cache miss
• Separate processing units deal with integer and floating-point data
• A data cache sits between the processing units and main memory
• Separate caches for instructions and data (split cache)
  - Other processors may have one cache for both data and instructions (unified cache)
• The CPU is connected to the system bus (the rest of the computer) through a bus interface
• Alternatives
  - More than two processing units: several units of the same type to increase parallelism
  - Processors that execute instructions at a rate faster than one instruction per cycle are called superscalar
Microprogrammed Control Approach
Figure 7.15 An example of microinstructions for Figure 7.6 (the control steps of Add (R3), R1).

Micro-instruction PCin PCout MARin Read MDRout IRin Yin Select Add Zin Zout R1out R1in R3out WMFC End
1                 0    1     1     1    0      0    0   1      1   1   0    0     0    0     0    0
2                 1    0     0     0    0      0    1   0      0   0   1    0     0    0     1    0
3                 0    0     0     0    1      1    0   0      0   0   0    0     0    0     0    0
4                 0    0     1     1    0      0    0   0      0   0   0    0     0    1     0    0
5                 0    0     0     0    0      0    1   0      0   0   0    1     0    0     1    0
6                 0    0     0     0    1      0    0   0      1   1   0    0     0    0     0    0
7                 0    0     0     0    0      0    0   0      0   0   1    0     1    0     0    1
(Select = 1 selects the constant 4; Select = 0 selects Y)
Basic Organization of a Microprogrammed Control Unit
[Figure: IR feeds the starting address generator, which loads the µPC; the µPC, advanced by the clock, addresses the control store, whose output is the control word (CW). The control store shown holds the seven 16-bit microinstructions of Figure 7.15]
Microprogrammed Control
• Control signals are generated by a program similar to machine language programs
• Individual bits of a control word (CW) correspond to control signals
• Each control step defines a unique combination of 1s and 0s in the CW
• Microroutine: a sequence of CWs corresponding to the control sequence of a single machine instruction
• Individual control words are called microinstructions
• Analogy: microroutine ≈ subroutine; microinstruction ≈ instruction; microprogram counter ≈ program counter
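With one bit per control signal, a control word is just a bit string and decoding it means listing the positions that are 1. The sketch stores the seven CWs of Figure 7.15 as strings in the figure's column order; `CONTROL_STORE` and `active_signals` are hypothetical names.

```python
# Control words of Figure 7.15 (Add (R3),R1), one bit per signal, decoded into signal names.
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

CONTROL_STORE = [
    "0111000111000000",   # 1: PCout, MARin, Read, Select (=4), Add, Zin
    "1000001000100010",   # 2: PCin, Yin, Zout, WMFC
    "0000110000000000",   # 3: MDRout, IRin
    "0011000000000100",   # 4: MARin, Read, R3out
    "0000001000010010",   # 5: Yin, R1out, WMFC
    "0000100011000000",   # 6: MDRout, Select (=Y), Add, Zin
    "0000000000101001",   # 7: Zout, R1in, End
]

def active_signals(cw):
    """Names of the control signals whose bit is 1 in this control word."""
    return [name for name, bit in zip(SIGNALS, cw) if bit == "1"]

for upc, cw in enumerate(CONTROL_STORE, start=1):
    print(upc, active_signals(cw))
```

The long, mostly-zero words are the cost of this one-bit-per-signal layout, which the field-encoded format later in the chapter addresses.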
Basic Organization of a Microprogrammed Control Unit
• Assume that the microroutines for all instructions are stored in a special memory called a control store
• The control unit can generate the control signals for any instruction by sequentially reading the CWs of the corresponding microroutine
• A microprogram counter (µPC) is used to point to the next microinstruction
• When a new instruction is fetched into IR, the starting address generator loads the starting address of the corresponding microroutine into the µPC
• The µPC is incremented to access successive microinstructions
Branch Instructions
• How does the control unit check the status of the condition flags or status flags on conditional branches?
  - The microinstruction set needs to be expanded to include conditional branch microinstructions
  - In addition to the branch address, these microinstructions specify the flag or bit that should be checked as a condition
• Example: microroutine for the instruction Branch on negative
Address  Microinstruction
0        PCout, MARin, Read, Select4, Add, Zin
1        Zout, PCin, Yin, WMFC
2        MDRout, IRin
3        Branch to the starting address of the appropriate microroutine
...      ...
25       If N=0, then branch to microinstruction 0
26       Offset-field-of-IRout, SelectY, Add, Zin
27       Zout, PCin, End
• After the instruction is loaded into IR, a branch microinstruction transfers control to the microroutine starting at location 25
Allowing Conditional Branch in Microprogram
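[Figure: IR, external inputs and condition codes feed the starting and branch address generator, which loads the µPC; the µPC, advanced by the clock, addresses the control store, whose output is the control word (CW)]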
• Support for microprogram branching: the starting and branch address generator
  - This block loads a new address into the µPC when a microinstruction requires a branch
  - Inputs to the block include the status flags, condition flags and IR
• The µPC is incremented by one every time, except in the following situations:
  - When a new instruction is loaded into IR, the µPC is loaded with the starting address of the microroutine for that instruction
  - When a branch microinstruction is encountered and the branch condition is satisfied, the µPC is loaded with the branch address
  - When an End microinstruction is encountered, the µPC is loaded with the address of the first microinstruction (i.e., address 0) so that a new instruction is fetched into IR
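The µPC update rules can be written as one next-address function and traced over the Branch-on-negative microroutine (fetch at addresses 0 to 3, BRN at 25 to 27). The microinstruction encoding (dicts with `dispatch`, `branch_if_n0` and `end` keys), the starting-address table and the helper names are hypothetical.

```python
# Next-µPC logic of a microprogrammed control unit (sketch with a hypothetical encoding).

MICROPROGRAM = {
    0: {"signals": "PCout, MARin, Read, Select4, Add, Zin"},
    1: {"signals": "Zout, PCin, Yin, WMFC"},
    2: {"signals": "MDRout, IRin"},
    3: {"dispatch": True},                     # branch to the starting address for IR
    25: {"branch_if_n0": 0},                   # if N = 0 then branch to microinstruction 0
    26: {"signals": "Offset-field-of-IRout, SelectY, Add, Zin"},
    27: {"signals": "Zout, PCin, End", "end": True},
}
STARTING_ADDRESS = {"BRN": 25}

def next_upc(upc, mi, opcode, n_flag):
    if mi.get("end"):
        return 0                               # End: fetch the next instruction
    if mi.get("dispatch"):
        return STARTING_ADDRESS[opcode]        # start of the microroutine for IR
    if "branch_if_n0" in mi and n_flag == 0:
        return mi["branch_if_n0"]              # conditional branch satisfied
    return upc + 1                             # default: increment

def trace(opcode, n_flag):
    """Microinstruction addresses visited while executing one instruction."""
    upc, visited = 0, []
    while True:
        visited.append(upc)
        nxt = next_upc(upc, MICROPROGRAM[upc], opcode, n_flag)
        if nxt == 0:
            return visited
        upc = nxt

print(trace("BRN", n_flag=1))  # [0, 1, 2, 3, 25, 26, 27]
print(trace("BRN", n_flag=0))  # [0, 1, 2, 3, 25]
```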
Implementation of Microinstructions
• 1st design: assign one bit position to each control signal
  - This results in long microinstructions
  - Only a few bits are set to 1 in any given microinstruction
• Example: the single-bus organization
  - 4 general-purpose registers
  - Some of the connections to the CPU are permanently enabled: the output of IR to the decoding circuit, and the two inputs of the ALU
  - A total of 20 gating signals are needed
  - Additional signals include: Read, Write, Clear Y, Set Carry-in, WMFC and End
  - Signals to specify which ALU operation to perform: 16 operations, so 16 bits
  - Total: 20 + 6 + 16 = 42 bits of control signals
Microinstructions
An alternative: encoded control signals
- Most signals are not needed simultaneously
- Many signals are mutually exclusive:
  - Only one function of the ALU is needed at a time
  - Read and Write signals to memory cannot be active at the same time
  - The source for a data transfer must be unique: the contents of two registers cannot be gated onto a single bus simultaneously
- Signals can be grouped so that mutually exclusive signals are placed in the same group:
  - 4 bits are needed to represent the 16 functions of the ALU
  - Register output control signals can form one group consisting of PCout, MDRout, Zout, Addressout, R0out, R1out, R2out, R3out and TEMPout: encoded with 4 bits
- Control signals can be grouped and encoded to reduce the number of bits in microinstructions
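The bit counts above follow from a simple rule: a group of n mutually exclusive choices needs ⌈log2 n⌉ bits, counting one extra code for "no transfer" when the group can be idle. A minimal Python sketch of that arithmetic; the helper name `field_width` is hypothetical, and the signal list is the register-output group from this slide.

```python
def field_width(n_codes: int) -> int:
    """Smallest number of bits that gives n_codes choices distinct codes."""
    return max(1, (n_codes - 1).bit_length())

# ALU group: 16 functions; no inactive code, because the ALU is always active
alu_bits = field_width(16)

# Register-output group listed above, plus one "no transfer" code
reg_out = ["PCout", "MDRout", "Zout", "Addressout",
           "R0out", "R1out", "R2out", "R3out", "TEMPout"]
reg_out_bits = field_width(len(reg_out) + 1)

# One bit per signal versus encoded fields, for these two groups only
print("one-hot:", 16 + len(reg_out), "bits; encoded:", alu_bits + reg_out_bits, "bits")
```

For these two groups alone, encoding shrinks 25 one-hot bits to 8, at the cost of a decoder per field.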
Field-encoded Microinstructions
Microinstruction: F1 F2 F3 F4 F5 F6 F7 F8
F1 (4 bits): 0000 No transfer, 0001 PCout, 0010 MDRout, 0011 Zout, 0100 R0out, 0101 R1out, 0110 R2out, 0111 R3out, 1010 TEMPout, 1011 Offsetout
F2 (3 bits): 000 No transfer, 001 PCin, 010 IRin, 011 Zin, 100 R0in, 101 R1in, 110 R2in, 111 R3in
F3 (3 bits): 000 No transfer, 001 MARin, 010 MDRin, 011 TEMPin, 100 Yin
F4 (4 bits): 0000 Add, 0001 Sub, ..., 1111 XOR (16 ALU functions)
F5 (2 bits): 00 No action, 01 Read, 10 Write
F6 (1 bit): 0 SelectY, 1 Select4
F7 (1 bit): 0 No action, 1 WMFC
F8 (1 bit): 0 Continue, 1 End
Total: 4 + 3 + 3 + 4 + 2 + 1 + 1 + 1 = 19 bits
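To see how such a word is assembled and split apart again, here is a minimal Python sketch. It assumes F1 occupies the most significant bits (the slide does not fix a bit order), and the example step "PCout, MARin, Read, Select4, Add, Zin" is illustrative. The field widths listed sum to 19 bits.

```python
# Field widths of the field-encoded format, in the order F1..F8.
# Assumption: F1 occupies the most significant bits of the word.
FIELDS = [("F1", 4), ("F2", 3), ("F3", 3), ("F4", 4),
          ("F5", 2), ("F6", 1), ("F7", 1), ("F8", 1)]
WIDTH = sum(width for _, width in FIELDS)   # 19 bits for the widths listed

def pack(values: dict) -> int:
    """Concatenate the field codes into one microinstruction word."""
    word = 0
    for name, width in FIELDS:
        code = values[name]
        if not 0 <= code < (1 << width):
            raise ValueError(f"{name} code {code} does not fit in {width} bits")
        word = (word << width) | code
    return word

def unpack(word: int) -> dict:
    """Split a word back into its fields (what the per-field decoders see)."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values

# Illustrative step: PCout, MARin, Read, Select4, Add, Zin
step = {"F1": 0b0001,  # PCout
        "F2": 0b011,   # Zin
        "F3": 0b001,   # MARin
        "F4": 0b0000,  # Add
        "F5": 0b01,    # Read
        "F6": 1,       # Select4
        "F7": 0,       # no WMFC
        "F8": 0}       # Continue
print(format(pack(step), f"0{WIDTH}b"))  # 0001011001000001100
```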
Field-encoded Microinstructions
Most fields must include one inactive code for the case where no action is required
No inactive code is reserved in the ALU field; thus the ALU is active at all times, and the control on Zin ensures that the result of an operation is gated into Z only when appropriate
Grouping control signals requires more hardware to decode the bit patterns
The cost of the additional hardware is amortized by having a smaller control store
Microprogram Sequencing
Each machine instruction is implemented by a microroutine
- A microroutine is entered by decoding an instruction into a starting address that is loaded into the µPC
- Branching capabilities are introduced through branch microinstructions
Having a separate microroutine for each machine instruction leads to a large control store
- There are several instructions and several addressing modes
Organize the microprogram so that microroutines share as many common parts as possible
- Sharing common parts requires several branch microinstructions
- A longer time is needed to execute branch microinstructions
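As a concrete model of µPC sequencing, here is a minimal Python sketch: µPC normally increments, a branch microinstruction loads a computed address, and End finishes the microroutine. The class `MicroInstr`, the function `trace_microroutine`, and the addresses in the toy control store (including the stand-in start address 040) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class MicroInstr:
    signals: str                                 # control signals asserted in this step
    branch: Optional[Callable[[], int]] = None   # set only for a branch microinstruction
    end: bool = False                            # End: the microroutine is finished

def trace_microroutine(store: Dict[int, MicroInstr], start: int = 0,
                       max_steps: int = 64) -> List[int]:
    """Return the µPC values visited, from start up to the microinstruction asserting End."""
    upc, visited = start, []
    for _ in range(max_steps):
        visited.append(upc)
        mi = store[upc]
        if mi.end:
            return visited   # the hardware would now reset µPC to the fetch microroutine
        # Sequential execution increments µPC; a branch loads a computed address instead.
        upc = mi.branch() if mi.branch is not None else upc + 1
    raise RuntimeError("End was not reached within max_steps")

# Hypothetical control store: a shared fetch routine at 000..003 branching to one microroutine.
decoded_start = 0o40   # stand-in for the address produced by decoding the instruction
store = {
    0o00: MicroInstr("PCout, MARin, Read, Clear Y, Set carry-in, Add, Zin"),
    0o01: MicroInstr("Zout, PCin, WMFC"),
    0o02: MicroInstr("MDRout, IRin"),
    0o03: MicroInstr("Branch to decoded start", branch=lambda: decoded_start),
    0o40: MicroInstr("first step of the microroutine"),
    0o41: MicroInstr("last step of the microroutine", end=True),
}
print([format(a, "03o") for a in trace_microroutine(store)])
```

Every microroutine shares the fetch steps at 000 to 003, which is the kind of sharing the slide recommends; the price is the extra branch step at 003.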
Example: ADD src, Rdst
Assume that the source operand can be specified using register, autoincrement, autodecrement and indexed modes, as well as the indirect forms of all of these modes
A suitable microprogram will combine all the modes (see next slide)
Branch Addressing
Branch address modification using bit-ORing
- Branches are not always made to a single branch address; this is a direct consequence of combining microroutines
- At point α of the previous example, it is necessary to choose between the actions required by the direct and indirect addressing modes
  - Indirect mode: microinstruction at location 170 (fetch an operand from memory)
  - Direct mode: microinstruction at location 171 (fetching an operand is bypassed)
- Efficient branching: the bit-ORing technique
  - Have the preceding microinstruction specify 170
  - Use an OR gate to change the least significant bit of 170 to 1 if the addressing mode is direct
Wide branch addressing
- Generating branch addresses makes the circuitry more complex
- E.g., when the machine instruction fetch is completed, an appropriate microroutine should be selected according to the addressing mode
- A simple and inexpensive way of generating the required branch addresses is to use a PLA
- The OP code of a machine instruction is translated into a starting address
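The bit-ORing branch at point α fits in a few lines of Python. The names `bit_or_branch` and `target_at_alpha` are hypothetical; the addresses 170 and 171 (octal) come from the example above. The check in `bit_or_branch` shows why the preceding microinstruction specifies 170 rather than 171: an OR gate can only turn 0s into 1s.

```python
def bit_or_branch(base: int, modifier: int) -> int:
    """Bit-ORing: form a branch target by ORing modifier bits into a base address.

    Every bit the modifier may set must be 0 in the base address.
    """
    if base & modifier:
        raise ValueError("modifier sets a bit that is already 1 in the base address")
    return base | modifier

def target_at_alpha(indirect_bit: int) -> int:
    # The preceding microinstruction specifies 170 (octal); the OR gate sets the
    # least significant bit when the mode is direct (indirect bit = 0).
    return bit_or_branch(0o170, 1 - indirect_bit)

print(format(target_at_alpha(1), "o"), format(target_at_alpha(0), "o"))  # 170 171
```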
Detailed Example: ADD (Rsrc)+, Rdst
Address (octal)  Microinstruction
000  PCout, MARin, Read, Clear Y, Set carry-in, Add, Zin
001  Zout, PCin, WMFC
002  MDRout, IRin
003  Branch {µPC ← 101 (from instruction decoder); µPC_{5,4} ← [IR_{10,9}]; µPC_3 ← [IR_10]' · [IR_9]' · [IR_8]}
121  Rsrcout, MARin, Read, Clear Y, Set carry-in, Add, Zin
122  Zout, Rsrcin
123  Branch {µPC ← 170; µPC_0 ← [IR_8]'}, WMFC
170  MDRout, MARin, Read, WMFC
171  MDRout, Yin
172  Rdstout, Add, Zin
173  Zout, Rdstin, End
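The µPC path through this table can be checked with a small Python sketch. It models only the two branch rules (at 003 and 123) and the autoincrement path shown; the function names, and the convention that the IR is passed as an integer with the source mode in bits 10 to 8, are assumptions for illustration.

```python
def branch_003(ir: int) -> int:
    """µPC <- 101 (octal, from the instruction decoder), then OR in
    µPC_{5,4} <- IR_{10,9} and µPC_3 <- IR_10' . IR_9' . IR_8."""
    ir10, ir9, ir8 = (ir >> 10) & 1, (ir >> 9) & 1, (ir >> 8) & 1
    register_indirect = (1 - ir10) & (1 - ir9) & ir8
    return 0o101 | (ir10 << 5) | (ir9 << 4) | (register_indirect << 3)

def branch_123(ir: int) -> int:
    """µPC <- 170 (octal) with µPC_0 <- IR_8': direct mode skips the operand fetch at 170."""
    return 0o170 | (1 - ((ir >> 8) & 1))

def add_autoinc_path(ir: int) -> list:
    """Octal addresses visited for ADD (Rsrc)+, Rdst, following the table above."""
    path = [0o000, 0o001, 0o002, 0o003]
    if branch_003(ir) != 0o121:
        raise ValueError("only the autoincrement modes (IR_{10..8} = 01x) are modelled")
    path += [0o121, 0o122, 0o123]
    path += list(range(branch_123(ir), 0o174))   # fall through to 173, which asserts End
    return path

direct = 0b010 << 8     # autoincrement, IR_8 = 0
indirect = 0b011 << 8   # indirect autoincrement, IR_8 = 1
for ir in (direct, indirect):
    print([format(a, "03o") for a in add_autoinc_path(ir)])
```

The two paths differ only after 123: the indirect form visits 170 to fetch the operand, while the direct form enters at 171.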
Detailed Example: ADD (Rsrc)+, Rdst
A 3-bit field specifies the addressing mode for the source operand
- Bits 10 and 9 denote indexed (11), autodecrement (10), autoincrement (01) and register (00) modes
- Bit 8 specifies the indirect version of the addressing mode
- E.g., 010: the direct version of autoincrement
Assume the CPU has 16 registers that can be used for addressing purposes
- Bits 7 through 4 specify the source operand register (Rsrc)
- Bits 3 through 0 specify the destination operand register (Rdst)
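Extracting these fields from IR is a matter of shifts and masks. A minimal Python sketch; `decode_operands` is a hypothetical helper, and bits above bit 10 (the OP code) are ignored because the slides do not specify their layout.

```python
# Source-mode names for IR_{10,9}, from the slide.
MODES = {0b00: "register", 0b01: "autoincrement",
         0b10: "autodecrement", 0b11: "indexed"}

def decode_operands(ir: int) -> dict:
    """Split the IR into the source-mode, Rsrc and Rdst fields (bits above 10 ignored)."""
    return {
        "mode": MODES[(ir >> 9) & 0b11],   # IR_{10,9}
        "indirect": bool((ir >> 8) & 1),   # IR_8
        "rsrc": (ir >> 4) & 0xF,           # IR_{7..4}
        "rdst": ir & 0xF,                  # IR_{3..0}
    }

# ADD (R5)+, R3: direct autoincrement (010), Rsrc = 5, Rdst = 3
ir = (0b010 << 8) | (5 << 4) | 3
print(decode_operands(ir))
```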
Detailed Example: ADD (Rsrc)+, Rdst
Any of the 16 general-purpose registers may be involved in determining the source and destination operands
Microinstructions refer to register control signals only as Rsrcout, Rsrcin, Rdstout and Rdstin
These signals are translated into a specific register by the decoding circuit connected to the Rsrc and Rdst address fields of IR
This requires two-level decoding:
- The microinstruction field must be decoded to determine that an Rsrc or Rdst register is involved
- The decoded output is used to gate the contents of the Rsrc or Rdst field in IR into a second decoder, which produces the gating signals for the actual registers R0 through R15
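The second decoding level can be sketched in Python as a function that takes the generic signals produced by the first level and the IR, and returns the gating signals for the actual registers. The name `register_gates` and the string encoding of signals are hypothetical; the example reuses microinstruction 122 and the register part of 172 from the table.

```python
def register_gates(signals: set, ir: int) -> set:
    """Second-level decoding: map the generic Rsrc/Rdst signals produced by the
    first-level (microinstruction field) decoder onto R0..R15 gating signals.
    Non-register signals such as Zout are handled elsewhere and ignored here."""
    fields = {"Rsrc": (ir >> 4) & 0xF,   # IR_{7..4}
              "Rdst": ir & 0xF}          # IR_{3..0}
    gates = set()
    for sig in signals:
        for name, reg in fields.items():
            if sig in (name + "out", name + "in"):
                gates.add(f"R{reg}{sig[len(name):]}")   # e.g. Rsrcin -> R5in
    return gates

ir = (0b010 << 8) | (5 << 4) | 3               # ADD (R5)+, R3
print(register_gates({"Zout", "Rsrcin"}, ir))  # microinstruction 122
print(register_gates({"Rdstout"}, ir))         # register part of microinstruction 172
```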
Detailed Example: ADD (Rsrc)+, Rdst Using the Bit-ORing Scheme
Consider address 123:
123  Branch {µPC ← 170; µPC_0 ← [IR_8]'}, WMFC
- The unmodified version causes a branch to location 170
- When a direct addressing mode appears, the operand fetch is bypassed by ORing the inverse of the indirect bit in the source address field (bit 8 of IR) into bit position 0 of the µPC
Consider address 003:
003  Branch {µPC ← 101 (from instruction decoder); µPC_{5,4} ← [IR_{10,9}]; µPC_3 ← [IR_10]' · [IR_9]' · [IR_8]}
- The five branch addresses differ in the middle octal digit only
- The octal pattern 101 is obtained from the PLA
- The 3 bits to be ORed with the middle octal digit are supplied by the decoding circuitry connected to the source address mode field (bits 8, 9 and 10 of IR)
- Bits 4 and 5 of the µPC are set directly from bits 9 and 10 of IR
- These bits select the appropriate microinstruction for all source addressing modes except register indirect mode
- Register indirect mode: bit 3 of the µPC is set to 1 using the AND of [IR_10]', [IR_9]' and [IR_8]
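Enumerating all eight values of the source mode field confirms that the rule at 003 yields exactly five targets differing only in the middle octal digit. A minimal Python sketch; `target_003` is a hypothetical name.

```python
def target_003(mode: int) -> int:
    """µPC after microinstruction 003 for a 3-bit source mode (IR_10 IR_9 IR_8)."""
    ir10, ir9, ir8 = (mode >> 2) & 1, (mode >> 1) & 1, mode & 1
    # Middle octal digit = µPC_{5,4,3}: IR_10, IR_9, and the register-indirect AND term
    middle = (ir10 << 2) | (ir9 << 1) | ((1 - ir10) & (1 - ir9) & ir8)
    return 0o101 | (middle << 3)

for mode in range(8):
    print(format(mode, "03b"), "->", format(target_003(mode), "03o"))
```

The targets are 101, 111, 121, 141 and 161. For autoincrement, autodecrement and indexed modes the direct and indirect forms share a target at this point; in the autoincrement path, the indirect choice is made later by the bit-ORing at 123.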
Microinstructions with Next-Address Field
The previous microprogram requires several branch microinstructions
- This reduces the operating speed of the computer
A powerful alternative is to include an address field as part of every microinstruction to indicate the location of the next microinstruction
- Thus, every microinstruction becomes a branch
- Advantage: flexibility
- Disadvantage: the expense of the additional bits for the address field
- Typical microprogram: 4K microinstructions with 50 to 80 bits per microinstruction, so a 12-bit address field is needed
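The 12-bit figure and its relative cost can be checked with a short Python calculation. The helper name `address_field_bits` is hypothetical; the 4K size and the 50 to 80-bit widths come from the slide.

```python
def address_field_bits(control_store_words: int) -> int:
    """Bits needed in the next-address field to reach every control-store word."""
    return (control_store_words - 1).bit_length()

words = 4 * 1024                  # 4K microinstructions
bits = address_field_bits(words)  # 12
for width in (50, 80):
    print(f"{width}-bit microinstruction: {bits} extra bits ({bits / width:.0%} longer), "
          f"{words * bits} extra bits of control store")
```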
Microinstruction Sequencing
[Figure: microinstruction sequencing with a next-address field. Labels: IR, condition codes, external inputs, decoding circuits, µAR, control store, next address, microinstruction decoder, control signals]
Microinstruction Sequencing
Advantage: separate branch microinstructions are virtually eliminated, which makes this scheme very attractive
The µPC is replaced by a microinstruction address register (µAR)
The µAR holds the address of the next microinstruction
A new control structure supports both the next-address field and bit-ORing
The decoding circuit includes a PLA decoder that is used to generate the starting address of a given microroutine on the basis of the OP code field in the IR
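A minimal Python model of µAR sequencing: there is no incrementer, each microinstruction names its successor, and one bit selects the PLA output instead. The class `NextAddrInstr`, the function `trace_uar`, the OP code 0001 and the start address 040 in the PLA table are all hypothetical.

```python
from typing import Dict, List, NamedTuple

class NextAddrInstr(NamedTuple):
    signals: str
    next_addr: int           # next-address field of the microinstruction
    inst_dec: bool = False   # gate the PLA output into µAR instead of next_addr

def trace_uar(store: Dict[int, NextAddrInstr], pla: Dict[int, int],
              opcode: int, steps: int) -> List[int]:
    """Return the µAR values for a fixed number of steps."""
    uar, visited = 0, []
    for _ in range(steps):
        visited.append(uar)
        mi = store[uar]
        # Every microinstruction names its successor; InstDec substitutes the PLA output.
        uar = pla[opcode] if mi.inst_dec else mi.next_addr
    return visited

pla = {0b0001: 0o40}   # hypothetical OP code -> microroutine start address
store = {
    0o00: NextAddrInstr("PCout, MARin, Read, Clear Y, Set carry-in, Add, Zin", 0o01),
    0o01: NextAddrInstr("Zout, PCin, WMFC", 0o02),
    0o02: NextAddrInstr("MDRout, IRin", 0, inst_dec=True),
    0o40: NextAddrInstr("first step of the microroutine", 0o41),
    0o41: NextAddrInstr("last step: next address is the fetch routine", 0o00),
}
print([format(a, "03o") for a in trace_uar(store, pla, opcode=0b0001, steps=6)])
```

Because the last microinstruction simply names address 000, no separate branch step is needed to return to the fetch routine.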
Microprogram Sequencing
Example: ADD (Rsrc)+, Rdst
- Rsrc and Rdst are used instead of referring to registers R0 through R15 explicitly
- The actual control signals can be decoded using the data in the src and dst fields of IR
Microinstruction 003
- Bit-ORing is used to determine the next microinstruction based on the addressing mode of the source operand
- The addressing mode is indicated by bits 8, 9 and 10 of IR
- Let ORmode control whether or not this bit-ORing is used
Microinstructions 123, 143 and 166
- Bit-ORing is used to decide whether indirect addressing of the source operand is used
- The ORindsrc signal is used for this purpose
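Combining the next-address field with the two bit-ORing controls gives a next-µAR function like the one below. This is a minimal Python sketch under a stated assumption: ORmode is taken to OR the same source-mode bits into µAR bits 5 to 3 that microinstruction 003 used, and ORindsrc to OR [IR_8]' into bit 0 as 123 did. The function name `next_uar` and its parameters are hypothetical.

```python
def next_uar(f0: int, inst_dec: bool, or_mode: bool, or_indsrc: bool,
             pla_out: int, ir: int) -> int:
    """Next µAR value in the next-address scheme with bit-ORing.

    Assumption: ORmode ORs the source-mode bits into µAR_{5..3} as 003 did,
    and ORindsrc ORs IR_8' into µAR_0 as 123 did.
    """
    ir10, ir9, ir8 = (ir >> 10) & 1, (ir >> 9) & 1, (ir >> 8) & 1
    addr = pla_out if inst_dec else f0           # choose the PLA output or the address field
    if or_mode:
        addr |= (ir10 << 5) | (ir9 << 4) | (((1 - ir10) & (1 - ir9) & ir8) << 3)
    if or_indsrc:
        addr |= 1 - ir8
    return addr

ir = 0b010 << 8   # direct autoincrement source
# After the IR is loaded: InstDec and ORmode, with PLA output 101 (octal)
print(format(next_uar(0, True, True, False, 0o101, ir), "03o"))     # 121
# A microinstruction whose address field is 170 and that sets ORindsrc
print(format(next_uar(0o170, False, False, True, 0, ir), "03o"))    # 171
```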
Format for Microinstructions
Microinstruction: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
F0 (8 bits): address of next microinstruction
F1 (3 bits): 000 No transfer, 001 PCout, 010 MDRout, 011 Zout, 100 Rsrcout, 101 Rdstout, 110 TEMPout
F2 (3 bits): 000 No transfer, 001 PCin, 010 IRin, 011 Zin, 100 Rsrcin, 101 Rdstin
F3 (3 bits): 000 No transfer, 001 MARin, 010 MDRin, 011 TEMPin, 100 Yin
F4 (4 bits): 0000 Add, 0001 Sub, ..., 1111 XOR
F5 (2 bits): 00 No action, 01 Read, 10 Write
F6 (1 bit): 0 SelectY, 1 Select4
F7 (1 bit): 0 No action, 1 WMFC
F8 (1 bit): 0 NextAdrs, 1 InstDec
F9 (1 bit): 0 No action, 1 ORmode
F10 (1 bit): 0 No action, 1 ORindsrc
Format for Microinstructions
PLA
- The PLA is used initially to decode the instruction OP codes
- One bit in the microinstruction is used to indicate when the output of the PLA is gated into the µAR
Address field
- Each microinstruction contains an 8-bit address field that holds the address of the next microinstruction
Implementation of the Microroutine
[Figure: microroutine for ADD (Rsrc)+, Rdst in the next-address format, listed by octal address with fields F0 through F10]
Implementation of the Microroutine
Microroutine for ADD (Rsrc)+, Rdst
- Fewer microinstructions are needed because branch microinstructions are no longer required
- Locations 003 and 123 have been combined with the microinstructions immediately preceding them
When microinstruction sequencing is controlled by a µPC, the End signal is used to reset the µPC to point to the starting address of the microroutine that fetches the next machine instruction
With the next-address scheme, that starting address is specified explicitly in the F0 field instead
Circuitry for the control signals using the next address field
[Figure: details of the bit-ORing circuitry for the control signals using the next-address field. Labels: control store (next address, F1, F2, F8, F9, F10, other control signals), microinstruction decoder, decoders, decoding circuits, IR (Rsrc and Rdst fields), condition codes, external inputs, µAR, InstDecout, ORmode, ORindsrc, Rsrcout, Rsrcin, Rdstout, Rdstin, R0in/R0out through R15in/R15out]
Prefetching Microinstructions
Drawback of microprogrammed control: slow operating speed
- Fetching microinstructions from the control store takes a long time
- Remedies: a fast control store, long microinstructions, prefetching
Problems with prefetching
- The next microinstruction may depend on the status flags and results of the current microinstruction
- A wrong microinstruction may be prefetched, and the fetch must then be repeated with the correct address
The disadvantages are minor, and prefetching is often used
Emulation
Microprogrammed control provides a simple, flexible and inexpensive way of executing machine instructions
- It allows diverse classes of instructions to be implemented
It is possible to define additional machine instructions and implement them with microroutines
We can add the instruction set of a different computer
- A given computer can emulate the instructions of a different computer
- No software changes need to be made to legacy programs
- Emulation facilitates the transition to a new computer system with minimal effort
Example: the Pentium 4 internally translates x86 CISC instructions into RISC-like micro-operations
Pentium 4
Conclusion
Speed: hardwired approach
Flexibility: microprogrammed approach
Most present-day processors use hardwired control