problem 1a - ida

29
Problem 1a Problem 1a Consider two different implementations, M1 and M2, of the same f ( instruction set. There are three classes of instructions (A, B, and C) in the instruction set. For a given program, the average number of cycles for each instruction class is shown below The number of cycles for each instruction class is shown below . The table also shows how many instructions of a given class are in the program, as a percentage. E.g., if there are 100 instructions in total, there are 60 Class A instructions. Calculate the average CPI (Clocks Cycles per Instruction) for computer M1 and M2. 1

Upload: others

Post on 03-Nov-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Problem 1a - IDA

Problem 1aProblem 1a Consider two different implementations, M1 and M2, of the same

f (instruction set. There are three classes of instructions (A, B, andC) in the instruction set. For a given program, the averagenumber of cycles for each instruction class is shown below Thenumber of cycles for each instruction class is shown below. Thetable also shows how many instructions of a given class are in theprogram, as a percentage. E.g., if there are 100 instructions inp g p g gtotal, there are 60 Class A instructions. Calculate the averageCPI (Clocks Cycles per Instruction) for computer M1 and M2.

1

Page 2: Problem 1a - IDA

Problem 1b Problem 1b

C A h ll CPI f 1 3 d b Computer A has an overall CPI of 1.3 and can be run at a clock rate of 600MHz. Computer B has a CPI f 2 5 d b l k f 750 CPI of 2.5 and can be run at a clock rate of 750 Mhz. We have a particular program we wish to

Wh il d f t A thi run. When compiled for computer A, this program has exactly 100,000 instructions. How many instructions would the program need to many instructions would the program need to have when compiled for Computer B, in order for the two computers to have exactly the same for the two computers to have exactly the same execution time for this program?

2

Page 3: Problem 1a - IDA

Problem2a: Consider the instruction:  AND Rd, Rt, Rs. What are the 

Chapter 4 — The Processor — 3

control signals generated by the control unit in this figure ?

Page 4: Problem 1a - IDA

ANSWER : Regwrite asserted to write Mux (before ALUs) signaled to read from Mux (before ALUs) signaled to read from

registers and not immediate M (before re isters) si naled to se ALU and Mux (before registers) signaled to use ALU and

not data memoryALU l d f AND ALU is signaled to perform AND

Branch not asserted Memwrite not asserted Memread not assertedMemread not asserted

4

Page 5: Problem 1a - IDA

Problem 2bProblem 2b

Increment by 4 f t

32‐bit register

4 for next instruction

Chapter 4 — The Processor —5

Page 6: Problem 1a - IDA

Problem:  If the only thing we need to do in a processor is fetch consecutive instructions (see figure), what would the cycle time be ?Consider that the logic blocks have the following latencies. Other units have negligible latenciesunits have negligible latencies. 

I ‐Mem ADD Mux ALU Regs D‐Mem Sign‐Extend

Shift‐left‐2Extend 2

200 ps 70ps 20ps 90ps 90ps 250ps 15ps 10ps

Page 7: Problem 1a - IDA

ANSWER: 200ps because 200ps because I-Mem has larger latency than the Add unit I-Mem and Add are in parallel paths

7

Page 8: Problem 1a - IDA

Problem 2 cProblem 2 c

Consider the datapath shown in Figure. Assume that the processor has only one type of instruction: the unconditional jump instruction. Also, assume that there is NO pipelining Consider that the logic blocks have theNO pipelining. Consider that the logic blocks have the latencies as shown in Table 1. Other units have negligible latencies.latencies.

What would be the cycle‐time for this datapath?

8

Page 9: Problem 1a - IDA

9

Page 10: Problem 1a - IDA

Assembly: j exitj exit

Note: no registers!

10

Page 11: Problem 1a - IDA

RecallRecall

T k h 16 b ff d dd h dd Take the 16 bit offset and add it to the address of next instruction following the branch instruction to obtain the branch target address

Add the value in program counter (PC) and Add the value in program counter (PC) and the 16 bit offset

need an ADDer no need for ALU because we do need an ADDer, no need for ALU because we do not add to register values

Sign extend the16-bit offset to make it 32 bit like the size Sign extend the16-bit offset to make it 32 bit like the size of the PC

11

Page 12: Problem 1a - IDA

12

Page 13: Problem 1a - IDA

RecallRecall

Before adding, Shift left 2 places (word displacement)p ( p ) Add to PC + 4

• Already calculated by instruction fetch• Already calculated by instruction fetch

13

Page 14: Problem 1a - IDA

14

Page 15: Problem 1a - IDA

15

Page 16: Problem 1a - IDA

In parallel to instruction memory,  so no need to add 

QuestionQuestion mentions, negligible time for thisfor this

Unconditional jump does not need any comparison of values so registers are notcomparison of values, so registers are not used

Not used in this instruction

16

Page 17: Problem 1a - IDA

Problem 3Problem 3

In this question, assume that all branches are perfectly predicted (this eliminates all control hazards) Assume we have only one memory (foreliminates all control hazards). Assume, we have only one memory (for both instructions and data), and there might be a structural hazard. To resolve this, the pipeline must be stalled in some cycles. Show the pipelined execution and the number of cyclespipelined execution and the number of cycles.

SW  R2, 0(R3)OR R1 R2 R3OR  R1, R2, R3BEQ   R2, R0, Label  (Assume R2== R0)OR  R2, R2, R0Label :  ADD  R1, R4, R3, ,

17

Page 18: Problem 1a - IDA

SW R2 0(R3)SW  R2, 0(R3)OR  R1, R2, R3BEQ   R2, R0, Label  (Assume R2== R0)OR R2, R2, R0OR  R2, R2, R0Label :  ADD  R1, R4, R3

First, note the effect of the branch instruction

18

Page 19: Problem 1a - IDA

Recall The 5 Stagesg

Page 20: Problem 1a - IDA

•Draw the five stages for each instruction, while checking for structuralhazard and count 9 cycles as the result !hazard, and count 9 cycles as the result ! 

•Note that the question mentions that branch prediction is perfect andNote that the question mentions that branch prediction is perfect and there are no control hazards! 

20

Page 21: Problem 1a - IDA

Problem 4Problem 4SW  R16, 12(R6), ( )LW R16, 8(R6) BEQ  R5, R4, Label  (Assume  R5! = R4, i.e, the branch condition is false)ADD R5, R1, R4SLT R5, R15, R4

First, note the branch condition. All 5 instructions will be executed. 

21

Page 22: Problem 1a - IDA

Problem 4Problem 4SW  R16, 12(R6), ( )LW R16, 8(R6) BEQ  R5, R4, Label  (Assume  R5! = R4, i.e, the branch condition is false)ADD R5, R1, R4SLT R5, R15, R4

22

Page 23: Problem 1a - IDA

Problem 5Problem 5

IF ID EX MEM WB250ps 350ps 150ps 300ps 200ps

The individual stages of the datapath have the latencies as shown above. 1. Assume a non‐pipelined processor. What is the cycle time? p p p y

2. Now, assume a pipelined processor. What is the cycle time? 

23

Page 24: Problem 1a - IDA

Problem 5Problem 5

IF ID EX MEM WB250ps 350ps 150ps 300ps 200ps

The individual stages of the datapath have the latencies as shown above. 1. Assume a non‐pipelined processor. What is the cycle time? p p p y

Add them up to get the cycle time!

2. Now, assume a pipelined processor. What is the cycle time? The largest stage determines the cycle time.

24

Page 25: Problem 1a - IDA

200  ps latency 

100  ps latency 

Pipelining = Overlap the stages

Chapter 4 — The Processor — 25

Page 26: Problem 1a - IDA

Pipeline Pipeline

Chapter 4 — The Processor — 26

Page 27: Problem 1a - IDA

Clock CycleClock CycleSingle-cycle (Tc= 800ps)

Pipelined (Tc= 200ps)

Chapter 4 — The Processor — 27

Page 28: Problem 1a - IDA

Even if some stages take only 100ps instead of 200ps, the pipelined execution clock cycle must have the worst case clock cycle time of 200ps

28

Page 29: Problem 1a - IDA

Good Luck!!

29