pentiumii
7/27/2019 PentiumII
[Figure labels: Retirement Unit; BIU; L1 I-Cache; L1 D-Cache; BTB; Store Buffer; L2 Cache; To RAM; Floating-Point Math Unit; Jump Execution Unit; Fetch/Decode; Buffer; 64-bit; Dispatch/Execute; ROB; ALU; Header; Tail; Instructions; Decoders; Uops; Done]
CHAPTER 12 HOW A MICROPROCESSOR WORKS
7. The dispatch/execute unit checks each [...] see if it has all the information needed [...] when it finds a uop ready to process, t[...] stores the result in the micro-op itself, [...]
8. If a uop needs data [...] execute unit skips [...] looks for the information [...] nearby L1 cache. I[...] the processor checks [...] cache. Because the [...] integrated with the CPU [...] between them 2-4 [...] between the CPU and [...] a chip speed begin[...]mation is retrieved [...]gabytes a second, [...] 528MB a second if [...] external memory t[...]

9. Instead of sitting idle while that information is fetched, the execute unit continues inspecting each uop in the buffer. When it finds a micro-op that has all the information needed to process it, the unit executes it, stores the results in the uop itself, marks the code as completed, and moves on to the next uop in line. This is called speculative execution because the order of uops in the circular buffer is based upon the BTB's branch predictions. The unit executes up to five uops simultaneously. When the execution unit reaches the end of the buffer, it starts at the head again, rechecking all the uops to see if any have finally received the data they need to be executed.
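The scanning behavior described in step 9 can be sketched as a small simulation. This is only an illustration, not a model of the real hardware: the `Uop` class, the `ready_at` field (the pass on which a uop's data is assumed to arrive), and the choice to treat "up to five uops simultaneously" as a per-pass limit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uop:
    name: str
    ready_at: int                 # hypothetical: pass number on which its data arrives
    result: Optional[str] = None
    done: bool = False


def run_passes(buffer: list[Uop], max_per_pass: int = 5) -> list[str]:
    """Scan the buffer from head to tail repeatedly; return names in completion order."""
    completed = []
    pass_no = 0
    while not all(u.done for u in buffer):
        executed = 0
        for uop in buffer:
            if executed == max_per_pass:
                break             # assumed per-pass execution limit
            if uop.done or uop.ready_at > pass_no:
                continue          # skip: already done, or still waiting for data
            uop.result = f"result of {uop.name}"   # the result is stored in the uop itself
            uop.done = True                        # marked as completed
            completed.append(uop.name)
            executed += 1
        pass_no += 1              # reached the end of the buffer; start at the head again
    return completed


buffer = [Uop("load", 2), Uop("add", 0), Uop("mul", 0), Uop("store", 1)]
order = run_passes(buffer)
print(order)
```

In this sketch the uops finish in a different order from the one in which they sit in the buffer, because the scan skips any uop whose data has not yet arrived and returns to it on a later pass.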
6. The decode unit sends all uops to [...] also called the ReOrder Buffer. Th[...] arithmetic logic units (ALUs) that handle [...] involving integers. The ALUs use a [...] head and tail, that contains the uops [...] which the BTB predicted they would [...]
10. If an operation involves floating-point numbers, such as 3.14 or .33333, the ALUs hand off the job to the floating-point math unit, which contains processing tools designed to manipulate floating-point numbers quickly.
How a Pentium II Chip Works
PART 3 MICROCHIPS 126
5. As the fetch/decode [...]tions in the order p[...] three decoders wo[...] break up the more [...]tions into uops, wh[...]bit micro-operation[...] execution unit proc[...] faster than it proc[...]level instruction.
4. While the fetch/decode unit is pulling in in[...] I-cache, the branch target buffer (BTB) com[...]tion with a record in a separate set-aside b[...]struction has been used before. The BTB i[...] for instructions that involve branching, a s[...] program's execution could follow one of t[...] finds a branch instruction, it predicts, base[...]ence, which path the program will take. Th[...]ter than 90 percent accurate.
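As an illustration of predicting a branch from its recorded history, here is a minimal sketch using a two-bit saturating counter per branch address. This is a common textbook scheme chosen for the example; the text does not say which mechanism the Pentium II's BTB actually uses, and the `BranchPredictor` class and its method names are hypothetical.

```python
class BranchPredictor:
    """Two-bit saturating counters keyed by branch address (illustrative only)."""

    def __init__(self):
        self.counters = {}        # branch address -> counter in 0..3

    def predict(self, addr: int) -> bool:
        # Counter values 2 and 3 mean "predict taken"; unseen branches start at 1.
        return self.counters.get(addr, 1) >= 2

    def update(self, addr: int, taken: bool) -> None:
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        c = self.counters.get(addr, 1)
        self.counters[addr] = min(3, c + 1) if taken else max(0, c - 1)


# A loop branch that is taken nine times and then falls through, repeated ten times.
outcomes = ([True] * 9 + [False]) * 10
predictor = BranchPredictor()
correct = 0
for taken in outcomes:
    if predictor.predict(0x40) == taken:
        correct += 1
    predictor.update(0x40, taken)   # the outcome becomes part of future predictions

print(f"{correct} of {len(outcomes)} predictions correct")
```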
2. The processor and cache share the same 64-bit interface to the computer's information. Program code or data manipulated by that code move in and out of the chip at the PC's maximum bus speed, no more than 100MHz even for processors that function internally at 200MHz. Much of the Pentium Pro design is structured to alleviate the bus bottleneck by minimizing the times a clock cycle (the smallest time in which a computer can do anything) ticks away without the processor completing an operation.

3. When information enters the processor through the bus interface [...] duplicates the information, sends one copy to the CPU's closely li[...] the other to a pair of level 1 (L1) caches, ranging in size from 8-16 [...] the CPU. The BIU sends program code to the L1 instruction cache [...] sends data to be used by the code to another L1 cache, the data c[...]
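The bus bottleneck can be made concrete with a little arithmetic. The sketch below assumes one transfer per bus clock and counts a megabyte as one million bytes; the function name and the 66MHz example are assumptions, not figures stated by the text for any particular system.

```python
def peak_bandwidth_mb_per_s(bus_width_bits: int, bus_clock_mhz: int,
                            transfers_per_clock: int = 1) -> int:
    """Peak transfer rate in MB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * bus_clock_mhz * transfers_per_clock


at_100mhz = peak_bandwidth_mb_per_s(64, 100)   # 64-bit bus at the 100MHz ceiling
at_66mhz = peak_bandwidth_mb_per_s(64, 66)     # assumed 66MHz bus, for comparison
print(at_100mhz, at_66mhz)
```

Under these assumptions a 64-bit bus moves 8 bytes per clock, so a 66MHz bus peaks at 528MB a second, which appears consistent with the 528MB figure mentioned in step 8.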
12. Meanwhile, the circular buffer also is being inspected by the retirement unit. It first checks to see if the uop at the head of the buffer has been executed. If it hasn't, the retirement unit keeps checking it until it has been processed. Then the retirement unit checks the second and third uops. If they're already executed, the unit sends all three results (its maximum) to the store buffer. There the prediction unit checks them out one last time before they're sent to their proper place in system RAM.
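The in-order retirement described in step 12 can be sketched as follows. This is a simplification under stated assumptions: uops are plain dictionaries with `done` and `result` keys, the `retire` function is hypothetical, and it retires whatever run of executed uops sits at the head, up to three, without modeling the final check before results go to RAM.

```python
from collections import deque


def retire(buffer: deque, store_buffer: list, max_retire: int = 3) -> int:
    """Move results of executed uops from the head of the buffer to the store buffer.

    Stops at the first uop that has not been executed, so results always
    leave in program order even if they were computed out of order.
    """
    retired = 0
    while buffer and retired < max_retire and buffer[0]["done"]:
        store_buffer.append(buffer.popleft()["result"])
        retired += 1
    return retired


rob = deque([
    {"done": True, "result": "r0"},
    {"done": True, "result": "r1"},
    {"done": False, "result": None},    # still waiting for data
    {"done": True, "result": "r3"},     # executed early, but must wait its turn
])
store_buffer = []

first = retire(rob, store_buffer)       # retires r0 and r1, then stalls
second = retire(rob, store_buffer)      # head is still not executed
rob[0].update(done=True, result="r2")   # the delayed uop finally completes
third = retire(rob, store_buffer)       # r2 and r3 can now retire
print(first, second, third, store_buffer)
```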
11. When a uop that had been delayed is finally processed, the execute unit compares the results with those predicted by the BTB. Where the prediction fails, a component called the jump execution unit (JEU) moves the end marker from the last uop in line to the uop that was predicted incorrectly. This signals that all uops behind the end marker should be ignored and may be overwritten by new uops. The BTB is told that its prediction was incorrect, and that information becomes part of its future predictions.
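The end-marker mechanism in step 11 can be sketched like this. The `Pool` class, its method names, and the dictionary standing in for the BTB are all hypothetical; the sketch only shows how moving a marker is enough to discard the uops that followed a bad prediction.

```python
class Pool:
    """A list of uops plus an end marker (illustrative, not the real hardware)."""

    def __init__(self, uops):
        self.uops = list(uops)
        self.end = len(self.uops) - 1      # end marker sits on the last uop in line

    def valid(self):
        return self.uops[: self.end + 1]   # uops behind the marker are ignored

    def mispredict(self, index, btb, branch_addr, actually_taken):
        self.end = index                   # move the marker to the mispredicted uop
        btb[branch_addr] = actually_taken  # tell the BTB what really happened

    def append(self, uop):
        self.end += 1
        if self.end < len(self.uops):
            self.uops[self.end] = uop      # overwrite an ignored slot
        else:
            self.uops.append(uop)


btb = {0x40: True}                         # assumed record: branch at 0x40 predicted taken
pool = Pool(["cmp", "jne L1", "x1", "x2", "x3"])

pool.mispredict(1, btb, 0x40, actually_taken=False)
after_squash = pool.valid()
pool.append("y1")                          # a uop from the correct path
after_refill = pool.valid()
print(after_squash, after_refill, btb)
```

Nothing is erased when the prediction fails. The stale uops stay in their slots until new uops overwrite them, which is why moving the marker alone is sufficient.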
[Figure labels: L2 cache; socket connectors; processor]
1. Intel's Pentium microprocessor is made up of two slices of silicon. One is the 5.5-9.5-million transistor CPU, where the software's instructions are executed. The other is a level 2 (L2), custom-designed high-speed memory cache. Its 15.5-million transistors store up to 512 kilobytes of data and code. Earlier processors typically used a cache separate from the processor, usually part of the computer's motherboard.