Pentium II

Uploaded by: amar-shahid

Posted on 02-Apr-2018


  • 7/27/2019 PentiumII


    [Figure: block diagram of the Pentium II, with labels for the retirement unit, BIU, L1 I-cache, L1 D-cache, BTB, store buffer, L2 cache (to RAM), floating-point math unit, jump execution unit, fetch/decode unit, buffer, 64-bit path, dispatch/execute unit, ROB (header and tail), ALUs, instructions, decoders, uops, and uops marked DONE]

    CHAPTER 12 HOW A MICROPROCESSOR WORKS

    7. The dispatch/execute unit checks each uop in the buffer to see if it has all the information needed to process it. When it finds a uop ready to process, the unit executes it, stores the result in the micro-op itself, and marks it as done.
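The check, execute, and mark-done cycle in callout 7 can be modeled in a few lines of Python. This is a minimal sketch, not Intel's circuitry: the `Uop` class, the `OPS` table, and `dispatch_execute` are hypothetical names, and "has all the information needed" is assumed to mean that every source operand is present (not `None`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uop:
    """Toy micro-op: an operation name plus its source operands (None = not yet available)."""
    op: str
    sources: list
    result: Optional[int] = None
    done: bool = False

    def ready(self) -> bool:
        # "Has all the information needed": every source operand has arrived.
        return all(s is not None for s in self.sources)


# Hypothetical operation table for the sketch.
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}


def dispatch_execute(buffer: list) -> list:
    """Scan the buffer once; execute each ready, unfinished uop, store the
    result in the uop itself, and mark it done. Returns the indices executed."""
    executed = []
    for i, u in enumerate(buffer):
        if not u.done and u.ready():
            u.result = OPS[u.op](*u.sources)
            u.done = True
            executed.append(i)
    return executed


buf = [Uop("add", [2, 3]), Uop("sub", [None, 1]), Uop("add", [10, 5])]
print(dispatch_execute(buf))  # [0, 2]: the "sub" uop is still waiting for an operand
```

A later scan, once the missing operand arrives, would pick up the skipped `sub` uop; that repeated scanning is what callout 9 describes.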

    8. If a uop needs data from memory, the execute unit skips it, and the processor looks for the information first in the nearby L1 cache. If the data isn't there, the processor checks the L2 cache. Because the L2 cache is integrated with the CPU, [...] between them 2-4 [...] between the CPU and [...] chip speed [...] information is retrieved [...]gabytes a second, [...] 528MB a second if [...] external memory [...]

    9. Instead of sitting idle while that information is fetched, the execute unit continues inspecting each uop in the buffer. When it finds a micro-op that has all the information needed to process it, the unit executes it, stores the results in the uop itself, marks the code as completed, and moves on to the next uop in line. This is called speculative execution because the order of uops in the circular buffer is based on the BTB's branch predictions. The unit executes up to five uops simultaneously. When the execution unit reaches the end of the buffer, it starts at the head again, rechecking all the uops to see if any have finally received the data they need to be executed.
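The skip-and-come-back behavior in callouts 8 and 9 can be sketched as a scan that wraps from the tail back to the head. The sketch assumes the uops are independent of one another, counts memory latency in whole passes (the `wait` field is a hypothetical stand-in for a cache or RAM access), and caps each pass at five executed uops to mirror the text.

```python
from dataclasses import dataclass

MAX_PER_PASS = 5  # callout 9: up to five uops execute simultaneously


@dataclass
class Uop:
    name: str
    wait: int = 0       # hypothetical: passes until this uop's data arrives
    done: bool = False


def run(buffer: list) -> list:
    """Scan head to tail, executing up to MAX_PER_PASS ready uops per pass.
    Uops still waiting for data are skipped; after the tail, the scan
    starts again at the head. Returns the names executed on each pass."""
    passes = []
    while not all(u.done for u in buffer):
        executed = []
        for u in buffer:
            if u.done:
                continue
            if u.wait > 0:
                u.wait -= 1  # data still being fetched: skip for now
                continue
            if len(executed) < MAX_PER_PASS:
                u.done = True
                executed.append(u.name)
        passes.append(executed)
    return passes


program = [Uop("load", wait=2)] + [Uop(n) for n in "abcdef"]
print(run(program))  # [['a', 'b', 'c', 'd', 'e'], ['f'], ['load']]
```

The load is first in line but finishes last; the six independent uops behind it run ahead instead of waiting.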

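    6. The decode unit sends all uops to the instruction pool, also called the ReOrder Buffer (ROB). This contains two arithmetic logic units (ALUs) that handle all calculations involving integers. The ALUs use a circular buffer, with a head and tail, that contains the uops in the order in which the BTB predicted they would be needed.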
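A circular buffer with a head and a tail, as in callout 6, can be sketched as a fixed-size array whose indexes wrap around. The class name, the capacity of 4 in the example, and the stall-on-full behavior are assumptions for illustration; the text does not give the real buffer's size.

```python
class ReorderBuffer:
    """Fixed-size circular buffer: new uops enter at the tail,
    the oldest uop sits at the head."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0   # index of the oldest uop
        self.tail = 0   # index where the next uop will be written
        self.count = 0

    def push(self, uop) -> bool:
        if self.count == self.capacity:
            return False  # full: the decoders would have to stall
        self.slots[self.tail] = uop
        self.tail = (self.tail + 1) % self.capacity  # wrap around
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            raise IndexError("buffer empty")
        uop = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return uop

    def in_order(self) -> list:
        # Oldest to newest: the order in which the uops were predicted to be needed.
        return [self.slots[(self.head + i) % self.capacity] for i in range(self.count)]


rob = ReorderBuffer(4)
for name in ["u1", "u2", "u3", "u4"]:
    rob.push(name)
print(rob.push("u5"))   # False: the buffer is full
print(rob.pop())        # u1 leaves from the head
rob.push("u5")          # u5 wraps around into slot 0
print(rob.in_order())   # ['u2', 'u3', 'u4', 'u5']
```

New uops always enter at the tail, so reading from head to tail recovers the predicted order even after the indexes wrap.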

    10. If an operation involves floating-point numbers, such as 3.14 or .33333, the ALUs hand off the job to the floating-point math unit, which contains processing tools designed to manipulate floating-point numbers quickly.
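The hand-off in callout 10 amounts to a routing decision based on the kind of number involved. This is a simplified sketch: `route` and the `"ALU"`/`"FPU"` labels are hypothetical, and in real x86 code the floating-point operations are separate instructions, so the hardware does not inspect values the way this Python function inspects types.

```python
from numbers import Integral, Real


def route(a, b) -> str:
    """Choose an execution unit for a two-operand operation (simplified):
    all-integer work stays on an ALU; any floating-point operand sends
    the job to the floating-point math unit ("FPU" here)."""
    if isinstance(a, Integral) and isinstance(b, Integral):
        return "ALU"
    if isinstance(a, Real) and isinstance(b, Real):
        return "FPU"
    raise TypeError("operands must be integers or real numbers")


print(route(7, 2))          # ALU
print(route(3.14, 2))       # FPU
print(route(.33333, 1.0))   # FPU
```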

    How a Pentium II Chip Works

    PART 3 MICROCHIPS

    5. As the fetch/decode unit pulls in instructions in the order predicted by the BTB, three decoders break up the more complex instructions into uops ([...]-bit micro-operations). The execution unit processes uops faster than it processes a higher-level instruction.
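Breaking complex instructions into uops, as callout 5 describes, can be pictured as a table lookup done by three decoders per cycle. The instruction spellings, uop names, and `DECODE_TABLE` are invented for illustration, and the sketch treats the three decoders as interchangeable, which is a simplification.

```python
# Hypothetical decode table: instruction forms -> simpler uops.
# Instruction spellings and uop names are invented for illustration.
DECODE_TABLE = {
    "add reg, reg": ["add"],
    "add reg, [mem]": ["load", "add"],
    "add [mem], reg": ["load", "add", "store"],
}


def decode(instructions: list, decoders: int = 3) -> list:
    """Decode instructions in order, up to `decoders` instructions per cycle.
    Returns the list of uops produced in each cycle."""
    cycles = []
    for start in range(0, len(instructions), decoders):
        uops = []
        for ins in instructions[start:start + decoders]:
            uops.extend(DECODE_TABLE[ins])
        cycles.append(uops)
    return cycles


prog = ["add reg, reg", "add reg, [mem]", "add [mem], reg", "add reg, reg"]
print(decode(prog))
# [['add', 'load', 'add', 'load', 'add', 'store'], ['add']]
```

Each uop does one simple thing (load, add, or store), which is what lets the execution unit handle uops faster than the original multi-step instruction.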

    4. While the fetch/decode unit is pulling in instructions from the I-cache, the branch target buffer (BTB) compares each instruction with a record in a separate set-aside buffer to see whether the instruction has been used before. The BTB is looking in particular for instructions that involve branching, a situation in which a program's execution could follow one of two paths. When the BTB finds a branch instruction, it predicts, based on past experience, which path the program will take. The predictions are better than 90 percent accurate.
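The BTB's record-and-predict loop in callout 4 can be sketched with a two-bit saturating counter per branch, a common textbook scheme swapped in here for illustration; it is not a description of Intel's actual predictor. The class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Toy BTB: per branch address, a 2-bit saturating counter plus the
    last taken target. Counter 0-1 predicts "not taken", 2-3 "taken"."""

    def __init__(self):
        self.entries = {}  # address -> [counter, target]

    def predict(self, address: int):
        """Return (taken, target), or None for a branch never seen before."""
        entry = self.entries.get(address)
        if entry is None:
            return None
        counter, target = entry
        return (counter >= 2, target)

    def update(self, address: int, taken: bool, target: int) -> None:
        """Fold the real outcome into the history (assumed starting counter: 1)."""
        counter, old_target = self.entries.get(address, [1, target])
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.entries[address] = [counter, target if taken else old_target]


# A loop branch taken 9 times, then not taken, repeated 10 times.
btb = BranchTargetBuffer()
correct = 0
for taken in ([True] * 9 + [False]) * 10:
    p = btb.predict(0x80)
    guess = p[0] if p else False  # unseen branch: assume "not taken"
    correct += guess == taken
    btb.update(0x80, taken, 0x20)
print(correct, "of 100 predictions correct")  # 89 of 100 predictions correct
```

This toy misses only the first encounter and each loop exit; the better-than-90-percent figure in the text refers to the real BTB, not to this sketch.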

    2. The processor and cache share the same 64-bit interface to the computer's information. Program code, or data manipulated by that code, moves in and out of the chip at the PC's maximum bus speed, no more than 100MHz even for processors that function internally at 200MHz. Much of the Pentium Pro design is structured to alleviate the bus bottleneck by minimizing the times a [...] clock cycles (the smallest time in which a computer can do anything) tick away without the processor completing an operation.

    3. When information enters the processor through the bus interface unit (BIU), the BIU duplicates the information, sends one copy to the CPU's closely linked L2 cache, and sends the other to a pair of level 1 (L1) caches in the CPU, ranging in size from 8-16KB. The BIU sends program code to the L1 instruction cache, or I-cache, and sends data to be used by the code to another L1 cache, the data cache, or D-cache.
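The bus figures in callout 2 reduce to simple arithmetic. This sketch assumes one 64-bit transfer per bus clock and decimal megabytes (1 MB = 1,000,000 bytes), so the results are peak rates, not sustained throughput; the function names are hypothetical.

```python
def peak_bandwidth_mb_s(width_bits: int, bus_mhz: int) -> int:
    """Peak bus bandwidth in MB/s (1 MB = 1,000,000 bytes),
    assuming one width_bits-wide transfer per bus clock."""
    return (width_bits // 8) * bus_mhz


def core_cycles_per_bus_cycle(core_mhz: int, bus_mhz: int) -> float:
    """Internal clock ticks that elapse during one bus clock."""
    return core_mhz / bus_mhz


print(peak_bandwidth_mb_s(64, 100))         # 800
print(core_cycles_per_bus_cycle(200, 100))  # 2.0
print(peak_bandwidth_mb_s(64, 66))          # 528
```

At 200MHz internally and 100MHz on the bus, two processor clock cycles pass for every bus cycle, which is the bottleneck the text describes. The last line gives 528, which appears to match the 528MB-a-second figure in callout 8 if that figure refers to a 64-bit bus running at 66MHz; the source text is cut off there, so this is an inference.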

    12. Meanwhile, the circular buffer is also being inspected by the retirement unit. It first checks to see if the uop at the head of the buffer has been executed. If it hasn't, the retirement unit keeps checking it until it has been processed. Then the retirement unit checks the second and third uops. If they're already executed, the unit sends all three results, its maximum, to the store buffer. There the prediction unit checks them out one last time before they're sent to their proper place in system RAM.
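The in-order retirement in callout 12 can be sketched as popping finished uops off the head of a queue. In this sketch the uops are hypothetical dictionaries with `done` and `result` keys, a plain list stands in for the store buffer, and the final check and the write to system RAM are left out.

```python
from collections import deque

MAX_RETIRE = 3  # callout 12: up to three results at a time


def retire(buffer: deque, store_buffer: list) -> int:
    """Retire finished uops strictly in order from the head of the buffer.
    Stops at the first unfinished uop or after MAX_RETIRE uops.
    Returns how many uops were retired this cycle."""
    retired = 0
    while buffer and retired < MAX_RETIRE and buffer[0]["done"]:
        store_buffer.append(buffer.popleft()["result"])
        retired += 1
    return retired


buf = deque([{"done": True, "result": 1},
             {"done": False, "result": None},
             {"done": True, "result": 3}])
stores = []
print(retire(buf, stores))  # 1: the unfinished second uop blocks the third
buf[0].update(done=True, result=2)
print(retire(buf, stores))  # 2
print(stores)               # [1, 2, 3]
```

Because retirement stops at the first unfinished uop, results reach the store buffer in program order even though they were executed out of order.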

    11. When a uop that had been delayed is finally processed, the execute unit compares the results with those predicted by the BTB. Where the prediction fails, a component called the jump execution unit (JEU) moves the end marker from the last uop in line to the uop that was predicted incorrectly. This signals that all uops behind the end marker should be ignored and may be overwritten by new uops. The BTB is told that its prediction was incorrect, and that information becomes part of its future predictions.
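The recovery step in callout 11 can be sketched as moving an end marker back and later overwriting the discarded slots. This sketch uses a plain list instead of a circular buffer, reads "the uop that was predicted incorrectly" as the first uop fetched down the wrong path (so the branch itself is kept), and uses a dictionary as a hypothetical stand-in for the feedback sent to the BTB.

```python
from dataclasses import dataclass, field


@dataclass
class Window:
    """Toy instruction window: uops in predicted order plus an end marker.
    Uops at index >= end are ignored and may be overwritten."""
    uops: list
    end: int
    outcomes: dict = field(default_factory=dict)  # stand-in for BTB feedback

    def live(self) -> list:
        return self.uops[:self.end]

    def mispredict(self, first_wrong: int, branch_addr: int, taken: bool) -> None:
        # Pull the end marker back to the first uop fetched down the wrong path,
        # and record the branch's real outcome for future predictions.
        self.end = first_wrong
        self.outcomes[branch_addr] = taken

    def refill(self, new_uops: list) -> None:
        # Overwrite the discarded slots with uops from the correct path.
        self.uops[self.end:] = new_uops
        self.end = len(self.uops)


w = Window(["i1", "br", "w1", "w2"], end=4)
w.mispredict(first_wrong=2, branch_addr=0x99, taken=True)
print(w.live())   # ['i1', 'br']
w.refill(["c1"])
print(w.uops)     # ['i1', 'br', 'c1']
```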

    [Figure: the Pentium II package, with labels for the L2 cache, socket connectors, and processor]

    1. Intel's Pentium microprocessor is made up of two slices of silicon. One is the 5.5- to 9.5-million-transistor CPU, where the software's instructions are executed. The other is a custom-designed, high-speed level 2 (L2) memory cache. Its 15.5 million transistors store up to 512 kilobytes of data and code. Earlier processors typically used a cache separate from the processor, usually part of the computer's motherboard.