powerpc architecture term paper presentation by by umut yazkurt cmpe 511 fall 2003-2004 fall...

POWERPC ARCHITECTURE

Term Paper Presentation

by

Umut Yazkurt

CMPE 511

Fall 2003-2004

History

PowerPC is a RISC architecture.

It was jointly designed by Apple, IBM, and Motorola in the early 1990s.

The aim was to form the basis of a new generation of high-performance, low-cost products ranging from low-cost embedded controllers to massively parallel supercomputers.

Because of its large installed software base, the designers started from IBM's POWER architecture, which was developed for RS/6000 systems.

History

Apple, IBM, and Motorola designed the first four members of the PowerPC microprocessor family simultaneously.

PowerPC 601™: the first 32-bit implementation of the PowerPC architecture, providing medium levels of performance for desktop computers and workstations.

PowerPC 603™: a 32-bit low-power processor, primarily for cost-sensitive desktop and portable personal computer systems.

PowerPC 604™: a 32-bit implementation of the PowerPC architecture designed for use in high-performance desktop, workstation, and symmetric multiprocessing computer systems.

PowerPC 620™: a 64-bit implementation of the PowerPC architecture providing high levels of performance for technical and scientific workstations, application and LAN servers, and symmetric multiprocessing computer systems.

History

                      601 (G1)    604/604E (G2)   740/750 (G3)    G4              G5
First shipping year   1993        1994            1997            1999            2003
Clock speed (MHz)     50-120      166-350         200-366         500-1400        Up to 2000
L1 cache              -           32 KB inst,     32 KB inst,     32 KB inst,     64 KB inst,
                                  32 KB data      32 KB data      32 KB data      32 KB data
L2 cache support      -           -               256 KB-1 MB     256 KB-1 MB     512 KB on die
Transistors (10^6)    2.8         3.6-5.1         6.35            10.5            Over 58

General

The PowerPC architecture specifies an instruction set architecture (ISA).

It is independent of implementation aspects.

It allows anyone to design and fabricate compatible PowerPC processors, independent of implementation differences, as the technology advances.

General

All PowerPC processors run the same core PowerPC instruction set.

They differ primarily in the degree of dedicated hardware support for multiple execution units, cache size and capability, length of pipeline, and interface buses.

These differences result in different tradeoffs in processing performance, die area, and power dissipation.

Programming Model

The PowerPC architecture is a full 64-bit architecture with full 64-bit integers and 64-bit logical address pointers.

It also has a well-defined 32-bit subset. Designers may implement either 32- or 64-bit machines. To enable 32-bit applications to run on all PowerPC processors, 64-bit machines are required to support a 32-bit operating mode.

The 32-bit processors have 32-bit wide general registers and branch-address registers; 64-bit processors have 64-bit wide registers.

Programming Model

Instructions always operate on the machine's full register width: 32 or 64 bits.

Instructions are mode independent; a given instruction operates the same on 32-bit machines, 64-bit machines, and 64-bit machines operating in 32-bit mode.

A 64-bit machine operating in 32-bit mode passes only the low-order 32 bits of an address to the address translation mechanism, and the ALU calculates carry and overflow based on a 32-bit result.
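The 32-bit mode behaviour can be sketched in a few lines of Python. This is only an illustration: the function names `add_with_flags` and `effective_address` are made up here, and the model assumes a plain two-operand add with no carry-in.

```python
MASK64 = (1 << 64) - 1

def add_with_flags(a, b, mode_bits):
    """Add two 64-bit register values; derive carry and overflow at mode_bits width.

    Illustrative model only: the register always receives the full 64-bit sum,
    while the flags are computed from the low mode_bits (32 or 64) of the result.
    """
    full = (a + b) & MASK64
    mask = (1 << mode_bits) - 1
    # Carry out of the top bit of the mode-width addition.
    carry = bool((((a & mask) + (b & mask)) >> mode_bits) & 1)
    # Signed overflow: both operands share a sign that differs from the result's.
    sign = 1 << (mode_bits - 1)
    overflow = bool(~(a ^ b) & (a ^ full) & sign)
    return full, carry, overflow

def effective_address(address, mode_bits):
    """Keep only the low-order mode_bits of an address before translation."""
    return address & ((1 << mode_bits) - 1)

# Same operands, same 64-bit result, different flags depending on the mode.
print(add_with_flags(0xFFFFFFFF, 1, 32))    # carry is set in 32-bit mode
print(add_with_flags(0xFFFFFFFF, 1, 64))    # no carry in 64-bit mode
print(hex(effective_address(0x123456789, 32)))
```

The point of the sketch is that the instruction itself does not change between modes; only the width used for the flags and for the address passed on differs.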

Logical Address Space

For 32-bit machines and 64-bit machines operating in 32-bit mode, the linear array of bytes that can be addressed by a pointer is 4 gigabytes.

For 64-bit machines operating in 64-bit mode, 2^64 bytes of memory can be addressed (16 exabytes in binary units, about 18 x 10^18 bytes).
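The sizes follow directly from the pointer width. A short check of the arithmetic (the helper name `addressable_bytes` is illustrative only):

```python
def addressable_bytes(pointer_bits):
    """Number of distinct byte addresses a pointer of this width can hold."""
    return 1 << pointer_bits

GIB = 1 << 30   # binary gigabyte
EIB = 1 << 60   # binary exabyte

print(addressable_bytes(32) // GIB)     # 4
print(addressable_bytes(64) // EIB)     # 16
print(addressable_bytes(64))            # 18446744073709551616
```

The last figure is roughly 18.4 x 10^18, which is where a number beginning with "18" comes from when the 64-bit space is quoted in decimal units.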

Initialization

When the processor is first initialized, it is in supervisor (also called privileged) mode. In this mode, all processor resources, including registers and instructions, are accessible.

The processor can limit access to certain privileged registers and instructions by placing itself in user mode.

This protection prevents application code from modifying global and sensitive resources, such as the caches, memory management system, and timers.

Registers

The architecture defines five types of registers:

* Special Purpose Registers (SPRs)
* General Purpose Registers (GPRs)
* Floating Point Registers (FPRs)
* Device Control Registers (DCRs)
* Machine State Register (MSR)

Registers

SPRs give status and control of resources within the processor core.

Registers

Five important user-mode SPRs are:

* The Fixed-Point Exception Register (XER) is used for indicating conditions for integer operations, such as carries and overflows.
* The Floating-Point Status and Control Register (FPSCR) is a 32-bit register used to store the status and control of the floating-point operations.
* The Count Register (CTR) is used to hold a loop count that can be decremented during the execution of branch instructions.
* The Condition Register (CR) is a 32-bit register grouped into eight fields, where each field is 4 bits that signify the result of an instruction's operation: Less Than (LT), Greater Than (GT), Equal (EQ), and Summary Overflow (SO).
* The Link Register (LR) contains the address to return to at the end of a function call.
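As an illustration of the eight 4-bit CR fields, the sketch below pulls one field out of a 32-bit CR value and names the bits that are set. It assumes the usual PowerPC numbering, in which CR0 occupies the four most significant bits and the bits within a field are ordered LT, GT, EQ, SO; the helper names are made up for this example.

```python
# Bit weights inside one 4-bit CR field, most significant first.
CR_FLAGS = (("LT", 0b1000), ("GT", 0b0100), ("EQ", 0b0010), ("SO", 0b0001))

def cr_field(cr, n):
    """Return the 4-bit field CRn (n = 0..7) from a 32-bit CR value."""
    if not 0 <= n <= 7:
        raise ValueError("CR field index must be 0..7")
    return (cr >> (28 - 4 * n)) & 0xF

def cr_flags(cr, n):
    """List the names of the flags set in field CRn."""
    field = cr_field(cr, n)
    return [name for name, bit in CR_FLAGS if field & bit]

cr = 0x20000008              # CR0 = 0b0010, CR7 = 0b1000, all other fields zero
print(cr_flags(cr, 0))       # ['EQ']
print(cr_flags(cr, 7))       # ['LT']
```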

Registers

General Purpose Registers:

The architecture specifies that all implementations have 32 GPRs (GPR0 - GPR31).

GPRs are the source and destination of all fixed-point operations and load/store operations. They also provide access to SPRs and DCRs.

They are all available for use in every instruction with one exception: in certain instructions, GPR0 simply means "0" and no lookup is done for GPR0's contents.
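The GPR0 exception shows up most visibly in load/store address calculation, where a base register field of 0 stands for the value zero. A minimal sketch, assuming a 32-bit machine and a register file modelled as a Python list (all names here are illustrative):

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def base_or_zero(gprs, ra):
    """Base operand of a load/store: register field 0 means the constant 0."""
    return 0 if ra == 0 else gprs[ra]

def ea_displacement(gprs, ra, d, width=32):
    """Effective address = base register (or 0) + sign-extended 16-bit literal."""
    return (base_or_zero(gprs, ra) + sign_extend_16(d)) & ((1 << width) - 1)

def ea_indexed(gprs, ra, rb, width=32):
    """Effective address = base register (or 0) + contents of an index register."""
    return (base_or_zero(gprs, ra) + gprs[rb]) & ((1 << width) - 1)

gprs = [0] * 32
gprs[0] = 0xDEAD0000                          # ignored when GPR0 is named as the base
gprs[1] = 0x1000
gprs[2] = 0x20
print(hex(ea_displacement(gprs, 1, 0xFFFC)))  # 0xffc  (0x1000 - 4)
print(hex(ea_displacement(gprs, 0, 0x10)))    # 0x10   (base treated as 0)
print(hex(ea_indexed(gprs, 1, 2)))            # 0x1020
```

Treating the base field 0 as zero is what lets a load or store reach a small absolute address without first clearing a register.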

Registers

Floating Point Registers:

The PowerPC architecture provides thirty-two 64-bit floating-point registers.

Device Control Registers:

DCRs are similar to SPRs in that they give status and control information, but DCRs are for resources outside the processor core.

DCRs provide the kind of device control usually done through memory-mapped I/O, without using up portions of the memory address space.

Registers

Machine State Register:

MSR represents the state of the machine.

It is accessed only in supervisor mode, and contains the settings for things such as memory translation, cache settings, interrupt enables, user/privileged state, and floating-point availability. Exact control bits vary by implementation.

The MSR does not readily fit into the SPR/DCR/GPR classification, as it has its own pair of instructions (mfmsr and mtmsr) to move the contents of the MSR to and from a GPR.
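To make the idea of MSR control bits concrete, here is a hedged sketch of decoding a few of them. The masks are an assumption: they follow the layout commonly documented for classic 32-bit PowerPC implementations, and since exact control bits vary by implementation they should be checked against the manual of the actual part. The dictionary and function names are illustrative.

```python
# Assumed bit masks (classic 32-bit layout; verify against the processor manual).
MSR_BITS = {
    "EE": 0x8000,   # external interrupt enable
    "PR": 0x4000,   # problem state: 1 = user mode, 0 = supervisor mode
    "FP": 0x2000,   # floating-point available
    "IR": 0x0020,   # instruction address translation
    "DR": 0x0010,   # data address translation
}

def decode_msr(msr):
    """Map each known control bit name to True/False for a given MSR value."""
    return {name: bool(msr & mask) for name, mask in MSR_BITS.items()}

def is_user_mode(msr):
    """True when the PR bit is set, i.e. privileged resources are off limits."""
    return bool(msr & MSR_BITS["PR"])

print(decode_msr(0xB030))
print(is_user_mode(0x4000))   # True
print(is_user_mode(0x0000))   # False: supervisor mode, as after initialization
```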

Data Types

PowerPC can deal with data types of 8 bits (byte), 16 bits (halfword), 32 bits (word), and 64 bits (doubleword) in length. It can use either little-endian or big-endian byte ordering; that is, the least significant byte is stored at the lowest address (little-endian) or at the highest address (big-endian).
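The two byte orderings can be seen by laying out the same 32-bit word both ways. A small sketch using Python's built-in `int.to_bytes` (the wrapper name `store_word` is illustrative); index 0 of the returned list is the lowest address:

```python
def store_word(value, byteorder):
    """Return the four bytes of a 32-bit word in memory order (lowest address first)."""
    return list((value & 0xFFFFFFFF).to_bytes(4, byteorder))

word = 0x11223344
print([hex(b) for b in store_word(word, "big")])     # ['0x11', '0x22', '0x33', '0x44']
print([hex(b) for b in store_word(word, "little")])  # ['0x44', '0x33', '0x22', '0x11']
```

In the big-endian layout the least significant byte (0x44) sits at the highest address; in the little-endian layout it sits at the lowest.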

Fixed-point data types include:

* Unsigned byte
* Unsigned halfword
* Signed halfword
* Unsigned word
* Signed word
* Unsigned doubleword
* Byte strings: from 0 to 128 bytes in length

Floating-point data types include IEEE-754 single- and double-precision types.

Instruction Format

The architecture encodes all instructions in 32 bits and aligns them on word address boundaries in memory.

Instructions are first decoded by the upper 6 bits, in a field called the primary opcode. The remaining 26 bits contain operands and/or reserved fields.
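The 6-bit / 26-bit split can be sketched as follows. The second helper assumes one common layout of the remaining 26 bits (two 5-bit register fields followed by a 16-bit immediate); other instruction forms divide those bits differently, and both function names are made up for this example.

```python
def split_instruction(word):
    """Split a 32-bit instruction into its primary opcode and the remaining 26 bits."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction must fit in 32 bits")
    return word >> 26, word & 0x03FFFFFF

def decode_register_immediate(word):
    """Decode the remaining bits as: 5-bit target, 5-bit source, 16-bit immediate.

    Assumed layout for illustration; it applies only to instructions that
    actually use this form.
    """
    opcode, rest = split_instruction(word)
    return {
        "opcode": opcode,
        "rt": (rest >> 21) & 0x1F,
        "ra": (rest >> 16) & 0x1F,
        "imm": rest & 0xFFFF,
    }

# 0x38600001 is the encoding of "addi r3, 0, 1" (load the constant 1 into r3).
print(decode_register_immediate(0x38600001))
# {'opcode': 14, 'rt': 3, 'ra': 0, 'imm': 1}
```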

The instruction types defined are:

ALU, Floating Point, Load/Store, Branch, Condition, and Synchronization instructions.

Instruction Types

Addressing Modes

Three types of operand addressing:

Memory operand addressing:
* Indirect addressing: base address in a GPR + a 16-bit sign-extended literal
* Indirect-indexed addressing: base address in a GPR + displacement from another GPR

ALU and floating-point instruction operand addressing:
* Three-register format

Branch operand addressing:
* Absolute: use the literal as the absolute address.
* Relative: use the literal as the displacement from the branch instruction address.
* Indirect: take the target address from the LR or CTR register.
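The three branch addressing forms reduce to three small formulas. The sketch below assumes a 32-bit machine and a literal that has already been sign-extended and scaled to a byte displacement; the function name and the mode strings are illustrative, not architectural terms.

```python
def branch_target(mode, branch_address=0, literal=0, lr=0, ctr=0, width=32):
    """Compute a branch target address for the given addressing mode."""
    mask = (1 << width) - 1
    if mode == "absolute":
        return literal & mask                     # literal is the address itself
    if mode == "relative":
        return (branch_address + literal) & mask  # displacement from the branch
    if mode == "lr":
        return lr & mask & ~0x3                   # instructions are word aligned,
    if mode == "ctr":                             # so the two low bits are dropped
        return ctr & mask & ~0x3
    raise ValueError("unknown branch mode: " + mode)

print(hex(branch_target("absolute", literal=0x100)))                      # 0x100
print(hex(branch_target("relative", branch_address=0x1000, literal=-8)))  # 0xff8
print(hex(branch_target("lr", lr=0x2003)))                                # 0x2000
```

The LR form is what a function return uses; the CTR form is typically used for computed jumps.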

PowerPC G4e Pipelining

Seven-stage pipeline

Superscalar microprocessor: allows multiple instructions to be executed in parallel.

Nine execution units:
* BPU: Branch Processing Unit
* VPU: Vector Permute Unit
* VIU: Vector Integer Unit
* VCIU: Vector Complex Integer Unit
* VFPU: Vector Floating Point Unit
* FPU: Floating Point Unit
* IU: Integer Unit
* CIU: Complex Integer Unit
* LSU: Load/Store Unit

G4e's microarchitecture, with emphasis on the pipeline stages of the front end and the functional units.

PowerPC G4e Pipeline Stages

Stages 1 and 2 - Instruction Fetch:

These two stages are both dedicated primarily to fetching an instruction from the L1 cache.

The G4e can fetch four instructions per clock cycle from the L1 cache and send them on to the next stage.

Stage 3 - Decode/Dispatch:

Once an instruction has been fetched, it goes into a 12-entry instruction queue to be decoded.

The G4e's decoder can dispatch up to three instructions per clock cycle to the next stage.

PowerPC G4e Pipeline Stages

Stage 4 - Issue:

Dispatched instructions wait in one of three issue queues.

The first queue is the Floating-Point Issue Queue (FIQ), which holds floating-point (FP) instructions that are waiting to be executed.

The second is the Vector Issue Queue (VIQ), which holds vector operations.

The third queue is the General Instruction Queue (GIQ), which holds everything else.

Once the instruction leaves its issue queue, it goes to the execution engine to be executed.
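The routing into the three issue queues can be sketched as a simple lookup. This is a toy model, not the G4e's actual logic: the instruction class labels ("fp", "vector", and so on) are assumptions made for the example, and queue capacities and issue rates are not modelled.

```python
from collections import deque

# Assumed class labels; anything not listed here goes to the general queue.
QUEUE_FOR_CLASS = {"fp": "FIQ", "vector": "VIQ"}

def dispatch(instructions):
    """Place (name, class) pairs into FIQ, VIQ, or GIQ in program order."""
    queues = {"FIQ": deque(), "VIQ": deque(), "GIQ": deque()}
    for name, kind in instructions:
        queues[QUEUE_FOR_CLASS.get(kind, "GIQ")].append(name)
    return queues

program = [("fadd", "fp"), ("vperm", "vector"), ("add", "integer"), ("lwz", "load")]
for queue_name, entries in dispatch(program).items():
    print(queue_name, list(entries))
```

Integer, load/store, and any other instruction class fall through to the GIQ, matching the "everything else" description.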

PowerPC G4e Pipeline Stages

Stage 5 - Execute:

The instructions can pass out of order from their issue queues into their respective functional units and be executed.

Stages 6 and 7 - Complete and Write-Back:

In these two stages, the instructions are put back into the order in which they came into the processor, and their results are written back to memory.
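The combination of out-of-order execution and in-order completion can be illustrated with a small model: each instruction may finish executing at any time, but it completes no earlier than every instruction ahead of it in program order. The cycle numbers and the function name are invented for the example, and limits on how many instructions complete per cycle are ignored.

```python
def completion_order(program, finish_cycle):
    """Return (name, completion cycle) pairs in program order.

    program      -- instruction names in the order they entered the processor
    finish_cycle -- cycle in which each instruction finished executing
    """
    completed = []
    last = 0
    for name in program:
        # An instruction waits until all older instructions have completed.
        cycle = max(finish_cycle[name], last)
        completed.append((name, cycle))
        last = cycle
    return completed

# "b" and "c" finish executing before "a", but cannot complete ahead of it.
print(completion_order(["a", "b", "c"], {"a": 5, "b": 2, "c": 3}))
# [('a', 5), ('b', 5), ('c', 5)]
```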

Inside the IBM PowerPC 405LP Processor
