single instructions can execute several low-level ...rlopes/mod5.1.pdf•risc cpu's have been...

CISC : A complex instruction set computer is a computer where single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) or are capable of multi-step operations or addressing modes within single instructions. This is because early computer architects tried to bridge the so-called semantic gap, i.e. to design instruction sets that directly supported high-level programming constructs such as procedure calls, loop control, and complex addressing modes, allowing data structure and array accesses to be combined into single instructions.

CISC Microprocessor

• Since the birth of the PC in the late 1970s, CPU designs aimed at desktop machines have grown dramatically in complexity.

Processor        Year   Address bus (bits)   Data bus (bits)

Intel 386DX      1985   32                   32

Intel 386SX      1987   24                   16

Intel 860        1989   32                   64

Intel 486DX      1989   32                   32

Intel 486SX      1991   32                   32

Motorola 68000   1980   24                   16

Motorola 68020   1985   32                   32

Motorola 68030   1987   32                   32

Motorola 68040   1990   32                   32

RISC : A computer architecture that reduces chip complexity by using simpler instructions that are designed to execute extremely quickly. Certain design features have been characteristic of most RISC processors:

• One-cycle execution time: RISC processors aim for a CPI (clock cycles per instruction) of one. This comes from optimizing each instruction on the CPU and from pipelining.

• Pipelining: a technique that overlaps the execution of the parts, or stages, of successive instructions so that instructions are processed more efficiently.

• Large number of registers: the RISC design philosophy generally incorporates a larger number of registers to avoid large amounts of interaction with memory.
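The link between pipelining and a CPI close to one can be checked with a small calculation. The sketch below assumes an ideal pipeline with no stalls in which every stage takes one clock; the function names and the five-stage depth are illustrative choices, not taken from any particular processor.

```python
def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles on an ideal pipeline: fill it once, then finish one instruction per clock."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles when each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cpi(n_instructions: int, n_stages: int) -> float:
    """Average clock cycles per instruction on the ideal pipeline."""
    return pipelined_cycles(n_instructions, n_stages) / n_instructions


if __name__ == "__main__":
    # Assumed five-stage pipeline; CPI approaches 1 as the instruction count grows.
    for n in (1, 10, 1000):
        print(n, pipelined_cycles(n, 5), unpipelined_cycles(n, 5), cpi(n, 5))
```

With only a handful of instructions the fill time dominates; over long instruction streams the average approaches one cycle per instruction, which is the sense in which a pipelined RISC design is "single cycle".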

RISC Features

• Single Cycle operation – The ultimate goal in any RISC design is that all instructions will execute in 1 clock cycle irrespective of address mode.

– By the 80486, Intel had simplified the microcode so that more than 50% of the instructions in the common addressing modes execute in 1 external clock cycle.

• Fixed Instruction Format – In contrast to CISC instructions, which tend to have obscure bit-oriented encodings, RISC instructions tend to share a single common format, which simplifies the decoding and execution of instructions.
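To see why a single common format simplifies decoding, consider the sketch below. It assumes a hypothetical 32-bit format (6-bit opcode, three 5-bit register fields, remaining bits unused); the layout is invented for illustration and does not correspond to any specific RISC instruction set.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    opcode: int
    rd: int
    rs1: int
    rs2: int


def encode(opcode: int, rd: int, rs1: int, rs2: int) -> int:
    """Pack the fields into one 32-bit word (hypothetical layout)."""
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


def decode(word: int) -> Decoded:
    """Every field sits at a fixed position, so decoding is just shifts and masks."""
    return Decoded(
        opcode=(word >> 26) & 0x3F,
        rd=(word >> 21) & 0x1F,
        rs1=(word >> 16) & 0x1F,
        rs2=(word >> 11) & 0x1F,
    )


if __name__ == "__main__":
    word = encode(opcode=1, rd=3, rs1=4, rs2=5)
    print(hex(word), decode(word))
```

Because no field position depends on the value of another field, all fields can be extracted at once, which is what makes the hardware decoder simple.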


• LOAD / STORE Design – Since RISC designs do not have the benefit of complex address modes, only load and store instructions access memory, and software needs ample CPU registers to hold temporary working values.
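The difference between a load/store design and a memory-to-memory CISC instruction can be sketched with a toy machine. Everything here (the opcode names, the eight registers, the addresses) is a hypothetical illustration rather than a model of a real processor.

```python
def run_load_store(program, memory):
    """Execute (op, *operands) tuples on a toy load/store machine with 8 registers."""
    regs = [0] * 8
    for op, *args in program:
        if op == "LOAD":        # LOAD rd, addr : the only way to read memory
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":     # STORE rs, addr : the only way to write memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # ADD rd, rs1, rs2 : operates on registers only
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown op {op!r}")
    return memory


def cisc_add_mem(memory, dst, src1, src2):
    """One memory-to-memory instruction that does the same work internally."""
    memory[dst] = memory[src1] + memory[src2]
    return memory


# mem[0x18] = mem[0x10] + mem[0x14], written as four simple instructions.
RISC_PROGRAM = [
    ("LOAD", 1, 0x10),
    ("LOAD", 2, 0x14),
    ("ADD", 3, 1, 2),
    ("STORE", 3, 0x18),
]

if __name__ == "__main__":
    print(run_load_store(RISC_PROGRAM, {0x10: 7, 0x14: 5, 0x18: 0}))
    print(cisc_add_mem({0x10: 7, 0x14: 5, 0x18: 0}, 0x18, 0x10, 0x14))
```

Both versions leave the same value in memory; the load/store version needs more instructions and three registers for temporaries, which is why ample registers matter.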

• RISC CPUs have been produced on the conventional von Neumann architecture, with a single bus that carries both instructions and data.

• The Harvard architecture model, which has separate instruction and data buses, offers in some respects a more complete RISC solution, especially when considering the single-cycle instruction execution criterion.

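In a von Neumann structure, program and data share one memory, so the CPU can read program memory and operate on it just as it does on data memory. In a Harvard structure the memory is split into separate program and data memories, and in a pure Harvard design the CPU cannot read or modify program memory as though it were data.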
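The effect of the shared bus on the single-cycle criterion can be illustrated with a deliberately simplified count of bus cycles. The model below is an assumption made for illustration: every instruction needs one fetch, some also need one data access, and caches and prefetching are ignored.

```python
def bus_cycles(instructions, harvard: bool) -> int:
    """Count bus cycles; each entry is True if that instruction accesses data memory."""
    cycles = 0
    for touches_data in instructions:
        if harvard:
            # Separate buses: the instruction fetch and the data access share a cycle.
            cycles += 1
        else:
            # One shared bus: the fetch and the data access take a cycle each.
            cycles += 2 if touches_data else 1
    return cycles


if __name__ == "__main__":
    program = [True, False, True, False]  # e.g. load, add, store, branch
    print(bus_cycles(program, harvard=False), bus_cycles(program, harvard=True))
```

Under these assumptions only the split-bus machine sustains one instruction per cycle once loads and stores are involved.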

Simple von Neumann vs Harvard Architectures

RISC Features

• Hardwired Control

– On a CISC CPU we are used to seeing a complex microcode store and controller.

– However, on a RISC CPU, with the reduction in instructions and address modes, it is possible to return to the much more efficient technique of decoding instructions directly in hardware.

High-level RISC versus CISC

Parallel processing

Processing instructions in parallel requires three major tasks:

1. checking dependencies between instructions to determine which instructions can be grouped together for parallel execution;

2. assigning instructions to the functional units on the hardware;

3. determining when instructions are initiated and which are placed together into a single instruction word.
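The first and third tasks can be sketched as a dependency check followed by greedy in-order grouping. This is a minimal illustration under stated assumptions: instructions are reduced to one destination and a tuple of source registers, all functional units are treated as identical, and the names `Instr`, `depends` and `group` are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str
    srcs: tuple


def depends(later: Instr, earlier: Instr) -> bool:
    """True if 'later' cannot be issued in the same group as 'earlier'."""
    raw = earlier.dest in later.srcs   # read after write
    waw = earlier.dest == later.dest   # write after write
    war = later.dest in earlier.srcs   # write after read
    return raw or waw or war


def group(instrs, width: int):
    """Greedily pack instructions, in program order, into groups of at most 'width'."""
    groups, current = [], []
    for ins in instrs:
        conflict = any(depends(ins, prev) for prev in current)
        if conflict or len(current) == width:
            groups.append(current)
            current = []
        current.append(ins)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    program = [
        Instr("i1", "r1", ("r2", "r3")),
        Instr("i2", "r4", ("r5", "r6")),
        Instr("i3", "r7", ("r1", "r4")),  # needs the results of i1 and i2
        Instr("i4", "r8", ("r2", "r5")),
    ]
    for g in group(program, width=3):
        print([ins.name for ins in g])
```

The position of an instruction inside its group can then stand in for the second task, assigning it to a functional unit. Whether this work is done by hardware at run time or by the compiler ahead of time is what separates the approaches discussed in this section.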

EPIC (explicitly parallel instruction computing) is a 64-bit microprocessor instruction set design, jointly defined by Hewlett Packard and Intel, that provides 128 general registers and 128 floating point registers and uses speculative loading, predication, and explicit parallelism to accomplish its computing tasks. IA-64 (Intel Architecture-64), Intel's first 64-bit CPU architecture, is based on EPIC. EPIC permits microprocessors to execute software instructions in parallel by using the compiler, rather than complex on-die circuitry, to control parallel instruction execution. This was intended to allow simple performance scaling without resorting to higher clock frequencies.

The EPIC architecture encodes its instructions into 128-bit-wide bundles. Each bundle contains three instructions encoded in 41 bits each and a 5-bit template field (3 × 41 + 5 = 128). The template field contains information about the types of instructions in the bundle and which instructions can be executed in parallel. Because the template marks where a run of independent instructions ends, dependent instructions can share a bundle, so all the slots can be filled even if three mutually independent instructions cannot be found. The template also specifies whether one or more instructions in this bundle can be executed in parallel with at least the first instruction of the next bundle.
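The bundle arithmetic can be made concrete with a small pack/unpack sketch. The field widths come from the text; placing the template in the low-order bits with the three slots above it is an assumption about the layout that should be checked against the architecture manual, and the function names are hypothetical.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template: int, slots) -> int:
    """Combine a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template does not fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int):
    """Split a 128-bit bundle back into (template, (slot0, slot1, slot2))."""
    template = bundle & TEMPLATE_MASK
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)
    )
    return template, slots


if __name__ == "__main__":
    b = pack_bundle(0x10, (1, 2, 3))
    print(b.bit_length() <= 128, unpack_bundle(b))
```

The slot values here are arbitrary integers; the sketch only shows how the 128 bits are divided, not how real instructions are encoded.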

VLIW (very long instruction word): Superscalar processors of the 1990s had the functional units to execute multiple instructions in parallel. However, they spent a great deal of die area on the scheduling circuits that determine which instructions can execute in parallel. One suggested solution was the Very Long Instruction Word (VLIW) architecture, which bundles multiple instructions that can be executed in parallel into a single long instruction. The compiler performs the scheduling, so the processor avoids spending run time and silicon area on determining which instructions to execute in parallel. Intel's streaming SIMD extensions (SSE) should not be confused with VLIW: an SSE instruction applies one operation to several data elements, whereas a VLIW instruction packs several different operations into one word.

Advantages of VLIW

Compiler prepares fixed packets of multiple operations that give the full "plan of execution"

– dependencies are determined by compiler and used to schedule according to function unit latencies

– function units are assigned by compiler and correspond to the position within the instruction packet ("slotting")

– compiler produces fully-scheduled, hazard-free code => hardware doesn't have to "rediscover" dependencies or schedule

Disadvantages of VLIW

Compatibility across implementations is a major problem

– VLIW code won't run properly with different number of function units or different latencies

– unscheduled events (e.g., cache miss) stall entire processor

Code density is another problem – low slot utilization (mostly nops)

– reduce nops by compression ("flexible VLIW", "variable-length VLIW")
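The code density problem can be quantified with a short sketch. It assumes the compiler has already chosen which operations go together, pads each packet to a fixed width with nops, and reports the fraction of slots doing useful work; the operation names and the four-slot width are illustrative.

```python
def pad_packets(groups, width: int):
    """Pad each group of parallel operations to a fixed-width packet with nops."""
    packets = []
    for ops in groups:
        if len(ops) > width:
            raise ValueError("group is wider than the instruction packet")
        packets.append(list(ops) + ["nop"] * (width - len(ops)))
    return packets


def slot_utilization(packets) -> float:
    """Fraction of slots that hold a real operation rather than a nop."""
    total = sum(len(p) for p in packets)
    useful = sum(1 for p in packets for op in p if op != "nop")
    return useful / total if total else 0.0


if __name__ == "__main__":
    scheduled = [["ld", "add"], ["mul"], ["st"]]
    packets = pad_packets(scheduled, width=4)
    for p in packets:
        print(p)
    print(slot_utilization(packets))
```

In this example four useful operations occupy twelve slots, so two thirds of the fetched instruction bits are nops; compression schemes aim to avoid storing those.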

Differences between EPIC and VLIW

VLIW code ends up being very tied to the pipeline it was scheduled for. It is difficult or impossible to change the pipeline depth and/or the mix of functional units without forcing a recompile of the code to match the new pipeline.

EPIC encodes runs of instructions that are categorized into broad classes (memory, integer, etc.) separated by stops. Instructions grouped together between two stops must be independent (i.e. no register dependencies), and can be safely issued in parallel. Different groups of instructions may be dependent on each other. The EPIC pipeline is protected, meaning that the pipeline will stall if you try to use an instruction result before it is ready. This is in contrast to the exposed pipelines of a traditional VLIW.

EPIC leverages VLIW scheduling techniques to simplify its pipeline, however. A given EPIC processor has a certain number of functional units, and those functional units have certain latencies. An EPIC compiler can then schedule code as if it were running on a lesser-or-equal VLIW to avoid dependency stalls and register hazards, keeping the pipeline full.

Because EPIC pipelines are fully protected, it is possible to run code compiled for one machine configuration on a differently configured device. You can change the latency of the instructions and/or the number of functional units. The code may not run optimally on a differently configured machine, but it will run correctly.
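The portability argument can be sketched by splitting one instruction stream at its stops and issuing the resulting groups on machines of different widths. The model is a simplification made for illustration: all functional units are interchangeable, every instruction has a latency of one cycle, and `;;` is used as the stop marker.

```python
import math

STOP = ";;"


def split_groups(stream):
    """Split an instruction stream into groups of independent instructions at each stop."""
    groups, current = [], []
    for item in stream:
        if item == STOP:
            if current:
                groups.append(current)
                current = []
        else:
            current.append(item)
    if current:
        groups.append(current)
    return groups


def issue_cycles(groups, width: int) -> int:
    """Cycles to issue every group on a machine with 'width' functional units."""
    return sum(math.ceil(len(g) / width) for g in groups)


if __name__ == "__main__":
    stream = ["ld a", "ld b", STOP, "add", STOP, "st", "ld c", "ld d"]
    groups = split_groups(stream)
    for width in (1, 2, 3):
        print(width, issue_cycles(groups, width))
```

The same stream is valid on all three machines; only the cycle count changes, which matches the claim that the code runs correctly but not necessarily optimally on a different configuration.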

Comparison: CISC, RISC, VLIW

Superscalar Implementation

• Simultaneously fetch multiple instructions

• Logic to determine true dependencies involving register values

• Mechanisms to communicate these values

• Mechanisms to initiate multiple instructions in parallel

• Resources for parallel execution of multiple instructions

• Mechanisms for committing process state in correct order

Superscalar Execution

Example Architectures

• PowerPC 604

– six independent execution units: branch execution unit, load/store unit, 3 integer units, floating-point unit

– in-order issue

– register renaming

• PowerPC 620

– provides, in addition to the features of the 604, out-of-order issue

• Pentium

– three independent execution units: 2 integer units, floating-point unit

– in-order issue
