Instruction level parallelism

INSTRUCTION LEVEL PARALLELISM PRESENTED BY KAMRAN ASHRAF 13-NTU-4009

Upload: kamran-ashraf

Post on 19-Jul-2015



INTRODUCTION

Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously.

WHAT IS A PARALLEL INSTRUCTION?

Parallel instructions are a set of instructions that do not depend on each other to be executed.

Hierarchy

Bit level Parallelism

• a 16-bit add on an 8-bit processor takes two 8-bit adds; a wider data path does it in one

Instruction level Parallelism

Loop level Parallelism

• for (i=1; i<=1000; i= i+1) x[i] = x[i] + y[i];

Thread level Parallelism

• multi-core computers
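As a rough illustration of the bit-level entry in the hierarchy above, the sketch below emulates a 16-bit add on a machine that can only add 8 bits at a time. The helper names `add8` and `add16_on_8bit` are invented for this example and do not come from the slides.

```python
def add8(a, b, carry_in=0):
    """Add two 8-bit values; return (8-bit sum, carry-out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16_on_8bit(x, y):
    """Add two 16-bit values using only 8-bit adds (two dependent steps)."""
    lo, carry = add8(x & 0xFF, y & 0xFF)   # step 1: low bytes
    hi, _ = add8(x >> 8, y >> 8, carry)    # step 2: high bytes plus carry
    return (hi << 8) | lo                  # result wraps modulo 2**16


print(hex(add16_on_8bit(0x12FF, 0x0001)))  # 0x1300
```

The second step needs the carry from the first, so the narrow machine spends two sequential operations on what a 16-bit data path handles in one.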

EXAMPLE

Consider the following program:

1. e = a + b

2. f = c + d

3. g = e * f

Operation 3 depends on the results of "e" and "f", which are calculated by operations 1 and 2, so "g" cannot be calculated until both "e" and "f" have been computed.

However, operations 1 and 2 do not depend on any other operation, so they can be computed simultaneously.

If we assume that each operation can be completed in one unit of time, then these three instructions can be completed in a total of two units of time, giving an ILP factor of 3/2 = 1.5; that is, the program runs 1.5 times faster than it would without ILP, where it would take three units of time.
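The 3/2 figure can be reproduced with a small dependency-graph calculation. This is only a sketch: it assumes every operation takes one unit of time and that there are enough functional units to run all ready operations at once. The function names are invented for the example.

```python
# Each operation maps to the operations whose results it needs.
deps = {
    "e = a + b": [],
    "f = c + d": [],
    "g = e * f": ["e = a + b", "f = c + d"],
}


def finish_time(op, deps, memo=None):
    """Earliest time unit in which `op` can finish (one unit per operation)."""
    memo = {} if memo is None else memo
    if op not in memo:
        memo[op] = 1 + max((finish_time(d, deps, memo) for d in deps[op]), default=0)
    return memo[op]


def ilp_factor(deps):
    """Number of operations divided by the length of the longest dependency chain."""
    critical_path = max(finish_time(op, deps) for op in deps)
    return len(deps) / critical_path


print(ilp_factor(deps))  # 1.5
```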

WHY ILP?

One of the goals of compiler and processor designers is to exploit as much ILP as possible.

Ordinary programs are written to execute instructions in sequence, one after the other, in the order written by the programmer.

ILP allows the compiler and the processor to overlap the execution of multiple instructions, or even to change the order in which instructions are executed.

INSTRUCTION PIPELINE

An instruction pipeline is a technique used in the design of modern microprocessors, microcontrollers and CPUs to increase their instruction throughput (the number of instructions that can be executed in a unit of time).

PIPELINING

The main idea is to divide the processing of a CPU instruction into a series of independent steps of "microinstructions", with storage at the end of each step.

This allows the CPU's control logic to handle instructions at the processing rate of the slowest step, which is much faster than the time needed to process the instruction as a single step.
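A back-of-the-envelope comparison makes the "rate of the slowest step" point concrete. The stage latencies below are hypothetical values chosen for illustration, and the model ignores stalls and hazards, so treat it as an idealized sketch.

```python
def unpipelined_time(stage_times, n):
    """Each instruction runs through every stage before the next one starts."""
    return n * sum(stage_times)


def pipelined_time(stage_times, n):
    """The clock is set by the slowest stage; after the pipeline fills,
    one instruction completes per cycle."""
    cycle = max(stage_times)
    return (len(stage_times) + n - 1) * cycle


stages = [2, 1, 2, 2, 1]  # hypothetical stage latencies in nanoseconds

print(unpipelined_time(stages, 100))  # 800
print(pipelined_time(stages, 100))    # 208
```

A single instruction still takes at least as long as before (here 10 ns through five 2 ns cycles instead of 8 ns); the gain is in throughput, not latency.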

EXAMPLE

For example, the classic RISC pipeline is broken into five stages, with a set of flip-flops between each stage, as follows:

Instruction fetch

Instruction decode & register fetch

Execute

Memory access

Register write back

In the pipeline diagram, the vertical axis is successive instructions and the horizontal axis is time. In the green column, the earliest instruction is in the WB stage and the latest instruction is undergoing instruction fetch.
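The diagram described above can be regenerated as text. The sketch below assumes an ideal pipeline with no stalls, where instruction i enters the fetch stage in cycle i; the function names are invented for the example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr, cycle):
    """Stage occupied by instruction `instr` in `cycle` (both 0-based), or None."""
    idx = cycle - instr
    return STAGES[idx] if 0 <= idx < len(STAGES) else None


def diagram(n):
    """Rows are successive instructions, columns are clock cycles."""
    cycles = n + len(STAGES) - 1
    return [[stage_at(i, c) or "-" for c in range(cycles)] for i in range(n)]


for row in diagram(5):
    print(" ".join(f"{s:>3}" for s in row))
```

Reading down the fifth column of the printed table gives WB, MEM, EX, ID, IF: the earliest instruction is writing back while the latest is being fetched, which matches the column the text describes.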

SUPERSCALAR

A superscalar CPU architecture implements ILP inside a single processor, which allows higher CPU throughput at the same clock rate.

WHY SUPERSCALAR

A superscalar processor executes more than one instruction during a clock cycle.

It simultaneously dispatches multiple instructions to multiple redundant functional units built inside the processor.

Each functional unit is not a separate CPU core but an execution resource inside the CPU, such as an arithmetic logic unit (ALU), a floating-point unit (FPU), a bit shifter, or a multiplier.

EXAMPLE

Simple superscalar pipeline: by fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.
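To see why two instructions per cycle is a ceiling rather than a guarantee, here is an idealized cycle count for a five-stage pipeline of a given issue width. It assumes no dependencies or stalls between instructions, which real code rarely satisfies; `cycles_needed` is a name made up for this sketch.

```python
import math


def cycles_needed(n, width, stages=5):
    """Cycles to finish n independent instructions when `width`
    instructions enter the pipeline together each cycle."""
    return stages + math.ceil(n / width) - 1


n = 1000
for width in (1, 2):
    cycles = cycles_needed(n, width)
    print(f"width={width}: {cycles} cycles, {n / cycles:.2f} instructions per cycle")
```

Even in this best case the two-wide machine stays slightly under 2 instructions per cycle because of the time taken to fill the pipeline.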

OUT-OF-ORDER EXECUTION

Out-of-order execution (OoOE) is a technique used in most high-performance microprocessors.

The key concept is to allow the processor to avoid a class of delays that occur when the data needed to perform an operation are unavailable.

Most modern CPU designs include support for out-of-order execution.

STEPS

Out-of-order processors break up the processing of instructions into these steps:

1. Instruction fetch.

2. Instruction dispatch to an instruction queue (also called an instruction buffer).

3. The instruction waits in the queue until its input operands are available.

4. The instruction is issued to the appropriate functional unit and executed by that unit.

5. The results are queued (in the re-order buffer).

6. Only after all older instructions have had their results written back to the register file is this result written back to the register file.
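The steps above can be sketched as a toy simulation. It makes several simplifying assumptions that are not in the slides: unlimited functional units, each register written at most once (so no renaming is needed), and invented instruction names, registers and latencies.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "name dest srcs latency")


def simulate(program):
    """Return (issue order, [(name, commit cycle)]) for a list of Instr."""
    written = {ins.dest for ins in program}  # registers produced by the program
    ready_at = {}      # register -> first cycle its value can be used
    finish = {}        # instruction name -> cycle in which execution ends
    issue_order = []
    queue = list(program)                    # instruction queue, program order
    cycle = 0
    while queue:
        cycle += 1
        for ins in list(queue):
            # An operand is ready if nothing in the program writes it,
            # or if its producer has already delivered the result.
            if all(s not in written or ready_at.get(s, float("inf")) <= cycle
                   for s in ins.srcs):
                queue.remove(ins)
                issue_order.append(ins.name)
                finish[ins.name] = cycle + ins.latency - 1
                ready_at[ins.dest] = cycle + ins.latency
    # Re-order buffer: results are committed strictly in program order.
    commit, last = [], 0
    for ins in program:
        last = max(last, finish[ins.name])
        commit.append((ins.name, last))
    return issue_order, commit


program = [
    Instr("load", "r1", [], 3),            # slow memory access
    Instr("add", "r2", ["r1", "r0"], 1),   # needs the loaded value
    Instr("mul", "r3", ["r4", "r5"], 1),   # independent of both
]
issue_order, commit = simulate(program)
print(issue_order)
print(commit)
```

With this program the multiply issues ahead of the add, because its operands are ready while the add is still waiting for the load, yet it commits after the add, because the re-order buffer keeps results in program order.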

OTHER ILP TECHNIQUES

Register renaming, which is a technique used to avoid unnecessary serialization of program operations caused by the reuse of registers by those operations, in order to enable out-of-order execution.

Speculative execution, which allows the execution of complete instructions or parts of instructions before it is certain whether this execution is required.

Branch prediction, which is used to avoid delays while control dependencies are being resolved. Branch prediction determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not.
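Of the three techniques above, register renaming is the easiest to show in a few lines. The sketch below gives every result a fresh physical register, so that reuse of an architectural register no longer forces instructions to run in order. It assumes an unlimited supply of physical registers (a real processor recycles them), and the register names and the `rename` function are invented for the example.

```python
def rename(program):
    """program: list of (dest, src1, src2) architectural register names.
    Returns the same instructions with destinations renamed to p0, p1, ..."""
    mapping = {}      # architectural register -> current physical register
    renamed = []
    for number, (dest, *srcs) in enumerate(program):
        # Sources read whichever physical register holds the latest value.
        new_srcs = [mapping.get(s, s) for s in srcs]
        phys = f"p{number}"          # fresh physical register per result
        mapping[dest] = phys
        renamed.append((phys, *new_srcs))
    return renamed


program = [
    ("r1", "r2", "r3"),   # r1 = r2 op r3
    ("r4", "r1", "r5"),   # r4 = r1 op r5   (reads r1)
    ("r1", "r6", "r7"),   # r1 = r6 op r7   (reuses r1)
    ("r8", "r1", "r9"),   # r8 = r1 op r9
]
for instr in rename(program):
    print(instr)
```

After renaming, the third instruction writes `p2` instead of overwriting `r1`, so it no longer has to wait for the first two instructions; only the true data dependencies (`p0` into the second instruction, `p2` into the fourth) remain.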

THANKS