instruction level parallelism

INSTRUCTION LEVEL PARALLELISM

PRESENTED BY KAMRAN ASHRAF

13-NTU-4009

INTRODUCTION

Instruction-level parallelism (ILP) is a measure of how many operations in a computer program can be performed "in parallel" at the same time.

WHAT IS A PARALLEL INSTRUCTION?

Parallel instructions are a set of instructions that do not depend on each other to be executed.

Hierarchy

Bit level Parallelism

• 16 bit add on 8 bit processor

Instruction level Parallelism

Loop level Parallelism

• for (i=1; i<=1000; i= i+1) x[i] = x[i] + y[i];

Thread level Parallelism

• multi-core computers
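
The bit-level entry in the hierarchy above (a 16-bit add on an 8-bit processor) can be sketched as follows. This is only an illustration: the function name `add16_on_8bit` is made up for this sketch, and Python integers stand in for 8-bit registers. It shows why a narrow processor needs two dependent additions where a 16-bit processor needs one.

```python
def add16_on_8bit(a: int, b: int) -> int:
    """Add two 16-bit values using only 8-bit additions plus a carry."""
    # Split each operand into a low byte and a high byte.
    a_lo, a_hi = a & 0xFF, (a >> 8) & 0xFF
    b_lo, b_hi = b & 0xFF, (b >> 8) & 0xFF

    # First 8-bit add: the low bytes.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 8  # 1 if the low-byte add overflowed, else 0

    # Second 8-bit add: the high bytes plus the carry from the first add.
    hi_sum = a_hi + b_hi + carry

    # Keep 8 bits of each half; the result wraps around at 16 bits.
    return ((hi_sum & 0xFF) << 8) | (lo_sum & 0xFF)


print(hex(add16_on_8bit(0x12FF, 0x0001)))  # 0x1300
```

The second add cannot start until the carry from the first is known, so widening the word size removes a dependency rather than just an instruction.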

EXAMPLE

Consider the following program:

1. e = a + b

2. f = c + d

3. g = e * f

Operation 3 depends on the results "e" and "f", which are calculated by operations 1 and 2, so "g" cannot be calculated until both "e" and "f" have been computed.

However, operations 1 and 2 do not depend on any other operation, so they can be computed simultaneously.

If we assume that each operation completes in one unit of time, then these three instructions can be completed in a total of two units of time, giving an ILP factor of 3/2 = 1.5: the program finishes 1.5 times faster than it would if the three operations ran one after another.
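
The arithmetic above can be checked with a small sketch. It assumes, as the slide does, that every operation takes one unit of time and that any number of independent operations can run at once; the names `deps`, `finish_time` and `ilp_factor` are illustrative only.

```python
# Each operation maps to the operations whose results it needs.
deps = {
    "e": [],          # 1. e = a + b
    "f": [],          # 2. f = c + d
    "g": ["e", "f"],  # 3. g = e * f
}


def finish_time(op, deps):
    """Earliest time unit in which `op` can complete (one unit per operation)."""
    return 1 + max((finish_time(d, deps) for d in deps[op]), default=0)


def ilp_factor(deps):
    """Number of operations divided by the length of the longest dependency chain."""
    parallel_time = max(finish_time(op, deps) for op in deps)
    return len(deps) / parallel_time


print(ilp_factor(deps))  # 3 operations / 2 time units = 1.5
```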

WHY ILP?

One of the goals of compiler and processor designers is to exploit as much ILP as possible.

Ordinary programs are written to execute instructions in sequence, one after the other, in the order written by the programmer.

ILP allows the compiler and the processor to overlap the execution of multiple instructions, or even to change the order in which instructions are executed.

ILP TECHNIQUES

Micro-architectural techniques that exploit ILP include:

Instruction pipelining

Superscalar

Out-of-order execution

Register renaming

Speculative execution

Branch prediction

INSTRUCTION PIPELINE

An instruction pipeline is a technique used in the design of modern microprocessors, microcontrollers and CPUs to increase their instruction throughput (the number of instructions that can be executed in a unit of time).

PIPELINING

The main idea is to divide the processing of a CPU instruction into a series of independent steps, or "microinstructions", with storage at the end of each step.

This allows the CPU's control logic to handle instructions at the processing rate of the slowest step, which is much faster than the time needed to process the instruction as a single step.
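
A rough sketch of the timing argument above, under idealised assumptions: no stalls, one instruction entering the pipeline per cycle, and a cycle time equal to the slowest step. The stage times are hypothetical numbers chosen only for illustration.

```python
def unpipelined_time(stage_times, n_instructions):
    """Each instruction runs all of its steps before the next one starts."""
    return n_instructions * sum(stage_times)


def pipelined_time(stage_times, n_instructions):
    """The clock is set by the slowest step; one instruction completes
    per cycle once the pipeline has filled."""
    cycle = max(stage_times)
    return (len(stage_times) + n_instructions - 1) * cycle


stage_times = [1, 1, 2, 1, 1]  # hypothetical step durations, arbitrary time units
n = 100

print(unpipelined_time(stage_times, n))  # 600
print(pipelined_time(stage_times, n))    # 208
```

With these numbers the pipelined design finishes in roughly a third of the time, even though its clock is limited by the 2-unit step.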




EXAMPLE

For example, the RISC pipeline is broken into five stages, with a set of flip-flops between each stage, as follows:

Instruction fetch

Instruction decode & register fetch

Execute

Memory access

Register write back

In the pipeline diagram, the vertical axis shows successive instructions and the horizontal axis shows time. In the green column, the earliest instruction is in the WB stage and the latest instruction is undergoing instruction fetch.
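
The layout of such a diagram can be reproduced with a short sketch. It assumes an ideal pipeline with no stalls, in which instruction i enters the fetch stage in cycle i; the abbreviations IF, ID, EX, MEM and WB stand for the five stages listed above, and `pipeline_diagram` is an illustrative name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions):
    """Return one row per instruction; each row lists the stage that
    instruction occupies in every cycle ("" when it is not in the pipeline)."""
    n_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(n_cycles):
            stage_index = cycle - i  # instruction i enters IF in cycle i
            in_pipeline = 0 <= stage_index < len(STAGES)
            row.append(STAGES[stage_index] if in_pipeline else "")
        rows.append(row)
    return rows


for row in pipeline_diagram(5):
    print(" ".join(f"{stage:>3}" for stage in row))
```

Reading down the fifth column of the output gives WB, MEM, EX, ID, IF: the oldest instruction is writing back while the newest is being fetched.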

http://upload.wikimedia.org/wikipedia/commons/2/21/Fivestagespipeline.png

SUPERSCALAR

A superscalar CPU architecture implements ILP inside a single processor, which allows faster CPU throughput at the same clock rate.

WHY SUPERSCALAR

A superscalar processor executes more than one instruction during a clock cycle.

It simultaneously dispatches multiple instructions to multiple redundant functional units built into the processor.

Each functional unit is not a separate CPU core but an execution resource inside the CPU, such as an arithmetic logic unit (ALU), a floating point unit (FPU), a bit shifter, or a multiplier.

EXAMPLE

Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.
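
The effect of dispatching two instructions at a time can be estimated with a small sketch. It assumes a five-stage pipeline and fully independent instructions with no stalls, so it gives a best case rather than a measured figure; `cycles_needed` and its parameters are illustrative names.

```python
import math


def cycles_needed(n_instructions, n_stages=5, width=1):
    """Cycles to finish n independent instructions on an idealised pipeline
    that fetches and dispatches `width` instructions per cycle."""
    groups = math.ceil(n_instructions / width)
    return n_stages + groups - 1


n = 10
print(cycles_needed(n, width=1))  # 14 cycles on a scalar pipeline
print(cycles_needed(n, width=2))  # 9 cycles on a two-wide superscalar pipeline
```

Real programs contain dependencies between instructions, so the achieved rate is usually below the two-per-cycle maximum.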

http://upload.wikimedia.org/wikipedia/commons/c/ce/Superscalarpipeline.png

OUT-OF-ORDER EXECUTION

Out-of-order execution (OoOE) is a technique used in most high-performance microprocessors.

The key concept is to allow the processor to avoid a class of delays that occur when the data needed to perform an operation are unavailable.

Most modern CPU designs include support for out-of-order execution.

STEPS

Out-of-order processors break up the processing of instructions into these steps:

Instruction fetch.

Instruction dispatch to an instruction queue (also called instruction buffer)

The instruction waits in the queue until its input operands are available.

The instruction is issued to the appropriate functional unit and executed by that unit.

The results are queued in a re-order buffer.

A result is written back to the register file only after all older instructions have had their results written back.
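
The steps above can be illustrated with a simplified timing model. It is a sketch under strong assumptions: every instruction is already in the queue at cycle 0, there are unlimited functional units, and each destination register is written only once. The register names, latencies and the `simulate` function are hypothetical.

```python
def simulate(program):
    """program: list of (name, dest, sources, latency) in program order.
    Returns {name: (issue, finish, commit)} measured in cycles."""
    ready_at = {}    # register -> cycle in which its value becomes available
    timeline = {}
    last_commit = 0
    for name, dest, sources, latency in program:
        # The instruction waits in the queue until all input operands are ready.
        issue = max((ready_at.get(src, 0) for src in sources), default=0)
        finish = issue + latency
        ready_at[dest] = finish
        # Re-order buffer: write back no earlier than every older instruction.
        commit = max(finish, last_commit)
        last_commit = commit
        timeline[name] = (issue, finish, commit)
    return timeline


program = [
    ("load", "r1", ["r0"], 3),        # slow memory access
    ("add",  "r2", ["r1", "r1"], 1),  # needs the loaded value
    ("mul",  "r3", ["r4", "r5"], 1),  # independent of both
]

for name, times in simulate(program).items():
    print(name, times)
```

In this run the `mul` finishes executing before the `add` has even issued, yet its result is not written back before the `add` has committed, so the program-order view of the registers is preserved.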

OTHER ILP TECHNIQUES

Register renaming is a technique used to avoid unnecessary serialization of program operations caused by the reuse of registers by those operations, in order to enable out-of-order execution.

Speculative execution allows the execution of complete instructions, or parts of instructions, before it is certain whether this execution is required.

Branch prediction is used to avoid stalling while control dependencies are resolved. It determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not.
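
As one possible illustration of branch prediction, the sketch below uses a two-bit saturating counter. The slides do not name a specific prediction scheme, so this choice is an assumption, and the class and function names are made up for the example.

```python
class TwoBitPredictor:
    """States 0 and 1 predict 'not taken'; states 2 and 3 predict 'taken'."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step towards the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch that is taken 9 times and then falls through, repeated 10 times.
outcomes = ([True] * 9 + [False]) * 10
print(accuracy(outcomes))  # 0.9
```

The predictor is wrong only at each loop exit, because a single not-taken outcome is not enough to flip its prediction for the next run of the loop.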

THANKS
