Instruction-Level Parallel Processors (Chapter 4)
TRANSCRIPT
What is instruction-level parallelism?
The potential for executing certain instructions in parallel, because they are independent.
Any technique for identifying and exploiting such opportunities.
Instruction-level parallelism (ILP)
Basic blocks: sequences of instructions that appear between branches; usually no more than 5 or 6 instructions!
Loops: for (i = 0; i < N; i++) x[i] = x[i] + s;
We can only realize ILP by finding sequences of independent instructions
Dependent instructions must be separated by a sufficient amount of time
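The distinction between independent and dependent instructions can be sketched in C (an illustrative sketch; the variable names are arbitrary):

```c
#include <assert.h>

/* Independent instructions (no shared results) may execute in parallel: */
int independent_pair(int a, int b, int c, int d, int *x, int *y) {
    *x = a + b;          /* these two statements share no operands,  */
    *y = c * d;          /* so an ILP processor can overlap them     */
    return *x + *y;
}

/* Dependent instructions must be separated in time: */
int dependent_pair(int a, int b) {
    int t = a + b;       /* produces t                               */
    return t * t;        /* consumes t: cannot start before t exists */
}
```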
Pipelined Operation
A number of functional units are employed in sequence to perform a single computation.
Each functional unit represents a certain stage of the computation.
Pipelining allows overlapped execution of instructions.
It increases the overall throughput of the processor.
Superscalar Processors
Superscalar processors increase the ability of the processor to exploit instruction-level parallelism.
Multiple instructions are issued every cycle, with multiple pipelines operating in parallel.
The processor receives a sequential stream of instructions; the decode and execute unit then issues multiple instructions to the multiple execution units in each cycle.
VLIW architecture
VLIW stands for Very Long Instruction Word.
VLIW architecture receives multi-operation instructions, i.e. with multiple fields for the control of EUs.
The basic structure of superscalar and VLIW processors is the same: multiple EUs, each capable of parallel execution on data fetched from a register file.
Dependencies Between Instructions
Dependences (constraints):
Data dependences: arise from precedence requirements concerning referenced data
Control dependences: arise from conditional statements
Resource dependences: arise from limited resources
Data Dependency
An instruction j depends on data from a previous instruction i: j cannot execute before i, and j and i cannot execute simultaneously.
Data dependences are properties of the program; whether a dependence leads to a data hazard or a stall in a pipeline depends on the pipeline organization.
Data Dependency
Data dependencies can be differentiated according to the data involved and according to their type. The data involved in a dependency may come from a register or from memory; the type of data dependency may arise either in straight-line code or in a loop.
Data Dependency in Straight-Line Code
Straight-line code may contain three different types of dependencies: RAW (read after write), WAR (write after read), WAW (write after write).
RAW (Read after Write)
i1: load r1, a;
i2: add r2, r1, r1;
Assume a pipeline of Fetch/Decode/Execute/Mem/Writeback
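Assuming this five-stage pipeline, with the register file written in the first half of a cycle and read in the second half (a common textbook assumption), the stall count for the RAW pair can be sketched as:

```c
#include <assert.h>

/* Hypothetical 5-stage pipeline: Fetch=0, Decode=1, Execute=2, Mem=3,
   Writeback=4. Without forwarding, i2 can read r1 in its Decode stage
   only once i1 has written it back. This helper returns the number of
   stall cycles between a producer writing in stage wb_stage and an
   immediately following consumer reading in stage rd_stage, assuming
   one instruction enters the pipeline per cycle. */
int raw_stalls(int wb_stage, int rd_stage) {
    int gap = wb_stage - rd_stage - 1;  /* cycles the consumer waits */
    return gap > 0 ? gap : 0;
}
```

For the load/add pair above, raw_stalls(4, 1) gives two bubbles; forwarding would shrink or remove them.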
Name Dependencies
A “name dependence” occurs when 2 instructions use the same register or memory location (a name) but there is no flow of data between the 2 instructions
There are 2 types:
Antidependencies: occur when an instruction j writes a register or memory location that instruction i reads, and i is executed first. Corresponds to a WAR hazard.
Output dependencies: Occur when instruction i and instruction j write the same register or memory location
Protected against by checking for WAW hazards
Write after Read (WAR)
i1: mul r1, r2, r3; r1 <= r2 * r3
i2: add r2, r4, r5; r2 <= r4 + r5
If instruction i2 (add) is executed before instruction i1 (mul) for some reason, then i1 (mul) could read the wrong value for r2.
Write after Read (WAR)
One reason for delaying i1 would be a stall for the ‘r3’ value being produced by a previous instruction. Instruction i2 could proceed because it has all its operands, thus causing the WAR hazard.
Use register renaming to eliminate WAR dependency. Replace r2 with some other register that has not been used yet.
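Renaming can be sketched in C, modeling the register file as a small array (an illustrative model only; r6 stands in for "some unused register"):

```c
#include <assert.h>

/* WAR hazard from the slides:
     i1: mul r1, r2, r3;   reads r2
     i2: add r2, r4, r5;   writes r2
   If i2 completes before i1 reads r2, i1 sees the wrong value.
   Renaming gives i2 a fresh destination, removing the dependency. */
enum { R1, R2, R3, R4, R5, R6, NREGS };

void war_renamed(int r[NREGS]) {
    r[R6] = r[R4] + r[R5];   /* i2 renamed: add r6, r4, r5; may run first */
    r[R1] = r[R2] * r[R3];   /* i1: mul r1, r2, r3; still sees old r2     */
    r[R2] = r[R6];           /* later uses of r2 read the renamed value   */
}
```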
Write after Write (WAW)
i1: mul r1, r2, r3; r1 <= r2 * r3
i2: add r1, r4, r5; r1 <= r4 + r5
If instruction i1 (mul) finishes AFTER instruction i2 (add), then register r1 would get the wrong value. Instruction i1 could finish after instruction i2 if separate execution units were used for instructions i1 and i2.
Write after Write (WAW)
One way to solve this hazard is to simply let instruction i1 proceed normally, but disable its write stage.
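Suppressing the stale write can be sketched as follows (a hypothetical model, not a real scoreboard: each in-flight write carries a sequence number, and a write commits only if no younger write to the same register has already committed):

```c
#include <assert.h>

/* A register with the sequence number of the last committed write. */
typedef struct { int value; int last_seq; } Reg;

/* Commit a write of `value` issued with sequence number `seq`; older
   (stale) writes arriving late are disabled, resolving the WAW hazard. */
void commit_write(Reg *r, int value, int seq) {
    if (seq >= r->last_seq) {
        r->value = value;
        r->last_seq = seq;
    }
}
```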
Data Dependency in Loops
Instructions belonging to a particular loop iteration may be dependent on instructions belonging to previous loop iterations.
This type of dependency is referred to as a recurrence or inter-iteration data dependency.
do
  X(I) = A*X(I-1) + B
end do
Data Dependency in Loops
Loops are a "common case" in pretty much any program, so this is worth mentioning. Consider:
for (j = 0; j <= 100; j++) {
  A[j+1] = A[j] + C[j];   /* S1 */
  B[j+1] = B[j] + A[j+1]; /* S2 */
}
Data Dependency in Loops
Now look at the dependence of S1 on an earlier iteration of S1. This is a loop-carried dependence; in other words, the dependence exists between different iterations of the loop.
Successive iterations of S1 must execute in order.
The dependence of S2 on S1 is within an iteration and is not loop-carried: if it were the only dependence, multiple iterations of the loop could execute in parallel.
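The contrast between a loop-carried dependence and independent iterations can be sketched in C (illustrative functions; N is an arbitrary size):

```c
#include <assert.h>
#define N 8

/* Loop-carried dependence, as in S1 above: iteration j needs the value
   written by iteration j-1, so successive iterations must run in order. */
void loop_carried(int A[N + 1], const int C[N]) {
    for (int j = 0; j < N; j++)
        A[j + 1] = A[j] + C[j];
}

/* No loop-carried dependence: each iteration touches only its own
   element, so the iterations could execute in parallel. */
void loop_independent(int A[N], int s) {
    for (int j = 0; j < N; j++)
        A[j] = A[j] + s;
}
```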
Data Dependency Graphs
Data dependency can also be represented by graphs: the nodes are instructions, and the directed edges are labelled t, a or o to denote RAW, WAR or WAW dependencies respectively. An edge from i1 to i2 means that i2 is dependent on i1.
i1: load r1, a;
i2: load r2, b;
i3: add r3, r1, r2;
i4: mul r1, r2, r4;
i5: div r1, r2, r4;
Instruction interpretation: r3 <= (r1) + (r2), etc.
In this example, i3 is data (RAW) dependent on i1 and i2, i4 is anti (WAR) dependent on i3, and i5 is output (WAW) dependent on i4.
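The dependence tests behind such a graph can be sketched in C. The three-address instruction format and register numbering here are illustrative assumptions, not a real ISA; a source field of -1 means "no source register":

```c
#include <assert.h>

enum { NONE = 0, RAW = 1, WAR = 2, WAW = 4 };

typedef struct { int dest, src1, src2; } Instr;

/* Classify the dependence of a later instruction j on an earlier i. */
int classify(Instr i, Instr j) {
    int dep = NONE;
    if (j.src1 == i.dest || j.src2 == i.dest) dep |= RAW; /* j reads what i writes  */
    if (j.dest == i.src1 || j.dest == i.src2) dep |= WAR; /* j writes what i reads  */
    if (j.dest == i.dest)                     dep |= WAW; /* both write same reg    */
    return dep;
}
```

Applied to the slide's example, i1 to i3 is RAW, i3 to i4 is WAR, and i4 to i5 is WAW.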
Control Dependence
Control dependence determines the ordering of an instruction with respect to a branch instruction: if the branch is taken, the instruction is executed; if the branch is not taken, the instruction is not executed.
An instruction that is control dependent on a branch cannot be moved before the branch: instructions from the then part of an if-statement cannot be executed before the branch.
Control Dependence
An instruction that is not control dependent on a branch cannot be moved after the branch
Other instructions cannot be moved into the then part of an if-statement.
Example:
sub r1, r2, r3;
jz zproc;
mul r4, r1, r1;
:
:
zproc: load r1, x;
:
Control Dependence
Published branch statistics:

Source                   Processor   Branch ratio (%)     Cond. branch ratio (%)   Uncond. branch ratio (%)
                                     gen.purp.  scient.   gen.purp.  scient.       gen.purp.  scient.
Yeh, Patt (1992)         M88100      24         5         78         82            20         4
Stephens et al. (1991)   RS/6000     22         11        83         85            18         9
Lee, Smith (1984)        IBM/370     29         10        73         73            21         8
                         PDP-11/70   39         -         46         -             18         -
                         CDC 6400    8          -         53         -             4          -
Control Dependence
Frequent conditional branches impose a heavy performance constraint on ILP-processors.
A higher rate of instruction issue per cycle raises the probability of encountering a conditional control dependency in each cycle.
Control Dependence
[Figure: occurrence of conditional branches (JC) in the instruction stream over time (t), comparing scalar issue with superscalar (or VLIW) issue of 2, 3 and 6 instructions per issue: the wider the issue, the more likely each issue slot contains a conditional branch. JC: conditional branch]
Control Dependency Graph
[Figure: control dependency graph with nodes i0 to i8; the branch nodes i3 and i4 each have T (true) and F (false) outgoing edges leading to i5, i6 and i7, converging at i8]
i0: r1 = op1;
i1: r2 = op2;
i2: r3 = op3;
i3: if (r2 > r1)
i4:   { if (r3 > r1)
i5:       r4 = r3;
i6:     else r4 = r1; }
i7: else r4 = r2;
i8: r5 = r4 * r4;
Figure 5.2.11: An example for a CDG
Resource Dependency
An instruction is resource dependent on a previously issued instruction if it requires a hardware resource which is still being used by the previously issued instruction.
If, for instance, there is only a single division unit available, then in the code sequence
i1: div r1, r2, r3;
i2: div r4, r2, r5;
i2 is resource dependent on i1.
Instruction Scheduling
Why is instruction scheduling needed? Instruction scheduling involves:
Detection: detecting where dependencies occur in the code.
Resolution: removing dependencies from the code.
There are two approaches to instruction scheduling: the static approach and the dynamic approach.
Static Scheduling
Detection and resolution are accomplished by the compiler, which avoids dependencies by reordering the code.
VLIW processors expect dependency-free code generated by an ILP compiler.
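Static scheduling can be sketched at the source level: the compiler hides the latency of a dependent pair by moving an independent instruction into the gap. Here the "schedule" is simply the order of C statements (an illustrative sketch; the variables stand in for registers):

```c
#include <assert.h>

int scheduled(int a, int b, int c, int d) {
    int r1 = a * b;      /* producer (long-latency multiply)        */
    int r5 = c + d;      /* independent work moved into the gap     */
    int r2 = r1 + 1;     /* consumer issues later: fewer stalls     */
    return r2 + r5;
}
```

The result is the same as with the consumer immediately after the producer; only the issue order, and hence the stall behaviour, changes.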
Dynamic Scheduling
Performed by the processor, which maintains two windows.
Issue window: contains all fetched instructions which are intended for issue in the next cycle. The issue window's width is equal to the issue rate. All instructions in the window are checked for dependencies.
Dynamic Scheduling
Execution window: contains all those instructions which are still in execution and whose results have not yet been produced.
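A minimal issue-window dependency check can be sketched as follows (a hypothetical model: an instruction may issue only if none of its registers conflicts with the destination of an instruction still in the execution window, covering RAW and WAW):

```c
#include <assert.h>

typedef struct { int dest, src1, src2; } Instr;

/* Return 1 if `in` may issue, 0 if it must stall behind an unfinished
   instruction in the execution window. */
int may_issue(Instr in, const Instr *exec_window, int n) {
    for (int k = 0; k < n; k++) {
        int d = exec_window[k].dest;
        if (in.src1 == d || in.src2 == d || in.dest == d)
            return 0;   /* dependent on an unfinished result: stall */
    }
    return 1;
}
```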
Instruction Scheduling in ILP-processors
Detection and resolution of dependencies
Parallel optimization
ILP-instruction scheduling
Parallel Optimization
Parallel optimization is achieved by reordering the sequence of instructions, using appropriate code transformations, for parallel execution.
Also known as code restructuring or code reorganization.
Preserving Sequential Consistency
To maintain the logical integrity of the program:
div r1, r2, r3;
add r5, r6, r7;
jz anywhere;