adv ca cheatsheet nn

=

=

Instr.

Count

CPI Clock

Rate

Program

Compiler

ISA

throughput 1 - 1 Superscalar

Multicycle operations STALLS in-order-execution

Multi-cycle Operation & Hazards

Non-pipelined unitsStructural Hazards Different LatencyStructural Hazards Wrong Order in WBWAW Hazards Out-of-order completion

Too much latencyStallsRAW

Instruction Level Parallelism

. =#

#

Oper. Latency (OL), Machine

Parallelism (ML), Issue Latency

(IL), Issue Parallelism (IP)

ILP

Superpipelined: cycle = 1/m of baseline, IP = 1 instr./minor cycle, OL=1

Maj.c. = m minor c, IL=1mc, MP=m x k, IPC(max)= m instr./ M.c.

Superscalar: IP=n instr./cycle, OP=1cycle,IPC(max)= n instr./cycle

VLIW: IP=n instr./cycle, OP=1cycle,IPC(max)= n VLIW/cycle

Superpipelined-Superscalar: IP = n inst../m.c., OP= m m.c., IPC(max)= n x m

Simulation

Functional VS Performance/Timing

F: Visible Arch. State,

,

(reg, mem), output

Software dev &/| Emulation P/T: Microarch details,

,

Trace VS Execution-driven

T: ,

traces as inputs,

proprietary apps & input sets.

E-D: simulator ,

app&arch state

User-Code VS Full System

1

1

2

Control Depedencies

Stalls

Branch Delay Slots

Predicated execution

Fine-grained Multithreading

Multipath execution

Prediction

2

(2,2)

64 entries

4 l.o. PC

2 bits gl.h.

2 X

Branch-Target

Buffer

3

OOO Exec Problems

Scoreboarding (1963 CDC6600)

Tomasulos Algor. (renaming + RS)

ISSUEEXECUTEWRITE RESULT VS Explicit Renaming:

3

4

Precise Interupts/Exceptions

Precise:

, &,

.

; -: TLB faults, Underflow,

,

4

ReOrderBuffer (ROB)

FIFO Commit InOrder !

br. mispred. flush

+: dependency checking,

branch prediction, bubbles,

throughput, utilization, latency

tolerance

-: Hardware,

single-thread-performance,

SISD SIMD MISD MIMD

MIMD: Process ( ) Thread

Cache Coherence

- HW: Cache invalidation

- SW: Flushing Cache Lines

- Non-Cacheable DMA

: Coherent ,

, , :

thread

.

: 1. 2. Write propagation 3.Write serialization

:

Invalidation VS Update Protocol

cache block. block,

;.

NAI: - Invalidation: Read miss transactions - Update: Read hit transaction : - Invalidation: bus / .

- Update:

4C Cache Misses Model

Compulsory: block size Capacity: block cache size Conflict: block mapped set associativity Coherence: Communiction, True Sharing: 1 Data word 2+ / False Sharing: DW c.b

Cache Coherence VS Memory Consistency

Coherence:

( cache line)

Consistency:

,

Memory Consistency Models

Sequential Consistency:

PROGRAM ORDER

ATOMICITY

Relaxed Memory Models:

Total Store Order (TSO), Processor Consistency (PC)

Partial Store Order (PSO)

Weak Ordering (WO), Release Consistency (RC), Relaxed Memory Order (RMO)

Use of: SYNCH

(Acquiring Locks) ISA

Spin-lock (Test-and-Set):

Test-and-Test-and-Set:

lock

.

adv ca cheatsheet nn

Documents