adv ca cheatsheet nn
DESCRIPTION
Advanced Computer Architecture Cheat SheetTRANSCRIPT
-
=
=
Instr.
Count
CPI Clock
Rate
Program
Compiler
ISA
throughput 1 - 1 Superscalar
Multicycle operations STALLS in-order-execution
Multi-cycle Operation & Hazards
Non-pipelined unitsStructural Hazards Different LatencyStructural Hazards Wrong Order in WBWAW Hazards Out-of-order completion
Too much latencyStallsRAW
Instruction Level Parallelism
. =#
#
Oper. Latency (OL), Machine
Parallelism (ML), Issue Latency
(IL), Issue Parallelism (IP)
ILP
Superpipelined: cycle = 1/m of baseline, IP = 1 instr./minor cycle, OL=1
Maj.c. = m minor c, IL=1mc, MP=m x k, IPC(max)= m instr./ M.c.
Superscalar: IP=n instr./cycle, OP=1cycle,IPC(max)= n instr./cycle
VLIW: IP=n instr./cycle, OP=1cycle,IPC(max)= n VLIW/cycle
Superpipelined-Superscalar: IP = n inst../m.c., OP= m m.c., IPC(max)= n x m
Simulation
Functional VS Performance/Timing
F: Visible Arch. State,
,
(reg, mem), output
Software dev &/| Emulation P/T: Microarch details,
,
Trace VS Execution-driven
T: ,
traces as inputs,
proprietary apps & input sets.
E-D: simulator ,
app&arch state
User-Code VS Full System
1
1
2
Control Depedencies
Stalls
Branch Delay Slots
Predicated execution
Fine-grained Multithreading
Multipath execution
Prediction
2
(2,2)
64 entries
4 l.o. PC
2 bits gl.h.
2 X
Branch-Target
Buffer
3
OOO Exec Problems
Scoreboarding (1963 CDC6600)
Tomasulos Algor. (renaming + RS)
ISSUEEXECUTEWRITE RESULT VS Explicit Renaming:
3
4
Precise Interupts/Exceptions
Precise:
, &,
.
; -: TLB faults, Underflow,
,
4
ReOrderBuffer (ROB)
FIFO Commit InOrder !
br. mispred. flush
+: dependency checking,
branch prediction, bubbles,
throughput, utilization, latency
tolerance
-: Hardware,
single-thread-performance,
SISD SIMD MISD MIMD
MIMD: Process ( ) Thread
Cache Coherence
- HW: Cache invalidation
- SW: Flushing Cache Lines
- Non-Cacheable DMA
: Coherent ,
, , :
thread
.
: 1. 2. Write propagation 3.Write serialization
-
:
Invalidation VS Update Protocol
cache block. block,
;.
NAI: - Invalidation: Read miss transactions - Update: Read hit transaction : - Invalidation: bus / .
- Update:
4C Cache Misses Model
Compulsory: block size Capacity: block cache size Conflict: block mapped set associativity Coherence: Communiction, True Sharing: 1 Data word 2+ / False Sharing: DW c.b
Cache Coherence VS Memory Consistency
Coherence:
( cache line)
Consistency:
,
Memory Consistency Models
Sequential Consistency:
PROGRAM ORDER
ATOMICITY
Relaxed Memory Models:
Total Store Order (TSO), Processor Consistency (PC)
Partial Store Order (PSO)
Weak Ordering (WO), Release Consistency (RC), Relaxed Memory Order (RMO)
Use of: SYNCH
(Acquiring Locks) ISA
Spin-lock (Test-and-Set):
Test-and-Test-and-Set:
lock
.