Post on 20-Dec-2015
Computer Architecture 2009 – Introduction

MAMAS – Computer Architecture 234267
Lecturer: Dr. Lihu Rappoport
Some of the slides were taken from Avi Mendelson, Randi Katz, Patterson, Gabriel Loh

General Course Information
Grade
  20% Exercise (mandatory, binding)
  80% Final exam
  No midterm exam
Textbooks
  Computer Architecture: A Quantitative Approach, Hennessy & Patterson
Other course information
  Course web site: http://webcourse.cs.technion.ac.il/234267
  Foils will be on the web several days before the class

Lecturer details
  Name: Lihu Rappoport
  Phone: 04-865-1554
  Email: lihu.rappoport@intel.com

Class Focus
CPU
  Introduction: performance, instruction set (RISC vs. CISC)
  Pipeline, hazards
  Branch prediction
  Out-of-order execution
Memory Hierarchy
  Cache
  Main memory
  Virtual Memory
Advanced Topics
PC Architecture
  Motherboard & chipset, DRAM, I/O, Disk, peripherals

Computer System Structure
[Figure: block diagram of a PC system. Components shown: CPU, cache, CPU bus, North Bridge, memory controller, on-board graphics, memory bus, DDRII channel 1 and channel 2, PCI Express ×16, external graphics card, South Bridge, PCI, PCI Express ×1, IDE controller, SATA controller, USB controller, IO controller, LAN adapter and LAN, sound card and speakers, DVD drive, hard disk, floppy drive, parallel port, serial port, keyboard, mouse.]

Architecture & Microarchitecture
Architecture: the processor features seen by the “user”
  Instruction set, addressing modes, data width, …
Micro-architecture: the way a processor is implemented
  Cache size and structure, number of execution units, …
  Timing is considered uArch (though it is user visible)
Processors with different uArch can support the same architecture

Compatibility
Backward compatibility
  New hardware can run existing software
  • Core2 Duo can run SW written for Pentium 4, Pentium M, Pentium III, Pentium II, Pentium, 486, 386, 286
Forward compatibility
  New software can run on existing hardware
  Example: new software written with SSE2™ runs on an older processor which does not support SSE2™
  Commonly supports one or two generations behind
Architecture-independent SW
  JIT (just-in-time) compiler: Java and .NET
  Binary translation

Performance

Technology Trends and Performance
Computing capacity: 4× per 3 years
  If we could keep all the transistors busy all the time
  Actual: 3.3× per 3 years
Moore’s Law: performance doubles every ~18 months
  Trend is slowing: process scaling declines, power is up
[Figure: two log-scale charts comparing Logic and DRAM.
  Speed: Logic 2× in 3 years, DRAM 1.1× in 3 years. CPU speed and memory speed grow apart.
  Capacity: Logic 2× in 3 years, DRAM 4× in 3 years.]
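
The growth rates quoted on this slide compound quickly. The following is a small Python sketch of that arithmetic, not part of the original slides: the helper name `growth_over` and the 12-year horizon are illustrative assumptions, and the 2× and 1.1× per 3 years figures are taken as the Logic and DRAM speed rates from the chart labels.

```python
def growth_over(years: float, factor: float, period_years: float) -> float:
    """Total growth after `years` when a quantity multiplies by `factor`
    every `period_years` (simple compound growth)."""
    return factor ** (years / period_years)


# Doubling every 18 months compounds to 4x over 3 years,
# which matches the "4x per 3 years" computing-capacity figure.
moore_3y = growth_over(3, 2.0, 1.5)

# Assumed reading of the Speed chart: logic 2x, DRAM 1.1x per 3 years.
years = 12  # illustrative horizon, not from the slides
logic_speed = growth_over(years, 2.0, 3)
dram_speed = growth_over(years, 1.1, 3)
gap = logic_speed / dram_speed

print(f"18-month doubling over 3 years: {moore_3y:.1f}x")
print(f"after {years} years: logic {logic_speed:.1f}x, "
      f"DRAM {dram_speed:.2f}x, gap {gap:.1f}x")
```

Under these assumed rates the speed gap grows by roughly an order of magnitude in 12 years, which is one way to read "CPU speed and memory speed grow apart".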

Moore’s Law
Graph taken from: http://www.intel.com/technology/mooreslaw/index.htm

CPI – Cycles Per Instruction
CPUs work according to a clock signal
  Clock cycle is measured in nsec ($10^{-9}$ of a second)
  Clock frequency (= 1/clock cycle) is measured in GHz ($10^{9}$ cyc/sec)
Instruction Count (IC)
  Total number of instructions executed in the program
CPI – Cycles Per Instruction
  Average #cycles per instruction (in a given program)
  $$\mathrm{CPI} = \frac{\#\text{cycles required to execute the program}}{\mathrm{IC}}$$
IPC (= 1/CPI): instructions per cycle

CPU Time
CPU Time: time required to execute a program
  CPU Time = IC × CPI × clock cycle
Our goal: minimize CPU Time
  Minimize clock cycle: more GHz (process, circuit, uArch)
  Minimize CPI: uArch (e.g., more execution units)
  Minimize IC: architecture (e.g., SSE™)
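
As a rough illustration of the CPU Time relation (IC × CPI × clock cycle), here is a minimal Python sketch. The function names and the program numbers (2 billion instructions, 3 billion cycles, a 3 GHz clock) are hypothetical values chosen for easy arithmetic, not measurements from the course.

```python
def cpi(total_cycles: int, instruction_count: int) -> float:
    """Average cycles per instruction for one program run."""
    return total_cycles / instruction_count


def cpu_time_seconds(instruction_count: int, cpi_value: float,
                     clock_hz: float) -> float:
    """CPU Time = IC * CPI * clock cycle, with clock cycle = 1 / frequency."""
    clock_cycle = 1.0 / clock_hz
    return instruction_count * cpi_value * clock_cycle


# Hypothetical program: 2e9 instructions taking 3e9 cycles on a 3 GHz CPU.
ic = 2_000_000_000
cycles = 3_000_000_000
program_cpi = cpi(cycles, ic)   # 1.5 cycles per instruction
ipc = 1.0 / program_cpi         # instructions per cycle
t = cpu_time_seconds(ic, program_cpi, 3e9)

print(f"CPI = {program_cpi}, IPC = {ipc:.3f}, CPU time = {t:.2f} s")
```

Each of the three factors can be changed independently in this sketch, which mirrors the three ways of minimizing CPU Time listed above.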

Amdahl’s Law
Suppose enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected, then:

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$

Amdahl’s Law: Example
• Floating point instructions improved to run at 2×, but only 10% of executed instructions are FP

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1 / 2) = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$

Corollary: Make The Common Case Fast
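
The floating-point example can be checked with a few lines of Python. This is only a sketch of Amdahl's Law as stated on these slides; the function name `amdahl_speedup` and the "nearly infinite" speedup value used for the limit are illustrative choices.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the execution time
    is accelerated by `speedup_enhanced` and the rest is unchanged."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New execution time relative to the old one.
    relative_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / relative_time


# Slide example: 10% of the work is FP, and FP becomes 2x faster.
fp_example = amdahl_speedup(0.10, 2.0)

# Even an (almost) infinitely fast FP unit is capped by the other 90%.
limit = amdahl_speedup(0.10, 1e12)

print(f"FP 2x faster: {fp_example:.3f}x overall")
print(f"FP nearly free: {limit:.3f}x overall")
```

The second call shows why the corollary holds: speeding up a rare case cannot help much, however large the local speedup.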

Calculating the CPI of a Program
$IC_i$: #times an instruction of type $i$ is executed in the program
IC: #instructions executed in the program:
  $$IC = \sum_{i=1}^{n} IC_i$$
$F_i$: relative frequency of instructions of type $i$: $F_i = IC_i / IC$
$CPI_i$: #cycles to execute an instruction of type $i$, e.g.: $CPI_{add} = 1$, $CPI_{mul} = 3$
#cycles required to execute the program:
  $$\#cyc = \sum_{i=1}^{n} CPI_i \times IC_i = CPI \times IC$$
CPI:
  $$CPI = \frac{\#cyc}{IC} = \frac{\sum_{i=1}^{n} CPI_i \times IC_i}{IC} = \sum_{i=1}^{n} CPI_i \times \frac{IC_i}{IC} = \sum_{i=1}^{n} CPI_i \times F_i$$
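
The weighted-average form of CPI can be sketched in Python as follows. The instruction mix is hypothetical: the add and mul cycle counts follow the slide's example (1 and 3), while the load cost of 2 cycles and all instruction counts are assumptions made up for illustration.

```python
def program_cpi(mix: dict) -> float:
    """CPI = (sum of CPI_i * IC_i) / IC, where mix maps an instruction
    type to a pair (IC_i, CPI_i)."""
    total_ic = sum(ic_i for ic_i, _ in mix.values())
    total_cycles = sum(ic_i * cpi_i for ic_i, cpi_i in mix.values())
    return total_cycles / total_ic


def program_cpi_from_frequencies(mix: dict) -> float:
    """Same quantity computed as the sum of CPI_i * F_i, with F_i = IC_i / IC."""
    total_ic = sum(ic_i for ic_i, _ in mix.values())
    return sum(cpi_i * (ic_i / total_ic) for ic_i, cpi_i in mix.values())


# Hypothetical mix: type -> (IC_i, CPI_i)
mix = {
    "add": (600, 1),   # CPI_add = 1, as in the slide's example
    "mul": (100, 3),   # CPI_mul = 3, as in the slide's example
    "load": (300, 2),  # assumed cost, for illustration only
}

print(program_cpi(mix))                   # total cycles / total instructions
print(program_cpi_from_frequencies(mix))  # frequency-weighted average
```

Both functions compute the same value, one from raw counts and one from relative frequencies, matching the two ends of the derivation.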

Evaluating Performance
Use a performance simulator to evaluate the performance of a new feature / algorithm
  Models the uArch in great detail
  Run 100’s of representative applications
Produce the performance S-curve
  Sort the applications according to the IPC increase
  Baseline (0) is the processor without the new feature
[Figure: two S-curves of per-application IPC change relative to the baseline. Bad S-curve: negative outliers and positive outliers. Good S-curve: small negative outliers and positive outliers.]
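
Producing an S-curve amounts to computing the per-application IPC change against the baseline and sorting. A minimal Python sketch, assuming the simulator results are already available as dictionaries: the application names and IPC values below are made up, and `s_curve` is a hypothetical helper, not a simulator API.

```python
def s_curve(baseline_ipc: dict, feature_ipc: dict) -> list:
    """Return (application, percent IPC change) pairs sorted from the
    worst slowdown to the best speedup. The baseline is the 0% line."""
    changes = []
    for app, base in baseline_ipc.items():
        delta_pct = 100.0 * (feature_ipc[app] - base) / base
        changes.append((app, delta_pct))
    return sorted(changes, key=lambda pair: pair[1])


# Hypothetical simulator output: IPC without and with the new feature.
baseline = {"app_a": 1.00, "app_b": 2.00, "app_c": 0.50, "app_d": 1.25}
with_feature = {"app_a": 1.02, "app_b": 1.98, "app_c": 0.53, "app_d": 1.25}

curve = s_curve(baseline, with_feature)
for app, pct in curve:
    print(f"{app}: {pct:+.1f}%")
```

Plotting the sorted percentages in order gives the S shape; the leftmost entries are the negative outliers and the rightmost the positive ones.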

Comparing Performance
Peak Performance
  MIPS, MFLOPS
  Often not useful: unachievable / unsustainable in practice
Benchmarks
  Real applications, or representative parts of real apps
  Targeted at the specific system usages
SPEC INT – integer applications
  Data compression, C compiler, Perl interpreter, database system, chess-playing, text-processing, …
SPEC FP – floating point applications
  Mostly important scientific applications
TPC Benchmarks
  Measure transaction-processing throughput

Instruction Set Design
The ISA is what the user / compiler see
The HW implements the ISA
[Figure: the instruction set drawn as the interface between software and hardware.]

ISA Considerations
Code size
  Long instructions take more time to fetch
  Longer instructions require a larger memory
  • Important in small devices, e.g., cell phones
Number of instructions (IC)
  Reducing IC reduces execution time
  • At a given CPI and frequency
Code “simplicity”
  Simple HW implementation
  • Higher frequency and lower power
  Code optimization can better be applied to “simple code”

Architectural Consideration Example
Immediate data size
  1% of data values > 16 bits
  12 – 16 bits are needed
[Figure: histogram of immediate data bits (0 to 15) against share of immediates (0% to 30%), with one series for Int. Avg. and one for FP Avg.]

CISC Processors
CISC – Complex Instruction Set Computer
  The idea: a high-level machine language
  Example: x86
Characteristics
  Many instruction types, with many addressing modes
  Some of the instructions are complex
  • Execute complex tasks
  • Require many cycles
  ALU operations directly on memory
  • Only a few registers, in many cases not orthogonal
  Variable-length instructions
  • Common instructions get short codes, which saves code length

Top 10 x86 Instructions
Rank   Instruction               % of total executed
1      load                      22%
2      conditional branch        20%
3      compare                   16%
4      store                     12%
5      add                       8%
6      and                       6%
7      sub                       5%
8      move register-register    4%
9      call                      1%
10     return                    1%
Total                            96%
Simple instructions dominate instruction frequency

CISC Drawbacks
Complex instructions and complex addressing modes
  complicate the processor
  slow down the simple, common instructions
  contradict Make The Common Case Fast
Compilers don’t use complex instructions / indexing methods
Variable-length instructions are a real pain in the neck
  Difficult to decode a few instructions in parallel
  • As long as an instruction is not decoded, its length is unknown
    It is unknown where the instruction ends
    It is unknown where the next instruction starts
  An instruction may span more than a single cache line
  An instruction may span more than a single page
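
The decoding problem can be illustrated with a toy encoding. The sketch below is Python and uses a made-up format in which the first byte of every instruction holds its total length in bytes; this is not the x86 encoding, and the function names, the 4-byte fixed width and the 64-byte cache line are all assumptions for illustration.

```python
def variable_length_starts(code: bytes) -> list:
    """Find instruction start offsets in the toy variable-length format.
    Each start depends on decoding the previous instruction's length,
    so the scan is inherently sequential."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = code[offset]  # must look inside this instruction first
        if length == 0:
            raise ValueError("invalid zero-length instruction")
        offset += length
    return starts


def fixed_length_starts(code: bytes, width: int = 4) -> list:
    """With fixed-length instructions every start offset is known up
    front, so several decoders can work in parallel."""
    return list(range(0, len(code), width))


def crosses_line(start: int, length: int, line_size: int = 64) -> bool:
    """True if an instruction's bytes span two cache lines."""
    return start // line_size != (start + length - 1) // line_size


# Toy stream: instructions of length 2, 5, 1 and 3 bytes.
stream = bytes([2, 0xAA, 5, 1, 2, 3, 4, 1, 3, 9, 9])
print(variable_length_starts(stream))
print(fixed_length_starts(bytes(12)))
print(crosses_line(62, 5))
```

In the variable-length case the loop cannot compute the third start before it has decoded the first two lengths, which is the serialization the slide complains about.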

RISC Processors
RISC – Reduced Instruction Set Computer
  The idea: simple instructions enable fast hardware
Characteristics
  A small instruction set, with only a few instruction formats
  Simple instructions
  • Execute simple tasks
  • Most of them require a single cycle (with pipeline)
  A few indexing methods
  ALU operations on registers only
  • Memory is accessed using Load and Store instructions only
  • Many orthogonal registers
  • Three-address machine: Add dst, src1, src2
  Fixed-length instructions
Examples: MIPS™, SPARC™, Alpha™, Power™
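
To make the load/store and three-address ideas concrete, here is a toy register machine in Python. It is a hypothetical sketch, not a model of MIPS or any real ISA: the tuple-based instruction format, the opcode names and the 8-register file are assumptions for illustration.

```python
def run(program: list, memory: dict, num_regs: int = 8) -> list:
    """Execute a toy load/store program and return the register file.
    Only load and store touch memory; add works on registers only."""
    regs = [0] * num_regs
    for op, *args in program:
        if op == "load":       # load dst, addr
            dst, addr = args
            regs[dst] = memory[addr]
        elif op == "store":    # store src, addr
            src, addr = args
            memory[addr] = regs[src]
        elif op == "add":      # add dst, src1, src2 (three-address form)
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# Compute memory[2] = memory[0] + memory[1].
memory = {0: 5, 1: 7, 2: 0}
program = [
    ("load", 1, 0),    # r1 <- memory[0]
    ("load", 2, 1),    # r2 <- memory[1]
    ("add", 3, 1, 2),  # r3 <- r1 + r2
    ("store", 3, 2),   # memory[2] <- r3
]
regs = run(program, memory)
print(memory[2])
```

A CISC-style machine could express the same work as a single add with a memory operand; here it takes four simple instructions, each of which is easy to decode and pipeline.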

RISC Processors (Cont.)
Simple architecture ⇒ simple micro-architecture
  Simple, small and fast control logic
  Simpler to design and validate
  Room for large on-die caches
  Shorter time-to-market
Using a smart compiler
  Better pipeline usage
  Better register allocation
Existing RISC processors are not “pure” RISC
  e.g., they support division, which takes many cycles

Compilers and ISA
Ease of compilation
  Orthogonality:
  • no special registers
  • few special cases
  • all operand modes available with any data type or instruction type
  Regularity:
  • no overloading for the meanings of instruction fields
  Streamlined:
  • resource needs easily determined
Register assignment is critical too
  Easier if lots of registers

CISC Is Dominant
The x86 architecture, which is a CISC architecture, dominates the processor market
  A vast amount of existing software
  Intel, AMD, Microsoft and others benefit from this
  • Intel and AMD put a lot of money into making high-performance x86 processors, despite the architectural disadvantage
  • Current x86 processors give the best cost/performance
CISC processors use arch ideas from the RISC world
  Starting at Pentium II and K6, x86 processors translate CISC instructions into RISC-like operations internally
  • the inside core looks much like that of a RISC processor

Software Specific Extensions
Extend arch to accelerate exec of specific apps
Example: SSE™ – Streaming SIMD Extensions
  128-bit packed (vector) / scalar single precision FP (4×32)
  Introduced on Pentium® III in ’99
  8 new 128-bit registers (XMM0 – XMM7)
  Accelerates graphics, video, scientific calculations, …
[Figure: packed vs. scalar add on 128-bit operands (x0, x1, x2, x3) and (y0, y1, y2, y3).
  Packed: result is (x0+y0, x1+y1, x2+y2, x3+y3).
  Scalar: result is (x0+y0, y1, y2, y3).]
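
The difference between the packed and scalar forms can be modeled with plain Python lists standing in for the four 32-bit lanes of a 128-bit register. This is a sketch of the semantics as drawn in the figure, not SIMD code: the function names are hypothetical, and the scalar result keeping y1 to y3 is my reading of the figure.

```python
LANES = 4  # four 32-bit single-precision values in one 128-bit register


def add_packed(x: list, y: list) -> list:
    """Packed add: all four lanes are added in one operation."""
    if len(x) != LANES or len(y) != LANES:
        raise ValueError("operands must have exactly 4 lanes")
    return [a + b for a, b in zip(x, y)]


def add_scalar(x: list, y: list) -> list:
    """Scalar add: only lane 0 is added; lanes 1 to 3 are passed
    through from y, as read from the figure."""
    if len(x) != LANES or len(y) != LANES:
        raise ValueError("operands must have exactly 4 lanes")
    return [x[0] + y[0]] + list(y[1:])


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(add_packed(x, y))
print(add_scalar(x, y))
```

One packed instruction replaces four scalar ones here, which is how such an extension lowers the instruction count for vectorizable code.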