Lecture 1: Course Introduction, Technology Trends, Performance
Professor Alvin R. Lebeck
Computer Science 220
Fall 2001
CPS 220 2© Alvin R. Lebeck 2001
Administrative
• Office Hours
Office: D304 LSRC
Hours: Mon 10:00-11:00, Thurs 2:00-3:00, or by appointment (email)
email: [email protected]
Phone: 660-6551
• Teaching Assistant: Fareed Zaffar
Office: D125 LSRC
Hours: Tuesday 10:00-11:00, Wednesday 1:00-2:00
email: [email protected]
Phone: 660-6576
Administrative (Grading)
• 30% Homeworks
– 6 homeworks
– 5 points per day late, for first 10 days
– Always do the homework (better late than never)
• 30% Examinations (Midterm + Final)
• 30% Research Project (work in pairs)
• 10% Class Participation
• This course requires hard work.
Administrative (Continued)
• Midterm Exam: in class (75 min), closed book
• Final Exam: 3 hours, closed book
• This is a “Quals” course.
– Quals pass based on midterm and final exams only
Administrative (Continued)
• Course Web Page – http://www.cs.duke.edu/courses/fall01/cps220
– Lectures posted there after class (pdf)
– Homework posted there
• Course News Group – duke.cs.cps220
– Use it to 1) read announcements/comments on class or homework, 2) ask questions (help), 3) communicate with each other
• Need a Duke CS account
– Duke ID, ACPUB account name (see HW #0)
SPIDER: Systems Seminar
• Systems & Architecture Seminar
– Wednesdays 3:45-5:00 in D344
– duke.cs.os-research (spider newsgroup)
• Presentations on current work
– Practice talks for conferences
– Discussion on recent papers
– Your own research
• Why should you go?
– If you want to work in Systems/Architecture…
– Good time to practice public speaking in front of a friendly crowd
– Learn about current topics
Assignment
• Homework #0 (Background, due Thursday)
• Read Chapters 1 & 2
CPS 220 Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century
[Figure: Computer Architecture (Instruction Set Design, Organization, Hardware) at the center, surrounded by Technology, Programming Languages, Operating Systems, History, Applications, Interface Design (ISA), Measurement & Evaluation, Parallelism, and Power]
Related Courses
Prerequisites
• CPS 104: Basic Machine Organization
• CPS 110: Basic Operating System Functions
• This course: focus on why, analysis, evaluation
– Cost/performance
– Power budget
Follow-on Courses
• CPS 221: Advanced Computer Architecture II
– Parallel computer architecture
Computer Architecture Is …
the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.
Amdahl, Blaauw, and Brooks, 1964
Topic Coverage
Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2nd Ed., 1995.
• Fundamentals of Computer Architecture (Chapter 1)
• Instruction Set Architecture (Chapter 2, Appendices C & D)
• Pipelining (Chapter 3)
• Advanced Pipelining and ILP (Chapter 4)
• Memory Hierarchy (Chapter 5)
• Input/Output and Storage (Chapter 6)
• Networks and Interconnection Technology (Chapter 7)
• Multiprocessors (Chapter 8)
• Vectors (Appendix)
• New Architectures/trends (papers)
• Power (papers)
Computer Architecture Topics
• Instruction Set Architecture: Addressing, Protection, Exception Handling
• Pipelining and Instruction Level Parallelism: Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation
• Memory Hierarchy: L1 Cache, L2 Cache, DRAM; Coherence, Bandwidth, Latency; Emerging Technologies, Interleaving, Bus protocols
• Input/Output and Storage: Disks, WORM, Tape, RAID
• VLSI
Computer Architecture Topics (CPS 221)
• Multiprocessors: Shared Memory, Message Passing, Data Parallel
• Networks and Interconnections: Topologies, Routing, Bandwidth, Latency, Reliability; Network Interfaces; Processor-Memory-Switch
[Figure: processor (P) and memory (M) nodes connected through switches (S) to an interconnection network]
Computer Engineering Methodology
[Figure: design cycle: Technology Trends → Evaluate Existing Systems for Bottlenecks (Benchmarks) → Simulate New Designs and Organizations (Workloads) → Implement Next Generation System (Implementation Complexity)]
Context for Designing New Architectures
• Application Area
– Special Purpose (e.g., DSP) / General Purpose
– Scientific (FP intensive) / Commercial (Mainframe)
– Portable (Power matters)
• Level of Software Compatibility
– Object Code/Binary Compatible (cost HW vs. SW; IBM S/360)
– Assembly Language (dream to be different from binary)
– Programming Language; Why not?
Context for Designing New Architectures (Continued)
• OS Requirements for General Purpose Apps
– Size of Address Space
– Memory Management/Protection
– Context Switch
– Interrupts and Traps
– Communication
• Standards: Innovation vs. Competition
– IEEE 754 Floating Point
– I/O Bus
– Networks
– Operating Systems / Programming Languages ...
Technology Trends: Microprocessor Capacity
[Figure: transistors per chip (log scale, 100 to 1,000,000,000) for Intel and Digital microprocessors, with a “Graduation Window” marked]
• CMOS improvements:
– Die size: 2X every 3 yrs
– Line width: halves every 7 yrs
• Transistor counts:
– Pentium Pro: 5.5 million
– Sparc Ultra: 5.2 million
– PowerPC 620: 6.9 million
– Alpha 21164: 9.3 million
– Alpha 21264: 15 million
– Pentium III: 28 million
– Pentium 4: 42 million
– Alpha 21364: 100 million
– Alpha 21464: 250 million
DRAM Capacity (single chip)
[Figure: single-chip DRAM size (log scale) vs. year, 1980 to 2002]

Year   Size     Cycle time
1980   64 Kb    250 ns
1983   256 Kb   220 ns
1986   1 Mb     190 ns
1989   4 Mb     165 ns
1992   16 Mb    145 ns
1996   64 Mb    104 ns
1998   256 Mb
2002   1 Gb
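As a rough check of the "4x in 3 years" capacity trend, the sketch below fits a constant growth rate between two rows of the table. It assumes 1 Mb = 1024 Kb and smooth exponential growth between the endpoints; `dram_kb` and `growth_per_period` are illustrative names, not from the lecture.

```python
# Single-chip DRAM sizes from the table, in Kb (assumes 1 Mb = 1024 Kb).
dram_kb = {1980: 64, 1983: 256, 1986: 1024, 1989: 4096, 1992: 16384, 1996: 65536}


def growth_per_period(start: float, end: float, years: float, period: float) -> float:
    """Growth factor per `period` years, assuming constant exponential growth."""
    return (end / start) ** (period / years)


# 1980 -> 1992: 64 Kb -> 16 Mb, i.e. 256x over 12 years.
factor = growth_per_period(dram_kb[1980], dram_kb[1992], 1992 - 1980, 3)
print(round(factor, 2))  # 4.0
```

A 256x increase over 12 years is exactly four 3-year periods of 4x each.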
Technology Trends (Summary)
        Capacity         Speed
Logic   2x in 3 years    2x in 3 years
DRAM    4x in 3 years    1.4x in 10 years
Disk    2x in 3 years    1.4x in 10 years
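The summary rates compound very differently. A minimal sketch, assuming each entry is a constant exponential trend, converts them to per-year factors and estimates how far logic speed pulls ahead of DRAM speed over a decade; `annual_rate` is a hypothetical helper name.

```python
def annual_rate(factor: float, years: float) -> float:
    """Convert 'factor x in `years` years' into a per-year growth factor."""
    return factor ** (1 / years)


logic_speed = annual_rate(2.0, 3)    # about 1.26x per year
dram_speed = annual_rate(1.4, 10)    # about 1.03x per year
dram_capacity = annual_rate(4.0, 3)  # about 1.59x per year

# Ratio of logic speed to DRAM speed after ten years of these trends.
gap_after_10_years = (logic_speed / dram_speed) ** 10
print(f"{logic_speed:.2f} {dram_speed:.3f} {dram_capacity:.2f}")  # 1.26 1.034 1.59
print(f"gap after 10 years: {gap_after_10_years:.1f}x")  # gap after 10 years: 7.2x
```

Under these assumptions logic gets about 10x faster in ten years while DRAM gets 1.4x faster, so the speed gap grows roughly sevenfold per decade.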
Processor Performance
[Figure: performance (0 to 300) by year, 1987 to 1995, for Sun-4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750, DEC AXP 3000, IBM Power 2/590, DEC 21064a, and Sun UltraSparc, with trend lines of 1.54X/yr and 1.35X/yr]
Alpha SPECint and SPECfp
[Figure: Alpha performance (SPECmark, 0 to 700) by year, 1995 to 2004, for Integer and Floating Point, with a 1.54x/yr trend]
Chip Area Reachable in One Clock Cycle
[Figure: fraction of chip reached in one clock cycle (0 to 1.2) vs. process technology in nanometers (250, 180, 130, 100, 70, 50, 35), for curves f16, f8, and fSIA]
Power Density
[Figure: power density in W/cm^2 (log scale, 1 to 1000) vs. process technology in microns (1.5 down to 0.1), with Processor, Hot Plate, and Laser diode marked]
Processor Perspective
• Putting performance growth in perspective:

          Pentium-III          Cray YMP
          Personal Comp.       Supercomputer
Year      1998                 1988
MIPS      > 400 MIPS           < 50 MIPS
Linpack   140 MFLOPS           160 MFLOPS
Cost      $3,000               $1M ($1.6M in 1994$)
Clock     400 MHz              167 MHz
Cache     512 KB               0.25 KB
Memory    128 MB               256 MB

• 1988 supercomputer in 1998 personal computer!
Measurement and Evaluation
• Architecture is an iterative process:
– Searching the space of possible designs
– At all levels of computer systems
[Figure: a Design/Analysis loop in which Creativity generates ideas and Cost/Performance Analysis sorts them into Good Ideas, Mediocre Ideas, and Bad Ideas]
Measurement Tools
• How do I evaluate an idea?
• Performance, Cost, Die Area, Power Estimation
• Benchmarks, Traces, Mixes
• Simulation (many levels)
– ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental Laws
• Question: What is “better”: Boeing 747 or Concorde?
The Bottom Line: Performance (and Cost)
• Time to run the task (ExTime)
– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
– Throughput, bandwidth

Plane              Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747         610 mph    6.5 hours     470          286,700
BAC/Sud Concorde   1350 mph   3 hours       132          178,200
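The throughput column is passengers times speed, in passenger-miles per hour (pmph). A short check of the table's arithmetic, using only the figures above:

```python
# Figures from the table: (speed in mph, passengers).
planes = {
    "Boeing 747": (610, 470),
    "BAC/Sud Concorde": (1350, 132),
}

# Throughput = passengers x speed, in passenger-miles per hour (pmph).
throughput = {name: mph * pax for name, (mph, pax) in planes.items()}
for name, pmph in throughput.items():
    print(f"{name}: {pmph:,} pmph")
# Boeing 747: 286,700 pmph
# BAC/Sud Concorde: 178,200 pmph
```

The Concorde wins on latency (time to Paris) while the 747 wins on throughput, which is why "better" depends on the metric.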
The Bottom Line: Performance (and Cost)
"X is n times faster than Y" means
$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = n$$
• Speed of Concorde vs. Boeing 747
• Throughput of Boeing 747 vs. Concorde
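Applying the definition to the two comparisons above, with figures from the 747/Concorde table (a sketch; `times_faster` is an illustrative name):

```python
def times_faster(extime_y: float, extime_x: float) -> float:
    """n such that 'X is n times faster than Y': ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


# Speed: Concorde (X) vs. 747 (Y), using DC-to-Paris flight time in hours.
speed_ratio = times_faster(6.5, 3.0)

# Throughput: 747 (X) vs. Concorde (Y). Performance is throughput here,
# so n = Performance(X) / Performance(Y).
throughput_ratio = 286_700 / 178_200

print(f"Concorde is {speed_ratio:.2f} times faster (DC to Paris)")  # 2.17
print(f"747 has {throughput_ratio:.2f} times the throughput")      # 1.61
```

Using cruising speed instead of flight time (1350 / 610) gives about 2.2, close to the flight-time ratio.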
Performance Terminology
“X is n% faster than Y” means:
$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = 1 + \frac{n}{100}$$
$$n = \frac{100\,(\text{Performance}(X) - \text{Performance}(Y))}{\text{Performance}(Y)}$$
Example: Y takes 15 seconds to complete a task, X takes 10 seconds. What % faster is X?
Example
$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{15}{10} = \frac{1.5}{1.0} = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
$$n = \frac{100\,(1.5 - 1.0)}{1.0} = 50\%$$
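The same calculation as a function, derived from the definition above: since Performance(X)/Performance(Y) = ExTime(Y)/ExTime(X), n = 100 (ExTime(Y)/ExTime(X) − 1). `percent_faster` is an illustrative name.

```python
def percent_faster(extime_y: float, extime_x: float) -> float:
    """n in 'X is n% faster than Y': n = 100 * (ExTime(Y) / ExTime(X) - 1)."""
    return 100 * (extime_y / extime_x - 1)


# Y takes 15 seconds, X takes 10 seconds.
print(percent_faster(15, 10))  # 50.0
```

Under these definitions, "50% faster" and "1.5 times faster" describe the same pair of machines.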
Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected, then:
$$\text{ExTime}(E) = \text{ExTime w/o } E \times \left[(1 - F) + \frac{F}{S}\right]$$
$$\text{Speedup}(E) = \frac{1}{(1 - F) + F/S}$$
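Amdahl’s Law
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$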
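Amdahl's Law translates directly into a small function. The sketch below (`amdahl_speedup` is an illustrative name) also shows the law's main consequence: however large Speedup_enhanced becomes, the overall speedup is capped at 1 / (1 − Fraction_enhanced).

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of execution time is sped up by a factor."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Half the execution time is enhanced; the unenhanced half bounds the result,
# so the overall speedup approaches (but never exceeds) 1 / (1 - 0.5) = 2.
for s in (2, 10, 100, 1_000_000):
    print(s, round(amdahl_speedup(0.5, s), 3))
```

Going from a 100x to a million-fold enhancement barely moves the result, which is why the common case matters more than the spectacular one.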
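Amdahl’s Law
• Floating point instructions improved to run 2X; but only 10% of actual instruction execution time is FP
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left(0.9 + \frac{0.1}{2}\right) = 0.95 \times \text{ExTime}_{\text{old}}$$
$$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$$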
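A quick numeric check of the FP example, plugging Fraction_enhanced = 0.1 and Speedup_enhanced = 2 into the formula (function name illustrative):

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# FP instructions run 2x faster but account for only 10% of execution time.
new_time_fraction = (1 - 0.1) + 0.1 / 2   # ExTime_new / ExTime_old
overall = amdahl_speedup(0.1, 2)
print(round(new_time_fraction, 2), round(overall, 3))  # 0.95 1.053
```

Even infinitely fast FP hardware would give at most 1 / 0.9, about 1.11, for this workload.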
Corollary: Make The Common Case Fast
• All instructions require an instruction fetch, only a fraction require a data fetch/store.
– Optimize instruction access over data access
• Programs exhibit locality
– Spatial locality
– Temporal locality
• Access to small memories is faster
– Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories.
[Figure: storage hierarchy: Registers, Cache, Memory, Disk / Tape]
Occam's Toothbrush
• The simple case is usually the most frequent and the easiest to optimize!
• Do simple, fast things in hardware and be sure the rest can be handled correctly in software
Metrics of Performance
[Figure: system levels (Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins) with metrics from top to bottom: Answers per month, Operations per second; (millions) of Instructions per second: MIPS; (millions) of (FP) operations per second: MFLOP/s; Megabytes per second; Cycles per second (clock rate)]
Aspects of CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
[Table: which terms (Instr. Cnt, CPI, Clock Rate) are affected by the Program, Compiler, Instr. Set, Organization, and Technology]
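The CPU time equation multiplies three terms. A minimal sketch with hypothetical numbers chosen only for illustration (not from the lecture): a program of 10 million instructions at CPI 1.5 on a 500 MHz clock.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time (s) = Instructions x (Cycles / Instruction) x (Seconds / Cycle)."""
    cycle_time = 1.0 / clock_rate_hz
    return instruction_count * cpi * cycle_time


# Hypothetical program: 10 million instructions, CPI 1.5, 500 MHz clock.
t = cpu_time(10e6, 1.5, 500e6)
print(f"{t * 1e3:.1f} ms")  # 30.0 ms

# Halving CPI halves CPU time, provided instruction count and clock rate hold.
print(f"{cpu_time(10e6, 0.75, 500e6) * 1e3:.1f} ms")  # 15.0 ms
```

In practice the terms interact (an ISA change can lower instruction count but raise CPI), which is why all three must be evaluated together.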
Marketing Metrics
$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Time} \times 10^6} = \frac{\text{Clock Rate}}{\text{CPI} \times 10^6}$$
• Machines with different instruction sets?
• Programs with different instruction mixes?
– Dynamic frequency of instructions
• Uncorrelated with performance
$$\text{MFLOP/s} = \frac{\text{FP Operations}}{\text{Time} \times 10^6}$$
• Machine dependent
• Often not where time is spent
Normalized:
add, sub, compare, mult   1
divide, sqrt              4
exp, sin, . . .           8
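To see why MIPS can be uncorrelated with performance, here is a sketch with two hypothetical machines running the same task (all numbers invented for illustration): machine B executes more, simpler instructions, so it earns a higher MIPS rating while taking longer. The normalized FP weights from the slide are then used for a weighted MFLOP/s; operations hidden behind the slide's ". . ." are simply left out (assumption).

```python
def mips(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = Instruction Count / (Time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


def mips_from_clock(clock_rate_hz: float, cpi: float) -> float:
    """Equivalent form: MIPS = Clock Rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical machines A and B running the same task on a 100 MHz clock.
clock = 100e6
a_count, a_cpi = 50e6, 2.0    # fewer, more powerful instructions
b_count, b_cpi = 120e6, 1.0   # more, simpler instructions
a_time = a_count * a_cpi / clock  # 1.0 s
b_time = b_count * b_cpi / clock  # 1.2 s

print(f"A: {mips(a_count, a_time):.0f} MIPS in {a_time:.1f} s")  # A: 50 MIPS in 1.0 s
print(f"B: {mips(b_count, b_time):.0f} MIPS in {b_time:.1f} s")  # B: 100 MIPS in 1.2 s

# Normalized FP weights from the slide (ops behind ". . ." omitted: assumption).
FP_WEIGHTS = {"add": 1, "sub": 1, "compare": 1, "mult": 1,
              "divide": 4, "sqrt": 4, "exp": 8, "sin": 8}


def normalized_mflops(op_counts: dict, exec_time_s: float) -> float:
    """MFLOP/s where each FP operation counts with its normalized weight."""
    weighted_ops = sum(FP_WEIGHTS[op] * n for op, n in op_counts.items())
    return weighted_ops / (exec_time_s * 1e6)


# Hypothetical mix: 30M adds + 5M divides in 1 s = 50 normalized MFLOP/s.
print(f"{normalized_mflops({'add': 30e6, 'divide': 5e6}, 1.0):.0f}")  # 50
```

B doubles A's MIPS yet finishes 20% later, so only execution time settles which machine is faster.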
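Cycles Per Instruction
“Average Cycles Per Instruction”
$$\text{CPI} = \frac{\text{CPU time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}}$$
$$\text{CPU time} = \text{Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
“Instruction Frequency”
$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}}$$
Invest Resources where time is Spent!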
Organizational Trade-offs
[Figure: system levels (Application; Programming Language; Compiler; ISA; Datapath, Control; Function Units; Transistors, Wires, Pins) related to the three performance factors Instruction Mix, CPI, and Cycle Time]
Example: Calculating CPI
Typical Mix, Base Machine (Reg / Reg)

Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        .5       (33%)
Load     20%    2        .4       (27%)
Store    10%    2        .2       (13%)
Branch   20%    2        .4       (27%)
Total CPI                1.5
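The weighted-CPI formula CPI = Σ CPI_i × F_i applied to the typical mix, together with each class's share of execution time (a sketch; `mix` and `contrib` are illustrative names):

```python
# Instruction mix for the base machine (reg/reg): op -> (frequency, cycles).
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# CPI = sum over i of CPI_i x F_i
contrib = {op: freq * cycles for op, (freq, cycles) in mix.items()}
cpi = sum(contrib.values())
print(f"CPI = {cpi:.1f}")  # CPI = 1.5

for op, c in contrib.items():
    # Fraction of execution time spent in each instruction class.
    print(f"{op}: {c:.1f} ({c / cpi:.0%})")
# ALU: 0.5 (33%)
# Load: 0.4 (27%)
# Store: 0.2 (13%)
# Branch: 0.4 (27%)
```

ALU operations are half the instructions but only a third of the time, which is where "invest resources where time is spent" points.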
Example
Base Machine (Reg / Reg)

Op       Freq   Cycles
ALU      50%    1
Load     20%    2
Store    10%    2
Branch   20%    2

Add register / memory operations to traditional RISC:
– One source operand in memory
– One source operand in register
– Cycle count of 2
The branch cycle count increases to 3.
What fraction of the loads must be eliminated for this to pay off?
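One way to work the question, as a sketch under assumptions that the slide does not spell out: each register/memory instruction replaces one load plus one ALU instruction; the clock cycle time is unchanged; and all frequencies are fractions of the original instruction count. With x the fraction of loads eliminated, the total cycles per original instruction become (0.5 − 0.2x)·1 + (0.2 − 0.2x)·2 + 0.1·2 + 0.2·3 + 0.2x·2 = 1.7 − 0.2x, compared with 1.5 for the base machine. Exact fractions avoid rounding noise.

```python
from fractions import Fraction as Fr

# Base machine (reg/reg): total cycles per original instruction.
BASE_CYCLES = Fr(5, 10) * 1 + Fr(2, 10) * 2 + Fr(1, 10) * 2 + Fr(2, 10) * 2  # 3/2


def new_cycles(x: Fr) -> Fr:
    """Total cycles per original instruction after adding reg/mem ops.

    x is the fraction of loads eliminated (0..1). Assumes each reg/mem op
    replaces one load plus one ALU op and the cycle time is unchanged.
    """
    replaced = Fr(2, 10) * x          # reg/mem ops, as a fraction of the old count
    alu = (Fr(5, 10) - replaced) * 1
    load = (Fr(2, 10) - replaced) * 2
    store = Fr(1, 10) * 2
    branch = Fr(2, 10) * 3            # branches now take 3 cycles
    regmem = replaced * 2
    return alu + load + store + branch + regmem


# new_cycles is linear in x, so solve new_cycles(x) == BASE_CYCLES directly.
slope = new_cycles(Fr(1)) - new_cycles(Fr(0))
break_even = (BASE_CYCLES - new_cycles(Fr(0))) / slope
print(BASE_CYCLES, new_cycles(Fr(0)), break_even)  # 3/2 17/10 1
```

Under these assumptions the break-even point is x = 1: every load must be eliminated just to match the base machine, and even then there is no gain.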
Next Time
• Benchmarks
• Performance Metrics
• Cost
• Instruction Set Architectures
TODO
• Read Chapters 1 & 2
• Do Homework #0