
EE382A – Spring 2009 Christos Kozyrakis Lecture 1 - 1

Department of Electrical Engineering

Stanford University

http://eeclass.stanford.edu/ee382a

EE382A

Advanced Processor Architecture

Christos Kozyrakis & John Shen


A Few Words About Christos

• Associate professor of EE & CS

– Ph.D. from U.C. Berkeley

– B.Sc. from University of Crete

• Current research

– Parallel systems (scheduling, TM)

– Energy efficient data-centers

– Security systems

– More info at http://csl.stanford.edu/~christos

• Systems I have worked on

– Networking chips: ATLAS & Telegraphos switches

– Processor chips: VIRAM media-processor

• 125 million transistors, 9.6 billion ops/sec

– FPGA prototypes: Raksha & Atlas

– Server prototypes: CoolSort

[Figure: photos of the VIRAM media-processor, IRAM test chip, Telegraphos DSM switch, ATLAS ATM switch, Raksha security system, and ATLAS TM system]


A Few Words About John

• Head of Nokia Research Center in Palo Alto

– Ph.D. from USC

– B.Sc. from University of Michigan

• Prior to Nokia

– Director of the Microarchitecture Research Lab (MRL) at Intel

• Superscalar architecture, speculative multithreading and memory prefetching, 3D die-stacking technology, and heterogeneous multi-sequencer architectures

– Professor of Computer Engineering at CMU

• Author of the main textbook for EE382a


EE382a Team

• Instructors: Christos Kozyrakis & John Shen

• Teaching assistant: David Signiorelli

• Guest lectures: Ben Lee + one more

• Administrative support: Teresa Lynn

• Contact info & office hours: up-to-date info on class webpage

– http://eeclass.stanford.edu/ee382a

– Check frequently


You…

• Class participation is EXTREMELY important in EE382a

• Your goals

– Ask questions

– Offer answers

– Suggest discussion topics

– Make us learn your name

• Will take and post photos of everyone next week


Class Basics

• Lectures: Mo & We, 11am-12.15pm, Hewlett 101

– There will also be some discussion sessions on Fridays

• Friday 2-3pm, Gates Hall 498

• Discussion sessions will be explicitly announced

– The class is not available on SCPD this quarter

• Web page: http://eeclass.stanford.edu/ee382a

– Announcements, handouts, office hours, latest schedule, bulletin board

– Check frequently

– Sign up on the webpage for on-line access to grades

• We will let you know when registration is open…


The Bulletin Board

• The preferred way to ask class-related questions

– We promise to check & answer often, especially close to deadlines

– We encourage you to contribute to answers & have on-line discussions on class material

• The bulletin rules

– Before posting a new question

• Check if question has already been asked or even answered

– Use the search capabilities of your web browser

• Check the FAQ page for the assignment

– Choose an appropriate subject for your question

• E.g. “HW2, problem 3, definition of memory latency”

• For questions not appropriate for the public: send us an email


EE382a Topics

• Pipelining overview and analysis

• Architectures for instruction level parallelism

– Superscalar: instruction fetch, branch prediction, dynamic scheduling & register renaming, memory disambiguation

– VLIW and dynamic binary translation

• Architectures for task and data level parallelism

– Multithreading, multi-core architectures, vector processing, GPUs, tradeoffs in designing multi-core chips, memory hierarchy for multi-core

• Cross-cutting issues

– Checkpointed processors, phase-change memory, …


Textbooks and Papers

• Textbooks

– Required: “Modern Processor Design: Fundamentals of Superscalar Processors”, J.P. Shen and M. Lipasti, 1st edition, McGraw-Hill

• Do not use/buy the beta edition!

– Reference: “Computer Architecture: A Quantitative Approach”, J. Hennessy & D. Patterson, 4th edition, Morgan Kaufmann

– Reference: “Computer Organization and Design: The Hardware/Software Interface”, D. Patterson & J. Hennessy, 4th edition, Morgan Kaufmann

• Papers (check handouts link on the webpage)

– A few required papers

• These papers are included in the exam materials

• Have to submit a 1-page paper summary by the next lecture

– Several optional papers

• Further in-depth information, references for projects, …


Assignments, Exams, and Class Load

• Single exam and 1+2 homework assignments

• Large research project

– On an open question in computer architecture

– Work in groups of up to 3 students

– See topic suggestions on-line or suggest your own project

– Milestones: proposal, halfway review/status, presentation, paper…

• Grade breakdown (tentative)

– Exam 40%, Project 40%, HW + summaries + participation 20%

– All deadlines are final, no extensions, no exceptions

– Remember the honor code (more info on web page)

• Warnings

– This will be a loaded class!!

– This class will be as good as your participation…


Prerequisites and Registration

• Prerequisites: EE108B or equivalent

– Expected to know: simple pipelines, basic caching, virtual memory, main memory

• EE282 is not a required prerequisite

• Class registration:

– Limited to 30 students; all students must receive instructor’s approval

• Homework 1: prerequisite assessment

– Due in class on Monday

– Work on it on your own

– Will send you email about your registration by Wednesday


Should I Take EE382A?

• Good reasons to take EE382A

– Prepare for research in computer architecture

– Broaden your Ph.D. research perspective

– Become a digital systems architect in industry

– Honest curiosity (how do Intel/AMD/… processors work?)

– Want to take a class with a research project

• Not good reasons to take EE382A

– Prepare for quals, comps, etc…

– Need another course for your degree program

• “EE382A is supposed to be an easy A, right?”

– Learn about digital circuits and CAD tools


On Reading & Summarizing Papers

• Look for the following

– The issue or problem addressed by the paper

– The original contributions (real or claimed, you have to check)

– Critique: what are the major strengths and weaknesses of the paper?

• Look at the claims and assumptions, the methodology, the analysis of data, and the presentation style

– Future work: what are the natural extensions or improvements to this work?

• Or, can we apply a similar methodology to other problems of interest

• Do not submit the paper abstract as your summary :)

• Helpful tips

– Read the abstract, introduction, and conclusions sections first.

– Read the rest of the paper twice

• First a quick pass to get rough idea of details, then a detailed reading

– Underline/highlight the important parts of the paper

– Keep notes on the paper margins about comments or questions

• Important insights, questionable claims, relevance to other topics, ways to improve some technique etc.

– Look up references that seem to be important or missing

• In some cases, you may also want to check who cites this paper and how


EE382A Lecture 1:

Introduction to Advanced Processor Architecture


Historical Perspectives on Processors

• The Decade of the 1970’s: “Birth of Microprocessors”

– Programmable Controller

– Single-Chip Microprocessors

– Personal Computers (PC)

• The Decade of the 1980’s: “Quantitative Architecture”

– Instruction Pipelining

– Fast Cache Memories

– Compiler Considerations

– Workstations

• The Decade of the 1990’s: “Instruction-Level Parallelism”

– Superscalar, Speculative Microarchitectures

– Aggressive Compiler Optimizations

– Low-Cost Desktop Supercomputing


Performance Growth

• Doubling every 18 months (1982-2000):

– total of 3,200X

– Cars travel at 176,000 MPH; get 64,000 miles/gal.

– Air travel: L.A. to N.Y. in 5.5 seconds (MACH 3200)

– Wheat yield: 320,000 bushels per acre

• Doubling every 24 months (1971-2001):

– total of 36,000X

– Cars travel at 2,400,000 MPH; get 600,000 miles/gal.

– Air travel: L.A. to N.Y. in 0.5 seconds (MACH 36,000)

– Wheat yield: 3,600,000 bushels per acre

Unmatched by any other industry!!

[John Crawford, Intel, 1993]


Convergence of Key Enabling Technologies

• CMOS VLSI:

– Submicron feature sizes: 0.3 µm → 0.25 µm → 0.18 µm → 0.13 µm → 90 nm → 65 nm → 45 nm …

– Metal layers: 3 → 4 → 5 → 6 → 7 (copper) → 12 …

– Power supply voltage: 5 V → 3.3 V → 2.4 V → 1.8 V → 1.3 V → 1.1 V …

• CAD Tools:

– Interconnect simulation and critical path analysis

– Clock signal propagation analysis

– Process simulation and yield analysis/learning

• Architecture & Microarchitecture:

– Superpipelined and superscalar machines

– Speculative and dynamic microarchitectures

– Simulation tools and emulation systems

• Compilers:

– Extraction of instruction-level parallelism

– Aggressive and speculative code scheduling

– Object code translation and optimization


Evolution of Single-Chip Processors

                     1970’s      1980’s      1990’s        2010
Transistor Count     10K-100K    100K-1M     1M-100M       0.5-1B
Clock Frequency      0.2-2MHz    2-20MHz     20MHz-1GHz    1-5GHz
Instructions/Cycle   < 0.1       0.1-0.9     0.9-2.0       10
MIPS or MFLOPS       < 0.2       0.2-20      20-2,000      100,000
Watts                < 2         < 10        < 40          1-100+ (?)
CPUs/chip            1           1           1             4-10


Aspects of Computer Architecture

• ARCHITECTURE (instruction set architecture)

– programmer/compiler view - “Functional appearance to its immediate user/system programmer”

• IMPLEMENTATION (microarchitecture)

– processor designer view - “Logical structure or organization that implements the instruction set”

• DESIGN (chip realization)

– chip/system designer view - “Physical structure that embodies the implementation”


Our Objective for this Quarter

• The “What’s-How’s-Why’s” of Processor Design

1. Knowledge (“what’s”)

- Technology

- Techniques

2. Design Skills (“how’s”)

- Critical Issues

- Trade-off Intuitions

3. Understanding (“why’s”)

- Deeper Insights

- Fundamental Principles


Basic Tools and Principles for Architects


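Amdahl’s Law

• Speedup = $\text{time}_{\text{without enhancement}} / \text{time}_{\text{with enhancement}}$

• Suppose an enhancement speeds up a fraction f of a task by a factor of S

$\text{time}_{\text{new}} = \text{time}_{\text{old}} \cdot \left( (1-f) + f/S \right)$

$S_{\text{overall}} = 1 / \left( (1-f) + f/S \right)$

[Figure: a bar of length time_old split into (1 - f) and f; a bar of length time_new split into (1 - f) and f/S]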
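As a quick numerical check of the formula above, the following sketch evaluates the overall speedup for a few values of f and S. The function name `amdahl_speedup` and the sample values are illustrative assumptions, not part of the lecture.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of a task is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


if __name__ == "__main__":
    # Hypothetical sample points: even a huge S is capped near 1 / (1 - f).
    for f, s in [(0.5, 2.0), (0.9, 10.0), (0.9, 1e9)]:
        print(f"f={f}, S={s:g}: overall speedup = {amdahl_speedup(f, s):.2f}")
```

The last sample point illustrates the limit: with f = 0.9 the overall speedup cannot reach 10, no matter how large S becomes.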


Amdahl’s Law (continued)

• Real life analogy: After driving through 60 minutes of traffic jam, how much time can you make up by speeding in the final mile?

• Applications in Computer Architecture

– RISC - Reduced Instruction Set Computer

– Optimized to execute frequently used instructions quickly

– Infrequently used instructions take longer, or are even emulated in software

We should concentrate efforts on improving frequently occurring events or frequently used mechanisms


Pipelining

• Latency : Elapsed time from start to completion of a particular task

• Throughput : How many tasks can be completed per unit of time

• A pipeline is like an assembly line!

• Pipelining only improves throughput

– Latency: each job still takes 5 cycles to complete

– Throughput: 1 job per cycle if pipelined vs. 1 job per 5 cycles if not pipelined

[Figure: five-stage pipeline: start → stage1 → stage2 → stage3 → stage4 → stage5 → finish]
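The latency/throughput distinction can be made concrete with a small cycle count. This is a minimal sketch assuming an ideal pipeline (one cycle per stage, no stalls); the function name `total_cycles` is a hypothetical helper.

```python
def total_cycles(num_jobs: int, num_stages: int, pipelined: bool) -> int:
    """Cycles needed to finish num_jobs, assuming one cycle per stage and no stalls."""
    if num_jobs <= 0:
        return 0
    if pipelined:
        # The first job needs num_stages cycles (latency is unchanged);
        # every later job completes one cycle after the previous one.
        return num_stages + (num_jobs - 1)
    # Without pipelining, each job occupies the whole unit for num_stages cycles.
    return num_stages * num_jobs


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, total_cycles(n, 5, pipelined=True), total_cycles(n, 5, pipelined=False))
```

For a single job both cases take 5 cycles; for many jobs the pipelined count approaches one job per cycle.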


Pipelining (continued)

• Real life analogy: Henry Ford’s automobile assembly line.

• Example in computer architecture:

– 5-stage Instruction Execution Pipeline

– Fetch-Decode-Execute-Memory-Writeback

time →

Stages     t0  t1  t2  t3  t4  t5  t6  t7  . . .
Fetch      I1  I2  I3  I4  I5
Decode         I1  I2  I3  I4  I5
Execute            I1  I2  I3  I4  I5
Memory                 I1  I2  I3  I4  I5
Writeback                  I1  I2  I3  I4  I5
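The staggered occupancy in the diagram above follows a simple rule: instruction i (counting from 0) occupies stage s (counting from 0) during cycle i + s. The sketch below regenerates the table under that assumption of an ideal, stall-free pipeline; `pipeline_table` is a hypothetical helper name.

```python
from typing import List

STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]


def pipeline_table(num_instructions: int) -> List[List[str]]:
    """Return one row per stage and one cell per cycle for an ideal pipeline."""
    num_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for stage_index in range(len(STAGES)):
        row = []
        for cycle in range(num_cycles):
            instr = cycle - stage_index  # which instruction is in this stage now
            row.append(f"I{instr + 1}" if 0 <= instr < num_instructions else "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for name, row in zip(STAGES, pipeline_table(5)):
        print(f"{name:<10}" + "".join(f"{cell:<4}" for cell in row))
```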


Parallel Processing

• Parallelism - the amount of independent sub-tasks available

• If sub-tasks are independent, the order in which they are carried out does not matter

• Thus by executing the independent sub-tasks concurrently, we can finish the entire task faster

Improve Speedup!!!


Parallel Processing

• Real life analogy: collaboration on problem sets

(although not always encouraged)

• Examples in computer architecture:

– Parallel computers

– Superscalar processors

– Multi-core processors


Out-of-order Execution

• Specification (or Program) Order vs Dataflow Order

• Dataflow: Data-driven scheduling of events

– The start of an event should be enabled by the availability of its required input (data dependency)

– The completion of an event will produce an output that will enable the start of other events

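x = a + b; y = b * 2; z = (x-y) * (x+y)

[Figure: dataflow graph. Inputs a and b feed + (producing x) and *2 (producing y); x and y feed - and +, whose results feed the final *]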
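Data-driven scheduling of the three-statement example can be sketched by recording, for each operation, the values it consumes and then firing each operation as soon as all of its inputs exist. The dictionary layout, the names `diff` and `sum` for the two intermediate results, and the unit-latency assumption are all illustrative choices.

```python
# Each operation lists the values it consumes; a and b are primary inputs.
DEPENDENCIES = {
    "x": ["a", "b"],        # x = a + b
    "y": ["b"],             # y = b * 2
    "diff": ["x", "y"],     # x - y
    "sum": ["x", "y"],      # x + y
    "z": ["diff", "sum"],   # z = (x - y) * (x + y)
}


def dataflow_levels(deps):
    """Earliest step at which each operation can fire, assuming unit latency."""
    level = {}

    def visit(name):
        if name not in deps:  # primary input, available before step 1
            return 0
        if name not in level:
            level[name] = 1 + max(visit(src) for src in deps[name])
        return level[name]

    for name in deps:
        visit(name)
    return level


if __name__ == "__main__":
    for op, step in sorted(dataflow_levels(DEPENDENCIES).items(), key=lambda kv: kv[1]):
        print(f"step {step}: {op}")
```

Operations that land on the same step (x and y, then the subtraction and addition) are independent and may execute in either order or concurrently.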


Out-of-order Execution

• Real life analogy:

– A tip on taking tests: work on the questions you know first

• Examples in computer architecture

– Most modern microprocessors (Intel P4, Opteron, etc.) schedule instruction execution in dataflow order


Work and Critical Path

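• Work

$T_1$ - time to complete a computation on a sequential system

• Critical Path

$T_\infty$ - time to complete the same computation on an infinitely-parallel system

• Average Parallelism

$P_{avg} = T_1 / T_\infty$

• For a p-wide system

$T_p \ge \max\{ T_1/p,\ T_\infty \}$

$P_{avg} \gg p \Rightarrow T_p \approx T_1/p$

x = a + b; y = b * 2; z = (x-y) * (x+y)

[Figure: dataflow graph. Inputs a and b feed + (producing x) and *2 (producing y); x and y feed - and +, whose results feed the final *]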
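For the small example on this slide, work and critical path can be computed directly from the dependence graph. The sketch below assumes every operation takes one time unit; the graph encoding and the names `work_and_span` and `lower_bound_tp` are hypothetical.

```python
# Dependence graph of x = a + b; y = b * 2; z = (x - y) * (x + y).
# Keys are operations; values are the results or inputs they consume.
DEPS = {
    "x": ["a", "b"],
    "y": ["b"],
    "diff": ["x", "y"],
    "sum": ["x", "y"],
    "z": ["diff", "sum"],
}


def work_and_span(deps):
    """Return (T1, T_inf) for a DAG of unit-latency operations."""
    depth = {}

    def visit(name):
        if name not in deps:  # primary input
            return 0
        if name not in depth:
            depth[name] = 1 + max(visit(src) for src in deps[name])
        return depth[name]

    span = max(visit(name) for name in deps)
    return len(deps), span


def lower_bound_tp(t1, t_inf, p):
    """Lower bound on execution time for a p-wide system: max(T1 / p, T_inf)."""
    return max(t1 / p, t_inf)


if __name__ == "__main__":
    t1, t_inf = work_and_span(DEPS)
    print("T1 =", t1, "T_inf =", t_inf, "P_avg =", t1 / t_inf)
    for p in (1, 2, 4):
        print(f"p={p}: T_p >= {lower_bound_tp(t1, t_inf, p)}")
```

Here the average parallelism is below 2, so widening the system beyond two units cannot push the bound under the critical path length.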


Work and Critical Path

• Real life analogy: undergraduate degree requirements

– Work = unit requirement

– Critical Path to graduation is determined by course sequences and their prerequisites

• Added constraints: classes are only available in specific quarters…

• Applications to computer architecture

– Parallel job scheduling

– Given a collection of inter-dependent tasks:

• How many resources should be allocated?

• Which sequence of tasks should be given priority?


Speculation

Is it possible to parallelize the critical path?

i.e. violate data dependence?

• Guess the outcome of an operation from its inputs without performing the operation

• Even better, guess the outcome of an operation before the inputs to the operation are even known

• Speculation techniques must also include mechanisms for

1. Checking if the guesses are correct

2. Undoing “speculative execution” after wrong guesses


Speculation (continued)

• Real life analogy:

– Another tip on taking tests: You can often guess what is going to be on an exam by looking at lectures and HWs.

• Examples in computer architecture

– Circuit-level speculations: Carry Select Adder

– Architectural-level speculations

• Branch target predictions

• Load value predictions

• Speculative loop execution
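The carry-select adder mentioned above is a compact instance of speculation: the upper half of the sum is computed for both possible carry-in values before the real carry is known, and the correct result is selected afterwards. The sketch below models a two-block version in software; the function name and the 8-bit default width are assumptions for illustration, and in hardware the two speculative additions would run in parallel with the lower-half addition.

```python
def carry_select_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned width-bit integers (width even) in two-block carry-select style.

    Returns the full (width + 1)-bit sum.
    """
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, (a >> half) & mask
    b_lo, b_hi = b & mask, (b >> half) & mask

    # Speculate: compute the upper half for both possible carry-in values.
    hi_if_carry0 = a_hi + b_hi
    hi_if_carry1 = a_hi + b_hi + 1

    # Lower half produces the real carry.
    lo = a_lo + b_lo
    carry = lo >> half

    # Check and select: keep the guess that matches the real carry,
    # discard the other one.
    hi = hi_if_carry1 if carry else hi_if_carry0
    return (hi << half) | (lo & mask)


if __name__ == "__main__":
    print(carry_select_add(0x0F, 0x01))  # carry propagates into the upper half
```

Because both outcomes are precomputed, no undo step is needed here; the wrong guess is simply dropped, at the cost of duplicated hardware.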


Locality Principle

• One’s recent past is a very good indication of one’s near future

– Temporal Locality: If you just did something, it is very likely that you will do the same thing again soon

– Spatial Locality: If you just did something, it is very likely you will do something related or similar next

• Locality == Patterns == Predictability

– Converse:

• Anti-locality : If you haven’t done something for a very long time, it is very likely you won’t do it in the near future either


Locality Principle (continued)

• Real life analogy:

– spatial locality - where you choose to sit in a room

– temporal locality - will you be here again next week?

• Examples in computer architecture:

– Execution of program loops

• Spatial locality - after you execute an instruction, with very good probability, you will execute the next instruction

• Temporal locality - you are very likely to repeat the same instructions many times
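Both kinds of locality can be measured on a synthetic instruction-address trace. The sketch below assumes a hypothetical straight-line loop body (no branches inside the loop) and counts how often a fetch is the sequential successor of the previous one and how often it repeats an address seen earlier; the helper names are illustrative.

```python
def loop_trace(loop_start: int, loop_length: int, iterations: int) -> list:
    """Instruction addresses fetched by a simple loop with no internal branches."""
    return [loop_start + i for _ in range(iterations) for i in range(loop_length)]


def locality_stats(trace):
    """Return (sequential fraction, repeated fraction) for an address trace."""
    # Spatial: the next fetch is the address right after the previous one.
    sequential = sum(1 for prev, cur in zip(trace, trace[1:]) if cur == prev + 1)
    # Temporal: the address has been fetched before.
    seen = set()
    repeats = 0
    for addr in trace:
        if addr in seen:
            repeats += 1
        seen.add(addr)
    return sequential / (len(trace) - 1), repeats / len(trace)


if __name__ == "__main__":
    spatial, temporal = locality_stats(loop_trace(0, 10, 100))
    print(f"sequential fetches: {spatial:.1%}, repeated fetches: {temporal:.1%}")
```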


Memoization

• If something is expensive to compute, you might want to remember the answer for a while, just in case you will need the same answer again

Why does memoization work??

• Real life analogy:

– Keeping a list of frequently used phone numbers by your telephone

• Examples in computer architecture

– ?
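As a software-level illustration of the idea (not an answer to the architecture question above), Python's standard `functools.lru_cache` remembers recent results of a function. The function `expensive_square` is a hypothetical stand-in for a costly computation.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def expensive_square(n: int) -> int:
    """Stand-in for an expensive computation whose results are worth remembering."""
    return n * n


if __name__ == "__main__":
    expensive_square(12)   # computed and stored
    expensive_square(12)   # answered from the remembered result
    print(expensive_square.cache_info())
```

The bounded `maxsize` mirrors "remember the answer for a while": old entries are dropped when the table fills up.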


Amortization

• Overhead cost : one-time cost to set something up

• Per-unit cost : cost per unit of operation

total cost = overhead + per-unit cost x N

• It is often okay to have a high overhead cost if the cost can be distributed over a large number of units

⇒ lowers the average cost

average cost = total cost / N

= ( overhead / N ) + per-unit cost


Amortization (continued)

• Real life analogy: economy of scale

– Why is pasta sauce cheaper when bought by the gallon?

• Examples in computer architecture:

Cache Access Latency

$T_{miss}$ = 50 cycles

$T_{hit}$ = 1 cycle

If, on average, a cache line is reused n times before being evicted:

$T_{avg} = ( T_{miss} + (n-1)\,T_{hit} ) / n \approx T_{miss}/n + T_{hit}$

n = 50 ⇒ $T_{avg} \approx 2$ cycles

n = 2 ⇒ $T_{avg} \approx 25$ cycles (25.5 exactly)
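The amortization arithmetic for the cache example can be checked with a few lines of code. This sketch evaluates both the exact average (one miss followed by n - 1 hits) and the simpler overhead-over-N estimate; the function names are hypothetical, and the latencies are the ones given on the slide.

```python
def average_access_time(t_miss: float, t_hit: float, n: int) -> float:
    """Exact average over n accesses to one line: one miss, then n - 1 hits."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (t_miss + (n - 1) * t_hit) / n


def amortized_estimate(t_miss: float, t_hit: float, n: int) -> float:
    """Approximation: miss overhead spread over n accesses, plus the hit time."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return t_miss / n + t_hit


if __name__ == "__main__":
    for n in (2, 50):
        exact = average_access_time(50, 1, n)
        approx = amortized_estimate(50, 1, n)
        print(f"n={n}: exact={exact:.2f} cycles, estimate={approx:.2f} cycles")
```

The two forms differ by T_hit / n, which is why the estimate is tight for large n and looser for n = 2.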


Basic Equations and Metrics

• Performance

– CPU time = Instruction Count * CPI * Clock Cycle Time

– AMAT = Hit Time + Miss Rate * Miss Penalty

– Amdahl’s law, amortization

• Cost

– Processor cost = $f(\text{die area}^4)$

• Power Consumption

– Power = $C \cdot V_{dd}^2 \cdot F + V_{dd} \cdot I_{shortcircuit} \cdot F + V_{dd} \cdot I_{leakage}$

– Energy = Power * Time

– $E \cdot D$, $E \cdot D^2$, $E \cdot D^3$, …

• Fault tolerance: MTTF, MTTR, …

• Design complexity: ?
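The two performance equations above translate directly into code. The sketch below is a minimal illustration; the function names and the sample workload numbers are assumptions chosen only to exercise the formulas.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time_s: float) -> float:
    """CPU time = instruction count * cycles per instruction * clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Hypothetical workload: 1e9 instructions, CPI of 1.5, 2 GHz clock (0.5 ns cycle).
    print(f"CPU time: {cpu_time(1e9, 1.5, 0.5e-9):.3f} s")
    # Hypothetical cache: 1-cycle hit, 5% miss rate, 50-cycle miss penalty.
    print(f"AMAT: {amat(1, 0.05, 50):.2f} cycles")
```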


Ready to Learn More?