CES 524 Computer Architecture Fall 2006 W 6 to 9 PM Salazar Hall 2008 B. Ravikumar (Ravi) Office: Darwin Hall, 116 I Office hours: TBA

Upload: neil-wood

Post on 20-Jan-2016



CES 524 Computer Architecture

Fall 2006, W 6 to 9 PM

Salazar Hall 2008

B. Ravikumar (Ravi), Office: Darwin Hall, 116 I

Office hours: TBA


Textbook

Computer Architecture: A Quantitative Approach

Hennessy and Patterson, 3rd edition


Other references:

• Patterson and Hennessy, Computer Organization and Design: The Hardware/Software Interface. (undergraduate text by the same authors)

• Jordan and Alaghband, Fundamentals of Parallel Processing.

• Hwang, Advanced Computer Architecture. (excellent source on parallel and pipeline computing)

• I. Koren, Computer Arithmetic Algorithms.

• Survey articles, on-line lecture notes from several sources.
  • http://www.cs.utexas.edu/users/dburger/teaching/cs382m-f03/homework/papers.html
  • http://www.cs.wisc.edu/arch/www/


Class notes and lecture format

• PowerPoint slides, sometimes PDF, PostScript, etc.
• Tablet PC will be used to
  • add comments
  • draw sketches
  • write code
  • make corrections, etc.
• lecture notes prepared with generous help provided through the web sites of many instructors:
  • Patterson, Berkeley
  • Susan Eggers, University of Washington
  • Mark Hill, Wisconsin
  • Breecher, Clark U
  • DeHon, Caltech, etc.


Background expected

• Logic Design
  • Combinational logic, sequential logic
  • logic design and minimization of circuits
  • finite state machines
  • Discrete components: multiplexers, memory units, ALU
• Basic Machine structure
  • processor (data path, control), memory, I/O

• Number representations, computer arithmetic

• Assembly language programming

But if you have not studied some of these topics or don’t remember, don’t worry! These topics will be reviewed.


Coursework and grading

• Homework sets (about 5 sets): 25 points
  • Most problems will be assigned from the text. The text has many problems marked * with solutions at the back. These will be helpful in solving the HW problems.
  • Additional problems at a similar level
  • Some implementation exercises (mostly C programming)
  • Policy on collaboration
• Mid-semester tests: 25 points
  • One will be in-class, open book/notes, 75 minutes long
  • The other can be take-home or in-class (discussed later)
  • Dates will be announced at least one week in advance
  • Covers all topics discussed until the previous lecture
• Project: 15 points
  • Semester-long work. Each student chooses one problem to work on.
  • Design, implementation, testing, etc.
  • Report and a presentation
• Final exam: 35 points (take-home?)


Project Examples

• hardware design, e.g. circuit design for a dedicated application such as image processing

• hardware testing

• cache-efficient algorithm design and implementation

• wireless sensor networks

• embedded system design (for those who are doing CES 520)

• Tablet PC hardware study


Overview of the course

• Review of Computer Organization
  • Digital logic design
  • Computer arithmetic
  • machine/assembly language
  • Data and control path design
  • Cost performance trade-offs
• Instruction Set Design Principles
  • classification of instruction set architectures
  • addressing modes
  • encoding instructions
  • trade-offs/examples/historical overview
• Instruction level parallelism
  • overview of pipelining
  • superscalar execution
  • branch prediction
  • dynamic scheduling
• Cache Memory
  • cache performance modeling
  • main memory technology
  • virtual memory
  • cache-efficient algorithm design
  • external memory data structures, algorithms, applications
• Shared-memory multiprocessor system
  • symmetric multiprocessors
  • distributed shared-memory machines
• Storage systems
  • RAID architecture
  • I/O performance measures
• Advanced Topics
  • Computer Arithmetic
    • alternatives to floating-point
    • design to minimize size, delay, etc.
    • circuit complexity trade-offs (area, time, energy, etc.)
  • hardware testing
    • fault models, fault testing
    • model checking
  • external memory computation problems
    • external memory algorithms
    • data structures, applications
  • Cache-efficient algorithms
  • Nonstandard models
    • quantum computing
    • bio-computing and neural computing
  • Sensor networks


Overview of the course • Review of Computer Organization

• Digital logic design

Figure: (a) Programmable OR gates, (b) Logic equivalent of part a, (c) Programmable read-only memory (PROM). Labels in the figure: Inputs, Outputs, w, x, y, z, Decoder.


Overview of the course • Review of Computer Organization

• Digital logic design

Figure: (a) General programmable combinational logic, with Inputs feeding an AND array (AND plane) followed by an OR array (OR plane) that produces the Outputs; (b) PAL: programmable AND array, fixed OR array (8-input ANDs); (c) PLA: programmable AND and OR arrays (6-input ANDs, 4-input ORs).


Overview of the course • Review of Computer Organization

• Digital logic design – sequential circuits

State table (next state for each input); S 00 is the initial state, S 35 is the final state:

Current state    Dime    Quarter    Reset
S 00             S 10    S 25       S 00
S 10             S 20    S 35       S 00
S 20             S 30    S 35       S 00
S 25             S 35    S 35       S 00
S 30             S 35    S 35       S 00
S 35             S 35    S 35       S 00

State diagram: states S 00, S 10, S 20, S 25, S 30 and S 35, with transitions labeled Dime, Quarter and Reset; the Start arrow enters S 00.
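The state names on this slide (S 00 through S 35) appear to track the amount deposited in cents, saturating at 35. Under that reading, which is an assumption and not stated on the slide, the state machine can be sketched in a few lines of Python. The names `COINS`, `FINAL`, `next_state` and `run` are hypothetical.

```python
# Sketch of the coin-counting state machine; a state is the number of
# cents deposited so far. The cap at 35 is an assumption read off the
# state names S 00 ... S 35.
COINS = {"dime": 10, "quarter": 25}
FINAL = 35


def next_state(state, event):
    """Return the next state for one input event (dime, quarter, reset)."""
    if event == "reset":
        return 0
    return min(state + COINS[event], FINAL)


def run(events, state=0):
    """Feed a sequence of events to the machine, starting in S 00."""
    for event in events:
        state = next_state(state, event)
    return state


print(run(["dime", "dime", "quarter"]))  # dime, dime, then a quarter
```

A hardware implementation would encode the six states in three flip-flops and derive the next-state logic from the table; the sketch only mirrors the transitions.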


Example of sequential circuit design


Overview of the course • Review of Computer Organization

• Computer Arithmetic


Overview of the course • Review of Computer Organization

• machine/assembly language

MIPS instruction set


Overview of the course • Review of Computer Organization

• Data and control path design


Overview of the course • Review of Computer Organization

• Cost performance trade-offs

A processor spends 30% of its time on floating-point addition, 25% on floating-point multiplication, and 10% on floating-point division. Evaluate the following enhancements, each costing the same to implement:

a. Redesign the floating-point adder to make it twice as fast.
b. Redesign the floating-point multiplier to make it three times as fast.
c. Redesign the floating-point divider to make it 10 times as fast.
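One common way to compare such enhancements is Amdahl's law (the slide does not name the method, so treating it as the intended one is an assumption). The sketch below computes the overall speedup for each option; the function name `overall_speedup` is hypothetical.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: overall speedup when `fraction` of the run time
    is made `factor` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


options = {
    "a (adder, 2x)": overall_speedup(0.30, 2),
    "b (multiplier, 3x)": overall_speedup(0.25, 3),
    "c (divider, 10x)": overall_speedup(0.10, 10),
}

for name, speedup in options.items():
    print(f"{name}: {speedup:.3f}")

best = max(options, key=options.get)
print("largest overall speedup:", best)
```

Under these assumptions option b comes out ahead (about 1.20 overall), followed by a (about 1.18) and c (about 1.10): the large speedup of the divider matters little because division accounts for only 10% of the time.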


Overview of the course • Instruction Set Design Principles

• classification of instruction set architectures


Overview of the course

• addressing modes


Overview of the course • Instruction level parallelism

• overview of pipelining


Overview of the course • overview of pipelining


Spring 2003 CSE P548 23

Hazards

When the same resource is required by two successive instructions executing in different cycles, a hazard occurs.

• read after write

x = y + z
w = x + t

The second instruction’s “read x” part of the cycle has to wait for the first instruction’s “write x” part of the cycle.
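The read-after-write case above can be detected mechanically by comparing the destination of one instruction with the sources of the next. The following is a minimal sketch, assuming a made-up representation in which each instruction is a `(destination, sources)` pair; `has_raw_hazard` is a hypothetical name.

```python
def has_raw_hazard(first, second):
    """True if `second` reads the location that `first` writes."""
    dest, _sources = first
    _dest2, sources2 = second
    return dest in sources2


i1 = ("x", ("y", "z"))  # x = y + z
i2 = ("w", ("x", "t"))  # w = x + t

print(has_raw_hazard(i1, i2))  # the second instruction reads x
print(has_raw_hazard(i2, i1))  # the reverse order has no such dependence
```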


• branch prediction

x = y + z;
if (x > 0) y = y + 1;
else z = z + 1;

The instruction after x = y + z can only be determined after x has been computed.

x = 1;
while (x < 100) { sum = sum + x; x = x + 1; }

The instruction after x = x + 1 will almost always be sum = sum + x (99 out of 100 times), so if we know that the instruction is in a loop, we predict that the loop is executed, and we will be correct most of the time.
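The "99 out of 100" figure can be checked by counting how often the loop test x < 100 keeps the loop going. The sketch below replays the loop from the slide, assuming `sum` starts at 0 (the slide does not initialize it); `simulate_loop_branch` is a hypothetical name.

```python
def simulate_loop_branch(limit=100):
    """Replay the loop and count the outcomes of the test `x < limit`."""
    x, total = 1, 0          # total plays the role of `sum`, assumed to start at 0
    tests = stays = 0
    while True:
        tests += 1           # the loop test is evaluated once more
        if not (x < limit):
            break            # the single time the loop exits
        stays += 1           # a "loop continues" prediction would be right here
        total += x
        x += 1
    return tests, stays, total


tests, stays, total = simulate_loop_branch()
print(f"test evaluated {tests} times, loop continued {stays} times")
```

A predictor that always guesses "the loop continues" is therefore wrong only once, on the final test.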


• Cache Memory
  • cache performance modeling
  • main memory technology
  • virtual memory
  • cache-efficient algorithm design
  • external memory data structures, algorithms, applications


Hitting the Memory Wall

Fig. 17.8 Memory density and capacity have grown along with the CPU power and complexity, but memory speed has not kept pace.

Plot: relative performance (log scale, 1 to 10^6) versus calendar year (1980 to 2010), with one curve for Processor and one for Memory.


Focus areas

• instruction set architecture, review of computer organization

• instruction-level parallelism (chapters 3 and 4)

• computer arithmetic (Appendix)

• cache-performance analysis and modeling (chapter 5)

• multiprocessor architecture (chapter 7)

• interconnection networks (chapter 8)

• cache-efficient algorithm design

• code optimization and advanced compilation techniques

• FPGA and reconfigurable computing


History of Computer Architecture

The abacus is probably more than 3000 years old.

The abacus is prepared for use by placing it flat on a table or one's lap and pushing all the beads on both the upper and lower decks away from the beam. The beads are manipulated with either the index finger or the thumb of one hand.

The abacus is still in use today by shopkeepers in Asia. The use of the abacus is still taught in Asian schools, and some schools in the West. Visually impaired children are taught to use the abacus where their sighted counterparts would be taught to use paper and pencil to perform calculations. One particular use for the abacus is teaching children simple mathematics and especially multiplication; the abacus is an excellent substitute for rote memorization of multiplication tables, a particularly detestable task for young children.


History of Computer Architecture

• Mechanical devices for controlling complex operations have been in existence since at least the 1500s. (The first ones were rotating pegged cylinders in music boxes.)

The medieval development of camshafts has proven to be of immense technological significance. It allowed the budding medieval industry to transform the rotating movement of waterwheels and windmills into the movements that could be used for the hammering of ore, the sawing of wood and the manufacturing of paper.


• Pascal developed a mechanical calculator to help in tax work.

• Pascal's calculator contains eight dials that connect to a drum, with a linkage that causes a dial to rotate one notch when a carry is produced from a dial in a lower position.

• Some of Pascal's adding machines, which he started to build in 1642, still exist today.

Pascal began work on his calculator in 1642, when he was only 19 years old. He had been assisting his father, who worked as a tax commissioner, and sought to produce a device which could reduce some of his workload. By 1652 Pascal had produced fifty prototypes and sold just over a dozen machines, but the cost and complexity of the Pascaline – combined with the fact that it could only add and subtract, and the latter with difficulty – was a barrier to further sales, and production ceased in that year. By that time Pascal had moved on to other pursuits, initially the study of atmospheric pressure, and later philosophy.

The Pascaline was a decimal machine. This proved to be a liability, however, as the contemporary French currency system was not decimal. It was instead similar to the Imperial pounds ("livres"), shillings ("sols") and pence ("deniers") in use in Britain until the 1970s, and necessitated that the user perform further calculations if the Pascaline was to be used for its intended purpose, as a currency calculator.

In 1799 France changed to a metric system, by which time Pascal's basic design had inspired other craftsmen, although with a similar lack of commercial success. Child prodigy Gottfried Wilhelm von Leibniz produced a competing design, the Stepped Reckoner, in 1672, which could perform addition, subtraction, multiplication and division, but calculating machines did not become commercially viable until the early 19th century, when Charles Xavier Thomas de Colmar's Arithmometer, itself based on von Leibniz's design, was commercially successful.


• In the 1800s, Babbage built a computational device called the difference engine.

• This machine had features seen in modern computers: means for reading input data, storing data, performing calculations, producing output and automatically controlling the operations of the machine.

Difference engines were forgotten and then rediscovered in 1822 by Charles Babbage, who proposed one in a paper to the Royal Astronomical Society entitled "Note on the application of machinery to the computation of astronomical and mathematical tables." This machine used the decimal number system and was powered by cranking a handle. The British government initially financed the project, but withdrew funding when Babbage repeatedly asked for more money whilst making no apparent progress on building the machine. Babbage went on to design his much more general analytical engine but later produced an improved difference engine design (his "Difference Engine No. 2") between 1847 and 1849. Inspired by Babbage's difference engine plans, Per Georg Scheutz built several difference engines from 1855 onwards; one was sold to the British government in 1859. Martin Wiberg improved Scheutz's construction but used his device only for producing and publishing printed logarithmic tables.


The principle of a difference engine is Newton's method of differences. It may be illustrated with a small example. Consider the quadratic polynomial p(x) = 2x^2 − 3x + 2.

To calculate p(0.5) we use the values from the lowest diagonal. We start with the rightmost column value of 0.04. Then we continue the second column by subtracting 0.04 from 0.16 to get 0.12. Next we continue the first column by taking its previous value, 1.12 and subtracting the 0.12 from the second column. Thus p(0.5) is 1.12-0.12 = 1.0. In order to compute p(0.6), we iterate the same algorithm on the p(0.5) values: take 0.04 from the third column, subtract that from the second column's value 0.12 to get 0.08, then subtract that from the first column's value 1.0 to get 0.92, which is p(0.6).
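The walk-through above can be reproduced with a short sketch that tabulates p(x) using additions only, which is the point of the method of differences. Two assumptions are made here that the slide text does not state: the table starts at x = 0, and the step is 0.1 (inferred from the p(0.5) and p(0.6) entries). The code keeps signed differences, so the first differences are negative and the slide's "subtract 0.12" becomes "add −0.12". The names `tabulate`, `p`, `h` and `table` are hypothetical.

```python
from fractions import Fraction


def tabulate(p0, d1, d2, steps):
    """Tabulate a quadratic using additions only.

    p0: first table value, d1: first difference,
    d2: constant second difference.
    """
    values = [p0]
    for _ in range(steps):
        p0 += d1   # next value = value + first difference
        d1 += d2   # next first difference = first difference + second difference
        values.append(p0)
    return values


def p(x):
    return 2 * x * x - 3 * x + 2


h = Fraction(1, 10)      # assumed step size
start = Fraction(0)      # assumed starting point

# Seed the engine with the first value and its differences.
p0 = p(start)
d1 = p(start + h) - p0
d2 = p(start + 2 * h) - 2 * p(start + h) + p0

table = tabulate(p0, d1, d2, 6)
for i, v in enumerate(table):
    print(f"p({float(i * h):.1f}) = {float(v):.2f}")
```

Exact fractions are used instead of floats so that the repeated additions do not accumulate rounding error; the second difference comes out as 0.04, matching the rightmost column in the text.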


Babbage’s analytical engine

Portion of the mill of the Analytical Engine with printing mechanism, under construction at the time of Babbage’s death.

• Babbage also designed a more sophisticated machine known as the analytical engine, which had a mechanism for branching and a means for programming using punched cards.

The designs for the Analytical Engine include almost all the essential logical features of a modern electronic digital computer. The engine was programmable using punched cards. It had a ‘store’ where numbers and intermediate results could be held and a separate ‘mill’ where the arithmetic processing was performed. The separation of the ‘store’ (memory) and ‘mill’ (central processor) is a fundamental feature of the internal organisation of modern computers. The Analytical Engine could have ‘looped’ (repeated the same sequence of operations a predetermined number of times) and was capable of conditional branching (IF… THEN… statements), i.e. automatically taking alternative courses of action depending on the result of a calculation. Had it been built it would have needed to be operated by a steam engine of some kind. Babbage made little attempt to raise funds to build the Analytical Engine. Instead he continued to work on simpler and cheaper methods of manufacturing parts and built a small trial model which was under construction at the time of his death.


• The analytical engine was never built because the technology of the time could not meet the required standards.

• Ada Lovelace (daughter of the poet Byron) worked with Babbage to write the earliest computer programs to solve problems on the analytical engine.

• A version of Babbage's difference engine was actually built by the Science Museum in London in 1991 and can still be viewed today.

• It took over a century, until the start of World War II, before the next major development in computing machinery took place.

• For England, German U-boats were causing heavy damage to Allied shipping.

• The U-boats received communications using a secret code which was implemented by a machine known as ENIGMA.


• The process of encryption used by Enigma was known for a long time. But decoding was a much harder task.

• Alan Turing, a British mathematician, and others in England built an electromechanical machine to decode the messages sent by ENIGMA.

• The Colossus was a successful code-breaking machine that came out of Turing's research.

• Colossus had all the features of an electronic computer.
  • Vacuum tubes stored the contents of a paper tape that was fed into the machine.
  • Computations took place among the vacuum tubes, and programming was performed with plug boards.


• Around the same time as Turing’s efforts, Eckert and Mauchly set out to create a machine to compute tables of ballistic trajectories for the U.S. Army.

• The result of their effort was the Electronic Numerical Integrator and Computer (ENIAC) at the Moore School of Engineering at the University of Pennsylvania.

• ENIAC consisted of 18,000 vacuum tubes, which made up the computing section of the machine. Programming and data entry were performed by setting switches.
• There was no concept of a stored program, and there was no central memory unit.
• But these were not serious limitations, since ENIAC was intended to do special-purpose calculations.
• ENIAC was not ready until the war was over, but it was successfully used for nine years after the war (1946-1955).

• After ENIAC was completed, von Neumann joined Eckert and Mauchly. Together, they worked on a model for a stored-program computer called EDVAC.


• Eckert and Mauchly and von Neumann and Goldstine split up after disputes over credit and differences of opinion.

• But the concept of the stored-program computer came out of this collaboration. The term von Neumann architecture is used to denote a stored-program computer.

• Wilkes at Cambridge University built a stored-program computer (EDSAC) that was completed in 1949.

• Atanasoff at Iowa State built a small-scale electronic computer. His work came to light as part of a lawsuit.

• Another early machine that deserves credit is Konrad Zuse’s machine in Germany.

• Historical papers (e.g. by Knuth) have been written on the earliest computer programs ever written on these machines:
  • sorting
  • generating perfect squares, primes


Comparison of early computers


ENIAC - details

• Decimal (not binary)
• 20 accumulators of 10 digits
• ENIAC used ten-position ring counters to store digits; each digit used 36 tubes, 10 of which were the dual triodes making up the flip-flops of the ring counter. Arithmetic was performed by "counting" pulses with the ring counters and generating carry pulses if the counter "wrapped around", the idea being to emulate in electronics the operation of the digit wheels of a mechanical adding machine.
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
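The pulse-counting idea behind the ring counters can be illustrated in software. This is only an analogy under simplifying assumptions: the class `DecadeCounter` and the helpers `add_pulses` and `value` are hypothetical, and the carry is rippled immediately, which is not meant to reproduce ENIAC's actual carry timing.

```python
class DecadeCounter:
    """One decimal digit modeled as a ten-position ring counter."""

    def __init__(self, digit=0):
        self.position = digit

    def pulse(self):
        """Advance one position; return True if the counter wrapped around."""
        self.position = (self.position + 1) % 10
        return self.position == 0


def add_pulses(counters, digit_index, n):
    """Send n pulses to one digit, passing carry pulses to higher digits."""
    for _ in range(n):
        i = digit_index
        # A wrap-around generates a carry pulse for the next digit.
        while i < len(counters) and counters[i].pulse():
            i += 1


def value(counters):
    """Read the accumulator as an integer (least significant digit first)."""
    return sum(c.position * 10 ** i for i, c in enumerate(counters))


acc = [DecadeCounter() for _ in range(10)]  # a 10-digit accumulator
add_pulses(acc, 0, 7)   # add 7
add_pulses(acc, 0, 5)   # add 5: the units digit wraps and carries
add_pulses(acc, 1, 9)   # add 90: the tens digit wraps and carries
print(value(acc))
```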

ENIAC - ALU, Reliability

• The basic clock cycle was 200 microseconds, or 5,000 cycles per second for operations on the 10-digit numbers. In one of these cycles, ENIAC could write a number to a register, read a number from a register, or add/subtract two numbers. A multiplication of a 10-digit number by a d-digit number (for d up to 10) took d+4 cycles, so a 10- by 10-digit multiplication took 14 cycles, or 2800 microseconds, a rate of 357 per second. If one of the numbers had fewer than 10 digits, the operation was faster. Division and square roots took 13(d+1) cycles, where d is the number of digits in the result (quotient or square root). So a division or square root took up to 143 cycles, or 28,600 microseconds, a rate of 35 per second. If the result had fewer than ten digits, it was obtained faster.

• By the simple (if expensive) expedient of never turning the machine off, the engineers reduced ENIAC's tube failures to the acceptable rate of one tube every two days. According to a 1989 interview with Eckert, the continuously failing tubes story was therefore mostly a myth: "We had a tube fail about every two days and we could locate the problem within 15 minutes."

• In 1954, the longest continuous period of operation without a failure was 116 hours (close to five days). This failure rate was remarkably low, and stands as a tribute to the precise engineering of ENIAC.
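The cycle counts quoted on this slide for multiplication and division can be checked with a few lines of arithmetic. The sketch below uses only the figures from the slide (200-microsecond cycle, d+4 and 13(d+1) cycles); the function names are hypothetical.

```python
CYCLE_US = 200  # basic cycle time from the slide, in microseconds


def multiply_cycles(d):
    """Cycles to multiply a 10-digit number by a d-digit number."""
    return d + 4


def divide_cycles(d):
    """Cycles for a division or square root with a d-digit result."""
    return 13 * (d + 1)


def rate_per_second(cycles):
    """Operations per second for an operation taking `cycles` cycles."""
    return 1_000_000 / (cycles * CYCLE_US)


for name, cycles in [("multiply", multiply_cycles(10)),
                     ("divide/sqrt", divide_cycles(10))]:
    micros = cycles * CYCLE_US
    print(f"{name}: {cycles} cycles, {micros} microseconds, "
          f"about {round(rate_per_second(cycles))} per second")
```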


von Neumann machine

• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing
• Input and output equipment operated by control unit
• Princeton Institute for Advanced Study
  • IAS
  • Completed 1952


Commercial Computers

• 1947 - Eckert-Mauchly Computer Corporation
• BINAC
• UNIVAC I (Universal Automatic Computer)
• Sold for $1 million.
• A total of 48 machines were built
• US Bureau of Census 1950 calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
  • Faster
  • More memory


IBM and DEC

• Punched-card processing equipment
• Office automation, electric typewriters, etc.
• 1953 - the 701
  • IBM’s first stored program computer
  • Scientific calculations
• 1955 - the 702
  • Business applications
• Led to 700/7000 series
• In 1964, IBM invested $5 B to build the IBM 360 series of machines (mainframe).
• DEC developed the PDP series (minicomputers)


Transistors

• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid state device
• Made from silicon (sand)
• Invented 1947 at Bell Labs
• William Shockley, Bardeen and Brattain


Transistor Based Computers

• Second generation machines
• NCR & RCA produced small transistor machines
• IBM 7000
• DEC - 1957
  • Produced PDP-1


Microelectronics

• Literally - “small electronics”
• A computer is made up of gates, memory cells and interconnections
• These can be manufactured on a semiconductor
• e.g. silicon wafer


Generations of Computers

• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
  • Up to 100 devices on a chip
• Medium scale integration - to 1971
  • 100-3,000 devices on a chip
• Large scale integration - 1971-1977
  • 3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 to date
  • 100,000 - 100,000,000 devices on a chip
• Ultra large scale integration
  • Over 100,000,000 devices on a chip


Introduction

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy


What’s Computer Architecture?

The attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation. (Amdahl, Blaauw, and Brooks, 1964)


What’s Computer Architecture?

• 1950s to 1960s: Computer Architecture Course: Computer Arithmetic was the main focus.

• 1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers. (covered in Chapter 2)

• 1990s and later: Computer Architecture Course: Design of CPU, memory system, I/O system, multiprocessors, instruction-level parallelism, etc.


The Task of a Computer Designer


[Figure: design cycle with three stages: Evaluate Existing Systems for Bottlenecks; Simulate New Designs and Organizations; Implement Next Generation System. Inputs shown: Technology Trends, Benchmarks, Workloads, Implementation Complexity.]


Technology and Computer Usage Trends


When building a cathedral, numerous very practical considerations need to be taken into account:

• available materials
• worker skills
• willingness of the client to pay the price

Similarly, Computer Architecture is about working within constraints:

• What will the market buy?
• Cost/Performance
• Tradeoffs in materials and processes


Trends

Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip doubles every year; in 1975 he revised the rate to a doubling about every two years.

Exponential growth of this kind has CONTINUED since then. (Moore’s law)

[Figure: Transistors Per Chip (log scale, 1.E+03 to 1.E+08) versus year, 1970 to 2005. Labeled processors: 4004, 8086, 80286, 386, 486, Pentium, Pentium Pro, Pentium II, Pentium 3, Power PC 601, Power PC G3.]


Trends

Processor performance, as measured by the SPEC benchmark, has also risen dramatically.

[Figure: SPEC performance (vertical axis 0 to 5000) for a series of machines: MIPS M 2000, Sun-4/260, IBM RS/6000, DEC AXP/500, DEC Alpha 4/266, DEC Alpha 5/500, DEC Alpha 21264/600, Alpha 6/833.]


Trends

Memory Capacity (and Cost) have changed dramatically in the last 20 years.

[Figure: memory size in bits (log scale, 1000 to 1000000000) versus year, 1970 to 2000.]

year   size (Mb)   cycle time
1980   0.0625      250 ns
1983   0.25        220 ns
1986   1           190 ns
1989   4           165 ns
1992   16          145 ns
1996   64          120 ns
2000   256         100 ns
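As a rough check on the table, the growth rates can be computed from its first and last rows. This is only an illustrative sketch: the helper name `annual_growth` is made up here, and it assumes smooth exponential growth between 1980 and 2000, which real product generations only approximate.

```python
# Rough check of memory trends using the 1980 and 2000 rows of the table.
# Assumption: growth is a constant yearly multiplier over the whole period.

def annual_growth(start_value, end_value, years):
    """Return the constant yearly multiplier that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years)

years = 2000 - 1980

# Capacity in Mb: 0.0625 -> 256
capacity_rate = annual_growth(0.0625, 256, years)
capacity_3yr = capacity_rate ** 3

# Cycle time in ns: 250 -> 100, so the speed improvement is 250 / 100
speed_rate = annual_growth(100, 250, years)
speed_total = 250 / 100

print(f"capacity: {capacity_rate:.2f}x per year, {capacity_3yr:.1f}x every 3 years")
print(f"speed:    {speed_rate:.3f}x per year, {speed_total:.1f}x over {years} years")
```

Under these assumptions capacity grows by roughly 3.5x every three years, while cycle time improves by only 2.5x over the full twenty years.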


Trends

In terms of SPEED, CPU performance has increased dramatically, but memory and disk speeds have increased only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.

        Capacity         Speed (latency)
Logic   2x in 3 years    2x in 3 years
DRAM    4x in 3 years    2x in 10 years
Disk    4x in 3 years    2x in 10 years


Growth in Microprocessor performance

• Growth in microprocessor performance since the mid-1980s has been substantially higher than in earlier years.

• Figure 1.1 shows this trend. Processor performance has increased by a factor of about 1600 in 16 years.

• There are two graphs in this figure. The graph showing a growth rate of about 1.58 per year is the real one; the other, showing a rate of 1.35 per year, is an extrapolation from the time prior to 1980.

• Prior to the 1980s, performance growth was largely technology driven.

• The increase in growth since 1980 is attributable to architectural and organizational ideas.
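The two annual rates can be compounded over 16 years to see how far apart the curves end up. The sketch below is plain arithmetic on the rates quoted above; the variable names are illustrative.

```python
# Compound growth over 16 years at the two annual rates quoted in the bullets.
YEARS = 16

actual = 1.58 ** YEARS        # observed growth rate
extrapolated = 1.35 ** YEARS  # technology-only trend continued from before the 1980s

print(f"1.58^{YEARS} = {actual:.0f}")
print(f"1.35^{YEARS} = {extrapolated:.0f}")
print(f"ratio    = {actual / extrapolated:.1f}")
```

The first figure comes out near 1500, in line with the "about 1600" quoted for Figure 1.1, and the gap between the two curves is roughly an order of magnitude.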


Performance Trends


Current taxonomy of computer systems

The early computers were just main-frame systems. Current classification of computer systems is broadly:

• Desktop computing
  • Low-end PC’s
  • High-end work-stations
• Servers
  • High throughput
  • Greater reliability (with back-up)
• Embedded computers
  • Hand-held devices (video-games, PDA, cell-phones)
  • Sensor networks
  • Appliances


Summary of the three computing classes


Cost of IC

• The wafer is manufactured (typically circular) and the dies are cut.
• The dies are tested, packaged and shipped. (All dies are identical.)


Cost of IC

• The wafer is manufactured (typically circular; see Figure 1.8) and the dies are cut.
• The dies are tested, packaged and shipped. (All dies are identical.)

$$ \text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing} + \text{Cost of packaging and final test}}{\text{Final test yield}} $$

$$ \text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}} $$

$$ \text{Dies per wafer} = \frac{\pi \times (\text{Wafer radius})^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{(2 \times \text{Die area})^{0.5}} $$

Can you explain the correction factor?
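The formulas above can be sketched in a few lines. One common reading of the correction factor is that it estimates the partial dies lost along the circular edge (wafer circumference divided by the diagonal of a square die). The function names and the sample wafer cost and die yield below are hypothetical values chosen only for illustration.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area / die area, minus an estimate of partial dies along the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss

def cost_of_die(wafer_cost, wafer_diameter_cm, die_area_cm2, die_yield):
    """Cost of wafer spread over the good dies only."""
    return wafer_cost / (dies_per_wafer(wafer_diameter_cm, die_area_cm2) * die_yield)

# Hypothetical inputs: 30 cm wafer, die 0.7 cm on a side,
# wafer cost 5000 (arbitrary currency units), die yield 50%.
n_dies = dies_per_wafer(30, 0.7 * 0.7)
die_cost = cost_of_die(5000, 30, 0.7 * 0.7, 0.5)

print(f"dies per wafer: {int(n_dies)}")
print(f"cost per good die: {die_cost:.2f}")
```

Die yield is taken as a given input here; the slide does not give a model for it.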


Distribution of cost in a system


Measuring And Reporting Performance


This section talks about:

1. Metrics – how do we describe in a numerical way the performance of a computer?

2. What tools do we use to find those metrics?


Metrics

• Time to run the task (Exec Time)
  • Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
  • Throughput, bandwidth

Plane        Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747   610 mph    6.5 hours     470          286,700
Concorde     1350 mph   3 hours       132          178,200


Metrics - Comparisons

"X is n times faster than Y" means

$$ \frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)} = n $$

Speed of Concorde vs. Boeing 747

Throughput of Boeing 747 vs. Concorde
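The two comparisons can be worked out with the plane figures from the Metrics slide (610 and 1350 mph, 6.5 and 3 hours, 286,700 and 178,200 pmph). The helper name `times_faster` is illustrative, not part of the course material.

```python
def times_faster(perf_x, perf_y):
    """Return n such that X is n times faster than Y (performance: higher is better)."""
    return perf_x / perf_y

# Speed: Concorde vs. Boeing 747 (mph)
speed_ratio = times_faster(1350, 610)

# Same comparison via execution time: ExTime(747) / ExTime(Concorde), in hours
time_ratio = 6.5 / 3

# Throughput: Boeing 747 vs. Concorde (passenger-miles per hour)
throughput_ratio = times_faster(286_700, 178_200)

print(f"Concorde is {speed_ratio:.1f} times faster in speed")
print(f"Concorde is {time_ratio:.1f} times faster by trip time")
print(f"Boeing 747 is {throughput_ratio:.1f} times faster in throughput")
```

Which plane is "faster" depends on the metric: the Concorde wins on latency, the 747 on throughput.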


Metrics - Comparisons

Pat has developed a new product, "rabbit", whose performance she wishes to determine. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons

Product   Transactions / second   Seconds / transaction   Seconds to process transaction
Turtle    30                      0.0333                  3
Rabbit    60                      0.0166                  1

Which of the following statements reflect the performance comparison of rabbit and turtle?

o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
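Before picking statements, it may help to compute the ratios under each metric separately, since throughput and time-to-process give different answers. The sketch below only does the arithmetic on the measured values; the variable names are illustrative and it does not settle which wording is preferable.

```python
# Measured values from the Performance Comparisons table.
turtle = {"tps": 30, "seconds_to_process": 3}
rabbit = {"tps": 60, "seconds_to_process": 1}

# Throughput view (transactions per second: higher is better)
throughput_ratio = rabbit["tps"] / turtle["tps"]
pct_more_throughput = (throughput_ratio - 1) * 100

# Latency view (seconds to process one transaction: lower is better)
latency_ratio = turtle["seconds_to_process"] / rabbit["seconds_to_process"]
rabbit_time_fraction = rabbit["seconds_to_process"] / turtle["seconds_to_process"]
turtle_pct_longer = (latency_ratio - 1) * 100
rabbit_pct_less_time = (1 - rabbit_time_fraction) * 100

print(f"throughput: rabbit/turtle = {throughput_ratio:.1f} ({pct_more_throughput:.0f}% more)")
print(f"latency: turtle/rabbit = {latency_ratio:.1f}")
print(f"rabbit takes {rabbit_time_fraction:.2f} of turtle's time ({rabbit_pct_less_time:.0f}% less)")
print(f"turtle takes {turtle_pct_longer:.0f}% longer than rabbit")
```

Note that no quantity can take more than 100% less time than another, which rules out some of the phrasings regardless of metric.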


Metrics - Throughput

[Figure: levels of a computer system (Application, Programming Language, Compiler, ISA, Datapath and Control, Function Units, Transistors/Wires/Pins), annotated with throughput metrics:]

• Answers per month, operations per second
• (Millions of) instructions per second: MIPS
• (Millions of) (FP) operations per second: MFLOP/s
• Megabytes per second
• Cycles per second (clock rate)


Methods For Predicting Performance

• Benchmarks
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
  • ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental “Laws”/Principles
• Trade-offs


Execution time

Execution time can be defined in different ways:

• wall-clock time, response time or elapsed time: latency to complete the task, including all possible overheads (disk accesses, memory accesses, input/output activities etc.).
• CPU time: does not include I/O and other overheads.
  • user CPU time: execution of application tasks
  • system CPU time: execution of system tasks

Throughput vs. efficiency
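The difference between wall-clock time and CPU time can be observed directly. The following is a minimal sketch using Python's standard library; `busy_work` is a made-up workload, and the sleep stands in for I/O or other waiting that counts toward elapsed time but not CPU time.

```python
import os
import time

def busy_work(n):
    """Hypothetical CPU-bound task: sum of squares."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()   # wall-clock (elapsed) time
cpu_start = time.process_time()    # user + system CPU time of this process
times_start = os.times()           # separate user and system CPU times

busy_work(200_000)
time.sleep(0.05)  # waiting: adds to wall-clock time, not to CPU time

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
times_end = os.times()
user = times_end.user - times_start.user
system = times_end.system - times_start.system

print(f"wall-clock: {wall:.3f} s")
print(f"CPU total:  {cpu:.3f} s (user {user:.3f} s, system {system:.3f} s)")
```

The exact numbers vary by machine and load; the point is only that elapsed time exceeds CPU time whenever the program waits.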


Benchmarks

SPEC: System Performance Evaluation Cooperative

• First Round 1989
  • 10 programs yielding a single number (“SPECmarks”)
• Second Round 1992
  • SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
  • Compiler Flags unlimited. March 93 of DEC 4000 Model 610:
      spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=memcpy(b,a,c)”
      wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
      nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round 1995
  • new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
  • “benchmarks useful for 3 years”
  • Single flag setting for all programs: SPECint_base95, SPECfp_base95


Benchmarks

CINT2000 (Integer Component of SPEC CPU2000):

Program       Language   What Is It
164.gzip      C          Compression
175.vpr       C          FPGA Circuit Placement and Routing
176.gcc       C          C Programming Language Compiler
181.mcf       C          Combinatorial Optimization
186.crafty    C          Game Playing: Chess
197.parser    C          Word Processing
252.eon       C++        Computer Visualization
253.perlbmk   C          PERL Programming Language
254.gap       C          Group Theory, Interpreter
255.vortex    C          Object-oriented Database
256.bzip2     C          Compression
300.twolf     C          Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/


Benchmarks

CFP2000 (Floating Point Component of SPEC CPU2000):

Program        Language     What Is It
168.wupwise    Fortran 77   Physics / Quantum Chromodynamics
171.swim       Fortran 77   Shallow Water Modeling
172.mgrid      Fortran 77   Multi-grid Solver: 3D Potential Field
173.applu      Fortran 77   Parabolic / Elliptic Differential Equations
177.mesa       C            3-D Graphics Library
178.galgel     Fortran 90   Computational Fluid Dynamics
179.art        C            Image Recognition / Neural Networks
183.equake     C            Seismic Wave Propagation Simulation
187.facerec    Fortran 90   Image Processing: Face Recognition
188.ammp       C            Computational Chemistry
189.lucas      Fortran 90   Number Theory / Primality Testing
191.fma3d      Fortran 90   Finite-element Crash Simulation
200.sixtrack   Fortran 77   High Energy Physics Accelerator Design
301.apsi       Fortran 77   Meteorology: Pollutant Distribution


Benchmarks Sample Results For SpecINT2000

Intel OR840 (1 GHz Pentium III processor)

              Base       Base       Base    Peak       Peak       Peak
Benchmarks    Ref Time   Run Time   Ratio   Ref Time   Run Time   Ratio
164.gzip      1400       277        505*    1400       270        518*
175.vpr       1400       419        334*    1400       417        336*
176.gcc       1100       275        399*    1100       272        405*
181.mcf       1800       621        290*    1800       619        291*
186.crafty    1000       191        522*    1000       191        523*
197.parser    1800       500        360*    1800       499        361*
252.eon       1300       267        486*    1300       267        486*
253.perlbmk   1800       302        596*    1800       302        596*
254.gap       1100       249        442*    1100       248        443*
255.vortex    1900       268        710*    1900       264        719*
256.bzip2     1500       389        386*    1500       375        400*
300.twolf     3000       784        382*    3000       776        387*

SPECint_base2000                    438
SPECint2000                                                       442

http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
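The table's numbers can be reproduced approximately: each ratio is 100 times the reference time divided by the run time, and the summary figure is the geometric mean of the twelve ratios. The sketch below uses the rounded base ratios from the table, so the result can differ slightly from the published 438, which was presumably computed from unrounded values.

```python
import math

# Base ratios copied from the table (already rounded to integers).
base_ratios = {
    "164.gzip": 505, "175.vpr": 334, "176.gcc": 399, "181.mcf": 290,
    "186.crafty": 522, "197.parser": 360, "252.eon": 486, "253.perlbmk": 596,
    "254.gap": 442, "255.vortex": 710, "256.bzip2": 386, "300.twolf": 382,
}

def geometric_mean(values):
    """n-th root of the product, computed via logarithms for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# One ratio recomputed from its reference and run times (164.gzip, base).
gzip_ratio = 100 * 1400 / 277

spec_int_base = geometric_mean(base_ratios.values())

print(f"164.gzip ratio: {gzip_ratio:.0f}")
print(f"geometric mean of base ratios: {spec_int_base:.1f}")
```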


Benchmarks

Performance Evaluation

• “For better or worse, benchmarks shape a field”
• Good products are created when we have:
  • Good benchmarks
  • Good ways to summarize performance
• Since sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
• If the benchmarks/summary are inadequate, companies must choose between improving the product for real programs and improving the product to get more sales; sales almost always wins!

• Execution time is the measure of computer performance!


Benchmarks

Management would like to have one number.
Technical people want more:
1. They want to have evidence of reproducibility – there should be enough information so that you or someone else can repeat the experiment.
2. There should be consistency when doing the measurements multiple times.

How to Summarize Performance

How would you report these results?

                    Computer A   Computer B   Computer C
Program P1 (secs)   1            10           20
Program P2 (secs)   1000         100          20
Total Time (secs)   1001         110          40
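One way to see why a single number is tricky is to compute several candidate summaries for the table above. The sketch compares total time, arithmetic mean, and the geometric mean of times normalized to a reference machine; the function name and the choice of Computer A as reference are assumptions made for illustration.

```python
import math

# Execution times in seconds for programs P1 and P2, from the table.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

total_time = {m: sum(t) for m, t in times.items()}
arithmetic_mean = {m: sum(t) / len(t) for m, t in times.items()}

def normalized_geomean(machine, reference):
    """Geometric mean of per-program times, each normalized to the reference machine."""
    ratios = [t / r for t, r in zip(times[machine], times[reference])]
    return math.prod(ratios) ** (1 / len(ratios))

geomean_vs_a = {m: normalized_geomean(m, "A") for m in times}

for m in times:
    print(f"Computer {m}: total {total_time[m]} s, "
          f"mean {arithmetic_mean[m]} s, geomean vs A {geomean_vs_a[m]:.3f}")
```

With equal weight on both programs, total time ranks C first and A last, while the normalized geometric mean rates A and B as equal; the summaries disagree, which is why the workload and the summary method both need to be reported.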


Quantitative Principles of Computer Design


Make the common case fast.

Amdahl’s Law: Relates total speedup of a system to the speedup of some portion of that system.


Amdahl's Law

Quantitative Design

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

Speedup due to enhancement E:

$$ \mathrm{Speedup}(E) = \frac{\text{Execution Time Without Enhancement}}{\text{Execution Time With Enhancement}} = \frac{\text{Performance With Enhancement}}{\text{Performance Without Enhancement}} $$

[Figure: diagram marking “This fraction enhanced”.]


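Quantitative Design - Amdahl’s law

$$ \mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right] $$

$$ \mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}} $$

[Figure: ExTime_old and ExTime_new compared, with the label “This fraction enhanced”.]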


Quantitative Design - Amdahl's Law

• Floating point instructions improved to run 2X; but only 10% of actual instructions are FP

$$ \mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \mathrm{ExTime}_{old} $$

$$ \mathrm{Speedup}_{overall} = \frac{1}{0.95} = 1.053 $$
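The floating point example can be checked with a small function that evaluates Amdahl's Law directly. The function name `amdahl_speedup` is illustrative; the second call uses a very large factor as a stand-in for an "infinitely fast" enhancement to show the upper bound.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the task is sped up by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# FP instructions run 2X faster, but are only 10% of the instructions.
fp_example = amdahl_speedup(0.10, 2)

# Upper bound: even an enormous FP speedup cannot beat 1 / (1 - 0.10).
fp_limit = amdahl_speedup(0.10, 1e12)

print(f"speedup with 2X FP:   {fp_example:.3f}")
print(f"limit as FP gets free: {fp_limit:.3f}")
```

The bound of about 1.11 is the reason to make the common case fast: speeding up a rarely used portion has little overall effect.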


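Quantitative Design

Cycles Per Instruction

$$ \mathrm{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}} $$

$$ \text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \mathrm{CPI}_i \times I_i $$

where $I_i$ is the number of instructions of type $i$.

“Instruction Frequency”

$$ \mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}} $$

Invest Resources where time is Spent!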


Quantitative Design

Base Machine (Reg / Reg)

Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        .5       (33%)
Load     20%    2        .4       (27%)
Store    10%    2        .2       (13%)
Branch   20%    2        .4       (27%)
Total CPI                1.5

Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.

How do we get CPI(i)?
How do we get % time?
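Both questions can be answered from the Freq and Cycles columns: CPI(i) is frequency times cycles, and % time is each CPI(i) divided by the total CPI. A minimal sketch with the table's values (variable names are illustrative):

```python
# (operation, frequency, cycles) from the Base Machine table.
instruction_mix = [
    ("ALU", 0.50, 1),
    ("Load", 0.20, 2),
    ("Store", 0.10, 2),
    ("Branch", 0.20, 2),
]

# CPI(i) = F_i * CPI_i: contribution of each instruction type to the overall CPI.
cpi_contribution = {op: freq * cycles for op, freq, cycles in instruction_mix}

# Total CPI is the sum of the contributions.
total_cpi = sum(cpi_contribution.values())

# % time spent in each type = its contribution / total CPI.
pct_time = {op: 100 * part / total_cpi for op, part in cpi_contribution.items()}

for op in cpi_contribution:
    print(f"{op:7s} CPI(i) = {cpi_contribution[op]:.1f}  ({pct_time[op]:.0f}% of time)")
print(f"Total CPI = {total_cpi:.1f}")
```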


Quantitative Design Locality of Reference

Programs access a relatively small portion of the address space at any instant of time.

There are two different types of locality:

Temporal Locality (locality in time): If an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)

Spatial Locality (locality in space/location): If an item is referenced, items whose addresses are close by tend to be referenced soon (straight line code, array access, etc.)


The Concept of Memory Hierarchy


Fast memory is expensive.

Slow memory is cheap.

The goal is to minimize the price/performance for a particular price point.


Memory Hierarchy

                      Registers         Level 1 cache   Level 2 cache   Memory          Disk
Typical Size          4 - 64            <16K bytes      <2 Mbytes       <16 Gigabytes   > 5 Gigabytes
Access Time           1 nsec            3 nsec          15 nsec         150 nsec        5,000,000 nsec
Bandwidth (MB/sec)    10,000 – 50,000   2000 - 5000     500 - 1000      500 - 1000      100
Managed By            Compiler          Hardware        Hardware        OS              OS/User


Memory Hierarchy

• Hit: data appears in some block in the upper level (example: Block X)
  • Hit Rate: the fraction of memory accesses found in the upper level
  • Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
• Miss: data needs to be retrieved from a block in the lower level (Block Y)
  • Miss Rate = 1 - (Hit Rate)
  • Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on 21264!)


Memory Hierarchy

[Figure: Registers, Level 1 cache, Level 2 cache, Memory, Disk]

What is the cost of executing a program if:
• Stores are free (there’s a write pipe)
• Loads are 20% of all instructions
• 80% of loads hit (are found) in the Level 1 cache
• 97% of loads hit in the Level 2 cache.
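One possible way to work this exercise is sketched below. It rests on several assumptions that the slide does not state: the access times come from the Memory Hierarchy table (3, 15 and 150 nsec), the Level 2 figure is read as 97% of all loads being satisfied by Level 1 or Level 2, the remaining loads go to main memory, and each load pays only the access time of the level that satisfies it.

```python
# Assumed access times in nanoseconds, taken from the Memory Hierarchy table.
ACCESS_NS = {"L1": 3, "L2": 15, "memory": 150}

load_fraction = 0.20        # loads are 20% of all instructions
l1_hit = 0.80               # 80% of loads hit in Level 1
l2_cumulative_hit = 0.97    # assumption: 97% of all loads are found by Level 2

# Share of loads satisfied at each level under the assumptions above.
share = {
    "L1": l1_hit,
    "L2": l2_cumulative_hit - l1_hit,
    "memory": 1.0 - l2_cumulative_hit,
}

# Average time per load, weighted by where the load is satisfied.
avg_load_ns = sum(share[level] * ACCESS_NS[level] for level in ACCESS_NS)

# Average memory time per instruction (stores are free, non-loads cost nothing here).
memory_ns_per_instruction = load_fraction * avg_load_ns

print(f"average load time: {avg_load_ns:.2f} ns")
print(f"memory time per instruction: {memory_ns_per_instruction:.2f} ns")
```

A different reading of the 97% figure (for example, as the hit rate among Level 1 misses only) changes the shares and therefore the answer.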


Wrap Up

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy