
NEW TRENDS IN COMPUTER ARCHITECTURE DESIGN

Saeid Nooshabadi

Arthur Sale

University of Tasmania

Outline

Desktop/Server Microprocessor State of the Art

Current Processor Limits

Embedded Processors Market

Mobile Multimedia Computing as New Direction

Conclusion

Computer in the News: Technology Marches On (1)

SANTA CLARA, Calif., March 8, 2000 --

Intel Corporation today introduced the Intel® Pentium® III processor 1.0 GHz (GigaHertz or 1,000 MegaHertz), the world's highest performance microprocessor for PCs. The Pentium III processor at 1 GHz delivers a 15 percent performance gain over the fastest processors on the market today.

Source: http://www.intel.com

Computer in the News: Technology Marches On (2)

INTEL DEVELOPER FORUM, Calif., Feb. 15, 2000 --

Intel Corporation Chairman Andrew S. Grove today kicked off the semi-annual Intel Developer Forum by demonstrating the company's fastest microprocessor: a chip running at 1.5 GHz, or 1.5 billion clock cycles per second, at room temperature. Based on a new microarchitecture from Intel, the chip is code-named "Willamette." (To be marketed towards the end of the year.)

Source: http://www.intel.com

Who needs a 1.5 GHz processor?

State of the Art: Alpha 21264

15M transistors

2 x 64KB caches on chip; 16MB L2 cache off chip

Clock <1.7 nsec, or >600 MHz

(Fastest Cray Supercomputer: T90 2.2 nsec)

90 watts

Superscalar: fetches up to 6 instructions/clock cycle, retires up to 4 instructions/clock cycle

Execution out-of-order

Processor Limit: DRAM Gap

[Figure: Performance (log scale, 1 to 1000) versus year, 1980 to 2000. CPU curve (“Moore’s Law”): µProc 60%/yr. DRAM curve: DRAM 7%/yr. Processor-Memory Performance Gap: grows 50% / year.]
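The 50%/year gap figure follows from the two growth rates quoted for the chart (µProc 60%/yr, DRAM 7%/yr). A minimal sketch of that arithmetic, assuming both rates compound annually; the function names are illustrative and not from the talk:

```python
# Annual improvement rates quoted on the slide.
CPU_RATE = 0.60   # processor performance: +60% per year
DRAM_RATE = 0.07  # DRAM performance: +7% per year


def gap_growth_per_year(cpu_rate: float, dram_rate: float) -> float:
    """Yearly growth factor of the CPU/DRAM performance ratio."""
    return (1.0 + cpu_rate) / (1.0 + dram_rate)


def gap_after(years: int, cpu_rate: float = CPU_RATE,
              dram_rate: float = DRAM_RATE) -> float:
    """Size of the gap after `years`, normalised to 1.0 in the starting year."""
    return gap_growth_per_year(cpu_rate, dram_rate) ** years


if __name__ == "__main__":
    growth = gap_growth_per_year(CPU_RATE, DRAM_RATE)
    print(f"gap grows {100 * (growth - 1):.1f}% per year")
    for years in (5, 10, 20):
        print(f"after {years:2d} years the gap is {gap_after(years):,.0f}x")
```

The ratio 1.60 / 1.07 is just under 1.5, which is where the rounded "grows 50% / year" label comes from.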

Processor-Memory Performance Gap “Tax” (1)

Processor          % Area (cost*)   % Transistors (power)
Alpha 21164        37%              77%
StrongARM SA110    61%              94%
Pentium Pro        64%              88%   (2 dies per package: Proc/I$/D$ + L2$)

Caches have no inherent value, only try to close performance gap

* Cost = f(Area^4)

Processor-Memory Performance Gap “Tax” (2)

Microprocessor-DRAM performance gap

> time of a full cache miss in instructions executed

1st Alpha (7000): 340 ns/5.0 ns = 68 clks x 2 or 136

2nd Alpha (8400): 266 ns/3.3 ns = 80 clks x 4 or 320

3rd Alpha (21264): 180 ns/1.7 ns = 108 clks x 6 or 648

> 1/2X latency x 3X clock rate x 3X Instr/clock => ≈5X
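The "time of a full cache miss in instructions executed" figures can be recomputed from the three numbers given per machine. A minimal sketch, assuming the cost is simply miss latency divided by cycle time, multiplied by the issue width; the helper name is illustrative. Direct division gives slightly different values from the slide's rounded clock counts (for example about 106 clocks rather than 108 for the 21264).

```python
def miss_cost_in_instructions(miss_latency_ns: float,
                              cycle_time_ns: float,
                              issue_width: int) -> float:
    """Issue slots lost while one full cache miss is outstanding."""
    clocks = miss_latency_ns / cycle_time_ns
    return clocks * issue_width


# (label, miss latency in ns, cycle time in ns, instructions per clock),
# as listed on the slide.
ALPHAS = [
    ("1st Alpha (7000)", 340.0, 5.0, 2),
    ("2nd Alpha (8400)", 266.0, 3.3, 4),
    ("3rd Alpha (21264)", 180.0, 1.7, 6),
]

if __name__ == "__main__":
    for label, latency_ns, cycle_ns, width in ALPHAS:
        cost = miss_cost_in_instructions(latency_ns, cycle_ns, width)
        print(f"{label}: about {cost:.0f} instruction slots per miss")
```

Even though miss latency roughly halved across the three generations, the cost of a miss measured in lost instruction slots grew by close to a factor of five.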

Today’s Situation: Microprocessor

MIPS MPUs                R5000      R10000         10k/5k
Clock Rate               200 MHz    195 MHz        1.0x
On-Chip Caches           32K/32K    32K/32K        1.0x
Instructions/Cycle       1 (+ FP)   4              4.0x
Pipe stages              5          5-7            1.2x
Model                    In-order   Out-of-order   ---
Die Size (mm^2)          84         298            3.5x
  without cache, TLB     32         205            6.3x
Development (man yr.)    60         300            5.0x
SPECint_base95           5.7        8.8            1.6x
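The point of the R5000/R10000 comparison is that integer performance grew far less than die area or design effort. A minimal sketch that derives performance per mm^2 and per man-year from the table's figures; the `Mpu` record and helper names are illustrative, not from the talk:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mpu:
    name: str
    specint_base95: float   # SPECint_base95 score from the table
    die_mm2: float          # total die size in mm^2
    dev_man_years: float    # development effort in man-years


R5000 = Mpu("R5000", 5.7, 84.0, 60.0)
R10000 = Mpu("R10000", 8.8, 298.0, 300.0)


def perf_per_mm2(mpu: Mpu) -> float:
    """SPECint_base95 delivered per square millimetre of die."""
    return mpu.specint_base95 / mpu.die_mm2


def perf_per_man_year(mpu: Mpu) -> float:
    """SPECint_base95 delivered per man-year of development."""
    return mpu.specint_base95 / mpu.dev_man_years


if __name__ == "__main__":
    for mpu in (R5000, R10000):
        print(f"{mpu.name}: {perf_per_mm2(mpu):.3f} SPECint/mm^2, "
              f"{perf_per_man_year(mpu):.3f} SPECint/man-year")
```

On these figures the out-of-order R10000 delivers less than half the integer performance per unit of area, and about a third per man-year, compared with the in-order R5000.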

Processors Evaluation Metrics

SPECint95: Suite of Integer Programs

SPECfp95: Suite of Floating Point Programs

TPC-C: On Line Transaction Processing Programs (OLTP)

All state-of-the-art processors perform well on SPECint95 and SPECfp95 (scientific and technical applications)

TPC-C?

Processor Limits for TPC-C

Pentium Pro                                        SPECint95   TPC-C
> Multilevel Caches: Miss rate 1MB L2 cache        0.5%        5%
> Superscalar (2-3 instr. retired/clock): % clks   40%         10%
> Out-of-Order Execution speedup                   2.0X        1.4X
> Clocks per Instruction                           0.8         3.4
% Peak performance                                 40%         10%

Source: Bhandarkar, D.; Ding, J. “Performance characterization of the Pentium Pro processor.” Proc. 3rd Int'l. Symp. on High-Performance Computer Architecture, Feb 1997. p. 288-97.
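The "% Peak performance" row is consistent with the measured clocks per instruction. A minimal sketch, assuming a peak retirement rate of 3 instructions per clock (the upper end of the "2-3 instr. retired/clock" quoted above); the constant and function name are illustrative assumptions:

```python
# Assumption: peak retirement rate of 3 instructions per clock.
PEAK_INSTR_PER_CLOCK = 3.0


def fraction_of_peak(measured_cpi: float,
                     peak_ipc: float = PEAK_INSTR_PER_CLOCK) -> float:
    """Achieved instructions/clock as a fraction of the peak rate."""
    achieved_ipc = 1.0 / measured_cpi
    return achieved_ipc / peak_ipc


if __name__ == "__main__":
    # Clocks per instruction taken from the table.
    for workload, cpi in (("SPECint95", 0.8), ("TPC-C", 3.4)):
        print(f"{workload}: CPI {cpi} -> "
              f"{100 * fraction_of_peak(cpi):.0f}% of peak")
```

Under that assumption a CPI of 0.8 works out to roughly 40% of peak and a CPI of 3.4 to roughly 10%, matching the table.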

Embedded Processor Market

Over 97% of the processors fabricated

50% of the revenues from processor sales

Embedded devices cover a wide range of products

> simple devices such as thermostats and toasters

> complex and mission-critical applications such as avionics systems.

> In between are phones, facsimile machines, ATM switches, digital cameras, automotive applications, set-top boxes, ...

Embedded Processor Design

Drives the technology “Post-PC” era

Embedded processors incorporate capabilities traditionally associated with conventional CPUs.

They are subject to challenging

> cost,

> power consumption,

> and application-imposed constraints.

Intel Embedded Mobile Celeron Processor

Available at 600, 566, 533, 500 and 466 MHz.

Dynamic Execution technology.

Includes Intel MMX™ media enhancement technology.

Intel Streaming SIMD Extensions (available on the Intel Celeron Processor at 566 and 600 MHz).

32 Kbyte (16 Kbyte/16 Kbyte) Level 1 cache.

128 Kbyte integrated Level 2 cache.

66 MHz Intel P6 micro-architecture's multitransaction system bus.

Intel Chipset support: Intel® 810 chipset, Intel® 810E chipset, Intel® 440BX, Intel® 440EX and the Intel® 440ZX-66 AGPset.

Power 17 - 30 Watts

Source: http://www.intel.com

Desktop/Server Processors Summary (1)

SPEC performance doubling / 18 months

> Growing CPU-DRAM performance gap & tax

> Running out of ideas, competition? Back to 2X / 2.3 yrs?

Benchmarks: SPEC-int, SPEC-fp, TPC (for OLTP)

> Benchmark highest optimization, ship lowest optimization?

Processor tricks not as useful for transactions?

> Clock rate increase compensated by CPI increase?

> When > 100 MIPS on TPC-C?
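The two doubling periods on this slide translate into quite different annual growth rates. A minimal sketch of the conversion, assuming steady compound growth; the function name is illustrative:

```python
def annual_rate_from_doubling(doubling_years: float) -> float:
    """Annual growth rate implied by doubling every `doubling_years` years."""
    return 2.0 ** (1.0 / doubling_years) - 1.0


if __name__ == "__main__":
    for label, years in (("2X / 18 months", 1.5), ("2X / 2.3 yrs", 2.3)):
        rate = annual_rate_from_doubling(years)
        print(f"{label}: about {100 * rate:.0f}% per year")
```

Doubling every 18 months is close to 60% per year, while doubling every 2.3 years is closer to 35% per year.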

Desktop/Server Processors Summary (2)

Embedded processors promising

> StrongARM 110: 233 MHz, 268 MIPS, 0.36W typ., $49

> 1/10 cost, 1/100 power, 1/2 integer performance?

Consolidation of desktop industry? Innovation?

Time to look for the computing trends and applications of tomorrow?

Billion Transistor Architectures and “Stationary Computer” Metrics

                             SS++   Trace   SMT   CMP   IA-64*   RAW
SPEC Int                     +      +       +     =     +        =
SPEC FP                      +      +       +     +     +        =
TPC (Database)               =      =       +     +     =        –
SW Effort                    +      +       =     =     =        –
Design Scal.                 –      =       –     =     =        =
Physical Design Complexity   –      =       –     =     =        +

(See IEEE Computer (9/97), Special Issue on Billion Transistor Microprocessors)

> *Very Long Instruction Word (Intel, HP IA-64/Merced)

– multiple ops/instruction, compiler controls parallelism

– Billed as the next generation Intel/HP processor

– Renamed Itanium™ (October 99)

Current Computer Design with the Bias for the Past

Most Billion Transistor Architectures show high physical design complexity

Most show impressive performance for the SPEC suites of programs

Suitability:

> suitable for high-end traditional applications

> unsuitable for the pervasive computing environment of the future:

> high power budget (>180 Watts),

> expensive (>$500)

Applications of the past are being used to design the computers of the future

Challenge for Future Microprocessors

“...wires are not keeping pace with scaling of other features. … In fact, for CMOS processes below 0.25 micron ... an unacceptably small percentage of the die will be reachable during a single clock cycle.”

“Architectures that require long-distance, rapid interaction will not scale well ...”

> “Will Physical Scalability Sabotage Performance Gains?” Matzke, IEEE Computer (9/97)

Computer in the News: Expert Talking

“Intel specializes in designing microprocessors for the desktop PC, which in five years may no longer be the most important type of computer. Its successor may be a personal mobile computer that integrates the portable computer with a cellular phone, digital camera, and video game player… Such devices require low-cost, energy-efficient microprocessors, and Intel is far from a leader in that area.”

-David Patterson, NY Times, June 9, 1998*

*David Patterson led the design of the Berkeley RISC machine, the first RISC computer. He is also the author/co-author of two of the most popular textbooks on computer architecture.

Post PC Motivation

Next generation fixes problems of last gen.

1960s: batch processing + slow turnaround

Timesharing

> 15-20 years of performance improvement, cost reduction (minicomputers, semiconductor memory)

1980s: Time sharing + inconsistent response times

Workstations/Personal Computers

> 15-20 years of performance improvement, cost reduction (microprocessors, DRAM memory, disk)

2000s: PCs + difficulty of use/high cost of ownership ???

Computing Trends Post-PC Era

Multimedia Applications

> real time data types; video, speech, animation, & music

> 90% of desktop cycles will be spent on media applications by end of 2000.

> Multimedia workloads will continue in importance

> Image, handwriting, and speech recognition will pose other major challenges.

Pervasive Mobile Computing Devices

> support an expanding range of functions

> challenge is in converging them into a single device

> keeping the size, weight, and power consumption constant.

Sony Playstation 2000

Emotion Engine: 6.2 GFLOPS, 75 million polygons per second (Microprocessor Report, 13:5)

> Superscalar MIPS core + vector coprocessor + graphics/DRAM

> Claim: Toy Story realism brought to games!

Intelligent PDA (2005?)

Pilot PDA

gameboy, cell phone, radio, timer, camera, TV remote, am/fm radio, garage door opener, ...

Wireless data (WWW)

Speech, vision recog.

Voice output for conversations

- Speech control of all devices

- Vision to see,

- Scan documents,

- read bar code, ...

- Measure room

Billion Transistor Architectures and “Mobile Multimedia” Metrics

                  SS++   Trace   SMT   CMP   IA-64   RAW
Design Scal.      –      =       –     =     =       =
Energy/power      –      –       –     =     =       –
Code Size         =      =       =     =     –       =
Real-time         –      –       =     =     =       =
Cont. Data        =      =       =     =     =       =
Memory BW         =      =       =     =     =       =
Fine-grain Par.   =      =       =     =     =       +
Coarse-gr. Par.   =      =       +     +     =       +

> “A New Direction for Computer Architecture Research”, Kozyrakis, Patterson, IEEE Computer (11/98)

New Architecture Directions

“…media processing will become the dominant force in computer arch. & microprocessor design.”

“... new media-rich applications... involve significant real-time processing of continuous media streams, and make heavy use of vectors of packed 8-, 16-, and 32-bit integer and Fl. Pt.”

“Needs include high memory BW, high network BW, continuous media data types, real-time response, fine grain parallelism”

“How Multimedia Workloads Will Change Processor Design”, Diefendorff & Dubey, IEEE Computer (9/97)
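To make "vectors of packed 8-bit integers" concrete, here is a minimal sketch of adding four 8-bit lanes held in one 32-bit word with a single pass of ordinary integer operations. It is an illustration only: a Python integer stands in for a 32-bit register, the lanes wrap modulo 256, and the function names are hypothetical rather than any real media instruction set's API.

```python
LANE_LOW_BITS = 0x7F7F7F7F   # low 7 bits of each 8-bit lane
LANE_TOP_BIT = 0x80808080    # top bit of each 8-bit lane


def pack_u8x4(values):
    """Pack four integers in 0..255 into one 32-bit word, first value lowest."""
    assert len(values) == 4 and all(0 <= v <= 255 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (8 * i)
    return word


def unpack_u8x4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def packed_add_u8x4(a, b):
    """Add four 8-bit lanes at once; each lane wraps modulo 256."""
    # Adding only the low 7 bits of each lane cannot carry into a neighbour.
    low_sum = (a & LANE_LOW_BITS) + (b & LANE_LOW_BITS)
    # The top bit of each lane is then combined without carry via XOR.
    return low_sum ^ ((a ^ b) & LANE_TOP_BIT)


if __name__ == "__main__":
    x = pack_u8x4([250, 10, 128, 1])
    y = pack_u8x4([10, 20, 128, 255])
    print(unpack_u8x4(packed_add_u8x4(x, y)))
```

Hardware media extensions perform this kind of lane-wise operation in one instruction, typically with saturating variants as well; the sketch only shows why packing several narrow values per word multiplies throughput.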

Some Media-Processing Functions

Kernel                                 Vector length
Matrix transpose/multiply (3D Gr.)     # vertices at once
DCT (video, comm.)                     image width
FFT (audio)                            256-1024
Motion estimation (video)              image width, i.w./16
Gamma correction (video)               image width
Haar transform (media mining)          image width
Median filter (image process.)         image width

(from http://www.research.ibm.com/people/p/pradeep/tutor.html)

Challenges for Mobile Multimedia

High performance for multimedia functions

Energy and power efficiency (<1 Watt)

Small size (fit in pocket)

Low design complexity and high degree of scalability (costs few tens of $)

A Better Mobile Multimedia MPU: Logic+DRAM

Embedded DRAM processors one possibility

Faster logic in DRAM process

> DRAM vendors offer faster transistors + same number metal layers as good logic process? @ ≈ 20% higher cost per wafer?

Called Intelligent RAM (“IRAM”) since most of transistors will be DRAM

Leave for another presentation

> “A Case for Intelligent RAM”, Patterson, Anderson, …. IEEE Computer (3/97)

10000X cost-performance increase in “stationary” computers, consolidation of industry

=> time for architecture/OS/compiler researchers to declare victory, search for new horizons?

Mobile Multimedia offers many new challenges: energy efficiency, size, real time performance, ...

Apps/metrics of future to design computer of future!

> Suppose PDA replaces desktop as primary computer?

> Work on FPPPP on PC vs. Speech on PDA?

Mobile Multimedia Conclusion

“Personal mobile computing offers a vision of the future with a much richer and more exciting set of architecture research challenges than extrapolations of the current desktop architectures and benchmarks.”

“Put another way, which problem would you rather work on: improving performance of PCs running FPPPP -- a 1982 Fortran benchmark used in SPECfp95 -- or making speech input practical for PDAs?”

“A New Direction for Computer Architecture Research”, Kozyrakis, Patterson, IEEE Computer (11/98)

From the Horse's Mouth

References

IEEE Computer: Sept. 97, Jan. 98, Aug. 98, Nov. 98

IEEE Micro: Dec. 96, Mar. 97, Sept. 97

Acknowledgement

Thanks to Dr. Vishv Malhotra for lending me some of his IEEE Computer issues.

Thanks to Prof. Sale for going through the slides and making useful suggestions.

WAIT FOR THE NEXT TWO SLIDES

Purpose of This Talk

To get Staff and Students excited about the new opportunities for research.

What would you be doing as a graduate?

> Service Windows NT, and if lucky perhaps UNIX?

> Develop web pages?

> Do more of the same?

Or rather do something really exciting?

We need you if you choose the LATTER!

50 Postgraduate Scholarships for IT up for grabs

Our Vision and Aim

Achieve Critical Mass in Research

Create a Group of Staff & Students Working on the Problems of the Future.

Pulling the Australian IT Research Community Together

Identifying Niches Where We Can Make an International Contribution.