new trends in computer architecture design saeid nooshabadi arthur sale university of tasmania
NEW TRENDS IN COMPUTER ARCHITECTURE DESIGN
Saeid Nooshabadi
Arthur Sale
University of Tasmania
Outline
Desktop/Server Microprocessor State of the Art
Current Processors Limit
Embedded Processors Market
Mobile Multimedia Computing as New Direction
Conclusion
Computer in the News: Technology Marches On (1)
SANTA CLARA, Calif., March 8, 2000 --
Intel Corporation today introduced the Intel® Pentium® III processor 1.0 GHz (GigaHertz or 1,000 MegaHertz), the world's highest performance microprocessor for PCs. The Pentium III processor at 1 GHz delivers a 15 percent performance gain over the fastest processors on the market today.
Source: http://www.intel.com
Computer in the News: Technology Marches On (2)
INTEL DEVELOPER FORUM, Calif., Feb. 15, 2000 -
Intel Corporation Chairman Andrew S. Grove today kicked off the semi-annual Intel Developer Forum by demonstrating the company's fastest microprocessor: a chip running at 1.5 GHz, or 1.5 billion clock cycles per second, at room temperature. Based on a new microarchitecture from Intel, the chip is code-named "Willamette." (To be marketed towards the end of the year.)
Source: http://www.intel.com
Who needs a 1.5 GHz processor?
State of the Art: Alpha 21264
15M transistors
2 x 64KB caches on chip; 16MB L2 cache off chip
Clock <1.7 nsec, or >600 MHz
(Fastest Cray Supercomputer: T90 2.2 nsec)
90 watts
Superscalar: fetches up to 6 instructions/clock cycle,
retires up to 4 instructions/clock cycle
Execution out-of-order
Processor Limit: DRAM Gap
[Figure: CPU vs. DRAM performance (log scale, 1 to 1000) over 1980-2000. µProc improves 60%/yr (“Moore’s Law”); DRAM improves 7%/yr. Processor-Memory Performance Gap grows 50%/year.]
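The 50%/year figure follows from the two growth rates on the chart. A minimal sketch of that arithmetic, assuming both curves are normalised to 1 in the starting year (the names `CPU_GROWTH`, `DRAM_GROWTH` and `gap_after` are illustrative, not from the slides):

```python
# Annual improvement factors read off the slide: CPU ~60%/yr, DRAM ~7%/yr.
CPU_GROWTH = 1.60
DRAM_GROWTH = 1.07

# The gap is the ratio of the two curves, so it compounds at CPU/DRAM per year.
annual_gap_growth = CPU_GROWTH / DRAM_GROWTH - 1  # just under 0.5, i.e. roughly 50%/yr


def gap_after(years):
    """CPU-to-DRAM performance ratio after `years`, both starting at 1."""
    return (CPU_GROWTH / DRAM_GROWTH) ** years


if __name__ == "__main__":
    print(f"gap grows about {annual_gap_growth:.0%} per year")
    for years in (5, 10, 20):
        print(f"after {years:2d} years the gap is about {gap_after(years):.0f}x")
```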
Processor-Memory Performance Gap “Tax” (1)
Processor          % Area (cost*)   % Transistors (power)
Alpha 21164        37%              77%
StrongArm SA110    61%              94%
Pentium Pro        64%              88%
(Pentium Pro: 2 dies per package: Proc/I$/D$ + L2$)
Caches have no inherent value, only try to close performance gap
* Cost = F(Area^4)
Processor-Memory Performance Gap “Tax” (2)
Microprocessor-DRAM performance gap
> time of a full cache miss in instructions executed
1st Alpha (7000): 340 ns/5.0 ns = 68 clks x 2 or 136
2nd Alpha (8400): 266 ns/3.3 ns = 80 clks x 4 or 320
3rd Alpha (21264): 180 ns/1.7 ns = 108 clks x 6 or 648
> 1/2X latency x 3X clock rate x 3X Instr/clock => ~5X
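The miss-cost figures above are latency divided by cycle time, multiplied by the peak issue width. A rough sketch of that calculation using the slide's latencies and cycle times (the tuple layout and the helper name `miss_cost` are assumptions made for illustration; the unrounded results differ slightly from the rounded numbers on the slide):

```python
# (machine, full-miss latency in ns, clock cycle in ns, peak instructions per clock)
ALPHAS = [
    ("Alpha 7000", 340.0, 5.0, 2),
    ("Alpha 8400", 266.0, 3.3, 4),
    ("Alpha 21264", 180.0, 1.7, 6),
]


def miss_cost(latency_ns, cycle_ns, issue_width):
    """Return (clocks lost, instruction slots lost) for one full cache miss."""
    clocks = latency_ns / cycle_ns
    return clocks, clocks * issue_width


if __name__ == "__main__":
    for name, latency, cycle, width in ALPHAS:
        clocks, slots = miss_cost(latency, cycle, width)
        print(f"{name}: {clocks:.0f} clocks, {slots:.0f} instruction slots")
```

Memory latency roughly halved across the three generations, yet the cost of a miss measured in lost instruction slots grew several-fold.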
Today’s Situation: Microprocessor
MIPS MPUs               R5000      R10000        10k/5k
Clock Rate              200 MHz    195 MHz       1.0x
On-Chip Caches          32K/32K    32K/32K       1.0x
Instructions/Cycle      1 (+ FP)   4             4.0x
Pipe stages             5          5-7           1.2x
Model                   In-order   Out-of-order  ---
Die Size (mm^2)         84         298           3.5x
  without cache, TLB    32         205           6.3x
Development (man yr.)   60         300           5.0x
SPECint_base95          5.7        8.8           1.6x
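One way to read the table is as performance per unit of resource spent. The sketch below divides SPECint_base95 by die area and by development effort, using only the numbers in the table (the dictionary keys and the helper `per_unit` are illustrative names, not from the slides):

```python
# Figures copied from the R5000 / R10000 comparison table.
R5000 = {"specint95": 5.7, "die_mm2": 84, "man_years": 60}
R10000 = {"specint95": 8.8, "die_mm2": 298, "man_years": 300}


def per_unit(chip, resource):
    """SPECint_base95 delivered per unit of the named resource."""
    return chip["specint95"] / chip[resource]


if __name__ == "__main__":
    for resource in ("die_mm2", "man_years"):
        ratio = per_unit(R5000, resource) / per_unit(R10000, resource)
        print(f"R5000 delivers {ratio:.1f}x the SPECint95 per {resource}")
```

On these figures the simpler in-order design returns noticeably more performance per square millimetre and per man-year than the out-of-order one.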
Processors Evaluation Metrics
SPECint95: Suite of Integer Programs
SPECfp95: Suite of Floating Point Programs
TPC-C: On Line Transaction Processing Programs
(OLTP)
All state-of-the-art processors perform well for
SPECint95 and SPECfp95 (scientific and technical
applications)
TPC-C ?
Processor Limits for TPC-C
Pentium Pro                                        SPECint95   TPC-C
> Multilevel Caches: Miss rate 1MB L2 cache        0.5%        5%
> Superscalar (2-3 instr. retired/clock): % clks   40%         10%
> Out-of-Order Execution speedup                   2.0X        1.4X
> Clocks per Instruction                           0.8         3.4
% Peak performance                                 40%         10%
Source: Bhandarkar, D.; Ding, J. “Performance characterization of the Pentium Pro processor.” Proc. 3rd Int'l. Symp. on High-Performance Computer Architecture, Feb 1997. p. 288-97.
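The "% Peak performance" row can be approximately recovered from the CPI row. A small sketch, assuming a peak of 3 instructions retired per clock (the upper end of the "2-3 instr. retired/clock" quoted above; `PEAK_IPC` and `percent_of_peak` are illustrative names):

```python
# Assumption: peak retirement rate of 3 instructions per clock.
PEAK_IPC = 3


def percent_of_peak(cpi, peak_ipc=PEAK_IPC):
    """Achieved instructions per clock (1 / CPI) as a percentage of the peak."""
    return 100.0 / (cpi * peak_ipc)


if __name__ == "__main__":
    for workload, cpi in (("SPECint95", 0.8), ("TPC-C", 3.4)):
        print(f"{workload}: CPI {cpi} -> {percent_of_peak(cpi):.0f}% of peak")
```

Under that assumption the results land close to the 40% and 10% shown in the table.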
Embedded Processor Market
Over 97% of the processors fabricated
50% of the revenues from processor sales
Embedded devices cover a wide range of products
> simple devices such as thermostats and toasters
> complex and mission-critical applications such as avionics systems.
> In between are phones, facsimile machines, ATM switches, digital cameras, automotive applications, set-top boxes, ...
Embedded Processor Design
Drives the technology of the “Post-PC” era
Embedded processors incorporate capabilities
traditionally associated with conventional CPUs.
They are subject to challenging
> cost,
> power consumption,
> and application-imposed constraints.
Intel Embedded Mobile Celeron Processor
Available at 600, 566, 533, 500 and 466 MHz.
Dynamic Execution technology.
Includes Intel MMX™ media enhancement technology.
Intel Streaming SIMD Extensions (available on the Intel Celeron Processor at 566 and 600 MHz).
32 Kbyte (16 Kbyte/16 Kbyte) Level 1 cache.
128 Kbyte integrated Level 2 cache.
66 MHz Intel P6 micro-architecture's multitransaction system bus.
Intel Chipset support: Intel® 810 chipset, Intel® 810E chipset, Intel® 440BX, Intel® 440EX and the Intel® 440ZX-66 AGPset.
Power 17 - 30 Watts
Source: http://www.intel.com
Desktop/Server Processors Summary (1)
SPEC performance doubling / 18 months
> Growing CPU-DRAM performance gap & tax
> Running out of ideas, competition? Back to 2X / 2.3 yrs?
Benchmarks: SPEC-int, SPEC-fp, TPC (for OLTP)
> Benchmark highest optimization, ship lowest optimization?
Processor tricks not as useful for transactions?
> Clock rate increase compensated by CPI increase?
> When > 100 MIPS on TPC-C?
Desktop/Server Processors Summary (2)
Embedded processors promising
> Strong ARM 110: 233 MHz, 268 MIPS, 0.36W typ., $49
> 1/10 cost, 1/100 power, 1/2 integer performance?
Consolidation of desktop industry? Innovation?
Time to look for the computing trends and
applications of tomorrow?
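The StrongARM figures quoted above can be turned into efficiency metrics of the kind an embedded designer would compare. A minimal sketch using only the numbers on the slide (the dictionary layout and function names are assumptions made for illustration):

```python
# StrongARM 110 figures as quoted on the slide.
STRONGARM_110 = {"mhz": 233, "mips": 268, "watts": 0.36, "price_usd": 49}


def mips_per_watt(chip):
    """Integer throughput per watt of typical power."""
    return chip["mips"] / chip["watts"]


def mips_per_dollar(chip):
    """Integer throughput per dollar of chip price."""
    return chip["mips"] / chip["price_usd"]


if __name__ == "__main__":
    print(f"{mips_per_watt(STRONGARM_110):.0f} MIPS/W")
    print(f"{mips_per_dollar(STRONGARM_110):.2f} MIPS/$")
```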
Billion Transistor Architectures and “Stationary Computer” Metrics
                             SS++  Trace  SMT  CMP  IA-64*  RAW
SPEC Int                     +     +      +    =    +       =
SPEC FP                      +     +      +    +    +       =
TPC (Database)               =     =      +    +    =       –
SW Effort                    +     +      =    =    =       –
Design Scal.                 –     =      –    =    =       =
Physical Design Complexity   –     =      –    =    =       +
(See IEEE Computer (9/97), Special Issue on Billion Transistor Microprocessors)
> *Very Long Instruction Word (Intel, HP IA-64/Merced)
  – multiple ops/instruction, compiler controls parallelism
  – Coined as the next generation Intel/HP processor
  – Renamed Itanium™ (October 99)
Current Computer Design with the Bias for the Past
Most Billion Transistor Architectures show high
physical design complexity
Most show impressive performance for SPEC suites
of programs
Suitability:
> suitable for high-end traditional applications
> unsuitable for the pervasive computing environment of the future:
  > high power budget (>180 Watts),
  > expensive (>$500)
Applications of the past are being used to design computers of the future
Challenge for Future Microprocessors
“...wires are not keeping pace with scaling of other
features. … In fact, for CMOS processes below 0.25
micron ... an unacceptably small percentage of the
die will be reachable during a single clock cycle.”
“Architectures that require long-distance, rapid
interaction will not scale well ...”
> “Will Physical Scalability Sabotage Performance Gains?”
Matzke, IEEE Computer (9/97)
Computer in the News: Expert Talking
“Intel specializes in designing microprocessors for the desktop
PC, which in five years may no longer be the most important
type of computer. Its successor may be a personal mobile
computer that integrates the portable computer with a cellular
phone, digital camera, and video game player… Such devices
require low-cost, energy-efficient microprocessors, and Intel
is far from a leader in that area.”
-David Patterson, NY Times, June 9, 1998*
*David Patterson led the design of the Berkeley RISC Machine, the first RISC computer. He is also the author/co-author of two of the most popular textbooks on computer architecture.
Post PC Motivation
Next generation fixes problems of last gen.
1960s: batch processing + slow turnaround
Timesharing
> 15-20 years of performance improvement, cost reduction (minicomputers, semiconductor memory)
1980s: Time sharing + inconsistent response times
Workstations/Personal Computers
> 15-20 years of performance improvement, cost reduction (microprocessors, DRAM memory, disk)
2000s: PCs + difficulty of use/high cost of ownership ???
Computing Trends Post-PC Era
Multimedia Applications
> real time data types: video, speech, animation, & music
> 90% of desktop cycles will be spent on media applications by end of 2000.
> Multimedia workloads will continue in importance
> Image, handwriting, and speech recognition will pose other major challenges.
Pervasive Mobile Computing Devices
> support an expanding range of functions
> challenge is in converging them into a single device
> keeping the size, weight, and power consumption constant.
Sony Playstation 2000
Emotion Engine: 6.2 GFLOPS, 75 million polygons per second
(Microprocessor Report, 13:5)
> Superscalar MIPS core + vector coprocessor + graphics/DRAM
> Claim: Toy Story realism brought to games!
Intelligent PDA (2005?)
Pilot PDA
gameboy, cell phone, radio,
timer, camera, TV remote,
am/fm radio, garage door
opener, ...
Wireless data (WWW)
Speech, vision recog.
Voice output for conversations
- Speech control of all devices
- Vision to see
- Scan documents
- Read bar code, ...
- Measure room
Billion Transistor Architectures and “Mobile Multimedia” Metrics
                  SS++  Trace  SMT  CMP  IA-64  RAW
Design Scal.      –     =      –    =    =      =
Energy/power      –     –      –    =    =      –
Code Size         =     =      =    =    –      =
Real-time         –     –      =    =    =      =
Cont. Data        =     =      =    =    =      =
Memory BW         =     =      =    =    =      =
Fine-grain Par.   =     =      =    =    =      +
Coarse-gr. Par.   =     =      +    +    =      +
> “Direction for Computer Architecture Research”, Kozyrakis, Patterson IEEE Computer (11/98)
New Architecture Directions
“…media processing will become the dominant force in computer arch. & microprocessor design.”
“... new media-rich applications... involve significant real-time processing of continuous media streams, and make heavy use of vectors of packed 8-, 16-, and 32-bit integer and Fl. Pt.”
“Needs include high memory BW, high network BW, continuous media data types, real-time response, fine grain parallelism”
“How Multimedia Workloads Will Change Processor Design”, Diefendorff & Dubey, IEEE Computer (9/97)
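The "vectors of packed 8-bit integers" mentioned in the quote can be illustrated in software. The sketch below adds four 8-bit lanes packed into one 32-bit word with per-lane wraparound, a well-known bit-masking trick; it emulates the idea only and is not how any particular media instruction set is implemented (`packed_add8` and the mask names are hypothetical):

```python
LOW7 = 0x7F7F7F7F   # low seven bits of each 8-bit lane
HIGH1 = 0x80808080  # top bit of each 8-bit lane


def packed_add8(a, b):
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Adding only the low 7 bits of each lane cannot carry into the next lane.
    low = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is a7 XOR b7 XOR (carry into bit 7);
    # the carry is already sitting in bit 7 of `low`.
    return (low ^ ((a ^ b) & HIGH1)) & 0xFFFFFFFF


if __name__ == "__main__":
    # Lanes 0x01, 0xFF, 0x7F, 0x10 each incremented by 1.
    print(hex(packed_add8(0x01FF7F10, 0x01010101)))
```

One word-wide operation updates four pixels at once, which is the kind of fine-grain parallelism the quotes refer to.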
Some Media-Processing Functions
Kernel                                   Vector length
Matrix transpose/multiply (3D Gr.)       # vertices at once
DCT (video, comm.)                       image width
FFT (audio)                              256-1024
Motion estimation (video)                image width, i.w./16
Gamma correction (video)                 image width
Haar transform (media mining)            image width
Median filter (image process.)           image width
(from http://www.research.ibm.com/people/p/pradeep/tutor.html)
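To make the "vector length = image width" entries concrete, here is a small sketch of one kernel from the table, gamma correction, applied to a whole image row at a time through a 256-entry lookup table. The formula, the gamma value of 2.2 and the function names are assumptions chosen for illustration, not taken from the slides:

```python
def gamma_lut(gamma):
    """Build a 256-entry table mapping each 8-bit level to its gamma-corrected level."""
    return [round(255 * (v / 255) ** (1.0 / gamma)) for v in range(256)]


def correct_row(row, lut):
    """Apply the table to every pixel in one image row (vector length = image width)."""
    return [lut[p] for p in row]


if __name__ == "__main__":
    lut = gamma_lut(2.2)
    row = [0, 64, 128, 192, 255]  # a tiny 5-pixel "row" for demonstration
    print(correct_row(row, lut))
```

Every pixel in the row is processed independently with the same operation, which is why such kernels map naturally onto vector or packed-integer hardware.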
Challenges for Mobile Multimedia
High performance for multimedia functions
Energy and power efficiency (<1 Watt)
Small size (fit in pocket)
Low design complexity and high degree of
scalability (costs few tens of $)
A Better Mobile Multimedia MPU: Logic+DRAM
Embedded DRAM processors one possibility
Faster logic in DRAM process
> DRAM vendors offer faster transistors +
same number metal layers as good logic process?
@ ≈ 20% higher cost per wafer?
Called Intelligent RAM (“IRAM”) since most of transistors
will be DRAM
Leave for another presentation
> “A Case for Intelligent RAM”, Patterson, Anderson, …. IEEE Computer (3/97)
Mobile Multimedia Conclusion
10000X cost-performance increase in “stationary”
computers, consolidation of industry
=> time for architecture/OS/compiler researchers to declare
victory, search for new horizons?
Mobile Multimedia offers many new challenges: energy
efficiency, size, real time performance, ...
Apps/metrics of future to design computer of future!
> Suppose PDA replaces desktop as primary computer?
> Work on FPPPP on PC vs. Speech on PDA?
From the Horse's Mouth
“Personal mobile computing offers a vision of the future with a much
richer and more exciting set of architecture research challenges than
extrapolations of the current desktop architectures and benchmarks.”
“Put another way, which problem would you rather work on: improving
performance of PCs running FPPPP, a 1982 Fortran benchmark used
in SPECfp95, or making speech input practical for PDAs?”
“Direction for Computer Architecture Research”, Kozyrakis, Patterson IEEE Computer (11/98)
References
IEEE Computer: Sept. 97, Jan. 98, Aug. 98, Nov. 98
IEEE Micro: Dec. 96, Mar. 97, Sept. 97
Acknowledgement
Thanks to Dr. Vishv Malhotra for lending me some
of his IEEE Computer issues.
Thanks to Prof. Sale for going through the slides
and making useful suggestions.
WAIT FOR THE NEXT TWO SLIDES
Purpose of This Talk
To get Staff and Students excited about the new
opportunities for research.
What would you be doing as a graduate?
> Service Windows NT, and if lucky perhaps UNIX?
> Develop web pages?
> Do more of the same?
Or rather do something really exciting?
We need you if you choose the LATTER!
50 Postgraduate Scholarships for IT up for grabs