Cell today and tomorrow - Stanford University, web.stanford.edu/class/ee380/Abstracts/Cell_060222.pdf

Systems and Technology Group

© 2005 IBM Corporation

Cell today and tomorrow

H. Peter Hofstee, Ph. D.
Cell Chief Scientist and Chief Architect, Cell Synergistic Processor
IBM Systems and Technology Group
SCEI/Sony Toshiba IBM (STI) Design Center
Austin, Texas

Acknowledgements

Cell Broadband Engine (“Cell”) is the result of a deep partnership between SCEI/Sony, Toshiba, and IBM

Cell represents the work of more than 400 people starting in 2001 and a design investment of about $400M

Agenda

Basics
– Performance: Power wall, Memory/Latency wall
– Multicore and specialization

Cell
– Asynchronous load/store (DMA)
– Microarchitecture decisions

Cell Performance
– Things that work really well
– Things that will likely work well
– Question marks

Cell Systems

Future of Cell and things for Academia to look at

BASICS

Computing Paradigm Shift

Today:
– Single thread performance hitting limits
• Architecture and process technology saturated
• Small percentage gains expected to remain

But:
– Signs of paradigm shift to application specific system customization
• Large multiple gains for specific applications
• Cell
  – ~50x on TRE, ~100x on FFT
• Datapower
  – XML acceleration
• Many examples in embedded markets

Future:
– Greater performance demands
• Immersive Interaction
  – 3D, real-time, gaming inspired applications
  – Rich media, data-intensive content
• Sensory Computing
  – New network tier
  – Autonomous agents performing intelligent analysis on streaming data
    > A&D: battlefield coordination

[Figure: Single Thread Performance (SPECint). Historical trend: 45% CGR. Single thread performance growth rate slows dramatically.]

Solutions

Memory wall:
– More slower threads
– Asynchronous loads

Efficiency wall:
– More slower threads
– Specialized function

Power wall:
– Reduce transistor power
• operating voltage
• limit oxide thickness scaling
• limit channel length
– Reduce switching per function

INCREASE CONCURRENCY: Multi-Core

INCREASE SPECIALIZATION: Non-Homogeneous

CELL

Motivation: Cell Goals

Outstanding performance, especially on game/multimedia applications.
– Challenges: Power Wall, Frequency Wall, Memory Wall

Real time responsiveness to the user and the network.
– Challenges: Real-time in an SMP environment, Security

Applicable to a wide range of platforms.
– Challenge: Maintain programmability while increasing performance

Support an introduction in 2005/6.
– Challenge: Structure innovation such that 5yr. schedule can be met

Cell Concept

Compatibility with 64b Power Architecture™
– Builds on and leverages IBM investment and community

Increased efficiency and performance
– Non-Homogeneous Coherent Chip Multiprocessor
• Allows an attack on the “Frequency Wall”
– Streaming DMA architecture attacks “Memory Wall”
– High design frequency, low operating voltage attacks “Power Wall”
– Highly optimized implementation

Interface between user and networked world
– Flexibility and security
– Multi-OS support, including RTOS/non-RTOS
– Architectural extensions for real-time management

Cell Architecture is … 64b Power Architecture™

[Diagram: two Power ISA cores, each with an MMU/BIU, attached to a COHERENT BUS together with Memory and IO transl. Incl. coherence/memory; compatible with 32/64b Power Arch. Applications and OS’s.]

Cell Architecture is … 64b Power Architecture™ Plus Memory Flow Control (MFC)

[Diagram: COHERENT BUS (+RAG) connecting Power ISA+RMT cores (each with MMU/BIU +RMT), IO transl., and Memory; plus MFC blocks (MMU/DMA +RMT), each with a Local Store Memory and an LS Alias.]

Cell Architecture is … 64b Power Architecture™ + MFC Plus Synergistic Processors

[Diagram: COHERENT BUS (+RAG) connecting Power ISA+RMT cores (each with MMU/BIU +RMT), IO transl., and Memory; MFC blocks (MMU/DMA +RMT), each with a Local Store Memory and an LS Alias; a Syn. Proc. ISA unit attached to each Local Store Memory.]

Asynchronous Load/Store (DMA)

THE major architectural decision in Cell
– Motivated by memory wall
– Enabled by a large market

Fundamental change to programmers.
– Transition from demand-fetch to software controlled prefetch
– Bill Dally’s “plumbing project analogy”
– “Bucket brigade” analogy
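
The move from demand-fetch to software controlled prefetch can be illustrated with a toy timing model. This is a sketch under stated assumptions, not something taken from the talk: the function names, the fixed per-block transfer and compute times, and the two-buffer overlap are all hypothetical simplifications.

```python
def demand_fetch_time(n_blocks, t_transfer, t_compute):
    """Assumed model: every block is fetched when needed, so the processor
    stalls for the full transfer before each compute step."""
    return n_blocks * (t_transfer + t_compute)


def prefetch_time(n_blocks, t_transfer, t_compute):
    """Assumed model: while block k is computed, the transfer of block k+1 is
    already in flight, so each steady-state step costs the slower of the two."""
    if n_blocks == 0:
        return 0
    # First transfer cannot be hidden; last compute has nothing left to overlap.
    return t_transfer + (n_blocks - 1) * max(t_transfer, t_compute) + t_compute


# Example with made-up units: 4 blocks, transfer = 3, compute = 5.
serial = demand_fetch_time(4, 3, 5)
overlapped = prefetch_time(4, 3, 5)
```

In this model the transfer cost disappears from the steady state whenever compute time per block is at least the transfer time, which is the intuition behind passing buckets along a line rather than walking each one to the well.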

SPE BLOCK DIAGRAM

[Diagram: Permute Unit, Load-Store Unit, Floating-Point Unit, Fixed-Point Unit, Branch Unit, Channel Unit; Result Forwarding and Staging; Register File; Local Store (256kB), Single Port SRAM, 128B Read, 128B Write; DMA Unit; Instruction Issue Unit / Instruction Line Buffer; On-Chip Coherent Bus. Datapath width legend: 8 Byte/Cycle, 16 Byte/Cycle, 64 Byte/Cycle, 128 Byte/Cycle.]

Other (Micro)Architectural Decisions

Large shared register file

Local store size tradeoffs

Dual issue, In order

Software branch prediction

Channels

Microarchitecture decisions, more so than architecture decisions, show bias towards compute-intensive codes

CELL PROCESSOR STATISTICS

250M transistors … 235 mm²

Top frequency >4GHz
– Lab conditions
– Most efficient at ~1V

> 200 GFlops (SP) @3.2GHz

> 20 GFlops (DP) @3.2GHz

Up to 25.6 GB/s memory B/W

Up to 70+ GB/s I/O B/W
– Practical ~ 50GB/s

100+ simultaneous bus transactions
– 16+8 entry DMA queue per SPE

[Figure: Hardware Performance Measurement (85°C). First pass hardware measurement in the Lab, Nominal Voltage = 1V. Fmax: Frequency [GHz] (axis 3 to 4.5) versus Supply Voltage (axis 0.9 to 1.2).]
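
The single-precision figure quoted in these statistics can be sanity-checked with a little arithmetic. The sketch below assumes parameters that are not stated on the slide: 8 SPEs, 4 single-precision SIMD lanes per SPE, and one fused multiply-add (counted as 2 flops) per lane per cycle. The helper name `peak_gflops` is hypothetical.

```python
def peak_gflops(units, simd_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Peak throughput in GFlops = units x lanes x flops per lane per cycle x GHz."""
    return units * simd_lanes * flops_per_lane_per_cycle * clock_ghz


# Assumed configuration (not from the slide): 8 SPEs, 4-wide SP SIMD,
# multiply-add counted as 2 flops, clocked at the 3.2 GHz quoted in the talk.
spe_sp_peak = peak_gflops(units=8, simd_lanes=4,
                          flops_per_lane_per_cycle=2, clock_ghz=3.2)
```

Under those assumptions the SPEs alone account for roughly 204.8 GFlops, which is consistent with the "> 200 GFlops (SP)" statement.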

CELL PERFORMANCE (AND PROGRAMMING)

Things that work extremely well today (up to 100x)

Problem can be re-coded
Predictable non-trivial memory access pattern
– Can build scatter-gather lists
Problem can benefit from SIMD
Focus on 32b float, or <=32b integer

Examples:
– FFTw (best result about 100 GFlops)
– Terrain Rendering Engine
– Volume rendering

Typical code is double-buffered gather-compute-scatter
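
The double-buffered gather-compute-scatter pattern can be sketched in a few lines. This is only an illustration under loud assumptions: a single background thread stands in for the DMA engine, Python lists stand in for main memory and local store, and `gather`, `scatter`, `compute`, and `double_buffered` are hypothetical names, not Cell APIs. The scatter step is kept synchronous for brevity, whereas real code would typically make the write-back asynchronous as well.

```python
from concurrent.futures import ThreadPoolExecutor


def gather(memory, indices):
    """Stand-in for a DMA 'get' driven by a gather list."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Stand-in for a DMA 'put' driven by a scatter list."""
    for i, v in zip(indices, values):
        memory[i] = v


def compute(block):
    """Placeholder kernel; any per-element computation would do."""
    return [2.0 * x + 1.0 for x in block]


def double_buffered(src, dst, index_lists):
    """Process src block by block, fetching block k+1 while computing block k."""
    if not index_lists:
        return
    with ThreadPoolExecutor(max_workers=1) as dma:  # one worker = one "DMA engine"
        pending = dma.submit(gather, src, index_lists[0])
        for k, indices in enumerate(index_lists):
            local = pending.result()  # wait until the current buffer is filled
            if k + 1 < len(index_lists):
                # Start filling the other buffer before computing on this one.
                pending = dma.submit(gather, src, index_lists[k + 1])
            scatter(dst, indices, compute(local))


src = [float(i) for i in range(8)]
dst = [0.0] * 8
double_buffered(src, dst, [[0, 2, 4, 6], [1, 3, 5, 7]])
```

The index lists play the role of the scatter-gather lists mentioned above: because the access pattern is known ahead of time, the next transfer can be issued before the current block's computation starts.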

Things that work well today (about 10-20x)

Compute bound codes
Small enough to be rewritten
Main datatype is 32b float or <= 32b Int
Benefits from SIMD

Examples:
– Crypto codes (RSA, SHA, DES, etc.)
– Media codes (MPEG 2, MPEG 4, H.264, JPEG)
– … many many others …

Things likely to work well

Library .. Device/API based applications
– Graphics and physics and sound and …

Scientific codes … library based
– No rewrite
– If granularity is ok

Question marks

Can a compiler based approach, without restructuring code specifically for the SPEs, result in a chip-level advantage?
– About 3-4x more SPEs in same area or power
– But the compiler has to manage the local store

Interesting benchmarks: SpecFP, MediaBench, EEMBC, etc.
– New more explicitly parallel benchmarks?

Would you ever use an SPE for a SpecInt-type workload?

Cell based systems

Cell Processor Isn't Just for Games.

Innovative Chip is best high-performance embedded processor of 2005

We chose the Cell BE as the best high-performance embedded processor of 2005 because of its innovative design and future potential....Even if the Cell BE accumulates no more design wins, the PlayStation 3 could drive sales to nearly 100 million units over the likely five-year lifespan of the console. That would make the Cell BE one of the most successful microprocessors in history.

“…Cell could power hundreds of new apps, create a new video-processing industry and fuel a multibillion-dollar build out of tech hardware over ten years.” -- Forbes

“It was originally conceived as the microprocessor to power Sony's [PS3], but it is expected to find a home in lots of other broadband-connected consumer items and in servers too.” -- IEEE Spectrum

Cell BE based Systems: SCEI, Mercury, … and IBM!

Toshiba Announces Cell Chip Set and Cell Reference Set

20 September, 2005

Tokyo--Toshiba Corporation today took major steps toward creating a comprehensive development environment for applications based on the Cell microprocessor with the announcement of a Cell Chip Set consisting of the new microprocessor and key peripheral chips, and a Cell Reference Set development platform. The chip set and the reference set will support development of digital consumer products and communication equipment that draw on the powerful broadband capabilities of the Cell microprocessor.

"Software developers and other customers will be eager to make full use of Cell's unsurpassed multitasking and real-time processing functions," said Tomotaka Saito, General Manager of Broadband System LSI Division, Toshiba Corporation Semiconductor Company. "The Cell Chip Set and Reference Set will support them in developing products and applications that reach new levels of performance and excitement."

The Cell Chip Set is composed of the Cell processor, a Super Companion Chip (the interface between Cell and external audio/visual input/output equipment), and a power supply system chip optimized to drive the Cell microprocessor.

The Cell Reference Set development platform consists of a Cell microprocessor, peripheral chips mounted on a printed circuit board with a general-use interface, peripheral equipment, such as DVD and HDD drives, and cooling equipment required for stable operation, all housed in a case. The available software includes operating systems and middleware and software development tools. This combination of hardware and software reduces development costs, cuts turnaround time and simplifies testing.

Toshiba expects to start marketing the chip set and reference set in April 2006 or later, once it has assured supply of the component chips and all related documentation.

Toshiba Corporation will showcase the Cell Chip Set and Cell Reference Set, and demonstrate digital media applications on the Cell Reference Set at the Toshiba booth of CEATEC JAPAN 2005, from October 4 to October 8 at Makuhari Messe.

Outlines of Cell Chip Set and Cell Reference Set:

Cell Chip Set:
– Cell microprocessor: Next generation microprocessor jointly developed by IBM, Sony Group and Toshiba. Adopts a multi-core architecture and offers super high-speed data transfer capability. The processor is expected to find application in equipment handling data-rich media applications.
– Super Companion Chip: Cell's peripheral LSI, which houses audio and image interfaces supporting Cell's super high-speed data transfer capability. The chip also supports a group of interfaces for various systems (video, audio input/output, digital AV interface, IEEE1394, digital tuner interface) and a group of interfaces that make it easier to connect standard input/output devices (standard bus interface, high speed network interface and storage device interface).
– Highly efficient power supply system: The supply system is optimized to drive the Cell processor. Includes controller LSI, TB6814FLG, which makes it possible to offer the high-speed response and high accuracy required by Cell. Includes multi-chip module, TB7003FL, which embeds the power device in a small 8mm x 8mm package. Realizes a small, high-power and highly efficient power supply system which has 4 phases of 1MHz high-speed switches.

Cell Reference Set:
– Development platform for Cell-based, next generation digital consumer products.
– High-speed multi-bit wiring technology and a wide variety of interfaces that support broadband system architecture.
– Linux and ITRON are both provided on the hypervisor OS that manages hardware resources. This approach facilitates the reuse of application property.
– A comprehensive development environment including the Eclipse framework based editor, compiler, debugger, and performance monitor.
– An audio-visual application model includes simultaneous multiple digital and analog broadcast television reception, recording and playback.

SOURCE: TOSHIBA

Future of Cell and Things for Academia to look at

User Interaction Drives Innovation in Computing

[Figure: Level of Interaction versus Time. Labels on the chart: Punch Cards, Green Screen/Teletype, Word Processing, Spreadsheet, WWW, Gaming; Main Frame Batch, Main Frame Multitasking, Mini-Computer WYSIWYG, Stand Alone PC Windows, Client/Server Internet, Immersive Interaction Online Gaming.]

Source: J.A. Kahle

Characteristics of the Latest Transition in User Interaction

Windows -> Immersive, 3D interactivity
Click and wait… -> Real-time
Client-centric -> Distributed
User data accessible from client only -> User data accessible everywhere
Device-centric -> Device-agnostic
Connected -> Collaborative
Wired, sporadic -> Wireless, always-on
E-mail/newsgroups -> Text messaging/blogs

Some things for Academia to look at

Specialization in computer architectures
– Beyond OS/Application, what specialization makes sense in a general-(enough) purpose chip/system multiprocessor?

Programming paradigms and compilation techniques to deal with memory wall

New types of applications (often real-time) made possible by a dramatic jump in performance
– E.g. gesture and emotion recognition