Cell Broadband Engine
Programming Handbook
Including the PowerXCell 8i Processor
Version 1.12
April 3, 2009
Copyright and Disclaimer
Copyright International Business Machines Corporation, Sony Computer Entertainment Inc., Toshiba Corporation 2006, 2009.
All Rights Reserved
Printed in the United States of America April 2009
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, and service names may be trademarks or service marks of others.
All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.
IBM Systems and Technology Group
2070 Route 52, Bldg. 330
Hopewell Junction, NY 12533-6351
The IBM home page can be found at ibm.com.
The IBM semiconductor solutions home page can be found at ibm.com/chips.
Contents
List of Figures
List of Tables
Preface
  Related Publications
  Conventions and Notation
  Referencing Registers, Fields, and Bit Ranges
  Terminology
  Reserved Regions of Memory and Registers
Revision Log
1. Overview of CBEA Processors
  1.1 Background
    1.1.1 Motivation
    1.1.2 Power, Memory, and Frequency
    1.1.3 Scope of this Handbook
  1.2 Hardware Environment
    1.2.1 The Processor Elements
    1.2.2 Element Interconnect Bus
    1.2.3 Memory Interface Controller
    1.2.4 Cell Broadband Engine Interface Unit
  1.3 Programming Environment
    1.3.1 Instruction Sets
    1.3.2 Storage Domains and Interfaces
    1.3.3 Byte Ordering and Bit Numbering
    1.3.4 Runtime Environment
2. PowerPC Processor Element
  2.1 PowerPC Processor Unit
  2.2 PowerPC Processor Storage Subsystem
  2.3 PPE Registers
  2.4 PowerPC Instructions
    2.4.1 Data Types
    2.4.2 Addressing Modes
    2.4.3 Instructions
  2.5 Vector/SIMD Multimedia Extension Instructions
    2.5.1 SIMD Vectorization
    2.5.2 Data Types
    2.5.3 Addressing Modes
    2.5.4 Instruction Types
    2.5.5 Instructions
    2.5.6 Graphics Rounding Mode
  2.6 Vector/SIMD Multimedia Extension C/C++ Language Intrinsics
    2.6.1 Vector Data Types
    2.6.2 Vector Literals
    2.6.3 Intrinsics
3. Synergistic Processor Elements
  3.1 Synergistic Processor Unit
    3.1.1 Local Storage
    3.1.2 Register File
    3.1.3 Execution Units
    3.1.4 Floating-Point Support
  3.2 Memory Flow Controller
    3.2.1 Channels
    3.2.2 Mailboxes and Signalling
    3.2.3 MFC Commands and Command Queues
    3.2.4 Direct Memory Access Controller
    3.2.5 Synergistic Memory Management Unit
  3.3 SPU Instruction Set
    3.3.1 Data Types
    3.3.2 Instructions
  3.4 SPU C/C++ Language Intrinsics
    3.4.1 Vector Data Types
    3.4.2 Vector Literals
    3.4.3 Intrinsics
  3.5 SPE Isolation Mode
4. Virtual Storage Environment
  4.1 Introduction
  4.2 PPE Memory Management
    4.2.1 Memory Management Unit
    4.2.2 Address-Translation Sequence
    4.2.3 Enabling Address Translation
    4.2.4 Effective-to-Real-Address Translation
    4.2.5 Segmentation
    4.2.6 Paging
    4.2.7 Translation Lookaside Buffer
    4.2.8 Real Addressing Mode
    4.2.9 Effective Addresses in 32-Bit Mode
  4.3 SPE Memory Management
    4.3.1 Synergistic Memory Management Unit
    4.3.2 Enabling Address Translation
    4.3.3 Segmentation
    4.3.4 Paging
    4.3.5 Translation Lookaside Buffer
    4.3.6 Real Addressing Mode
    4.3.7 Exception Handling and Storage Protection
5. Memory Map
  5.1 Introduction
    5.1.1 Configuration-Ring Initialization
    5.1.2 Allocated Regions of Memory
    5.1.3 Reserved Regions of Memory
    5.1.4 The Guarded Attribute
  5.2 PPE Memory Map
    5.2.1 PPE Memory-Mapped Registers
    5.2.2 Predefined Real-Address Locations
  5.3 SPE Memory Map
    5.3.1 SPE Local-Storage Memory Map
    5.3.2 SPE Memory-Mapped Registers
  5.4 BEI Memory-Mapped Registers
    5.4.1 I/O
6. Cache Management
  6.1 PPE Caches
    6.1.1 Configuration
    6.1.2 Overview of PPE Cache
    6.1.3 L1 Caches
    6.1.4 Branch History Table and Link Stack
    6.1.5 L2 Cache
    6.1.6 Instructions for Managing the L1 and L2 Caches
    6.1.7 Effective-to-Real-Address Translation Arrays
    6.1.8 Translation Lookaside Buffer
    6.1.9 Instruction-Prefetch Queue Management
    6.1.10 Load Subunit Management
  6.2 SPE Caches
    6.2.1 Translation Lookaside Buffer
    6.2.2 Atomic Unit and Cache
  6.3 Replacement Management Tables
    6.3.1 PPE TLB Replacement Management Table
    6.3.2 PPE L2 Replacement Management Table
    6.3.3 SPE TLB Replacement Management Table
  6.4 I/O Address-Translation Caches
7. I/O Architecture
  7.1 Overview
    7.1.1 I/O Interfaces
    7.1.2 System Configurations
    7.1.3 I/O Addressing
  7.2 Data and Access Types
    7.2.1 Data Lengths and Alignments
    7.2.2 Atomic Accesses
  7.3 Registers and Data Structures
    7.3.1 IOCmd Configuration Register
    7.3.2 I/O Segment Table Origin Register
    7.3.3 I/O Segment Table
    7.3.4 I/O Page Table
    7.3.5 IOC Base Address Registers
    7.3.6 I/O Exception Status Register
  7.4 Inbound I/O Address Translation
    7.4.1 Translation Overview
    7.4.2 Translation Steps
  7.5 I/O Exceptions
    7.5.1 I/O Exception Causes
    7.5.2 I/O Exception Status Register
    7.5.3 I/O Exception Mask Register
    7.5.4 I/O-Exception Response
  7.6 I/O Address-Translation Caches
    7.6.1 IOST Cache
    7.6.2 IOPT Cache
  7.7 I/O Storage Model
    7.7.1 Memory Coherence
    7.7.2 Storage-Access Ordering
    7.7.3 I/O Accesses to Other I/O Units through an IOIF
    7.7.4 Examples
8. Resource Allocation Management
  8.1 Introduction
  8.2 Requesters
    8.2.1 PPE and SPEs
    8.2.2 I/O
  8.3 Managed Resources
  8.4 Tokens
    8.4.1 Tokens Required for Single-CBEA-Processor Systems
    8.4.2 Operations Requiring No Token
    8.4.3 Tokens Required for Multi-CBEA-Processor Systems
  8.5 Token Manager
    8.5.1 Request Tracking
    8.5.2 Token Granting
    8.5.3 Unallocated RAG
    8.5.4 High-Priority Token Requests
    8.5.5 Memory Tokens
    8.5.6 I/O Tokens
    8.5.7 Unused Tokens
    8.5.8 Memory Banks, IOIF Allocation Rates, and Unused Tokens
    8.5.9 Token Request and Grant Example
    8.5.10 Allocation Percentages
    8.5.11 Efficient Determination of TKM Priority Register Values
    8.5.12 Feedback from Resources to Token Manager
  8.6 Configuration of PPE, SPEs, MIC, and IOC
    8.6.1 Configuration Register Summary
    8.6.2 SPE Address-Range Checking
  8.7 Changing Resource-Management Registers with MMIO Stores
    8.7.1 Changes to the RAID
    8.7.2 Changing a Requester's Token-Request Enable
    8.7.3 Changing a Requester's Address Map
8.7.4 Changing a Requesters Use of Multiple Tokens per Access .............................................. 2428.7.5 Changing Feedback to the TKM .......................................................................................... 2428.7.6 Changing TKM Registers .................................................................................................... 242
8.8 Latency Between Token Requests and Token Grants .................................................................. 2438.9 Hypervisor Interfaces .................................................................................................................... 243
9. PPE Interrupts ......................................................................................................... 2459.1 Introduction ................................................................................................................................... 2459.2 Summary of Interrupt Architecture ................................................................................................ 2469.3 Interrupt Registers ......................................................................................................................... 2509.4 Interrupt Handling .......................................................................................................................... 2519.5 Interrupt Vectors and Definitions ................................................................................................... 252
9.5.1 System Reset Interrupt (Selectable or x00..00000100) ..................................................... 2549.5.2 Machine Check Interrupt (x00..00000200) ......................................................................... 2559.5.3 Data Storage Interrupt (x00..00000300) ............................................................................ 2579.5.4 Data Segment Interrupt (x00..00000380) .......................................................................... 2589.5.5 Instruction Storage Interrupt (x00..00000400) ................................................................... 2599.5.6 Instruction Segment Interrupt (x00..00000480) ................................................................. 2609.5.7 External Interrupt (x00..00000500) .................................................................................... 2609.5.8 Alignment Interrupt (x00..00000600) ................................................................................. 2619.5.9 Program Interrupt (x00..00000700) .................................................................................... 2629.5.10 Floating-Point Unavailable Interrupt (x00..00000800) ..................................................... 2639.5.11 Decrementer Interrupt (x00..00000900) ........................................................................... 2639.5.12 Hypervisor Decrementer Interrupt (x00..00000980) ........................................................ 2649.5.13 System Call Interrupt (x00..00000C00) ............................................................................ 2649.5.14 Trace Interrupt (x00..00000D00) ...................................................................................... 2659.5.15 VXU Unavailable Interrupt (x00..00000F20) .................................................................... 2669.5.16 System Error Interrupt (x00..00001200) .......................................................................... 
2669.5.17 Maintenance Interrupt (x00..00001600) ........................................................................... 2679.5.18 Thermal Management Interrupt (x00..00001800) ............................................................ 269
9.6 Direct External Interrupts .............................................................................................................. 2719.6.1 Interrupt Presentation .......................................................................................................... 2719.6.2 IIC Interrupt Registers ......................................................................................................... 2729.6.3 SPU and MFC Interrupts ..................................................................................................... 2779.6.4 Other External Interrupts ..................................................................................................... 278
9.7 Mediated External Interrupts
9.8 SPU and MFC Interrupts Routed to the PPE
9.8.1 Interrupt Types and Classes
9.8.2 Interrupt Registers
9.8.3 Interrupt Definitions
9.8.4 Handling SPU and MFC Interrupts
9.9 Thread Targets for Interrupts
9.10 Interrupt Priorities
9.11 Interrupt Latencies
9.12 Machine State Register Settings Due to Interrupts
9.13 Interrupts and Hypervisor
9.14 Interrupts and Multithreading
9.15 Checkstop
9.16 Use of an External Interrupt Controller
9.17 Relationship Between CBEA Processor and PowerPC Interrupts
10. PPE Multithreading
10.1 Multithreading Guidelines
10.2 Thread Resources
10.2.1 Registers
10.2.2 Arrays, Queues, and Other Structures
10.2.3 Pipeline Sharing and Support for Multithreading
10.3 Thread States
10.3.1 Privilege States
10.3.2 Suspended or Enabled State
10.3.3 Blocked or Stalled State
10.4 Thread Control and Status Registers
10.4.1 Machine State Register (MSR)
10.4.2 Hardware Implementation Register 0 (HID0)
10.4.3 Logical Partition Control Register (LPCR)
10.4.4 Control Register (CTRL)
10.4.5 Thread Status Register Local and Remote (TSRL and TSRR)
10.4.6 Thread Switch Control Register (TSCR)
10.4.7 Thread Switch Time-Out Register (TTR)
10.5 Thread Priority
10.5.1 Thread-Priority Combinations
10.5.2 Choosing Useful Thread Priorities
10.5.3 Examples of Priority Combinations on Instruction Scheduling
10.6 Thread Control and Configuration
10.6.1 Resuming and Suspending Threads
10.6.2 Setting the Instruction-Dispatch Policy: Thread Priority and Temporary Stalling
10.6.3 Preventing Starvation: Forward-Progress Monitoring
10.6.4 Multithreading Operating-State Switch
10.7 Pipeline Events and Instruction Dispatch
10.7.1 Instruction-Dispatch Rules
10.7.2 Pipeline Events that Stall Instruction Dispatch
10.8 Suspending and Resuming Threads
10.8.1 Suspending a Thread
10.8.2 Resuming a Thread
10.8.3 Exception and Interrupt Interactions With a Suspended Thread
10.8.4 Thread Targets and Behavior for Interrupts
11. Logical Partitions and a Hypervisor
11.1 Introduction
11.1.1 The Hypervisor and the Operating Systems
11.1.2 Partitioning Resources
11.1.3 An Example Flowchart
11.2 PPE Logical-Partitioning Facilities
11.2.1 Enabling Hypervisor State
11.2.2 Hypervisor-State Registers
11.2.3 Real Memory Access Control
11.2.4 Controlling Interrupts and Environment
11.3 SPE Logical-Partitioning Facilities
11.3.1 Access Privilege
11.3.2 Memory-Management Facilities
11.3.3 Controlling Interrupts
11.3.4 Other SPE Management Facilities
11.4 I/O Address Translation
11.4.1 IOC Memory Management Units
11.4.2 I/O Segment and Page Tables
11.5 Resource Allocation Management
11.5.1 Combining Logical Partitions with Resource Allocation
11.5.2 Resource Allocation Groups and the Token Manager
11.6 Power Management
11.6.1 Entering Low-Power States
11.6.2 Thread State Suspension and Resumption
11.7 Fault Isolation
11.8 Code Sample
11.8.1 Error Codes
11.8.2 C Functions for PowerPC 64-bit ELF Hypervisor Call
12. SPE Context Switching
12.1 Introduction
12.2 Data Structures
12.2.1 Local Storage Context Save Area
12.2.2 Context Save Area
12.3 Overview of SPE Context-Switch Sequence
12.3.1 Save SPE Context
12.3.2 Restore SPE Context
12.4 Implementation Considerations
12.4.1 Locking
12.4.2 Watchdog Timers
12.4.3 Waiting for Events
12.4.4 PPE's SPU Channel Access Facility
12.4.5 SPE Interrupts
12.4.6 Suspending the MFC DMA Queue
12.4.7 SPE Context-Save Sequence and Context-Restore Sequence Code
12.4.8 SPE Parameter Passing
12.4.9 Storage for SPE Context-Save Sequence and Context-Restore Sequence Code
12.4.10 Harvesting an SPE
12.4.11 Scheduling
12.4.12 Light-Weight SPE Context Save
12.5 Detailed Steps for SPE Context Switch
12.5.1 Context-Save Sequence
12.5.2 Context-Restore Sequence
12.6 Considerations for Hypervisors
13. Time Base and Decrementers
13.1 Introduction
13.2 Time-Base Facility
13.2.1 Clock Domains
13.2.2 Time-Base Registers
13.2.3 Time-Base Frequency
13.2.4 Time-Base Sync Mode Controls
13.2.5 Reading and Writing the TB Register
13.2.6 Computing Time-of-Day
13.3 Decrementers
13.3.1 PPE Decrementers
13.3.2 SPE Decrementers
13.3.3 Using an SPU Decrementer to Monitor SPU Code Performance
14. Objects, Executables, and SPE Loading
14.1 Introduction
14.2 ELF Overview and Extensions
14.2.1 Overview
14.2.2 SPE-ELF Extensions
14.3 Runtime Initializations and Requirements
14.3.1 PPE Initial Machine State
14.3.2 SPE Initial Machine State for Linux
14.4 Linker Requirements
14.4.1 SPE Linker Requirements
14.4.2 PPE Linker Requirements
14.5 The CESOF Format
14.5.1 CESOF Overview
14.5.2 CESOF Use Convention of ELF
14.5.3 Embedding an SPE-ELF Executable in a PPE-ELF Object: The .spu.elf Section
14.5.4 The spe_program_handle Data Structure
14.5.5 The TOE: Accessing Symbol Values Defined in EA Space
14.5.6 Future Software Tool Chain Enhancements for CESOF
14.6 SPE Runtime Loader
14.6.1 Runtime Loader Overview
14.6.2 SPE Runtime Loader Requirements
14.6.3 Example SPE Runtime Loader Framework Definition
14.7 SPE Execution Environment
14.7.1 Signal Types for the SPE Stop-and-Signal Instruction
15. Power and Thermal Management
15.1 Power Management
15.1.1 Slow State
15.1.2 PPE Pause (0) State
15.1.3 SPU Pause State
15.1.4 MFC Pause State
15.2 Thermal Management
15.2.1 Thermal-Management Operation
15.2.2 Configuration-Ring Settings
15.2.3 Thermal Registers
15.2.4 Thermal Sensor Status Registers
15.2.5 Thermal Sensor Interrupt Registers
15.2.6 Dynamic Thermal-Management Registers
16. Performance Monitoring
16.1 How It Works
16.2 Events (Signals)
16.3 Performance Counters
16.4 Trace Array
17. SPE Channel and Related MMIO Interface
17.1 Introduction
17.1.1 An SPE's Use of its Own Channels
17.1.2 Access to Channel Functions by the PPE and other SPEs
17.1.3 Channel Characteristics
17.1.4 Channel Summary
17.1.5 Channel Instructions
17.1.6 Channel Capacity and Blocking
17.2 SPU Event-Management Channels
17.3 SPU Signal-Notification Channels
17.4 SPU Decrementer
17.4.1 SPU Write Decrementer Channel
17.4.2 SPU Read Decrementer Channel
17.5 MFC Write Multisource Synchronization Request Channel
17.6 SPU Read Machine Status Channel
17.7 SPU Write State Save-and-Restore Channel
17.8 SPU Read State Save-and-Restore Channel
17.9 MFC Command Parameter Channels
17.9.1 MFC Local Storage Address Channel
17.9.2 MFC Effective Address High Channel
17.9.3 MFC Effective Address Low or List Address Channel
17.9.4 MFC Transfer Size or List Size Channel
17.9.5 MFC Command Tag Identification Channel
17.9.6 MFC Class ID and MFC Command Opcode Channel
17.10 MFC Tag-Group Management Channels
17.10.1 MFC Write Tag-Group Query Mask Channel
17.10.2 MFC Read Tag-Group Query Mask Channel
17.10.3 MFC Write Tag Status Update Request Channel
17.10.4 MFC Read Tag-Group Status Channel
17.10.5 MFC Read List Stall-and-Notify Tag Status Channel
17.10.6 MFC Write List Stall-and-Notify Tag Acknowledgment Channel
17.11 MFC Read Atomic Command Status Channel
17.12 SPU Mailbox Channels
18. SPE Events
18.1 Introduction
18.2 Events and Event-Management Channels
18.2.1 Event Conditions and Bit Definitions for Event-Management Channels
18.2.2 Pending Event Register (Internal, SPE-Hidden)
18.2.3 SPU Read Event Status
18.2.4 SPU Write Event Mask
18.2.5 SPU Write Event Acknowledgment
18.2.6 SPU Read Event Mask
18.3 SPU Interrupt Facility
18.4 Interrupt Address Save-and-Restore Channels
18.4.1 SPU Read State Save-and-Restore
18.4.2 SPU Write State Save-and-Restore
18.4.3 Nested Interrupts Using SPU Write State Save-and-Restore
18.5 Event-Handling Protocols
18.5.1 Synchronous Event Handling Using Polling or Stalling
18.5.2 Asynchronous Event Handling Using Interrupts
18.5.3 Protecting Critical Sections from Interruption
18.6 Event-Specific Handling Guidelines
18.6.1 Protocol with Multiple Events Enabled
18.6.2 Procedure for Handling the Multisource Synchronization Event
18.6.3 Procedure for Handling the Privileged Attention Event
18.6.4 Procedure for Handling the Lock-Line Reservation Lost Event
18.6.5 Procedure for Handling the Signal-Notification 1 Available Event
18.6.6 Procedure for Handling the Signal-Notification 2 Available Event
18.6.7 Procedure for Handling the SPU Write Outbound Mailbox Available Event
18.6.8 Procedure for Handling the SPU Write Outbound Interrupt Mailbox Available Event
18.6.9 Procedure for Handling the SPU Decrementer Event
18.6.10 Procedure for Handling the SPU Read Inbound Mailbox Available Event
18.6.11 Procedure for Handling the MFC SPU Command Queue Available Event
18.6.12 Procedure for Handling the DMA List Command Stall-and-Notify Event
18.6.13 Procedure for Handling the Tag-Group Status Update Event
18.7 Developing a Basic Interrupt Handler
18.7.1 Basic Interrupt Protocol Features and Design
18.7.2 FLIH Design
18.7.3 SLIH Design and Registering SLIH Functions
18.7.4 Example Application Code
18.8 Nested Interrupt Handling
18.8.1 Nested Handler Design
18.8.2 FLIH Design for Nested Interrupts
18.9 Using a Dedicated Interrupt Stack
18.10 Sample Applications
18.10.1 SPU Decrementer Event
18.10.2 Tag-Group Status Update Event
18.10.3 DMA List Command Stall-and-Notify Event
18.10.4 MFC SPU Command Queue Available Event
18.10.5 SPU Read Inbound Mailbox Available Event
18.10.6 SPU Signal-Notification Available Event
18.10.7 Lock-Line Reservation Lost Event
18.10.8 Privileged Attention Event
19. DMA Transfers and Interprocessor Communication
19.1 Introduction
-
Programming Handbook
Cell Broadband Engine
Version 1.12, April 3, 2009
Contents, Page 13 of 876
  19.2 MFC Commands .... 516
    19.2.1 DMA Commands .... 518
    19.2.2 DMA List Commands .... 520
    19.2.3 Synchronization Commands .... 520
    19.2.4 Atomic Update Commands .... 520
    19.2.5 Command Modifiers .... 521
    19.2.6 Tag Groups .... 521
    19.2.7 MFC Command Issue .... 523
    19.2.8 Replacement Class ID and Transfer Class ID .... 523
    19.2.9 DMA-Command Completion .... 525
  19.3 PPE-Initiated DMA Transfers .... 525
    19.3.1 MFC Command Issue .... 525
    19.3.2 MFC Command-Queue Control Registers .... 527
    19.3.3 DMA-Command Issue Status and Errors .... 527
  19.4 SPE-Initiated DMA Transfers .... 531
    19.4.1 MFC Command Issue .... 532
    19.4.2 MFC Command-Queue Monitoring Channels .... 533
    19.4.3 DMA Command Issue Status and Errors .... 534
    19.4.4 DMA List Command Example .... 538
  19.5 Performance Guidelines for MFC Commands .... 541
  19.6 Mailboxes .... 541
    19.6.1 Reading and Writing Mailboxes .... 542
    19.6.2 Mailbox Blocking .... 543
    19.6.3 Dealing with Anticipated Messages .... 543
    19.6.4 Uses of Mailboxes .... 544
    19.6.5 SPU Outbound Mailboxes .... 544
    19.6.6 SPU Inbound Mailbox .... 549
  19.7 Signal Notification .... 553
    19.7.1 SPU Signalling Channels .... 553
    19.7.2 Uses of Signaling .... 554
    19.7.3 Mode Configuration .... 555
    19.7.4 SPU Signal Notification 1 Channel .... 555
    19.7.5 SPU Signal Notification 2 Channel .... 555
    19.7.6 Sending Signals .... 555
    19.7.7 Receiving Signals .... 559
    19.7.8 Differences Between Mailboxes and Signal Notification .... 561
20. Shared-Storage Synchronization .... 563
  20.1 Shared-Storage Ordering .... 563
    20.1.1 Storage Model .... 563
    20.1.2 PPE Ordering Instructions .... 565
    20.1.3 SPU Ordering Instructions .... 569
    20.1.4 MFC Ordering Mechanisms .... 572
    20.1.5 MFC Multisource Synchronization Facility .... 578
    20.1.6 Scenarios for Using Ordering Mechanisms .... 585
  20.2 PPE Atomic Synchronization .... 586
    20.2.1 Atomic Synchronization Instructions .... 586
    20.2.2 PPE Synchronization Primitives .... 588
  20.3 SPE Atomic Synchronization .... 591
    20.3.1 MFC Commands for Atomic Updates .... 591
    20.3.2 The MFC Read Atomic Command Status Channel .... 593
    20.3.3 Avoiding Livelocks .... 593
    20.3.4 Synchronization Primitives .... 595
21. Parallel Programming .... 603
  21.1 Challenges .... 603
  21.2 Patterns of Parallel Programming .... 603
    21.2.1 Terminology .... 604
    21.2.2 Finding Parallelism .... 605
    21.2.3 Strategies for Parallel Programming .... 606
  21.3 Steps for Parallelizing a Program .... 608
    21.3.1 Step 1: Understand the Problem .... 608
    21.3.2 Step 2: Choose Programming Tools and Technology .... 608
    21.3.3 Step 3: Develop High-Level Parallelization Strategy .... 609
    21.3.4 Step 4: Develop Low-Level Parallelization Strategy .... 609
    21.3.5 Step 5: Design Data Structures for Efficient Processing .... 609
    21.3.6 Step 6: Iterate and Refine .... 610
    21.3.7 Step 7: Fine-Tune .... 610
  21.4 Levels of Parallelism in the CBEA Processors .... 611
    21.4.1 SIMD Parallelization .... 612
    21.4.2 Superscalar Parallelization .... 612
    21.4.3 Hardware Multithreading .... 612
    21.4.4 Multiple Execution Units .... 612
    21.4.5 Multiple CBEA Processors .... 613
  21.5 Tools for Parallelization .... 614
    21.5.1 Language Extensions: Intrinsics and Directives .... 614
    21.5.2 Compiler Support for Single Shared-Memory Abstraction .... 615
    21.5.3 OpenMP Directives .... 615
    21.5.4 Compiler-Controlled Software Cache .... 617
    21.5.5 Compiler and Runtime Support for Code Partitioning .... 620
    21.5.6 Thread Library .... 621
22. SIMD Programming .... 623
  22.1 SIMD Basics .... 623
    22.1.1 Converting Scalar Data to SIMD Data .... 624
    22.1.2 Approaching SIMD Coding Methodically .... 627
    22.1.3 Coding for Effective Auto-SIMDization .... 639
  22.2 Auto-SIMDizing Compilers .... 641
    22.2.1 Challenges .... 642
    22.2.2 Examples of Invalid and Valid SIMDization .... 644
  22.3 SIMDization Framework for a Compiler .... 648
    22.3.1 Phase 1: Basic-Block Aggregation .... 650
    22.3.2 Phase 2: Short-Loop Aggregation .... 650
    22.3.3 Phase 3: Loop-Level Aggregation .... 651
    22.3.4 Phase 4: Alignment Devirtualization .... 652
    22.3.5 Phase 5: Length Devirtualization .... 657
    22.3.6 Phase 6: SIMD Code Generation and Instruction Scheduling .... 658
    22.3.7 SIMDization Example: Multiple Sources of SIMD Parallelism .... 659
    22.3.8 SIMDization Example: Multiple Data Lengths .... 662
    22.3.9 Vector Operations and Mixed-Mode SIMDization .... 667
  22.4 Other Compiler Optimizations .... 668
    22.4.1 OpenMP .... 668
    22.4.2 Subword Data Types .... 668
    22.4.3 Backend Scheduling for SPEs .... 669
    22.4.4 Interacting with Typical Optimizations .... 670
23. Vector/SIMD Multimedia Extension and SPU Programming .... 671
  23.1 Architectural Differences .... 671
    23.1.1 Registers .... 672
    23.1.2 Data Types .... 673
    23.1.3 Instruction-Set Differences .... 674
  23.2 Porting SIMD Code from the PPE to the SPEs .... 676
    23.2.1 Code-Mapping Considerations .... 676
    23.2.2 Simple Macro Translation .... 677
    23.2.3 Full Functional Mapping .... 680
    23.2.4 Code-Portability Typedefs .... 681
    23.2.5 Compiler-Target Definition .... 681
24. SPE Programming Tips .... 683
  24.1 DMA Transfers .... 684
    24.1.1 Initiating DMA Transfers from SPEs .... 684
    24.1.2 Overlapping DMA Transfers and Computation .... 684
    24.1.3 DMA Transfers and LS Accesses .... 689
    24.1.4 Using DMA List Transfers .... 690
  24.2 SPU Pipelines and Dual-Issue Rules .... 690
  24.3 Eliminating and Predicting Branches .... 691
    24.3.1 Function-Inlining and Loop-Unrolling .... 692
    24.3.2 Predication Using Select-Bits Instruction .... 692
    24.3.3 Branch Hints .... 693
    24.3.4 Program-Based Branch Prediction .... 697
    24.3.5 Profile or Linguistic Branch-Prediction .... 699
    24.3.6 Software Branch-Target Address Cache .... 700
    24.3.7 Using Control Flow to Record Branch History .... 700
  24.4 Loop Unrolling and Pipelining .... 701
  24.5 Offset Pointers .... 704
  24.6 Transformations and Table Lookups .... 704
    24.6.1 The Shuffle-Bytes Instruction .... 704
    24.6.2 Fast SIMD 8-Bit Table Lookups .... 705
  24.7 Integer Multiplies .... 708
  24.8 Scalar Code .... 708
    24.8.1 Scalar Loads and Stores .... 708
    24.8.2 Promoting Scalar Data Types to Vector Data Types .... 710
  24.9 Unaligned Loads .... 710
Appendix A. PPE Instruction Set and Intrinsics .... 715
  A.1 PowerPC Instruction Set .... 715
    A.1.1 Data Types .... 715
    A.1.2 PPE Instructions .... 715
    A.1.3 Microcoded Instructions .... 725
  A.2 PowerPC Extensions in the PPE .... 732
    A.2.1 New PowerPC Instructions .... 732
    A.2.2 Implementation-Dependent Interpretation of PowerPC Instructions .... 735
    A.2.3 Optional PowerPC Instructions Implemented .... 738
    A.2.4 PowerPC Instructions Not Implemented .... 739
    A.2.5 Endian Support .... 739
  A.3 Vector/SIMD Multimedia Extension Instructions .... 740
    A.3.1 Data Types .... 740
    A.3.2 Vector/SIMD Multimedia Extension Instructions .... 740
    A.3.3 Graphics Rounding Mode .... 744
  A.4 C/C++ Language Extensions (Intrinsics) for Vector/SIMD Multimedia Extensions .... 746
    A.4.1 Vector Data Types .... 746
    A.4.2 Vector Literals .... 747
    A.4.3 Intrinsics .... 748
  A.5 Issue Rules .... 752
  A.6 Pipeline Stages .... 754
    A.6.1 Instruction-Unit Pipeline .... 754
    A.6.2 Vector/Scalar Unit Issue Queue .... 756
    A.6.3 Stall and Flush Points .... 757
  A.7 Compiler Optimizations .... 759
    A.7.1 Instruction Arrangement .... 759
    A.7.2 Avoiding Slow Instructions and Processor Modes .... 759
    A.7.3 Avoiding Dependency Stalls and Flushes .... 760
    A.7.4 General Recommendations .... 762
Appendix B. SPU Instruction Set and Intrinsics .... 763
  B.1 SPU Instruction Set .... 763
    B.1.1 Data Types .... 763
    B.1.2 Instructions .... 763
    B.1.3 Fetch and Issue Rules .... 771
    B.1.4 Inline Prefetch and Instruction Runout .... 775
  B.2 C/C++ Language Extensions (Intrinsics) for SPU Instructions .... 776
    B.2.1 Vector Data Types .... 776
    B.2.2 Vector Literals .... 778
    B.2.3 Intrinsics .... 779
    B.2.4 Inline Assembly .... 783
    B.2.5 Compiler Directives .... 783
Appendix C. Performance Monitor Signals .................................................