ibm cell broadband engine
TRANSCRIPT
-
8/8/2019 IBM cell broadband engine
1/81
Cell Broadband Engine
Cell Broadband Engine - enabling densitycomputing for data-rich environments
2006 IBM Corporation
Cell BE enabling density computing
for data rich environmentsMichael GschwindBruce DAmoraAlexandre Eichenberger
-
8/8/2019 IBM cell broadband engine
2/81
Cell Broadband Engine
2006 IBM Corporation2 Cell Broadband Engine - enabling density computing for data-rich environments
Cell History IBM, SCEI/Sony, Toshiba Alliance formed in 2000
Design Center opened in March 2001
Based in Austin, Texas
Hardware designed in parallel with software
February 7, 2005: First external technical disclosures
August 15, 2005: First external architecture disclosures
August 25, 2005: Cell Launch - Architecture released
-
8/8/2019 IBM cell broadband engine
3/81
Cell Broadband Engine
2006 IBM Corporation3 Cell Broadband Engine - enabling density computing for data-rich environments
Acknowledgements
Cell is the result of a partnership between SCEI/Sony,
Toshiba, and IBM Cell represents the work of more than 400 people
starting in 2000 and a design investment of about
$400M
-
8/8/2019 IBM cell broadband engine
4/81
Cell Broadband Engine
2006 IBM Corporation4 Cell Broadband Engine - enabling density computing for data-rich environments
More about Cell Broadband Engine
http://www.research.ibm.com/cell
Online resources
SpecificationDocumentation
Open Source and Proprietary Tools
Operating SystemPlatform Simulator
-
8/8/2019 IBM cell broadband engine
5/81
Cell Broadband Engine
2006 IBM Corporation5 Cell Broadband Engine - enabling density computing for data-rich environments
Agenda
8:00 9:00 Motivation and Architecture
9:00 9:30 Heterogeneous Application Model
9:30 10:30 Compilation and Auto-Parallelization
10:30 11:00 BREAK
11:00 12:00 Programming Models & Applications
-
8/8/2019 IBM cell broadband engine
6/81
Cell Broadband Engine
Cell Broadband Engine - enabling densitycomputing for data-rich environments
2006 IBM Corporation
Cell Broadband Engine Architecture
Michael Gschwind
-
8/8/2019 IBM cell broadband engine
7/81
Cell Broadband Engine
2006 IBM Corporation7 Cell Broadband Engine - enabling density computing for data-rich environments
Computer Architecture at the turn of the millenium
The end of architecture was being proclaimed
Frequency scaling as performance driverState of the art microprocessors
multiple instruction issueout of order architectureregister renamingdeep pipelines
Little or no focus on compilers
Questions about the need for compiler research Academic papers focused on microarchitecture tweaks
-
8/8/2019 IBM cell broadband engine
8/81
Cell Broadband Engine
2006 IBM Corporation8 Cell Broadband Engine - enabling density computing for data-rich environments
The age of frequency scaling
SCALING:
Voltage: V/
Oxide: tox/
Wire width: W/
Gate width: L/
Diffusion: xd/
Substrate: * NA
RESULTS:
Higher Density: ~2
Higher Speed: ~
Power/ckt: ~1/ 2
Power Density:~Constant
p substrate, doping *NA
Scaled Device
L/ xd/
GATE
n+source
n+drain
WIRINGVoltage, V /
W/tox/
Source: Dennard et al., JSSC 1974.
0.5
0.6
0.7
0.8
0.9
1
710131619222528313437
Total FO4 Per Stage
Performance
bips
deeper pipeline
-
8/8/2019 IBM cell broadband engine
9/81
Cell Broadband Engine
2006 IBM Corporation9 Cell Broadband Engine - enabling density computing for data-rich environments
Frequency scaling
A trusted standby
Frequency as the tide that raises all boats
Increased performance across all applications
Kept the industry going for a decade
Massive investment to keep going
Equipment cost
Power increase
Manufacturing variability
New materials
-
8/8/2019 IBM cell broadband engine
10/81
Cell Broadband Engine
2006 IBM Corporation10 Cell Broadband Engine - enabling density computing for data-rich environments
but reality was closing in!
0
0.2
0.4
0.6
0.8
1
710131619222528313437
Total FO4 Pe r Stage
RelativetoO
ptimalFO4
bips
bips^3/W
0.001
0.01
0.1
1
10
100
1000
0.010.11
Active PowerLeakage Power
-
8/8/2019 IBM cell broadband engine
11/81
Cell Broadband Engine
2006 IBM Corporation11 Cell Broadband Engine - enabling density computing for data-rich environments
Power crisis
0.1
1
10
100
1000
0.010.11gate length Lgate (m)
classic scaling
Tox()
Vdd
Vt
The power crisis is notnatural
Created by deviating fromideal scaling theory
Vdd and Vt not scaled by
additional performance withincreased voltage
Laws of physics
Pchip = ntransistors * Ptransistor
Marginal performance gainper transistor low
significant power increase
decreased power/performance
efficiency
-
8/8/2019 IBM cell broadband engine
12/81
Cell Broadband Engine
2006 IBM Corporation12 Cell Broadband Engine - enabling density computing for data-rich environments
The power inefficiency of deep pipelining
0
0.2
0.4
0.6
0.8
1
710131619222528313437
Total FO4 Per Stage
Relative
toOptimalF
O4
bips
bips^3/W
Deep pipelining increases number of latches and switching rate
power increases with at least f2
Latch insertion delay limits gains of pipelining
Power-performance optimal Performance optimal
Source: Srinivasan et al., MICRO 2002
deeper pipeline
-
8/8/2019 IBM cell broadband engine
13/81
Cell Broadband Engine
2006 IBM Corporation13 Cell Broadband Engine - enabling density computing for data-rich environments
Hitting the memory wall
Latency gap
Memory speedup lags behindprocessor speedup
limits ILP
Chip I/O bandwidth gap
Less bandwidth per MIPS
Latency gap as applicationbandwidth gap
Typically (much) less than
chip I/O bandwidth
-60
-40
-20
0
20
40
60
80
100
1990 2003
normalized
MFLOPS Memory latency
Source: McKee, Computing Frontiers 2004
Source: Burger et al., ISCA 1996
usable bandwidth avg. request size
roundtrip latencyno. of inflight
requests
-
8/8/2019 IBM cell broadband engine
14/81
Cell Broadband Engine
2006 IBM Corporation14 Cell Broadband Engine - enabling density computing for data-rich environments
Our Y2K challenge
10 performance of desktop systems
1 TeraFlop / second with a four-node configuration
1 Byte bandwidth per 1 Flop
golden rule for balanced supercomputer design
scalable design across a range of design points
mass-produced and low cost
-
8/8/2019 IBM cell broadband engine
15/81
Cell Broadband Engine
2006 IBM Corporation15 Cell Broadband Engine - enabling density computing for data-rich environments
Cell Design Goals
Provide the platform for the future of computing
10 performance of desktop systems shipping in 2005
Computing density as main challenge
Dramatically increase performance per X X = Area, Power, Volume, Cost,
Single core designs offer diminishing returns on
investmentIn power, area, design complexity and verification cost
-
8/8/2019 IBM cell broadband engine
16/81
Cell Broadband Engine
2006 IBM Corporation16 Cell Broadband Engine - enabling density computing for data-rich environments
Necessity as the mother of invention Increase power-performance efficiency
Simple designs are more efficient in terms of power and area
Increase memory subsystem efficiency
Increasing data transaction size
Increase number of concurrently outstanding transactions
Exploit larger fraction of chip I/O bandwidth
Use CMOS density scaling
Exploit density instead of frequency scaling to deliver increasedaggregate performance
Use compilers to extract parallelism from application
Exploit application parallelism to translate aggregate
performance to application performance
-
8/8/2019 IBM cell broadband engine
17/81
Cell Broadband Engine
2006 IBM Corporation17 Cell Broadband Engine - enabling density computing for data-rich environments
Exploit application-level parallelism
Data-level parallelism
SIMD parallelism improves performance with little overhead
Thread-level parallelism
Exploit application threads with multi-core design approach
Memory-level parallelism
Improve memory access efficiency by increasing number ofparallel memory transactions
Compute-transfer parallelism
Transfer data in parallel to execution by exploiting applicationknowledge
-
8/8/2019 IBM cell broadband engine
18/81
Cell Broadband Engine
2006 IBM Corporation18 Cell Broadband Engine - enabling density computing for data-rich environments
Concept phase ideas
ChipMultiprocessor
efficient use ofmemory interface
large architected
register set
high frequencyand low power!
reverse Vddscaling for low
power
modular designdesign reuse
control cost of
coherency
-
8/8/2019 IBM cell broadband engine
19/81
Cell Broadband Engine
2006 IBM Corporation19 Cell Broadband Engine - enabling density computing for data-rich environments
Cell Architecture
Heterogeneous multi-core system
architecture Power Processor
Element for control tasks
Synergistic ProcessorElements for data-
intensive processing
Synergistic ProcessorElement (SPE)
consists of Synergistic Processor
Unit (SPU)
Synergistic Memory Flow
Control (SMF)
16B/cycle (2x)16B/cycle
BIC
FlexIOTM
MIC
DualXDRTM
16B/cycle
EIB (up to 96B/cycle)
16B/cycle
64-bit Power Architecture with VMX
PPE
SPE
LS
SXUSPU
SMF
PXUL1
PPU
16B/cycle
L232B/cycle
LS
SXUSPU
SMF
LS
SXUSPU
SMF
LS
SXUSPU
SMF
LS
SXUSPU
SMF
LS
SXUSPU
SMF
LS
SXUSPU
SMF
LS
SXUSPU
SMF
-
8/8/2019 IBM cell broadband engine
20/81
Cell Broadband Engine
2006 IBM Corporation20 Cell Broadband Engine - enabling density computing for data-rich environments
Shifting the Balance of Power with Cell Broadband Engine
Data processor instead of control system
Control-centric code stable over timeBig growth in data processing needs
Modeling
Games
Digital media
Scientific applications
Todays architectures are built on a 40 year old data modelEfficiency as defined in 1964
Big overhead per data operation
Parallelism added as an after-thought
-
8/8/2019 IBM cell broadband engine
21/81
Cell Broadband Engine
2006 IBM Corporation21 Cell Broadband Engine - enabling density computing for data-rich environments
Powering Cell the Synergistic Processor Unit
VRFVRF
PERMPERM LSULSUVV FF PP UU VV FF XX UU
Local StoreLocal Store
Single PortSingle Port
SRAMSRAMIssue /Issue /BranchBranch
Fetch
ILB
16B x 2
2 instructions
16B x 3 x 2
64B64B
16B16B
128B128B
Source: Gschwind et al., Hot Chips 17, 2005
SMFSMF
C ll B db d E i
-
8/8/2019 IBM cell broadband engine
22/81
Cell Broadband Engine
2006 IBM Corporation22 Cell Broadband Engine - enabling density computing for data-rich environments
Density Computing in SPEs
Today, execution units only fraction of core area and power
Bigger fraction goes to other functions Address translation and privilege levels Instruction reordering Register renaming Cache hierarchy
Cell changes this ratio to increase performance per area andpower
Architectural focus on data processing
Wide datapaths More and wide architectural registers Data privatization and single level processor-local store All code executes in a single (user) privilege level Static scheduling
C ll B db d E i
-
8/8/2019 IBM cell broadband engine
23/81
Cell Broadband Engine
2006 IBM Corporation23 Cell Broadband Engine - enabling density computing for data-rich environments
Streamlined Architecture Architectural focus on simplicity
Aids achievable operating frequency
Optimize circuits for common performance case
Compiler aids in layering traditional hardware functions
Leverage 20 years of architecture research
Focus on statically scheduled data parallelism
Focus on data parallel instructions No separate scalar execution units Scalar operations mapped onto data parallel dataflow
Exploit wide data paths
Data processing Instruction fetch
Address impediments to static scheduling
Large register set Reduce latencies by eliminating non-essential functionality
C ll B db d E i
-
8/8/2019 IBM cell broadband engine
24/81
Cell Broadband Engine
2006 IBM Corporation24 Cell Broadband Engine - enabling density computing for data-rich environments
SPE Highlights RISC organization
32 bit fixed instructions
Load/store architecture
Unified register file User-mode architecture
No page translation within SPU
SIMD dataflow Broad set of operations (8, 16, 32, 64 Bit) Graphics SP-Float
IEEE DP-Float
256KB Local Store
Combined I & D DMA block transfer
using Power Architecture memorytranslation
LS
LS
LS
LS
GPR
FXU ODD
FXU EVN
SFPDP
CONT
ROL
CHANNEL
DMA SMMATO
SBIRTB
BEB
FWD
14.5mm2 (90nm SOI)Source: Kahle, Spring Processor Forum 2005
C ll B db d E i
-
8/8/2019 IBM cell broadband engine
25/81
Cell Broadband Engine
2006 IBM Corporation25 Cell Broadband Engine - enabling density computing for data-rich environments
dataparallelselect
staticprediction
simplerarch
sequentialfetch
localstore
high
frequency
shorterpipeline
large
basicblocks
determ.latency
largeregisterfile
instructionbundling
DLP
widedata
paths
computedensity
Synergistic Processing
single
port
staticscheduling
scalarlayering
ILPopt.
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
26/81
Cell Broadband Engine
2006 IBM Corporation26 Cell Broadband Engine - enabling density computing for data-rich environments
Efficient data-sharing between scalar and SIMD processing
Legacy architectures separate scalar and SIMDprocessing
Data sharing between SIMD and scalar processing unitsexpensive
Transfer penalty between register files
Defeats data-parallel performance improvement in manyscenarios
Unified register file facilitates data sharing for
efficient exploitation of data parallelismAllow exploitation of data parallelism without data transfer
penalty
Data-parallelism alwaysan improvement
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
27/81
Cell Broadband Engine
2006 IBM Corporation27 Cell Broadband Engine - enabling density computing for data-rich environments
Compiling for the Cell Broadband Engine
The lesson of RISC computing
Architecture provides fast, streamlined primitives to compilerCompiler uses primitives to implement higher-level idioms
If the compiler cant target it do not include in architecture
Compiler focus throughout projectPrototype compiler soon after first proposal
Cell compiler team has made significant advances in
Automatic SIMD code generation Automatic parallelization
Data privatization
Programmability
Programmer
Productivity
Raw Hardware
Performance
Cell
Design
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
28/81
Cell Broadband Engine
2006 IBM Corporation28 Cell Broadband Engine - enabling density computing for data-rich environments
SPU PIPELINE FRONT END
SPU PIPELINE BACK END
Branch Instruction
Load/Store Instruction
IF Instruction Fetch
IB Instruction Buffer
ID Instruction Decode
IS Instruction Issue
RF Register File Access
EX ExecutionWB Write Back
Floating Point Instruction
Permute Instruction
EX1 EX2 EX3 EX4
EX1 EX2 EX3 EX4 EX5 EX6
EX1 EX2 WB
WB
RF1 RF2
WB
Fixed Point Instruction
EX1 EX2 EX3 EX4 EX5 EX6 WB
IF1 IF2 IF3 IF4 IF5 IB1 IB2 ID1 ID2 ID3 IS1 IS2
SPU Pipeline
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
29/81
Cell Broadband Engine
2006 IBM Corporation29 Cell Broadband Engine - enabling density computing for data-rich environments
Permute Unit
Load-Store Unit
Floating-Point Unit
Fixed-Point UnitBranch Unit
Channel Unit
Result Forwarding and Staging
Register File
Local Store(256kB)
Single Port SRAM
128B Read 128B Write
DMA Unit
Instruction Issue Unit / Instruction Line Buffer
8 Byte/Cycle 16 Byte/Cycle 128 Byte/Cycle64 Byte/Cycle
On-Chip Coherent Bus
SPE Block Diagram
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
30/81
Cell Broadband Engine
2006 IBM Corporation30 Cell Broadband Engine - enabling density computing for data-rich environments
SPU Communication: Channel Architecture
SPU uses channels to communicate with
environment (incl. SMF)Access to special purpose registers
Processor status
Communication channels
Requests to SMF
SMF status
Mailboxes
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
31/81
Cell Broadband Engine
2006 IBM Corporation31 Cell Broadband Engine - enabling density computing for data-rich environments
SPU Channel Features
Communication using channels numbered 0 - 127
Implementation dependent
Unidirectional
Have capacity Burst without SPU execution stop if capacity available
Channel operations are blocking
On write if full On read if empty
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
32/81
Cell Broadband Engine
2006 IBM Corporation32 Cell Broadband Engine - enabling density computing for data-rich environments
SPU Channel Access
rdch RT, ca RT
-
8/8/2019 IBM cell broadband engine
33/81
Cell Broadband Engine
2006 IBM Corporation33 Cell Broadband Engine - enabling density computing for data-rich environments
Synergistic Memory Flow Control
Data Bus
Snoop BusControl BusTranslate Ld/StMMIO
DMA Engine DMAQueue
RMTMMUAtomicFacility
Bus I/F Control
LS
SXU
SPU
MMIO
SPE
SMF
SMF implements memorymanagement and mapping
SMF operates in parallel to SPU
Independent compute and transferCommand interface from SPU
DMA queue decouples SMF & SPU
MMIO-interface for remote nodes
Block transfer between systemmemory and local store
SPE programs reference systemmemory using user-leveleffective address space
Ease of data sharing
Local store to local store transfers
Protection
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
34/81
Cell Broadband Engine
2006 IBM Corporation34 Cell Broadband Engine - enabling density computing for data-rich environments
Synergistic Memory Flow Control
Data Bus
Snoop BusControl BusTranslate Ld/StMMIO
DMA Engine DMAQueue
RMTMMUAtomicFacility
Bus I/F Control
LS
SXU
SPU
MMIO
SPE
SMF
SMF implements memorymanagement and mapping DMA for data transfer
MMU for page translation AF for coherent data
BIF for access to Element InterconnectBus
RMT for resource management
SMF operates in parallel to SPU
Independent compute and transfer
Channel command interface fromSPU DMA queue decouples SMF SPU
SPU can synchronize with SMF
MMIO-based interface for remote nodes
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
35/81
Cell Broadband Engine
2006 IBM Corporation35 Cell Broadband Engine - enabling density computing for data-rich environments
System-wide Virtual Memory Architecture
Data Bus
Snoop BusControl BusTranslate Ld/StMMIO
DMA Engine DMAQueue
RMTMMUAtomicFacility
Bus I/F Control
LS
SXU
SPU
MMIO
SPE
SMF
SMF MMU follows PowerArchitecture Virtual MemoryarchitectureTwo-level translation
Segmentation and paging
PPE and SPEs share memory map
SPE programs reference systemmemory using user-level effectiveaddress space
Ease of data sharing
Local store to local store transfers
Protection
Exceptions delivered to PPESLB miss
Page fault
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
36/81
Cell Broadband Engine
2006 IBM Corporation36 Cell Broadband Engine - enabling density computing for data-rich environments
Data Transfer with SMF DMAC
Data Bus
Snoop BusControl BusTranslate Ld/StMMIO
DMA Engine DMAQueue
RMTMMUAtomicFacility
Bus I/F Control
LS
SXU
SPU
MMIO
SPE
SMF
DMA Unit implements block datatransfer transfer specifies system memory and
local store address
system memory address is effectiveaddress
translated to physical address by SMFMMU
variable block size from 1B to 16KB
DMA transfers LS system memory
LS LS
LS I/O Transfers
DMA list command SMF program transfers data in parallel
to computation
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
37/81
Ce oadba d g e
2006 IBM Corporation37 Cell Broadband Engine - enabling density computing for data-rich environments
Synergistic Memory Flow Control
Bus Interface Up to 16 outstanding DMA
requests Requests up to 16KByte
Token-based Bus AccessManagement
Source: Clark et al., Hot Chips 17, 2005
Element Interconnect Bus Four 16 byte data rings
Multiple concurrent transfers
96B/cycle peak bandwidth
Over 100 outstanding requests
200+ GByte/s @ 3.2+ GHz
16B 16B 16B 16B
Data Arb
16B 16B 16B 16B
16B 16B16B 16B16B 16B16B 16B
16B
16B16B
16B
16B
16B16B
16B
SPE0 SPE2 SPE4 SPE6
SPE7SPE5SPE3SPE1
MIC
PPE
BIF/IOIF0
IOIF1
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
38/81
g
2006 IBM Corporation38 Cell Broadband Engine - enabling density computing for data-rich environments
Memory efficiency as key to application performance
Greatest challenge in translating peak into app performance
Peak Flops useless without way to feed data
Cache miss provides too little data too late
Inefficient for streaming / bulk data processing
Initiates transfer when transfer results are already needed
Application-controlled data fetch avoids not-on-time data delivery
SMF is a better way to look at an old problem
Fetch data blocks based on algorithm
Blocks reflect application data structures
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
39/81
g
2006 IBM Corporation39 Cell Broadband Engine - enabling density computing for data-rich environments
Traditional memory usage
Long latency memory access operations exposed
Cannot overlap 500+ cycles with computation Memory access latency severely impacts application performance
computation mem protocol mem idle mem contention
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
40/81
g
2006 IBM Corporation40 Cell Broadband Engine - enabling density computing for data-rich environments
Exploiting memory-level parallelism Reduce performance impact of memory accesses with concurrent
access
Carefully scheduled memory accesses in numeric code Out-of-order execution increases chance to discover more
concurrent accesses
Overlapping 500 cycle latency with computation using OoO illusory
Bandwidth limited by queue size and roundtrip latency
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
41/81
g
2006 IBM Corporation41 Cell Broadband Engine - enabling density computing for data-rich environments
Exploiting MLP with Chip Multiprocessing More threads more memory level parallelism
overlap accesses from multiple cores
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
42/81
2006 IBM Corporation42 Cell Broadband Engine - enabling density computing for data-rich environments
A new form of parallelism: CTP Compute-transfer parallelism
Concurrent execution of compute and transfer increasesefficiency
Avoid costly and unnecessary serialization
Application thread has two threads of control
SPU computation thread
SMF transfer thread
Optimize memory access and data transfer at the
application levelExploit programmer and application knowledge about
data access patterns
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
43/81
2006 IBM Corporation43 Cell Broadband Engine - enabling density computing for data-rich environments
Memory access in Cell with MLP and CTP Super-linear performance gains observed
decouple data fetch and use
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
44/81
2006 IBM Corporation44 Cell Broadband Engine - enabling density computing for data-rich environments
multicore
manyoutstanding
transfers
sequentialfetch
highthroughput
timelydata
delivery
shared
I & D
eliminatecache miss
logic
parallelcompute &
transfer
improvedcode gen
decouplefetch andaccess
explicitdatatransfer
storagedensity
Synergistic Memory Management
single
port
applicationcontrol
reducedcoherence
cost
lowlatencyaccess
blockfetch
deeperfetchqueue
localstore
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
45/81
2006 IBM Corporation45 Cell Broadband Engine - enabling density computing for data-rich environments
Cell BE Processor Components
Power Core(PPE)
L2 Cache
NCU
N
N
In the Beginning
the Power Architecture Processor
Custom Designed for high frequency, area
and power efficiency
Power Processor Element (PPE)Industry-standard 64-bit IBM PowerArchitecture processor
PowerPC AS 2.0.2
2-Way Hardware MultithreadedL1 : 32KB I ; 32KB D
L2 : 512KB
Coherent load/store
VMX
3.2+ GHz
Realtime Control
Locking L2 Cache & TLB
Bandwidth Reservation
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
46/81
2006 IBM Corporation46 Cell Broadband Engine - enabling density computing for data-rich environments
Cell BE Processor Components
Element Interconnect Busdata ring for internal communication
Four 16 byte data rings,supporting multiple transfers
96B/cycle peak bandwidth
Over 100 outstanding requests
200+ GByte/s @ 3.2+ GHz
96 Byte/Cycle 200+GB/sec @ 3.2+GHz
Element Interconnect Bus
Power Core(PPE)
L2 Cache
NCU
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
47/81
2006 IBM Corporation47 Cell Broadband Engine - enabling density computing for data-rich environments
Cell BE Processor Components
SPE provides computationalperformance
Dual issue 32-bit SIMD architecture
Dedicated resources
128-entry 128-bit VRF256KB Local Store
Each SMF can be dynamicallyconfigured to protect resources
Dedicated DMA engineUp to 16 outstanding requests
96 Byte/Cycle 200+GB/sec @ 3.2+GHz
Element Interconnect Bus
Power Core(PPE)
L2 Cache
NCU
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
48/81
2006 IBM Corporation48 Cell Broadband Engine - enabling density computing for data-rich environments
Cell BE Processor Components
SMF provides memory management& mapping
SPE Local Store aliased into systemmemory map
SMF controls SPE DMA accessesImplements page translation and
protection
Using Power Architecture systemmemory map
System memory map compatiblewith Power Architecture VirtualMemory architecture
S/W controllable from PPE MMIO
DMA 1,2,4,8,16,128 Byte
16Kbytetransfers for memory and I/O access
96 Byte/Cycle 200+GB/sec @ 3.2+GHz
Element Interconnect Bus
Power Core(PPE)
L2 Cache
NCU
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
49/81
2006 IBM Corporation49 Cell Broadband Engine - enabling density computing for data-rich environments
Cell BE Processor Components
I/O provides high bandwidthDual XDR controller
25.6GB/s @ 3.2Gbps
Two configurable interfaces
76.8GB/s @ 6.4Gbps
Configurable number of Bytes
Coherent or I/O ModeInterconnect
Supports multiple systemconfigurations
25 GB / sDRAM
MIC
IOIF1
IOIF0
Southbridge
I/O
20 GB / sBIF or IOIF1
5 GB / s
96 Byte/Cycle 200+GB/sec @ 3.2+GHz
Element Interconnect Bus
Power Core(PPE)
L2 Cache
NCU
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
MIC
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
50/81
2006 IBM Corporation50 Cell Broadband Engine - enabling density computing for data-rich environments
Cell BE Processor Components
IIC Internal Interrupt ControllerHandles SPE Interrupts
Handles External Interrupts
From Coherent Interconnect
From IOIF0 or IOIF1
Interrupt Priority Level Control
Duplicated for each PPEhardware thread
25 GB / sDRAM
MIC
IOIF1
IOIF0
Southbridge
I/O
20 GB / sBIF or IOIF1
5 GB / s
96 Byte/Cycle 200+GB/sec @ 3.2+GHz
Element Interconnect Bus
Power Core(PPE)
L2 Cache
NCU
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
MIC
IIC
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
51/81
2006 IBM Corporation51 Cell Broadband Engine - enabling density computing for data-rich environments
Cell BE Processor Components
IOT implements I/O Bus MasterTranslation
Translates bus address to systemaddress
Two Level translationI/O Segments: 256 MB
I/O Pages: 4KB, 64KB,1MB, 16MB
I/O Device Identifier / page for LPAR
IOST and IOPT Cachehardware/software managed
25 GB / sDRAM
MIC
IOIF1
IOIF0
Southbridge
I/O
20 GB / sBIF or IOIF1
5 GB / s
96 Byte/Cycle 200+GB/sec @ 3.2+GHz
Element Interconnect Bus
Power Core(PPE)
L2 Cache
NCU
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
MIC
IIC IOT
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
52/81
2006 IBM Corporation52 Cell Broadband Engine - enabling density computing for data-rich environments
Cell BE Processor ComponentsToken Manager provides Bandwidth
Reservation for shared resourcesOptionally used for RT tasks or LPAR
Multiple Resource Allocation Groups
Generates access tokens at configurablerate for each allocation group
1 per each memory bank (16 total)
2 for each IOIF (4 total)
Requestors assigned RAG ID byOS/hypervisor
Each SPEPPE L2 / NCU
IOIF 0 Bus Master
IOIF 1 Bus Master
Priority order for using another RAGs
unused tokensResource overcommit warning interrupt
25 GB / sDRAM
MIC
IOIF1
IOIF0
Southbridge
I/O
20 GB / sBIF or IOIF1
5 GB / s
96 Byte/Cycle 200+GB/sec @ 3.2+GHz
Element Interconnect Bus
Power Core(PPE)
L2 Cache
NCU
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
L
ocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
LocalStore
SPU
MFC
N
AUC
MIC
IIC IOTTKM
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
53/81
2006 IBM Corporation53 Cell Broadband Engine - enabling density computing for data-rich environments
Cell BE Implementation Characteristics
241M transistors
235mm2
Design operates across widefrequency range
Optimize for power & yield
> 200 GFlops (SP) @3.2GHz
> 20 GFlops (DP) @3.2GHz
Up to 25.6 GB/s memory bandwidth
Up to 75 GB/s I/O bandwidth
100+ simultaneous bustransactions
16+8 entry DMA queue per SPE
Relative
Frequency Increase vs. Power Consumption
Voltage
Source: Kahle, Spring Processor Forum 2005
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
54/81
2006 IBM Corporation54 Cell Broadband Engine - enabling density computing for data-rich environments
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
55/81
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
56/81
2006 IBM Corporation56 Cell Broadband Engine - enabling density computing for data-rich environments
Compiling and linking an integrated executable
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
57/81
2006 IBM Corporation57 Cell Broadband Engine - enabling density computing for data-rich environments
Loading and execution of a program
.text:pu_main:
spu_create_thread(0, spu0, spu0_main);spu_create_thread(1, spu1, spu1_main);
printf:
spu0:
spu1:
.data:
EIB
PPE
PXUL1
PPU
16B/cycle
L232B/cycle
SPE
.text:
spu0_main:
.data:
SXU
SPU
SMF
LS
SXU
SPU
SMF
LS
SXU
SPU
SMF
LS
SXU
SPU
SMF
LS
SXU
SPU
SMF
LS
SXUSPU
SMF
LS
SXU
SPU
SMF
LS
SXU
SPU
SMF
LS
SXU
SPU
SMF
.text:
spu0_main:
.data:
.text:
spu1_main:
.data:
System memory
PPE image loadsand executes;
PPE initiates
SMF transfer; SMF data transfer start SPU at
specified address;
SMF starts SPUexecution
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
58/81
2006 IBM Corporation58 Cell Broadband Engine - enabling density computing for data-rich environments
SPE thread creation
PPE transfers mini-loader and parameters
256b program
Mini-loader transfers thread memory image
Embedded in Power Architecture executable
Provided to spe_create_thread() call
SPE side program load more efficient
SPE has access more queue entriesParallel loading on 8 SPEs
Direct channel access vs. MMIO access
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
59/81
2006 IBM Corporation59 Cell Broadband Engine - enabling density computing for data-rich environments
Application Source& Libraries
PPE object files SPE object files
Physical SPEs
Cell Broadband Engine-aware OS (Linux)
SPE Virtualization / Scheduling Layer
SPE threadsPPE threads
Physical PPE
PPE
T1 T2 SPESPE SPE SPE
SPESPE SPE SPE
Heterogeneous Multi-threading and OS
management Heterogeneous Multi-
Threading Model
PPE ThreadsSPE Threads
SPE DMA EA = PPEProcess EA Space
OS supports Create &Destroy SPE tasks
Atomic Update Primitivesused for Mutex
SPE Context Fully Managed
OS assignment of SPEthreads
Programmer directed using
affinity mask
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
60/81
2006 IBM Corporation60 Cell Broadband Engine - enabling density computing for data-rich environments
Linux on Cell BE All software in STIDC written on Linux OS
Started with Linux 2.4 PPC64 on Cell Simulator
SPEs exposed as I/O Devices (function offload model)
SPE DMA required pre-pinned memory
Inflexible programming model
Moved to 2.6.3
Added heterogenous thread model via system call moved to SPUFS in 2.6.12
SPE thread API created (similar to pthreads library)
User mode direct and indirect SPE access models
Full pre-emptive SPE context management
spe_ptrace() added for gdb support
spe_schedule() for thread to physical SPE assignment
currently FIFO run to completion
SPE threads share address space with parent PPE process (through DMA)
Demand paging for SPE accesses
Shared hardware page table with PPE
SPE Error, Event and Signal handling directed to parent PPE thread SPE elf objects wrapped into PPE shared objects with extended gld
SPE-side mini-loader
madvise() extended for L2 cache and TLB locking/preloading (realtime feature)
All patches for Cell in architecture dependent layer (subtree of PPC64)
Except for a few shameless hacks - being removed in 2.6.12 Publishing Initial Cell BE Patches for 2.6.12
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
61/81
2006 IBM Corporation61 Cell Broadband Engine - enabling density computing for data-rich environments
SPE data transfer (from SPE or PPE) Embedded in program
At compile time
At runtime Direct store to SPU LS
Using memory map alias when SPU LS mapped into memory map
Mailbox
Channel in Synergistic Processor Architecture MMIO in IBM Power Architecture core
Externally initiated transfer
Using SMF block transfer capabilities
From PPE or remote SPE
SPU-initiated transfer
Based on an address provided using one of these four methods
Based on address computed from data obtained by these five methods
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
62/81
2006 IBM Corporation62 Cell Broadband Engine - enabling density computing for data-rich environments
Data sharing between PPE and SPE
SPE addresses system memory via SMF copy-in/copy-out using effective address
System memory pointers can be shared between PPEand SPE
PPE can access SPE local store using SMF orusing memory accesses
PPE enqueues SMF requests via memory mapped I/O
Aliasing of SPE LS gives PPE addressability as systemaddress
Add LS base address of local store to use in PPE
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
63/81
2006 IBM Corporation63 Cell Broadband Engine - enabling density computing for data-rich environments
Access to data using common effective addressesppe_sum_all(float *a){for (i=0; i
-
8/8/2019 IBM cell broadband engine
64/81
2006 IBM Corporation64 Cell Broadband Engine - enabling density computing for data-rich environments
Access to data using common effective addressesppe_sum_all(float *a){for (i=0; i
-
8/8/2019 IBM cell broadband engine
65/81
2006 IBM Corporation65 Cell Broadband Engine - enabling density computing for data-rich environments
Access to data using common effective addressesppe_sum_all(float *a){for (i=0; i
-
8/8/2019 IBM cell broadband engine
66/81
2006 IBM Corporation66 Cell Broadband Engine - enabling density computing for data-rich environments
Access to data using common effective addressesppe_sum_all(float *a){for (i=0; i
-
8/8/2019 IBM cell broadband engine
67/81
2006 IBM Corporation67 Cell Broadband Engine - enabling density computing for data-rich environments
Access to data using common effective addressesppe_sum_all(float *a){for (i=0; i
-
8/8/2019 IBM cell broadband engine
68/81
2006 IBM Corporation68 Cell Broadband Engine - enabling density computing for data-rich environments
Access to data using common effective addressesppe_sum_all(float *a){for (i=0; i
-
8/8/2019 IBM cell broadband engine
69/81
2006 IBM Corporation69 Cell Broadband Engine - enabling density computing for data-rich environments
Mailbox communication
SPE
unsigned int mbox = spu_read_in_mbox();
spu_write_out_mbox(mbox);
PPE
while (spe_stat_in_mbox(speid) == 0);spe_write_in_mbox(speid,data);
unsigned int rmbox = spe_read_out_mbox(speid);
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
70/81
2006 IBM Corporation70 Cell Broadband Engine - enabling density computing for data-rich environments
Atomic updates and semaphores
Atomic updates
atomic_set((atomic_ea_t)(ptrAtomicData),0xffffffff);atomic_set, atomic_add, atomic_sub,atomic_dec_and_test,...
Mutex lock/unlock
mutex_lock (cond_mutex_ea);
cond_signal (cond_ea);
mutex_unlock (cond_mutex_ea);
-
8/8/2019 IBM cell broadband engine
71/81
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
72/81
2006 IBM Corporation72 Cell Broadband Engine - enabling density computing for data-rich environments
Cell Real Address Memory Map
Local storage of each SPE aliased in system memory map Direct (uncacheable) access by PPE
Used for LS LS transfer Access control via page table
QoS memory is pinned system memory bandwidth and latency guarantee
managed by O/S
I/O devices external to BE defined by system and I/O architecture
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
73/81
2006 IBM Corporation73 Cell Broadband Engine - enabling density computing for data-rich environments
System management of Cell BE resources
Cell BE implements full set of Power Architecture
virtualization and dynamic partitioningSupport of partition configuration state
Logical Partition ID etc.
Full state management by PPEAccess via memory mapped I/O registers
Grouped by privilege level
Access control to MMIO facilities controlled by page accesscontrol
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
74/81
2006 IBM Corporation74 Cell Broadband Engine - enabling density computing for data-rich environments
Per SPE Resources (PPE Side)
Problem State
8 Entry MFC Command Queue Interface
DMA Command and Queue Status
DMA Tag Status Query Mask
DMA Tag Status
32 bit Mailbox Status and Data from SPU
32 bit Mailbox Status and Data to SPU
4 deep FIFO
Signal Notification 1
Signal Notification 2
SPU Run Control
SPU Next Program Counter
SPU Execution Status
4K Physical Page Boundary
4K Physical Page Boundary
Optionally Mapped 256K Local Store
Privileged 1 State (OS)
4K Physical Page Boundary
SPU Privileged Control
SPU Channel Counter Initialize
SPU Channel Data Initialize
SPU Signal Notification Control
SPU Decrementer Status & Control
MFC DMA Control
MFC Context Save / Restore Registers
SLB Management Registers
Privileged 2 State(OS or Hypervisor)
4K Physical Page Boundary
SPU Master Run Control
SPU ID
SPU ECC Control
SPU ECC Status
SPU ECC AddressSPU 32 bit PU Interrupt Mailbox
MFC Interrupt Mask
MFC Interrupt Status
MFC DMA Privileged Control
MFC Command Error Register
MFC Command Translation Fault Register
MFC SDR (PT Anchor)MFC ACCR (Address Compare)
MFC DSSR (DSI Status)
MFC DAR (DSI Address)
MFC LPID (logical partition ID)
MFC TLB Management Registers
Optionally Mapped 256K Local Store
4K Physical Page Boundary
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
75/81
2006 IBM Corporation75 Cell Broadband Engine - enabling density computing for data-rich environments
Per SPE Resources (SPU Side)
128 - 128 bit GPRs
External Event Status (Channel 0)
Decrementer Event
Tag Status Update Event
DMA Queue Vacancy Event
SPU Incoming Mailbox Event
Signal 1 Notification Event
Signal 2 Notification Event
Reservation Lost Event
External Event Mask (Channel 1)
External Event Acknowledgement (Channel 2)
Signal Notification 1 (Channel 3)
Signal Notificaiton 2 (Channel 4)
Set Decrementer Count (Channel 7)
Read Decrementer Count (Channel 8)
16 Entry MFC Command Queue Interface (Channels 16-21)
DMA Tag Group Query Mask (Channel 22)
Request Tag Status Update (Channel 23)
Immediate
Conditional - ALL
Conditional - ANY
Read DMA Tag Group Status (Channel 24)
DMA List Stall and Notify Tag Status (Channel 25)
DMA List Stall and Notify Tag Acknowledgement (Channel 26)
Lock Line Command Status (Channel 27)
Outgoing Mailbox to PU (Channel 28)
Incoming Mailbox from PU (Channel 29)Outgoing Interrupt Mailbox to PU (Channel 30)
SPU Direct Access Resources SPU Indirect Access Resources
(via EA Addressed DMA)
System Memory
Memory Mapped I/OThis SPU Local Store
Other SPU Local Store
Other SPU Signal Registers
Atomic Update (Cacheable Memory)
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
76/81
2006 IBM Corporation76 Cell Broadband Engine - enabling density computing for data-rich environments
Memory Flow Controller Commands
LSA - Local Store Address (32 bit)
EA - Effective Address (32 or 64 bit)
TS - Transfer Size (16 bytes to 16K bytes)LS - DMA List Size (8 bytes to 16 K bytes)
TG - Tag Group(5 bit)
CL - Cache Management / Bandwidth Class
DMA Commands
Command Parameters
Put - Transfer from Local Store to EA space
Puts - Transfer and Start SPU execution
Putr- Put Result - (Arch. Scarf into L2)
Putl - Put using DMA List in Local Store
Putrl - Put Result using DMA List in LS (Arch)
Get - Transfer from EA Space to Local StoreGets - Transfer and Start SPU execution
Getl - Get using DMA List in Local Store
Sndsig - Send Signal to SPU
Command Modifiers:
f: Embedded Tag Specific Fence
Command will not start until all previous commandsin same tag group have completed
b: Embedded Tag Specific Barrier
Command and all subsiquent commands in same
tag group will not start until previous commands in same
tag group have completed
SL1 Cache Management Commands
sdcrt - Data cache region touch (DMA Get hint)
sdcrtst - Data cache region touch for store (DMA Put hint)
sdcrz - Data cache region zero
sdcrs - Data cache region storesdcrf - Data cache region flush
Synchronization Commands
Lockline (Atomic Update) Commands:
getllar- DMA 128 bytes from EA to LS and set Reservation
putllc - Conditionally DMA 128 bytes from LS to EA
putlluc - Unconditionally DMA 128 bytes from LS to EAbarrier- all previous commands complete before subsequent
commands are started
mfcsync - Results of all previous commands in Tag group
are remotely visible
mfceieio - Results of all preceding Puts commands in same
group visible with respect to succeeding Get commands
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
77/81
2006 IBM Corporation77 Cell Broadband Engine - enabling density computing for data-rich environments
Raising the bar with parallelism
Data-parallelism and static ILP in both core types
Results in low overhead per operation Multithreaded programming is key to great Cell BE performance
Exploit application parallelism with 9 cores
Regardless of whether code exploits DLP & ILP
Challenge regardless of homogeneity/heterogeneity
Leverage parallelism between data processing and data transfer
A new level of parallelism exploiting bulk data transfer
Simultaneous processing on SPUs and data transfer on SMFs
Offers superlinear gains beyond MIPS-scaling
-
8/8/2019 IBM cell broadband engine
78/81
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
79/81
2006 IBM Corporation79 Cell Broadband Engine - enabling density computing for data-rich environments
System configurations
Cell BEProcessor
XDR XDR
IOIF
BIF
Cell BEProcessor
XDR XDR
IOIF
CellBE
Processor
XDR
XDR
IOIF
BIF
CellBE
Processor
XDR
XDR
IOIFswitch
Cell BEProcessor
XDR XDR
IOIF BIF
Cell BEProcessor
XDR XDR
IOIF
Cell BEProcessor
XDR XDR
IOIF0 IOIF1
Game console systems
Blades
HDTV
Home media servers
Supercomputers
Cell
Design
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
80/81
2006 IBM Corporation80 Cell Broadband Engine - enabling density computing for data-rich environments
Cell: a Synergistic System Architecture Cell is not a collection of different processors, but a synergistic whole
Operation paradigms, data formats and semantics consistent
Share address translation and memory protection model
SPE optimized for efficient data processing
SPEs share Cell system functions provided by Power Architecture
SMF implements interface to memory
Copy in/copy out to local storage
Power Architecture provides system functions
Virtualization
Address translation and protection
External exception handling
EIB integrates system as data transport hub
Cell Broadband Engine
-
8/8/2019 IBM cell broadband engine
81/81
Copyright International Business Machines Corporation 2006.All Rights Reserved. Printed in the United States June 2006.
The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both.IBM IBM Logo Power Architecture
Other company, product and service names may be trademarks or service marks of others.
All information contained in this document is subject to change without notice. The products described in this document areNOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could resultin death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change
IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnityunder the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specificenvironments, and is presented as an illustration. The results obtained in other operating environments may vary.
While the information contained herein is believed to be accurate, such information is preliminary, and should not be reliedupon for accuracy or completeness, and no representations or warranties of accuracy or completeness are made.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liablefor damages arising directly or indirectly from any use of the information contained in this document.