texas instruments multi-core processor with rapidio · ti confidential – nda restrictions...
TRANSCRIPT
TI Confidential – NDA Restrictions
TI Confidential – NDA Restrictions
Texas Instruments Multi-core Processor with RapidIO
RapidIO Trade Association 4Q13
1
TI Confidential – NDA Restrictions
Agenda
• Popular past & present DSP interconnects • KeyStone DSP RapidIO Features • KeyStone DSP Interconnection with SRIO • High performance SRIO DSP Processors in TI • Q&A
TI Confidential – NDA Restrictions
Past interconnection based on TI DSP for Baseband application
Inte
rface
to F
E Bo
ard
(Bac
kpla
ne)
Inte
rface
to N
etw
ork
(Bac
kpla
ne)
DSP
DSP ASIC
ASIC
ASIC
ASIC
DSP
DSP
DSP
Host
EMIF
EMIF
Ethernet
EMIF
EMIF HPI
TI Confidential – NDA Restrictions
Current Baseband Uplink Evolution
To N
etw
ork
I/F B
oard
(Bac
kpla
ne)
Custom Interface
Serial RapidIO Switch
Base Band Processing
RapidIO Link Host
C66x W/ Serial RapidIO
Bridge
C66x W/ Serial RapidIO
C66x W/ Serial RapidIO
C66x W/ Serial RapidIO
C66x W/ Serial RapidIO
C66x W/ Serial RapidIO
C66x W/ Serial RapidIO
C66x W/ Serial RapidIO
C66x W/ Serial RapidIO
TI Confidential – NDA Restrictions
Keystone Architecture The successful foundation for your next innovation
Tera
Net
Multicore Navigator
ARM CorePacs
DSP CorePacs
Security AccelerationPac
Audio AccelerationPac
Packet AccelerationPac
Switching and I/O
Multicore Shared Memory Controller
011100 100010 001111
* <
+
-
• Best Low Power SoC Architecture – State of the art SoC Infrastructure – Enable high performance applications – Twice the on chip memory with half the
power consumption vs. alternatives
• High Degree of Integration – Accelerate from user interface to IP – On-Chip Ethernet switching – Unparallel High throughput IOs – Lowest total cost of ownership
• Scalable and portable – From consumer, industrial to
infrastructure – SW design reuse, easy SW migration
from Keystone I to Keystone II
• Best return on Investment – Reliable, best quality product – Field proven technology with rich eco-
system, speed time to market
Industrial Communication Subsystem
TI Confidential – NDA Restrictions
KeyStone Serial RapidIO Overview
• KeyStone Serial RapidIO Support – 1.25, 2.5, 3.125, 5G Baud/sec per link (1.0, 2.0 2.5 4 GBits/sec w/ 8B/10B encoding)
• Up to four 1x links (each 1x link is bidirectional) • One 4x link (bi-directional pipe), which provides up to 20 GBaud/sec raw
– Up to 10Gbps for 4x link with 8B/10B encoding • Also supports two 2x and two 1x + one 2x modes
– Supports DSP to DSP on the same board, DSP to Switch, etc. • Layered architecture to minimize impact on software • Non-proprietary multi-vendor support • Flexible and scalable • Reduces chip count, board area and system cost
Serial RapidIO is a high-performance, packet-switched, interconnect technology that addresses the embedded industry's need for:
RELIABILITY INCREASED BANDWIDTH FASTER BUS SPEEDS
TI Confidential – NDA Restrictions
RapidIO Daisy Chain
• Connect 2 / 4 lanes from DSP to DSP or other devices with RapidIO Interface • Ideal for applications where multiple hops is acceptable • Can be used without switch
Daisy chain via two 1x links
C66x C66x C66x
C66x C66x C66x
Switch, Board
Connector or
other Devices
Switch, Board
Connector or
other Devices
Daisy chain via four 1x links
TI Confidential – NDA Restrictions
SWITCH SWITCH
RapidIO Switch Interconnect
C66x
C66x
C66x
C66x
C66x
C66x
C66x
C66x
C66x
C66x
C66x
C66x
C66x
C66x
FPGA
ASIC
• Connect multiple DSPs to RapidIO switch via 1x or 4x lanes for DSP Farm
• For specific function where latency is critical or continuous data exchange is expected, local interconnection can be established
• Ideal where high bandwidth is needed to each device (vs daisy chain)
TI Confidential – NDA Restrictions
Direct Interconnected RapidIO
C66x
C66x
C66x C66x
C66x
• Five DSPs are completely connected • Provides direct connection from 1 device to any other
device • Reduces latency but limited to 1x lane bandwidth to
each device
TI Confidential – NDA Restrictions
Mesh Network Configuration
C66x
C66x
C66x
C66x
C66x
C66x
C66x
C66x
C66x
• Mesh configuration with four 1x lanes
• Can be used in different mesh configurations depending upon number of RapidIO lanes available
TI Confidential – NDA Restrictions
Combination Interconnect
SWITCH
FPGA
ASIC
Bac
k pl
ane
• Use of any combination of options is possible
• Connectivity via RapidIO to back plane can also be used
TI Confidential – NDA Restrictions
KeyStone – Portfolio Highlights C6670 Wireless Accelerators
C6678/4/2/1 Pin Compatible 1 to 8 Core DSP
C6657/5/4 Low cost & Low power
Product range • Processors for low/mid/high end
applications
Reduces BOM cost • Integrated ARM, accelerators, and I/O –
smaller PCBs and fewer add-on chips
Solutions reduce time-to-market • Mature tools, free software, excellent
support and rich 3rd party ecosystem • Existing customers protect investment:
100% backward compatible with C64x/C674x based TI DSPs
Reliable processor supplier • 10+ years supply guarantee, strong roadmap
and TI’s financial stability
Low cost EVM + free tools & software
Pin Compatible
Pin Compatible
12
KeyStone I
KeyStone II
TI Confidential – NDA Restrictions
KeyStone Processors Available Today
1-2x C66x DSP cores Up to 10GHz fixed and floating pt
Best performance per Watt in the industry!
C66x DSP + ARM A15
C667X • Up to 8 C66x cores @ 1GHz to 1.25GHz • 8 cores: 320GMACs/160GFLOPs @ ~10W • Network acceleration Pacs
66AK2HXX • 4 A15 + 8 C66x: 352GMACs/198.4GFLOPs • 2 A15 + 4 C66x: 176GMACs/99.2GFLOPs • 13-16W depending upon DSP cores, speed,
temp & other factors
C6657 • 2 C66x up to 1.25GHz • 64GMACs/32GFLOPs @3.5W
C6655 • 1 C66x up to 1.25GHz • 32GMACs/16GFLOPs @2.5W
C6654 • 1 C66x up to 850MHz • 27.2GMACs/13.6GFLOPs @<2W
C667x and 66AK2Hx C665x
13
Power Optimized for Portable Applications
TI Confidential – NDA Restrictions
KeyStone Processors shipping in 1Q14
Multicore ARM Processor with additional DSP
4x ARM A15 + 1x C66x
66AK2E05 • 4 ARM A15s up to 1.4GHz • 1 C66x DSP up to 1.4GHz • 89.6GMACs/67.2GFLOPs @ ~8.5W • 2x 10Gb + 8x 1Gb Ethernet • 7MB on-chip RAM
AM5K2E04 • 4 ARM A15s at 1.4GHz • 2x 10Gb + 8x 1Gb Ethernet • 6MB on-chip RAM • 44.8GMACs/GFLOPs @ ~8W
AM5K2E02 • 2 ARM A15s up to 1.4GHz • 22.4GMACs/GFLOPS
66AK2Exx AM5K2Exx
14
Multicore ARM-Only Processors
4x ARM A15 up to 1.4GHz
2x ARM A15 up to 1.4GHz
TI Confidential – NDA Restrictions
KeyStone I
TI Confidential – NDA Restrictions
TMS320C6671/2/4/8 Functional Diagram
24mm x 24mm package
Tera
Net
Multicore Navigator
Security AccelerationPac
Packet AccelerationPac
+ * - <<
C66x DSP
Debug DDR3 72b -1600
512KB L2 Per C66x Core
1G Ethernet Switch
+ * - <<
C66x DSP
40 nm
Multicore Shared Memory Controller
4MB MSMC SRAM
011100 100010 001111
EMIF16
1GbE 2x
EMIF and IO High Speed SerDes Lanes HyperLink
4x
I2C UART
GPIO x16
PCIe 2x
SRIO 4x
+ * - <<
C66x DSP
+ * - <<
C66x DSP System Services
Power Manager
Debug
System Monitor
EDMA PktDMA + *
- << C66x DSP
+ * - <<
C66x DSP
+ * - <<
C66x DSP
+ * - <<
C66x DSP
SPI
TSIP x2
• Cores & Memory – Up to 8 C66x DSP 1GHz-1.25GHz – 320 GMACs, 160 GFLOPS – 8MB on chip memory w/ECC
• Multicore Infrastructure – Navigator with 8k queues, 1600 MIPS – Non blocking Network on Chip – Multicore Shared Memory Controller reduce
external memory access latency
• Switches – 1GbE: 2 external port switch
• Network, Transport – Crypto: IPsec, ESP, AH Tunneling, SRTP – Packet Acceleration – Multiple IP Addr,
1.5Mpps @ full wire-rate, QoS support
• Connectivity – 82Gbps – HyperLink(50), PCIe(10), SRIO(20), 1GbE(2)
• Power Optimized – <10W at 1GHz nominal temp – Advanced power optimization including Smart
Reflex
TI Confidential – NDA Restrictions
2MB MSMC SRAM
Multicore Shared Memory Controller
17
TMS320C6670 Functional Diagram
DUC/DDC/ CFR/DPD
Tera
Net
Multicore Navigator
Security AccelerationPac Air and IP
Packet AccelerationPac
+ * - <<
C66x DSP
System Services
Power Manager
Debug
System Monitor
EDMA PktDMA
DDR3 72b -1600
1MB L2 Per C66x Core
1G Ethernet Switch
+ * - <<
C66x DSP
Radio AccelerationPac
40 nm
011100 100010 001111
Front End 3 FFTC 2 RAC 1 TAC
Bit Rate 3 TCP3d 4 VCP2 1 TCP3e
SPI
1GbE 2x
EMIF and IO High Speed SerDes Lanes CPRI/OBSAI
6x HyperLink
4x
I2C UART
GPIO x16
PCIe 2x
SRIO 4x
+ * - <<
C66x DSP
+ * - <<
C66x DSP
• Cores & Memory – 4 C66x DSP 1GHz-1.2GHz – 153 GMACs, 76 GFLOPS – 6MB on chip memory w/ECC
• Multicore Infrastructure – Navigator with 8k queues, 1600 MIPS – Non blocking Network on Chip – Multicore Shared Memory Controller reduce
external memory access latency
• Switches – 1GbE: 2 external port switch
• Network, Transport – Crypto: IPsec, ESP, AH Tunneling, SRTP – Packet Acceleration – Multiple IP Addr,
1.5Mpps @ full wire-rate, QoS support
• Physical Layer Acceleration – Accelerate 2G/3G/4G wireless communication
signal chain
• Connectivity – 118Gbps – HyperLink(50), PCIe(10), SRIO(20),
CPRI/OBSAI(36), 1GbE(2) 24mm x 24mm package
TI Confidential – NDA Restrictions
21mm x 21mm package
TMS320C6655/57
1MB MSMC SRAM
Multicore Shared Memory Controller
Tera
Net
Multicore Navigator
+ * - <<
C66x DSP
System Services
Power Manager
Debug
System Monitor
EDMA PktDMA DDR3 36b -1333
1MB L2 Per C66x Core
Communication AccelerationPac
40 nm
011100 100010 001111
2 VCP2 TCP3
EMIF16
1GbE 1x
EMIF and IO High Speed SerDes Lanes HyperLink
4x
I2C UART
SPI PCIe 2x
SRIO 4x
+ * - <<
C66x DSP
McBSP
uPP
• Cores & Memory – 1 or 2 C66x DSP at 850MHz, 1GHz & 1.25GHz – 80 GMACs, 40 GFLOPS – 3MB on chip memory w/ECC
• Multicore Infrastructure – Navigator with 8k queues, 1600 MIPS – Non blocking Network on Chip – Multicore Shared Memory Controller reduce
external memory access latency – 36bit DDR3 with ECC at 1333MHz, 8GB
addressable external memory
• Communication Acceleration – 1xTCP3 – Turbo Decoder – 2xVCP2 – Viterbi Decoder
• Connectivity – 81Gbps – HyperLink(50), PCIe(10), SRIO(20), 1GbE(1)
• Power Optimized – 2.5W C6655 and 3.5W C6657 @ 1.0GHz at
85C case temperature – Advanced power optimization including Smart
Reflex – -55C to 100C Extended temperature options
TI Confidential – NDA Restrictions TI Confidential – NDA Restrictions
KeyStone II
TI Confidential – NDA Restrictions
Multicore Shared Memory Controller
66AK2H12/06
Tera
Net
Security AccelerationPac
Packet AccelerationPac
+ * - <<
C66x DSP
+ * - <<
C66x DSP ARM A15 System Services
Power Manager
Debug
System Monitor
EDMA PktDMA
DDR3 72b - 1600
DDR3 72b - 1600
4MB ARM Shared L2
1MB L2 Per C66x Core
1G Ethernet Switch
+ * - <<
C66x DSP
+ * - <<
C66x DSP ARM A15 + * - <<
C66x DSP
+ * - <<
C66x DSP ARM A15 + * - <<
C66x DSP
+ * - <<
C66x DSP ARM A15
EMIF16 USB3 SPI x3
2 HyperLink 8x
1GbE 4x
EMIF and IO High Speed SerDes Lanes
• Cores & Memory – 4x/8x C66x DSP up to 1.2GHz – 2x/4x ARM Cortex A15 up to 1.4GHz – 18MB on chip memory w/ECC – 2 x 72 bit DDR3 w/ECC, 10GB
addressable memory
• Multicore Infrastructure – Navigator with 16k queues, 3200 MIPS – 2.2 Tbps Network on Chip – 2.8 Tbps Shared Memory Controller
• Switches – 1GbE: 4 external port switch
• Network, Transport – 1.5 Mpps @ full wire-rate – Crypto: 6.4 Gbps, IPsec, SRTP – Accelerate layer 2,3 and transport
• Connectivity – 134Gbps – HyperLink(100), PCIe(10), SRIO(20),
1GbE(4)
SRIO 4x
PCIe 2x
Multicore Navigator 28 nm
I2C x3
UART x2
GPIO x32
6MB MSMC SRAM
011100 100010 001111
TI Confidential – NDA Restrictions
Multicore Shared Memory Controller
AM5K2E04/02
Tera
Net
Security AccelerationPac
Packet AccelerationPac
System Services
Power Manager
Debug
System Monitor
EDMA PktDMA
DDR3/3L 72b - 1600
4MB ARM Shared L2
EMIF16 USB3 x2
SPI x3
HyperLink 4x
1GbE 8x
EMIF and IO High Speed SerDes Lanes
• Cores & Memory – 4x/2x Cortex A15 1.25GHz – 1.4GHz – 6MB on chip memory w/ECC – 72 bit DDR3/3L w/ECC, 8GB
addressable memory
• Multicore Infrastructure – Navigator with 16k queues, 3200 MIPS – 2.2 Tbps Network on Chip – 2.8 Tbps Shared Memory Controller
• Switches – 1GbE: 8 external port switch – 10GbE: 2 external port switch
• Network, Transport – 1.5 Mpps @ full wire-rate – Crypto: 6.4 Gbps, IPsec, SRTP – Accelerate layer 2,3 and transport
• Connectivity – 94Gbps – HyperLink(50), PCIe(20), 10GbE(20),
1GbE(4)
• Power Optimized – 8.1W typical use case at 55C for K2E04
10GbE 2x
2 x PCIe 2x
Multicore Navigator 28 nm
I2C x3
UART x2 TSIP
2MB MSMC SRAM
011100 100010 001111
10G Ethernet Switch
1G Ethernet Switch
ARM A15 ARM A15 ARM A15 ARM A15
TI Confidential – NDA Restrictions
Multicore Shared Memory Controller
66AK2E02/05
Tera
Net
Security AccelerationPac
Packet AccelerationPac
System Services
Power Manager
Debug
System Monitor
EDMA PktDMA
DDR3/3L 72b - 1600
4MB ARM Shared L2
EMIF16 USB3 x2
SPI x3
HyperLink 4x
1GbE 8x
EMIF and IO High Speed SerDes Lanes
• Cores & Memory – 1x C66x DSP up to 1.2GHz – 1x/4x ARM Cortex A15 up to 1.4GHz – 6MB on chip memory w/ECC – 72 bit DDR3/3L w/ECC, 8GB addressable
memory
• Multicore Infrastructure – Navigator with 16k queues, 3200 MIPS – 2.2 Tbps Network on Chip – 2.8 Tbps Shared Memory Controller
• Switches – 1GbE: 8 external port switch – 10GbE: 2 external port switch
• Network, Transport – 1.5 Mpps @ full wire-rate – Crypto: 6.4 Gbps, IPsec, SRTP – Accelerate layer 2,3 and transport
• Connectivity – 98Gbps – HyperLink(50), PCIe(20), 10GbE(20),
1GbE(8)
• Power Optimized – 8.6W typical use case at 55C for K2E05
10GbE 2x
2 x PCIe 2x
Multicore Navigator 28 nm
I2C x3
UART x2 TSIP
2MB MSMC SRAM
011100 100010 001111
10G Ethernet Switch
1G Ethernet Switch
+ * - <<
C66x DSP ARM A15
ARM A15 ARM A15
ARM A15
1MB L2
TI Confidential – NDA Restrictions
TI Confidential – NDA Restrictions
Q&A
TI Confidential – NDA Restrictions
High Performance DSP Roadmap
2013 2014 Future
Next Mid-Range Processor • Multicore ARM and DSP • Industrial control and
communications
Next DSP Low • Multicore ARM and DSP
devices • Industrial, Audio and Communicaitons …
66AK2H12/06 • 2x/4x ARM A15, 1.4 GHz • 4x/8x C66x, 1.2GHz • PCIe, USB, GbE, SRIO • 40 x 40mm
Sampling Development Concept Production
OMAP L138 • 1xARM A9, 456 MHz • 1xC674x, 456 MHz • EMAC, USB2, TDM • 13mm2, 16mm2
C6748 • 1xC674x, 456 MHz • EMAC, USB2, McASP • 13mm2, 16mm2
C6678/4/2/1 • 1x/2x/4x/8x C66x • 1.25GHz, 8MB of L2 • Pcie, GbE, SRIO • 24 x 24mm
C6657 • 1x/2x C66x • 1.25 GHz, 3MB L2 • PCIe, USB, GbE, SRIO • 21 x 21mm
Next High End Multicore • High performance Multicore ARM + DSP • Large L2, 2x DDR4 • High speed serial I/O
Per
form
ance
, Sca
labl
e S
olut
ions
Future DSP Low Low Power DSP only
Industrial comms, Audio
TI Confidential – NDA Restrictions
High Performance ARM Roadmap
2013 2014 Future
Next Mid-Range Processor • Multicore ARM and DSP • Industrial control and
communications
Next High End Multicore • High performance Multicore ARM + DSP • Large L2, 2x DDR4 • High speed serial I/O 66AK2H12/06
• 2x/4x ARM A15, 1.4 GHz • 4x/8x C66x, 1.2GHz • 18MB on chip memory • PCIe, USB, GbE, SRIO
Sampling Development Concept Production
AM5K2E 04/02 • 2x/4x ARM A15 cores • 1.4 GHz, 4MB of L2 • PCIe, USB, 10GbE
66AK2E05 • 4 ARM A15, 1.4 GHz • 1 C66x, 1.2 GHz • PCIe USB, 10GbE
Per
form
ance
, Sca
labl
e S
olut
ions
Next ARM uP - Low • ARM Cortex A15 device • Industrial, Audio and Communications …
TI Confidential – NDA Restrictions
Improved DSP Interface with RapidIO
• Direct DSP-to-DSP interconnect with optional local Host / ASIC communication
• Used in complete baseband processor card
• ASIC used for custom applications
• Based upon MAC processing needed optional HOST processor is used
Host Processor (optional)
Bac
kpla
ne to
Rad
io &
Net
wor
k
ASIC (Optional)
ASIC (Optional)
1Ghz DSP 1Ghz DSP
1Ghz DSP 1Ghz DSP