a deep dive on the qoriq t1040 l2 switch - nxp...
TRANSCRIPT
External Use
TM
A Deep Dive on the QorIQ T1040
L2 Switch
FTF-NET-F0007
Feb. 21, 2014
Suchit Lepcha | Application Engineering Manager
T1040 Architecture
Processor
• 4x e5500, 64b, up to 1.4GHz
• Each with 256KB backside L2 cache
• 256KB Shared Platform Cache w/ECC
• Supports up to 64GB addressability (36-bit physical addressing)
Memory SubSystem
• 32/64b DDR3L/4 Controller up to 1600MT/s
Cygnus Switch Fabric
High Speed Serial IO
• 4x PCIe Gen2 Controllers
• 2x SATA 2.0, 3Gb/s
• 2x USB 2.0 with PHY
Network IO
• FMan packet Parse/Classify/Distribute
• Lossless Flow Control, IEEE 1588
• Up to 4x 10/100/1000 Ethernet Controllers
• 8-Port Gigabit Ethernet Switch
• QUICC Engine
− HDLC, 2x TDM
Green Energy Operation
• Fanless operation quad-core 1.2GHz
• Packet lossless deep sleep
• Programmable wake-on-packet
• Wake-on-timer/GPIO/USB/IRQ
Datapath Acceleration
• SEC: crypto acceleration
• PME: regex pattern matcher
Device
• 28HPM Process
• 780-pin 3-2-3 C4 FC package
• 23x23mm, 0.8mm pitch
Power targets
• Enable convection-cooled system design
[Figure: T1040 block diagram. Blocks shown: Power Architecture e5500 core with 32 KB D-cache, 32 KB I-cache, and 256 KB backside L2 cache; 256KB platform cache; 32/64-bit DDR3L/4 memory controller; CoreNet Coherency Fabric with PAMUs (Peripheral Access Management Units); Security Fuse Processor; Security Monitor; 16b IFC; power management; SD/MMC; 2x DUART; 2x I2C; SPI, GPIO; 2x USB 2.0 w/PHY; Queue Manager; Buffer Manager; Pattern Match Engine 2.0; Security 5.x (XOR, CRC); FMan (parse, classify, distribute) with 1G MACs; 8-port switch; QUICC Engine (TDM/HDLC); PCIe controllers; 2x SATA 2.0; 2x DMA; DIU; 8-lane 5GHz SerDes; real-time debug (watchpoint cross trigger, perf monitor, CoreNet trace).]
L2 Switch Summary
• Fully non-blocking wire speed Ethernet switch with WRED
− 8x 1G user facing ports
− 2Mbit packet memory
− 8k MAC addresses
− 4k VLAN support
− Jumbo frame support (10kB)
− 8x QoS, 8x Queues/Port
T1040: Gigabit Ethernet Switch
• Advanced Features
− Priority flow control - lossless
− Lower latency and shared buffer management
− Advanced classification, shaping and policing
• Power savings
− With support for latest standards including IEEE 802.3az Energy Efficient Ethernet (EEE)
• Cost savings
− Through switch integration, low-pin count QSGMII connectivity and port count / cost optimization
• Increased ROI - Lower TTM and high re-use
− Integrated solution kit with software reuse potential
• Support for Full featured L2 software stacks
[Figure: block diagram of the FMan and the L2 switch. FMan side: parse, classify, distribute; QMan, BMan, and fabric interfaces; 1G and 2.5G MACs; IEEE 1588v2; MACSec. L2 switch side: 8K MACs, 4K VLANs, TCAM 1K, RMON counters, management interface, IEEE 1588v2, eight 1G MACs, 2.5G MACs. Both sides connect to the 5GHz SerDes over SGMII and QSGMII lanes, with two external quad PHYs.]
Packet Flow
[Figure: FMan and L2 switch block diagram (FMan MACs, L2 switch with 8K MACs, 4K VLANs, TCAM 1K, 5GHz SerDes with SGMII/QSGMII lanes and two quad PHYs), with the e5500 added and three flows marked: control packets, packet forwarding, and WAN traffic.]
Generic Enterprise Router Features
• Higher QoS – benefit – lossless behavior
− 8 queues/port
− PFC (Priority based Flow Control)
− Sophisticated classification
• Complex classification requirements - benefit – treat user traffic differently and offload the processor
• Higher ACL requirements - benefit – redirect/drop/deny access
• Delivering all of these in low power
[Figure: chart with axes labeled "Bandwidth, cost and power" and "Features", positioning Enterprise Gateway, SME, and Enterprise Router.]
Agenda
• Overview
• Switch Functions
− Block Diagram
− Forward Frames
− Learning
− Avoid loops
− System Interface
• Software
• Conclusion
Block Diagram
[Figure: switch block diagram]
10 switch ports: 8x 1GbE + 2x 1GbE/2.5GbE
• Port modules, attached to the Port Module Interface
− 1G port modules #0 to #7, each with a MAC
− 2.5G port modules #8 and #9, each with a MAC
• Ingress Processing
− Ingress statistics
− IS1 TCAM: frame classification (QoS, VLAN), translation/remarking
− IS2 TCAM: security enforcement, MAC/IP binding
− DLB policers
− L2 forwarding
• Shared Queue System
− Shared memory pool (2Mbit)
− Memory controller
− Shapers and schedulers
• Egress Processing
− Egress statistics
− ES0 TCAM
− Rewriter: VLAN translation, push/pop tags, DSCP remapping
• CPU Interface
− MIIM controller
− Register access
− CPU port module (port #10): CPU frame extraction and rejection
• External connections: system bus, MIIM, control signals, system clock (156MHz), system reset
• Callouts: switch core; TCAM packet processing (ingress and egress); buffer memory; CPU interface
Agenda
• Overview
• Switch Functions
− Block Diagram
− Forward Frames
MAC Interface
Ingress Processing
Shared Queue System
Egress Processing
− Learning
− Avoid loops
− System Interface
• Software
• Conclusion
MAC Block
[Figure: switch block diagram, repeated for the MAC block: 1G port modules #0 to #7 and 2.5G port modules #8 and #9, each with a MAC, attached to the Port Module Interface. 10 switch ports: 8x 1GbE + 2x 1GbE/2.5GbE. External connections: system bus, MIIM, control signals, system clock (156MHz), system reset, ATPG enable.]
MAC Functions
• VLAN Tag aware frame size check
• Frame Check Sequence (FCS) check
• Pause frame identification
• Energy Efficient Ethernet (EEE) IEEE 802.3az
IEEE 802.3az
• Saves power during low data utilization periods
− Works in 100BASE-TX & 1000BASE-T speeds
− Additionally, new 10BASE-Te mode reduces 10Mbit transmit from 5Vpp to 3.3V
• When both link partners support 802.3az:
− during auto-negotiation, PHYs advertise their EEE idle capabilities
− ~0%-60% per port power is saved on both systems depending upon link utilization in the PHY; 0%-35% typical at the uP/switch/PHY level
Actual measurements will need to be made for T1040 + F104 (QSGMII PHY)
• Backward compatible to support non-802.3az PHYs
− However, for 802.3az to save energy, both link partners must support 802.3az
Agenda
• Overview
• Switch Functions
− Block Diagram
− Forward Frames
MAC Interface
Ingress Processing
• Basic Classification
• Advanced Classification
• Policing
• L2 Forwarding
Shared Queue System
Egress Processing
− Learning
− Avoid loops
− System Interface
• Software
• Conclusion
Ingress Processing Block
[Figure: switch block diagram, repeated for the Ingress Processing block: ingress statistics; IS1 TCAM (frame classification for QoS and VLAN, translation/remarking); IS2 TCAM (security enforcement, MAC/IP binding); DLB policers; L2 forwarding. Callout: TCAM packet processing, ingress and egress.]
Basic and Advanced Frame Classification
[Figure: classification pipeline. Frame data passes through Basic Classification, then Advanced Classification (IS1).]
Basic Classification
• Frame acceptance: untagged, S-tagged, C-tagged, special frames
• VLAN: VLAN tag from frame, port VLAN
• QoS, DP, and DSCP: PCP from tag (inner or outer); DSCP from frame, trusted values only; remap/rewrite of DSCP; port default
• Aggregation code: L2-L4 frame data
• Outputs: discard, VLAN tag header, VLAN pop count, QoS class, DP level, DSCP value, aggregation code
Advanced Classification (IS1)
• IS1 first lookup, IS1 second lookup, IS1 third lookup
• Key:
− port mask
− inner and outer VLAN tags
− SMAC, DMAC
− SIP, DIP
− TCP/UDP ports
− frame type, DSCP, range checkers
• Outputs: QoS class, DP level, classified DSCP, frame data, classified VLAN, VLAN pop count, custom lookup
Basic Classification
• Frame acceptance
− Valid VLAN tags
− Valid MAC addresses
• VLAN classification
− Untagged ports are part of the default VLAN
− Tagged ports are classified based on TCI (PCP, DEI, and VID) and TPID (C-tag or S-tag)
• QoS, DP, and DSCP
− Frames are colored green/yellow based on QoS and DP
• Aggregation Code
− Based on information from MAC/IP addresses and TCP/UDP port numbers
Untagged Ethernet frame (field sizes in bytes):
Preamble + SFD (8) | Destination MAC (6) | Source MAC (6) | EtherType/Size (2) | Payload (n) | CRC/FCS (4) | Inter-frame gap (12)
802.1Q-tagged Ethernet frame (field sizes in bytes):
Preamble + SFD (8) | Destination MAC (6) | Source MAC (6) | 802.1Q header (4) | EtherType/Size (2) | Payload (n) | CRC/FCS (4) | Inter-frame gap (12)
802.1Q header: TPID, followed by TCI = PCP (3 bits) | DEI (1 bit) | VID (12 bits)
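The 802.1Q header layout above can be decoded with a few shifts and masks. The following is a minimal Python sketch, not T1040 code: the function name `parse_vlan_tag` is hypothetical, the frame is assumed to start at the destination MAC (no preamble or SFD), and the TPID values are the standard IEEE ones (0x8100 for a C-tag, 0x88A8 for an S-tag).

```python
import struct

TPID_C_TAG = 0x8100  # IEEE 802.1Q customer tag
TPID_S_TAG = 0x88A8  # IEEE 802.1ad service tag


def parse_vlan_tag(frame: bytes):
    """Return (tag_type, pcp, dei, vid), or None if the frame is untagged.

    Assumes `frame` starts at the destination MAC, so the 16-bit TPID
    sits at byte offset 12 (after the two 6-byte MAC addresses).
    """
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid == TPID_C_TAG:
        tag_type = "C-tag"
    elif tpid == TPID_S_TAG:
        tag_type = "S-tag"
    else:
        return None
    pcp = tci >> 13          # top 3 bits: priority code point
    dei = (tci >> 12) & 0x1  # next bit: drop eligible indicator
    vid = tci & 0x0FFF       # low 12 bits: VLAN identifier
    return tag_type, pcp, dei, vid


# Illustrative frame: broadcast DMAC, made-up SMAC, C-tag with PCP 5, VID 100.
dmac = bytes.fromhex("ffffffffffff")
smac = bytes.fromhex("020000000001")
tag = struct.pack("!HH", TPID_C_TAG, (5 << 13) | (0 << 12) | 100)
tagged_frame = dmac + smac + tag + struct.pack("!H", 0x0800) + b"\x00" * 46
untagged_frame = dmac + smac + struct.pack("!H", 0x0800) + b"\x00" * 46

print(parse_vlan_tag(tagged_frame))    # ('C-tag', 5, 0, 100)
print(parse_vlan_tag(untagged_frame))  # None
```

The 12-bit VID is what bounds the switch at 4,096 VLANs, and the 3-bit PCP gives the eight priority levels that map onto the eight queues per port.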
Advanced Multi-stage Classification
• Three TCAMs with different purposes:
− IS1: L3-aware Ethernet classification
− IS2: Security handling (ACLs), other control protocols
− ES0: Egress handling (QoS, VLAN)
• TCAM sizes:
− IS1, IS2, ES0: entries depend on complexity of rules
• Classification Results
− QoS handling
− VLAN handling
− ACL actions
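A TCAM matches a key against entries that each carry a value and a mask, so some bits can be "don't care". The sketch below illustrates that idea in Python under stated assumptions: the 16-bit key layout (4-bit ingress port, 12-bit VID), the `TcamEntry` type, and the "first matching entry wins" priority rule are all hypothetical and are not taken from the T1040 documentation. Real hardware compares every entry in parallel rather than looping.

```python
from dataclasses import dataclass


@dataclass
class TcamEntry:
    value: int   # bit pattern to match
    mask: int    # 1 = compare this bit, 0 = don't care
    action: str  # illustrative action label


def tcam_lookup(entries, key):
    """Return the action of the first entry that matches `key`, else None."""
    for entry in entries:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return None


def make_key(port, vid):
    """Hypothetical key layout: 4-bit ingress port above a 12-bit VID."""
    return (port << 12) | vid


rules = [
    TcamEntry(make_key(3, 100), 0xFFFF, "qos=7"),  # port 3 and VLAN 100
    TcamEntry(make_key(0, 100), 0x0FFF, "qos=4"),  # any port, VLAN 100
    TcamEntry(0, 0x0000, "qos=0"),                 # catch-all
]

print(tcam_lookup(rules, make_key(3, 100)))  # qos=7
print(tcam_lookup(rules, make_key(1, 100)))  # qos=4
print(tcam_lookup(rules, make_key(1, 200)))  # qos=0
```

Ordering the most specific rule first is what lets a wildcard entry act as a default, which is one reason the number of usable entries depends on rule complexity.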
IS1 Action
• Each IS1 lookup results in an action vector
• The following fields can be overwritten by the action:
− DSCP value
− QoS value
− DP value
− PAG value
− VID (VLAN ID)
− FID (Filter identifier)
− PCP/DEI
− Custom ACE Type: Custom lookup in IS2
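One way to picture the action vector is as a set of optional overrides applied to the result of basic classification. The Python sketch below assumes that the three IS1 lookups are applied in order and that a later lookup overrides an earlier one for the same field; the slides do not state the precedence, so treat that as an assumption. The `Classification` type and `apply_actions` function are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Classification:
    dscp: int = 0
    qos: int = 0
    dp: int = 0
    pag: int = 0
    vid: int = 1
    fid: int = 0
    pcp: int = 0
    dei: int = 0


def apply_actions(basic, actions):
    """Apply up to three action vectors (dicts of field overrides) in order.

    A lookup that missed is represented by None or an empty dict.
    Assumption: later lookups win when two actions set the same field.
    """
    result = basic
    for action in actions:
        if action:
            result = replace(result, **action)
    return result


basic = Classification(vid=10, qos=2)
final = apply_actions(basic, [{"qos": 5}, None, {"vid": 200, "dscp": 46}])
print(final)
```

Fields that no lookup touches keep their basic-classification values, which matches the slide's wording that fields "can be" overwritten.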
Comprehensive Classification and Statistics
• Ingress classification with the following parameters:
− Mapping to policers
− Ingress statistics (bytes and frames)
Green/Yellow/Red Arrivals
Green/Yellow discards related to L2 forward and congestion avoidance (WRED)
• Egress TCAM lookup for per-port encapsulation and statistics
− Egress statistics (bytes and frames)
Green/Yellow Departures
Policing & Shaping
• Supports policing of ingress and egress traffic
• Supports shaping of egress traffic
[Figure: two pairs of plots over time, one pair labeled Policing and one labeled Shaping.]
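The difference between the two can be shown with a deliberately simple discrete-time model: a policer discards traffic above the rate, while a shaper queues it and sends it later. This Python sketch is only an illustration; it assumes no burst tolerance in the policer and an unbounded shaper queue, neither of which reflects the actual hardware.

```python
def police(arrivals, rate):
    """Per-tick policing: anything above `rate` in a tick is dropped."""
    sent = [min(a, rate) for a in arrivals]
    dropped = sum(arrivals) - sum(sent)
    return sent, dropped


def shape(arrivals, rate):
    """Per-tick shaping: excess is queued and sent in later ticks."""
    sent, backlog = [], 0
    for a in arrivals:
        backlog += a
        out = min(backlog, rate)
        sent.append(out)
        backlog -= out
    while backlog > 0:  # drain whatever is still queued
        out = min(backlog, rate)
        sent.append(out)
        backlog -= out
    return sent


arrivals = [5, 0, 0, 3, 0]  # units arriving per tick (made-up numbers)
print(police(arrivals, 2))  # ([2, 0, 0, 2, 0], 4)
print(shape(arrivals, 2))   # [2, 2, 1, 2, 1]
```

Both outputs respect the rate limit of 2 per tick, but the policer loses 4 units while the shaper delivers all 8 at the cost of added delay.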
Policers
• 3-levels of hierarchical policing
− Up to 3 policers per frame: Queue, Port, and VCAP IS2 policers
− MEF-compliant DLB policers
• A total of 163 DLB policers
− 88 queue policers
Eight policers per port
− 11 port policers
One policer for each port
− 64 VCAP IS2 policers
• Four global storm policers for all the ingress traffic
• Erroneous frames, pause frames or control frames are not presented to policers
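The policer counts above add up as 11 ports x 8 queue policers + 11 port policers + 64 VCAP IS2 policers = 163. A dual leaky bucket (DLB) policer in the MEF style tracks two buckets, a committed one (CIR/CBS) and an excess one (EIR/EBS), and colors each frame green, yellow, or red. The Python sketch below is the generic color-blind form without a coupling flag; the slides do not describe the hardware's exact mode, so the class name, the units (bytes and seconds), and the numbers are assumptions for illustration.

```python
# Consistency check on the slide's counts: 11 ports, 8 queues each.
assert 11 * 8 + 11 + 64 == 163


class DlbPolicer:
    """Color-blind two-rate policer sketch (no coupling between buckets)."""

    def __init__(self, cir, cbs, eir, ebs):
        self.cir, self.cbs = cir, cbs  # committed rate (bytes/s), burst (bytes)
        self.eir, self.ebs = eir, ebs  # excess rate (bytes/s), burst (bytes)
        self.bc, self.be = cbs, ebs    # both buckets start full
        self.last = 0.0

    def color(self, now, length):
        """Refill both buckets for the elapsed time, then color the frame."""
        elapsed = now - self.last
        self.last = now
        self.bc = min(self.cbs, self.bc + elapsed * self.cir)
        self.be = min(self.ebs, self.be + elapsed * self.eir)
        if length <= self.bc:
            self.bc -= length
            return "green"
        if length <= self.be:
            self.be -= length
            return "yellow"
        return "red"


policer = DlbPolicer(cir=1000, cbs=1500, eir=1000, ebs=1500)
burst = [policer.color(0.0, 1000) for _ in range(3)]
later = policer.color(1.0, 1000)
print(burst, later)  # ['green', 'yellow', 'red'] green
```

With up to three policers per frame (queue, port, and IS2), a frame would be checked against several such instances; how the hardware combines their results is not covered by the slides.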
Layer 2 Forwarding
• The switch has an 8K-entry MAC table and a 4K-entry VLAN table
• L2 forwarding is based on:
− VLAN classification
− Security enforcement (result of IS2)
− MAC addresses
− Learning (disabled/unsecure/secure)
− Link aggregation
− Mirroring
Agenda
• Overview
• Switch Functions
− Block Diagram
− Forward Frames
− Learning
MAC Addresses
VLAN
Multicast
− Avoid loops
− System Interface
• Software
• Conclusion
Learning: MAC Addresses
• 8,192 MAC addresses
• 4,096 VLANs (IEEE 802.1Q)
• Wire speed hardware based learning
• Per-port CPU-based learning with option for secure CPU-based learning
− Learning can also be disabled
− CPU can add entries in the MAC table
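Source-address learning and destination lookup can be sketched with a small table keyed on VLAN and MAC address. This Python model is illustrative only: flooding unknown destinations to the other VLAN member ports is standard bridge behavior rather than something the slides spell out, and refusing new entries when the table is full is an assumed policy. `MacTable` and its methods are hypothetical names.

```python
class MacTable:
    """Toy model of a learning bridge's MAC table."""

    def __init__(self, capacity=8192):  # 8,192 entries, as on the slide
        self.capacity = capacity
        self.entries = {}  # (vid, mac) -> port

    def learn(self, vid, smac, port):
        """Record or refresh the port on which `smac` was seen."""
        key = (vid, smac)
        if key in self.entries or len(self.entries) < self.capacity:
            self.entries[key] = port
            return True
        return False  # assumed policy: table full, entry not learned

    def forward(self, vid, dmac, ingress_port, vlan_ports):
        """Return the list of egress ports for a frame."""
        port = self.entries.get((vid, dmac))
        if port is None:
            # Unknown destination: flood to all other ports in the VLAN.
            return sorted(p for p in vlan_ports if p != ingress_port)
        # Known destination: never send back out of the ingress port.
        return [] if port == ingress_port else [port]


table = MacTable()
members = {0, 1, 2}
print(table.forward(10, "00:11:22:33:44:55", 0, members))  # [1, 2]
table.learn(10, "00:11:22:33:44:55", 2)
print(table.forward(10, "00:11:22:33:44:55", 0, members))  # [2]
```

The `learn` method stands in for both hardware learning and CPU-added entries; in CPU-based learning the same update would be issued by software after it inspects the frame.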
Learning: VLAN
• Independent VLAN learning
− In independent VLAN learning, MAC addresses are learned separately for each VLAN
• Shared VLAN learning
− A MAC table entry is shared among multiple VLANs
• Provider Bridging (VLAN Q-in-Q) support (IEEE 802.1ad)
− Choice between using inner or outer VLAN tags
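The two learning modes differ only in how the MAC table key is formed. In the Python sketch below, independent learning keys on the VLAN itself, while shared learning first maps the VLAN to a filter identifier (FID) so that several VLANs land on the same entry. The function name `table_key` and the VLAN-to-FID mapping are hypothetical.

```python
def table_key(mac, vid, vid_to_fid=None):
    """Build the MAC table key.

    Independent VLAN learning (vid_to_fid is None): the VLAN ID is the key.
    Shared VLAN learning: VLANs that map to the same FID share entries.
    """
    fid = vid if vid_to_fid is None else vid_to_fid.get(vid, vid)
    return (fid, mac)


mac = "00:11:22:33:44:55"

# Independent learning: an entry learned on VLAN 10 is invisible on VLAN 20.
ivl_table = {table_key(mac, 10): 1}
print(table_key(mac, 20) in ivl_table)  # False

# Shared learning: VLANs 10 and 20 both map to FID 1 (made-up mapping).
shared = {10: 1, 20: 1}
svl_table = {table_key(mac, 10, shared): 1}
print(table_key(mac, 20, shared) in svl_table)  # True
```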
Learning: Multicast
• Up to 8,192 multicast groups
• Internet Group Management Protocol (IGMPv2/v3) support
• Multicast Listener Discovery (MLD) support
• Multicast Learning
− IGMP and MLD frames are copied to CPU
− CPU can create entries of multicast addresses in MAC table
− Multicast addresses in the MAC table do not age
− Multicast frames with unknown addresses are forwarded to all the ports
Agenda
• Overview
• Switch Functions
− Block Diagram
− Forward Frames
− Learning
− Avoid loops
Loop Problems
Spanning Tree Protocol
− System Interface
• Software
• Conclusion
Spanning Tree Protocol
• IEEE 802.1D standardized the Spanning Tree Protocol (STP)
• Cisco introduced Per-VLAN Spanning Tree (PVST) and Per-VLAN Spanning Tree Plus (PVST+)
• The IEEE defined the Rapid Spanning Tree Protocol (RSTP) in IEEE 802.1w and the Multiple Spanning Tree Protocol (MSTP) in IEEE 802.1s (MSTP was later merged into IEEE 802.1Q-2005)
STP Evolution
• Rapid Spanning Tree Protocol (RSTP)
− While STP takes 30-50 seconds to respond to a topology change, RSTP does it in a few seconds (typically 2-6 seconds)
− RSTP added two new port roles
Alternate port: An alternate path to the root bridge
Backup port: A backup/redundant path to a segment where another bridge port already connects
• Multiple Spanning Tree Protocol (MSTP)
− MSTP configures a separate Spanning Tree for each VLAN group
− Balances port utilization
STP Support
• BPDUs are terminated by the switch core
• The switch stack running on the e5500 core is responsible for implementing the protocol
• The switch supports
− Redirecting BPDU frames to CPU
− Configuring ports as:
State per VLAN                       BPDU Reception   BPDU Generation   Frame Forwarding   SMAC Learning
Discarding                           Yes              Yes               No                 No
Learning (not supported per VLAN)    Yes              Yes               No                 Yes
Forwarding                           Yes              Yes               Yes                Yes
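The port-state table above maps directly onto a lookup structure that a switch stack could consult per frame. The following Python sketch encodes the three rows and derives what happens to a received frame; the dictionary layout, the action names, and the `handle_frame` function are hypothetical, and whether source addresses are learned from BPDUs is left out because the slides do not say.

```python
# Rows of the port-state table, one dict per state.
PORT_STATES = {
    "discarding": {"bpdu_rx": True, "bpdu_tx": True, "forward": False, "learn": False},
    "learning":   {"bpdu_rx": True, "bpdu_tx": True, "forward": False, "learn": True},
    "forwarding": {"bpdu_rx": True, "bpdu_tx": True, "forward": True,  "learn": True},
}


def handle_frame(state, is_bpdu):
    """Return the list of actions for a frame received on a port in `state`."""
    caps = PORT_STATES[state]
    if is_bpdu:
        # BPDUs are terminated by the switch and redirected to the CPU.
        return ["redirect_to_cpu"] if caps["bpdu_rx"] else []
    actions = []
    if caps["learn"]:
        actions.append("learn_smac")
    if caps["forward"]:
        actions.append("forward")
    return actions


for state in PORT_STATES:
    print(state, handle_frame(state, is_bpdu=False), handle_frame(state, is_bpdu=True))
```

Because BPDU reception is allowed in every state, the protocol software on the e5500 keeps receiving topology information even on ports that are blocked for data traffic.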
Agenda
• Overview
• Switch Functions
− Block Diagram
− Forward Frames
− Learning
− Avoid loops
− System Interface
• Software
• Conclusion
CPU Interface Block
[Figure: switch block diagram, repeated for the CPU Interface block: MIIM controller, register access, and CPU port module (port #10) with CPU frame extraction and rejection. External connections: system bus, MIIM, control signals, system clock (156MHz), system reset. Callout: CPU interface.]
System Interface
• System bus interface (32b)
− Switch register access
• MIIM/MDIO master controller
− Connects to the TBI SerDes PHY
− For external PHYs, the MIIM interface of the FMan should be used
• Three control signals per port
− Link status
− Next page
− Autoneg status
[Figure: CPU interface detail showing the MIIM controller, register access, and CPU port module (port #10, CPU frame extraction and rejection), with the system bus, MIIM, control signals, system clock (156MHz), system reset, and ATPG enable.]
MAC Interfaces
• Port 0-7 : Eight 1G ports
− 10/100/1000 Mbps in full-duplex mode and 10/100 Mbps in half-duplex mode
− SerDes supports 6x 1G ports or 2x QSGMII ports
• Port 8-9: Two 2.5G ports
− These ports are connected to FMAN MAC
• Port 10: One internal CPU Port
− This is a logical port to be used as management interface
− CPU port is through the CPU extraction queue
[Figure: FMan and L2 switch block diagram (FMan with parse, classify, distribute, QMan/BMan/fabric interfaces, IEEE 1588v2, MACSec; L2 switch with 8K MACs, 4K VLANs, TCAM 1K, RMON counters, management interface, eight 1G MACs; 5GHz SerDes with SGMII and QSGMII lanes). On this slide the MACs between the FMan and the switch are labeled 1G/2.5G and 2.5G.]
SW Background
• 2 different stacks/applications
− L2 control stack (Switch)
− L3/L4 network stack (Router)
• Legacy operation:
− Separate SoC – dedicated cores.
− Dedicated devices, drivers, even operating systems.
• T1040 operation:
− Share cores using affinity or partitions (AMP)
− Dedicated devices/portals for L2 and L3/L4 traffic.
− Clean separation of control and data-path traffic.
− Clean separation of configuration of L2 (switch driver) and L3/L4 traffic (network stack).
[Figure: software partitioning, two panels]
• T1040: Switch + Router SoC (Option 1): PPC Core 1 and PPC Core 2 on one device, with the L2 control stack and switch driver alongside the L3/L4 network stack and Ethernet driver; the L2 switch and the DPAA are reached through registers and portals
• Legacy Router SoC + External Switch: a MIPS core and a PPC core, with the same software blocks (L2 control stack and switch driver; L3/L4 network stack and Ethernet driver), registers, and portals
• Flows marked in both panels across the Eth ports: L2 control traffic, L2 data traffic, L3/L4 traffic
L2 Switch API
L2-switch SW - T1040 – What we offer
[Figure: software stack options across user space, kernel (GPL), and hardware, for Linux and non-Linux environments. Labeled blocks:]
• Hardware: L2 switch, FM, QM/BM
• Drivers and libraries: L2 switch driver, Eth/SEC driver, FM-Lib, QM-Lib, BM-Lib, SEC-Lib, PME-Lib, FLib, ASF
• Linux network stack (L3/L4) and Linux L3, L4, SEC control apps
• L2 stack options: customer L2 stack with customer management, VTSS SMBStaX with VTSS management API (GPL), GPL L2 stack
• Switch configuration over JSON/RPC
• Traffic paths: LAN-LAN, LAN-WAN, L2 control
Summary
• The T1040 includes an integrated Gigabit Ethernet switch that supports wire-speed switching for all packet sizes
− The enterprise-class switch supports features such as VLAN, QoS, STP, and IGMP
• Variety of Switch SW solutions to suit different customers
− Switch API
− Vitesse Stack
Introducing the QorIQ LS2 Family
Breakthrough, software-defined approach to advance the world’s new virtualized networks
• New, high-performance architecture built with ease-of-use in mind
− Groundbreaking, flexible architecture that abstracts hardware complexity and enables customers to focus their resources on innovation at the application level
• Optimized for software-defined networking applications
− Balanced integration of CPU performance with network I/O and C-programmable datapath acceleration that is right-sized (power/performance/cost) to deliver advanced SoC technology for the SDN era
• Extending the industry’s broadest portfolio of 64-bit multicore SoCs
− Built on the ARM® Cortex®-A57 architecture with integrated L2 switch enabling interconnect and peripherals to provide a complete system-on-chip solution
QorIQ LS2 Family Key Features
Unprecedented performance and ease of use for smarter, more capable networks
High performance cores with leading interconnect and memory bandwidth
• 8x ARM Cortex-A57 cores, 2.0GHz, 4MB L2 cache, w/ Neon SIMD
• 1MB L3 platform cache w/ECC
• 2x 64b DDR4 up to 2.4GT/s
A high performance datapath designed with software developers in mind
• New datapath hardware and abstracted acceleration that is called via standard Linux objects
• 40 Gbps packet processing performance with 20Gbps acceleration (crypto, pattern match/regex, data compression)
• Management complex provides all init/setup/teardown tasks
Leading network I/O integration
• 8x1/10GbE + 8x1G, MACSec on up to 4x 1/10GbE
• Integrated L2 switching capability for cost savings
• 4 PCIe Gen3 controllers, 1 with SR-IOV support
• 2 x SATA 3.0, 2 x USB 3.0 with PHY
[Figure labels: SDN/NFV Switching, Data Center, Wireless Access]
See the LS2 Family First in the Tech Lab!
4 new demos built on QorIQ LS2 processors:
Performance Analysis Made Easy
Leave the Packet Processing To Us
Combining Ease of Use with Performance
Tools for Every Step of Your Design
© 2014 Freescale Semiconductor, Inc. | External Use
www.Freescale.com