
External Use

A Deep Dive on the QorIQ T1040 L2 Switch

FTF-NET-F0007

Feb. 21, 2014

Suchit Lepcha | Application Engineering Manager

Agenda

• Overview

• Switch Functions

• Software

• Conclusion


T1040 Architecture

Processor

• 4x e5500, 64b, up to 1.4GHz

• Each with 256KB backside L2 cache

• 256KB Shared Platform Cache w/ECC

• Supports up to 64GB addressability (36-bit physical addressing)

Memory SubSystem

• 32/64b DDR3L/4 Controller up to 1600MT/s

Cygnus Switch Fabric

High Speed Serial IO

• 4x PCIe Gen2 Controllers

• 2x SATA 2.0, 3Gb/s

• 2x USB 2.0 with PHY

Network IO

• FMan packet Parse/Classify/Distribute

• Lossless Flow Control, IEEE 1588

• Up to 4x 10/100/1000 Ethernet Controllers

• 8-Port Gigabit Ethernet Switch

• QUICC Engine

− HDLC, 2x TDM

• Green Energy Operation

− Fanless operation quad-core 1.2GHz

− Packet lossless deepsleep

− Programmable wake-on-packet

− Wake-on-timer/GPIO/USB/IRQ

Datapath Acceleration

• SEC: crypto acceleration

• PME: RegEx pattern matcher

Device

• 28HPM Process

• 780-pin 3-2-3 C4 FC package

• 23x23mm, 0.8mm pitch

Power targets

• Enable convection-cooled system design

[Figure: T1040 block diagram. Power Architecture e5500 cores (32 KB I-cache, 32 KB D-cache, 256 KB backside L2 cache) and a 256KB platform cache on the CoreNet Coherency Fabric, with PAMUs and the Peripheral Access Management Unit; 32/64-bit DDR3L/4 memory controller; Security Fuse Processor, Security Monitor, 16b IFC, Power Management, SD/MMC, 2x DUART, 2x I2C, SPI, GPIO; Queue Manager, Buffer Manager, Pattern Match Engine 2.0, Security 5.x (XoR, CRC); Parse, Classify, Distribute with 1G MACs; 8-port switch; QUICC Engine with TDM/HDLC; PCIe, SATA 2.0, 2x USB 2.0 w/PHY, 2x DMA, DIU; 8-lane 5GHz SerDes; Real Time Debug (Watchpoint Cross Trigger, Perf Monitor, CoreNet Trace).]

L2 Switch Summary

• Fully non-blocking wire speed Ethernet switch with WRED

− 8x 1G user facing ports

− 2Mbit packet memory

− 8k MAC addresses

− 4k VLAN support

− Jumbo frame support (10kB)

− 8x QoS, 8x Queues/Port
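
To give a feel for the scale of the shared packet memory, the short calculation below estimates how many frames it could hold. It is only a back-of-the-envelope sketch: it assumes that 2Mbit means 2 × 2^20 bits and that a 10kB jumbo frame is 10 × 1024 bytes, and it ignores any per-frame overhead or cell granularity inside the switch. None of these assumptions come from the slide.

```python
# Back-of-the-envelope sizing of the shared packet memory.
# Assumptions (not from the slide): 1 Mbit = 2**20 bits, 10 kB = 10 * 1024 bytes,
# and no per-frame bookkeeping overhead.
PACKET_MEMORY_BITS = 2 * 2**20
JUMBO_FRAME_BYTES = 10 * 1024
MIN_FRAME_BYTES = 64  # minimum Ethernet frame size

packet_memory_bytes = PACKET_MEMORY_BITS // 8
jumbo_frames_buffered = packet_memory_bytes // JUMBO_FRAME_BYTES
min_frames_buffered = packet_memory_bytes // MIN_FRAME_BYTES

print(packet_memory_bytes, jumbo_frames_buffered, min_frames_buffered)
```

Under these assumptions the buffer holds only a few dozen jumbo frames in total across all ports, which is one reason the flow control and shared buffer management features matter.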


T1040: Gigabit Ethernet Switch

• Advanced Features

− Priority flow control - lossless

− Lower latency and shared buffer management

− Advanced classification, shaping and policing

• Power savings

− With support for latest standards including IEEE 802.3az Energy Efficient Ethernet (EEE)

• Cost savings

− Through switch integration, low-pin count QSGMII connectivity and port count / cost optimization

• Increased ROI - Lower TTM and high re-use

− Integrated solution kit with software reuse potential

• Support for Full featured L2 software stacks

[Figure: T1040 network I/O. FMan (Parse, Classify, Distribute; QMan, BMan, and fabric interfaces; IEEE 1588v2; MACSec) with 1G and 2.5G MACs, linked through 2.5G MACs to the L2 switch (eight 1G MACs, 8K MACs, 4K VLANs, 1K TCAM, RMON counters, management interface, IEEE 1588v2); 5GHz SerDes carrying QSGMII and SGMII lanes to external quad PHYs.]

Packet Flow

[Figure: the FMan and L2 switch diagram repeated, with the e5500 core added and three flows marked: control packets, packet forwarding, and WAN traffic.]

Generic Enterprise Router Features

• Higher QoS – benefit – lossless behavior

− 8 queues/port

− PFC (Priority based Flow Control)

− Sophisticated classification

• Complex classification requirements - benefit – treat user traffic differently and offload the processor

• Higher ACL requirements - benefit – redirect/drop/deny access

• Delivering all of these in low power

[Figure: positioning chart. Axes: bandwidth, cost and power; features. Segments: Enterprise Gateway, SME, Enterprise Router.]

Agenda

• Overview

• Switch Functions

− Block Diagram

− Forward Frames

− Learning

− Avoid loops

− System Interface

• Software

• Conclusion


Block Diagram

[Figure: switch block diagram, 10 switch ports: 8x 1GbE + 2x 1GbE/2.5GbE.]

• Port modules: MAC 1G port modules #0 to #7 and MAC 2.5G port modules #8 and #9, behind the port module interface

• Ingress processing: ingress statistics; IS1 TCAM (frame classification (QoS, VLAN), translation/remarking); IS2 TCAM (security enforcement, MAC/IP binding); DLB policers; L2 forwarding

• Shared queue system: shared memory pool (2Mbit), memory controller, shapers and schedulers

• Egress processing: egress statistics; ES0 TCAM; rewriter (VLAN translation, push/pop tags, DSCP remapping)

• CPU interface: MIIM controller, register access, CPU port module (port #10), CPU frame extraction and rejection, system bus

• Inputs: MIIM, control signals, system clock (156MHz), system reset

• Callouts: switch core; TCAM packet processing, ingress and egress; buffer memory; CPU interface

Agenda

• Overview

• Switch Functions

− Block Diagram

− Forward Frames

MAC Interface

Ingress Processing

Shared Queue System

Egress Processing

− Learning

− Avoid loops

− System Interface

• Software

• Conclusion


MAC Block

[Figure: the switch block diagram from the Block Diagram slide, repeated for the MAC port modules (#0 to #7 at 1G, #8 and #9 at 2.5G). 10 switch ports: 8x 1GbE + 2x 1GbE/2.5GbE. Inputs: MIIM, control signals, system clock (156MHz), system reset, ATPG enable.]

MAC Functions

• VLAN Tag aware frame size check

• Frame Check Sequence (FCS) check

• Pause frame identification

• Energy Efficient Ethernet (EEE) IEEE 802.3az


IEEE 802.3az

• Saves power during low data utilization periods

− Works in 100BASE-TX & 1000BASE-T speeds

− Additionally, the new 10BASE-Te mode reduces the 10Mbit transmit level from 5Vpp to 3.3V

• When both link partners support 802.3az:

− During auto-negotiation, the PHYs advertise their EEE idle capabilities

− Roughly 0%-60% of per-port power is saved on both systems, depending on link utilization in the PHY; 0%-35% is typical at the processor/switch/PHY level

− Actual measurements will need to be made for T1040 + F104 (QSGMII PHY)

• Backward compatible to support non-802.3az PHYs

− However, for 802.3az to save energy, both link partners must support 802.3az


Agenda

• Overview

• Switch Functions

− Block Diagram

− Forward Frames

MAC Interface

Ingress Processing

• Basic Classification

• Advanced Classification

• Policing

• L2 Forwarding

Shared Queue System

Egress Processing

− Learning

− Avoid loops

− System Interface

• Software

• Conclusion


Ingress Processing Block

[Figure: the switch block diagram from the Block Diagram slide, repeated for the ingress processing block: ingress statistics, IS1 TCAM, IS2 TCAM, DLB policers, and L2 forwarding. Callout: TCAM packet processing, ingress and egress.]

Basic and Advanced Frame Classification

[Figure: classification pipeline. Frame data passes through basic classification, then advanced classification (IS1 first, second, and third lookups).]

Basic Classification

• Frame acceptance: untagged, S-tagged, C-tagged, special frames

• VLAN: VLAN tag from frame, port VLAN

• QoS, DP, and DSCP: PCP from tag (inner or outer); DSCP from frame, trusted values only; remap/rewrite of DSCP; port default

• Aggregation code: L2-L4 frame data

• Outputs: discard, VLAN tag header, VLAN pop count, QoS class, DP level, DSCP value, aggregation code

Advanced Classification (IS1)

• Key: port mask; inner and outer VLAN tags; SMAC, DMAC; SIP, DIP; TCP/UDP ports; frame type, DSCP, range checkers

• Outputs: QoS class, DP level, classified DSCP, classified VLAN, VLAN pop count, custom lookup

Basic Classification

• Frame acceptance

− Valid VLAN tags

− Valid MAC addresses

• VLAN classification

− Untagged ports are part of the default VLAN

− Tagged ports are classified based on TCI (PCP, DEI, and VID) and TPID (C-tag or S-tag)

• QoS, DP, and DSCP

− Frames are colored green/yellow based on QoS and DP

• Aggregation Code

− Based on information from MAC/IP addresses and TCP/UDP port numbers

Untagged frame layout (sizes in bytes):

Preamble + SFD   Destination MAC   Source MAC   EtherType/Size   Payload   CRC/FCS   Interframe Gap
8                6                 6            2                n         4         12

Tagged frame layout (sizes in bytes):

Preamble + SFD   Destination MAC   Source MAC   802.1Q Header   EtherType/Size   Payload   CRC/FCS   Interframe Gap
8                6                 6            4               2                n         4         12

802.1Q header: TPID, followed by the TCI (sizes in bits):

PCP   DEI   VID
3     1     12
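
The tag layout above can be decoded with a few shifts and masks. The sketch below is illustrative only and is not taken from the switch driver: it assumes the frame buffer starts at the destination MAC (no preamble or SFD), and it uses the standard TPID values 0x8100 (C-tag) and 0x88A8 (S-tag), whereas the hardware may allow other S-tag TPIDs. The function name `parse_vlan_tag` is hypothetical.

```python
import struct

TPID_CTAG = 0x8100  # IEEE 802.1Q customer tag
TPID_STAG = 0x88A8  # IEEE 802.1ad service tag (assumed; may be configurable)

def parse_vlan_tag(frame: bytes):
    """Return (tag_type, pcp, dei, vid) for a tagged frame, or None if untagged."""
    if len(frame) < 16:
        return None
    # Bytes 12..15 follow the 6-byte destination and 6-byte source MAC.
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid == TPID_CTAG:
        tag_type = "C-tag"
    elif tpid == TPID_STAG:
        tag_type = "S-tag"
    else:
        return None  # EtherType is not a VLAN TPID: untagged frame
    pcp = tci >> 13          # top 3 bits: priority code point
    dei = (tci >> 12) & 0x1  # next bit: drop eligible indicator
    vid = tci & 0x0FFF       # low 12 bits: VLAN ID
    return tag_type, pcp, dei, vid

# Example frame: zeroed MACs, C-tag with PCP 5, DEI 1, VID 100, then EtherType.
frame = bytes(12) + struct.pack("!HH", TPID_CTAG, (5 << 13) | (1 << 12) | 100) + b"\x08\x00"
print(parse_vlan_tag(frame))
```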


Advanced Multi-stage Classification

• Three TCAMs with different purposes:

− IS1: L3-aware Ethernet classification

− IS2: Security handling (ACLs), other control protocols

− ES0: Egress handling (QoS, VLAN)

• TCAM sizes:

− IS1, IS2, ES0: entries depend on complexity of rules

• Classification Results

− QoS handling

− VLAN handling

− ACL actions
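
As a rough illustration of how a TCAM lookup produces a classification result, the sketch below models ternary matching in software. It is not the VCAP key format: the 16-bit key layout (upper 4 bits for the ingress port, lower 12 bits for the VID), the entries, and the action names are all hypothetical, and real hardware compares every entry in parallel rather than in a loop.

```python
from dataclasses import dataclass

@dataclass
class TcamEntry:
    value: int   # bit pattern to match
    mask: int    # 1 = compare this bit, 0 = don't care
    action: str

def tcam_lookup(entries, key):
    """Return the action of the first (highest-priority) matching entry."""
    for entry in entries:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return None

# Hypothetical 16-bit key: upper 4 bits = ingress port, lower 12 bits = VID.
entries = [
    TcamEntry(value=0x2064, mask=0xFFFF, action="deny"),    # port 2, VID 100 exactly
    TcamEntry(value=0x0064, mask=0x0FFF, action="police"),  # VID 100 on any port
    TcamEntry(value=0x0000, mask=0x0000, action="permit"),  # catch-all
]

print(tcam_lookup(entries, 0x2064), tcam_lookup(entries, 0x5064), tcam_lookup(entries, 0x5001))
```

Because the first match wins, rule order encodes priority, and wider rules cost no more lookups than narrow ones.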


IS1 Action

• Each IS1 lookup results in an action vector

• The following fields can be overwritten by the action:

− DSCP value

− QoS value

− DP value

− PAG value

− VID (VLAN ID)

− FID (Filter identifier)

− PCP/DEI

− Custom ACE Type: Custom lookup in IS2


Comprehensive Classification and Statistics

• Ingress classification with the following parameters:

− Mapping to policers

− Ingress statistics (bytes and frames)

Green/Yellow/Red Arrivals

Green/Yellow discards related to L2 forward and congestion avoidance (WRED)

• Egress TCAM lookup for per-port encapsulation and statistics

− Egress statistics (bytes and frames)

Green/Yellow Departures


IS2 Actions

• Permit

• Deny

• Police

• Redirect

• CPU copy

• Mirror


Policing & Shaping

• Supports policing of ingress and egress traffic

• Supports shaping of egress traffic

[Figure: plots against time illustrating policing and shaping.]

Policers

• 3 levels of hierarchical policing

− Up to 3 policers per frame: queue, port, and VCAP IS2 policers

− MEF-compliant DLB policers

• A total of 163 DLB policers

− 88 queue policers (eight policers per port)

− 11 port policers (one policer for each port)

− 64 VCAP IS2 policers

• Four global storm policers for all the ingress traffic

• Erroneous frames, pause frames, and control frames are not presented to the policers
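
The behavior of a two-bucket policer can be sketched in a few lines. The example below is a simplified, color-blind model in the style of a MEF bandwidth profile (committed rate and burst, excess rate and burst); it assumes DLB stands for dual leaky bucket, omits the MEF coupling flag, and is not the hardware's actual algorithm. The class name `DlbPolicer` and its parameters are hypothetical.

```python
class DlbPolicer:
    """Simplified color-blind two-bucket policer (illustrative only)."""

    def __init__(self, cir_bps, cbs_bytes, eir_bps, ebs_bytes):
        self.cir, self.cbs = cir_bps / 8.0, cbs_bytes   # committed rate in bytes/s, burst
        self.eir, self.ebs = eir_bps / 8.0, ebs_bytes   # excess rate in bytes/s, burst
        self.c_tokens, self.e_tokens = float(cbs_bytes), float(ebs_bytes)
        self.last = 0.0

    def classify(self, now, frame_len):
        # Refill both buckets for the elapsed time, capped at the burst sizes.
        elapsed = now - self.last
        self.last = now
        self.c_tokens = min(self.cbs, self.c_tokens + elapsed * self.cir)
        self.e_tokens = min(self.ebs, self.e_tokens + elapsed * self.eir)
        if frame_len <= self.c_tokens:
            self.c_tokens -= frame_len
            return "green"    # within the committed profile
        if frame_len <= self.e_tokens:
            self.e_tokens -= frame_len
            return "yellow"   # within the excess profile
        return "red"          # out of profile

# Three back-to-back 1000-byte frames against 1500-byte bursts.
policer = DlbPolicer(cir_bps=8_000, cbs_bytes=1500, eir_bps=8_000, ebs_bytes=1500)
colors = [policer.classify(now=0.0, frame_len=1000) for _ in range(3)]
print(colors)
```

The first frame fits the committed bucket, the second spills into the excess bucket, and the third exceeds both, which matches the green/yellow/red arrival counters described earlier.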


Layer 2 Forwarding

• The switch has an 8K-entry MAC table and a 4K-entry VLAN table

• L2 forwarding is based on:

− VLAN classification

− Security enforcement (result of IS2)

− MAC addresses

− Learning (disabled/unsecure/secure)

− Link aggregation

− Mirroring


Agenda

• Overview

• Switch Functions

− Block Diagram

− Forward Frames

− Learning

MAC Addresses

VLAN

Multicast

− Avoid loops

− System Interface

• Software

• Conclusion


Learning: MAC Addresses

• 8,192 MAC addresses

• 4,096 VLANs (IEEE 802.1Q)

• Wire speed hardware based learning

• Per-port CPU-based learning with option for secure CPU-based learning

− Learning can also be disabled

− CPU can add entries in the MAC table


Learning: VLAN

• Independent VLAN learning

− MAC addresses are learned separately for each VLAN

• Shared VLAN learning

− A MAC table entry is shared among multiple VLANs

• Provider Bridging (VLAN Q-in-Q) support (IEEE 802.1ad)

− Choice between using inner or outer VLAN tags
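
The difference between independent and shared VLAN learning comes down to the key used for the MAC table. The toy model below keys entries by (FID, MAC): with no mapping configured each VID acts as its own filtering ID, while a shared mapping sends several VIDs to one FID. The `MacTable` class and the `vid_to_fid` mapping are hypothetical illustrations, not the switch's register interface.

```python
class MacTable:
    """Toy MAC table keyed by (FID, MAC); vid_to_fid is a hypothetical config."""

    def __init__(self, vid_to_fid=None):
        # Independent learning: each VID is its own filtering ID (FID).
        # Shared learning: several VIDs map to one FID.
        self.vid_to_fid = vid_to_fid or {}
        self.entries = {}

    def _fid(self, vid):
        return self.vid_to_fid.get(vid, vid)

    def learn(self, vid, smac, port):
        self.entries[(self._fid(vid), smac)] = port

    def lookup(self, vid, dmac):
        # None means the destination is unknown and the frame would be flooded.
        return self.entries.get((self._fid(vid), dmac))

MAC = "00:11:22:33:44:55"

ivl = MacTable()                            # independent VLAN learning
ivl.learn(vid=10, smac=MAC, port=3)

svl = MacTable(vid_to_fid={10: 1, 20: 1})   # VLANs 10 and 20 share FID 1
svl.learn(vid=10, smac=MAC, port=3)

print(ivl.lookup(20, MAC), svl.lookup(20, MAC))
```

With independent learning the address learned on VLAN 10 is unknown on VLAN 20; with shared learning the same entry serves both.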


Learning: Multicast

• Up to 8,192 multicast groups

• Internet Group Management Protocol (IGMPv2/v3) support

• Multicast Listener Discovery (MLD) support

• Multicast Learning

− IGMP and MLD frames are copied to CPU

− CPU can create entries of multicast addresses in MAC table

− Multicast addresses in the MAC table do not age

− Multicast frames with unknown addresses are forwarded to all the ports
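
The forwarding rule for multicast can be sketched as a table lookup with a flood fallback. This is a minimal model under stated assumptions: only the eight 1G front ports are considered, the group table contents are made up, and the function name `egress_ports` is hypothetical.

```python
ALL_PORTS = frozenset(range(8))  # assumption: the eight 1G front ports only

def egress_ports(group_table, dmac, ingress_port):
    """Known groups go to their member ports; unknown groups are flooded."""
    members = group_table.get(dmac, ALL_PORTS)
    return set(members) - {ingress_port}  # never send back out the ingress port

# Entry installed by the CPU after snooping an IGMP report (hypothetical values).
group_table = {"01:00:5e:01:02:03": {1, 4}}

print(egress_ports(group_table, "01:00:5e:01:02:03", 1))
print(egress_ports(group_table, "01:00:5e:7f:00:01", 0))
```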


Agenda

• Overview

• Switch Functions

− Block Diagram

− Forward Frames

− Learning

− Avoid loops

Loop Problems

Spanning Tree Protocol

− System Interface

• Software

• Conclusion


Spanning Tree Protocol

• IEEE 802.1D standardized the Spanning Tree Protocol (STP)

• Cisco introduced Per-VLAN Spanning Tree (PVST) and Per-VLAN Spanning Tree Plus (PVST+)

• The IEEE defined Rapid Spanning Tree Protocol (RSTP) in IEEE 802.1w and Multiple Spanning Tree Protocol (MSTP) in IEEE 802.1s (later merged into IEEE 802.1Q-2005)


STP Evolution

• Rapid Spanning Tree Protocol (RSTP)

− While STP takes 30-50 seconds to respond to a topology change, RSTP does so in a few seconds (typically 2-6 seconds)

− RSTP added two new port roles

Alternate port: An alternate path to the root bridge

Backup port: A backup/redundant path to a segment where another bridge port already connects

• Multiple Spanning Tree Protocol (MSTP)

− MSTP configures a separate Spanning Tree for each VLAN group

− Balances port utilization


STP Support

• BPDUs are terminated by the switch core

• The switch stack running on the e5500 cores is responsible for implementing the protocol

• The switch supports

− Redirecting BPDU frames to CPU

− Configuring each port in one of the following states:

State per VLAN                      BPDU Reception   BPDU Generation   Frame Forwarding   SMAC Learning
Discarding                          Yes              Yes               No                 No
Learning (not supported per VLAN)   Yes              Yes               No                 Yes
Forwarding                          Yes              Yes               Yes                Yes
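
The state table above maps naturally onto a small lookup structure that a control stack could consult before programming a port. The sketch below only encodes the table; the names `PortState`, `CAPABILITIES`, `may_forward`, and `may_learn` are hypothetical and do not correspond to any real driver API.

```python
from enum import Enum

class PortState(Enum):
    DISCARDING = "discarding"
    LEARNING = "learning"
    FORWARDING = "forwarding"

# Columns: BPDU reception, BPDU generation, frame forwarding, SMAC learning.
CAPABILITIES = {
    PortState.DISCARDING: (True, True, False, False),
    PortState.LEARNING:   (True, True, False, True),
    PortState.FORWARDING: (True, True, True, True),
}

def may_forward(state: PortState) -> bool:
    """True if data frames are forwarded in this state."""
    return CAPABILITIES[state][2]

def may_learn(state: PortState) -> bool:
    """True if source MAC addresses are learned in this state."""
    return CAPABILITIES[state][3]

for state in PortState:
    print(state.value, may_forward(state), may_learn(state))
```

BPDUs are received and generated in every state, which is what lets the protocol keep running on ports that are not yet forwarding traffic.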


Agenda

• Overview

• Switch Functions

− Block Diagram

− Forward Frames

− Learning

− Avoid loops

− System Interface

• Software

• Conclusion


CPU Interface Block

[Figure: the switch block diagram from the Block Diagram slide, repeated for the CPU interface: MIIM controller, register access, CPU port module (port #10), CPU frame extraction and rejection, and system bus.]

System Interface

• System bus interface (32b)

− Switch register access

• MIIM/MDIO master controller

− Connects to the TBI SerDes PHY

− For external PHYs, the MIIM interface of FMan should be used

• Three control signals per port

− Link status

− Next page

− Autoneg status

[Figure: CPU interface portion of the switch block diagram: MIIM controller, register access, CPU port module (port #10), CPU frame extraction and rejection, system bus. Inputs: MIIM, control signals, system clock (156MHz), system reset, ATPG enable.]

MAC Interfaces

• Port 0-7 : Eight 1G ports

− 10/100/1000 Mbps in full-duplex mode and 10/100 Mbps in half-duplex mode

− SerDes supports 6x 1G ports or 2x QSGMII ports

• Port 8-9: Two 2.5G ports

− These ports are connected to FMAN MAC

• Port 10: One internal CPU Port

− This is a logical port to be used as management interface

− CPU port is through the CPU extraction queue

[Figure: the FMan and L2 switch diagram repeated, showing the eight 1G switch MACs on the SGMII/QSGMII SerDes lanes and the 1G/2.5G MACs connecting the switch to FMan.]

Agenda

• Overview

• Switch Functions

• Software

• Conclusion


SW Background

• 2 different stacks/applications

− L2 control stack (Switch)

− L3/L4 network stack (Router)

• Legacy operation:

− Separate SoC – dedicated cores.

− Dedicated devices, drivers, even operating systems.

• T1040 operation:

− Share cores using affinity or partitions (AMP)

− Dedicated devices/portals for L2 and L3/L4 traffic.

− Clean separation of control and data-path traffic.

− Clean separation of configuration of L2 (switch driver) and L3/L4 traffic (network stack).

[Figure: two software architectures side by side. Legacy: router SoC plus external switch, with a MIPS core and a PPC core. T1040: Switch + Router SoC (Option 1), with PPC core 1 and PPC core 2. Each shows an L2 control stack over a switch driver (L2 switch, accessed through registers) and an L3/L4 network stack over an Ethernet driver (DPAA, accessed through portals), with separate paths for L2 control traffic, L2 data traffic, and L3/L4 traffic.]

L2-switch SW - T1040 – What we offer

[Figure: software stack options on Linux and non-Linux systems. Labels: hardware (L2 switch, FM, QM/BM); kernel, GPL (L2 switch driver, Eth/SEC driver, Linux network stack (L3/L4), ASF); user space (L2 Switch API, FM-Lib, QM-Lib, BM-Lib, SEC-Lib, PME-Lib, FLib, Linux L3/L4/SEC control apps); L2 stacks (customer L2 stack, VTSS SMBStaX, GPL L2 stack); management (customer management, VTSS management API (GPL)); switch configuration over JSON/RPC; traffic types (LAN-LAN, LAN-WAN, L2 control).]

Summary

• The T1040 includes an integrated Gigabit Ethernet switch that supports wire-speed switching for all packet sizes

− The enterprise-class switch supports features such as VLAN, QoS, STP, and IGMP

• Variety of Switch SW solutions to suit different customers

− Switch API

− Vitesse Stack


Introducing The QorIQ LS2 Family

Breakthrough, software-defined approach to advance the world’s new virtualized networks

New, high-performance architecture built with ease-of-use in mind

Groundbreaking, flexible architecture that abstracts hardware complexity and enables customers to focus their resources on innovation at the application level

Optimized for software-defined networking applications

Balanced integration of CPU performance with network I/O and C-programmable datapath acceleration that is right-sized (power/performance/cost) to deliver advanced SoC technology for the SDN era

Extending the industry’s broadest portfolio of 64-bit multicore SoCs

Built on the ARM® Cortex®-A57 architecture with integrated L2 switch enabling interconnect and peripherals to provide a complete system-on-chip solution


QorIQ LS2 Family Key Features

Unprecedented performance and ease of use for smarter, more capable networks

High performance cores with leading interconnect and memory bandwidth

• 8x ARM Cortex-A57 cores, 2.0GHz, 4MB L2 cache, w/ Neon SIMD

• 1MB L3 platform cache w/ECC

• 2x 64b DDR4 up to 2.4GT/s

A high performance datapath designed with software developers in mind

• New datapath hardware and abstracted acceleration that is called via standard Linux objects

• 40 Gbps packet processing performance with 20Gbps acceleration (crypto, Pattern Match/RegEx, Data Compression)

• Management complex provides all init/setup/teardown tasks

Leading network I/O integration

• 8x1/10GbE + 8x1G, MACSec on up to 4x 1/10GbE

• Integrated L2 switching capability for cost savings

• 4 PCIe Gen3 controllers, 1 with SR-IOV support

• 2 x SATA 3.0, 2 x USB 3.0 with PHY

SDN/NFV

Switching

Data

Center

Wireless

Access


See the LS2 Family First in the Tech Lab!

4 new demos built on QorIQ LS2 processors:

Performance Analysis Made Easy

Leave the Packet Processing To Us

Combining Ease of Use with Performance

Tools for Every Step of Your Design