advanced switching overview
TRANSCRIPT
-
8/4/2019 Advanced Switching Overview
1/116
Copyright 2004, PCI-SIG, All Rights Reserved 1
Advanced Switching Overview
Seth Zirin, Principal Engineer, Intel Corporation, ASI-SIG FMS WG Chair
Joe Bennett, Principal Engineer, Intel Corporation, ASI-SIG PI-8 WG Chair
-
PCI-SIG Developers Conference
Advanced Switching Overview
ASI Technical Introduction
PI-8 Technical Review
[Figure: PCI Express* and Advanced Switching topologies: Star, Dual Star, Mesh]
*Other names and brands may be claimed as the property of others
-
ASI Technical Introduction
Seth Zirin, Principal Engineer, Intel Corporation, ASI-SIG FMS Workgroup Chair
-
Agenda
Introduction
Core AS Architecture
Protocol Interfaces
Configuration Structures
Software & Management
-
PCI Express* Layer Reuse
Layer         PCI Express                         Advanced Switching
Physical      2.5 Gbps Copper (shared by both stacks)
Data Link     Point-to-Point Data Link (shared by both stacks)
Transaction   PCI Express Protocol                AS Protocol
Software      PCI PnP Model (init, enum, conf)    AS Fabric Model (init, enum, conf)
              PCI* Software                       AS Software
Properties listed beside the stack: Serial, Dual-Simplex; Reliable Transport; Packet Based; Any Protocol; Any Topology; Peer-to-Peer / Multicast; Quality of Service
*Other names and brands may be claimed as the property of others
-
Protocol Encapsulation
Core Management Encapsulations (PI-4, PI-5)
  Device Configuration
  Events
Chained Encapsulations (PI-0, PI-1, PI-2)
  Multicast
  Flow Labeling
  SAR Functions
Optional Encapsulations (PI-8, SLS, SQ, SDT)
  PCI Express Tunneling
  Load/Store
  Push-Pull Queuing/Messaging
  Socket Data Transport
Extensible Via Vendor/End-User Encapsulations
  Hardware or Software Implementations
-
Path-Based Routing
[Figure: AS fabric of four switches (SW1-SW4) connecting endpoints EP1-EP4]
Source  Destination  Device Path      Turn List
EP1     EP2          SW1              2
EP1     EP3          SW1, SW2, SW4    0, 3, 0
EP1     EP3          SW1, SW3, SW4    1, 3, 1
-
AS Endpoint Variations
[Figure: endpoint stack variations. All share the Physical Layer and the Reliable Data Link Layer. Above them sit a PCI Express Transaction Layer and an AS Transaction Layer carrying Core PIs, PI-8, SLS, SQ and Other PIs. Arbitration and Host Interface blocks connect the stacks to PCI Device Drivers and Advanced Switching Device Drivers]
-
Agenda
Introduction
Core AS Architecture
Protocol Interfaces
Configuration Structures
Software & Management
-
Link Layer Enhancements
Modified PCI Express Link State Model
  Protected State for Fabric Access Security
Credit-Based Flow Control
  Single Credit Category for Headers & Payloads
  Single Credit Category for Completions & Writes
  Credit Denomination is 64 Bytes (Not 32 Bytes)
AS Leverages PHY & Link of PCI Express
-
AS Packet Framing
[Figure: packet framing by layer]
PHY Layer: Frame symbols at both ends of the packet
Link Layer: SEQ# and L-CRC
Transaction Layer: AS Header, Payload (0-2KB), P-CRC
-
Transaction Layer Protocol
Three General Classes of Protocol in AS
  Native Protocols
    Management, Congestion Control, Segmentation/Reassembly, etc.
  Encapsulated Protocols
    e.g., PCI Express, Ethernet, etc.
  Proprietary Protocols
    Vendor-Provided for Closed Systems
Protocol Interface (PI) Header Field
  Defines Payload Content & Format
  Payload Interpreted Only by Endpoints
Multiple Simultaneous Encapsulations
  Per VC, Link, Endpoint, etc.
[Figure: an AS packet made of an AS Header containing the PI field, followed by the AS Payload (PI Defines Format)]
-
General AS Packet Header
[Figure: header bit layout, bits 0-31. Fields shown: PI, P, PCRC, Traffic Class, OO, TS, Credits Required, FECN, Turn Pointer, Header CRC, D, Turn Pool]
FECN: Forward Explicit Congestion Notification
TS: Type Specific
OO: Ordered-Only
PCRC: Payload CRC
P: Perishable (Discard Eligibility)
PI: Protocol Interface
D: Direction (Forward / Reverse)
-
Protocol Interface (PI) Chaining
PIs Can be Chained Together Within Single Packet
  Multicast
  Congestion/Flow ID
  SAR
Example: SAR of Ethernet Packet
[Figure: an Ethernet packet (AS Header, Enet PI-X, Payload, CRC) is segmented into AS packets, each built from AS Header, SAR PI-2, Enet PI-X, Payload and CRC]
-
Path Routing
Turn Pool is Unique Signature
Simplifies Switches: No Unicast Lookup Tables or CAMs
Packets Easily Returned to Sender Instead of Dropped
Ideal for Redundancy with Extremely Fast Failover
[Figure: a packet whose AS Header carries Turn Pool 0101100, a Turn Pointer and Direction bit F travels from ingress through Switch #1 (8 Port) and Switch #2 (16 Port) to egress]
Direction Bit: F - Forward, B - Backward
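The turn-pool idea on this slide can be illustrated with a small sketch. The slide only gives the pool value (0101100), the switch sizes (8 and 16 ports) and the direction bit. Everything else here is an assumption made for illustration: the `AsHeader` class, the rule that a switch reads log2(ports) bits per turn, the pointer moving down on the forward path and up on the backward path, the "ingress + 1 + turn" port arithmetic, and the ingress port numbers.

```python
from dataclasses import dataclass


@dataclass
class AsHeader:
    turn_pool: int          # packed turn values, e.g. 0b0101100
    turn_pointer: int       # bit offset of the next turn to read (assumed encoding)
    backward: bool = False  # Direction bit: False = F (forward), True = B (backward)


def turn_width(num_ports: int) -> int:
    """Bits a switch reads per turn: 3 for 8 ports, 4 for 16 ports (assumed)."""
    return (num_ports - 1).bit_length()


def egress_port(header: AsHeader, ingress: int, num_ports: int) -> int:
    """Consume one turn from the pool and return the egress port.

    No lookup table is involved: the port is computed from the header alone.
    """
    width = turn_width(num_ports)
    mask = (1 << width) - 1
    if header.backward:
        # Backward: read the turn at the pointer, then step the pointer up.
        turn = (header.turn_pool >> header.turn_pointer) & mask
        header.turn_pointer += width
        return (ingress - 1 - turn) % num_ports
    # Forward: step the pointer down, then read the turn it now points at.
    header.turn_pointer -= width
    turn = (header.turn_pool >> header.turn_pointer) & mask
    return (ingress + 1 + turn) % num_ports


# Forward trip through an 8-port and a 16-port switch (ingress ports are made up).
hdr = AsHeader(turn_pool=0b0101100, turn_pointer=7)
out1 = egress_port(hdr, ingress=0, num_ports=8)
out2 = egress_port(hdr, ingress=1, num_ports=16)

# Flip the direction bit and retrace the same pool back toward the source.
hdr.backward = True
back2 = egress_port(hdr, ingress=out2, num_ports=16)
back1 = egress_port(hdr, ingress=out1, num_ports=8)
```

Under these assumptions the backward trip leaves each switch on the port the packet originally entered, which is the property that lets a packet be returned to its sender instead of dropped.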
-
Fault Conditions
Reliable Link Layer Detects Inability to Forward Packet
Packets Can be Automatically Routed Back to Source
[Figure: a packet with Turn Pool 0101100 and Direction bit F meets a bad link at the egress of Switch #2; its direction changes from F to B and it returns through Switch #1]
(1) Direction Bit is Flipped
(2) Switch #2 Re-Injects Packet, Normally
(3) Packet Follows the Reverse Path Back to its Source
(4) Source Receives the Packet it Sent
-
Route Redundancy
          Source  Destination  Device Path           Turn List
Primary   EP1     EP3          SW1, SW2, SW4         0, 3, 0
Backup    EP1     EP3          SW1, SW3, SW4         1, 3, 1
Backup    EP1     EP3          SW1, SW2, SW3, SW4    0, 4, 1, 1
Backup    EP1     EP3          SW1, SW3, SW2, SW4    1, 1, 4, 0
[Figure: AS fabric of four switches (SW1-SW4) connecting endpoints EP1-EP4]
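As a rough illustration of how a source could use the route table above, the sketch below picks the first listed route that avoids every link known to have failed. The table rows come from the slide. The function names, the idea of tracking failures per link, and the "first usable row wins" policy are assumptions made for the example.

```python
# Rows copied from the Route Redundancy table (EP1 to EP3).
ROUTES = [
    ("Primary", ["SW1", "SW2", "SW4"], [0, 3, 0]),
    ("Backup", ["SW1", "SW3", "SW4"], [1, 3, 1]),
    ("Backup", ["SW1", "SW2", "SW3", "SW4"], [0, 4, 1, 1]),
    ("Backup", ["SW1", "SW3", "SW2", "SW4"], [1, 1, 4, 0]),
]


def links_on(path, source="EP1", destination="EP3"):
    """Return the undirected links a device path crosses, endpoints included."""
    hops = [source, *path, destination]
    return {frozenset(pair) for pair in zip(hops, hops[1:])}


def pick_route(failed_links=()):
    """Return the first (role, path, turn list) that avoids all failed links."""
    failed = {frozenset(link) for link in failed_links}
    for role, path, turns in ROUTES:
        if not links_on(path) & failed:
            return role, path, turns
    return None  # no listed route survives


healthy = pick_route()
after_one_fault = pick_route([("SW2", "SW4")])
after_two_faults = pick_route([("SW2", "SW4"), ("SW1", "SW3")])
```

Because each route is just a precomputed turn list, failing over amounts to sending with a different list; no switch state has to change.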
-
AS Quality of Service
Same TC/VC Mechanism as PCI Express
  Traffic Class (TC): Packet Tags for Traffic Differentiation
  3-bit Tag is Invariant Through the Fabric
  TCs are Mapped to VCs: Cost/Performance Flexibility
AS Supports Deadlock-Free Encapsulation of PCI Express
[Figure: at packet ingress, the 3-bit Traffic Class in the AS Header is used to map packets to 1 - 8 VC queues (VC #0 to VC #N, 0 ≤ N ≤ 7) feeding the output port]
Flexible Differentiation of Traffic
-
Virtual Channels (VC)
AS Defines Three VC Types
Unicast VCs with Bypass Capability
  Required for Load/Store Protocols (e.g., PCI Express, SLS)
  Architecture Support for 8 BVCs
  Minimum Packet Size of 192 Bytes
Optional Unicast VCs with No Bypass
  Comms-Oriented, Ordered-Only Flows
  Architecture Support for 8 OVCs
  Minimum Packet Size of 64 Bytes
Optional Multicast VCs
  Also Ordered-Only Flows
  Architecture Support for 8 MVCs
  Minimum Packet Size of 64 Bytes
[Figure: queue structure per VC type: Ordered Queue plus Bypass Queue, Ordered Queue, Multicast Queue]
-
Traffic Classes (TC)
TC Used to Group Flows of Traffic
  Enables Differentiated Service Through Fabric
  Eight Traffic Classes per VC Type
  TC Value Carried End-to-End in AS Header
Fixed TC to VC Mappings Within VC Type
  Independent Mappings per VC Type (Bypass, Ordered, Multicast)
  Mapping is Function of Active Number of VCs on Port
-
TC-VC Mapping Examples
[Figure: TC-to-VC mappings on the links between endnodes and a switch]
8 VCs (VC0-VC7): TC0 through TC7, one TC per VC
4 VCs (VC0-VC3): TC[0:1], TC[2:3], TC[4-6], TC7
2 VCs (VC0-VC1): TC[0:6], TC[7]
1 VC (VC0): TC[0:7]
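The mapping examples can be written down as a lookup keyed by the number of active VCs on a port. The TC groupings below are read from the figure. Assigning the lowest TC group to VC0 and the highest to the last VC is an assumption, since the extracted figure does not preserve which group lands on which VC, and `tc_to_vc` is a hypothetical helper name.

```python
# TC groups per active VC count, lowest group assumed to sit on VC0.
TC_GROUPS = {
    1: [range(0, 8)],
    2: [range(0, 7), range(7, 8)],
    4: [range(0, 2), range(2, 4), range(4, 7), range(7, 8)],
    8: [range(tc, tc + 1) for tc in range(8)],
}


def tc_to_vc(tc: int, active_vcs: int) -> int:
    """Map a 3-bit traffic class to a VC index for a port with `active_vcs` VCs."""
    if not 0 <= tc <= 7:
        raise ValueError("traffic class is a 3-bit value")
    if active_vcs not in TC_GROUPS:
        raise ValueError("only the VC counts shown on the slide are modelled")
    for vc, group in enumerate(TC_GROUPS[active_vcs]):
        if tc in group:
            return vc
    raise AssertionError("unreachable: every TC belongs to one group")
```

The same TC value therefore lands on different VCs on different links, while the tag itself stays unchanged end to end.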
-
AS Multicast
Maximum of 64K Multicast Groups, Minimum of 1
Switch Lookup Tables Specify Output Ports
  16-bit Multicast Group ID Field in Packet Header
Software is Required for Setup, Supervision & Teardown
Endpoints Can Write, Listen, or Both
  Single or Multiple Writers or Listeners; Loopback Supported
Applications: Conferencing, Media Broadcast, Control, Management, Sync, Heartbeat, etc.
[Figure: an AS Packet (Header with Group ID, Payload) enters an AS Switch; the Group ID is the index into the Multicast LUT, whose entry lists the output ports that receive a replicated copy]
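A minimal sketch of the lookup-and-replicate step in the figure, assuming the multicast LUT can be modelled as a dictionary from group ID to a set of output ports. The `replicate` function, its `loopback` flag and the example table contents are hypothetical; the slide only states that the group ID is 16 bits, that the LUT names the output ports, and that loopback is supported.

```python
def replicate(lut, group_id, ingress_port, loopback=False):
    """Return the output ports that get a copy of a multicast packet.

    `lut` maps a 16-bit multicast group ID to a set of port numbers.
    The ingress port is skipped unless loopback is requested.
    """
    if not 0 <= group_id < 1 << 16:
        raise ValueError("multicast group ID is a 16-bit field")
    ports = lut.get(group_id, set())
    return sorted(p for p in ports if loopback or p != ingress_port)


# Example table set up by management software (contents are made up).
multicast_lut = {
    0x0001: {0, 2, 5},
    0x0002: {1, 3},
}

copies = replicate(multicast_lut, 0x0001, ingress_port=2)
with_loopback = replicate(multicast_lut, 0x0001, ingress_port=2, loopback=True)
unknown_group = replicate(multicast_lut, 0x0BAD, ingress_port=0)
```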
-
AS Multicast Packet Header
[Figure: header bit layout, bits 0-31. Fields shown: PI (0000000b), P, PCRC, Traffic Class, 00b, Credits Required, FECN, Turn Pointer, Header CRC, R, Turn Pool, Secondary PI, Origin Specific Data, Multicast Group Index]
FECN: Forward Explicit Congestion Notification
PCRC: Payload CRC
P: Perishable (Discard Eligibility)
PI: Protocol Interface
R: Reflected
Turn Pool & Turn Pointer Built Along The Way
-
AS Congestion Management
Enables Over-Subscription
  Regulates Traffic Flows to Avoid Congesting Links & Components
Builds on PCI Express Base (VCs, Credit-Based Flow Control)
Balances Performance & Cost
  Minimizes Rather than Eliminates Congestion
  Supports End-to-End CM via PIs and/or Upper-Layer Protocols
New Congestion Management Mechanisms
  Status-Based, Per-TC Link Flow Control (SBFC)
  Minimum Bandwidth Scheduler (Switch Egress Scheduling)
  Endpoint Source Injection Rate Limiting
Normalized Control Interfaces for Interoperability
  Vendor-Specific Implementation Options Possible
-
AS Congestion Management
[Figure: AS flow CM model. Ingress stacks and egress stacks (AS Fabric, AS Trans, PCI Ex Link, PCI Ex Phy) sit on either side of two switches. Communication flows pass through ingress flow multiplexing, tunneling PIs carrying tunneled protocols, and egress flow de-multiplexing. AS ingress scheduled flows are locally scheduled at each hop using local status feedback, VC arbitrated flow and VC FC credits. End to end flow control feedback is PEI defined & optional]
-
Credit-Based Flow Control
Per-VC Link Flow Control
  Credit-Based
  Same Mechanism as PCI Express
Nearest Neighbors Avoid Congesting Input Ports
  Data Never Sent to Depleted Input Buffer
Credit Exchanged Using DLLPs
  One BVC per DLLP
  Up to Two OVCs per DLLP
  Up to Two MVCs per DLLP
Credit Denomination
  64 Bytes
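The credit arithmetic implied by the 64-byte denomination can be sketched as follows. The 64-byte unit and the rule that data is never sent to a depleted input buffer come from the slide. The `VcCredits` class, its method names, and the choice to count the whole packet (header plus payload) against one credit pool are assumptions for illustration.

```python
CREDIT_BYTES = 64  # credit denomination stated on the slide


def credits_required(packet_bytes: int) -> int:
    """Round a packet size up to whole 64-byte credits."""
    return -(-packet_bytes // CREDIT_BYTES)


class VcCredits:
    """Transmit-side view of the credits a neighbor has advertised for one VC."""

    def __init__(self, advertised: int):
        self.available = advertised

    def try_send(self, packet_bytes: int) -> bool:
        """Send only if the receiver's input buffer has room; never overrun it."""
        need = credits_required(packet_bytes)
        if need > self.available:
            return False
        self.available -= need
        return True

    def replenish(self, credits: int) -> None:
        """Apply a credit update received in a DLLP."""
        self.available += credits


vc = VcCredits(advertised=3)
first = vc.try_send(128)    # needs 2 credits, 1 left afterwards
second = vc.try_send(100)   # needs 2 credits, refused
vc.replenish(1)
third = vc.try_send(100)    # now fits
```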
-
Status-Based Flow Control
Per-TC Link Flow Control
  Status-Based
  Allows Transmission When In/Out Path Through Next Switch for TC is Un-Congested
  Optional Normative
Nearest Neighbors Avoid Congesting Output Ports
  Reported Per Output Port & TC Combination
  Explicit XON-XOFF
  Time-Based XOFF
  Sent via New DLLP
  Ordered-Only Traffic
  Bypass Traffic Indirectly
-
Switch Egress Link Scheduling
Link VC Transmit Scheduling
  Up to 7 BVC Queues, Fabric Management VC Queue, 8 OVC Queues & 4 MVC Queues
  Fabric Management VC (#7) is Highest Priority
Class of Service Queue (CSQ) Scheduler
  VC Queues Serviced Based on Configured Weightings
  Constrained by CBFC & SBFC
Optional Normative Minimum Bandwidth Scheduler
VC Arbitration Table Scheduler
  Can be Vendor-Specific
-
Switch Egress Scheduling
[Figure: egress link architecture. Packets from the switching function (SEs) enter the egress CSQs: Bypassable Unicast, Ordered-Only Unicast, Multicast, and the Fabric Mgmt Channel (FMC) queue (1 queue, TC7). A minimum BW inner CSQ scheduler, informed by optional SBFC feedback and link layer credit availability, feeds a strict priority outer CSQ scheduler together with the FMC queue (behind an FMC rate limiter) and the DLLP queue. DLLPs and TLPs leave as packets to the egress link]
A CSQ for a BVC is Comprised of an Ordered and a Bypassable Sub-Queue
This Portion of the Scheduler Provides Bypassable Queue Strict Priority Service Over the Ordered Queue (as Long as Bypassable Link Credit is Available)
-
Endpoint Injection Rate Limiting
Connection Queues Provide Flow Isolation
Multiple Granularity Options for CQs
  One CQ Per TC
  One CQ Per TC/Destination Pair
Each CQ is Associated with a Token Bucket
  Token Buckets Limit Packet Flow Rates
  Enables Fine-Grained Rate Adaptation
Maximum of 64K CQs
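A token bucket per connection queue can be sketched in a few lines. This is a generic token bucket, not the parameterization defined by the AS specification: the byte-based tokens, the `rate_bytes_per_s` and `burst_bytes` parameters, and the explicit `now` timestamp (used to keep the example deterministic) are all assumptions.

```python
class TokenBucket:
    """Generic token bucket limiting the injection rate of one connection queue."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start full
        self.last = 0.0            # time of the previous refill

    def allow(self, packet_bytes: int, now: float) -> bool:
        """Refill for the elapsed time, then admit the packet if tokens suffice."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


# One bucket per CQ; here CQs are keyed by (TC, destination), one of the
# granularity options on the slide. Destination names are made up.
buckets = {(0, "EP3"): TokenBucket(rate_bytes_per_s=1000, burst_bytes=500)}
cq = buckets[(0, "EP3")]

sent_a = cq.allow(400, now=0.0)    # 500 tokens available, 100 left
sent_b = cq.allow(200, now=0.0)    # only 100 tokens, held back
sent_c = cq.allow(200, now=0.25)   # 250 tokens refilled, packet admitted
```

Because each CQ has its own bucket, throttling one flow leaves the others untouched, which is the flow isolation the slide refers to.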
-
Endpoint Queuing & Scheduling
[Figure: endpoint datapath to and from the switch element over 1 or more lanes. Egress side: a source packet handle and payload pass through Queue Select and a Mapping Table into Connection Queues and Token Buckets, then through the Source Scheduler into Per-TC Queues @ Egress (TC0-7), AS Transaction Layer Processing, PCI Express Link Layer Processing and PCI Express Physical Layer Processing. A CM State Machine receives CBFC Feedback (Credit Exhausted Indicators) and SBFC Feedback, the latter held in an optional SBFC Per-TC Status Table for Path1 to PathN. Ingress side: AS TLPs pass through the Ingress Header Processor, which reverse-maps Turn Pool & TC to a Packet Handle, then a DeMux into Per-TC Queues and the Destination Scheduler]
-
Agenda
Introduction
Core AS Architecture
Protocol Interfaces
Configuration Structures
Software & Management
-
Protocol Encapsulation Layer
Protocol Interface (PI) Headers Enable Transport of Multiple Protocols Through Same AS Fabric
Several Protocol PIs are Defined by ASI-SIG
  PCI Express (PI-8), SDT (PI-9), SLS (PI-10), SQ (PI-11)
Protocol Interface    Description
PI-0                  Multicast Path Building (Chaining PI)
PI-1                  Flow Identification for Congestion Management (Chaining PI)
PI-2                  Segmentation and Re-Assembly (SAR) (Chaining PI)
PI-3                  Reserved for Future AS Fabric Management Interfaces
PI-4                  Device Management
PI-5                  Event Reporting
PI-6 & PI-7           Reserved for Future AS Fabric Management Interfaces
PI-8 to PI-95         ASI-SIG Defined PIs
PI-96 to PI-126       Vendor Defined PIs
PI-127                Invalid
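The PI number ranges in the table translate directly into a small classifier. The ranges and names are taken from the slide; the function name `classify_pi` and the returned strings are illustrative choices.

```python
# PI numbers the slide assigns a specific meaning to.
NAMED_PIS = {
    0: "Multicast Path Building",
    1: "Flow Identification for Congestion Management",
    2: "Segmentation and Re-Assembly (SAR)",
    4: "Device Management",
    5: "Event Reporting",
    8: "PCI Express",
    9: "SDT",
    10: "SLS",
    11: "SQ",
}


def classify_pi(pi: int) -> str:
    """Describe a Protocol Interface number using the ranges on the slide."""
    if not 0 <= pi <= 127:
        raise ValueError("PI numbers run from 0 to 127")
    if pi in NAMED_PIS:
        return NAMED_PIS[pi]
    if pi in (3, 6, 7):
        return "Reserved for Future AS Fabric Management Interfaces"
    if pi <= 95:
        return "ASI-SIG Defined PI"
    if pi <= 126:
        return "Vendor Defined PI"
    return "Invalid"
```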
-
PI-0 Multicast
[Figure: header bit layout, bits 0-31. Fields shown: PI (0000000b), P, PCRC, Traffic Class, 00b, Credits Required, FECN, Turn Pointer, Header CRC, R, Turn Pool, Secondary PI, Origin Specific Data, Multicast Group Index]
FECN: Forward Explicit Congestion Notification
PCRC: Payload CRC
P: Perishable (Discard Eligibility)
PI: Protocol Interface
R: Reflected
Turn Pool & Turn Pointer Built Along The Way
-
PI-2 Segmentation & Reassembly
SAR Required When PDU Size > Fabric MPS
PI-2 Defines Standardized Format
  Support for In-Order & Out-of-Order
  Applied to Other PIs via Chaining
  PDU Trailer Added for Integrity Check
An Endpoint Function, Switches do not SAR
[Figure: an AS switch fabric with Ports 1-3 illustrating In Order SAR, Out of Order SAR and No SAR. A 1518-byte Ethernet frame is carried as segments 1 of 4 through 4 of 4, interleaved with TDM packets; in the out-of-order case the segments arrive out of sequence]
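The "1 of 4 ... 4 of 4" segmentation in the figure can be sketched with plain byte slicing. This is only an illustration of the idea: the tuple layout, the function names and the 384-byte segment payload are assumptions (the slide does not state the fabric MPS), and the PDU trailer used for the integrity check is left out.

```python
def segment(pdu: bytes, max_payload: int):
    """Split a PDU into (index, total, chunk) segments, indices starting at 1."""
    chunks = [pdu[i:i + max_payload] for i in range(0, len(pdu), max_payload)]
    return [(n, len(chunks), chunk) for n, chunk in enumerate(chunks, start=1)]


def reassemble(segments) -> bytes:
    """Rebuild the PDU at the receiving endpoint, tolerating out-of-order arrival."""
    total = segments[0][1]
    if len(segments) != total:
        raise ValueError("missing segments")
    ordered = sorted(segments, key=lambda seg: seg[0])
    return b"".join(chunk for _, _, chunk in ordered)


# A 1518-byte Ethernet frame with a hypothetical 384-byte segment payload.
frame = bytes(i % 256 for i in range(1518))
segments = segment(frame, max_payload=384)

# Deliver the segments in reverse to mimic out-of-order arrival.
rebuilt = reassemble(list(reversed(segments)))
```

Both functions run at endpoints only, matching the slide's point that switches do not perform SAR.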
-
PI-4 Device Management
Load/Store Oriented Configuration/Control
  Used to Read/Write Device Control Structures
  Privileged Operation
Three Packet Types
  Read Request
  Read Completion
  Write Request
Supports Several Transfer Sizes
  Masked Bytes Within Single DWord (Any One, Two, or Three Bytes)
  Full DWords
  Blocks of Two to Eight DWords
-
PI-5 Event Packet Formats
Short-Form PI-5 Events
[Figure: bit layout, bits 0-31. First word: Class Code, Sub-Class Code, Physical Port #, flag bits extracted as "ETIRV", and a final bit of 0. Followed by Event Vector or Event-Specific Data and Event Data]
Long-Form PI-5 Events
[Figure: bit layout, bits 0-31. First word: Class Code, Sub-Class Code, Physical Port #, flag bits extracted as "ETIRV", and a final bit of 1. Followed by Event Vector or Event-Specific Data and several Event Data words]
-
PCI Express Encapsulation (PI-8)
PCI Express Devices Communicate Transparently
  Tunneled via AS Fabric
  No Changes to PCI Express Devices or Drivers
Dynamic Binding of Subsystems to Controlling Agents
  Bus Number to Path for Configuration Accesses
  Local Memory Aperture to Path for 32/64-bit Memory Accesses
  Local I/O Aperture to Path for 32-bit I/O Accesses
New Native AS Software is Required to Set Bindings
  Can Reside Locally or Elsewhere Within AS Fabric
  Bindings Cause Standard Hot Plug Events
Stay Tuned for Lots More Detail on PI-8 Later
-
RDMA Using PI-9 (SDT)
[Figure: RDMA between a source and a destination across the AS fabric. On each side, SDT uses a Handle Array and Descriptor Lists to locate memory; application data at the source is transferred into buffers at the destination]
-
SLS Direct BAR Mapping
Multiple Independent Address Domains
[Figure: Agent A and Agent B each have a unique 32-bit (4GB) PCI address space containing 1GB of local memory and BAR memory regions (BAR 0 to BAR N). A read or write to a PCI address location that falls within a BAR memory region is carried across the AS fabric to the other agent]
-
SLS Aperture Usage
[Figure: endpoints connected by the AS fabric. Two endpoints expose memory divided into apertures (Aperture 1 through Aperture 5 are shown), and each aperture reaches memory in another endpoint]
-
PI-11 Simple Queuing (SQ)
Datagram-Style Communication, with Push & Pull
Supports Both Unicast & Multicast
Supports Path Protection & Access Keys Like SLS
Up to 4K Push & 4K Pull Queues Per Endpoint
[Figure: SQ-enabled AS ports or bridges exchange messages with targets across the AS fabric. PUSH: Enqueue into the target's push queues, answered by Enqueue ACK. PULL: Dequeue Request to the target's pull queues, answered by Dequeue Response]
-
Agenda
Introduction
Core AS Architecture
Protocol Interfaces
Configuration Structures
Software & Management
-
Configuration Structures
Supported by All Endpoints & Switches
Rich Set of Standardized & Normalized Capabilities
  Streamlines Management of Diverse Constellation of Devices
  Designed to be Extensible
Local or Through-Fabric Access via PI-4
  Enables Remote Device Configuration & Control
Access Control Features
  Multiple Apertures for Partitioning
  Write & Read Privileges Per Aperture
  Fabric-Wide or Granted to Specific Devices
Includes Communication and Synchronization Features
  Scratchpads, Semaphores & Doorbells
-
Device Configuration Spaces
[Figure: Aperture 0 holds the Device Header (00h-3Fh), the PCI Cap Record (40h-FFh) and, from 100h, AS Capability 1 through AS Capability N. Apertures 1-N start at 00h and are divided into regions holding Capability 1 Data through Capability N Data]
-
Device Header
[Figure: register layout, bits 31-0, offsets 00h to FCh. Named fields: Device ID, Vendor ID, Class Code, Revision ID, Subsystem ID, Subsystem Vendor ID, Capability Ptr; the remaining fields up to 3Ch are Reserved. Offsets 40h-FCh are Reserved or hold PCI Revision 2.3 Capability Records]
-
Capability Structure Chaining
[Figure: three capability structure layouts. Each starts at 00h with an AS Capability ID Header made of AS Capability ID, Ver. and Next Cap. Offset (one layout shows 000 in place of the next offset), followed from 04h by Capability Structure Content. The content is either Capability-Specific Data, or Local Read Flags and Local Write Flags plus a Capability Table Pointer with an AP field]
-
Protocol Interface Capability
[Figure: register layout, bits 31-0, offsets 00h to 0Ch. Fields shown: AS Capability ID Header, SW Entry Size, # SW Entries, # HW Entries, reserved (R) bits, Hardware PI Table Pointer with AP field, Software PI Table Pointer with AP field. Each PI table entry holds a Protocol Identifier, a reserved (R) field and a PI field]
-
Configuration Space Permission
[Figure: register layout, bits 31-0, offsets 00h to 10h. Fields shown: AS Capability ID Header, Local Read Flags, Local Write Flags, Global Read Flags, Global Write Flags, # of Entries, Reserved, CSP Table Pointer with AP field. Each CSP table entry holds E and R bits, Reserved bits, an Ingress Port, a Turn Pool, Path Read Flags and Path Write Flags]
-
Agenda
Introduction
Core AS Architecture
Protocol Interfaces
Configuration Structures
Software & Management
-
AS Architectural Elements
Endpoints
  Touched by Software
  Can Host Software
  Might Manage Fabric
Switches
  Touched by Software
[Figure: AS switches interconnecting endpoints]
-
Fabric Initialization Overview
[Figure: timeline of phases: Initialization, Master Election, Discovery & Configuration]
Phase 1: Post-Reset & Initialization (Hardware)
  Link Training, Node Initialization, Credit Exchange
Phase 2: Master Election (Hardware)
  Blind Broadcast (PI-0:0)
  Candidate Masters Negotiate for Ownership of Fabric or Sub-Fabric Spanning Trees
  All Devices Configured with Spanning-Tree Owner
  Many-to-Many Relationship of Fabrics to Masters
Phase 3: Discovery & Configuration (Software)
  Fabric Managers Discover Fabric Using PI-4 Reads
  Devices Configured Using PI-4 Writes (e.g., Permissions, Event Routing, etc.)
  Any Node Can Perform Independent Discovery (Based on Configuration of Permission)
-
Fabric Initialization Overview
[Figure: timeline of phases: Initialization, Master Election, Discovery & Configuration, Fabric Management]
Phase 4: Fabric Management (Software)
  All Management Models Supported
    Centralized, Distributed, Hybrids
    Any Node Can Make Path Decisions
  Multicast Group Management
    Creation, Deletion, Join, Leave
  Fault Management
    HA, Redundancy, Fail-Over / Take-Over
  Policy Management & Enforcement
  Performance Management
    e.g., Load Balancing
  Congestion Management
-
Fabric Manager
Privileged Fabric Entity
  Selected via PI-0:0
  Known to All Switches and Endpoints
  Spanning Tree Owner (ST[0], ST[1])
  Fabric Representative for Attached Fabrics
Discovery & Configuration via PI-4
Multicast Group Management
Supervision & Maintenance via PI-4/PI-5
Failover & Redundancy Coordination
Source for Fabric Support Services (e.g., Event & Topology Services)
[Figure: the Fabric Owner (FMGR) endpoint attached to AS switches and other endpoints (EP)]
Fabric-Wide Responsibility
-
Fabric Discovery Process
Exploration via PI-4
  Read-Only
  Iterative
  Breadth (or Depth) First
  Limited by 31-bit Turnpool Reach
Knowledge Gained
  Identification of Devices
  Characteristics & Capabilities of Each Device
  Fabric Topology
  All Paths Between Device Pairs
  Required to Compute Routes Through Fabric
[Figure: the Fabric Owner (FMGR) explores AS switches and endpoints (EP) in numbered steps 1-6]
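The breadth-first exploration and its 31-bit turn pool limit can be sketched over an in-memory model of a fabric. In a real fabric each step would be a PI-4 read. Here the topology is a plain dictionary, and the `discover` function, the per-switch cost of log2(ports) turn bits, and the device names are all assumptions made for illustration.

```python
from collections import deque

TURN_POOL_BITS = 31  # "Limited by 31-bit Turnpool Reach"


def turn_width(num_ports: int) -> int:
    """Turn bits spent crossing a switch (assumed: 3 for 8 ports, 4 for 16)."""
    return (num_ports - 1).bit_length()


def discover(fabric, root):
    """Breadth-first walk from `root`.

    Returns {device: turn-pool bits used to reach it}. Devices whose path
    would need more than 31 bits of turn pool are not discovered.
    """
    bits_used = {root: 0}
    queue = deque([root])
    while queue:
        dev = queue.popleft()
        ports = fabric[dev]["ports"]
        if ports == 1 and dev != root:
            continue  # endpoints do not forward packets
        # Crossing a switch spends one turn; leaving the root endpoint spends none.
        cost = bits_used[dev] + (turn_width(ports) if ports > 1 else 0)
        if cost > TURN_POOL_BITS:
            continue  # nothing beyond this switch is reachable
        for neighbor in fabric[dev]["links"]:
            if neighbor not in bits_used:
                bits_used[neighbor] = cost
                queue.append(neighbor)
    return bits_used


# Hypothetical fabric: the manager hangs off an 8-port switch.
fabric = {
    "FMGR": {"ports": 1, "links": ["SW_A"]},
    "SW_A": {"ports": 8, "links": ["FMGR", "EP2", "SW_B"]},
    "SW_B": {"ports": 16, "links": ["SW_A", "EP1"]},
    "EP1": {"ports": 1, "links": ["SW_B"]},
    "EP2": {"ports": 1, "links": ["SW_A"]},
}
reach = discover(fabric, "FMGR")
```

A breadth-first order finds each device by the fewest hops first, which also tends to keep the turn pool usage low.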
-
Host Endpoint Software Stack
[Figure: layered host software stack. Software: Applications sit on Application Interfaces, a TCP/IP Stack Interface, Fabric Management Functions and Platform Management Infrastructure. These use the Exposed Advanced Switching Services APIs of the Advanced Switching Portal Driver. Below are PCI Device Drivers, AS Device Drivers and the AS Portal Device Driver. Hardware lies beneath the drivers]
-
Core Software Structure
[Figure: software layers joined by APIs and ordered along a Spectrum of Portability: Device / Platform-Specific Device Drivers; Device-Independent AS / PI Device Drivers; Fabric Management and Endpoint Management Services; Application APIs & Libraries; End-User Applications and End-User Device-Drivers (User level)]
-
ASI SIG AS Simulator
Components Include:
  Switches, Endpoints, Data Sources & Sinks, Co-Simulation Interface
Test Performance Corner Cases
Evaluate Alternate Implementations & Topologies
-
Software Simulator Example
Example trace of the announce packet after reset
Running... Releasing reset to all devices
Generating PI0:0 Announce packet (960)
Owner EUI = 0x00000020 00000000
1.901200044: -- EP_0, Packet(960) -> fabric
1.901200044: -- Switch_0, PI0:0 Announce Packet(960) -> Update ST[0]
1.901200088: -- Switch_0[0,1], Packet(980) -> fabric
1.901200088: -- Switch_0[0,2], Packet(981) -> fabric
1.901200088: -- Switch_0[0,3], Packet(982) -> fabric
1.901200088: -- Switch_1, PI0:0 Announce Packet(981) -> Update ST[0]
1.901200132: -- Switch_1[0,1], Packet(983) -> fabric
1.901200132: -- Switch_1[0,2], Packet(984) -> fabric
1.901200132: -- Switch_1[0,3], Packet(985) -> fabric
1.901200132: -- Switch_2, PI0:0 Announce Packet(984) -> Update ST[0]
1.901200176: -- Switch_2[0,1], Packet(986) -> fabric
1.901200176: -- Switch_2[0,2], Packet(987) -> fabric
1.901200176: -- Switch_2[0,3], Packet(988) -> fabric
1.901200440: -- EP_1, PI0:0 Announce Packet(980) -> Update ST[0]
1.901200440: -- EP_7, PI0:0 Announce Packet(982) -> Update ST[0]
1.901200484: -- EP_2, PI0:0 Announce Packet(983) -> Update ST[0]1.901200484: -- EP_6, PI0:0 Announce Packet(985) -> Update ST[0]
1.901200528: -- EP_3, PI0:0 Announce Packet(986) -> Update ST[0]
1.901200528: -- EP_4, PI0:0 Announce Packet(987) -> Update ST[0]
1.901200528: -- EP_5, PI0:0 Announce Packet(988) -> Update ST[0]
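A trace like the one above can be post-processed to see which node received each forwarded announce packet. The following is a minimal sketch, not part of the ASI SIG simulator: the regular expression, the function names (parse_trace, announce_hops) and the assumption that every trace line follows the two shapes shown on this slide are illustrative only.

```python
import re

# Hypothetical pattern for the two line shapes seen in the slide's trace, e.g.
#   "1.901200088: -- Switch_0[0,1], Packet(980) -> fabric"
#   "1.901200440: -- EP_1, PI0:0 Announce Packet(980) -> Update ST[0]"
TRACE_RE = re.compile(
    r"^(?P<time>\d+\.\d+): -- (?P<node>\w+)(?:\[(?P<port>\d+,\d+)\])?, "
    r"(?:PI0:0 Announce )?Packet\((?P<packet>\d+)\) -> (?P<action>.+)$"
)


def parse_trace(lines):
    """Parse trace lines into dicts; lines that do not match are skipped."""
    events = []
    for line in lines:
        match = TRACE_RE.match(line.strip())
        if match:
            event = match.groupdict()
            event["packet"] = int(event["packet"])
            events.append(event)
    return events


def announce_hops(events):
    """Pair each packet sent to the fabric with the node that later receives it."""
    sender_of = {}
    hops = []
    for event in events:
        if event["action"] == "fabric":
            sender_of[event["packet"]] = event["node"]
        elif event["action"].startswith("Update ST") and event["packet"] in sender_of:
            hops.append((sender_of[event["packet"]], event["node"], event["packet"]))
    return hops


SAMPLE = [
    "1.901200044: -- EP_0, Packet(960) -> fabric",
    "1.901200044: -- Switch_0, PI0:0 Announce Packet(960) -> Update ST[0]",
    "1.901200088: -- Switch_0[0,2], Packet(981) -> fabric",
    "1.901200088: -- Switch_1, PI0:0 Announce Packet(981) -> Update ST[0]",
]

for sender, receiver, packet in announce_hops(parse_trace(SAMPLE)):
    print(f"packet {packet}: {sender} -> {receiver}")
```

Applied to the full trace, the same pairing would show the announce packet fanning out from EP_0 through Switch_0, Switch_1 and Switch_2 to the remaining endpoints.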
-
Please Return for the Second Half
-
PI-8 Technical Review
Joe Bennett
Principal Engineer
Intel Corporation
ASI-SIG PI-8 Workgroup Chair
-
Agenda
Architecture Overview
AS Registers for PI-8
Tunneling Mechanism
PCI Express Configuration / Hot Plug
PCI Express Virtual Channels
Power Management
Reset / Training
PCI Express Errors
-
PCI Express*-to-AS Bridge Model
Express-to-AS Bridge spawns virtual PCI Express ports
Each virtual port connected through the AS fabric to an AS-to-Express bridge
AS-to-Express bridge connects to other PCI Express device types
Express-to-AS bridge and AS-to-Express bridge bound by AS fabric through a set of binding registers
[Figure: a Processor Board (Root Complex, PCI Express Switch Topology, PCIe* IO Devices, Express-to-AS Bridge labeled Host Switch) connected through an AS Switch Topology of AS Nodes to two I/O Boards (each with an AS-to-Express Bridge labeled IO Switch, a PCI Express Switch Topology, and PCIe* IO Devices).]
-
Challenge: PCI Express Software Transparency
To be software transparent, AS-PCI Express bridges must be identified as a valid PCI Express component
Root port, switch, endpoint, or bridge to PCI/PCI-X/etc.
When devices are added to or removed from the AS fabric, PCI Express software must be notified via a hot plug event to reconfigure the sub-tree
Solution: AS-to-PCI Express bridges are PCI Express switches
Each bridge has full PCI configuration header
Including PCI-PM, MSI(X), Subsystem ID, and PCI Express capability
Optionally contains PCI Express Enhanced capabilities
Allows PCI Express software identification and hot plug with no PCI Express software implications
-
Express-to-AS Bridge: Host Switch
Upstream port connected to a root complex
Through root port, PCI Express* switch, PCI Express* bridge
Downstream ports connected to one or more AS ports
In the limit, all 256 downstream ports may be connected to a single AS port
[Figure: PCI Express* to AS Bridge. A PCI Express* Link enters the upstream port (PCI-PCI Bridge); a Virtual PCI Bus connects it to the downstream ports (PCI-PCI Bridges, 1 minimum, 256 maximum), each with a Hot Plug Controller (HPC); the PI-8 Formatter and AS Transaction Layer connect to the Advanced Switching Link.]
-
AS-to-Express Bridge: I/O Switch
Upstream port connected to the AS fabric
One or more downstream ports connected to PCI Express ports
For further connections to endpoints or other PCI Express switches
[Figure: AS to PCI Express* Bridge. The Advanced Switching Link enters the upstream port through the AS Transaction Layer and PI-8 Formatter; a Virtual PCI Bus connects the upstream PCI-PCI Bridge to the downstream ports (PCI-PCI Bridges, 1 minimum, 256 maximum), each with a Hot Plug Controller (HPC), which drive the PCI Express* Links.]
-
Agenda
Architecture Overview
AS Registers for PI-8
Tunneling Mechanism
PCI Express Configuration / Hot Plug
PCI Express Virtual Channels
Power Management
Reset / Training
PCI Express Errors
-
Creating Virtual PCI Express Link
To connect a downstream port of a Host Switch to an I/O switch, a binding register is needed
Creates an AS path between components
Several bindings are needed in the host switch
One binding per downstream PCI Express port of the Host Switch
One for the upstream port of the IO Switch
Binding tables exist in AS configuration space, configured by the AS fabric manager
-
Host Switch Capability
[Register layout figure. Field labels: Capability ID = 0000h; Reserved; # of Routes. Then, for each port 0 through N: Port n Turn Pointer; Rsvd; Pn Egress; BE; Reserved; R; Port n Request Turn Pool; R; Port n Check Turn Pool. Bit positions are not recoverable from this transcript.]
-
Host Switch Capability Details
Host Switch is one-to-many mapping
For each implemented downstream port, a binding register is needed to an IO switch
Binding registers map in incrementing fashion
Register set 0 maps to lowest numbered device/function downstream port
Register set 255 maps to highest numbered device/function downstream port
-
Host Switch Capability Details
When generating request packets, the turn pointer and request turn pool are used
When checking request packets from an IO switch, the check turn pool is used as a protection check
The egress port dictates which of the ports (up to 4) requests and completions use
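The incrementing mapping of binding register sets to downstream ports can be sketched as follows. This is an illustration under stated assumptions, not the PI-8 register definition: the class and field names are hypothetical, field widths are not modelled, and reading the "BE" field as the binding enable bit is an assumption.

```python
from dataclasses import dataclass

MAX_REGISTER_SETS = 256  # the slides give 256 as the maximum downstream port count


@dataclass
class BindingRegisterSet:
    """Hypothetical model of one per-port register set in the Host Switch Capability."""
    turn_pointer: int = 0
    egress_port: int = 0           # selects which port requests and completions use
    binding_enable: bool = False   # assumed meaning of the "BE" field
    request_turn_pool: int = 0     # used when generating request packets
    check_turn_pool: int = 0       # used as a protection check on received requests


def map_register_sets(downstream_ports):
    """Map register set indices to (device, function) pairs in incrementing order.

    Set 0 goes to the lowest numbered device/function, and the last set to the
    highest numbered one.
    """
    ports = sorted(set(downstream_ports))
    if not 1 <= len(ports) <= MAX_REGISTER_SETS:
        raise ValueError("a host switch has between 1 and 256 downstream ports")
    return dict(enumerate(ports))


mapping = map_register_sets([(2, 0), (0, 0), (1, 0)])
print(mapping)
```

Sorting the (device, function) tuples is one simple way to express "lowest numbered device/function first"; real hardware fixes this order at design time.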
-
I/O Switch Capability
[Register layout figure. Field labels: Capability ID = 0001h; Reserved; Turn Pointer; Rsvd; Egress; BE; Reserved; Request Turn Pool; Check Turn Pool; R. Bit positions are not recoverable from this transcript.]
-
I/O Switch Capability Details
When generating request packets, the turn pointer and request turn pool are used
When checking request packets from a host switch, the check turn pool is used as a protection check
The egress port dictates which of the ports (up to 4) requests and completions use
-
Binding
Three methods of establishing Host Switch / IO Switch binding
Hardware (Predetermined configuration via pin-strapping, SROM pre-load, etc.)
Third-party AS agent (CPU with AS aware fabric management software)
AS aware software running on PCI Express CPU (Using AS portal being defined by PI-8 Specification Team)
-
Agenda
Architecture Overview
AS Registers for PI-8
Tunneling Mechanism
PCI Express Configuration / Hot Plug
PCI Express Virtual Channels
Power Management
Reset / Training
PCI Express Errors
-
PCI Express* Packet Encapsulation and Extraction
Source Bridge encapsulates PCI Express Packet
AS switches route encapsulated packets based on AS path specification
Destination Bridge extracts original PCI Express packet
[Figure: a Source Bridge and a Destination Bridge on either side of the AS Fabric; the encapsulated packet is shown as AS Header, PCI Express Header, Data.]
-
PI-8 AS Header
[Figure: PI-8 AS header layout, color-coded. Field labels: Turn Pool, Turn Pointer, Header CRC, Credits Required, TC, TS, D, PI, and fields fixed at 0.]
Green fields: fixed fields for all PI-8 packets
PI (8h)
Perishable, Packet CRC, Ordered Only, and FECN must all be 0.
Red fields: calculated from PCI Express packet
TC (unmodified, unless TC=7)
This is just the AS TC; the PCI Express TC remains unchanged in the PCI Express header
TS (set for reads and non-posted writes, cleared for posted writes)
D (cleared on requests, set on responses)
Yellow fields: taken from binding table
Blue fields: calculated
Header CRC (from constructed header)
Credits Required (based upon PCI Express packet length)
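The fixed fields and the TS / D rules above can be sketched in a few lines. This is a minimal illustration, assuming a simplified four-way classification of the PCI Express packet; the type and function names are hypothetical. TC, Credits Required, Header CRC and the binding-table fields are left out because the slide does not give their exact rules.

```python
from dataclasses import dataclass
from enum import Enum


class TlpKind(Enum):
    """Simplified, hypothetical classification of the tunneled PCI Express packet."""
    READ_REQUEST = "read request"
    NON_POSTED_WRITE = "non-posted write"
    POSTED_WRITE = "posted write"
    COMPLETION = "completion"


PI_8 = 0x8  # protocol interface value carried by every PI-8 packet


@dataclass(frozen=True)
class Pi8HeaderFlags:
    pi: int
    ts: int
    d: int
    # Fields that must be 0 for all PI-8 packets.
    perishable: int = 0
    packet_crc: int = 0
    ordered_only: int = 0
    fecn: int = 0


def derive_flags(kind: TlpKind) -> Pi8HeaderFlags:
    """Derive TS and D from the kind of PCI Express packet being encapsulated."""
    # D is cleared on requests and set on responses.
    d = 1 if kind is TlpKind.COMPLETION else 0
    # TS is set for reads and non-posted writes, cleared otherwise.
    ts = 1 if kind in (TlpKind.READ_REQUEST, TlpKind.NON_POSTED_WRITE) else 0
    return Pi8HeaderFlags(pi=PI_8, ts=ts, d=d)


for kind in TlpKind:
    print(kind.value, derive_flags(kind))
```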
-
Checks Performed
D bit in AS header determines whether the packet is a request or completion
AS Events
AS Malformed Packet (return to sender event)
Perishable bit or OO bit set (all packets)
TS field set for completion packets
AS Invalid Turn Pointer (return to sender event)
Turn pointer not 0 on request packets
Turn pointer does not match RPTR on completion packet
PI-8 Protection Event (sent to fabric manager)
Turn pool does not match CPOOL on request packet
Turn pool does not match RPOOL on completion packet
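The receive-side checks can be sketched as a single classification function. This is an illustrative sketch, not the PI-8 specification: the type names are hypothetical, RPTR / RPOOL / CPOOL are assumed to be the request turn pointer, request turn pool and check turn pool from the binding registers, the order in which the checks are applied is an assumption, and the request-packet check under "AS Invalid Turn Pointer" is read here as a check on the turn pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceivedHeader:
    d: int             # 0 = request, 1 = completion
    ts: int
    perishable: int
    ordered_only: int
    turn_pointer: int
    turn_pool: int


@dataclass
class BindingValues:
    rptr: int    # assumed: request turn pointer from the binding registers
    rpool: int   # assumed: request turn pool
    cpool: int   # assumed: check turn pool


def check_packet(hdr: ReceivedHeader, binding: BindingValues) -> Optional[str]:
    """Return the name of the event to raise, or None if the packet passes."""
    is_completion = hdr.d == 1

    # AS Malformed Packet (return to sender event)
    if hdr.perishable or hdr.ordered_only:
        return "AS Malformed Packet"
    if is_completion and hdr.ts:
        return "AS Malformed Packet"

    # AS Invalid Turn Pointer (return to sender event)
    if not is_completion and hdr.turn_pointer != 0:
        return "AS Invalid Turn Pointer"
    if is_completion and hdr.turn_pointer != binding.rptr:
        return "AS Invalid Turn Pointer"

    # PI-8 Protection Event (sent to fabric manager)
    expected_pool = binding.rpool if is_completion else binding.cpool
    if hdr.turn_pool != expected_pool:
        return "PI-8 Protection Event"

    return None


binding = BindingValues(rptr=5, rpool=0b1011, cpool=0b1101)
request = ReceivedHeader(d=0, ts=1, perishable=0, ordered_only=0,
                         turn_pointer=0, turn_pool=0b1101)
print(check_packet(request, binding))
```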
-
Transaction Layer CRC
AS header contains a transaction layer CRC similar to PCI Express End-to-End CRC
Called Packet CRC
Packet CRC does not adequately cover the packet between two PCI Express endpoints
PCI Express links still on either side of fabric
End to end CRC, if desired by PCI Express, should use the ECRC field
PI-8 specifies that the packet CRC bit must be 0.
Otherwise results in AS Malformed Packet Event
-
PCI Express Ordering
A path from the host switch to an IO switch must match the path from the IO switch back to the host switch
This ensures the fabric acts as a virtual PCI Express link
If path from host-to-IO switch is different than that from IO-to-host switch, possibilities exist for PCI Express* ordering rule violations
Example:
Device writes to system memory, updates an internal flag indicating data written
Host reads the flag
If completion has different AS path from write, it could be returned to the host before the host write occurs
Switch link congestion, for example
A subsequent read of memory results in stale data
Correct bindings are the responsibility of the AS fabric manager
-
PI-8: No Chaining
Multicast operations, such as replication of PCI Express* messages, handled by PCI Express* logic of the bridge
Example: reset
No PI-1
No usage model identified to reschedule traffic based upon PI-8 logic congestion.
No PI-2
PCI Express* MPS field must fit within the MPS field of the AS link the PI-8 bridge is attached to
The PCI Express* logic of the PI-8 bridge will therefore break larger packets into correct AS sizes as necessary, ensuring no SAR needed
-
Agenda
Architecture Overview
AS Registers for PI-8
Tunneling Mechanism
PCI Express Configuration / Hot Plug
PCI Express Virtual Channels
Power Management
Reset / Training
PCI Express Errors
-
Introduction
Link between host switch and I/O switch is virtual
i.e. no link or PHY layer
Detecting the presence of the link must therefore also be virtualized
No DLLPs
PI-8 sequences the connection via the binding enable bits in the PI-8 device PI structure
Items to be virtualized
Mechanism to ensure negotiated link speed and width communicated
Mechanism to set bit in slot status register of host switch to allow hot plug event to be generated
Either PME or interrupt
-
Hot Add to AS Fabric
When a device is added to the AS fabric, the fabric manager is (optionally) notified, so that it may be configured
PI-5 link up event
Once a PI-8 capability is detected, it must be bound to an RC so that PCI Express may configure it
Process
1. Program a route into the I/O switch back to the host switch
2. Set binding enable in the I/O switch
3. Program a route into unused host switch port, giving path to I/O switch
4. Set binding enable in the host switch
Setting the binding enable in the host switch kicks off the hot plug hardware process
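The four-step process above can be sketched from the fabric manager's point of view. This is a hypothetical model, not a real fabric management API: the Pi8Binding class, its methods and the representation of a route as a tuple of turns are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Pi8Binding:
    """Hypothetical model of one PI-8 binding: a route plus a binding enable bit."""
    owner: str
    route: Optional[Tuple[int, ...]] = None
    binding_enable: bool = False

    def program_route(self, route):
        self.route = tuple(route)

    def set_binding_enable(self):
        # Enabling a binding without a route would leave the virtual link dangling.
        if self.route is None:
            raise RuntimeError(f"{self.owner}: program a route before enabling")
        self.binding_enable = True


def hot_add(io_switch, host_port, route_to_host, route_to_io):
    """Apply the four steps in the order given on the slide; return a step log."""
    steps = []
    io_switch.program_route(route_to_host)
    steps.append("1. route programmed in I/O switch")
    io_switch.set_binding_enable()
    steps.append("2. binding enable set in I/O switch")
    host_port.program_route(route_to_io)
    steps.append("3. route programmed in host switch port")
    host_port.set_binding_enable()  # on real hardware this starts the hot plug process
    steps.append("4. binding enable set in host switch")
    return steps


io_switch = Pi8Binding("I/O switch upstream port")
host_port = Pi8Binding("host switch downstream port 0")
for step in hot_add(io_switch, host_port, route_to_host=(2, 1), route_to_io=(1, 2)):
    print(step)
```

The host switch is enabled last because that step triggers the hot plug sequence, which needs the I/O switch side to be ready to answer.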
-
Hot Add to AS Fabric
When AS fabric manager sets binding enable bit in the host switch, it passes an AS message to the I/O switch
PI-8 specific PI-5 event
Sends the Max Link Width and Max Link Speed from its PCI Express Link Capabilities register
[Figure: PI-5 event packet layout. An AS Header is followed by a PI-5 Header; field labels and values shown include 11h, 88h, Physical Port #, Reserved, Max Link Speed, and Max Link Width.]
-
Hot Add to AS Fabric
When I/O switch receives the PI-5 event, it compares the received max link width and speed to its own internal value
It updates its negotiated link width and speed to be the lowest common denominator of the two
Example
I/O Switch Link Capabilities: MLS = 2.5 Gb/s, MLW = 8
Received event: MLS = gen 2, MLW = 4
I/O Switch Link Status: MLS = 2.5 Gb/s, MLW = 4
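The "lowest common denominator" rule in the example reduces to taking the minimum of each parameter. A minimal sketch, assuming link speed is represented as a generation number with 2.5 Gb/s as generation 1 (the LinkParams type and negotiate function are hypothetical names):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkParams:
    speed_gen: int  # assumed encoding: 1 = 2.5 Gb/s, 2 = "gen 2"
    width: int      # number of lanes


def negotiate(local: LinkParams, received: LinkParams) -> LinkParams:
    """Pick the lower speed and the narrower width of the two sides."""
    return LinkParams(
        speed_gen=min(local.speed_gen, received.speed_gen),
        width=min(local.width, received.width),
    )


# The slide's example: local MLS = 2.5 Gb/s, MLW = 8; received MLS = gen 2, MLW = 4.
io_switch_caps = LinkParams(speed_gen=1, width=8)
received_event = LinkParams(speed_gen=2, width=4)
print(negotiate(io_switch_caps, received_event))
```

With these inputs the result is generation 1 (2.5 Gb/s) at width 4, matching the Link Status values in the example.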
-
Hot Add to AS Fabric
After updating link status register, I/O switch sends same message back to host switch
PI-5 Event
Contains its copy of the Link Capabilities register
Upon receiving message from I/O switch, host switch does same update in link status register
After updating its link status register, the hot plug may now occur
Host switch updates Presence Detect Status and Presence Detect Changed in its slot status register
If these events enabled in PCI Express, software event signaled to RC operating system
This event triggers PCI Express software to configure the new sub-tree
-
Hot Add to AS Fabric
[Flowchart with two columns, Host Switch and I/O Switch; decision branches are labeled Y / N. Box labels: Start; Binding Enable Set?; Send PI-5 Event to IO Switch; PI-5 Event Received?; Timeout?; Update NLW/NLS, Send PI-5 Event to Host Switch; Send PI-5 Timeout event to FM; End.]
-
Hot Add Error
A hot plug is only successful after the bridges have been bound and the max link width/speed PI-5 event has been exchanged
A timeout mechanism exists to let AS fabric manager know the link did not negotiate
Called the Link Capabilities Timeout Event
Timer based 10ms to 50ms
Event signaled to fabric manager
FM can choose to retry (clear and re-set binding enable) or choose other action
-
Hot Remove from AS Fabric Host Switch
When AS fabric manager clears binding enable, host switch modifies PCI Express* slot status register
Presence Detect is cleared
Presence Detect Change is set
If enabled via PCI Express software, hot plug software event signaled
When AS fabric manager clears binding enable in IO switch, IO switch resets its PCI Express registers and PCI Express interface
Ensures registers are in a default idle state, allowing fabric manager to hot swap this IO switch into a new host switch
-
PCI Express / AS Software Ordering
If AS software comes up first
All connections will be made, PCI Express software will see full trees
If PCI Express software comes up first
Configuration will stop at host switches, as no bound downstream ports will result in unsupported request being returned to PCI Express software
Upon AS software binding, hot plug events will cause PCI Express software to re-enumerate the sub-trees
-
Agenda
Architecture Overview
AS Registers for PI-8
Tunneling Mechanism
PCI Express Configuration / Hot Plug
PCI Express Virtual Channels
Power Management
Reset / Training
PCI Express Errors
-
History
PCI Express virtual channels are not enabled by default as in AS
PCI Express software must explicitly enable them
Concern that if virtual channels not identified to PCI Express software, it may not enable TCs in the endpoints
Unknown variable, different OSes may act differently
Additionally, did not want to report more VCs than actually existed on the AS links
PCI Express software may think it is getting differentiation in traffic that it is not actually getting
-
Mapping of PCI Express to AS Virtual Channels
AS can have any number of virtual channels
PCI Express may only have a power-of-2 virtual channel count
Limits what PCI Express may report
Max PCI Express VC Count    AS BVC Count
8                           8
4                           4, 5, 6, 7
2                           2, 3
1                           1
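The table amounts to rounding the AS BVC count down to the nearest power of two. A minimal sketch of that rule, with a hypothetical function name and the 1 to 8 range taken from the table:

```python
def max_pcie_vc_count(as_bvc_count: int) -> int:
    """Largest power of two that does not exceed the AS BVC count (1 to 8)."""
    if not 1 <= as_bvc_count <= 8:
        raise ValueError("the slide's table covers AS BVC counts from 1 to 8")
    # bit_length() - 1 is the position of the highest set bit.
    return 1 << (as_bvc_count.bit_length() - 1)


for bvc in range(1, 9):
    print(bvc, max_pcie_vc_count(bvc))
```

Rounding down rather than up follows the stated goal of never reporting more VCs to PCI Express software than actually exist on the AS links.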
-
Agenda
Architecture Overview
AS Registers for PI-8
Tunneling Mechanism
PCI Express Configuration / Hot Plug
PCI Express Virtual Channels
Power Management
Reset / Training
PCI Express Errors
-
PCI Express Power Management: Link (ASPM)
Concept
When link goes idle for some period of time, it enters a lower power state
L0s (low latency) counter value is specified
L1 (higher latency) is implementation specific
On a switch, if all downstream ports idle and in low power state, upstream port may go into low power state
Mechanism involves DLLPs and TLPs for entry and exit
-
PCI Express Power Management: Link (ASPM)
PI-8 chose not to implement this functionality
AS links are not power managed by PCI Express software
The messaging required would be complex
New messages for DLLPs would be required for L0s and L1 entrance requests, and the exit request
-
Implications of Not Implementing PCI Express ASPM
I/O Switch - None
PCI Express ports may still go into L0s/L1 based upon timers
Upstream port is AS, and cannot physically enter PCI Express L0s/L1
Host Switch
No downstream effects - AS fabric port
Upstream port does not enter L0s/L1
Therefore, PCI Express subsystem above host switch will not enter L0s/L1 via ASPM
-
PCI Express Power Management: Device PM
All PI-8 devices are required to implement the PCI-PM capability structure
Required by PCI Express specification
Entering a lower power state has two effects
First, puts the link into a lower power state
Second, allows device to enter lower power state
i.e. gate clocks, shut any internal PLLs, etc.
Does not mandate
-
PCI Express Power Management: Device PM
As with ASPM, the AS links do not enter a low power state when device put into a low power state
Physical PCI Express links must enter low power state as per PCI Express specification
Bridges may opt to go into a low power state, but AS functionality must not be compromised
-
Agenda
Architecture Overview
AS Registers for PI-8
Tunneling Mechanism
PCI Express Configuration / Hot Plug
PCI Express Virtual Channels
Power Management
Reset / Training
PCI Express Errors
-
Reset
A bridge may be reset (with its accompanying link) by programming the secondary bus reset register in PCI configuration space
In PCI Express, a reset is recognized on the link via an electrical change
PI-8 creates PI-5 events to virtualize this
-
Reset: Host Switch
If receives an electrical reset from PCI Express
Host switch creates PI-5 reset events for each bound downstream port
Host switch resets PCI configuration registers in the upstream and each downstream port
If secondary bus reset bit set in upstream port
Host switch creates PI-5 reset events for each bound downstream port
Host switch resets PCI configuration registers in each downstream port
If secondary bus reset bit set in downstream port
Host switch creates single PI-5 reset event for the downstream port (if it is bound)
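The three host switch reset cases can be summarized as a decision function. This is a sketch under stated assumptions: the enum, the ResetActions record and the integer port numbers are hypothetical, and the function only reports which actions the slide lists for each case rather than touching any hardware.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class ResetCause(Enum):
    ELECTRICAL_RESET = auto()                # reset received from PCI Express
    UPSTREAM_SECONDARY_BUS_RESET = auto()    # bit set in the upstream port
    DOWNSTREAM_SECONDARY_BUS_RESET = auto()  # bit set in one downstream port


@dataclass
class ResetActions:
    pi5_reset_events: List[int] = field(default_factory=list)        # ports to notify
    reset_upstream_config: bool = False
    reset_downstream_config: List[int] = field(default_factory=list)


def host_switch_reset(cause, downstream_ports, bound_ports, target_port=None):
    """Return the actions the slide lists for the given reset cause."""
    actions = ResetActions()
    if cause is ResetCause.DOWNSTREAM_SECONDARY_BUS_RESET:
        if target_port is None:
            raise ValueError("target_port is required for a downstream port reset")
        # A single PI-5 reset event, and only if that port is bound.
        if target_port in bound_ports:
            actions.pi5_reset_events = [target_port]
        return actions

    # Electrical reset and upstream secondary bus reset both notify every bound port
    # and reset the configuration registers of every downstream port.
    actions.pi5_reset_events = [p for p in downstream_ports if p in bound_ports]
    actions.reset_downstream_config = list(downstream_ports)
    # Only the electrical reset also resets the upstream port's registers.
    actions.reset_upstream_config = cause is ResetCause.ELECTRICAL_RESET
    return actions


print(host_switch_reset(ResetCause.ELECTRICAL_RESET, [0, 1, 2, 3], {1, 3}))
```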
-
Reset: I/O Switch
If receives a PI-5 reset event
I/O switch resets PCI configuration registers in the upstream and each downstream port
I/O switch electrically resets the downstream links
If secondary bus reset bit set in upstream port
I/O switch resets PCI configuration registers in each downstream port
I/O switch electrically resets the downstream links
If secondary bus reset bit set in downstream port
I/O switch electrically resets the single downstream link
-
Training
A link is trained during initial bring up, and by PCI Express software (through the link control register)
Training of virtual links attached to AS links does not occur
When a link is bound (via the binding enable) it is automatically considered trained
If PCI Express software sets the start training bit, the training complete bit is automatically set, and no communication occurs with other side
-
Introduction
PCI Express Errors are TLPs encapsulated on AS
Completion codes
PCI Express messages
Errors must be logged in the appropriate P2P bridge of the host or I/O switch
i.e. either in the downstream or upstream port, as appropriate
-
Want More Info on Advanced Switching?
Intel Developer Network for PCI Express Architecture
http://developer.intel.com/technology/pciexpress/devnet/comms.htm
Information on Intel Industry Enabling
ASI-SIG Web Site
http://www.asi-sig.org/join
Specification Documents and Working Groups
Join The ASI-SIG
-
Thank you for attending the PCI-SIG Developers Conference 2004.
For more information please go to www.pcisig.com
-
Advanced Switching Overview
Seth Zirin, Principal Engineer, Intel Corporation, ASI-SIG FMS WG Chair
Joe Bennett, Principal Engineer, Intel Corporation, ASI-SIG PI-8 WG Chair