
Page 1: InfiniBand: Today and Tomorrow


InfiniBand: Today and Tomorrow

Jamie Riotto

Sr. Director of Engineering, Cisco Systems (formerly Topspin Communications)

[email protected]

Page 2: InfiniBand: Today and Tomorrow


Agenda

• InfiniBand Today

– State of the market

– Cisco and InfiniBand

– InfiniBand products available now

– Open source initiatives

• InfiniBand Tomorrow

– Scaling InfiniBand

– Future Issues

• Q&A

Page 3: InfiniBand: Today and Tomorrow


InfiniBand Maturity Milestones

• High adoption rates

– Currently shipping > 10,000 IB ports / Qtr

• Cisco acquisition will drive broader market adoption

• End-to-end price points of <$1000.

• New Cluster scalability proof-points

– 1000 to 4000 nodes

Page 4: InfiniBand: Today and Tomorrow


Cisco Adopts InfiniBand

• Cisco acquired Topspin on May 16, 2005

• Adds InfiniBand to Switching Portfolio

– Network Switches, Storage Switches, now Server Switches

– Creates independent Business Unit to promote InfiniBand & Server Virtualization

• New Product line of Server Fabric Switches (SFS)

– SFS 7000 Series InfiniBand Server Switches

– SFS 3000 Series Multifabric Server Switches

Page 5: InfiniBand: Today and Tomorrow


Cisco and InfiniBand: The Server Fabric Switch

[Diagram: three switch roles compared. A Network Switch connects clients to network resources (Internet, printer, server); a Storage Switch connects servers to SAN storage; a Server Switch connects servers to both the storage and network fabrics.]

Page 6: InfiniBand: Today and Tomorrow


Cisco HPC Case Studies

Page 7: InfiniBand: Today and Tomorrow


Real Deployments Today: Wall Street Bank with 512-Node Grid

• 512 server nodes

• Core fabric: 2 96-port TS-270 switches

• Edge fabric: 23 24-port TS-120 switches

• Grid I/O: 2 TS-360 switches with Ethernet and Fibre Channel gateways into the existing LAN and SAN

• Fibre Channel and GigE connectivity built seamlessly into the cluster

Page 8: InfiniBand: Today and Tomorrow


NCSA (National Center for Supercomputing Applications)
Tungsten 2: 520-Node Supercomputer

• 520 dual-CPU nodes (1,040 CPUs)

• Core fabric: 6 72-port TS-270 switches

• Edge fabric: 29 24-port TS-120 switches, 18 compute nodes each

• 174 uplink cables, 512 1m cables

• Parallel MPI codes for commercial clients

• Point-to-point MPI latency: 5.2 us

• Deployed: November 2004

Page 9: InfiniBand: Today and Tomorrow


D.E. Shaw Bio-Informatics: 1,066-Node Supercomputer

• 1,066-node fully non-blocking, fault-tolerant IB cluster

• Core fabric: 12 96-port TS-270 switches

• Edge fabric: 89 24-port TS-120 switches, 12 compute nodes each

• 1,068 5m/7m/10m/15m uplink cables, 1,066 1m cables

Page 10: InfiniBand: Today and Tomorrow


Large Government Lab: World's Largest Commodity Server Cluster – 4,096 Nodes

• Application:

High Performance Supercomputing Cluster

• Environment:

4,096 Dell servers

50% blocking ratio

Core fabric: 8 SFS TS-740 switches, 288 ports each

Edge fabric: 256 TS-120 switches, 24 ports each, 18 compute nodes per switch

2,048 uplinks (7m/10m/15m/20m)

• Benefits:

Compelling price/performance

Largest cluster ever built (by approx. 2x)

Expected to be the 2nd largest supercomputer in the world by node count

• 8,192-processor, 60-TFLOP SuperCluster

Page 11: InfiniBand: Today and Tomorrow


InfiniBand Products Available Today

Page 12: InfiniBand: Today and Tomorrow


InfiniBand Switches and HCAs

• Fully non-blocking switch building blocks available in sizes from 24 up to 288 ports.

• Blade servers offer integrated switches and pass-through modules

• HCAs available in PCI-X and PCI-Express

• IP & Fibre-Channel Gateway Modules

Page 13: InfiniBand: Today and Tomorrow


Integrated InfiniBand for Blade Servers: Create a “wire-once” fabric

• Integrated 10Gbps InfiniBand switches provide unified “wire-once” fabric

• Optimize density, cooling, space, and cable management.

• Option of integrated InfiniBand switch (ex: IBM BC) or pass-thru module (ex: Dell 1855)

• Virtual I/O provides shared Ethernet and Fibre Channel ports across blades and racks

[Diagram: blade chassis with two integrated InfiniBand switches; each blade's HCA connects into the chassis switches, with 10Gbps and 30Gbps links out to the fabric.]

Page 14: InfiniBand: Today and Tomorrow


Ethernet and Fibre Channel Gateways: Unified “wire-once” fabric

[Diagram: a server cluster on the InfiniBand server fabric, with gateways out to the LAN/WAN and the SAN.]

• Fibre Channel to InfiniBand gateway for storage access

• Ethernet to InfiniBand gateway for LAN access

• Single InfiniBand link per server for both storage and network traffic

Page 15: InfiniBand: Today and Tomorrow


InfiniBand Price / Performance

                               InfiniBand      10GigE      GigE        Myrinet D   Myrinet E
                               (PCI-Express)
Data Bandwidth (large msgs)    950 MB/s        900 MB/s    100 MB/s    245 MB/s    495 MB/s
MPI Latency (small msgs)       5 us            50 us       50 us       6.5 us      5.7 us
HCA Cost (street price)        $550            $2K-$5K     Free        $535        $880
Switch Port Cost               $250            $2K-$6K     $100-$300   $400        $400
Cable Cost (3m street price)   $100            $100        $25         $175        $175

* Myrinet pricing data from Myricom web site (Dec 2004)
** InfiniBand pricing data based on Topspin avg. sales price (Dec 2004)
*** Myrinet, GigE, and IB performance data from June 2004 OSU study

• Note: MPI latency is processor to processor – switch latency is less

Page 16: InfiniBand: Today and Tomorrow


InfiniBand Cabling

• CX4 Copper (15m)

• Flexible 30-Gauge Copper (3m)

• Fiber Optics up to 150m

Page 17: InfiniBand: Today and Tomorrow


Host Drivers for Standard Protocols

• Open source strategy = reliability at low cost

• IPoIB: legacy TCP/IP applications

• SDP: reliable socket connections (optional RDMA)

• MPI: leading-edge HPCC applications (RDMA); see the latency-test sketch after this list

• SRP: block storage access (RDMA)

• uDAPL: User level RDMA
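
The MPI latency figures quoted in this deck (5 us in the price/performance table, 5.2 us point-to-point at NCSA) come from small-message ping-pong tests. Below is a minimal, generic ping-pong sketch in C; it is illustrative only, not the benchmark behind those numbers, and the 1-byte message size and iteration count are arbitrary choices.

/* Minimal MPI ping-pong latency sketch (illustrative only).
 * Build: mpicc -O2 pingpong.c -o pingpong
 * Run:   mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000;   /* arbitrary iteration count */
    char buf[1];               /* 1-byte message: measures latency, not bandwidth */
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();

    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    t1 = MPI_Wtime();
    if (rank == 0)
        printf("one-way latency: %.2f us\n", (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}

Half of the round-trip time per iteration approximates the one-way latency; as the note above says, the figure is dominated by the HCA and host driver path rather than by the switch.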

Page 18: InfiniBand: Today and Tomorrow


OS Support

• Operating Systems Available:

– Linux (Red Hat, SuSE, Fedora, Debian, etc.)

– Windows 2000 and 2003

– HP-UX (Via HP)

– Solaris (Via Sun)

Page 19: InfiniBand: Today and Tomorrow


The InfiniBand Driver Architecture

[Diagram: the InfiniBand driver stack. User space: applications use BSD sockets, file-system APIs, and uDAPL/DAT. Kernel: TCP/IP over IPoIB, SDP, SRP (SCSI), FCP, and NFS-RDMA sit above the verbs layer and InfiniBand HCA driver, alongside the conventional Ethernet driver. Fabric side: the InfiniBand server fabric connects through Ethernet and Fibre Channel gateways to LAN/WAN and SAN switches.]

Page 20: InfiniBand: Today and Tomorrow


Open Software Initiatives

• OpenIB.org

– Topspin is a primary author of major portions, including IPoIB, SDP, SRP, and the TS-API; Cisco will continue to invest (a small verbs example follows this list)

– Current protocol development is nearing production-quality code; release expected by end of year

– Charter has been expanded to include Windows and iWarp

– MPI will be available in the near future (MVAPICH 0.96)

• OpenSM

• OpenMPI
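
For a concrete feel of the user-space side of the OpenIB stack, here is a minimal sketch using the libibverbs verbs API (an assumption about the API as it ships in the open-source stack, not Topspin's TS-API): it enumerates the HCAs visible on a host, opens the first one, and prints the LID and state of port 1.

/* Minimal libibverbs sketch: enumerate HCAs and query port 1.
 * Assumes the OpenIB user-space verbs library.
 * Build: gcc -O2 ibquery.c -o ibquery -libverbs
 */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    struct ibv_device **dev_list;
    struct ibv_context *ctx;
    struct ibv_port_attr port;
    int num;

    dev_list = ibv_get_device_list(&num);        /* all HCAs visible to this host */
    if (!dev_list || num == 0) {
        fprintf(stderr, "no InfiniBand devices found\n");
        return EXIT_FAILURE;
    }

    printf("found %d device(s); opening %s\n",
           num, ibv_get_device_name(dev_list[0]));

    ctx = ibv_open_device(dev_list[0]);          /* open the first HCA */
    if (!ctx) {
        fprintf(stderr, "ibv_open_device failed\n");
        return EXIT_FAILURE;
    }

    if (ibv_query_port(ctx, 1, &port) == 0)      /* port numbers start at 1 */
        printf("port 1: LID 0x%04x, state %d\n", port.lid, port.state);

    ibv_close_device(ctx);
    ibv_free_device_list(dev_list);
    return EXIT_SUCCESS;
}

In the OpenIB model the upper-layer protocols listed earlier (IPoIB, SDP, SRP, MPI) are built on the same verbs abstractions – queue pairs and completion queues – in kernel or user space.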

Page 21: InfiniBand: Today and Tomorrow


InfiniBand Tomorrow

Page 22: InfiniBand: Today and Tomorrow


Looking into the future

• Cost

• Speed

• Distance Limitations

• Cable Management

• Scalability

• IB and Ethernet

Page 23: InfiniBand: Today and Tomorrow


Speed: InfiniBand DDR / QDR, 4X / 12X

• DDR Available end of 2005

Doubles wire speeds to ? (ok, still working on this one)

PCI-Express DDR

Distances of 5-10m using copper

Distances of 100m using fiber

• QDR Available WHEN?

• 12X (30 Gb/s) available for over one year!!

– Not interesting until 12X HCA

• Not interesting until > 16X PCIe

Page 24: InfiniBand: Today and Tomorrow


Future InfiniBand Cables

• InfiniBand over CAT5 / CAT6 / CAT7

Shielded cable distances up to ???

Leverage existing 10-GigE cabling

10-GigE too expensive?

Page 25: InfiniBand: Today and Tomorrow


IB Distance Scaling

• IB Short Haul – new copper drivers

– 25-50 meters (KeyEye)

– 75-100 meters (IEEE 10GE)

• IB WAN – same subnet over distance (300 km target)

– Buffer / credit / timeout issues (see the sizing sketch after this list)

– Applications: disaster recovery, data mirroring

• IB Long Haul – IB over IP (over SONET?)

– Utilizes existing public plant (WDM, debugging, etc.)
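
To see why buffering and credits are the hard part of stretching one subnet to 300 km, consider the bandwidth-delay product: at roughly 5 us of fiber propagation delay per km, a 300 km span has about a 3 ms round trip, and a 4X SDR link carrying about 8 Gb/s of data (after 8b/10b coding) needs credits covering everything in flight over that round trip. The small sketch below just does the arithmetic; the propagation constant and link rate are textbook approximations, not measured values.

/* Back-of-the-envelope bandwidth-delay product for IB over distance.
 * Assumes ~5 us/km propagation in fiber and a 4X SDR data rate of 8 Gb/s
 * (10 Gb/s signaling minus 8b/10b coding overhead); both are approximations.
 */
#include <stdio.h>

int main(void)
{
    const double km        = 300.0;   /* target span from the slide */
    const double us_per_km = 5.0;     /* ~5 us/km in optical fiber  */
    const double data_gbps = 8.0;     /* 4X SDR payload rate        */

    double rtt_s           = 2.0 * km * us_per_km * 1e-6;        /* round-trip time */
    double in_flight_bytes = data_gbps * 1e9 / 8.0 * rtt_s;      /* bytes in flight */

    printf("RTT: %.1f ms\n", rtt_s * 1e3);
    printf("buffer/credits needed to keep the pipe full: %.1f MB\n",
           in_flight_bytes / 1e6);
    return 0;
}

That works out to roughly 3 MB of buffering per link direction, far beyond the on-chip buffers sized for data-center cable lengths, which is why WAN-distance IB needs dedicated buffer/credit handling.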

Page 26: InfiniBand: Today and Tomorrow


Scaling InfiniBand

• Subnet Management

• Host-side Drivers

– MPI

– IPoIB

– SRP

• Memory Utilization

Page 27: InfiniBand: Today and Tomorrow


IB Subnet Manager

• Subnets are getting bigger

– 4,000 -> 10,000 nodes

– Topology convergence times

• Topology disturbance times

• Topology disturbance minimization

Page 28: InfiniBand: Today and Tomorrow


Subnet Management Challenges

• Cluster Cold Start Times

– Template Routing

– Persistent Routing

• Cluster Topology Change Management

– Intentional change: maintenance

– Unintentional change: dealing with faults

– How to impact the minimum number of connections?

– Predetermine a fault-reaction strategy?

• Topology Diagnostic Tools

– Link/Route Verification

– Built-in BERT testing

• Partition Management

Page 29: InfiniBand: Today and Tomorrow


Multiple Routing Models

• Minimum Latency Routing

– Load-Balanced Shortest-Path Routing (toy sketch after this list)

• Minimum Contention Routing

– Lowest-Interference Divergent-Path Routing

• Template-Driven Routing

– Supports a pre-determined routing topology

– For example: Clos routing, matrix row/column, etc.

– Automatic cabling verification for large installations
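
As a toy illustration of the first model (hypothetical code, not the SFS subnet manager): among the output ports offering a minimum-hop path to a destination LID, pick the one with the fewest LIDs already routed through it. That is the essence of load-balanced shortest-path routing.

/* Toy load-balanced shortest-path port selection (illustrative only).
 * For one switch: among ports offering a minimum-hop path to a destination,
 * choose the one with the fewest LIDs already routed through it.
 */
#include <stdio.h>

#define NUM_PORTS 24   /* e.g., a 24-port leaf switch */

static int choose_port(const int hops[NUM_PORTS], int load[NUM_PORTS])
{
    int best = -1;
    for (int p = 0; p < NUM_PORTS; p++) {
        if (hops[p] < 0)                      /* port does not reach the LID */
            continue;
        if (best < 0 ||
            hops[p] < hops[best] ||           /* strictly shorter path       */
            (hops[p] == hops[best] &&
             load[p] < load[best]))           /* same length, less loaded    */
            best = p;
    }
    if (best >= 0)
        load[best]++;                         /* port now carries one more route */
    return best;
}

int main(void)
{
    int hops[NUM_PORTS], load[NUM_PORTS] = { 0 };

    for (int p = 0; p < NUM_PORTS; p++)
        hops[p] = -1;                         /* unreachable by default     */
    hops[0] = 2;                              /* two equal-cost uplinks ... */
    hops[1] = 2;
    hops[2] = 3;                              /* ... and one longer detour  */

    for (int lid = 1; lid <= 4; lid++)
        printf("LID %d routed out port %d\n", lid, choose_port(hops, load));
    return 0;
}

Equal-cost destinations alternate between the two uplinks; minimum-contention routing instead tries to keep flows likely to be active together on divergent paths.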

Page 30: InfiniBand: Today and Tomorrow


IB Routing Challenges

• Static / Dynamic Routing

– IB implements static routing through Linear Forwarding Tables at each switch chip (see the sketch after this list)

– Multi-LID Routing enables Dynamic Routing

• Credit Loops

• Cost-Based Routing

– Speed mismatches cause store-and-forward (vs. cut-through)

– SDR <> DDR <>QDR

– 4X <> 12X

– Short Haul <> Long Haul
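
A linear forwarding table is just an array indexed by destination LID that yields an output port. The sketch below is a deliberately simplified illustration (not real switch firmware or a subnet-manager implementation); it also hints at how multi-LID routing works: with LMC > 0 an endpoint owns 2^LMC consecutive LIDs, and each can be forwarded out a different port even though every individual entry remains static.

/* Simplified linear forwarding table (LFT) lookup (illustrative only).
 * Real IB switches hold an LFT programmed by the subnet manager; with
 * LMC > 0 an endpoint owns 2^LMC consecutive LIDs, and each LID can be
 * routed out a different port, which is what enables multi-LID multipath
 * routing on top of purely static per-LID forwarding.
 */
#include <stdio.h>
#include <stdint.h>

#define LFT_SIZE 48           /* tiny table for the example */

static uint8_t lft[LFT_SIZE]; /* lft[dlid] = output port    */

static int forward(uint16_t dlid)
{
    if (dlid >= LFT_SIZE)
        return -1;            /* out of range: drop / trap to the SM */
    return lft[dlid];
}

int main(void)
{
    /* Suppose endpoint "A" has base LID 16 with LMC = 2, i.e. LIDs 16..19.
     * The subnet manager spreads those four LIDs across four spine paths. */
    lft[16] = 1;  lft[17] = 2;  lft[18] = 3;  lft[19] = 4;

    for (uint16_t dlid = 16; dlid <= 19; dlid++)
        printf("packet to DLID %u -> output port %d\n",
               (unsigned)dlid, forward(dlid));
    return 0;
}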

Page 31: InfiniBand: Today and Tomorrow


Multi-LID Source-Based Routing Support

• Applications can implement “Dynamic” Routing for Contention Avoidance, Failover, and Parallel Data Transfer (see the sketch below)

[Diagram: leaf switches connected through spine switches; LIDs 1, 2, 3, 4 select different paths across the spine.]
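
On the host side, the idea looks something like the hedged sketch below (a hypothetical helper, not a Topspin or OpenIB API): if the peer's port owns four LIDs (LMC = 2), an application can stripe parallel transfers across them, or skip a LID after a path fault, simply by varying the destination LID it addresses.

/* Source-based path selection over multiple destination LIDs (illustrative).
 * With LMC = 2 the peer owns four consecutive LIDs; each maps to a distinct
 * path through the fabric (see the LFT sketch above), so striping transfers
 * across them spreads load, and skipping a LID after a fault gives failover.
 */
#include <stdio.h>
#include <stdint.h>

#define LMC 2                                  /* 2^2 = 4 LIDs per endpoint */

static uint16_t path_lid(uint16_t base_lid, unsigned transfer_idx)
{
    return base_lid + (transfer_idx & ((1u << LMC) - 1));
}

int main(void)
{
    const uint16_t peer_base_lid = 16;         /* matches the LFT example */

    for (unsigned chunk = 0; chunk < 8; chunk++)
        printf("chunk %u -> send to DLID %u\n",
               chunk, (unsigned)path_lid(peer_base_lid, chunk));
    return 0;
}

Combined with the per-LID forwarding entries in the previous sketch, consecutive chunks fan out across the spine paths labeled 1-4 in the diagram.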

Page 32: InfiniBand: Today and Tomorrow


New IB Peripherals

• CPUs?

• Storage

– SAN

– NFS-RDMA

• Memory (coherent / non-coherent)

• Purpose-built Processors?

– Floating Point Processors

– Graphics Processors

– Pattern Matching Hardware

– XML Processor

Page 33: InfiniBand: Today and Tomorrow


THANK YOU!

• Questions & Answers