
  • 1

    Checklist for Selecting and Deploying Scalable Clusters with InfiniBand Fabrics

    Lloyd Dickman, CTO

    InfiniBand Products, Host Solutions Group, QLogic Corporation

    November 13, 2007 @ SC07, Exhibitor Forum

  • 2

    InfiniBand is the Fabric of Choice for HPC

    Large and growing standards-based eco-system

    Aggressive technology roadmap

    Low-latency, high messaging rate implementations in market

    Evidenced by increasing traction in Top500

  • 3

    InfiniBand Interoperability

  • 4

    Interdependent Tradeoffs

    Decisions:

    • Host Architecture
    • Fabric Topology
    • Fabric Convergence / Unification
    • RAS, Power / Cooling
    • Software – host applications, ULPs, and fabric

    Tradeoff dimensions: Budget (CapEx + OpEx) and Performance.

  • 5

    For HPC, Application Messaging Profiles Matter!

    [Figure: legacy and new HPC applications (Reservoir Simulation, Quantum Chromodynamics, Climate, Weather, CFD, Proteomics, Genomics, Magneto Hydrodynamics, Molecular Dynamics, Multi-Physics, Embarrassingly Parallel) plotted by message size versus message rate. Coarse-grain workloads fall where BANDWIDTH DOMINATES, following Moore's Law clock-rate growth; fine-grain workloads fall where LATENCY and MESSAGE RATE DOMINATE, following Moore's Law multi-core growth and application accelerators, with time to solution as the payoff.]
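
    An editorial aside, not from the slide: the coarse-grain/fine-grain split above follows from the usual alpha-beta communication model, where the time to move an n-byte message is t(n) = alpha + n / beta (alpha = per-message latency/overhead, beta = bandwidth). The Python sketch below uses illustrative, assumed numbers rather than QLogic measurements.

    # Alpha-beta transfer-time model: t(n) = alpha + n / beta.
    # alpha and beta are illustrative assumptions, not figures from the slide.
    alpha = 1.5e-6   # per-message latency/overhead, seconds
    beta = 1.5e9     # link bandwidth, bytes/second

    for n in (64, 4096, 1_000_000):   # fine-grain ... coarse-grain message sizes
        t = alpha + n / beta
        print(f"{n:>9} B: {t * 1e6:8.2f} us/message, latency share = {alpha / t:5.1%}")

    For 64-byte messages nearly the whole cost is per-message overhead, so latency and message rate dominate; for megabyte messages the n / beta term dominates, so bandwidth does.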

  • 6

    Application Messaging Patterns Change with Scale

    Source: PathScale measurements, averages per node, DL_POLY_2 AWE Medium case, 19dec2005

    [Chart: number of messages per node (up to ~16,000,000) versus message size (8 B to 524,288 B) for 32, 64, 128, 256, and 512 processors.]

  • 7

    QLogic InfiniBand Provides Superior Application Scaling

    Scaling results are for 8 nodes / 32 cores.

    Latency results include a single switch crossing.

    Results use native MPIs: InfiniPath MPI for QLogic

    Point-to-point microbenchmarks are no longer sufficient predictors of performance at scale.

    Benchmark comparison (QLE7280 DDR / QLE7240 DDR / QLE7140 SDR, advantage vs. competitive):

    • Point-to-Point Latency – OSU Latency (0 Byte): 1.3 µs / 1.4 µs / 1.5 µs (Tie)
    • Point-to-Point Bandwidth – OSU Bandwidth (peak): 1.9 GB/s / 1.5 GB/s / 0.9 GB/s (+24%)
    • Point-to-Point Message Rate (non-coalesced) – OSU Message Rate @ 4 processes/node: 13 M/s / 12 M/s / 9 M/s (2.6X)
    • Scalable Latency – HPCC Random Ring Latency @ 32 cores: 1.4 µs / 1.5 µs / 1.6 µs (1.8X)
    • Scalable Message Rate (non-coalesced) – HPCC MPI Random Access @ 32 cores: .13 GUPs / .12 GUPs / .12 GUPs (7.6X)
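
    For context on what the point-to-point rows measure: OSU latency is essentially a two-rank ping-pong. Below is a minimal mpi4py sketch in that style (an editorial illustration, not the OSU benchmark or QLogic code); because only two ranks ever communicate, it cannot predict the random-ring or random-access behavior seen at 32 cores and beyond.

    # Minimal ping-pong latency microbenchmark (run with: mpirun -np 2 python pingpong.py)
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    buf = np.zeros(1, dtype=np.uint8)   # tiny payload, analogous to a 0/1-byte OSU test
    iters = 10_000

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(iters):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    t1 = MPI.Wtime()

    if rank == 0:
        # one-way latency is half the average round-trip time
        print(f"~{(t1 - t0) / iters / 2 * 1e6:.2f} us one-way latency")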

  • 8

    SPEC MPI2007 Results Confirm Scalability

    QLogic SDR to 512 Cores

    [Chart: relative scaling of the SPEC MPI2007 benchmarks (104_milc, 107_leslie3d, 113_GemsFDTD, 115_fds4, 121_pop2, 122_tachyon, 126_lammps, 127_wrf2, 128_GAPgeofem, 129_tera_tf, 130_socorro, 132_zeusmp2, 137_lu) at 32, 64, 128, 256, and 512 cores; relative scaling ranges from 0 to 25.]

  • 9

    Bandwidth Must Account for ALL Traffic

    50% Link Utilization

  • 10

    [Diagram: eight four-core hosts (Core1–Core4 each), connected via host-to-fabric rails to the InfiniBand Data Center Fabric.]

  • 11

    Fabric Topology Considerations

    [Diagram: two InfiniBand Data Center Fabrics.]

    • Multi-rail
    • Fat Tree / FBB (full bisection bandwidth)
    • Oversubscription (sketched below)
    • Asymmetric Topology
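
    A sketch of the oversubscription arithmetic mentioned above, with hypothetical port counts (not QLogic configurations); it assumes a 4X DDR data rate of 16 Gb/s after 8b/10b encoding.

    def oversubscription(hosts_per_edge: int, uplinks_per_edge: int, link_gbps: float):
        """Return (oversubscription ratio, worst-case per-host bandwidth in Gb/s)."""
        ratio = hosts_per_edge / uplinks_per_edge
        per_host_gbps = uplinks_per_edge * link_gbps / hosts_per_edge
        return ratio, per_host_gbps

    # Hypothetical example: a 24-port edge switch with 16 hosts and 8 uplinks of 4X DDR.
    ratio, per_host = oversubscription(hosts_per_edge=16, uplinks_per_edge=8, link_gbps=16.0)
    print(f"{ratio:.1f}:1 oversubscribed, ~{per_host:.0f} Gb/s per host when all hosts send into the core")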

  • 12

    QLogic InfiniBand Fabric Products

    [Diagram: QLogic InfiniBand fabric product line – core fabric switches of 288, 144, 96, and 48 ports (2U–14U chassis for a 19” rack), 24-port edge fabric switches (1U), multi-protocol fabric director modules, 4X SDR and 4X DDR links, SDR and DDR spines.]

    Production-quality DDR shipping since April 2006; over 50,000 ports to date, with extensive OEM qualifications.

  • 13

    Matching Fabric to Node Count

    Optimal fabric sizes for today’s 24-port switch chips (arithmetic sketched below):

    • 1 tier – 24 ports
    • 2 tiers – 288 ports
    • 3 tiers – 3,456 ports

    [Diagram: switch options for matching fabric to node count – 24-port edge switches, the 9020 Switch / FC / 10 GbE Gateway, and core switches of 288, 144, 96, 48, and 24 ports.]
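
    The 24 / 288 / 3,456 figures are the standard maxima for a folded-Clos (fat tree) built from 24-port switch chips: each tier beyond the first multiplies the reachable end-node ports by half the switch radix. A quick sketch of that arithmetic:

    def max_ports(radix: int, tiers: int) -> int:
        """Max end-node ports of a full-bisection fat tree built from radix-port switches."""
        # One tier exposes all `radix` ports; each added tier multiplies by radix // 2,
        # since half of every lower-tier switch's ports face upward.
        return radix * (radix // 2) ** (tiers - 1)

    for tiers in (1, 2, 3):
        print(f"{tiers} tier(s): {max_ports(24, tiers):,} ports")
    # -> 24, 288, and 3,456 ports, matching the slide's optimal fabric sizes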

  • 14

    Advanced Design Capabilities

    Signal Integrity
    • Internal and external bit lanes capable of 10 Gb/s
    • Significant signal integrity challenge
    • Leveraging established QLogic center for signal integrity analysis at headquarters

    Thermal Analysis
    • Chip power nearly triple the previous generation
    • Leveraging QLogic CFD capabilities

  • 15

    Unified InfiniBand Fabric Reduces Cost

    [Diagram: typical cluster interconnect versus unified fabric interconnect.]

    Unified fabric deployments are increasing.

  • 16

    Cost and Power Advantages of Converged Fabric

    Cost Savings of a Converged Fabric (64-node comparison, stacked costs up to ~$600,000):

    Component   Converged Fabric   1GbE+FC    10GbE+FC
    Storage     $35,200            $144,000   $144,000
    10GbE       $200               $200       $283,400
    Ethernet    $0                 $61,280    $0
    IB          $113,300           $0         $0
    Servers     $115,200           $115,200   $115,200

    CF cost and power savings:
    • FC HBAs and switches
    • Ethernet NICs and switches
    • Cables

    Additional cost savings:
    • Simpler management: a single network

    Comparisons are for a 64-node cluster:
    • CF advantage increases for larger clusters
    • 30–38% cost savings on a 400/800-node FSC cluster

    Power savings from CF: ~30% vs. GigE

    Source: HP BoF, Converged Fabrics, SC07
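
    The bars in the cost chart are the column sums of the table above; below is a small sketch of that arithmetic and the resulting savings for the 64-node comparison (the 30–38% figure cited on the slide applies to the larger 400/800-node configurations).

    # Column sums of the slide's 64-node cost table.
    costs = {
        "Converged Fabric": {"Servers": 115_200, "IB": 113_300, "Ethernet": 0,
                             "10GbE": 200, "Storage": 35_200},
        "1GbE+FC": {"Servers": 115_200, "IB": 0, "Ethernet": 61_280,
                    "10GbE": 200, "Storage": 144_000},
        "10GbE+FC": {"Servers": 115_200, "IB": 0, "Ethernet": 0,
                     "10GbE": 283_400, "Storage": 144_000},
    }

    totals = {name: sum(parts.values()) for name, parts in costs.items()}
    cf_total = totals["Converged Fabric"]
    for name, total in totals.items():
        print(f"{name:16s} ${total:>9,}   converged-fabric saving: {1 - cf_total / total:5.1%}")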

  • 17

    Connectivity to ALL Storage

    [Diagram: storage connectivity options – direct connect to InfiniBand storage, Fibre Channel storage via a Fibre Channel gateway, and Ethernet storage via an Ethernet gateway, using the SRP, iSER, and iSCSI protocols.]

  • 18

    Connectivity to ALL Sockets Applications and Ethernet Networks

    [Diagram: sockets and Ethernet connectivity – intra-cluster sockets communications over SDP, IPoIB-UD, and IPoIB-CM (TCP/IP); Ethernet emulation via VNIC; external Ethernet networks via a 10 GbE gateway.]

  • 19

    OFED+ Provides OpenFabrics Benefits Plus Optional Value-Added Capabilities

    [Diagram: OFED+ software stack on QLogic 7200 Series adapters. Accelerated HPC stack over the PSM API: MPICH2, QLogic MPI, Open MPI, HP-MPI, Scali MPI, MVAPICH. OpenFabrics Verbs path: MVAPICH, Open MPI, Scali MPI, HP-MPI, Intel MPI (uDAPL), Lustre, IPoIB, SDP, VNIC, SRP, iSER, RDS, OpenSM. Accelerated sockets and IP transport via the standard Linux TCP/IP stack. Optional value-added components: QLogic SRP, QLogic SDP, QLogic VNIC, InfiniPath Fabric Suite. OFED driver with robust ULPs and tools.]

  • 20

    Large Fabrics Challenges

    Scale of InfiniBand deployments is growing rapidly
    • 1,000s of nodes, soon 10,000s of nodes

    The fabric is the cluster
    • The “one wire” vision is being deployed now
    • Requires a high degree of fabric stability

    Fabric diagnosis and resiliency
    • Many moving parts at large scale

    Demands of end nodes on the SA (Subnet Administrator) are increasing

  • 21

    InfiniBand Fabric Suite Automates Cluster Install, Verification, and Administration
    • For both hosts and switches

    Rich set of tools and capabilities for ongoing fabric administration
    • Easy-to-use TUI and CLI
    • Easy integration into site-specific tools

    Rapidly pinpoints fabric errors
    • Extensive verification and diagnostic capabilities

    Industry-leading performance and scalability
    • 1,024-node fabric initialization in …

  • 22

    Please Visit Us to Learn More About Complete InfiniBand – Booth #261

    • Fluent application running multiple data sets on a converged fabric with Fibre Channel storage via a gateway
    • Cluster administration using InfiniBand Fabric Suite
    • InfiniBand DDR adapter for highly scaling applications
    • 9020 InfiniBand Switch and Multi-Protocol Gateway
    • HPCtrack Partnership Program
    • Advanced technology demonstrations

  • 23

    Thank You