
PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat

Department of Computer Science and Engineering, University of California San Diego

Background

• Emerging needs for massive-scale data centers

• Various design elements to achieve high performance, scalability, and fault tolerance in such environments

Problems

• VM migration is poorly supported in traditional DC networks: migrating a VM changes its IP address, which breaks pre-existing TCP connections and adds administrative overhead for handing TCP connections over between VM hosts

• Switches need to be configured before deployment

• Inefficient communication between physically distant hosts

• Forwarding loops cause inefficiency or, worse, paralysis of the network

• Physical connectivity failures interfere with existing unicast and multicast sessions

At data center scale, many problems stem from the existing IP and Ethernet protocols

Solution

• PortLand
– An Ethernet-compatible L2 protocol that addresses the issues listed above

A Fat Tree Network

• The network topology targeted in this paper

• A topology commonly used in DC networks
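To give a feel for how a fat tree scales, the sketch below computes the element counts of a standard k-ary fat tree built from k-port switches. The function name `fat_tree_size` is hypothetical, and the formulas are the usual ones for this topology rather than anything stated on the slide.

```python
def fat_tree_size(k: int) -> dict:
    """Element counts for a k-ary fat tree built from k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    edge = k * half          # k pods, each with k/2 edge switches
    aggregation = k * half   # k pods, each with k/2 aggregation switches
    core = half * half       # (k/2)^2 core switches
    return {
        "pods": k,
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,
        "hosts": edge * half,  # each edge switch serves k/2 hosts
    }


print(fat_tree_size(4))
```

With k = 4 this gives 20 switches and 16 hosts, which matches the testbed size listed under Implementation.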

PortLand Design: Fabric Manager

• A user process running on a dedicated machine somewhere in the network, responsible for:
– Assisting with ARP resolution
– Fault tolerance
– Multicast

• Assumptions
– The location of the Fabric Manager is transparent to every switch in the network
– The Fabric Manager is a core function of PortLand and is therefore made redundant

PortLand Design: Positional Pseudo MAC Address

• A virtual MAC address (PMAC) that encodes the location of the host in the network
• Written as pod.position.port.vmid
– Pod = pod number
– Position = position within the pod
– Port = switch port number
– VMid = virtual machine number (auto-incremented for each added VM; zero if not running on a VM?)
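The pod.position.port.vmid layout can be made concrete by packing the four fields into a 48-bit MAC address. The sketch below assumes a 16/8/8/16-bit split for pod, position, port, and vmid; the slide does not state the widths, so treat them as an assumption. The names `PMAC`, `encode_pmac`, and `decode_pmac` are hypothetical.

```python
from typing import NamedTuple


class PMAC(NamedTuple):
    pod: int       # assumed 16 bits
    position: int  # assumed 8 bits
    port: int      # assumed 8 bits
    vmid: int      # assumed 16 bits


# Assumed field widths in bits; not given on the slide.
_WIDTHS = {"pod": 16, "position": 8, "port": 8, "vmid": 16}


def encode_pmac(p: PMAC) -> str:
    """Pack the four fields into a colon-separated 48-bit MAC string."""
    for name, bits in _WIDTHS.items():
        value = getattr(p, name)
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    packed = (p.pod << 32) | (p.position << 24) | (p.port << 16) | p.vmid
    return ":".join(f"{b:02x}" for b in packed.to_bytes(6, "big"))


def decode_pmac(text: str) -> PMAC:
    """Recover the location fields from a colon-separated MAC string."""
    packed = int(text.replace(":", ""), 16)
    return PMAC(
        pod=(packed >> 32) & 0xFFFF,
        position=(packed >> 24) & 0xFF,
        port=(packed >> 16) & 0xFF,
        vmid=packed & 0xFFFF,
    )


print(encode_pmac(PMAC(pod=0, position=1, port=2, vmid=1)))  # 00:00:01:02:00:01
```

Because the location is encoded in the address prefix, a switch can forward on the pod and position fields alone instead of keeping one entry per host.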

1. A host is connected to an edge switch

2. The edge switch creates an entry in its local address mapping table for use in later forwarding

3. The edge switch notifies the Fabric Manager of the newly added host
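The three steps above can be sketched as a small simulation. Everything here is illustrative: the class and method names are hypothetical, the pseudo MAC is kept in the dotted pod.position.port.vmid notation, and numbering vmid from 1 per port is an assumption (the slide leaves the exact rule open).

```python
class FabricManager:
    """Keeps a network-wide registry of host IP -> pseudo MAC."""

    def __init__(self):
        self.ip_to_pmac = {}

    def register_host(self, ip, pmac):
        self.ip_to_pmac[ip] = pmac


class EdgeSwitch:
    def __init__(self, pod, position, fabric_manager):
        self.pod = pod
        self.position = position
        self.fabric_manager = fabric_manager
        self.amac_to_pmac = {}  # actual MAC -> pseudo MAC (local table)
        self._last_vmid = {}    # per-port counter (assumed numbering rule)

    def on_new_host(self, port, amac, ip):
        # Step 1: a host with actual MAC `amac` shows up on `port`.
        if amac in self.amac_to_pmac:
            return self.amac_to_pmac[amac]
        vmid = self._last_vmid.get(port, 0) + 1
        self._last_vmid[port] = vmid
        pmac = f"{self.pod}.{self.position}.{port}.{vmid}"
        # Step 2: record the mapping locally for later forwarding.
        self.amac_to_pmac[amac] = pmac
        # Step 3: tell the Fabric Manager about the new host.
        self.fabric_manager.register_host(ip, pmac)
        return pmac


fm = FabricManager()
switch = EdgeSwitch(pod=0, position=1, fabric_manager=fm)
print(switch.on_new_host(port=2, amac="00:19:b9:fa:88:e2", ip="10.5.1.2"))
```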

PortLand Design: Proxy-based ARP

• By default, Ethernet broadcasts ARP requests to all hosts in the same L2 domain, which is inefficient
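One way to picture the proxy approach is that the edge switch intercepts an ARP request and asks the Fabric Manager for the answer instead of flooding the L2 domain. The sketch below is a minimal illustration under that reading. The names `FabricManager.resolve` and `handle_arp_request` are hypothetical, and falling back to a broadcast on a lookup miss is an assumption, not something the slide states.

```python
from typing import Dict, Optional, Tuple


class FabricManager:
    """Holds the IP -> pseudo MAC registry used to answer ARP queries."""

    def __init__(self) -> None:
        self.ip_to_pmac: Dict[str, str] = {}

    def resolve(self, ip: str) -> Optional[str]:
        return self.ip_to_pmac.get(ip)


def handle_arp_request(fm: FabricManager, target_ip: str) -> Tuple[str, Optional[str]]:
    """Decide what an edge switch does with an intercepted ARP request.

    Returns ("reply", pmac) when the Fabric Manager knows the target,
    otherwise ("broadcast", None) as the assumed fallback.
    """
    pmac = fm.resolve(target_ip)
    if pmac is not None:
        return ("reply", pmac)
    return ("broadcast", None)


fm = FabricManager()
fm.ip_to_pmac["10.2.4.5"] = "00:02:00:02:00:01"
print(handle_arp_request(fm, "10.2.4.5"))
print(handle_arp_request(fm, "10.9.9.9"))
```

In the common case the request costs one unicast exchange with the Fabric Manager rather than a flood to every host.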

PortLand Design: Distributed Location Discovery

• Every switch periodically sends an LDP (Location Discovery Protocol) message out of all of its ports

• A switch that receives an LDP message processes it in LDP_listener_thread(): a newly connected switch determines its current position in the network, while an existing switch updates its forwarding table

PortLand Design: Unicast Fault Tolerant Routing

1. A switch detects a link failure
2. The switch informs the Fabric Manager
3. The Fabric Manager updates the per-link connectivity matrix
4. The Fabric Manager informs all switches about the link failure
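The failure-handling steps can be sketched as follows. This is a toy model, not the actual implementation: the class and method names are hypothetical, and the connectivity matrix is represented as a dictionary keyed by unordered switch pairs.

```python
class Switch:
    def __init__(self, name):
        self.name = name
        self.known_failed_links = set()

    def on_link_failure(self, link):
        # Receiver side of step 4: remember the failed link so that
        # forwarding decisions can avoid it.
        self.known_failed_links.add(link)


class FabricManager:
    def __init__(self, switches):
        self.switches = {s.name: s for s in switches}
        self.link_up = {}  # frozenset({a, b}) -> True if the link is up

    def add_link(self, a, b):
        self.link_up[frozenset((a, b))] = True

    def report_link_failure(self, a, b):
        """Steps 2-4: called by the switch that detected the failure."""
        link = frozenset((a, b))
        if link not in self.link_up:
            raise KeyError(f"unknown link {a}-{b}")
        self.link_up[link] = False          # step 3: update the matrix
        for switch in self.switches.values():
            switch.on_link_failure(link)    # step 4: one message per switch
        return len(self.switches)           # number of notifications sent


switches = [Switch(n) for n in ("edge0", "agg0", "core0")]
fm = FabricManager(switches)
fm.add_link("edge0", "agg0")
fm.add_link("agg0", "core0")
print(fm.report_link_failure("edge0", "agg0"))
```

The Fabric Manager sends one notification per switch, so the message count grows linearly with the number of switches.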

Communication Overhead for Failure Detection

• Traditional routing protocols: O(n^2)
• PortLand: O(n)

Implementation

• HW
– Switch * 20
  • 4-port NetFPGA PCI card switches with Xilinx FPGA for hardware extensions (hosted in 1U dual-core 3.2 GHz Intel Xeon machines with 3GB RAM)
  • OpenFlow
    – Switch configuration software?
  • 32-entry TCAM and a 32K-entry SRAM for flow table entries
– End host * 16
  • 1U quad-core 2.13GHz Intel Xeon machines with 3GB of RAM running Linux 2.6.18-92.1.18el5

• System architecture

[Figure: system architecture. Labels: switches in the DC network, OpenFlow protocol, Fabric Manager communication module, Fabric Manager network with multiple FM instances (redundant and synchronized)]

Evaluation: Convergence Time

• Convergence Time with Increasing Faults
• TCP Convergence
• Multicast Convergence

Evaluation: Scalability

• Fabric manager control traffic
• CPU requirements for ARP requests

Conclusion

• Redundancy of the Fabric Manager
