Portland: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric
Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar
Radhakrishnan, Vikram Subramanya, and Amin Vahdat
Department of Computer Science and Engineering, University of California, San Diego
Background
• Emerging need for massive-scale data centers
• Various design elements to achieve high performance, scalability, and fault tolerance in such environments
Problems
• Traditional DC networks support VM migration poorly: migrating a VM changes its IP address, which breaks pre-existing TCP connections and creates administrative overhead for handing TCP connections over between VM hosts
• Switches need to be configured before deployment
• Inefficient communication between physically distant hosts
• Forwarding loops cause inefficiency or, worse, paralysis of the network
• Physical connectivity failures interfere with existing unicast and multicast sessions
In DC-scale networks, many problems stem from the existing IP and Ethernet protocols
Solution
• Portland
– An Ethernet-compatible L2 protocol that solves the issues mentioned above
A Fat Tree Network
• The network topology targeted in this paper
• A topology commonly used in DC networks
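To give a feel for the scale of this topology, the sketch below counts the elements of a fat tree. It assumes the standard k-ary fat tree construction (k pods, k/2 edge and k/2 aggregation switches per pod, (k/2)^2 core switches, k/2 hosts per edge switch), which the slides do not spell out; the function name is hypothetical.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts for a k-ary fat tree (assumed standard construction)."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even number")
    edge = k * (k // 2)          # k pods, k/2 edge switches per pod
    aggregation = k * (k // 2)   # k pods, k/2 aggregation switches per pod
    core = (k // 2) ** 2         # (k/2)^2 core switches
    hosts = edge * (k // 2)      # each edge switch serves k/2 hosts
    return {
        "pods": k,
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,
        "hosts": hosts,
    }


print(fat_tree_sizes(4))
```

Under this assumption, k = 4 gives 20 switches and 16 hosts, which matches the switch and end-host counts listed on the Implementation slide.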
Portland Design: Fabric Manager
• A user process running on a dedicated machine somewhere in the network, responsible for:
– Assisting with ARP resolution
– Fault tolerance
– Multicast
• Assumptions
– The location of the Fabric Manager is transparent to each of the switches in the network
– The Fabric Manager serves as a core function in Portland; therefore it is made redundant
Portland Design: Positional Pseudo MAC Address
• A virtual MAC address that specifies the location of the host in the network
• Described as pod.position.port.vmid
– Pod = pod number
– Position = position within pod
– Port = switch port number
– VMid = virtual machine number (auto increment for each added VM, zero if not running on a VM?)
1. A host is connected to an edge switch
2. The edge switch creates an address mapping entry in its own table for subsequent forwarding
3. The edge switch reports the newly added host to the fabric manager
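The pod.position.port.vmid layout can be sketched as a 48-bit value packed into an ordinary MAC address string. The field widths used here (16, 8, 8 and 16 bits) are not stated on the slide and are an assumption taken from the paper's description; the type and function names are hypothetical.

```python
from typing import NamedTuple


class PMAC(NamedTuple):
    pod: int       # assumed 16 bits
    position: int  # assumed 8 bits
    port: int      # assumed 8 bits
    vmid: int      # assumed 16 bits


_WIDTHS = {"pod": 16, "position": 8, "port": 8, "vmid": 16}


def encode_pmac(p: PMAC) -> str:
    """Pack the four fields into a colon-separated 48-bit MAC string."""
    for name, width in _WIDTHS.items():
        value = getattr(p, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    raw = (p.pod << 32) | (p.position << 24) | (p.port << 16) | p.vmid
    return ":".join(f"{b:02x}" for b in raw.to_bytes(6, "big"))


def decode_pmac(mac: str) -> PMAC:
    """Recover the location fields from a MAC string produced by encode_pmac."""
    raw = int(mac.replace(":", ""), 16)
    return PMAC(
        pod=(raw >> 32) & 0xFFFF,
        position=(raw >> 24) & 0xFF,
        port=(raw >> 16) & 0xFF,
        vmid=raw & 0xFFFF,
    )


print(encode_pmac(PMAC(pod=2, position=1, port=0, vmid=1)))  # 00:02:01:00:00:01
```

Because the location is encoded in the address itself, a switch can forward on a prefix of the address (for example the pod field) rather than keeping one entry per host.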
Portland Design: Proxy-based ARP
• By default, Ethernet broadcasts ARP requests to all hosts in the same L2 domain, which is inefficient
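A minimal sketch of how a proxy could avoid the broadcast: the fabric manager keeps an IP-to-pseudo-MAC registry (filled in when edge switches report new hosts) and answers lookups directly. The class, method names and message strings are hypothetical, and the fall-back to a broadcast when no mapping is known is an assumption not shown on the slide.

```python
from typing import Dict, Optional


class FabricManagerArpProxy:
    """Hypothetical in-memory IP -> pseudo MAC registry."""

    def __init__(self) -> None:
        self._ip_to_pmac: Dict[str, str] = {}

    def register(self, ip: str, pmac: str) -> None:
        # Called when an edge switch reports a newly attached host.
        self._ip_to_pmac[ip] = pmac

    def resolve(self, ip: str) -> Optional[str]:
        return self._ip_to_pmac.get(ip)


def handle_arp_request(fm: FabricManagerArpProxy, target_ip: str) -> str:
    """What an edge switch might do with an intercepted ARP request."""
    pmac = fm.resolve(target_ip)
    if pmac is not None:
        # Known host: answer directly, no broadcast needed.
        return f"reply: {target_ip} is at {pmac}"
    # Unknown host: assumed fall-back to a conventional broadcast.
    return f"broadcast: who has {target_ip}"


fm = FabricManagerArpProxy()
fm.register("10.2.1.5", "00:02:01:00:00:01")
print(handle_arp_request(fm, "10.2.1.5"))
print(handle_arp_request(fm, "10.9.9.9"))
```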
Portland Design: Distributed Location Discovery
• Every switch broadcasts an LDP (Location Discovery Protocol) message on all of its ports at a fixed interval
• A switch that receives an LDP message runs the processing in the LDP listener thread() function: a newly connected switch determines its current location in the network, and existing switches update their forwarding tables
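One way a switch could work out its level in the tree from what it hears on its ports is sketched below. This is a deliberately simplified assumption, not the paper's listener algorithm: a port on which no LDP message is heard is treated as host-facing, and timeouts, pod numbers and positions are ignored. All names are hypothetical.

```python
from typing import List, Optional, Union

EDGE, AGGREGATION, CORE = 0, 1, 2
HOST = "host"  # marker for a port on which no LDP message has been heard

PortState = Union[str, int, None]  # HOST, a neighbor's level, or None if unknown


def infer_level(ports: List[PortState]) -> Optional[int]:
    """Guess this switch's level from per-port observations."""
    if any(p == HOST for p in ports):
        return EDGE          # some ports face hosts
    if any(p == EDGE for p in ports):
        return AGGREGATION   # directly attached to an edge switch
    if ports and all(p == AGGREGATION for p in ports):
        return CORE          # every port leads to an aggregation switch
    return None              # not enough information yet


print(infer_level([HOST, HOST, None, None]))
print(infer_level([EDGE, EDGE, None, None]))
print(infer_level([AGGREGATION] * 4))
```

A switch that returns None simply waits for further LDP messages; levels propagate upward from the edge as neighbors learn their own positions.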
Portland Design: Unicast Fault Tolerant Routing
1. A switch detects a link failure
2. The switch informs the Fabric Manager
3. The Fabric Manager updates the per-link connectivity matrix
4. The Fabric Manager informs all switches about the link failure
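The four steps above can be sketched as a small fabric manager object. The set of failed links stands in for the per-link connectivity matrix, and the per-switch outbox stands in for real notification messages; both are simplifying assumptions, and every name here is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Link = FrozenSet[str]


class FabricManager:
    def __init__(self, switches: List[str]) -> None:
        self.switches = list(switches)
        # Stand-in for the per-link connectivity matrix: links known to be down.
        self.failed_links: Set[Link] = set()
        # Notifications queued for each switch (stand-in for real messages).
        self.outbox: Dict[str, List[Link]] = {s: [] for s in self.switches}

    def report_failure(self, a: str, b: str) -> None:
        """Steps 2-4: receive a report, update state, notify the switches."""
        link = frozenset((a, b))
        if link in self.failed_links:
            return  # already recorded, do not notify again
        self.failed_links.add(link)
        for s in self.switches:
            self.outbox[s].append(link)

    def is_up(self, a: str, b: str) -> bool:
        return frozenset((a, b)) not in self.failed_links


fm = FabricManager(["edge1", "agg1", "core1"])
fm.report_failure("edge1", "agg1")  # step 1 happened at edge1 or agg1
print(fm.is_up("agg1", "edge1"))    # False
```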
Communication Overhead for Failure Detection
• Traditional routing protocols: O(n^2)
• Portland: O(n)
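A rough count makes the gap concrete. The sketch assumes that a traditional protocol needs all-to-all communication among n switches, while the fabric manager approach needs one report plus at most n notifications; the exact counting model and the function names are illustrative assumptions.

```python
def all_to_all_messages(n: int) -> int:
    """Every switch informs every other switch: n * (n - 1), i.e. O(n^2)."""
    return n * (n - 1)


def fabric_manager_messages(n: int) -> int:
    """One report to the fabric manager plus at most n notifications: O(n)."""
    return 1 + n


for n in (20, 100, 1000):
    print(n, all_to_all_messages(n), fabric_manager_messages(n))
```

For the 20 switches of the testbed this model gives 380 messages against 21, and the ratio keeps growing linearly with n.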
Implementation
• HW
  – Switch × 20
    • 4-port NetFPGA PCI card switches with Xilinx FPGA for hardware extensions (housed in 1U dual-core 3.2 GHz Intel Xeon machines with 3GB RAM)
    • Openflow
      – Switch configuration software?
    • 32-entry TCAM and a 32K entry SRAM for flow table entries
  – End host × 16
    • 1U quad-core 2.13GHz Intel Xeon machines with 3GB of RAM running Linux 2.6.18-92.1.18el5
• System architecture
[Figure: system architecture diagram. Labels: Switch (OpenFlow, Fabric Manager Communication Module), FM, Fabric Manager Network (redundant and synchronized), DC Network, OpenFlow Protocol]
Evaluation: Convergence Time
[Figures: Convergence Time with Increasing Faults; TCP Convergence; Multicast Convergence]
Evaluation: Scalability
[Figures: Fabric manager control traffic; CPU requirements for ARP requests]
Conclusion
• Redundancy of the Fabric Manager