

W3D.2.pdf OFC 2017 © OSA 2017

Self-Adaptive, Multi-Rate Optical Network for Geographically Distributed Metro Data Centers

Payman Samadi¹, Matteo Fiorani², Yiwen Shen¹, Lena Wosinska², Keren Bergman¹

¹Lightwave Research Laboratory, Department of Electrical Engineering, Columbia University, New York, NY, USA
²School of ICT, KTH Royal Institute of Technology, Stockholm, Sweden

[email protected], [email protected]

Abstract: We propose a self-adaptive, multi-rate converged architecture and control plane for metro-scale inter-data-center networks, enabling live autonomous bandwidth steering. Experimental and numerical evaluations demonstrate up to 5× and 25% improvements in transmission times and spectrum usage.
OCIS codes: (060.4250) Networks; (060.4510) Optical communications; (200.4650) Optical interconnects.

1. Introduction

Small to mid-sized Data Centers (DCs) at metro-scale distances are now widely used by enterprises and cloud providers. Trends show metro traffic surpassing long-haul traffic, i.e., most of the traffic generated in the metro network stays local and does not go through the core network [1]. Furthermore, the upcoming 5th generation of mobile communications (5G) is expected to rely on general-purpose hardware and distributed DCs to bring services closer to end-users. These emerging services, including the ones enabled by 5G, impose strict latency and bandwidth requirements that force next-generation metro networks to be flexible and dynamic and to support different levels of Quality of Service (QoS). As a result of this inevitable complexity, network management and provisioning needs to move from conventional human operation towards autonomous, self-adaptive and cognitive networks.

Several network architectures have been recently proposed to support dynamic inter-DC networking [2, 3] and provide bandwidth on demand [4]. Also, inside large-scale DCs, automated bandwidth management leveraging Software-Defined Networking (SDN) has been explored [5]. However, there is no comprehensive network architecture and control strategy that supports various QoS levels, dynamic optical connectivity, and autonomous bandwidth steering using commodity optical and electrical components for metro-scale inter-DC networks. In previous work, we introduced the concept of a converged inter/intra-DC network with background and dynamic connections supporting multiple QoS levels [6]. In this work, we (i) extend the control plane to enable self-adaptive bandwidth steering, (ii) support a multi-rate optical data plane, (iii) present a simulation-based comparison with single-rate converged and conventional non-converged networks, and (iv) demonstrate a full network prototype with autonomous bandwidth steering. Results show 2–5× shorter transmission times and 20–25% lower wavelength usage compared with the single-rate converged and conventional networks. This architecture enables DC scaling in distance, improves application performance reliability by enabling distribution over multiple DCs, and supports connections with strict bandwidth and latency requirements.

2. Hardware Architecture and Control Plane

The network architecture is shown in Fig. 1(a). In this converged architecture, Rack/Pod switches are aggregated using Electrical Packet Switches (EPS) for intra-DC connectivity. The optical gateway [7], a high-port-count Colorless, Directionless, Contentionless Reconfigurable Optical Add/Drop Multiplexer (CDC-ROADM), manages the optical inter-DC connectivity. Racks/Pods are connected to the gateway by optical metro transceivers with different rates (10G, 40G, 100G, etc.) to support dedicated dynamic Rack-to-Rack and/or Pod-to-Pod connections (shown in green). Dynamic connections are utilized for high-priority, long-lived traffic. The EPS aggregation switch is also connected to the optical gateway with multiple transceivers for background connections (shown in red). Background connections carry low-priority, short-lived traffic. The function of the SDN control plane is illustrated in the flowchart shown in Fig. 1(b). The control plane consists of the traffic monitoring, network optimizer and topology manager modules. Each DC is divided into different subnets based on its size. ToRs and/or Pods are equipped with OpenFlow switches and have permanent flow rules for global inter-DC connectivity through background connections. The traffic monitoring module periodically receives the flow counters for the background and dynamic flow rules. The decision to add/drop background/dynamic connections is based on the average traffic in the last n seconds and the Standard Deviation (SD) of the traffic on the background connections. Machine learning tools can be applied to optimize the decision-making process using historical data.
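As a minimal illustration of these statistics, the following Python sketch computes the average load on one background connection over the last n seconds and the SD across its flows; the data structure and function name are hypothetical, since the paper does not publish its controller code.

from statistics import mean, stdev

def connection_stats(flow_samples_gbps):
    """flow_samples_gbps: dict mapping flow id -> list of throughput samples
    (Gb/s) collected on one background connection over the last n seconds."""
    per_flow_avg = {f: mean(s) for f, s in flow_samples_gbps.items()}
    total_avg_gbps = sum(per_flow_avg.values())      # average traffic on the connection
    # SD across the flows, used for the background-vs-dynamic decision
    sd_across_flows = stdev(per_flow_avg.values()) if len(per_flow_avg) > 1 else 0.0
    return total_avg_gbps, sd_across_flows, per_flow_avg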

Fig. 2(a) shows the proposed network control algorithm. At the beginning, a set of static background connections is established to provide all-to-all connectivity among the metro DCs.



Fig. 1. (a) Proposed converged architecture with background/dynamic connections, (b) Self-adaptive control plane workflow.

The monitoring module measures the traffic on the background connections. If the traffic on background connection B between the source and destination DCs (DC_S and DC_D) is higher than a specified threshold, the network optimizer runs a routing and wavelength assignment algorithm to identify whether a new lightpath can be established between DC_S and DC_D (in this work we use k-Shortest Path with First-Fit). If so, the network optimizer calculates the SD of the traffic of the different flows in B. If the SD is ≤1, the network optimizer creates a new background connection between DC_S and DC_D. On the other hand, if the SD is >1, the network optimizer creates a new dynamic connection carrying the flows between the Racks/Pods generating the highest traffic. The network optimizer creates the new dynamic connection using the highest available data rate (in this work we assume 10G and 40G rates). If the new dynamic connection has high priority (Critical), the network optimizer can force an active dynamic connection with lower priority (Bulk) to move to a lower data rate.
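A hedged Python sketch of this decision step is given below. The routing and topology-manager calls (ksp_first_fit, add_background_connection, add_dynamic_connection, transceiver_free, downgrade_bulk_if_needed) are placeholders for the authors' unpublished modules; the 9 Gbps threshold and the 10G/40G rates are taken from the prototype described in Section 3.

from statistics import stdev

THRESHOLD_GBPS = 9.0    # saturation threshold used later in the prototype (Sec. 3)
RATES_GBPS = (40, 10)   # the highest available rate is tried first

def handle_saturated_background(per_flow_gbps, topology, src_dc, dst_dc):
    """per_flow_gbps: flow id -> average rate (Gb/s) on background connection B."""
    if sum(per_flow_gbps.values()) <= THRESHOLD_GBPS:
        return                                            # B not saturated: keep monitoring
    lightpath = topology.ksp_first_fit(src_dc, dst_dc)    # k-Shortest Path + First-Fit RWA
    if lightpath is None:
        return                                            # no free wavelength: keep monitoring
    if stdev(per_flow_gbps.values()) <= 1.0:
        # traffic is spread evenly over the flows: add aggregate (background) capacity
        topology.add_background_connection(src_dc, dst_dc, lightpath)
    else:
        # a few Rack/Pod pairs dominate: give the heaviest one a dedicated lightpath
        heavy_flow = max(per_flow_gbps, key=per_flow_gbps.get)
        rate = next((r for r in RATES_GBPS if topology.transceiver_free(heavy_flow, r)), None)
        if rate is None:
            return
        conn = topology.add_dynamic_connection(heavy_flow, lightpath, rate)
        if conn.priority == "Critical":
            # a Critical connection may push an active Bulk connection to a lower rate
            topology.downgrade_bulk_if_needed(conn)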

3. Prototype, Experimental and Numerical Results

We developed an event-driven simulator to evaluate the benefits of the proposed architecture and control algorithm in a realistic network scenario. The reference metro topology is composed of 38 nodes, 59 links and 100 wavelengths per fiber [6]. Each node represents a metro DC with 100 Racks/Pods. Each Rack/Pod switch is equipped with one 10G and one 40G WDM tunable transceiver connected to the optical gateway. The EPS that aggregates the Racks/Pods has 25 10G grey transceivers connected to the optical gateway as well. We assume that Rack/Pod switches generate traffic flows with a lognormal inter-arrival distribution. We vary the mean of the lognormal distribution to mimic different traffic loads. On average, half of the traffic flows have low priority (Bulk) and half have high priority (Critical). The flows represent data transfers with sizes uniformly distributed between 1 and 500 GB. We compared the performance of our Converged Multi-Rate (MR) network with a Converged Single-Rate (SR) and a Conventional network, all with the same overall capacity. In the Converged SR, only 10G transceivers are used for both background and dynamic connections, while the Conventional network relies only on background connections [6].
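A minimal sketch of such a flow generator, under the stated assumptions (lognormal inter-arrival times with an adjustable mean, 1–500 GB transfer sizes, an even Bulk/Critical split), could look as follows; the function and parameter names are illustrative and the actual simulator is not published.

import math
import random

def generate_flows(n_flows, mean_interarrival_s, sigma=1.0, seed=0):
    """Yield (arrival_time_s, size_GB, priority) tuples for one Rack/Pod switch."""
    rng = random.Random(seed)
    # choose mu so the lognormal has the requested mean inter-arrival time
    mu = math.log(mean_interarrival_s) - sigma ** 2 / 2
    t = 0.0
    for _ in range(n_flows):
        t += rng.lognormvariate(mu, sigma)
        size_gb = rng.uniform(1, 500)                         # 1-500 GB data transfers
        priority = "Critical" if rng.random() < 0.5 else "Bulk"
        yield t, size_gb, priority

Decreasing mean_interarrival_s raises the offered load per DC, which is one way the different load points in Fig. 2(b) and 2(c) could be swept.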

Fig. 2(b) shows the average time required to complete a data transfer as a function of the load. It can be observed that the proposed Converged MR provides on average 2.5× faster Critical transfers and 2× faster Bulk transfers with respect to the Converged SR. This is due to the use of multi-rate transmission and to effective QoS management. In addition, the Converged MR provides 5× faster Critical and Bulk transfers compared to the Conventional network. This is due to the combined use of multi-rate transmission and dynamic connections. Fig. 2(c) shows the average number of wavelengths required to carry different loads. The Converged MR requires at least 20% fewer wavelengths than the Converged SR and 25% fewer wavelengths than the Conventional network, i.e., it offers more efficient resource usage.

We built a 3-node DC network prototype, shown in Fig. 2(d), which is an experimental implementation of the architecture shown in Fig. 1(a). DC1 and DC2 are emulated with 4 ToRs, each connected to one server. The ToR switches are implemented using Pica8 OpenFlow switches with 10 Gbps server ports and 10 Gbps and 40 Gbps uplinks. They are aggregated with a 10G EPS switch for intra-DC connectivity. Two optical gateways are implemented using Calient and Polatis OSS, Nistica WSS, and DWDM Mux/Demux. From each ToR, one 10G and one 40G transceiver, and from the EPS two 10G transceivers, are connected to the optical gateway. The 10G optical transceivers are DWDM SFP+ with a 24 dB power budget, while for 40G, due to limitations in single-wavelength 40G DWDM transceivers, we used 4×10G QSFP+ with an 18 dB power budget. DC3 is implemented only in the control plane, and the distances between the DCs are 5 to 25 km. We assigned the 10.1.x subnet to DC1 and the 10.2.x subnet to DC2. The controller server is connected to the ToR switches via a 1 Gbps campus network. Once the network is initialized, DC1 and DC2 are connected through one 10 Gbps background connection with permanent flow rules on all electronic switches.



Fig. 2. (a) Network control algorithm, (b) Average transfer times, (c) Average network resource usage, (d) Experimental setup, (e) Autonomous background link establishment between DC1 and DC2 after link saturation by low-SD traffic, (f) Autonomous dynamic Rack-to-Rack link establishment between DC1 and DC2 after an increase in only Rack 1 traffic.

The control plane receives the flow counters from the electronic switches every 2 s, and the counters are averaged over every 4 measurements.

In the first set of experiments, we evaluated automated bandwidth steering on the background connections. Racks 1–4 between DC1 and DC2 transmit data with bit rates of 0.8–1.2 Gbps, for a total of 4 Gbps on the background connection. At 25 s, the throughput increases to 4 Gbps on all four Racks, 16 Gbps overall. The background link saturates at 30 s at the total link capacity of 10 Gbps. At this point, the monitoring module of the controller detects the link saturation (>9 Gbps) with an SD of ≤1. In this case, a new background link is established (32 s) and half of the traffic is randomly moved to the new background connection (Racks 1 and 3). Each background connection now carries 8 Gbps of traffic. Fig. 2(e) shows the throughput as a function of time. Next, we demonstrate autonomous bandwidth adjustment for a dynamic Rack-to-Rack connection; Fig. 2(f) shows the results. Racks 1, 2 and 3 between DC1 and DC2 transmit data with rates of 0.8, 1, and 1.2 Gbps, respectively. At 23 s, Rack 1 requires more bandwidth (18 Gbps of traffic) and starts saturating the background connection (30 s). At this point, the controller has measured high throughput (>9 Gbps) on the background connection, and since the SD of the 3 traffic flows is larger than 1, a dedicated Rack-to-Rack connection is established for Rack 1 of each DC (32 s). At this point, a dedicated dynamic 40 Gbps link carries the Rack 1 data, while the 10 Gbps background link carries Racks 2 and 3.
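As a quick numerical check of the two triggers, the SD rule separates the two experiments cleanly; the per-rack rates below are the offered loads quoted above (illustrative inputs, not the controller's actual counter samples).

from statistics import stdev

exp1 = [4.0, 4.0, 4.0, 4.0]   # Gb/s per rack after the increase at 25 s (experiment 1)
exp2 = [18.0, 1.0, 1.2]       # Gb/s per rack after Rack 1 ramps up (experiment 2)

for label, rates in [("experiment 1", exp1), ("experiment 2", exp2)]:
    sd = stdev(rates)
    action = "add background link" if sd <= 1 else "add dynamic Rack-to-Rack link"
    print(f"{label}: offered {sum(rates):.1f} Gb/s, SD = {sd:.2f} -> {action}")
# experiment 1: offered 16.0 Gb/s, SD = 0.00 -> add background link
# experiment 2: offered 20.2 Gb/s, SD = 9.76 -> add dynamic Rack-to-Rack link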

4. Conclusion

We proposed a self-adaptive, multi-rate optical network architecture and provisioning strategy for geographically distributed metro DCs. Based on the traffic characteristics and the requested QoS, the SDN control plane detects, grooms and autonomously provisions the bandwidth resources across the network. Simulation results in a realistic scenario show 2.5–5× shorter transmission times and 20–25% lower wavelength usage compared with the converged single-rate and conventional networks. The architecture and control plane were experimentally validated on a prototype.

Acknowledgment
This work was supported in part by CIAN NSF ERC (EEC-0812072), NSF NeTS (CNS-1423105), DoE ASCR Turbo Project (DE-SC0015867) and the Swedish Research Council (VR). We would also like to thank AT&T, Calient and Polatis for generous donations to our testbed.

References
1. Cisco White Paper, "Cisco Visual Networking Index: Forecast and Methodology, 2014-2019 White Paper," Aug. 2015.
2. S. Yan et al., "Archon: A Function Programmable Optical Interconnect Architecture for Transparent Intra and Inter Data Center... ," JLT 2015.
3. G. Chen et al., "First Demonstration of Holistically-Organized Metro-Embedded Cloud Platform with All-Optical... ," OECC 2015.
4. R. Doverspike et al., "Using SDN Technology to Enable Cost-Effective Bandwidth-on-Demand for Cloud Services," JOCN 2015.
5. D. Adami et al., "Cloud and Network Service Orchestration in Software Defined Data Centers," SPECTS 2015.
6. M. Fiorani et al., "Flexible Architecture & Control Strategy for Metro-Scale Networking of Geographically Distributed DCs," ECOC 2016.
7. P. Samadi et al., "Software-Defined Optical Network for Metro-Scale Geographically Distributed Data Centers," OE 2016.