understanding the performance guarantee of physical
TRANSCRIPT
42
Understanding the Performance Guarantee of PhysicalTopology Design for Optical Circuit Switched Data Centers
SHIZHEN ZHAO, PEIRUI CAO, and XINBING WANG, Shanghai Jiao Tong University, China
As a first step of designing Optical-circuit-switched Data Centers (ODC), physical topology design is critical
as it determines the scalability and the performance limit of the entire ODC. However, prior works on ODC
have not yet paid much attention to physical topology design, and the adopted physical topologies either scale
poorly, or lack performance guarantee.
We offer a mathematical foundation for the design and performance analysis of ODC physical topologies in
this paper. We introduce a new performance metric ðœ (G) to evaluate the gap between a physical topology
G and the ideal physical topology. We develop a coupling technique that bypasses a significant amount of
computational complexity of calculating ðœ (G). Using ðœ (G) and the coupling technique, we study four physicaltopologies that are representative of those in literature, analyze their scalabilities and prove their performance
guarantees. Our analysis may provide new guidance for network operators to design better physical topologies
for their ODCs.
CCS Concepts: â¢NetworksâNetwork performance modeling;Network performance analysis;Datacenter networks.
Additional Key Words and Phrases: Optical Circuit Switch, Data Center, Physical Topology, Logical Topology
ACM Reference Format:Shizhen Zhao, Peirui Cao, and Xinbing Wang. 2021. Understanding the Performance Guarantee of Physical
Topology Design for Optical Circuit Switched Data Centers. Proc. ACMMeas. Anal. Comput. Syst. 5, 3, Article 42(December 2021), 24 pages. https://doi.org/10.1145/3491054
1 INTRODUCTIONAs data center traffic doubles every year [25], building Clos topologies [1, 12, 25] for data centers
using electrical switches is becoming more and more expensive and power prohibitive [2]. In order
to meet the growing demand at reduced energy cost, building ODCs is becoming a promising
alternative for future data centers. An optical circuit switch, e.g. Calient [4], could offer at least
hundreds of times higher switching capacity than an electrical switch, while its energy cost is
hundreds of times lower (less than 45 Watts for an optical circuit switch with 320 Tx/Rx pairs).
While the eventual goal of ODC design is the full-optical data center, due to technological
immaturity, existing designs of ODC have mostly adopted a hybrid design that involves both
electrical switching and optical switching. The design of an ODC typically covers the following
four aspects:
(1) Physical topology design: Determine how to interconnect hosts, electrical packet switches
(EPS) and optical circuit switches (OCS).
(2) OCS control: Determine the configurations of all the OCS nodes in an ODC.
Shizhen Zhao is the corresponding author.
Authorsâ address: Shizhen Zhao; Peirui Cao; Xinbing Wang, Shanghai Jiao Tong University, China.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses,
contact the owner/author(s).
© 2022 Copyright held by the owner/author(s).
2476-1249/2021/12-ART42
https://doi.org/10.1145/3491054
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
This work is licensed under a Creative Commons Attribution International 4.0 License.
© 2021 Copyright held by the owner/author(s). 2476-1249/2021/12-ART42. https://doi.org/10.1145/3491054
42:2 Shizhen Zhao, Peirui Cao, and Xinbing Wang
(3) EPS control: Set up the routing strategies for all the EPS nodes. Since OCS reconfiguration
may affect the connected EPS nodes, the queuing and buffer management policies of the EPS
nodes may also require careful redesign.
(4) Host control: Modify the host protocol stacks at different layers in accordance with the
OCS/EPS control strategies to attain better end-to-end performance.
Existing works on ODC, e.g., Helios [10], c-Through [26], Mordia [24], REACToR [19], OSA [7],
Solstice [20], MegaSwitch [8], FireFly [13], ProjectToR [11], RotorNet [22], Opera [21], etc., have
primarily focused on the OCS/EPS/Host control aspects, but their physical topology designs may not
support large-scale data centers (see Section 2 for more details). Only until recently, three scalable
physical topology designs are adopted by Flexfly [27], Sirius [2] and 3D-Hyper-FleX-LION [18].
However, the performance guarantee of these physical topologies remain unclear. To the best of
our knowledge, there is a lack of rigorous performance metrics that evaluate the âgoodnessâ of
a physical topology design, and there is no systematic study on how to design a good physical
topology for ODC.
In this paper, we focus on the physical topology design for ODC, and offer a rigorous performance
evaluation for different physical topologies. One challenge of this work is to design a performance
metric for physical topologies. While much of the literature on ODC has adopted network through-
put, flow completion time, max link utilization, etc., to evaluate the end-to-end performance of an
ODC, these metrics may not accurately reflect the âgoodnessâ of a physical topology. The reason is
that, both the physical topology design and the network control policies may affect these metrics,
and thus it is hard to evaluate the exact contribution of a physical topology. To eliminate the impact
of network control policies, we introduce ` (G, ð ), which is defined as the optimal throughput
across all possible network control policies for a given physical topology G and a given data center
demand pattern ð (see §3.2). Based on ` (G, ð ), we then define âðœ-optimalityâ for a given physical
topology G, i.e., G is said to be ðœ (G)-optimal as long as ` (G, ð ) ⥠ðœ (G)` (Gâ², ð ) for any physical
topology Gâ²and any demand pattern ð (see §3.3). A physical topology G is said optimal if ðœ (G) = 1.
We first study how to design the optimal physical topologies with ðœ (G) = 1. A naive approach
requires calculating ` (G, ð ) for different ð âs and optimizing ` (G, ð ) among different physical
topologies. Unfortunately, this approach is computationally prohibitive. First, the formulation of
` (G, ð ) is essentially an integer programming problem (see (5) in §3.2), which is NP-hard in general.
Second, there are an uncountable number of different traffic patterns, and it is impossible to calculate
` (G, ð ) one by one. To circumvent the above difficulty, we introduce a coupling technique (see
Lemma 1 in §3.3) to compare two physical topologies. The intuition is that, if one physical topology
can âimitateâ all the control strategies of another physical topology, then the first physical topology
will perform no worse than the second one. Using this coupling technique, we prove that the ideal
physical topology and the uniform bipartite physical topology are both optimal. While these two
physical topologies have been adopted by many existing ODC designs [5, 7, 10, 19â22, 24, 26] to
build prototypes, they scale poorly to support large data centers.
We then study how to design scalable physical topologies for large ODCs. Along this direction,
we first prove a negative result that no physical topology is optimal when the number of EPS nodes
exceeds the number of ports of an OCS node. Hence, we have to seek for sub-optimal physical
topologies, and calculate ðœ (G) as a performance guarantee of these physical topologies. We could
still use the coupling technique to calculate ðœ (G) for these physical topologies. However, due
to the lack of direct connectivity between certain node pairs, it may not always be possible for
one physical topology to imitate the control decisions of another physical topology. To overcome
this challenge, we enhance the above coupling technique using overlay topologies. If one physicaltopology allows constructing an overlay topology to âimitateâ all the control strategies of another
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers 42:3
physical topology, then the first physical topology will perform no worse than the second one (see
Lemma 2 in §3.3). Using the enhanced coupling technique, we successfully calculated ðœ (G) forthree representative physical topologies.
The contributions of this work are summarized below:
(1) We introduce a new metric ðœ (G) to evaluate the goodness of a physical topology G.
(2) We develop a coupling technique to calculate ðœ (G). This technique circumvents the compu-
tational complexity of calculating ` (G, ð ) for any traffic pattern ð .
(3) We design an optimal physical topology (with rigorous proof) for small-scale data centers.
(4) We prove that no physical topology is optimal for large-scale data centers.
(5) Motivated by Flexfly [27], 3D-Hyper-FleX-LION [18] and Sirius [2], we design three physical
topologies that can scale to large ODCs, and prove their performance guarantees ðœ (G).
2 RELATEDWORKExisting works on optical circuit switched data centers have primarily focused on the overall
network control. The adopted physical topology designs either scale poorly, or lack performance
guarantee.
c-Through [26], Mordia [24], REACToR [19], OSA [7], Solstice [20] simply use one OCS to connect
all the Top-of-Rack (ToR) switches. This physical topology design could only support tiny data
centers with Î(100) number of servers. To improve scalability, MegaSwitch [8] uses fiber rings
for interconnection (a similar idea was also discussed in [24]). This physical topology architecture
could support Î(1000) number of servers, but still does not work for large-scale data centers with
over 100k servers and thousands of ToRs.
Using free-space optics, FireFly [13] and ProjectToR [11] enable optical interconnection for thou-
sands of ToRs. However, the free-space optical switching technology faces tremendous deployment
complexity, as many environmental factors (e.g., vibration, dust, and humidity) may hinder the
performance of the free-space optical links.
Helios [10], RotorNet [22], Opera [21], TROD [5] create a uniform bipartite graph between all the
PoDs/ToRs and all the OCSs. This physical topology turns out to be optimal based on our analysis
in §4. While this design scales well for the PoD-level interconnect, it scales poorly for the ToR-level
interconnect. A large-scale data center may require hundreds of OCSs to connect over 100k servers.
Unfortunately, a ToR switch only has tens of uplinks and thus it is impossible to establish a link
between every pair of ToR and OCS. (In contrast, a PoD could have hundreds of uplinks.)
Recently, Flexfly [27], Sirius [2] and 3D-Hyper-FleX-LION [18] offered three physical topologies
that can potentially scale to large-scale data centers. However, no performance guarantee is provided
for any of the three physical topologies.
3 MATHEMATICAL MODEL3.1 Basic DefinitionsWe study a network with ð electrical-packet-switching (EPS) nodes S = {ð1, ð2, ..., ðð } and ðŸoptical-circuit-switching (OCS) nodes O = {ð1,ð2, ...,ððŸ }. Each EPS node has ð¿ ports, each of
which has a transceiver. Each OCS node has ð ingress ports and ð egress ports. Each EPS port can
either connect to another EPS port, or connect to a pair of OCS ports. Note that when we connect
an EPS port to OCS ports, we need to separate the transmitter and the receiver of the EPS port,
have the transmitter connected to an OCS ingress port, and have the receiver connected to an OCS
egress port. Since an OCS port capacity is typically much higher (100Ã and more) than an EPS port
capacity, the capacity of each network link is dominated by the EPS port capacity ðµ.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:4 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Remark on OCS-OCS connections: The optical transceivers commonly used in data centers,
e.g., 100G QSFP28 CWDM4, typically have a link margin of 5dB and a maximum transmission
distance of 2km. Note that signals traversing an OCS node could experience 3dB optical loss. To
enable OCS-OCS connections, one may need to replace the 2km optical transceivers by 10km
optical transceivers, e.g, QSFP28 LR4. However, the QSFP28 LR4 transceiver has a much higher
consumption and laser cost than the QSFP28 CWDM4 transceiver. Due to the above cost reasons,
we do not allow OCS-OCS connections in this paper. Nevertheless, the methodologies introduced
in this paper can be easily generalized to incorporate OCS-OCS connections.
Traffic Matrix: Let ðð ð be the relative traffic demand between the EPS node ðð and the EPS node
ð ð . Then, the network demand can be modeled using a traffic matrix ð = [ðð ð , ð = 1, 2, ..., ð , ð =
1, 2, ..., ð ] .Physical Topology: We use G = (V, E), where V = S ⪠O, to denote the physical topology
among all the EPS nodes and all the OCS nodes. As shown in Fig. 1(a), the edges of G consist of
two parts E = E1 ⪠E2, where E1 is the set of bidirectional links that interconnect two EPS nodes
and E2 is the set of unidirectional links that interconnect one EPS node and one OCS node. The
port count constraints of the EPS nodes and the OCS nodes are reflected by the degrees of the
nodes in G. Specifically, the in-degree and out-degree of each EPS node are no more than ð¿ and the
in-degree and out-degree of each OCS node are no more than ð . Note that there could be multiple
physical links connecting two adjacent nodes.
OCS Configuration: Each OCS node has ð ingress ports and ð egress ports. As shown in Fig. 1(b),
an OCS configuration is essentially a 1 â 1 mapping from the ð ingress ports to the ð egress
ports. In this paper, we denote by Î ð , ð = 1, 2, ..., ðŸ as the OCS configuration of ðð , and let
Î = {Î 1,Î 2, ...,Î ðŸ }. An OCS configuration helps create multiple logical links between EPS node
pairs. For example, if an ingress port ð is mapped to an egress port ð in an OCS node, let ðð be
the EPS node that connects to ð and ð ð be the EPS node that connects to ð, then a unidirectional
logical link is created between ðð and ð ð . Since OCS nodes do not perform packet decoding, a packet
traversing through a logical link will not be aware of the underlying OCS node.
Logical Topology: Given a physical topology G and an OCS configuration Î , a logical topologyGð = Gð (G,Î ) = (Vð , Eð ) is formed. In the logical topology Gð (also see Fig. 1(c)), the node set
Vð = S only contains the EPS nodes; the link set Eð = E1 ⪠E3, where E1 is the set of physical
links among all the EPS nodes and E3 is the set of logical links formed by the OCS configuration Î .We use a non-negative integer ð¥ð ð (Gð ) to denote the number of links from ðð to ð ð in Gð . Note thatð¥ð ð (Gð ) could be larger than one.
3.2 Network Control for Optimal ThroughputNetwork performance are usually evaluated from different dimensions, including throughput,
latency, flow completion time [9], deployment complexity [28], expansion complexity [28, 29], etc.
We focus on throughput in this paper due to the following reasons. First, compared to latency and
flow completion time, throughput can be mathematically formulated and thus is easy to evaluate
even at the planning stage. Second, having a higher throughput could also help reduce latency, flow
completion time, expansion complexity (the number of expansion stages can be reduced [28]), etc.
Given a traffic pattern ð and a physical topology G = (V, E), network throughput can be affectedby various control policies, including route selection, load balancing, congestion control, etc. For
ODC, the OCS configurations can be also a determining factor for throughput. We consider an
âoracleâ that could choose the best control policies for every pair of (G, ð ). Specifically, this oracleperforms topology engineering (ToE) and traffic engineering (TE) to attain the optimal network
throughput, denoted by ` (G, ð ). Admittedly, this oracle is hard to realize in practice. But on the
other hand, this oracle allows us to eliminate the impact of different control policies on ` (G, ð ),
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers 42:5
âŠ
âŠ
⊠⊠âŠ
(a) Physical topology.
âŠ
âŠ
⊠⊠âŠ
(b) OCS configuration.
âŠ
âŠ
⊠⊠âŠ
(c) Logical topology.
Fig. 1. Physical topology and logical topology. When an OCS configuration acts on a physical topology, thelogical topology will be changed. In (a) physical topology, red arrowlines represent the set of bidirectionallinks that interconnect two EPS nodes and green arrowlines represent the set of unidirectional links thatinterconnect one EPS node and one OCS node. In (c) logical topology, blue arrowlines represent the setof logical links formed by the OCS configuration. Note that each EPS port can either connect to one redbidirectional arrowline, or connect to two green unidirectional arrowlines with opposite directions.
and thus ` (G, ð ) can be used to measure the goodness of the physical topology G under the traffic
pattern ð . Next, we will describe how this oracle calculates the optimal throughput ` (G, ð ).Topology Engineering (ToE): ToE controls the logical topologies used to serve different traffic
patterns. For each traffic pattern ð , ToE could either generate one logical topology, or generate
multiple logical topologies and perform time sharing among these logical topologies. Letð ⥠1 be
the number of logical topologies generated by ToE. Each logical topology, denoted by G (ð)ð
,ð =
1, 2, ..., ð, corresponds to a set of ðŸ OCS configurations Î (ð) = {Î (ð)1,Î (ð)
2, ...,Î (ð)
ðŸ}, and lasts for
Î(ð)amount of time. Note that reconfiguring OCSs could incur a reconfiguration latency ð¿ , which
ranges from hundreds of nanoseconds to tens of milliseconds, depending on the optical switching
technology being used. Hence, the average bandwidth allocated to (ðð¢, ðð£) can be computed as
ðµð¢ð£ =
(ðâð=1
ð¥ð¢ð£ (G (ð)ð
)Î(ð)ðµ
) / (ðð¿ +
ðâð=1
Î(ð)
). (1)
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:6 Shizhen Zhao, Peirui Cao, and Xinbing Wang
In addition, traffic patterns may change in an ODC. To model this fact, we impose the following
constraint on Î(ð):
ðð¿ +ðâð=1
Î(ð) †Î, (2)
where Î is the duration of the traffic pattern ð .
Traffic Engineering (TE): Given a ToE solution, network operators can then perform traffic
engineering to maximize network throughput. Here we adopt the max-flow formulation. For every
flow from ðð to ð ð , let ðð ð (ð¢, ð£) be the amount of traffic allocated to (ðð¢, ðð£). The traffic allocation
ðð ð (ð¢, ð£) must satisfy the following constraints:
(1) Flow conservation constraints: The total ingress traffic equals to the total egress traffic at
every EPS node, i.e.,âð€1â ð£
ðð ð (ð€1, ð£) =âð€2â ð£
ðð ð (ð£,ð€2), â ð£ â ð, ð,` ðð ð +
âð€1â ð
ðð ð (ð€1, ð) =âð€2â ð
ðð ð (ð,ð€2),âð€1â ð
ðð€1 ð (ð¢, ð) = `ðð ð +âð€2â ð
ðð ð ( ð,ð€2),(3)
where ` is a scaling factor of the network traffic pattern ð .
(2) Network capacity constraints: For any EPS node pair (ðð¢, ðð£), the total amount of traffic
allocated to (ðð¢, ðð£) must be no larger than ðµð¢ð£ , i.e.,
ðâð=1
ðâð=1
ðð ð (ð¢, ð£) †ðµð¢ð£ . (4)
Defining ` (G, ð ): Note that we introduced a scaling factor ` in (3). We could optimize ` by solving
the following optimization problem:
max `
s.t. G (ð)ð
= Gð (G,Î (ð) ),âð = 1, 2, ..., ð,
`, ðµð¢ð£, ðð ð (ð¢, ð£),G (ð)ð
,Î(ð)satisfy (1)(2)(3)(4).
(5)
Then, ` (G, ð ) is defined as the optimal objective value of the above optimization problem.
Remark on the complexity of calculating ` (G, ð ): Even though we have defined ` (G, ð ) rig-orously, calculating ` (G, ð ) turns out to be an NP-Complete problem (see Appendix A for the
detailed proof). We could use Gurobi [23], the worldâs fastest linear programming solver, to solve (5).
Unfortunately, (5) contains Î(ð 2ðŸð) number of integer variables and Î(ð 4) fractional variables.As we will show in Section 7.2, Gurobi may fail to solve (5) even when there are only 9 EPS nodes.
Remark on TE:We have adopted an âedge formulationâ for TE in (3) and (4). This edge formulation
requires Î(ð 4) routing variables, one for every link & flow pair. Another widely used formulation
for TE is the âpath formulationâ (see SWAN [14]). The path formulation pre-computes a set of paths
for each flow, and then defines a traffic-spitting ratio variable for every flow and every path. This
formulation generally requires fewer number of decision variables. However, its logical topology
must be determined first in order to compute paths. In contrast, calculating ` (G, ð ) requires a jointoptimization over topology and routing, and thus cannot adopt the âpath formulationâ.
3.3 Objective of Physical Topology DesignDefinition 1. A physical topology G is said to be ðœ-optimal if for every traffic pattern ð ,
` (G, ð ) ⥠ðœ` (Gâ², ð )holds for any possible physical topology Gâ². ðœ is called the performance ratio of G.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers 42:7
Clearly, every physical topology G corresponds to a ðœ (G) â [0, 1]. ðœ (G) can be viewed as a
performance guarantee of the physical topology G, and the larger the better. A physical topology
is said to be optimal if its corresponding ðœ = 1.
Unfortunately, directly calculating ðœ (G) based on (5) and Definition 1 is computationally in-
tractable for large scale ODCs. First, calculating ðœ (G, ð ) is NP-Complete. Second, there are un-
countable number of traffic patterns ð , and it is impossible to calculate ` (G, ð ) one by one. Due
to the above difficulties, we have to resort to a theoretical approach. The key to our theoretical
approach is a coupling technique detailed in the following two lemmas. This coupling technique
helps us circumvent the algorithmic complexity of calculating ðœ (G, ð ) for all possible ð âs.
Lemma 1. Given two physical topologies G and Gâ². If any logical topology Gð formed on top of Gcan be also formed on top of Gâ², then for any traffic pattern ð , ` (Gâ², ð ) ⥠` (G, ð ).
Proof. Let (Gâ(ð)ð
, ð âð ð (ð¢, ð£)) be the optimal topology+routing solution that attains the optimal
throughput ` (G, ð ). Since the logical topologies Gâ(ð)ð
are also realizable on Gâ², (Gâ(ð)
ð, ð âð ð (ð¢, ð£))
must also satisfy the constraints in (5) when calculating ` (Gâ², ð ). Therefore, ` (Gâ², ð ) ⥠` (G, ð ). â¡
Lemma 1 can be also generalized to the concept of overlay topology, which is defined below:
Definition 2. Given a logical topology Gð . A topology Gðð can be formed as an overlay topologyof Gð if we can reserve a number of paths for every source and destination pairs ðð , ð ð , such that thefollowing two conditions are met:
(1) From the overlay topology Gðð aspect, for every pair of EPS nodes ðð , ð ð , the total reserved capacityamong all paths is equal to ð¥ð ð (Gðð )ðµ.
(2) From the underlay topology Gð aspect, for any EPS node pair (ðð , ð ð ), the total capacity offeredfor reservation does not exceed ð¥ð ð (Gð )ðµ.
Lemma 2. Given two physical topologies G and Gâ². If for any logical topology Gð formed on topof G, there is a logical topology Gâ²
ðformed on top of Gâ², such that Gð can be formed as an overlay
topology of Gâ²ð, then for any traffic pattern ð , ` (Gâ², ð ) ⥠` (G, ð ).
Proof. See Appendix B. â¡
3.3.1 An Ideal Physical Topology. Fix the number of links ð¿ and the link capacity ðµ of an EPS node.
Assume that each OCS node has the infinite number of ports. Then, we could connect all the EPS
nodes to a single OCS node and obtain an ideal physical topology Gideal
ð¿,ðµ. It is easy to verify that
any logical topology formed on top of any physical topology is realizable on top of Gideal
ð¿,ðµ. Then
according to Lemma 1, Gideal
ð¿,ðµis optimal, as stated below:
Theorem 3. For any fixed number of links ð¿ and fixed link capacity ðµ, Gidealð¿,ðµ
is optimal.
Based on the ideal physical topology Gideal
ð¿,ðµ, we can then define the ideal throughput `ideal
ð¿,ðµ(ð ) for
an arbitrary traffic pattern ð as
`idealð¿,ðµ (ð ) = ` (Gideal
ð¿,ðµ , ð ).
Clearly, `idealð¿,ðµ
(ð ) scales linearly with respect to ðµ. Further, as the number of links ð¿ of an EPS node
increases, the ideal throughput `idealð¿,ðµ
(ð ) would never decrease.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:8 Shizhen Zhao, Peirui Cao, and Xinbing Wang
4 OPTIMAL PHYSICAL TOPOLOGY FOR SMALL DATA CENTERS WITH ð †ð
Building an ideal physical topology requires ð ⥠ðð¿. This only applies to tiny data centers with
at most tens of EPS nodes1. In this section, we study a less restrictive case where ð †ð , which
corresponds to hundreds of EPS nodes.
Before proposing our physical topology design, we first introduce the following lemma.
Lemma 4. Given an ð à ð non-negative integer matrix x = {ð¥ð, ð }, for any integer ð» ⥠1 andmutually disjoint sets A1,A2, ...,Að¶ satisfying âªð¶ð=1
Að = {1, 2, ..., ð } and Að1â© Að2
= â ,âð1 â ð2,there must exist ð» non-negative integer matrices x(1) , ..., x(H) satisfying(1) x = x(1) + · · · + x(H) ;
(2) 0 †âðð=1ð¥(â)ð, ð
â€ââð
ð=1ð¥ð,ð
ð»
â, for any ð = 1, ..., ð and any â = 1, 2, ..., ð» ;
(3) 0 †âðâAð
âðð=1ð¥(â)ð, ð
â€ââ
ðâAð
âðð=1ð¥ð,ð
ð»
â, for any ð = 1, ...,ð¶ and any â = 1, 2, ..., ð» ;
(4) 0 †âðð=1ð¥(â)ð, ð
â€ââð
ð=1ð¥ð,ð
ð»
â, for any ð = 1, ..., ð and any â = 1, 2, ..., ð» ;
(5) 0 †âðð=1
âð âAð
ð¥(â)ð, ð
â€ââð
ð=1
âðâAð
ð¥ð,ð
ð»
â, for any ð = 1, ...,ð¶ and any â = 1, 2, ..., ð» .
Lemma 4 can be proved by transforming it into a sequence of max-flow problems. It is actually
a direct consequence of Theorem 3 in [29] by setting A = B = {A1, ...,Að¶ , {1}, ..., {ð }}. SeeAppendix A in [29] for the detailed proof.
4.1 Physical Topology DesignMotivated by Lemma 4, we set the number of OCS nodes as ðŸ = ð¿ and evenly distribute the ð¿ links
of each EPS node across all the OCS nodes. In other words, the physical topology G becomes a
uniform bipartite graph, with exactly one bidirectional physical link between every pair of EPS
node and OCS node. (More specifically, the transmitter of the EPS port is connected to an OCS
ingress port, and the receiver of the EPS port is connected to an OCS egress port.) Note that ð †ð
ensures that the number of ports of each OCS node is not exhausted. We denote such a physical
topology by Guniform, as shown in Fig. 2. We can prove that Guniform is optimal.
âŠ
âŠ
âŠ
⊠⊠âŠ
⊠⊠⊠âŠ
âŠ
Fig. 2. Guniform topology.
Theorem 5. If ð †ð , then Guniform is optimal.
Proof. We only need to prove that any logical topology Gð is realizable on top of the physical
topology Guniform.
1A practical OCS node has hundreds of ports and a practical ToR switches has tens of ports.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers 42:9
Since each EPS node has ð¿ ports, we must have{ âðð=1ð¥ð ð (Gð ) †ð¿, â ð = 1, ..., ð ,âð
ð=1ð¥ð ð (Gð ) †ð¿, â ð = 1, ..., ð .
(6)
Then, according to Lemma 4, we can decompose {ð¥ð ð (Gð )} into ð» = ðŸ = ð¿ integer matrices
{ð¥ (â)ð ð
(Gð )}, â = 1, 2, ..., ð» satisfying the conditions in Lemma 4. Based on (6) and the conditions (2)
and (4) of Lemma 4, it is easy to check that any {ð¥ (â)ð ð
(Gð )}, â = 1, 2, ..., ð» , must satisfy{ âðð=1ð¥(â)ð ð
(Gð ) †1, â ð = 1, ..., ð ,âðð=1ð¥(â)ð ð
(Gð ) †1, â ð = 1, ..., ð .
Since ð¥(â)ð ð
(Gð )âs are non-negative integers, ð¥ (â)ð ð
(Gð ) is actually a permutation matrix. Note that
each EPS node has one bidirectional physical link connected to the â-th OCS node, it is easy to
verify that the permutation matrix ð¥(â)ð ð
(Gð ) is realizable on the â-th OCS node.
â¡
5 NEGATIVE RESULT ON PHYSICAL TOPOLOGY DESIGNHaving designed an optimal physical topology for the case ð †ð , we thus wonder if it is possible
to design an optimal physical topology for large-scale data centers with ð > ð . Unfortunately, the
following theorem demonstrates the impossibility of such a design.
Theorem 6. If ð > ð , then no physical topology is ðœ-optimal for ðœ > ð+ð â1
2ðâ2. In other words, for
any physical topology G, there exists a traffic pattern ð and a physical topology Gâ², such that
` (G, ð ) †ð + ð â 1
2ð â 2
` (Gâ², ð ).
Proof. We pick ð EPS nodes from the ð EPS nodes, and form a permutationZ = (ð1, ð2, ..., ðð )among the ð EPS nodes. In total, we obtain ð (ð â 1) · · · (ð â ð + 1) different permutations.
For each permutation Z = (ð1, ð2, ..., ðð ), we count the maximum number of unidirectional
logical links that can be formed for ðð1â ðð2
, ðð2â ðð3
, ..., ððð â ðð1under a physical topology
G, and denote this number by ð (G,Z).Step 1:We would like to prove that for any physical topology G, there must exist a permutation
Zâ, such that
ð (G,Zâ) †ð¿ð ð
ð â 1
.
Let ððâð be the maximum number of links that can be created from the EPS node ðð to the EPS
node ð ð . We computeâZð (G,Z) =
âZ
âð, ð :ðâ ð
1{ ð is after ð in Z}ððâð =âð, ð :ðâ ð
ððâð
âZ
1{ ð is after ð in Z} .
Here, â ð is after ð inZ = (ð1, ð2, ..., ðð )â means that either there exists ð = 1, 2, ..., ð â 1, such that
ðð = ð, ðð+1 = ð , or ðð = ð, ð1 = ð . In order to compute the number of permutations in which ð is
after ð , i.e.,â
Z 1{ ð is after ð in Z} ,
(1) we first find a permutation with ð â 2 elements excluding ð and ð , and the total number of
such permutations is (ð â 2) (ð â 3) · · · (ð â ð + 1);(2) we then insert ð and ð to the right places of the above permutation, and the number of
different inserting positions is ð . (For example, when ð = 4, ð ð â â, âð ðâ, â â ð ð and ð â âð arethe 4 generated permutations.)
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:10 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Hence, âZ
1{ ð is after ð in Z} = ð (ð â 2) (ð â 3) · · · (ð â ð + 1).
We then focus on
âð, ð :ðâ ð ððâð . We first fix ð and compute
âð :ðâ ð ððâð . Assume that the EPS node ðð
has at least one transmitter connected to the OCS node ðð . Let ð ð be the number of links between
the egress ports of the OCS node ðð and the receivers of all the EPS nodes excluding ðð . Then it is
easy to check that the contribution of the OCS node ðð toâð :ðâ ð ððâð is at most ð ð †ð . Since the
EPS node ðð can connect to at most ð¿ different OCS nodes, we must haveâð :ðâ ð
ððâð †ð¿ð , for any fixed ð .
Therefore, âð, ð :ðâ ð
ððâð =
ðâð=1
âð :ðâ ð
ððâð †ðð¿ð . (7)
Note that there are ð (ð â 1) · · · (ð â ð + 1) different permutations in total. According to the
Pigeonhole Principle, there must exist a permutation Zâsuch that
ð (G,Zâ) â€â
Z ð (G,Z)ð (ð â 1) · · · (ð â ð + 1) †ð¿ð
ð
ð â 1
.
Step 2:We construct a permutation traffic pattern ð â based on ð â = (ðâ1, ðâ
2, ..., ðâ
ð ), i.e.,
ð âð ð =
1, if ð = ðâð , ð = ð
âð+1, where ð = 1, 2, ..., ð â 1;
1, if ð = ðâð , ð = ðâ
1;
0, otherwise.
Clearly, we can create a physical topology G(ð â) by picking ð¿ OCS nodes and connecting these OCSnodes to the ð EPS nodes ððâ
1
, ððâ2
, ..., ððâð using a uniform bipartite graph. On top of this physical
topology, we can then create ð¿ identical cycles ðâ1â ðâ
2â ... â ðâ
ð â ðâ
1as the logical topology.
It is easy to check that this logical topology offers the highest throughput for the traffic matrix ð â
over the physical topology G(ð â), and ` (G(ð â), ð â) = ðµð¿.We then compute the throughput ` (G, ð â) for ð â over G. Consider the optimal ToE+TE strategy
that attains the throughput value ` (G, ð â). Then, within a time period of Î,
` (G, ð â)Îðâð=1
ðâð=1
ð âð ð = ` (G, ð â)Îð (8)
amount of traffic is delivered from sources to destinations. The delivered traffic can be grouped
into two classes:
(1) Class-1: traffic that is delivered via only one hop;
(2) Class-2: traffic that is delivered via at least two hops.
Class-1 traffic must occupy one of following types of logical links ððâ1
â ððâ2
, ..., ððâð â1
âððâ
ð , ððâ
ð â ððâ
1
. Since the total number of such links is at most ð (G,Zâ), we must have
Class-1 traffic †ð (G,Zâ)Îðµ. (9)
Every byte of the Class-2 traffic occupies at least two bytes of the network capacity. Hence,
2 à Class-2 traffic †(ð¿ð â ð (G,Zâ))Îðµ. (10)
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:11
Based on (8)(9)(10), we then obtain
` (G, ð â) †1
ð
(ðµð (G,Zâ) + ðµ
2
(ð¿ð â ð (G,Zâ)))=ðµð¿
2
+ ðµ
2ð ð (G,Zâ)
†ðµð¿
2
+ ðµð¿2
ð
ð â 1
=ð + ð â 1
2ð â 2
` (G(ð â), ð â).
This completes the proof. â¡
Discussion: Note that ð+ð â1
2ðâ2< 1 when ð > ð + 1. Then according to Theorem 6, no physical
topology is optimal. Readers may wonder if it is possible to construct an optimal physical topology
for the case where ð = ð + 1. The answer is no. When ð = ð + 1,ð+ð â1
2ðâ2= 1. This indicates that, if
there exists an optimal physical topology, the â=â must hold for the inequality (7). This â=â requires
that the 2ð OCS ports (ð ingress ports and ð egress ports) must connect to 2ð different EPS nodes.
This is impossible because there are only ð = ð + 1 EPS nodes.
6 SCALABLE PHYSICAL TOPOLOGY DESIGNS WITH PERFORMANCE GUARANTEEIn this section, we propose three scalable physical topology designs and analyze their performance
guarantees. The main results in this section are summarized in Table 1. Note that the maximum
scale of different physical topologies depends on the OCS node port count ð and the EPS node port
count ð¿. A commercially-available OCS node, e.g., the 3D-MEMS based OCS [3] and the Arrayed
Wavelength Grating Routers [2], could have hundreds of ports. An off-the-shelf ToR EPS node, e.g.,
the CloudEngine 8800 [15], could have 32 ⌠64 ports. Hence, the physical topologes studied in this
section could support large-scale ODCs with thousands or even tens of thousands of EPS nodes.
Table 1. The scalability and the performance ratios of different physical topologies.
Physical Topology Maximum Number of EPS Nodes Performance Ratio ðœ
Guniform ð 1
Geps-group ð âð¿/3â minð
{`idealâð¿/3â,ðµ (ð )/`
ideal
ð¿,ðµ(ð )
}Geps-mesh ð 2
minð
{`idealâð¿/3â,ðµ (ð )/`
ideal
ð¿,ðµ(ð )
}Gocs-mesh âð /2â âð¿/2â minð
{`idealâð¿/2â,ðµ (ð )/`
ideal
ð¿,ðµ(ð )
}6.1 EPS-Group based Physical TopologyPhysical topology of Geps-group: Let ð = âð¿/3â. We create ð¶ EPS node groups, each of which
contains ð EPS nodes and is denoted by Að . Within each EPS node group, there are two links
between every pair of EPS nodes. Each EPS node has 2(ð â 1) intra-group links, and the total
number of links in each EPS node group is ð (ð â 1). Each EPS node also has ð links connected
to the OCS nodes to establish inter-group connections. It is easy to verify that the degree of each
EPS node is 3ð â 2 †ð¿. We use ðŸ = ð2OCS nodes and divide these OCS nodes into ð equal-sized
groups. We number the EPS nodes in a group from 1 to ð . The ð-th EPS node in a group has one
link connected to every OCS node in the ð-th OCS node group, where ð = 1, 2, ..., ð . Since the
number of ports of an OCS node is ð , the number of EPS node groups of Geps-group must satisfy
𶠆ð . Thus, the total number of EPS nodes in Geps-group satisfies ð = ð¶ð †ð âð¿/3â .
Theorem 7. For any traffic pattern ð , the optimal throughput under Geps-group satisfies
` (Geps-group, ð ) ⥠`idealâð¿/3â,ðµ (ð ).
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:12 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Fig. 3. Toy example of EPS-Group topology (ð = 2, ð = 6).
Proof. Let Gideal
âð¿/3â,ðµ be the ideal physical topology. According to Lemma 2, we need to show that
any logical topology Gð formed over Gideal
âð¿/3â,ðµ can be realized as a overlay topology over Geps-group.
Since each EPS node in Gideal
âð¿/3â,ðµ has ð = âð¿/3â number of uplinks, the following constraints must
be satisfied { âðð=1ð¥ð ð (Gð ) †âð¿/3â = ð, â ð = 1, ..., ð ,âð
ð=1ð¥ð ð (Gð ) †âð¿/3â = ð, â ð = 1, ..., ð .
(11)
Note that it may not always be possible to establish a logical link between two EPS nodes in
Geps-group, especially when the two EPS nodes are in different EPS node groups and these two
nodes have different intra-group indices. Hence, we are interested in constructing Gð as an overlay
topology on top of Geps-group. For every pair of EPS nodes ðð and ð ð in Geps-group, we would like to
create ð¥ð ð (Gð ) number of virtual logical links in between. Here, a virtual logical links is essentiallya path between two EPS nodes, with each path segment occupying exactly one logical link.
For every EPS node pair (ðð , ð ð ), we classify this pairâs virtual logical links into ð = âð¿/3â types:⢠If ðð and ð ð are in the same EPS node group, the type-ð virtual logical links are two-hop
paths with intermediate node being the ð-th EPS node in this EPS node group. It is possible
that the ð-th EPS node is exactly ðð or ð ð . In this case, this âtwo-hop pathâ degenerates to a
single-hop path.
⢠If ðð and ð ð belong to different EPS node groups, the type-ð virtual logical links are three-hop
paths with intermediate nodes being the ð-th EPS node in the EPS node group of ðð and the
ð-th EPS node in the EPS node group of ð ð . It is possible that the ð-th EPS node in the EPS
node group of ðð is exactly ðð , or the ð-th EPS node in the EPS node group of ð ð is exactly ð ð .
In this case, this âthree-hop pathâ degenerates to a two/one-hop path.
We use ð¥ð
ð ð, ð = 1, 2, ..., ð to denote the total number of type-ð paths between ðð and ð ð . According
to Lemma 4, there exists an integer solution of [ð¥ðð ð] satisfying the following constraints:
(1) ð¥ð ð =âðð=1
ð¥ð
ð ð,âð, ð = 1, ..., ð ;
(2) 0 †âðð=1ð¥ð
ð ð†1,âð = 1, ..., ð , ð = 1, ..., ð ;
(3) 0 †âðâAð
âðð=1ð¥ð
ð ð†ð,âð = 1, ...,ð¶, ð = 1, ..., ð ;
(4) 0 †âðð=1ð¥ð
ð ð†1,âð = 1, ..., ð , ð = 1, ..., ð ;
(5) 0 †âðð=1
âð âAð
ð¥ð
ð ð†ð,âð = 1, ...,ð¶, ð = 1, ..., ð .
Note that ð¥ð
ð ðdetermines the underlying logical topology of the overlay topology Gð . We need to
show that this logical topology is compatible with the physical topology Geps-group.
Intra-group Logical Topology: If ðð and ð ð are in the same EPS node group, then the number of
links in between is
âðð¢=1
ð¥( ð mod ð )ðð¢
+ âðð£=1
ð¥(ð mod ð )ð£ ð
. Based on the constraints 2) and 4), it is easy to
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:13
check that the above number is no greater than 2, which is compatible with intra-group physical
topology design of Geps-group.
Inter-group Logical Topology: Fix ð = 1, 2, ..., ð . Let ð (ð1â1)ð+ð and ð (ð2â1)ð+ð be the ð-th EPS
node in the EPS node groups Að1and Að2
, respectively. Then the total number of links from
ð (ð1â1)ð+ð to ð (ð2â1)ð+ð is ðŠðð1ð2
=âðâAð
1
âð âAð
2
ð¥ð
ð ð. Based on the constraints 3) and 5), it is easy
to verify that
âð¶ð1=1
ðŠð€ð1ð2
†ð,âð¶ð2=1
ðŠð€ð1ð2
†ð . Then, using the same arguments in the proof of
Theorem 5, we can prove that the logical topology ðŠðð1ð2
, ð1, ð2 = 1, 2, ...,ð¶ is realizable on top of
Geps-group. This completes the proof. â¡
Remark: Theorem 7 implies that ðœ (Geps-group) = minð {`idealâð¿/3â,ðµ (ð )/`ideal
ð¿,ðµ(ð )}. Geps-group is similar
to the physical topology used by Flexfly [27]. In order to prove the performance guarantee in
Theorem 7, we have restricted Geps-group to use uniform mesh for its intra-group topology, which
limits the scale of Geps-group. One could use flattened butterfly [17] for intra-group interconnect
to increase the size of each EPS group of Geps-group, as suggested by Flexfly [27]. However, the
performance guarantee would become much poorer.
6.2 EPS-Mesh based Physical Topology DesignPhysical topology of Geps-mesh: Let ð = âð¿/3â. We arrange all the ESP nodes into a ð¶ Ãð mesh.
Theð EPS nodes in the ð-th row forms an EPS node group, denoted by Að , ð = 1, 2, ...,ð¶ . We index
all the EPS nodes row by row, from 1 to ð¶ð . There are 2ð OCS nodes dedicated for each EPS node
group Að . For every EPS node in Að , there is one bidirectional link between this EPS node and
each of the 2ð OCS nodes. There are also ð OCS nodes dedicated for the EPS nodes in each column.
For theð€-th column, there is one bidirectional link between every EPS node in this column and
every OCS node dedicated for this column. Since the number of ports of an OCS node is ð , the
number of EPS nodes in each row or each column must be no more than ð . Thus, the total number
of EPS nodes must satisfy ð = ð¶ð †ð 2.
Fig. 4. Toy example of EPS-Mesh topology (ð = 3, ð = 9).
Theorem 8. For any traffic pattern ð , the optimal throughput under Geps-mesh satisfies
` (Geps-mesh, ð ) ⥠`idealâð¿/3â,ðµ (ð ).
Proof. Let Gideal
âð¿/3â,ðµ be the ideal physical topology. According to Lemma 2, we need to show that
any logical topology Gð formed over Gideal
âð¿/3â,ðµ can be realized as a overlay topology over Geps-mesh.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:14 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Since each EPS node in Gideal
âð¿/3â,ðµ has ð = âð¿/3â number of uplinks, the following constraints
must be satisfied { âðð=1ð¥ð ð (Gð ) †âð¿/3â = ð, â ð = 1, ..., ð ,âð
ð=1ð¥ð ð (Gð ) †âð¿/3â = ð, â ð = 1, ..., ð .
(12)
Similar to Geps-group, it may not always be possible to establish a logical link between two EPS
nodes in Geps-mesh as well, especially when the two EPS nodes are in different rows and different
columns. Hence, we will construct Gð as an overlay topology on top of Glarge. For every pair of
EPS nodes ðð and ð ð in Geps-mesh, we would like to create ð¥ð ð (Gð ) number of virtual logical links in
between, and classify these virtual logical links intoð types:
⢠If ðð and ð ð are in the same row, the type-ð€ virtual logical links are two-hop paths with
intermediate node being theð€-th EPS node in this row.
⢠If ðð and ð ð belong to different rows, the type-ð€ virtual logical links are three-hop paths with
intermediate nodes being theð€-th EPS node in the same row as ðð and theð€-th EPS node in
the same row as ð ð .
We useð¥ð€ð ð,ð€ = 1, 2, ...,ð to denote the total number of type-ð€ paths between ðð and ð ð . According to
Lemma 4, there exists an integer solution of ð¥ð€ð ð,ð€ = 1, 2, ...,ð satisfying the following constraints:
(1) ð¥ð ð (Gð ) =âðð€=1
ð¥ð€ð ð,âð, ð = 1, ..., ð ;
(2) 0 †âðâAð
âðð=1ð¥ð€ð ð
†ð,âð = 1, ...,ð¶,ð€ = 1, ...,ð ;
(3) 0 †âðð=1
âð âAð
ð¥ð€ð ð
†ð,âð = 1, ...,ð¶,ð€ = 1, ...,ð .
Note that ð¥ð€ð ðdetermines the underlying logical topology of the overlay topology Gð . We need to
show that this logical topology is compatible with the physical topology Geps-group.
Inter-group Logical Topology: Fix ð€ = 1, 2, ...,ð . Let ð (ð1â1)ð +ð€ and ð (ð2â1)ð +ð€ be the ð€-th
EPS node in the EPS node groups Að1and Að2
, respectively. Then the total number of links from
ð (ð1â1)ð +ð€ to ð (ð2â1)ð +ð€ is ðŠð€ð1ð2
=âðâAð
1
âð âAð
2
ð¥ð€ð ð. Based on the constraints 2) and 3), it is easy
to verify that
âð¶ð1=1
ðŠð€ð1ð2
†ð,âð¶ð2=1
ðŠð€ð1ð2
†ð . Then, using the same arguments in the proof of
Theorem 5, we can prove that the logical topology ðŠð€ð1ð2
, ð1, ð2 = 1, 2, ...,ð¶ is realizable on the ð OCS
nodes dedicated for theð€-th row.
Intra-group Logical Topology: Fix ð = 1, 2, ...,ð¶ . Let ð (ðâ1)ð +ð€1and ð (ðâ1)ð +ð€2
be the ð€1-th
and the ð€2-th EPS nodes in Að . Then the total number of links from ð (ðâ1)ð +ð€1to ð (ðâ1)ð +ð€2
is
ð§ðð€1,ð€2
=âðð¢=1
ð¥ð€2
(ðâ1)ð +ð€1,ð¢+âð
ð£=1ð¥ð€1
ð£,(ðâ1)ð +ð€2
. According to Eqn. (13) and the constraints 2) and 3),
we can verify that
âðð€1=1
ð§ðð€1ð€2
†2ð,âðð€2=1
ð§ðð€1ð€2
†2ð . Take the first one for example:
ðâð€1=1
ð§ðð€1ð€2
=
ðâð€1=1
ðâð¢=1
ð¥ð€2
(ðâ1)ð +ð€1,ð¢+
ðâð€1=1
ðâð£=1
ð¥ð€1
ð£,(ðâ1)ð +ð€2
=âðâAð
ðâð¢=1
ð¥ð€2
ðð¢+
ðâð£=1
ð¥ð£,(ðâ1)ð +ð€2(Gð ) †2ð .
Again, using the same arguments in the proof of Theorem 5, we can prove that the logical topology
ð§ðð€1ð€2
,ð€1,ð€2 = 1, 2, ...,ð is realizable on the 2ð OCS nodes dedicated for the EPS node group Að .
This completes the proof. â¡
Remark:Theorem 8 implies that ðœ (Geps-mesh) = minð {`idealâð¿/3â,ðµ (ð )/`ideal
ð¿,ðµ(ð )}. The design ofGeps-mesh
shares a similar idea as the physical topology used by 3D-Hyper-FleX-LION [18]. Although 3D-
Hyper-FleX-LION uses a 3D-mesh instead, it is easy to generalize Geps-mesh from 2D-mesh to
3D-mesh to further improve its scalability. Notably, one critical design of Geps-mesh is that the
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:15
EPS ports used for intra-group interconnect is âtwoâ times the EPS ports used for inter-group
interconnect. This design allows us to attain the best performance guarantee.
6.3 OCS-Mesh based Physical Topology DesignPhysical Topology of Gocs-mesh: We arrangeð¶2
OCS nodes into anð¶ Ãð¶ mesh, whereð¶ = âð¿/2â.The EPS nodes are arranged into ð¶ groups, each of which containsð EPS nodes and is denoted by
Að , ð = 1, 2, ...,ð¶ . For the EPS node group Að , we connect two directed links from the transmitters
of each EPS node in Að to the ingress ports of each OCS node in the ð-th row; we also connect
two directed links from the egress ports of each OCS node in the ð-th column to the receivers
of each EPS node in Að . Since each OCS node has ð ingress/egress ports, the number of EPS
nodes in each group must satisfyð †âð /2â. Hence, the total number of EPS nodes must satisfy
ð = ð¶ð †âð¿/2â âð /2â .
⊠âŠâŠ âŠ
Fig. 5. Toy example of OCS-Mesh topology (ð = 6, ð = 9). Some RX side links are omitted for clear visualization.Each EPS node in A2 is connected to two egress ports of OCS(1,2), OCS(2,2) and OCS(3,2). Each EPS node inA3 is connected to two egress ports of OCS(1,3), OCS(2,3) and OCS(3,3).
Theorem 9. For any traffic pattern ð , the optimal throughput under Gocs-mesh satisfies
` (Gocs-mesh, ð ) ⥠`idealâð¿/2â,ðµ (ð ).
Proof. Let Gideal
âð¿/2â,ðµ be the ideal physical topology. According to Lemma 2, we need to show that
any logical topology Gð formed over Gideal
âð¿/2â,ðµ can be realized as a overlay topology over Gocs-mesh.
Since each EPS node in Gideal
âð¿/2â,ðµ has ð¶ = âð¿/2â number of uplinks, the following constraints
must be satisfied { âðð=1ð¥ð ð (Gð ) †âð¿/2â = ð¶, â ð = 1, ..., ð ,âð
ð=1ð¥ð ð (Gð ) †âð¿/2â = ð¶, â ð = 1, ..., ð .
(13)
Again, we construct Gð as an overlay topology on top of Gocs-mesh. For every pair of EPS nodes
ðð and ð ð in Gocs-mesh, we would like to create ð¥ð ð (Gð ) number of virtual logical links. Each virtual
logical link is essentially a two-hop path ðð â ðð â ð ð , ð = 1, 2, ..., ð , and we use ð¥ðð ð to denote the
number of such virtual logical links.
Next, we would like to divide ð¥ð ð (Gð ) =âðð=1
ð¥ðð ð , such that the ð¥ðð ð âs are compatible with the
physical topology Gocs-mesh. Unlike Geps-group and Geps-mesh, we need to apply Lemma 4 multiple
times to obtain a valid decomposition of ð¥ð ð (Gð ).Step 1: Decompose ð¥ð ð (Gð ) =
âð¶ð=1
ð¥(ð)ð ð
such that the following constraints are met:
⢠0 †âðð=1ð¥(ð)ð ð
†1,âð = 1, ..., ð , ð = 1, ...,ð¶ ;
⢠0 †âðð=1ð¥(ð)ð ð
†1,âð = 1, ..., ð , ð = 1, ...,ð¶ .
Step 2: For every ð0 = 1, 2, ...,ð¶ , decompose ð¥(ð0)ð ð
=âðð€=1
ð¥(ð0â1)ð +ð€ð ð
such that for anyð€ = 1, ...,ð ,
the following constraints are met:
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:16 Shizhen Zhao, Peirui Cao, and Xinbing Wang
⢠0 †âðâAð
âðð=1ð¥(ð0â1)ð +ð€ð ð
†1,âð = 1, ...,ð¶ ;
⢠0 †âðð=1
âð âAð
ð¥(ð0â1)ð +ð€ð ð
†1,âð = 1, ...,ð¶ .
Here Að = {(ð â 1)ð + 1, (ð â 1)ð + 2, ..., ðð }.Based on Step 1&2, we obtain an integer decomposition ð¥ð ð (Gð ) =
âð¶ð=1
ð¥(ð)ð ð
=âðð=1
ð¥ðð ð satisfying
(1) 0 †âðð=1
âðâAð
ð¥ðð ð †1,âð = 1, ..., ð , ð = 1, ...,ð¶;
(2) 0 †âðð=1
âðâAð
ð¥ðð ð †1,âð = 1, ..., ð , ð = 1, ...,ð¶ ;
(3) 0 †âðâAð
âðð=1ð¥ðð ð †1,âð = 1, ...,ð¶, ð = 1, ..., ð ;
(4) 0 †âðð=1
âð âAð
ð¥ðð ð †1,âð = 1, ...,ð¶, ð = 1, ..., ð .
With ð¥ðð ð , the total number of logical links needed from ðð to ð ð can be computed as ðŠð ð =âðð=1
ð¥ðð ð
+ âðð=1
ð¥ð
ðð. Next, we will show that ðŠð ð is compatible with Gocs-mesh.
Recall that the OCS nodes in Gocs-mesh are arranged as a 2D mesh. Consider the OCS node in the
ð1-th row and ð2-th column. This OCS node connects the transmitters of the EPS node in group
Að1to the receivers of the EPS node in group Að2
. Then, if the following constraints are satisfiedâð âAð
2
ðŠð ð †2,âðâAð
1
ðŠð ð †2, (14)
the logical topology ðŠð ð , ð â Að1, ð â Að2
will be realizable on this OCS. (14) can be verified below:âð âAð
2
ðŠð ð =âð âAð
2
ðâð=1
ð¥ðð ð
+âð âAð
2
ðâð=1
ð¥ð
ðð†2;
âðâAð
1
ðŠð ð =âðâAð
1
ðâð=1
ð¥ðð ð
+âðâAð
1
ðâð=1
ð¥ð
ðð†2.
This completes the proof. â¡
Remark:Theorem 9 implies that ðœ (Gocs-mesh) = minð {`idealâð¿/2â,ðµ (ð )/`ideal
ð¿,ðµ(ð )}. The design ofGocs-mesh
shares a similar idea as the physical topology used by Sirius [2]. One critical difference between
Gocs-mesh and Siriusâ physical topology is that we connect âtwoâ links instead of one between each
connected pair of EPS node and OCS node. This subtle design allows us to derive the performance
guarantee in Theorem 9.
7 PERFORMANCE RATIO ANALYSISWe have performed theoretical analysis for four physical topologies Guniform, Geps-group, Geps-mesh
and Gocs-mesh, and related their throughput metrics with that of the ideal physical topology. Guniform
is optimal but scales poorly. Geps-group, Geps-mesh and Gocs-mesh have better scalability but are sub-
optimal. Accoring to Theorem 7, Theorem 8 and Theorem 9, It is easy to obtainðœ (Geps-group) = minð {ðœ (Geps-group, ð )}, where ðœ (Geps-group, ð ) = `idealâð¿/3â,ðµ (ð )/`
ideal
ð¿,ðµ(ð ),
ðœ (Geps-mesh) = minð {ðœ (Geps-mesh, ð )}, where ðœ (Geps-mesh, ð ) = `idealâð¿/3â,ðµ (ð )/`ideal
ð¿,ðµ(ð ),
ðœ (Gocs-mesh) = minð {ðœ (Gocs-mesh, ð )}, where ðœ (Gocs-mesh, ð ) = `idealâð¿/2â,ðµ (ð )/`ideal
ð¿,ðµ(ð ).
(15)
In this section, wewill calculate the approximated values of ðœ (Geps-group), ðœ (Geps-mesh) and ðœ (Gocs-mesh).We focus on two cases below:
(1) ToE can generate multiple logical topologies to serve each traffic matrix. This setting was
adopted by [2, 20â22].
(2) ToE generates only one logical topology (i.e.,ð = 1) for each traffic matrix. This setting was
adopted by [5, 10, 13].
For the first case, we obtain some theoretical results in Section 7.1. The second case is hard to
analyze, and we could only obtain some approximated numerical results in Section 7.2.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:17
7.1 Calculating ðœ (Geps-group), ðœ (Geps-mesh) and ðœ (Gocs-mesh) in Case 1Definition 3. A traffic pattern ð is ð-decomposable if there exists ð permutation traffic matrices
ð1, ð2, ..., ðð and ð1 + ð2 + · · · + ðð = 1, 0 †ðð †1, such that ð †âð
ð=1ðððð .
Definition 4. A traffic pattern ð is a normalized traffic pattern if the following condition is met
max
{ð
max
ð=1
{ðâð=1
ðð ð
},ð
max
ð=1
{ðâð=1
ðð ð
}}= 1.
Lemma 10. For any normalized traffic pattern ð , we must have `idealð¿,ðµ
(ð ) †ðµð¿.
Proof. Without loss of generality, we assume that the egress traffic from ð1 sums to 1, e.g.,âðð=1
ð1ð = 1. Consider the network control policy that achieves the throughput value `idealð¿,ðµ
(ð ).Then, `ideal
ð¿,ðµ(ð )âð
ð=1ð1ð = `
ideal
ð¿,ðµ(ð ) amount of traffic can be delivered from the EPS node ð1 to other
EPS nodes in a unit time period. On the other hand, the total egress bandwidth of ð1 is upper
bounded by ðµð¿, because ð1 has ð¿ ports. Thus, `idealð¿,ðµ
(ð ) †ðµð¿. â¡
Lemma 11. Given a ð-decomposable traffic pattern ð that lasts Î amount of time, if the ToE policyallows generatingð OCS configurations, then `ideal
ð¿,ðµ(ð ) ⥠ðµð¿(1 â ð
ðð¿) (1 â ð¿ð
Î ).
Proof. We split Î into ð time slots, such that Î(1) = Î(2) = · · · = Î(ð) = Î/ð â ð¿. In each
time slot, ð¿ permutations can be formed because each EPS node has ð¿ bidirectional links connected
the OCS. In total, we can formðð¿ permutations.
Let ð †âð
ð=1ðððð be the ð-decomposition of ð . For every permutation ðð , ð = 1, 2, ..., ð, we
configure OCSs such that ðð is formed â(ðð¿ â ð)ððâ times. It is easy to verify that
ðâð=1
â(ðð¿ â ð)ððâ â€ðâð=1
((ðð¿ â ð)ðð + 1) = ðð¿.
Hence, the above ToE strategy is feasible. Under this strategy, the bandwidth allocated to the ODC
satisfies
1
Î
ðâð=1
â(ðð¿ â ð)ððâðµ(Î/ð â ð¿)ðð â¥(ðð¿ â ð)ðµ(Î/ð â ð¿)
Î
ðâð=1
ðððð â¥(ðð¿ â ð)ðµ(Î/ð â ð¿)
Îð .
Therefore, under the above ToE strategy and the one-hop routing, we can achieve a throughput
value(ðð¿âð)ðµ (Î/ðâð¿)
Î . Based on the definition of ` (G, ð ), we must have
`idealð¿,ðµ (ð ) ⥠(ðð¿ â ð)ðµ(Î/ð â ð¿)Î
= ðµð¿(1 â ð
ðð¿) (1 â ð¿ð
Î).
â¡
Note that Lemma 11 holds for any ð . According to the definitions of ðœ (Geps-group), ðœ (Geps-mesh)and ðœ (Gocs-mesh) in (15), it is sufficient to consider normalized traffic patterns, because `ideal
ð¿,ðµ(ð )
increases linearly if we scales ð linearly. Further, based on the Birkhoff and von Neumann theo-
rem [6], any normalized traffic pattern ð is ð-decomposable with ð †ð 2 â 2ð + 2. Combining all
the above analysis with Lemma 10, we then obtain
Corollary 11.1. Let ðŸ (ð, ð) =(1 â ð 2â2ð+2
ðð
) (1 â ð¿ð
Î
), we have
âð¿/3âð¿
max
ð{ðŸ (ð, âð¿/3â)} †ðœ (Geps-group) â€
âð¿/3âð¿
(max
ð{ðŸ (ð, ð¿)}
)â1
,
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:18 Shizhen Zhao, Peirui Cao, and Xinbing Wang
âð¿/3âð¿
max
ð{ðŸ (ð, âð¿/3â)} †ðœ (Geps-mesh) â€
âð¿/3âð¿
(max
ð{ðŸ (ð, ð¿)}
)â1
,
âð¿/2âð¿
max
ð{ðŸ (ð, âð¿/2â)} †ðœ (Gocs-mesh) â€
âð¿/2âð¿
(max
ð{ðŸ (ð, ð¿)}
)â1
.
When the OCS reconfiguration delay ð¿ = 0, ðŸ (ð, ð) can be arbitrarily close to 1 as we increase
the number of OCS configurationsð . In this case, we have
Corollary 11.2. If ð¿ = 0, then ðœ (Geps-group) = âð¿/3âð¿
, ðœ (Geps-mesh) = âð¿/3âð¿
, ðœ (Gocs-mesh) = âð¿/2âð¿
.
7.2 Calculating ðœ (Geps-group), ðœ (Geps-mesh) and ðœ (Gocs-mesh) in Case 2 Whereð = 1
ð = 1 means that only one logical topology is generated for each traffic pattern ð (see the definition
of ToE in Section 3.1).Whenð = 1, the lower bound of `idealð¿,ðµ
(ð ) given by Lemma 11 can be extremely
loose, and thus may not be useful for calculating ðœ (Geps-group), ðœ (Geps-mesh) and ðœ (Gocs-mesh). Inthis case, we perform numerical analysis instead. Note that exhaustively enumerating all the
different traffic matrices is not feasible. We simply generate random traffic matrices to perform the
calculation. We use two approaches to generate random traffic matrices:
Approach 1: Randomly generate ð permutation traffic matrices ð (1) , ð (2) , ..., ð (ð)and ð pa-
rameters 0 †ðŸ1, ðŸ2, ..., ðŸð †1 satisfying
âð
ð=1ðŸð = 1. Let ð =
âð
ð=1ðŸð ð
(ð). It is easy to verify
that the traffic matrix ð generated with this approach is a normalized traffic pattern.
Approach 2: Generate a random non-negative number for each entry ðð ð , ð â ð , and then
normalize ð such that max{maxðð=1
{âðð=1
ðð ð },maxðð=1
{âðð=1
ðð ð }} = 1.
For every ð and every ð¿, we generate a number of normalized random traffic patterns using
the above two approaches, and calculate throughput values `idealâð¿/3â,ðµ (ð ), `ideal
âð¿/3â,ðµ (ð ), `ideal
âð¿/2â,ðµ (ð ),`idealð¿,ðµ
(ð ) for every traffic pattern ð . We use three approaches to calculate the throughput values:
Joint Optimization: Directly solve the optimization problem (5) using Gurobi [23].
Decoupled Optimization: First, compute a logical topology¯G (1)ð
that best approximates the
traffic pattern ð by solving the following optimization problem:
max
ðâð=1
ðâð=1
(ð¿ðð ð â âð¿ðð ð â
)ð¥ð ð ( ¯G (1)
ð)
s.t.âð¿ðð ð â †ð¥ð ð ( ¯G (1)ð
) †âð¿ðð ð â, for all ð, ð = 1, 2, ..., ð ,
ðâð=1
ð¥ðð ( ¯G (1)ð
) †ð¿,
ðâð=1
ð¥ð ð ( ¯G (1)ð
) †ð¿, for all ð, ð = 1, 2, ..., ð .
(16)
Then, solve the optimization problem (5) with the logical topology G (1)ð
being fixed as¯G (1)ð
.
Fast Approximation: First, compute a logical topology¯G (1)ð
by solving (16). Second, find all
the shortest paths for every EPS node pair in¯G (1)ð
. Third, calculate the throughput value
based on Equal-Cost Multi-Path (ECMP) routing.
The Joint Optimization approach solves topology and routing jointly, and gives the exact through-
put values. However, this approach has the highest computational complexity. The Decoupled
Optimization approach reduces algorithmic complexity by solving topology and routing separately.
The Fast Approximation approach further reduces computational complexity by using a fixed
ECMP routing. Notably, the second and the third approaches can only compute approximated
throughput and performance ratio values.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:19
7.2.1 Approximating ðœ (Geps-group), ðœ (Geps-mesh) and ðœ (Gocs-mesh). We generate a list of (ð, ð¿) pairs:(7, 10), (8, 10), (9, 10), (15, 10), (20, 10), (30, 16), (40, 16), (128, 32), (256, 32), (512, 64) and (1024, 64).For each (ð, ð¿) pair, we generate a set of 60 normalized traffic patterns, and denote this set by Fð,ð¿ .For every traffic pattern ð â Fð,ð¿ , we use the above three approaches to compute the per-traffic-
pattern performance ratio ðœ (G, ð ), G â {Geps-group,Geps-mesh,Gocs-mesh} as defined in (15).
(7,1
0)(8
,10)
(9,1
0)(1
5,10
)(2
0,10
)(3
0,16
)(4
0,16
)(1
28,3
2)(2
56,3
2)(5
12,6
4)(1
024,
64)
0.2
0.3
0.4
0.5
Per
-Tra
ffic-
Pat
ternβ
Joint Decoupled Fast
(a) EPS-Group
(7,1
0)(8
,10)
(9,1
0)(1
5,10
)(2
0,10
)(3
0,16
)(4
0,16
)(1
28,3
2)(2
56,3
2)(5
12,6
4)(1
024,
64)
0.150.200.250.300.350.40
Per
-Tra
ffic-
Pat
ternβ
Joint Decoupled Fast
(b) EPS-Mesh
(7,1
0)(8
,10)
(9,1
0)(1
5,10
)(2
0,10
)(3
0,16
)(4
0,16
)(1
28,3
2)(2
56,3
2)(5
12,6
4)(1
024,
64)
0.20.30.40.50.60.7
Per
-Tra
ffic-
Pat
ternβ
Joint Decoupled Fast
(c) OCS-Mesh
(7,1
0)(8
,10)
(9,1
0)(1
5,10
)(2
0,10
)(3
0,16
)(4
0,16
)(1
30,3
2)(2
56,3
2)(5
12,6
4)(1
024,
64)
10â310â210â1
100101102103104
Ru
nn
ing
Tim
e(s
) X Unfinished X Unfinished
Joint Decoupled Fast
(d) Running Time Comparison
Fig. 6. Calculating ðœ (G, ð ) using Joint Optimization, Decoupled Optimization and Fast Approximation.
Since the Joint Optimization approach can compute the exact per-traffic-pattern performance
ratio values, we consider ðœjoint (G) = minð âFð,ð¿ðœjoint (G, ð ) as the best approximation to ðœ (G) for
every G â {Geps-group,Geps-mesh,Gocs-mesh}. However, calculating ðœjoint (G, ð ) is computationally
expensive. The Joint Optimization approach can only compute ðœjoint (G) for the (ð, ð¿) pairs (7, 10),(8, 10) and (9, 10). When ð > 9, the Joint Optimization approach may not be able to compute a
solution even after running a few hours. Hence, in the following, we study how to approximate
ðœjoint (G) using the other two approaches.
We first use the Decoupled Optimization approach to approximate ðœjoint (G). For every ð â Fð,ð¿ ,we compute ðœdecoupled (G, ð ) using the Decoupled Optimization approach. As shown in Fig. 6,
the values of ðœdecoupled (G, ð ) exhibit much higher variance than that of ðœjoint (G, ð ). If we use
minð âFð,ð¿ðœdecoupled (G, ð ) to approximate ðœdecoupled (G), the gap between the estimated value and
the true value would be too large. Hence, we adopt the following heuristic. Let ðŒdecoupled (G, ð)be the ð-th percentile value among all the values of ðœdecoupled (G, ð ). We aim to find ð such that
ðŒdecoupled (G, ð) â ðœjoint (G). From Fig. 6, we can see that ð (Geps-group) â 25th, ð (Geps-mesh) â 35th
and ð (Gocs-mesh) â 25th when (ð, ð¿) = (7, 10), (8, 10) or (9, 10). Then, we use ðŒdecoupled (G, ð (G)) toapproximate ðœjoint (G) for larger data centers. However, even this Decoupled Optimization approach
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:20 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Table 2. The values of ð â²(G) for different (ð, ð¿)âs. When ð †9, we pick ð â²(G) such that ðŒfast (G, ð â²(G)) âðœjoint (G). When 9 < ð †40, we pick ð â²(G) such that ðŒfast (G, ð â²(G)) â ðŒdecoupled (G, ð (G)).
(7,10) (8,10) (9,10) (15,10) (20,10) (30,16) (40,16)
ð â²(Geps-group) 36th 26th 23th 35th 33th 58th 56th
ð â²(Geps-mesh) 28th 31th 20th 36th 36th 56th 61th
ð â²(Gocs-mesh) 48th 31th 26th 46th 45th 65th 63th
Table 3. Approximating ðœ (Geps-group), ðœ (Geps-mesh) and ðœ (Gocs-mesh) using the Fast Approximation approach.
(7,10) (8,10) (9,10) (15,10) (20,10) (30,16)
ðœ (Geps-group) 0.34 0.34 0.33 [0.28,0.31] [0.27,0.3] [0.25,0.28]
ðœ (Geps-mesh) 0.24 0.23 0.22 [0.17,0.21] [0.16,0.19] [0.18,0.22]
ðœ (Gocs-mesh) 0.45 0.45 0.44 [0.39,0.43] [0.36,0.42] [0.36,0.42]
(40,16) (128,32) (256,32) (512,64) (1024,64)
ðœ (Geps-group) [0.24,0.26] [0.22,0.27] [0.23,0.27] [0.24,0.28] [0.25,0.29]
ðœ (Geps-mesh) [0.18,0.2] [0.2,0.25] [0.19,0.24] [0.22,0.26] [0.21,0.28]
ðœ (Gocs-mesh) [0.35,0.4] [0.37,0.45] [0.39,0.44] [0.38,0.43] [0.37,0.45]
cannot scale beyond 40 EPS nodes. When ð > 40, the Decoupled Optimization approach may not
produce any solution after running a few hours.
We then use the Fast Approximation approach to approximate ðœjoint (G). As shown in Fig. 6,
the values of ðœfast (G, ð ) exhibit even higher variance, and the gap between minð âFð,ð¿ðœfast (G, ð )
and ðœdecoupled (G) would be even larger. To reduce the estimation gap, we let ðŒfast (G, ð â²) be theð â²-th percentile value among all the values of ðœfast (G, ð ). We aim to find ð â² such that ðŒfast (G, ð â²) âðœjoint (G) when ð †9 and ðŒfast (G, ð â²) â ðŒdecoupled (G, ð (G)) when 9 < ð †40. The values of
ð â²(G) vary for different physical topologies and different (ð, ð¿) pairs. The detailed values are
summarized in Table 2. For every G â {Geps-group,Geps-mesh,Gocs-mesh}, we pick the minimum and
the maximum values of ð â²(G), and denote them by ð â²min
(G) and ð â²max
(G). Then, we can generate a
range [ðŒfast (G, ð â²min
(G)), ðŒfast (G, ð â²max(G))] to approximate ðœjoint (G) â ðœ (G). We summarize the
approximation results for all ð > 9 in Table 3.
Note: The idea of using percentile values to estimate ðœ (G)âs is just a heuristic. There is no guar-antee that the true values of ðœ (G) are indeed within the ranges provided by Table 3. Nevertheless,
we believe that our numerical estimation gives readers a sense on the magnitude of ðœ (G).
8 CONCLUSIONWe study physical topology design for ODCs, and offer a novel methodology to analyze the
performance guarantee for different physical topologies in this paper. Based on our methodology,
we prove that the uniform bipartite graph based physical topology is optimal for small-scale
data centers with ð †ð , while optimal physical topologies do not exist when ð > ð . We also
design three physical topologies that support larger-scale ODCs, and prove their performance gaps
with respect to the ideal physical topology. The methodology in this paper provides a theoretical
foundation for ODC physical topology design, and may help network operators design physical
topologies with a better guarantee.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:21
There are a number of interesting directions for future work. First, we do not assume any prior
knowledge on the traffic patterns while we define the performancemetric ðœ (G). In practice, differentToR switches may belong to different production groups, and the intra-group traffic can be higher
than the inter-group traffic. It would be interesting to study how such traffic knowledge affects
physical topology design. Second, in addition to throughput, deployment/management/expansion
complexities may also affect the physical topology design for ODC. It remains open to find a
physical topology with the best tradeoff among different performance metrics for ODC.
9 ACKNOWLEDGEMENTThis work was supported by the NSF China under Grant 61902246. We also thank our shepherd
Rachee Singh, and the SIGMETRICS reviewers.
REFERENCES[1] Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. 2008. A scalable, commodity data center network
architecture. In SIGCOMM.
[2] Hitesh Ballani, Paolo Costa, Raphael Behrendt, Daniel Cletheroe, Istvan Haller, Krzysztof Jozwik, Fotini Karinou,
Sophie Lange, Kai Shi, Benn Thomsen, and Hugh Williams. 2020. Sirius: A Flat Datacenter Network with Nanosecond
Optical Switching. In SIGCOMM.
[3] Inc. CALIENT Technologies. [n.d.]. https://www.calient.net/.
[4] Inc. CALIENT Technologies. [n.d.]. S Series Optical Circuit Switch. https://www.calient.net/products/s-series-photonic-
switch/.
[5] Peirui Cao, Shizhen Zhao, Min Yee Teh, Yunzhuo Liu, and Xinbing Wang. 2021. TROD: Evolving From Electrical Data
Center to Optical Data Center. In ICNP.[6] Cheng-shang Chang, Wen-Jyh Chen, and Hsiang-Yi Huang. 2000. Birkhoff-von Neumann Input-Buffered Crossbar
Switches. In INFOCOM.
[7] Kai Chen, Ankit Singla, Atul Singh, Kishore Ramachandran, Lei Xu, Yueping Zhang, Xitao Wen, and Yan Chen. 2012.
OSA: An optical switching architecture for data center networks with unprecedented flexibility. In NSDI.[8] Li Chen, Kai Chen, Zhonghua Zhu, Minlan Yu, George Porter, Chunming Qiao, and Shan Zhong. 2017. Enabling
wide-spread communications on optical fabric with megaswitch. In NSDI.[9] Nandita Dukkipati and Nick McKeown. 2006. Why Flow-Completion Time is the Right Metric for Congestion Control.
In SIGCOMM Review.[10] Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya,
Yeshaiahu Fainman, George Papen, and Amin Vahdat. 2010. Helios: a hybrid electrical/optical switch architecture for
modular data centers. In SIGCOMM.
[11] Monia Ghobadi, Ratul Mahajan, Amar Phanishayee, Nikhil Devanur, Janardhan Kulkarni, Gireeja Ranade, Pierre Alexan-
dre Blanche, Houman Rastegarfar, Madeleine Glick, and Daniel Kilper. 2016. ProjecToR: Agile Reconfigurable Data
Center Interconnect. In SIGCOMM.
[12] Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, and Sudipta Sengupta. 2009. VL2: A Scalable
and Flexible Data Center Network. In SIGCOMM.
[13] Navid Hamedazimi, Zafar Qazi, Himanshu Gupta, Vyas Sekar, Samir R. Das, Jon P. Longtin, Himanshu Shah, and
Ashish Tanwer. 2014. FireFly: A Reconfigurable Wireless Data Center Fabric Using Free-Space Optics. In SIGCOMM.
[14] Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer.
2013. Achieving High Utilization with Software-Driven WAN. In SIGCOMM.
[15] Co. Huawei Technologies. [n.d.]. https://e.huawei.com/en/products/enterprise-networking/switches/data-center-
switches/ce8800.
[16] Robert W. Irving and Mark R. Jerrum. 1992. Three-dimensional Statistical Data Security Problems. SIAM J. Comput. 23(October 1992), 170â184.
[17] John Kim, William J. Dally, and Dennis Abts. 2007. Flattened Butterfly : A Cost-Efficient Topology for High-Radix
Networks. In ISCA.[18] Gengchen Liu, Roberto Proietti, Marjan Fariborz, Pouya Fotouhi, Xian Xiao, and S.J. Ben Yoo. 2020. Architecture and
Performance Studies of 3D-Hyper-FleX-LION for Reconfigurable All-to-All HPC Networks. In SC.[19] He Liu, Feng Lu, Alex Forencich, Rishi Kapoor, Malveeka Tewari, Geoffrey M Voelker, George Papen, Alex C Snoeren,
and George Porter. 2014. Circuit Switching Under the Radar with REACToR. In NSDI.[20] He Liu, Matthew K. Mukerjee, Conglong Li, Nicolas Feltman, George Papen, Stefan Savage, Srinivasan Seshan,
Geoffrey M. Voelker, David G. Andersen, Michael Kaminsky, George Porter, and Alex C. Snoeren. 2015. Scheduling
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:22 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Techniques for Hybrid Circuit/Packet Networks. In CoNEXT.[21] William M Mellette, Rajdeep Das, Yibo Guo, Rob McGuinness, Alex C Snoeren, and George Porter. 2020. Expanding
across time to deliver bandwidth efficiency and low latency. In NSDI.[22] William M. Mellette, Rob Mcguinness, Arjun Roy, Alex Forencich, and George Porter. 2017. RotorNet: A Scalable,
Low-complexity, Optical Datacenter Network. In SIGCOMM.
[23] Gurobi Optimization. [n.d.]. https://www.gurobi.com/.
[24] George Porter, Richard Strong, Nathan Farrington, Alex Forencich, Pang Chen-Sun, Tajana Rosing, Yeshaiahu Fainman,
George Papen, and Amin Vahdat. 2013. Integrating microsecond circuit switching into the data center. In SIGCOMM.
[25] Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai,
Bob Felderman, Paulie Germano, et al. 2015. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in
Googleâs Datacenter Network. In SIGCOMM.
[26] Guohui Wang, David G Andersen, Michael Kaminsky, Konstantina Papagiannaki, TS Ng, Michael Kozuch, and Michael
Ryan. 2010. c-Through: Part-time optics in data centers. In SIGCOMM.
[27] Ke Wen, Payman Samadi, Sebastien Rumley, Christine P. Chen, Yiwen Shen, Meisam Bahadori, Keren Bergman, and
Jeremiah Wilke. 2016. Flexfly: Enabling a Reconfigurable Dragonfly through Silicon Photonics. In SC.[28] Mingyang Zhang, Radhika Niranjan Mysore, Sucha Supittayapornpong, and Ramesh Govindan. 2019. Understanding
lifecycle management complexity of datacenter topologies. In NSDI.[29] Shizhen Zhao, Rui Wang, Junlan Zhou, Joon Ong, Jeffery C. Mogul, and Amin Vahdat. 2019. Minimal Rewiring: Efficient
Live Expansion for Clos Data Center Networks: Extended Version. In NSDI https://ai.google/ research/pubs/pub47492.
A NP-COMPLETENESS OF CALCULATING ` (G, ð )We prove the following theorem in this section.
Theorem 12. The problem of calculating ` (G, ð ) for a given physical topology G and a giventraffic pattern ð is NP-Complete.
We prove Theorem 12 by reducing the following three-dimensional sum problem to the problem
of calculating ` (G, ð ). The three-dimensional sum problem was proven to be NP-Complete [16].
Definition 5. (Three-dimensional Sum Problem) Given a positive integer ð and three non-negativeinteger matrices ð· (ð, ð), ðž (ð, ð) and ð¹ (ð, ð), where ð, ð, ð = 1, 2, ..., ð and
ðâð=1
ðâð=1
ð· (ð, ð) =ðâð=1
ðâð=1
ðž (ð, ð) =ðâð=1
ðâð=1
ð¹ (ð, ð),
does there exist a non-negative integer assignment of ðŽ(ð, ð, ð) such thatâðð=1ðŽ(ð, ð, ð) = ð· (ð, ð),âð
ð=1ðŽ(ð, ð, ð) = ðž (ð, ð),âð
ð=1ðŽ(ð, ð, ð) = ð¹ (ð, ð).
(17)
We are now ready to prove Theorem 12.
Proof. Given an instance of the three-dimensional sum problem, we construct an ODC physical
topology G and a trafficmatrix ð as follows. G has 2ð EPS nodes andð OCS nodes. The 2ð EPS nodes
are separated into two groups, each with ð EPS nodes. There are ð· (ð, ð) number of bidirectional
links between the ð-th EPS node in Group 1 and the ð-th OCS node; there are ðž (ð, ð) number of
bidirectional links between the ð-th OCS node and the ð-th EPS node in Group 2. Each link is
assumed to have its link capacity ðµ = 1. The traffic demand from the ð-th EPS node in Group 1 to the
ð-th EPS node in Group 2 is ð¹ (ð, ð). There is no traffic demand within each EPS group or from Group
2 to Group 1. In total, this network has ð2flows, and the traffic matrix ð is an 2ð Ã 2ð matrix with
at most ð2non-zero entries (see Fig.7). For topology engineering, we only allow generating one
logical topology for each traffic matrix, i.e.,ð = 1. Next, we will prove that the three-dimensional
sum problem is satisfiable if and only if ` (G, ð ) = 1.
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
Understanding the Performance Guarantee of Physical Topology Design for Optical Circuit Switched Data Centers42:23
âŠ
âŠâŠâŠ ⊠âŠ
Traffic demand
Fig. 7. Network topology and traffic model.
Sufficiency (â): Let ðŽ(ð, ð, ð) be a solution of the three-dimensional sum problem. Then, we
can set up the OCS configurations as follows: for the ð-th OCS, establish ðŽ(ð, ð, ð) number of
bidirectional links between the ð-th EPS node in Group 1 and the ð-th EPS node in Group 2. Under
this set of OCS configurations, we establish
âðð=1
ðŽ(ð, ð, ð) = ð¹ (ð, ð) number of logical links between
the ð-th EPS node in Group 1 and the ð-th EPS node in Group 2. Then, if we route every flow
along its shortest path, it is easy to check that the achievable throughput ` (G, ð ) = 1. Note that
` (G, ð ) = 1 is indeed the optimal throughput, because the bisection bandwidth from Group 1 to
Group 2
âðð=1
âðð=1
ð· (ð, ð) = âðð=1
âðð=1
âðð=1
ðŽ(ð, ð, ð) divided by the total traffic demand from from
Group 1 to Group 2
âðð=1
âðð=1ð¹ (ð, ð) = âð
ð=1
âðð=1
âðð=1
ðŽ(ð, ð, ð) is exactly 1.
Necessity (â): Let Gð be the logical topology such that the optimal throughput ` (G, ð ) = 1 is
achieved, and let Î = {Î 1,Î 2, ...,Î ð} be the corresponding configurations of the ð OCS nodes.
Since the throughput is 1 under the logical topology Gð , the bisection bandwidth from Group 1 to
Group 2 in Gð must be at least
âðð=1
âðð=1ð¹ (ð, ð). Note that the egress capacity of Group 1 and the
ingress capacity of Group 2 are both at most
ðâð=1
ðâð=1
ð· (ð, ð) =ðâð=1
ðâð=1
ðž (ð, ð) =ðâð=1
ðâð=1
ð¹ (ð, ð).
Hence, in order to deliver
âðð=1
âðð=1ð¹ (ð, ð) amount of traffic, the following statements must be true
for Gð :(1) Every egress EPS port of Group 1 must be connected with an ingress EPS port of Group 2
throughput an OCS node;
(2) Every flow must be delivered directly from its source to its destination.
Now, let ðŽ(ð, ð, ð) be the number of links from the ð-th EPS node in Group 1 to the ð-th EPS node
in Group 2 formed in the ð-th OCS node. Note that there are ð· (ð, ð) number of bidirectional
links between the ð-th EPS node in Group 1 and the ð-th OCS node; there are ðž (ð, ð) number of
bidirectional links between the ð-th OCS node and the ð-th EPS node in Group 2. Combined with
the first statement, we must have
âðð=1ðŽ(ð, ð, ð) = ð· (ð, ð) and âð
ð=1ðŽ(ð, ð, ð) = ðž (ð, ð). According
to the second statement and the fact that the bisection bandwidth from Group 1 to Group 2 is at
most
âðð=1
âðð=1ð¹ (ð, ð), the total number of links from the ð-th EPS node in Group 1 to the ð-th EPS
node in Group 2 must be exactly ð¹ (ð, ð). Hence, âðð=1
ðŽ(ð, ð, ð) = ð¹ (ð, ð).The above analysis indicates that all the three-dimensional sum problem can be reduced to a
ODC throughput optimization problem. Since the three-dimensional sum problem is NP-Complete,
the ODC throughput optimization problem must also be NP-Complete. â¡
B PROOF OF LEMMA 2Proof. Let (Gâ(ð)
ð, ð âð ð (ð¢, ð£)) be the optimal topology+routing solution that attains the optimal
throughput ` (G, ð ). Next we consider ` (Gâ², ð ).
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.
42:24 Shizhen Zhao, Peirui Cao, and Xinbing Wang
Note that the logical topologies Gâ(ð)ð
are realizable as overlay topologies of Gâ². Let G
â²â(ð)ð
be
the corresponding underlay logical topology of Gâ(ð)ð
. Consider the following overlay-topology
based routing allocation in Gâ². For every flow from ðð to ð ð , we allocate ð
âð ð (ð¢, ð£) amount of traffic
to the overlay paths from ðð¢ to ðð£ . It is easy to verify that the resulting throughput is ` (G, ð ). Notethat restricting routing allocation to overlay paths reduces routing flexibility. In contrast, our TE
formulation (3)(4) computes network throughput based on the underlay topologies Gâ²â(ð)ð
directly,
and thus the resulting throughput must be no smaller than ` (G, ð ). Combined with the definition
of ` (Gâ², ð ), we must have ` (Gâ², ð ) ⥠` (G, ð ). â¡
C NOTATIONSThe notations in this paper are summarized in Table 4.
Table 4. Notations used in this paper
ð , S ð electrical-packet-switching (EPS) nodes S = {ð1, ð2, ..., ðð }.ðŸ , O ðŸ optical-circuit-switching (OCS) nodes O = {ð1,ð2, ...,ððŸ }.G Physical topology G = (V, E), where V = S ⪠O. (see §3.1).
Gð Logical topology Gð = Gð (G,Î ) = (Vð , Eð ) (see §3.1).ð¥ð ð ð¥ð ð (Gð ) denotes the number of links from ðð to ð ð in Gð .ðœ (G) Performance ratio of the physical topology G (see §3.3).
ð Traffic matrix among the ð EPS nodes (see §3.1).
` (G, ð ) Optimal throughput for the given physical topology G and traffic matrix ð (see §3.2).
ð¿ Each EPS node has ð¿ ports.
ð Each OCS node has ð ingress ports and ð egress ports.
ðµ EPS port capacity.
Î OCS configuration Î = {Î 1,Î 2, ...,Î ðŸ } (see §3.1).ð The number of logical topologies generated by ToE for each traffic matrix (see §3.2).
ðð ð (ð¢, ð£) Amount of traffic from ðð to ð ð that is allocated to the link (ðð¢, ðð£) (see §3.2).ð¶ , Að ð¶ mutually disjoint EPS node groups satisfying âªð¶ð=1
Að = S.
Received August 2021; revised October 2021; accepted November 2021
Proc. ACM Meas. Anal. Comput. Syst., Vol. 5, No. 3, Article 42. Publication date: December 2021.