Setting the Mood
• "It's time to get rid of TCP/UDP protocols in DCs"
• DCs/Clouds are closed worlds, brand new technologies are OK
• with bulk transfers (BigData, ...), the business value of a TCP/UDP alternative is high
• circuits are an alternative to packets
M.Zhanikeev -- [email protected] -- Circuit Emulation for Bulk Transfers in Dist. Storage and Clouds -- http://bit.do/marat140903 2/32...
Ethernet is the Best
Ethernet ... is the cheapest and most available technology with e2e support
• Fibre Channel (FC), SATA, etc. require expensive hardware, have low compatibility, and no e2e support
• FCoE = Ethernet, same problems, expensive hardware, no e2e support
• network virtualization is the best fit for Ethernet
• disclaimer: one of the proposed models will work with optical networks as well
Ethernet is the Worst
Ethernet ... is the worst technology in terms of throughput
• CSMA/CD is the biggest throughput limitation
◦ not in modern switches, but still a major problem in wireless (as CSMA/CA)
• the contention problem cannot be easily resolved
• same applies to OBS/OPS optical technologies
Ethernet Contention
Ethernet and Contention
• whatever you do, Ethernet L2 domains cannot avoid contention
[Figure: two switch configurations, labeled "Qualitatively Identical"]
Parallel vs Sequential (2 flows)
[Plot: x-axis: transfer time in contention (s), y-axis: transfer time by exclusive circuits (s); both axes span 20 to 40 s]
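The comparison in the plot can be approximated with simple arithmetic. The sketch below is not the measurement setup behind the plot; it assumes an idealized link that is shared perfectly fairly and has no contention overhead at all. Function names, sizes and capacity are hypothetical. Even under these generous assumptions, sending two flows one after the other lowers the mean completion time, and real contention can only widen the gap.

```python
def completion_times_parallel(sizes, capacity):
    """Ideal fair sharing: all active flows split the link capacity equally."""
    times, now, done = [], 0.0, 0.0
    ordered = sorted(sizes)
    for i, size in enumerate(ordered):
        active = len(ordered) - i
        # every active flow has already sent `done` units at this point
        now += (size - done) * active / capacity
        done = size
        times.append(now)
    return times


def completion_times_sequential(sizes, capacity):
    """Exclusive access: each flow gets the whole link, shortest flow first."""
    times, now = [], 0.0
    for size in sorted(sizes):
        now += size / capacity
        times.append(now)
    return times


parallel = completion_times_parallel([10, 10], capacity=1.0)
sequential = completion_times_sequential([10, 10], capacity=1.0)
print(sum(parallel) / 2, sum(sequential) / 2)
```

With two equal flows, both parallel flows finish at the same late moment, while the sequential schedule finishes the first flow in half that time.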
Ethernet Switches : Basic Facts
• cut-through versus store-and-forward
• cut-through is 10..15x better
• Cisco has advanced cut-through: +bytes versus routing decision tradeoff
• store-and-forward is subject to QoS classes
◦ L3 DSCP versus L2 CoS, AF, EF, BE, SBE models
Switches : Modeling
[Figure: switch model. Legend: C = cut-through, Q = queue, D = drop; other labels: "Check, etc." and "QoS classes"]
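One possible reading of the C/Q/D model is sketched below. The figure itself did not survive, so the decision rules are assumptions: a frame is cut through when the output port is idle, queued by QoS class when the port is busy, and dropped when the queues are full. The class `OutputPort`, its parameters, and the "lower class number = higher priority" rule are all hypothetical.

```python
from collections import deque


class OutputPort:
    """Toy output port: C = cut-through, Q = queue, D = drop (assumed rules)."""

    def __init__(self, queue_limit=4):
        self.busy = False
        self.queue_limit = queue_limit
        self.queues = {}  # QoS class -> deque of frames

    def handle(self, frame, qos_class=0):
        if not self.busy:
            self.busy = True          # port idle: forward immediately
            return "C"
        queued = sum(len(q) for q in self.queues.values())
        if queued >= self.queue_limit:
            return "D"                # buffers full: drop
        self.queues.setdefault(qos_class, deque()).append(frame)
        return "Q"                    # port busy: store and forward later

    def finish(self):
        """Current frame is done; start the next queued frame, best class first."""
        for qos_class in sorted(self.queues):
            if self.queues[qos_class]:
                return self.queues[qos_class].popleft()  # port stays busy
        self.busy = False
        return None
```

In this reading, a source with exclusive access to the port always takes the C branch, which is the property the circuit proposal relies on.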
Proposal
Proposal : Circuits
Circuits ... are emulations which allow for exclusive access to an L2 domain by individual parties
• circuits-over-packets emulation
• cut-through mode for each circuit is guaranteed
• highest possible throughput
• NOTE: will work with the cheapest switches
• NOTE2: applies to optical networks as well (L2=lightpaths)
Implementation : 2 cases
• left: book-then-send, right: separate control layer
[Figure, left: Storage Nodes A and B, a SWITCH, and a NOC; Step 1: book session, Step 2: transfer bulk]
[Figure, right: Storage Nodes A and B connected through two SWITCHes, one for the booking segment and one for the bulk segment]
Impl.: Centralized Case
[Figure: Storage Nodes A and B, a SWITCH, and a NOC; Step 1: book session, Step 2: transfer bulk]
• same network for booking and circuits
• inefficient but still valid/practical
• legacy-compatible, partial implementation, etc.
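The two steps of the centralized case can be sketched as follows. This is a minimal illustration, not the deployed system: the `Noc` class, its `book`/`release` methods and the `transfer` helper are hypothetical names, and the NOC is reduced to an in-memory lock for a single L2 domain.

```python
class Noc:
    """Hypothetical booking service: one exclusive session per L2 domain."""

    def __init__(self):
        self.holder = None

    def book(self, node):
        if self.holder is None:
            self.holder = node
            return True
        return False

    def release(self, node):
        if self.holder == node:
            self.holder = None


def transfer(noc, node, bulk, send):
    """Step 1: book a session. Step 2: transfer the bulk, then free the circuit."""
    if not noc.book(node):
        return False          # circuit is taken; the caller retries later
    try:
        send(bulk)
    finally:
        noc.release(node)     # always free the circuit, even if sending fails
    return True
```

Because booking travels over the same network as the bulk, a refused booking still costs a round trip, which is one reason this case is described as inefficient but practical.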
Impl.: Distributed Case
[Figure: Storage Nodes A and B connected through two SWITCHes, one for the booking segment and one for the bulk segment]
• book on one network, send on another
• legacy-incompatible
• contention-sensing possible → fully distributed models
• can also use sensing and contention control
Optimization
Optimization : Basics
• same for distributed and centralized models
◦ does not matter, optimization shows the overall utility of a heuristic
• practical optimization = formulation + heuristic
• given: demand matrix
• expected result: a routing table mapping demand to topology
Optim. : OSPF → tuple notation
• OSPF is traditional in such optimizations, but too rigid for many practical cases
◦ too complex for lightpaths in optical networks
◦ no good heuristics for complex topologies
• OSPF notation is not very convenient
1. capacity constraints
2. flow preservation
3. contention/congestion metrics
• alternative: tuples ... for example ⟨s, d, v, t⟩ defines demand of traffic volume v at time t from source s to destination d
◦ this notation is much more flexible for several coming formulations
Optim. : Basic Tuple Notation
• nodes: source s, destination d, and others a, b, c
• individual demand tuple Ti = ⟨s, d, v, t⟩
• lightpath λ for optical networks
• time t, can be start time, start and end of a period, etc.
• we do not care about utility so far, just the notation, but utility is obvious in most cases
• → means results in... or leads to...
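The tuple notation maps directly onto a small data type. The sketch below is only an illustration of the notation: the `Demand` type, the `to_tuples` helper, the node names and the units of v and t are assumptions, not part of the talk.

```python
from typing import NamedTuple


class Demand(NamedTuple):
    s: str      # source node
    d: str      # destination node
    v: float    # traffic volume (units assumed, e.g. GB)
    t: float    # time of the demand (assumed to be a start time)


def to_tuples(matrix, t=0.0):
    """Flatten a demand matrix {(s, d): v} into a list of demand tuples Ti."""
    return [Demand(s, d, v, t) for (s, d), v in sorted(matrix.items()) if v > 0]


demand_matrix = {("A", "B"): 40.0, ("A", "C"): 10.0, ("B", "A"): 0.0}
for demand in to_tuples(demand_matrix):
    print(demand)
```

A flat list of tuples is easier to extend than a matrix: adding a second time field or a wavelength only changes the tuple, not the shape of the data.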
tOSPF : Traditional OSPF
Ti = ⟨s, d, v, t⟩ → ⟨s, a, b, ..., d⟩
Externals: Using the demand matrix, creates a set of per-link weights, which define a unique route for each demand item.
Internals: Per-link capacity constraint, in/out flow conservation constraint, unstable for large topologies and demand matrices
• s source
• d destination
• a, b, c, ... intermediate nodes on e2e paths/routes
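The mapping from a demand tuple to a route ⟨s, a, b, ..., d⟩ can be sketched with a plain shortest-path search. This shows only the last step, turning given per-link weights into a route; choosing the weights under capacity and flow-conservation constraints is the hard optimization problem and is not attempted here. The function name and the example topology are hypothetical.

```python
import heapq


def shortest_route(weights, s, d):
    """weights: {(u, v): w} for directed links. Returns a node list or None."""
    graph = {}
    for (u, v), w in weights.items():
        graph.setdefault(u, []).append((v, w))
    heap = [(0, s, [s])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == d:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None


weights = {("s", "a"): 1, ("a", "d"): 1, ("s", "b"): 1, ("b", "d"): 5, ("s", "d"): 10}
print(shortest_route(weights, "s", "d"))
```

Note that the route is unique only when no two paths have equal total weight; here ties are broken by node name, which is an artifact of this sketch.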
oOSPF : Optical OSPF w/out Switching
Ti = ⟨s, d, v, t⟩ → ⟨s, λ⟩
Externals: Using the demand matrix, maps each demand item onto an isolated lightpath
Internals: Simple but inefficient because the number of e2e lightpaths is small
• s source
• d destination
• λ a wavelength for a fixed e2e lightpath from source s to destination d
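A minimal sketch of this mapping, assuming each source/destination pair has a small fixed pool of e2e wavelengths and each demand item needs one of its own. The function name, the pool layout and the wavelength labels are hypothetical. The example shows why the approach is inefficient: demand quickly outnumbers lightpaths.

```python
def assign_lightpaths(demands, lightpaths):
    """demands: list of (s, d, v, t); lightpaths: {(s, d): [wavelength, ...]}.

    Returns (assigned, unserved), where assigned maps a demand to (s, wavelength).
    """
    free = {pair: list(ws) for pair, ws in lightpaths.items()}
    assigned, unserved = {}, []
    for s, d, v, t in demands:
        pool = free.get((s, d), [])
        if pool:
            assigned[(s, d, v, t)] = (s, pool.pop(0))  # isolated lightpath
        else:
            unserved.append((s, d, v, t))              # no e2e lightpath left
    return assigned, unserved


demands = [("A", "B", 40, 0), ("A", "B", 10, 0), ("A", "C", 5, 0)]
assigned, unserved = assign_lightpaths(demands, {("A", "B"): ["w1"]})
print(assigned, unserved)
```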
oOSPFs : Optical OSPF with Switching
Ti = ⟨s, d, v, t⟩ → ⟨s, λs, λa, λb, ...⟩
Externals: Using the demand matrix, maps each demand item onto a route of wavelengths
Internals: Efficient, but suffers from the same problems as traditional OSPF
• s source
• d destination
• λx an exit wavelength at a given node x
Proposal : Sensing Formulation
Ti = ⟨s, d, v, t1, t2⟩ → ⟨s, λ, t⟩
Externals: Using a matrix of loosely scheduled demand, creates a schedule of sequential sessions with exclusive access to paths
Internals: Same approach for Ethernet (one wavelength) and optical networks
• s source
• d destination
• t1 and t2 are the user-preferred range for the start of a session; a value t is picked between them
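The step from loosely scheduled demand to a schedule of sequential sessions can be sketched with a simple greedy pass. This is one possible heuristic, not the one used in the talk: it assumes a single shared path (the Ethernet case with one wavelength), serves demand in order of t1, and sets aside any demand whose window has already passed. The function name and units are hypothetical.

```python
def schedule_sessions(demands, capacity):
    """demands: list of (s, d, v, t1, t2); capacity: volume per unit of time.

    Returns (schedule, missed). Each schedule entry is (demand, start, end),
    and sessions never overlap, so every session has exclusive access.
    """
    schedule, missed = [], []
    free_at = 0.0
    for demand in sorted(demands, key=lambda x: x[3]):   # earliest t1 first
        s, d, v, t1, t2 = demand
        start = max(t1, free_at)        # pick t inside [t1, t2] if possible
        if start > t2:
            missed.append(demand)       # window passed; handled elsewhere
            continue
        end = start + v / capacity
        schedule.append((demand, start, end))
        free_at = end
    return schedule, missed


demands = [("A", "B", 10, 0, 5), ("C", "D", 10, 0, 20), ("E", "F", 10, 0, 5)]
schedule, missed = schedule_sessions(demands, capacity=1.0)
print(schedule, missed)
```

The wider the [t1, t2] range a user tolerates, the more likely the session fits behind earlier ones, which is what makes loose scheduling useful.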
Heuristics
Centralized Case
[Figure: Storage Nodes A and B, a SWITCH, and a NOC; Step 1: book session, Step 2: transfer bulk]
• all optimization formulations except sensing
• very close to traditional OSPF
• same problems as in OSPF
• the biggest problem is to know the demand matrix in advance
Distributed Case
[Figure: Storage Nodes A and B connected through two SWITCHes, one for the booking segment and one for the bulk segment]
• can be used for all formulations
• perfectly suited for the Sensing formulation
The Sensing Model
• contention methods in wireless and OBS will work
◦ in practice: sensing can be SNMP-like feedback on gate's status
◦ no sync among users is necessary
• same model for Ethernet (+virtual nets) and optical networks
• main advantage: the offload, no need to implement funny OSPF heuristics
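A toy discrete-time simulation of the sensing idea is sketched below. It is an illustration under several assumptions: the gate status is a single shared value that any node can read (standing in for the SNMP-like feedback), time is slotted, and a node that finds the gate busy retries after a random backoff, as in wireless-style contention methods. All names and parameters are hypothetical.

```python
import random


def simulate_sensing(bulk_slots, max_backoff=4, seed=0, max_steps=10_000):
    """bulk_slots: {node: slots needed}. Returns [(node, start_slot), ...]."""
    rng = random.Random(seed)
    wake = {node: 0 for node in bulk_slots}  # next slot in which a node senses
    gate_free_at = 0
    order = []
    for now in range(max_steps):
        if not wake:
            break
        for node in sorted(wake):
            if wake[node] > now:
                continue                      # still backing off
            if now >= gate_free_at:           # sensed: gate free, take it
                gate_free_at = now + bulk_slots[node]
                order.append((node, now))
                del wake[node]
            else:                             # sensed: busy, retry later
                wake[node] = now + 1 + rng.randrange(max_backoff)
    return order


print(simulate_sensing({"A": 3, "B": 2, "C": 1}))
```

No node talks to any other node; each one reacts only to the gate status, which is the sense in which no synchronization among users is needed.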
Realistic Gate/Sensing Model
• an approximate view of JGN topology
• two way = one way + ring
• Gates are created at the optical/Ethernet border
• NOTE: already working for Ethernet
Wrapup
• circuit emulation is necessary for effective bulk transfers
◦ up to 40% faster in our lab tests
• intra-DC, DC-DC, federations, etc. -- all can benefit from circuits
• circuits formulated as OSPF are bad -- a Gate/Sensing model is better
• validity: worst case is the existing technology, but the upper performance bound is very high
That’s all, thank you ...
Wait-n-Send Model
[Figure: response curve(s) of goodput against bulk size per transmission, with 2 potential distributions in practice]
Utility of Waiting (curve)
• I called it the Wait-n-See Curve
• source waits for some time for exclusive access -- sensing and accumulating bulk
• on timeout, the current bulk is released at best effort (fallback)
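The wait-then-fallback rule in the bullets above can be written as a small decision function. This is a sketch under assumptions: waiting starts at time 0, the moment the gate becomes free is known for the purpose of the example, and data that arrives while waiting is added to the bulk. The function name and the mode labels are hypothetical.

```python
def wait_n_send(arrivals, gate_free_at, timeout):
    """arrivals: list of (time, volume) accumulating at the source.

    Returns (mode, send_time, bulk): "exclusive" if the gate opens before the
    timeout, otherwise "best-effort" with whatever has accumulated so far.
    """
    send_time = min(gate_free_at, timeout)
    bulk = sum(volume for time, volume in arrivals if time <= send_time)
    mode = "exclusive" if gate_free_at <= timeout else "best-effort"
    return mode, send_time, bulk


arrivals = [(0, 5), (2, 5), (8, 5)]
print(wait_n_send(arrivals, gate_free_at=3, timeout=10))
print(wait_n_send(arrivals, gate_free_at=20, timeout=5))
```

A longer timeout raises the chance of exclusive access and a larger bulk per transmission, at the cost of delay; the curve on this slide is about that tradeoff.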