is advance knowledge of flow sizes a plausible assumption? · 2019. 12. 18. · fs known in...
Is advance knowledge of flow sizes a plausible assumption?
Vojislav Dukic, Sangeetha Abdu Jyothi, Bojan Karlas,
Muhsen Owaida, Ce Zhang, Ankit Singla
Flow size information can increase efficiency
[Figure: performance with flow size (FS) known in advance vs. FS not known in advance]
…fine-grained circuit scheduling
…packet scheduling within switches
…deadline-aware prioritization
…adaptive routing
…congestion control schemes
Packet scheduling at a switch queue
[Figure: a 100 MB flow and a 100 KB flow sharing a switch queue]
Typically, first-in first-out
But could apply the shortest-job-first idea!
Least remaining bytes first [*pFabric: Minimal Near-Optimal Datacenter Transport; SIGCOMM ’13]
Simple, but problematic: do we know the shortest flow?
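The gap between first-in first-out and shortest-first can be checked with a little arithmetic. The sketch below is only an illustration, not pFabric itself: it assumes both flows are queued at time zero with the 100 MB flow ahead of the 100 KB flow, a hypothetical 10 Gbps link, and run-to-completion service (pFabric prioritizes at packet granularity). The names `completion_times`, `LINK_BPS`, and `flows` are made up for this example.

```python
def completion_times(sizes_bytes, link_bps):
    """Serve flows back to back in the given order.

    Returns the completion time in seconds of each flow, in service order.
    """
    now = 0.0
    finished = []
    for size in sizes_bytes:
        now += size * 8 / link_bps  # transmission time of this flow
        finished.append(now)
    return finished


def mean(values):
    return sum(values) / len(values)


LINK_BPS = 10e9                  # assumed link rate, not from the slides
flows = [100_000_000, 100_000]   # 100 MB queued first, then 100 KB

# FIFO serves in arrival order; shortest-first serves the 100 KB flow first.
mean_fifo = mean(completion_times(flows, LINK_BPS))
mean_shortest_first = mean(completion_times(sorted(flows), LINK_BPS))

print(f"FIFO mean FCT:           {mean_fifo * 1e3:.2f} ms")
print(f"Shortest-first mean FCT: {mean_shortest_first * 1e3:.2f} ms")
```

Under these assumptions the long flow finishes at the same time in both orders, while the short flow no longer waits behind 100 MB of data, so the mean flow completion time roughly halves.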
Assumption: Flow size is known in advance
“In many datacenter applications flow sizes or deadlines are known at initiation time and can be conveyed to the network stack (e.g., through a socket API)...”
pFabric – SIGCOMM ’13: Improves FCT by 4x
“When an application calls send() or sendto() on a socket, the operating system sends this demand in a request message to the Fastpass arbiter, specifying the destination and the number of bytes.”
Fastpass – SIGCOMM ’14: Improves FCT by 15x
“The sender must specify the size of a message when presenting its first byte to the transport... Knowledge of message sizes is particularly valuable because it allows transports to prioritize shorter messages.”
Homa – SIGCOMM ’18: Reduces tail latency by 100x
“In this paper, we question the validity of this assumption, and point out that, for many applications, such information is difficult to obtain, and may even be unavailable.”
PIAS – NSDI ’15
“A number of applications are unable to provide size/deadline information at the start of their flows, e.g. database access and HTTP chunked transfer.”
Karuna – SIGCOMM ’16
“We ignore flow size and duration as they cannot be acquired until a flow finishes.”
CODA – SIGCOMM ’16
Example alternative: Flow aging
Packets sent = packets remaining
[Figure: queued packets of the 100 MB and 100 KB flows, each labeled with the number of packets its flow has sent so far]
Possible alternative: flow aging
Works well for long-tail flow size distributions
Works only when relative flow size is needed
Doesn’t work for equally sized flows
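A toy simulation makes the last two points concrete. This is a minimal sketch under stated assumptions, not the mechanism of any particular system: one packet is served per time slot, the scheduler always picks the active flow that has sent the fewest packets so far (ties go to the lowest flow index), and flow sizes are positive packet counts. The function name `flow_aging_schedule` is hypothetical.

```python
def flow_aging_schedule(sizes):
    """Serve one packet per slot to the active flow with the fewest packets sent.

    sizes: positive packet counts, one per flow.
    Returns the slot in which each flow finishes.
    """
    sent = [0] * len(sizes)
    finish = [None] * len(sizes)
    slot = 0
    while any(f is None for f in finish):
        active = [i for i in range(len(sizes)) if finish[i] is None]
        # Flow aging: the flow that has sent least so far has the highest priority.
        i = min(active, key=lambda j: (sent[j], j))
        sent[i] += 1
        slot += 1
        if sent[i] >= sizes[i]:
            finish[i] = slot
    return finish


# Long-tail case: the short flow finishes early without its size being known.
print(flow_aging_schedule([1000, 10]))  # [1010, 20]

# Equally sized flows: aging degenerates into round-robin, so both finish late.
print(flow_aging_schedule([10, 10]))    # [19, 20]
```

In the second case, serving the flows one after another would finish them at slots 10 and 20, so aging gives a worse mean completion time than a schedule that knows the sizes are equal.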
Is advance knowledge of flow sizes a plausible assumption?
100% knowledge is likely intractable (or at least too expensive)
… but partial knowledge is plausible and useful
Sources of flow size information?
How useful is imprecise / partial knowledge?
Incentives for operators and users?
Flow size estimation: design space
Exact sizes given by application
TCP buffer occupancy
Monitoring system calls
Learning from past traces
Many apps know flow sizes
Doable in private DCs
Some apps don’t
Change a lot of applications
Change network API
TCP buffer occupancy
[Figure: buffer occupancy (0KB to 250KB) over time (1 s to 1.01 s), shown for a 1G network and a 10G network]
Monitoring system calls
while (data in buffer):
    y = write(socket_desc, buffer + x, 100KB)
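The loop above suggests why a single system call is an imperfect size signal: an application that writes in fixed chunks reveals only the chunk size on its first call. The sketch below illustrates this under assumptions of my own. `SyscallMonitor` and `chunked_send` are hypothetical names, the monitor is fed call sizes directly rather than hooking real system calls, and the flow sizes are made up.

```python
class SyscallMonitor:
    """Hypothetical monitor that records the size of each write on a flow."""

    def __init__(self):
        self.first_call = {}  # flow id -> size of the first write
        self.bytes_seen = {}  # flow id -> total bytes written so far

    def on_write(self, flow_id, nbytes):
        self.first_call.setdefault(flow_id, nbytes)
        self.bytes_seen[flow_id] = self.bytes_seen.get(flow_id, 0) + nbytes


def chunked_send(monitor, flow_id, total_bytes, chunk=100_000):
    """Mimic the application loop: write the data in chunks of at most 100 KB."""
    x = 0
    while x < total_bytes:
        y = min(chunk, total_bytes - x)  # bytes handed over by this call
        monitor.on_write(flow_id, y)
        x += y


monitor = SyscallMonitor()
chunked_send(monitor, "A", 1_000_000)  # 1 MB flow, written as ten 100 KB calls
chunked_send(monitor, "B", 50_000)     # 50 KB flow, written in a single call

# First call underestimates flow A by 10x but is exact for flow B.
print(monitor.first_call["A"], monitor.bytes_seen["A"])
print(monitor.first_call["B"], monitor.bytes_seen["B"])
```

In this toy setting the first call is exact only for flows that fit in one write; for chunked flows the running total is a lower bound that becomes exact only when the flow ends.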
Learning from past traces
flow_size = f(disk I/O, memory I/O, past network traffic, computation)
Learning flow size
Workload + Trace + Model = Performance
Trace: Disk I/O, Memory I/O, CPU cycles, Past traffic
Model: GBDT, FFNN, LSTM
Workload Performance*
Web server 0.96
TensorFlow 0.97
PageRank 0.83
KMeans 0.88
SGD 0.79
* Values are in R2: 1 – perfect prediction, 0 – mean value prediction
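For reference, the R2 score used in the table can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses the standard definition; the flow sizes in it are made-up numbers for illustration, not data from the talk.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical flow sizes in KB (illustrative only).
actual = [10, 20, 30, 40]

perfect = r2_score(actual, actual)            # predicting every size exactly
mean_only = r2_score(actual, [25] * 4)        # always predicting the mean size
imperfect = r2_score(actual, [12, 18, 33, 37])

print(perfect, mean_only, imperfect)
```

A perfect predictor scores 1 and a predictor that always outputs the mean flow size scores 0, which matches the two reference points given on the slide.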
How useful is imprecise / partial knowledge?
Flow scheduling using imprecise knowledge
[Figure: mean FCT (ms), 0 to 0.45, for pFabric, pHost [1], and Fastpass [2] on the TensorFlow workload; series: Perfect, FIFO, Prediction, Aging; annotation: least remaining bytes first]
[1. pHost: Distributed Near-Optimal Datacenter Transport Over Commodity Network Fabric; CoNEXT ’15]
[2. Fastpass: A Centralized “Zero-Queue” Datacenter Network; SIGCOMM ’14]
Coflow scheduling using imprecise knowledge (Sincronia*)
[Figure: slowdown (relative performance degradation, 1 to 2.6) for PageRank, KMeans, SGD, TensorFlow, and Web Server]
Workload R2
Web server 0.96
TensorFlow 0.97
PageRank 0.83
KMeans 0.88
SGD 0.79
[*Sincronia: Near-Optimal Network Design for Coflows; SIGCOMM ’18]
Fast enough, deployable learning?
[Figure: CDF (0 to 1) of latency in µs (axis ticks at 1, 10, 100) for CPU and FPGA]
Flow size estimation: design space
Works only for repetitive workloads
Must identify the application
Substantial implementation effort
Every technique provides some knowledge
[Figure: flow size knowledge vs. estimation effort for four techniques: exact sizes given by application, TCP buffer occupancy, monitoring system calls, learning from past traces]
Incentives for operators and users?
What we want: More knowledge ⇒ better performance
[Figure: performance vs. knowledge (= effort), from 0% to 100%]
[Figure: flows scheduled over time, with flow sizes unknown vs. known]
Q1: Can this flow’s performance deteriorate?
Q2: Can the whole system’s performance deteriorate?
Old scheduling problem, but novel, interesting questions raised by our unique angle!
Q1: Can this flow’s performance deteriorate?
Least remaining bytes first + flow aging
[Figure: schedule of unknown and known flows; packet labels 0, 1, 0, 1, 2, 2, 3]
A flow’s performance cannot deteriorate!
For coflow scheduling with mean flow size used for unknowns, performance can in fact deteriorate
Q2: Can the whole system’s performance deteriorate?
Average co/flow completion time system-wide can deteriorate!
[Figure: slowdown (0 to 3.5) vs. percentage of known flows (0% to 100%), with perfect-knowledge performance as the reference]
Open question: positive results with some restrictions?
Sources of flow size information? Partial knowledge available from many sources
How useful is imprecise / partial knowledge? Can still provide performance improvements
Incentives for operators and users? Need better guarantees on value of investment
Is advance knowledge of flow sizes a plausible assumption?
Performance improvement
[Figure: performance (0 to 1) vs. size of known flows (0, 1KB, 10KB, 100KB, 1MB, 10MB, All) for SGD, PageRank, and TensorFlow]
Features

| Feature | Description |
| --- | --- |
| Start time, t_f | Start time of f relative to job start time |
| Flow gap | Time since the end of the previous flow |
| First call | Size of the first system call t_f |
| Network in | Data received until t_f |
| Network out | Data sent until t_f |
| Network in(d) | Data received from flow’s dest. d until t_f |
| Network out(d) | Data sent by this host to d until t_f |
| CPU | CPU cycles used until t_f |
| Disk I/O | Total disk I/O until t_f |
| Memory I/O | Total memory I/O until t_f |
| Previous flows | Flow sizes for last k flows |