A Comparison of Performance and Accuracy of Measurement Algorithms in Software
Omid Alipourfard, Masoud Moshref¹, Yang Zhou², Tong Yang², Minlan Yu³
Yale University, Barefoot Networks¹, Peking University², Harvard University³
Network function virtualization is trending
Data centers: Use cloud to manage cloud
Edge networks: Mini-clouds at the edge
Firewall, Load balancer, CDN, NAT, WAN opt.
Virtualization, dynamic scale-out, fast iterations ...
Firewall, Load balancer, CDN, WAN opt., NAT
Measurement
Control loop
Measurement algorithms come with many implementations
Measurement task: Tree/Heap, Sketch, or Hash table implementations
Heavy hitter: ANCS ’11, ICDT ’05; NSDI ’13, SIGCOMM ’17; SIGCOMM ’02
Super spreader: SIGCOMM ’17, PODS ’05; IMC ’10, NDSS ’05
Flow size distrib.: SIGMETRICS ’04; IMC ’10
Change detection: CoNEXT ’13; TON ’07; IMC ’10
Entropy estimation: COLT ’11; SIGMETRICS ’06
Quantiles: SIGMOD ’01, ’99, ’13; Hot ICE ’11
Which algorithm works best for NFs running on software ...
Design concerns for software switches
| Domain | Hardware switches | Software switches |
| --- | --- | --- |
| Constraint | Limited memory size | Limited cache size |
| Objective | Fit in memory | Maximize throughput |
| Opportunity | Deterministic throughput | Large memory (hierarchical) |
Closer look at heavy hitter detection
Find the most popular items (flows) in a packet stream.
Hash table based:
- Update the entry (e) in the hash table.
- Report if e > threshold.
Count sketch:
- Hash the header n times and update relevant entries (es).
- Report if min(es) > threshold.
Heap based:
- Keep a heap of counters.
- Replace the smallest counter if no space available.
- Report if entries > threshold.
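The three approaches above can be sketched in Python as follows. This is a minimal illustration, not the DPDK code from the talk: the function and class names, the CRC32-based hashing, and the Space-Saving-style rule of letting a new flow inherit the evicted counter are all assumptions. The sketch variant follows the slide's min(es) rule, which is how a count-min sketch answers queries.

```python
import heapq
import zlib
from collections import Counter


def hash_table_heavy_hitters(packets, threshold):
    """Exact per-flow counters in a hash table (a Python dict here)."""
    counts = Counter(packets)
    return {flow for flow, count in counts.items() if count > threshold}


class Sketch:
    """n rows of counters; each row is indexed by its own hash of the flow."""

    def __init__(self, n_hashes, width):
        self.width = width
        self.rows = [[0] * width for _ in range(n_hashes)]

    def _index(self, row, flow):
        # Hypothetical hash family: CRC32 over "row:flow".
        return zlib.crc32(f"{row}:{flow}".encode()) % self.width

    def update(self, flow):
        for r, counters in enumerate(self.rows):
            counters[self._index(r, flow)] += 1

    def estimate(self, flow):
        # min(es): collisions only inflate counters, so the minimum is the
        # tightest available estimate and never undercounts.
        return min(c[self._index(r, flow)] for r, c in enumerate(self.rows))


def sketch_heavy_hitters(packets, threshold, n_hashes=3, width=1024):
    sketch, reported = Sketch(n_hashes, width), set()
    for flow in packets:
        sketch.update(flow)
        if sketch.estimate(flow) > threshold:
            reported.add(flow)
    return reported


def heap_heavy_hitters(packets, threshold, capacity):
    """Keep at most `capacity` counters in a min-heap of [count, flow]."""
    heap, entries = [], {}  # entries maps flow -> its [count, flow] item
    for flow in packets:
        if flow in entries:
            entries[flow][0] += 1
            heapq.heapify(heap)  # simple but O(n); real code would sift one item
        elif len(heap) < capacity:
            entries[flow] = [1, flow]
            heapq.heappush(heap, entries[flow])
        else:
            smallest = heap[0]
            del entries[smallest[1]]
            smallest[0] += 1     # assumption: new flow inherits the evicted count
            smallest[1] = flow
            entries[flow] = smallest
            heapq.heapify(heap)
    return {flow for count, flow in heap if count > threshold}
```

With a stream such as `["a"] * 100 + ["b"] * 3 + ["c"] * 2 + ["d"]` and a threshold of 50, all three functions report flow `a`. The sketch and heap variants may additionally report false positives when counters collide or are inherited.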
What hash table works best?
Cuckoo vs. linear hash table
Two popular hash tables: Cuckoo hash table and Linear hash table.
[Figure: a linear hash table indexed by a single hash H(pkt), next to a cuckoo hash table indexed by two hashes H1(pkt) and H2(pkt).]
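The two tables can be sketched as per-flow counters in Python. This is only an illustration under assumptions: the class names, the CRC32-based hash helper `_slot`, and the `max_kicks` limit are hypothetical and not taken from the talk's code. The linear table computes one hash and scans neighbouring slots; the cuckoo table computes two hashes and may touch two unrelated locations.

```python
import zlib


def _slot(seed, key, size):
    # Hypothetical hash helper: CRC32 over "seed:key".
    return zlib.crc32(f"{seed}:{key}".encode()) % size


class LinearTable:
    """Open addressing with linear probing: one hash, then adjacent slots."""

    def __init__(self, size):
        self.slots = [None] * size

    def add(self, key):
        n = len(self.slots)
        start = _slot(0, key, n)
        for step in range(n):
            j = (start + step) % n
            if self.slots[j] is None:
                self.slots[j] = [key, 1]
                return
            if self.slots[j][0] == key:
                self.slots[j][1] += 1
                return
        raise RuntimeError("table full")

    def get(self, key):
        n = len(self.slots)
        start = _slot(0, key, n)
        for step in range(n):
            slot = self.slots[(start + step) % n]
            if slot is None:
                return 0
            if slot[0] == key:
                return slot[1]
        return 0


class CuckooTable:
    """Each key lives in one of two slots chosen by two different hashes."""

    def __init__(self, size, max_kicks=32):
        self.slots = [None] * size
        self.max_kicks = max_kicks

    def _positions(self, key):
        n = len(self.slots)
        return _slot(1, key, n), _slot(2, key, n)

    def add(self, key):
        first, second = self._positions(key)
        for p in (first, second):
            if self.slots[p] is not None and self.slots[p][0] == key:
                self.slots[p][1] += 1
                return
        entry, pos = [key, 1], first
        for _ in range(self.max_kicks):
            if self.slots[pos] is None:
                self.slots[pos] = entry
                return
            # Kick out the resident entry and move it to its other slot.
            entry, self.slots[pos] = self.slots[pos], entry
            a, b = self._positions(entry[0])
            pos = b if pos == a else a
        raise RuntimeError("too many displacements; the table needs resizing")

    def get(self, key):
        for p in self._positions(key):
            if self.slots[p] is not None and self.slots[p][0] == key:
                return self.slots[p][1]
        return 0
```

Both classes expose the same `add`/`get` interface, so either can back the hash-table-based heavy hitter detector.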
Evaluation settings
Settings
● DPDK Framework
● Intel Xeon-E5 2650 v3, 10G NIC
● CAIDA (1.4 mil flows, 40 mil pkts, 64B pkts)
● Zero packet loss test - RFC 2544
● Reporting interval 100ms ~ control loop frequency
Metrics
● Performance: average packet processing time
● We also measure precision/recall in the paper
Linear hashing outperforms Cuckoo hashing
● Performance: Linear table is 10~30% faster than Cuckoo table.
Why?
● Computation: Two hashes (Cuckoo) vs one hash (Linear).
● Random access: Two for Cuckoo vs. one for Linear.
Different from the database world - Memory is not an issue!
● Make the table large so collisions are rare!
Takeaways
- Use the least # of computations and random memory accesses.
- If you can, use large memory to reduce your computations.
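The second takeaway can be illustrated with a small simulation of linear probing: for a fixed number of flows, a larger table means fewer probes per insertion. This is a sketch under assumptions, not a reproduction of the talk's measurements: the function name is hypothetical and a seeded random number stands in for H(pkt).

```python
import random


def average_probes(num_flows, table_size, seed=0):
    """Average slots inspected per insertion of a new flow (linear probing)."""
    rng = random.Random(seed)
    slots = [None] * table_size
    total_probes = 0
    for flow in range(num_flows):
        i = rng.randrange(table_size)  # stand-in for H(pkt)
        probes = 1
        while slots[i] is not None:    # collision: try the next slot
            i = (i + 1) % table_size
            probes += 1
        slots[i] = flow
        total_probes += probes
    return total_probes / num_flows


dense = average_probes(900, 1_000)    # table ends up 90% full
sparse = average_probes(900, 16_000)  # table ends up about 6% full
print(f"90% full: {dense:.2f} probes, 6% full: {sparse:.2f} probes")
```

In the sparse case almost every insertion finishes on the first probe, so the cost approaches one hash plus one memory access.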
Comparison of algorithm classes
Hash table based: Linear hash table
Count sketch: Count sketch with one hash (Count-array)
Heap based: Heap + Linear hash table
Simplest data structure works best
Results
● Count array is the fastest.
● Hash table performance converges to count-array with larger tables.
● Heap based algorithms are slow because of random memory access.
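A count-array is the sketch with a single hash: one counter array, one hash, and one memory access per packet. The version below is a minimal sketch under assumptions: the function name and the CRC32 hash are hypothetical, and reporting the flow of the packet that pushes a counter over the threshold is one possible reporting rule, since the array itself stores no keys.

```python
import zlib


def count_array_heavy_hitters(packets, threshold, size):
    counters = [0] * size
    reported = set()
    for flow in packets:
        i = zlib.crc32(str(flow).encode()) % size  # the single hash
        counters[i] += 1                           # the single memory access
        if counters[i] > threshold:
            # Flows that collide with a heavy flow are reported as well,
            # so a larger array gives fewer false positives.
            reported.add(flow)
    return reported
```

Because no keys are stored or compared, the per-packet work is the smallest of the three classes, at the cost of false positives when flows share a counter.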
How general are the results?
● Other measurement tasks
● Other traffic skews
● Amount of data kept per packet/flow
● Shared vs. separate data structure
Results hold for other measurement tasks
Does the CPU behave differently when dealing with other measurement task types?
Change detection (computationally heavy):
- Model flow’s traffic
- Report flows outside model’s predictions
Superspreader detection (memory heavy):
- Update a bloom filter per packet
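Both tasks can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the names, the CRC32-based Bloom filter, and the choice of an exponentially weighted moving average (EWMA) as the traffic model, since the slide does not say which model is used.

```python
import zlib
from collections import defaultdict


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity
        self.num_hashes = num_hashes

    def add_if_new(self, item):
        """Insert item; return True if it was (probably) not seen before."""
        positions = [zlib.crc32(f"{i}:{item}".encode()) % len(self.bits)
                     for i in range(self.num_hashes)]
        seen = all(self.bits[p] for p in positions)
        for p in positions:
            self.bits[p] = 1
        return not seen


def superspreaders(packets, threshold, num_bits=1 << 16, num_hashes=3):
    """Sources that contact more than `threshold` distinct destinations."""
    seen_pairs = BloomFilter(num_bits, num_hashes)
    fanout = defaultdict(int)
    for src, dst in packets:
        if seen_pairs.add_if_new((src, dst)):  # Bloom filter update per packet
            fanout[src] += 1
    return {src for src, n in fanout.items() if n > threshold}


def changed_flows(epoch_counts, alpha=0.5, tolerance=2.0):
    """Per epoch, report flows whose count is far from the EWMA forecast."""
    forecast, reports = {}, []
    for counts in epoch_counts:
        changed = set()
        for flow, observed in counts.items():
            if flow in forecast:
                limit = tolerance * max(forecast[flow], 1)
                if abs(observed - forecast[flow]) > limit:
                    changed.add(flow)
            previous = forecast.get(flow, observed)
            forecast[flow] = alpha * observed + (1 - alpha) * previous
        reports.append(changed)
    return reports
```

The superspreader path is dominated by the scattered memory writes of the Bloom filter, while the change detector spends its time on arithmetic per flow, which matches the "memory heavy" and "computationally heavy" labels above.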
Superspreaders: Count-array is the fastest
[Figure: plot with an axis in MB, annotated "96% Precision"]
The trend is similar for change detection: Count-array is the fastest, with Linear hash table a close second.
Impact of traffic skew on latency
Concerns
- Working set gets larger with lower skew.
- More items read in cache per packet batch.
Observations
- Perf. degradation depends on the # of memory accesses per pkt.
- Count-array and linear hash table still the fastest.
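The first concern can be made concrete by counting how many distinct flows appear in one packet batch under different skews. This is a toy model under assumptions: flow popularity is taken to follow a Zipf-like law with weight 1/rank^skew, and the function name, flow count, and batch size are hypothetical.

```python
import random


def distinct_flows_per_batch(skew, num_flows=1000, batch_size=256, seed=0):
    """Distinct flows (entries touched) in one batch of packets."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, num_flows + 1)]
    batch = rng.choices(range(num_flows), weights=weights, k=batch_size)
    return len(set(batch))


for skew in (0.5, 1.0, 1.5):
    print(skew, distinct_flows_per_batch(skew))
```

Lower skew spreads a batch over more flows, so more table entries have to be brought into the cache for the same number of packets.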
Impact of bytes kept per flow on latency
Concerns
- Fewer items fit in the cache.
- Traverse multiple cache lines on a miss.
Observations
- 1.9x higher latency: 4 bytes (~70ns) to 60 bytes (~130ns)
- Solution: Separate keys and values in the hash table.
  - 1.16x higher latency: 4 bytes (~90ns) to 60 bytes (~105ns)
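A back-of-the-envelope model shows why separating keys from values helps once values grow. It is only a sketch: the 64-byte cache line, the 16-byte key, the three probes per update, and the aligned start are assumptions, and the model counts cache lines touched rather than predicting the latencies on the slide.

```python
import math

CACHE_LINE = 64  # bytes; assumed cache line size


def cache_lines_per_update(probes, key_bytes, value_bytes, separate_values):
    """Cache lines touched while probing `probes` adjacent slots."""
    if separate_values:
        # Probing scans a dense key array, then one value is fetched.
        key_lines = math.ceil(probes * key_bytes / CACHE_LINE)
        value_lines = math.ceil(value_bytes / CACHE_LINE)
        return key_lines + value_lines
    # Keys and values interleaved: probing drags the values along.
    return math.ceil(probes * (key_bytes + value_bytes) / CACHE_LINE)


print(cache_lines_per_update(3, 16, 4, separate_values=False))   # 1
print(cache_lines_per_update(3, 16, 60, separate_values=False))  # 4
print(cache_lines_per_update(3, 16, 4, separate_values=True))    # 2
print(cache_lines_per_update(3, 16, 60, separate_values=True))   # 2
```

Under these assumptions the interleaved layout is cheaper for 4-byte values but touches more lines as values grow, while the separated layout stays flat. That is the same direction as the observations above.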
Impact of shared/separate data-structure
Shared: Easy to report measured results.
- More cache bouncing between cores.
Separate: Merging to report is difficult.
- No cache bouncing between cores.
[Figure: Core1, Core2, and Core3 accessing one shared structure in memory (Mem), next to Core1, Core2, and Core3 each with its own data (Data).]
Observations
Sharing is expensive.
- Cache bouncing causes L3 latency for most memory accesses.
- Does not scale to many cores.
Merging is cheap.
- Very low memory bandwidth (even at 10ms reporting intervals).
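For count-arrays, the separate-then-merge approach can be sketched as follows: each core updates its own array, and the arrays are summed element-wise at report time. The names, the CRC32 hash, and the sizes in the bandwidth estimate are assumptions made for illustration and are not taken from the talk.

```python
import zlib


def update(counters, flow):
    """Per-packet update on a core-private count-array."""
    counters[zlib.crc32(str(flow).encode()) % len(counters)] += 1


def merge(per_core_arrays):
    """Element-wise sum; valid because every core uses the same hash."""
    return [sum(column) for column in zip(*per_core_arrays)]


def merge_bandwidth(array_bytes, cores, interval_s):
    """Bytes per second read when every array is merged once per interval."""
    return array_bytes * cores / interval_s


# Hypothetical setup: three cores, packets spread round-robin.
packets = ["a", "b", "a", "c", "a", "b"]
arrays = [[0] * 8 for _ in range(3)]
for i, flow in enumerate(packets):
    update(arrays[i % 3], flow)
merged = merge(arrays)

# Assumed sizes: 1 MiB per array, 3 cores, 10 ms reporting interval.
print(merge_bandwidth(1 << 20, 3, 0.01) / 1e9, "GB/s")
```

With these assumed sizes the merge reads roughly 0.3 GB/s, which is small next to the memory bandwidth of a typical server and is consistent with the "merging is cheap" observation.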
Conclusions
Measurement in software servers is different than hardware:
- Use more memory to do less computation.
- Reduce data pulled into the cache per packet.
Calls for new:
- Algorithms, e.g., “sketch” over computation not memory.
- Data structures, e.g., seq. access pattern to match the CPU arch.
Thanks!
The code and benchmarks are available at:
https://github.com/SiGe/measure-pkt
Simplest data structure works best
Least amount of computation wins.
Change-detection: Count-array is the fastest
[Figure: plot annotated "60% Precision" and "~100% Precision"]
Large # of heapify ops.
Deep heaps
Impact of traffic skew on latency
Prefetching can mask the memory access latency.
More uniform packet count makes it more likely that heapify traverses multiple levels.
Bytes fetched impacts the performance
Mask the latency by keeping the values away
Impact of other apps. on measurement
- Cache exhaustion: working set not fitting in memory.
- Memory BW exhaustion: higher latency to fetch data.