FINDING THE TRIGGERS FOR
DYNAMIC SERVICE SLICING Rebecca Steinert, PhD
Head of the Network Intelligence group
Decisions, Networks and Analytics Lab
RISE SICS AB
ADVANCING THE SOTA OF RAN
CONTROL FRAMEWORKS
o So far: Main focus on sophisticated mechanisms.
o Next step: Robust and proactive
scaling of slices based on triggers.
o Triggers are:
o Scalable.
o Universal – applicable in
HetNets.
o Capable of quantifying
uncertainty.
INFORMATION-DRIVEN TRIGGERS –
THE ENABLER OF DYNAMIC SLICING.
o Proper triggers require sophisticated observation
points/models that are:
o Computationally lightweight.
o Provide information-effective representations of the
network state and its variability.
o Decentralized / distributed for serving short-term /
long-term control loops.
o Offloading observability processes to the networks reduces the monitoring overhead significantly *.
* John, Wolfgang, Catalin Meirosu, Bertrand Pechenot, Pontus Sköldström, Per Kreuger, and Rebecca Steinert. "Scalable software defined
monitoring for service provider devops." In Software Defined Networks (EWSDN), 2015 Fourth European Workshop on, pp. 61-66. IEEE, 2015.
Local data sources
(counters, estimates)
Composite models (aggregates,
joint probabilistic models)
Logically centralized control
Programmable infrastructure w. local observability points/triggers
Management framework w. triggers
Aggregated high-level composite models
Control policies
Estimates
DISTRIBUTED OBSERVABILITY. Kreuger, Per, and Rebecca Steinert. "Scalable in-network rate monitoring." In Integrated Network
Management (IM), 2015 IFIP/IEEE International Symposium on, pp. 866-869. IEEE, 2015
DB Raw meas.
DB Est. / aggr.
DB
EXAMPLE – THE HIDDEN ISSUE OF
5 MIN SNMP AVERAGES.
Measurements
over a 1 Gb link.
All good…
Kreuger, Per, and Rebecca Steinert. "Scalable in-network rate monitoring." In Integrated Network Management (IM), 2015 IFIP/IEEE
International Symposium on, pp. 866-869. IEEE, 2015.
SURPRISE - PERSISTENT RISK OF
LINK OVERLOAD.
Fig. 5: Time series of 5 m averages over one day from 15-24 on the heavily loaded 1 G link (top), and estimates of the risk of exceeding link capacity over consecutive periods of 5 m, and 0.3 s respectively.
TABLE II: Detection rates for naive congestion detector.
threshold    5 m                                             0.3 s
(t = 0.01)   true positive  false positive  true negative    hit    miss
t/15         72             23              9                1184   17
t/10         69             11              24               1180   21
t/5          49             6               49               1088   113
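The hit rates implied by the "hit" and "miss" columns of Table II can be checked with a few lines of arithmetic. The sketch below only restates the numbers from the table; the names `TABLE_II` and `hit_rate` are illustrative and do not come from the paper.

```python
# (threshold label, hits, misses) at the 0.3 s level, copied from Table II.
TABLE_II = [
    ("t/15", 1184, 17),
    ("t/10", 1180, 21),
    ("t/5", 1088, 113),
]


def hit_rate(hits, misses):
    """Fraction of the 'true' 0.3 s congestions that the detector captured."""
    return hits / (hits + misses)


if __name__ == "__main__":
    for label, hits, misses in TABLE_II:
        total = hits + misses
        print(f"{label}: {hits}/{total} = {hit_rate(hits, misses):.1%}")
```

Each row sums to the same 1201 "true" congestions, so the rows are directly comparable.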
correctly predicted absence of congestion at 0.3 s during the following 5 m period. The “hit” and “miss” columns indicate how many of the “true” 0.3 s congestions the detector captured.
We can see that even this relatively naive mechanism has a fairly impressive hit rate. For example, with a threshold of one tenth of the “true” 0.3 s threshold at the 5 m level, the detector captures 1180 out of 1201 “true” 0.3 s congestions, or 98.3%. The main problem with this simple mechanism is the amount of high rate monitoring required to find the true positives. Still, this simple assessment clearly indicates the potential for using the estimates derived at lower time resolutions as predictors for higher rate events.
IV. CONCLUSION AND PERSPECTIVE
We have proposed a generic local and scalable approach to traffic rate monitoring based on high rate updates of two counters in the data plane for recording the first and second statistical moments of each observed rate. The moments are used to estimate the parameters (using a MoM estimator) of a lognormal distribution at predefined and/or variable intervals. Different aspects of the method have been evaluated using real-world data sets. We have verified that the data can be fitted using a lognormal distribution, compared the estimation accuracy relative to observations at different time scales and tested a naive probabilistic method for detecting increased risk of congestion on a link using probabilistic thresholds on properties of the estimated distributions.
Analysis of the percentiles of estimates obtained at low rates shows clear potential for methods for autonomously and robustly detecting high risk of congestion. Future work includes development of a more robust detector based on adaptive probabilistic thresholds, and extension of the study to less aggregated flows, and shorter time scales.
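The two-counter idea summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the counters hold the running sum of observed rates and of squared rates, uses the standard method-of-moments formulas for a lognormal distribution, and takes the overload risk to be the lognormal tail probability above link capacity. The names `RateMoments`, `lognormal_params` and `overload_risk` are hypothetical, and the traffic is synthetic.

```python
import math
import random


class RateMoments:
    """Two running counters: the sum of rates and the sum of squared rates."""

    def __init__(self):
        self.n = 0
        self.s1 = 0.0  # sum of x (first moment counter)
        self.s2 = 0.0  # sum of x^2 (second moment counter)

    def update(self, rate):
        self.n += 1
        self.s1 += rate
        self.s2 += rate * rate

    def lognormal_params(self):
        """Method-of-moments estimate of (mu, sigma) for a lognormal."""
        m1 = self.s1 / self.n  # estimate of E[X]
        m2 = self.s2 / self.n  # estimate of E[X^2]
        # For a lognormal, E[X^2] / E[X]^2 = exp(sigma^2).
        sigma2 = max(math.log(m2 / (m1 * m1)), 0.0)
        mu = math.log(m1) - 0.5 * sigma2
        return mu, math.sqrt(sigma2)


def overload_risk(mu, sigma, capacity):
    """P(rate > capacity) under a lognormal(mu, sigma) model."""
    if sigma <= 0.0:
        # Degenerate case: all probability mass sits at exp(mu).
        return 1.0 if math.exp(mu) > capacity else 0.0
    z = (math.log(capacity) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    rng = random.Random(0)
    counters = RateMoments()
    # Synthetic rates in Mbit/s; parameters chosen only for illustration.
    for _ in range(100_000):
        counters.update(rng.lognormvariate(6.0, 0.5))
    mu_hat, sigma_hat = counters.lognormal_params()
    print("mu, sigma:", mu_hat, sigma_hat)
    print("risk of exceeding 1000 Mbit/s:",
          overload_risk(mu_hat, sigma_hat, 1000.0))
```

Only the two sums and a sample count need to be kept per observation point, which is what makes the approach cheap enough to run locally at high update rates.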
ACKNOWLEDGEMENT
This work was supported in part by the FP7 UNIFY EU project. The authors would like to thank the staff at TeliaSonera for providing access to the traffic rate measurements: Per Tholin, Johanna Nieminen and Patrik Lindwall.
Risk of overload
Measurements
over a 1 Gb link.
All good… NOT!
Kreuger, Per, and Rebecca Steinert. "Scalable in-network rate monitoring." In Integrated Network Management (IM), 2015 IFIP/IEEE
International Symposium on, pp. 866-869. IEEE, 2015.
For the sake of scalability and observability:
THOU SHALT NOT MEASURE CENTRALLY.
THOU SHALT MEASURE LOCALLY.
TRIGGER: RAW SIGNAL STRENGTH
VS ESTIMATED ATTAINABLE THROUGHPUT. Rao, Akhila, and Rebecca Steinert. "Probabilistic multi-RAT performance abstractions." (2018).
LTE
WiFi
Signal strength Throughput Throughput distribution
o Probability of fulfilling an SLO related to throughput
requirements.
o RAT-agnostic representation by conditional probabilities
for decision making and less monitoring overhead.
o Significantly reduced control signalling (due to
handovers) and tamed control over the tolerated
amount of performance violations.
OBSERVABILITY FOR RESOURCE COORDINATION.
Serving cell A (LTE)
Serving cell B (WiFi)
A client with SLO: minimum 12 Mbit/s throughput for 90% of the session duration.
Controller
?
Local
estimate
of attainable
throughput
Serving cell   Attainable throughput estimate   Serves client
A              P(est. TP >= 12) = 73%
B              P(est. TP >= 12) = 95%           X
Policy
enforcement:
Processing and
decision making
Rao, Akhila, and Rebecca Steinert. "Probabilistic multi-RAT performance abstractions." (2018).
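The controller decision in this example can be sketched as a comparison of per-cell SLO probabilities against the client's target. This is an illustration under stated assumptions: the slides derive the probabilities from a probabilistic throughput model, whereas `slo_probability` below uses a plain empirical fraction of local samples as a stand-in, and both function names are hypothetical.

```python
def slo_probability(throughput_samples, min_mbps):
    """Empirical P(throughput >= min_mbps) from locally observed samples."""
    if not throughput_samples:
        raise ValueError("need at least one throughput sample")
    met = sum(1 for tp in throughput_samples if tp >= min_mbps)
    return met / len(throughput_samples)


def select_cell(estimates, target):
    """Return the cell with the highest SLO probability that meets the
    target, or None when no cell qualifies."""
    eligible = {cell: p for cell, p in estimates.items() if p >= target}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


if __name__ == "__main__":
    # Probabilities from the slide: P(est. TP >= 12) per serving cell.
    estimates = {"A": 0.73, "B": 0.95}
    # SLO: at least 12 Mbit/s for 90% of the session duration.
    print(select_cell(estimates, target=0.90))
```

Because each cell reports a single probability rather than raw signal or throughput traces, the controller can compare LTE and WiFi cells on the same scale.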
CHALLENGES AND OPPORTUNITIES FOR RAN SLICING.
Unified representation of heterogeneous RATs
User/service-specific
triggers for SLA/SLO
Proactive adaptive
control loops
Grand challenges
Improved
resource utilization.
Improved
service reliability and
quality;
More clients and
subscribers and profit.
Opportunities
ML & AI for networks Network intelligence
Develop learning distributed systems
Gains
FINAL THOUGHTS.
o Programmability and virtualization enable flexible observability.
o Advancements in ML at scale and in computational capabilities pave
the way for novel observability processes at different scales.
o Increased synergies between the telecom industry and open source
communities are likely to accelerate development towards highly granular dynamic RAN slicing.
THANKS FOR LISTENING.