observability for service slicing - eucnc...monitoring for service provider devops." in...

FINDING THE TRIGGERS FOR DYNAMIC SERVICE SLICING

Rebecca Steinert, PhD

Head of the Network Intelligence group

Decisions, Networks and Analytics Lab

RISE SICS AB

ADVANCING THE SOTA OF RAN CONTROL FRAMEWORKS

o So far: Main focus on sophisticated mechanisms.

o Next step: Robust and proactive scaling of slices based on triggers.

o Triggers are:

   o Scalable.

   o Universal – applicable in HetNets.

   o Capable of quantifying uncertainty.

INFORMATION-DRIVEN TRIGGERS – THE ENABLER OF DYNAMIC SLICING.

o Proper triggers require sophisticated observation points/models that:

   o Are computationally lightweight.

   o Provide information-efficient representations of the network state and its variability.

   o Are decentralized/distributed to serve both short-term and long-term control loops.

o Offloading observability processes into the network significantly reduces the monitoring overhead*.

* John, Wolfgang, Catalin Meirosu, Bertrand Pechenot, Pontus Sköldström, Per Kreuger, and Rebecca Steinert. "Scalable software defined monitoring for service provider devops." In Software Defined Networks (EWSDN), 2015 Fourth European Workshop on, pp. 61-66. IEEE, 2015.

DISTRIBUTED OBSERVABILITY.

[Figure: three layers. Programmable infrastructure with local observability points/triggers (local data sources: counters, estimates); a management framework with triggers (composite models: aggregates, joint probabilistic models); and logically centralized control (aggregated high-level composite models). Arrows between the layers are labelled "Control policies" and "Estimates"; databases hold raw measurements and estimates/aggregates.]

Kreuger, Per, and Rebecca Steinert. "Scalable in-network rate monitoring." In Integrated Network Management (IM), 2015 IFIP/IEEE International Symposium on, pp. 866-869. IEEE, 2015.
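A minimal sketch of a local observability point of the kind shown in the figure, assuming (hypothetically) that each data-plane update delivers one rate sample and that only the sample count, mean and variance need to leave the device per reporting period. `LocalRateCounter` and its methods are illustrative names, not an API from the cited work.

```python
class LocalRateCounter:
    """Hypothetical local observability point: keeps two running sums
    (samples and squared samples) instead of storing raw samples."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.s1 = 0.0  # running sum of rate samples (first moment)
        self.s2 = 0.0  # running sum of squared samples (second moment)

    def update(self, rate: float) -> None:
        """Called on every high-rate update in the data plane."""
        self.n += 1
        self.s1 += rate
        self.s2 += rate * rate

    def export(self) -> tuple[int, float, float]:
        """Return (n, mean, variance) for the period and start a new one."""
        if self.n == 0:
            raise ValueError("no samples in this period")
        mean = self.s1 / self.n
        var = max(self.s2 / self.n - mean * mean, 0.0)  # population variance
        summary = (self.n, mean, var)
        self.reset()
        return summary
```

Exporting three numbers per period instead of every raw sample is what keeps upstream monitoring overhead low. The sum-of-squares form mirrors a two-counter design but can lose precision for large, tightly clustered rates; Welford's online update is a numerically safer alternative.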

EXAMPLE – THE HIDDEN ISSUE OF 5 MIN SNMP AVERAGES.

Measurements over a 1 Gb link. All good…

Kreuger, Per, and Rebecca Steinert. "Scalable in-network rate monitoring." In Integrated Network Management (IM), 2015 IFIP/IEEE International Symposium on, pp. 866-869. IEEE, 2015.
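To illustrate how a 5 min average can hide overload, the sketch below uses a deterministic synthetic trace (hypothetical numbers, not the measured data): 1000 offered-load samples of 0.3 s each, mostly at 550 Mbit/s with a short 1100 Mbit/s burst every 6 s. The constants and helper names are assumptions for illustration only.

```python
CAPACITY_MBPS = 1000.0  # 1 Gb link
SAMPLE_S = 0.3          # fine-grained sampling interval
PERIOD_S = 300.0        # 5 min SNMP-style averaging window


def synthetic_trace() -> list[float]:
    """Hypothetical 5 min trace of 0.3 s offered-load samples (Mbit/s):
    moderate load with short bursts above link capacity."""
    n = round(PERIOD_S / SAMPLE_S)  # 1000 samples
    trace = [550.0] * n
    for i in range(0, n, 20):       # one burst sample every 6 s
        trace[i] = 1100.0
    return trace


def summarize(trace: list[float], capacity: float = CAPACITY_MBPS) -> tuple[float, float]:
    """Return (period average, fraction of samples above capacity)."""
    mean = sum(trace) / len(trace)
    over = sum(1 for r in trace if r > capacity)
    return mean, over / len(trace)


mean, frac = summarize(synthetic_trace())
print(f"5 min average: {mean:.1f} Mbit/s")
print(f"0.3 s samples above capacity: {frac:.0%}")
```

The 5 min average is 577.5 Mbit/s, well below capacity, while 5% of the 0.3 s samples exceed it; the average alone gives no hint of the overload.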

SURPRISE - PERSISTENT RISK OF LINK OVERLOAD.

Fig. 5: Time series of 5 m averages over one day from 15-24 on the heavily loaded 1 G link (top), and estimates of the risk of exceeding link capacity over consecutive periods of 5 m and 0.3 s, respectively.

TABLE II: Detection rates for naive congestion detector (0.3 s threshold t = 0.01).

Threshold (5 m)   True positive   False positive   True negative   Hit    Miss
t/15              72              23               9               1184   17
t/10              69              11               24              1180   21
t/5               49              6                49              1088   113

… correctly predicted absence of congestion at 0.3 s during the following 5 m period. The “hit” and “miss” columns indicate how many of the “true” 0.3 s congestions the detector captured.
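The hit rate quoted in the text can be recomputed from Table II as hit / (hit + miss). The snippet below does this for every threshold and also reports TP / (TP + FP) over the flagged 5 m periods; that second ratio is a derived metric of our own, not one given in the source.

```python
# Counts copied from Table II (0.3 s threshold t = 0.01).
TABLE_II = {
    "t/15": {"tp": 72, "fp": 23, "tn": 9, "hit": 1184, "miss": 17},
    "t/10": {"tp": 69, "fp": 11, "tn": 24, "hit": 1180, "miss": 21},
    "t/5": {"tp": 49, "fp": 6, "tn": 49, "hit": 1088, "miss": 113},
}


def hit_rate(row: dict[str, int]) -> float:
    """Share of 'true' 0.3 s congestions captured by the detector."""
    return row["hit"] / (row["hit"] + row["miss"])


def precision(row: dict[str, int]) -> float:
    """Share of flagged 5 m periods that were true positives."""
    return row["tp"] / (row["tp"] + row["fp"])


for name, row in TABLE_II.items():
    print(f"{name}: hit rate {hit_rate(row):.1%}, 5 m precision {precision(row):.1%}")
```

For t/10 this gives 1180/1201 ≈ 98.3%, matching the text; lowering the 5 m threshold to t/15 raises the hit rate slightly but flags more false-positive periods.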

We can see that even this relatively naive mechanism has a fairly impressive hit rate. For example, with a threshold of one tenth of the “true” 0.3 s threshold at the 5 m level, the detector captures 1180 out of 1201 “true” 0.3 s congestions, or 98.3%. The main problem with this simple mechanism is the amount of high-rate monitoring required to find the true positives. Still, this simple assessment clearly indicates the potential for using the estimates derived at lower time resolutions as predictors for higher-rate events.

IV. CONCLUSION AND PERSPECTIVE

We have proposed a generic, local and scalable approach to traffic rate monitoring based on high-rate updates of two counters in the data plane for recording the first and second statistical moments of each observed rate. The moments are used to estimate the parameters of a lognormal distribution (using a method-of-moments (MoM) estimator) at predefined and/or variable intervals. Different aspects of the method have been evaluated using real-world data sets. We have verified that the data can be fitted using a lognormal distribution, compared the estimation accuracy relative to observations at different time scales, and tested a naive probabilistic method for detecting increased risk of congestion on a link using probabilistic thresholds on properties of the estimated distributions.
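A minimal sketch of the moment-matching step and a resulting risk estimate, assuming the per-period mean and variance of the rate are available from the two counters. The lognormal moment equations are standard: for mean m and variance v, sigma^2 = ln(1 + v/m^2) and mu = ln(m) - sigma^2/2, and the risk of exceeding capacity C is 1 - Phi((ln(C) - mu)/sigma). The function names and example numbers are illustrative assumptions; the 0.01 flag value mirrors t in Table II but is used here only as an example.

```python
import math


def lognormal_mom(mean: float, var: float) -> tuple[float, float]:
    """Method-of-moments fit: return (mu, sigma) of the lognormal whose
    mean and variance equal the observed ones."""
    if mean <= 0 or var < 0:
        raise ValueError("need mean > 0 and var >= 0")
    sigma2 = math.log1p(var / mean**2)  # sigma^2 = ln(1 + v / m^2)
    mu = math.log(mean) - sigma2 / 2    # mu = ln(m) - sigma^2 / 2
    return mu, math.sqrt(sigma2)


def risk_of_exceeding(capacity: float, mu: float, sigma: float) -> float:
    """P(rate > capacity) under the fitted lognormal, i.e. 1 - Phi(z)."""
    if sigma == 0.0:  # degenerate case: all mass at exp(mu)
        return 1.0 if math.exp(mu) > capacity else 0.0
    z = (math.log(capacity) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical 5 min summary on a 1 Gb link: mean 600 Mbit/s, std 200 Mbit/s.
mu, sigma = lognormal_mom(600.0, 200.0**2)
risk = risk_of_exceeding(1000.0, mu, sigma)
print(f"P(rate > capacity) = {risk:.3f}, flag = {risk > 0.01}")
```

Even though the mean is only 60% of capacity, the fitted distribution assigns roughly a 4% chance to exceeding it, which a probabilistic threshold can act on where a plain average cannot.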

Analysis of the percentiles of estimates obtained at low rates shows clear potential for methods that autonomously and robustly detect high risk of congestion. Future work includes development of a more robust detector based on adaptive probabilistic thresholds, and extension of the study to less aggregated flows and shorter time scales.

ACKNOWLEDGEMENT

This work was supported in part by the FP7 UNIFY EU project. The authors would like to thank the staff at TeliaSonera for providing access to the traffic rate measurements: Per Tholin, Johanna Nieminen and Patrik Lindwall.


[Figure panel: risk of overload]

Measurements over a 1 Gb link. All good… NOT!




For the sake of scalability and observability:

THOU SHALT NOT MEASURE CENTRALLY.

THOU SHALT MEASURE LOCALLY.

TRIGGER: RAW SIGNAL STRENGTH VS ESTIMATED ATTAINABLE THROUGHPUT.

Rao, Akhila, and Rebecca Steinert. "Probabilistic multi-RAT performance abstractions." (2018).

[Figure: LTE and Wifi panels showing signal strength, throughput, and throughput distribution.]


o Probability of fulfilling an SLO related to throughput requirements.

o RAT-agnostic representation by conditional probabilities for decision making with less monitoring overhead.

o Significantly reduced control signalling (due to handovers) and tighter control over the tolerated amount of performance violations.
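One way to build such a conditional-probability representation, sketched under assumptions: bin historical (signal strength, throughput) pairs by signal strength and count how often each bin met the throughput requirement. The binning scheme, the 10 dB bin width, the helper name and the sample data are all hypothetical; the cited work may estimate these probabilities differently.

```python
import math
from collections import defaultdict


def slo_probability_by_bin(samples, bin_width_db: float, min_tp_mbps: float) -> dict[float, float]:
    """Empirical P(throughput >= min_tp | signal strength in bin).

    samples: iterable of (signal_strength_dbm, throughput_mbps) pairs,
    e.g. historical measurements from one cell (hypothetical format).
    """
    counts = defaultdict(lambda: [0, 0])  # bin lower edge -> [successes, total]
    for rss, tp in samples:
        edge = math.floor(rss / bin_width_db) * bin_width_db
        counts[edge][1] += 1
        if tp >= min_tp_mbps:
            counts[edge][0] += 1
    return {edge: ok / total for edge, (ok, total) in sorted(counts.items())}


# Hypothetical measurements: (RSS in dBm, throughput in Mbit/s).
history = [(-75, 20.0), (-72, 15.0), (-78, 11.0), (-88, 8.0), (-85, 13.0), (-86, 6.0)]
print(slo_probability_by_bin(history, 10.0, 12.0))
```

Because the output is a probability of meeting the throughput requirement rather than a RAT-specific metric such as signal strength, tables built this way for LTE and WiFi cells can be compared directly.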

OBSERVABILITY FOR RESOURCE COORDINATION.

[Figure: serving cell A (LTE) and serving cell B (WiFi) each provide a local estimate of attainable throughput to a controller, which handles processing, decision making and policy enforcement. A client has the SLO: minimum 12 Mbit/s throughput during 90% of the session duration.]

Serving cell   Attainable throughput estimate   Serves client
A (LTE)        P(est. TP >= 12) = 73%
B (WiFi)       P(est. TP >= 12) = 95%           X

Rao, Akhila, and Rebecca Steinert. "Probabilistic multi-RAT performance abstractions." (2018).
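The decision in the table can be reproduced with a simple rule, assuming (our interpretation) that P(est. TP >= 12) is compared directly with the 90% session-time requirement of the SLO: keep the cells whose probability meets the requirement and pick the most likely one. `select_cell` is a hypothetical helper, not the controller logic of the cited work.

```python
from typing import Optional


def select_cell(p_meets_tp: dict[str, float], required: float) -> Optional[str]:
    """Return the cell with the highest estimated P(TP >= 12 Mbit/s) among
    those meeting the SLO's required fraction, or None if none does."""
    feasible = {cell: p for cell, p in p_meets_tp.items() if p >= required}
    if not feasible:
        return None  # no cell can honour the SLO; the caller must decide
    return max(feasible, key=feasible.get)


estimates = {"A (LTE)": 0.73, "B (WiFi)": 0.95}  # values from the slide
print(select_cell(estimates, required=0.90))  # B (WiFi)
```

Cell A (73%) falls short of the 90% requirement, so the client is served by cell B (95%), as marked in the table.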

CHALLENGES AND OPPORTUNITIES FOR RAN SLICING.

o Grand challenges:

   o Unified representation of heterogeneous RATs.

   o User/service-specific triggers for SLA/SLO.

   o Proactive adaptive control loops.

o Opportunities:

   o ML & AI for networks.

   o Network intelligence.

   o Developing learning distributed systems.

o Gains:

   o Improved resource utilization.

   o Improved service reliability and quality.

   o More clients, subscribers and profit.

FINAL THOUGHTS.

o Programmability and virtualization enable flexible observability.

o Advances in ML at scale and in computational capabilities pave the way for novel observability processes at different scales.

o Increased synergies between the telecom industry and open-source communities are likely to accelerate development towards high-granularity dynamic RAN slicing.

THANKS FOR LISTENING.
