
Fair Benchmarking and Root Cause Analysis in Mobile Networks
Vaclav Raida, Michael Rindler, Philipp Svoboda, Markus Rupp
Institute of Telecommunications, TU Wien


The Network Perspective

Network-Wide Measurements

- Task: fusion of network-wide measurements into a network perspective
- The challenge: turn benchmark results into a network performance metric. It is not very meaningful to compare the tariff-limited data rate of User A with the tariff-unlimited data rate of User B, indoor measurements with outdoor measurements, or 2G with 4G.
- Common grouping criteria (see the Python sketch after the note below):
  - indoor / outdoor (detected, e.g., based on signal strength)
  - different UE hardware
  - different mobile network generations
  - different tariffs (traffic-shaping detection)
  - repeated / automated measurements (the majority of tests are conducted by a few devices)

Passively Active Measurements: Users perform active measurements, but we can’t choose when, where, or how they will do so.
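As an illustration of the grouping step, here is a minimal Python sketch. The field names, the RSRP threshold for the indoor/outdoor split, and the dominant-device heuristic are assumptions for illustration, not the filters actually used on the RTR data:

```python
from collections import Counter, defaultdict

RSRP_INDOOR_THRESHOLD = -105.0  # dBm; assumed cut-off for "likely indoor"

def group_tests(tests):
    """Partition crowdsourced tests into comparable groups.

    Each test is assumed to be a dict like:
    {"device": "XYZ", "rat": "LTE", "rsrp": -94.0, "tariff_limited": False}
    """
    groups = defaultdict(list)
    for t in tests:
        env = "indoor" if t["rsrp"] < RSRP_INDOOR_THRESHOLD else "outdoor"
        groups[(env, t["device"], t["rat"], t["tariff_limited"])].append(t)
    return groups

def dominant_devices(tests, share=0.5):
    """Devices contributing more than `share` of all tests are likely
    repeated / automated measurement rigs and can be treated separately."""
    counts = Counter(t["device"] for t in tests)
    total = sum(counts.values())
    return [d for d, n in counts.items() if n / total > share]
```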

Experimental Results

[Figure 1 plot: one week of data-rate samples; x-axis: time, 12-Sep-2016 to 19-Sep-2016 (ticks every 3 h); y-axis: R / (Mbit/s), 10 to 60; annotation: “Cell empty”.]

Figure 1: The Time-of-Day Effect: One week (starting on Monday) of static data rate (R) measurements. Every dot represents the mean rate of one test. The time-of-day effect is clearly visible: the rate is highest between roughly 0:00 and 6:00 (lowest cell load) and lowest in the afternoon and evening (many users active).
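A minimal sketch of how the time-of-day effect can be made visible from such logs: bin each test’s mean rate by hour of day and average per bin. The input format is an assumption for illustration:

```python
from collections import defaultdict
from statistics import mean

def hourly_profile(tests):
    """tests: iterable of (timestamp, rate_mbits) pairs, where timestamp
    is a datetime.datetime. Returns {hour of day: mean rate}."""
    by_hour = defaultdict(list)
    for ts, rate in tests:
        by_hour[ts.hour].append(rate)
    return {h: mean(r) for h, r in sorted(by_hour.items())}

# Usage sketch:
#   from datetime import datetime
#   hourly_profile([(datetime(2016, 9, 12, 3, 5), 48.2), ...])
# A profile peaking around 0:00-6:00 and dipping in the evening mirrors
# the pattern of Figure 1.
```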

[Figure 2 plot: R(t) / (Mbit/s), 0 to 40, vs. t / ms, 0 to 7000; series: average data rate (101 ms bins); marked: beginning and end of the level shift, network limitation, tariff limitation.]

Figure 2: The Effect of Tariff Limit: Time series of a single data rate test conducted with a limited tariff. The rate is limited by some traffic shaper (e.g., a token bucket). If we detect the rate level shift, we can calculate, for example, the bucket depth (burst size) and the token generation rate, and, to some accuracy, also estimate the capacity (the rate without the tariff limitation).
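To make the caption’s calculation concrete, here is a hedged sketch estimating token-bucket parameters from a binned rate trace. The shift detector is a deliberately crude placeholder, not the authors’ detector, and it assumes a clean burst-then-flat shape:

```python
from statistics import mean

def analyze_level_shift(rates, bin_s=0.101):
    """rates: per-bin average data rates in Mbit/s (e.g., 101 ms bins),
    assumed to show a burst phase followed by a flat shaped phase."""
    overall = mean(rates)
    # crude shift detector: the first bin dropping below the overall mean
    shift_idx = next(i for i, r in enumerate(rates) if r < overall)
    burst_rate = mean(rates[:shift_idx])   # ~capacity (rate before shaping)
    token_rate = mean(rates[shift_idx:])   # shaped rate (token generation rate)
    t_shift = shift_idx * bin_s            # duration of the burst phase in s
    # data sent above the token rate during the burst drained the bucket
    bucket_mbit = (burst_rate - token_rate) * t_shift
    return {"capacity_mbits": burst_rate,
            "token_rate_mbits": token_rate,
            "bucket_depth_mbyte": bucket_mbit / 8}
```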

[Figure 3 plot: PAR-vs-R scatter; regions labeled “network limited” and “data rate is tariff limited”; annotation: “automatic detector of tariff limits”.]

Figure 3: Tariff Limit Detector: We use the peak-to-average ratio (PAR) metric to quickly identify tariff limitation. In this case we see five different scenarios: different UEs and different indoor locations. Location impacts signal strength, which impacts data rate. Data rate is further impacted by cell load (time-of-day effect). The combination of these factors leads to different overlapping clusters in our PAR vs. R scatter plot. Tariff limitation is revealed by a strong vertical line. We can apply the same method to crowdsourced data and detect tariff limits by identifying vertical lines. (One week of RTR-NetTest¹ measurements with CMPT.)
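The PAR metric itself is simple to compute per test; the sketch below pairs it with an illustrative threshold rule (the threshold value is an assumption, not the poster’s detector):

```python
def par(rates):
    """Peak-to-average ratio of one test's per-bin data rates."""
    return max(rates) / (sum(rates) / len(rates))

def looks_tariff_limited(rates, par_threshold=1.5):
    """A shaped (burst-then-flat) test has an elevated PAR, while a purely
    network-limited test stays near PAR = 1 plus fading noise. Since all
    shaped tests of one tariff share the same mean rate R (the token rate)
    but vary in PAR, they line up vertically in the PAR-vs-R plot."""
    return par(rates) > par_threshold
```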

Mobile Network Benchmarking: A Spatial, Dynamic Challenge

Crowdsourced Look at the Network

Operator (Austria)   DL rate     UL rate       Ping    Quantity
A1                   27 Mbit/s    6.3 Mbit/s   25 ms   24 925
T-Mobile             25 Mbit/s    9.6 Mbit/s   29 ms   13 018
Hutchison Drei       23 Mbit/s   11.0 Mbit/s   37 ms   18 586

Table 1: Comparison of three Austrian operators carried out by RTR¹.

Figure 4: What looks like an area with bad coverage . . .
Figure 5: . . . is actually just a few “outliers.”

Problem: We can’t derive a network or user perspective by simply taking the data rate median in some area, because we don’t know whether the low-rate results were caused by the location or rather by a different factor like tariff limitation (left column) or BS handover (right column). We need to understand spatial properties together with a network perspective.

Ultimate Goal: Fair Benchmarking and Root Cause Analysis

- Task: fair comparison of mobile networks based on crowdsourced benchmark data
- How:
  - create a network perspective, extract performance metrics (left column)
  - create a user perspective, active and controlled measurements (right column)
  - create a network benchmark by fusing the user and network perspectives (Fig. 6)

[Figure 6 plot: R / (Mbit/s), 0 to 300, vs. RSRP / dBm, -140 to -40; curves: A1, Hutchison Drei, T-Mobile, A1 reference cell measurements.]

Figure 6: Benchmarking Example: Two-dimensional histogram based on RTR¹ open data, showing the distribution of the tests’ RSRP values and data rates for operator A1. The green dashed line shows a rate-signal capacity curve, i.e., given a certain signal strength, the curve tells us the highest achievable rate. The green solid line represents measurements in a reference cell (in cooperation with the operator A1; only one UE in the cell → no impact of cell load, predefined RSRP level → no fading). The orange and magenta dashed lines show the boundaries obtained from the distributions of the other operators.

Conclusion: One possible benchmarking method could be to compare the capacity curves of different networks.
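One way such a capacity curve could be extracted from crowdsourced (RSRP, rate) samples is a per-bin upper percentile; bin width and percentile below are illustrative choices, not the method behind Figure 6:

```python
from collections import defaultdict

def capacity_curve(samples, bin_db=5, pct=0.95):
    """samples: iterable of (rsrp_dbm, rate_mbits) pairs.
    Returns {RSRP bin lower edge: ~highest achievable rate in that bin}."""
    bins = defaultdict(list)
    for rsrp, rate in samples:
        # floor division groups e.g. -94 dBm into the [-95, -90) bin
        bins[int(rsrp // bin_db) * bin_db].append(rate)
    curve = {}
    for edge, rates in sorted(bins.items()):
        rates.sort()
        # a high percentile approximates the load- and fading-free envelope
        curve[edge] = rates[int(pct * (len(rates) - 1))]
    return curve

# Benchmarking idea: compute capacity_curve() per operator and compare
# the resulting envelopes instead of raw medians.
```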

¹ RTR = Austrian Regulatory Authority for Broadcasting and Telecommunications

Source: https://www.netztest.at

References

[1] M. Rindler, P. Svoboda, M. Rupp, “FLARP, Fast Lightweight Available Rate Probing: Benchmarking Mobile Broadband Networks,” IEEE ICC 2017, Paris, May 2017.

[2] S. Homayouni, V. Raida, P. Svoboda, “CMPT: A Methodology of Comparing Performance Measurement Tools,” ICUMT 2016, Lisbon, October 2016.

The User / Application Perspective

Measurements in Mobile Networks and Other Reactive Setups

- Task: conduct active measurements, gain user-perspective ground truth
- Challenges:
  - mobility (BS handover)
  - cell load (capacity shared among users)
  - cross-traffic (user’s capacity split among multiple apps)
  - various network changes

Uncontrollable Influences: The filter criteria in the left column can either be extracted directly from the open database, because they are reported by the measurement tool (UE hardware category, mobile network generation), or they can be derived (tariff limitation from the shape of the data rate curve, indoor / outdoor from the signal strength). On the other hand, rate-decreasing factors like BS handover, high cell load, or the user’s cross-traffic are difficult or even impossible to reconstruct.

FLARP: Spatial Measurement, Short Monitoring of Capacity

Fast Lightweight Available Rate Probing (FLARP) [1] can estimate the available data rate in sub-second time, which allows recording at high granularity in space.

Figure 7: Repeatable local avg. performance.

[Figure 8 diagram: the client requests a probe pattern from the server, which reads the pattern configuration from a settings database; the server answers with chirp probing traffic; the client uploads the results.]

Figure 8: The block diagram of the implemented probing system.
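For intuition, a minimal sketch of the chirp-probing principle (in the spirit of PathChirp-like tools): the sender sweeps the instantaneous probing rate upward, and the receiver reports the rate at which the one-way delay starts growing persistently, i.e., where queueing sets in. This illustrates the principle only; FLARP’s actual estimator is described in [1]:

```python
def estimate_available_rate(send_t, recv_t, pkt_bits):
    """send_t, recv_t: per-packet timestamps in seconds at sender and
    receiver; pkt_bits: packet size in bits. Returns bit/s or None.
    Only the *trend* of the one-way delay matters, so a constant clock
    offset between the two hosts is harmless."""
    owd = [r - s for s, r in zip(send_t, recv_t)]  # one-way delays
    for i in range(1, len(send_t)):
        # instantaneous probing rate of the i-th packet pair
        inst_rate = pkt_bits / (send_t[i] - send_t[i - 1])
        # persistent OWD growth over a small window signals that the
        # chirp has exceeded the available rate
        window = owd[i - 1:i + 3]
        if len(window) >= 3 and all(b > a for a, b in zip(window, window[1:])):
            return inst_rate
    return None  # the chirp never saturated the path
```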

CMPT: Framework for Reference Measurements

[Figure 9 diagram: several UEs running the CMPT app receive settings/configuration from the CMPT server (managed via a web interface), run tests against multiple test servers, and upload results to the CMPT server.]

Figure 9: Generic CMPT probing setup

Figure 10: Screenshots of CMPT Android app.

Crowdsourcing Mobile Performance Tool (CMPT) [2] is an Android application developed to perform automated performance measurement tasks. It repeatedly executes predefined experiments with randomized parameters and reports the results to a centralized database. This enables continuous monitoring of the ground truth at predefined static places.
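A minimal sketch of such a measurement loop; the endpoint URL, the test set, and the parameter ranges are hypothetical placeholders, not CMPT’s real configuration:

```python
import json
import random
import time
import urllib.request

TESTS = ["download", "upload", "ping"]            # predefined experiment types
RESULTS_URL = "https://example.org/cmpt/results"  # hypothetical endpoint

def run_test(name, duration_s):
    """Placeholder for an actual measurement; returns a result record."""
    return {"test": name, "duration_s": duration_s, "t": time.time()}

def measurement_loop(iterations=3):
    for _ in range(iterations):
        name = random.choice(TESTS)
        duration = random.uniform(1.0, 10.0)      # randomized parameter
        result = run_test(name, duration)
        req = urllib.request.Request(
            RESULTS_URL, data=json.dumps(result).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)               # report to central database
        time.sleep(random.uniform(5, 30))         # randomized pause
```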

Acknowledgment

This work was supported by the Austrian Research Promotion Agency (FFG) as part of the Bridge Project No. 850742, Methodical Solution for Cooperative Hybrid Performance Analytics in Mobile Networks (Mc.Hypa-Miner).

www.nt.tuwien.ac.at 7th TMA PhD School on Traffic Monitoring and Analysis, Dublin, June 19-20, 2017 [email protected]