

Page 1: Bayesian Estimation of Network-Wide Mean Failure ...dl.ifip.org/db/conf/ifip6-3/perform2010/ColucciaRR10.pdf · Bayesian Estimation of Network-wide Mean Failure Probability in 3G

Bayesian Estimation of Network-wide Mean Failure Probability in 3G Cellular Networks

Angelo Coluccia¹, Fabio Ricciato¹,², and Peter Romirer-Maierhofer²

¹ University of Salento, Lecce, Italy
² FTW Forschungszentrum Telekommunikation Wien, Vienna, Austria

{ricciato,coluccia,romirer}@ftw.at

Abstract. Mobile users in cellular networks produce calls, initiate connections and send packets. Such events have a binary outcome: success or failure. The term “failure” is used here in a broad sense: it can take different meanings depending on the type of event, from packet loss or late delivery to call rejection. The Mean Failure Probability (MFP) provides a simple summary indicator of network-wide performance, i.e., a Key Performance Indicator (KPI), that is an important input for the network operation process. However, the robust estimation of the MFP is not trivial. The most common approach is to take the ratio of the total number of failures to the total number of requests. Such a simplistic approach suffers from the presence of heavy users, and therefore does not work well when the distribution of traffic (i.e., requests) across users is heavy-tailed, a typical case in real networks. This motivates the exploration of more robust methods for MFP estimation. In a previous work [1] we derived a simple but robust sub-optimal estimator, called EPWR, based on the weighted average of individual (per-user) failure probabilities. In this follow-up work we tackle the problem from a different angle and formalize it following a Bayesian approach, deriving two variants of non-parametric optimal estimators. We apply these estimators to a dataset collected from a real 3G network. Our results confirm the quality of the proposed estimators and show that EPWR, despite its simplicity, yields near-optimum performance.

1 Introduction

The users of a third-generation (3G) mobile network generate various types of activity. Consider the following events: transmission of IP packets, opening of Transport- and Application-layer connections (i.e., sending of TCP SYN packets and HTTP GET commands), activation of a phone call, sending of an SMS, data connection and signaling procedures (Attach Request, Location Area Update, Authentication Request, Paging etc.). All such events have two characteristics in common. First, each event is naturally associated with an individual user in the mobile network: the caller (for outgoing calls) or the callee (for incoming ones), the sender (for uplink packets) or the receiver (for downlink packets). Second, each event has a binary outcome: success or failure. The term “failure” can take on different meanings depending on the type of event: for packets, failure can represent a missed delivery (e.g. due to queue loss or corruption by link-level errors) or a late delivery after a delay threshold.


Generally speaking, failures can be caused by the unavailability of some resource (e.g. due to exhaustion by too many concurrent requests) or by its unreachability. The involved resources can be shared (e.g., the GGSN or a Core Network link) or dedicated to individual users (e.g. a dedicated radio channel, or the receive buffer inside the terminal itself). The failure probability experienced by each individual user will be affected by the availability and reachability of both the shared and dedicated resource components.

Even if the status of shared resources is good, user-specific conditions can severely impact the individual failure probability: for example, a user with a poor radio link will experience a high rate of packet losses, while a terminal configured with a wrong Access Point Name (APN, ref. [2]) will have all its PDP-context activations rejected. On the other hand, failures affecting multiple users at the same time often indicate a problem and/or overload in the shared section. Therefore, monitoring the incidence of failures across users is of great importance for the operation and troubleshooting of the network infrastructure.

Network equipment typically maintains logs and/or counters of attempts and failures, from which synthetic indicators are derived, often called Key Performance Indicators (KPI). Failures (and successes) can also be measured by a passive monitor, either from the observation of an explicit failure indication (e.g. a Negative ACK or reject message) or from the absence of an explicit success indication (e.g. a positive ACK). One of the most common KPIs is simply the ratio between the total number of failures and the total number of attempts (or requests). Here we show that such a simplistic approach has some fundamental limitations and does not work well in scenarios of practical interest. The problem arises when the individual rate of requests (calls, procedures, packets) varies wildly across users, a case that is typically encountered in real networks, where the distribution of user activity is often heavy-tailed. In this work we formulate the problem in terms of Bayesian estimation and provide (two variants of) an optimal estimator. Furthermore, we compare its performance to a simpler estimator developed in a previous work [1], discussing the trade-offs between optimality and simplicity.

2 Problem formulation

2.1 System model

To preserve generality, we will adopt the term “REQUEST” to refer to a general resource access attempt, avoiding specific terms like packet, call, connection. Unsuccessful requests are referred to as “FAILURES”. Each request is associated with one mobile “USER”.

For a generic measurement timebin (e.g. 1 minute), let $I$ denote the total number of active users for which at least one request was observed. In operational networks $I$ is often quite large, from thousands to millions depending on the timebin duration. For every user $i$ ($i = 1 \ldots I$) we introduce the following variables:

– $n_i$ is the total number of requests associated with $i$ ($n_i \geq 1$);
– $m_i$ is the number of failures ($0 \leq m_i \leq n_i$);
– $r_i \stackrel{\text{def}}{=} m_i / n_i$ denotes the empirical failure ratio for $i$;
– $a_i$ is the (unknown) failure probability for $i$ ($a_i \in [0, 1]$).

We denote by $N \stackrel{\text{def}}{=} \sum_i n_i$ and $M \stackrel{\text{def}}{=} \sum_i m_i$ respectively the total number of requests and failures across all terminals. To simplify the notation we will occasionally use the vectors $\mathbf{n} \stackrel{\text{def}}{=} [n_1\, n_2 \cdots n_I]^T$, $\mathbf{m} \stackrel{\text{def}}{=} [m_1\, m_2 \cdots m_I]^T$ and $\mathbf{a} \stackrel{\text{def}}{=} [a_1\, a_2 \cdots a_I]^T$. For the sake of mathematical tractability we assume independence between failures: each request for user $i$ fails with probability $a_i$ independently of any other request of the same or other users. Therefore $m_i$ is the sum of $n_i$ Bernoulli trials with probability $a_i$, and all $m_i$'s are independent Binomial random variables:

$$
m_i \sim \mathcal{B}(n_i, a_i) \;\Rightarrow\;
\begin{cases}
E[m_i] = n_i a_i \\
\mathrm{VAR}[m_i] = n_i a_i (1 - a_i).
\end{cases}
\qquad (1)
$$

Throughout the paper we assume that $\mathbf{m}$ and $\mathbf{n}$ have been measured in some way and serve as input for the problem at hand.
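The system model above can be sketched as a small simulation. This is only an illustration under stated assumptions: the request counts, the Beta(1, 20) distribution used to draw the $a_i$, the seed and the function name `simulate_failures` are all hypothetical and do not come from the measured data.

```python
import random

def simulate_failures(n, a, rng):
    """Draw m_i ~ Binomial(n_i, a_i) as a sum of n_i independent Bernoulli trials."""
    return [sum(1 for _ in range(n_i) if rng.random() < a_i)
            for n_i, a_i in zip(n, a)]

rng = random.Random(0)                 # fixed seed, for reproducibility only
n = [1, 3, 20, 5000]                   # hypothetical request counts (one heavy user)
a = [rng.betavariate(1.0, 20.0) for _ in n]   # hypothetical p(a): Beta(1, 20)
m = simulate_failures(n, a, rng)       # failures per user, as in eq. (1)
r = [m_i / n_i for m_i, n_i in zip(m, n)]     # empirical failure ratios r_i
```

The vectors `n` and `m` produced this way play the role of the measured input assumed throughout the paper.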

2.2 Goal definition

A central component of the model is that the (unknown) failure probabilities $a_i$ are regarded as i.i.d. random variables generated from a common underlying distribution $p(a)$ with mean value $\bar{a} \stackrel{\text{def}}{=} E[a]$. Therefore it is natural to take $\bar{a}$ as the summary indicator representative of the “network-wide failure probability”. In other words, we are interested in obtaining an estimator of the Mean Failure Probability $\bar{a}$ from the measured vectors $\mathbf{n}$ and $\mathbf{m}$.

The goodness of such an estimator can be evaluated against the following criteria:

– Optimality. We consider only unbiased estimators and take minimum variance as the optimality criterion. In fact, when the estimate of $\bar{a}$ is used for performance monitoring, lower variance (i.e., smaller statistical fluctuations) allows better discrimination of change-points and/or trends.

– Generality. In passive monitoring the vector $\mathbf{n}$ is given and cannot be controlled. Typically, the traffic volume is distributed very unevenly across users, often with long tails. Moreover, the traffic distribution changes over time, following daily or weekly cycles and long-term trends in user activity (see e.g. [4, §VI-A]). Therefore we seek a robust estimator that neither relies on specific assumptions about (the distribution of) $\mathbf{n}$ nor requires manual re-tuning when the distribution changes.

– Simplicity. The ideal estimator should be easy to implement, fast to compute and conceptually simple: methods that practitioners can understand straightforwardly are more likely to be adopted in practice.

2.3 Resolution Approach

In a previous contribution [1] we focused on a particular class of weighted estimators and cast the problem in terms of constrained optimization. We derived a very simple sub-optimal estimator, hereafter called Empirical Piecewise-linear Weighted Ratio (EPWR), which showed excellent performance in simulations. The EPWR involves a cut-off parameter $\theta$ to be set heuristically: we showed that it is not too sensitive to the exact value of $\theta$ as long as extreme settings (very small or very large) are avoided.

In this work we tackle the problem from a more theoretically grounded perspective: following a formal Bayesian approach, we identify two estimators that are well suited for our purposes. Moreover, we compare the performance of the Bayesian estimators against the EPWR estimator derived in [1] plus two other simplistic estimators. The comparison is carried out on a sample dataset from a real operational network, based on data obtained with the METAWIN system [7].

3 Simple estimators

Hereafter we present two common estimators and highlight their limitations. In §3.3 we recall the EPWR estimator which was derived earlier in [1].

3.1 Empirical Global Ratio (EGR)

The Key Performance Indicator (KPI) most widely adopted by practitioners is simply the ratio between the total number of failures and the total number of requests across all users, formally:

$$
S_{\text{EGR}} \stackrel{\text{def}}{=} \frac{\sum_{i=1}^{I} m_i}{\sum_{i=1}^{I} n_i} = \frac{M}{N}. \qquad (2)
$$

We refer to this quantity as the Empirical Global Ratio (EGR). It can be seen that $S_{\text{EGR}}$ is indeed an unbiased estimator for the mean failure probability:

$$
E[S_{\text{EGR}}] = \frac{\sum_{i=1}^{I} E\big[E[m_i \mid a_i]\big]}{\sum_{i=1}^{I} n_i} = \frac{\sum_{i=1}^{I} n_i E[a_i]}{\sum_{i=1}^{I} n_i} = \bar{a}
$$

It is worth remarking that $S_{\text{EGR}}$ is the simplest estimator to implement in practice. In fact, it does not require knowledge of the full vectors $\mathbf{n}, \mathbf{m}$ (each composed of $I$ elements) but only of two global counters $N$ and $M$. Therefore, it does not require the measurement platform to associate requests and failures with individual users.

Intuitively, the problem with $S_{\text{EGR}}$ is the presence of a few users with very high traffic (large $n_i$) that occasionally inflate the value of eq. (2), thus increasing the estimator variance. One possible approach to “correct” $S_{\text{EGR}}$ is to pick users with large $n_i$ as “outliers” and filter them out before computing the ratio in eq. (2). Such a strategy has two drawbacks. First, it requires a heuristically tuned method to classify outliers. Second, filtering out “fat” users with large $n_i$ means discarding their sample estimates for $\bar{a}$, which are actually the most reliable ones, and for long-tailed $n_i$'s the loss of information might not be negligible. In other words, the variance increase due to a lower number of reliable measurement samples will partially offset the reduction due to filtering.
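A toy computation makes the heavy-user effect on eq. (2) concrete. The counts below are invented for illustration and the function name `egr` is ours; the sketch only shows how a single high-traffic user with many failures dominates the global ratio.

```python
def egr(n, m):
    """Empirical Global Ratio, eq. (2): total failures over total requests."""
    return sum(m) / sum(n)

# Hypothetical timebin: four light users plus one heavy user with many failures.
n = [10, 10, 10, 10, 2000]
m = [0, 1, 0, 1, 1000]

light_only = egr(n[:4], m[:4])   # 2 failures out of 40 requests
with_heavy = egr(n, m)           # 1002 failures out of 2040 requests
print(light_only, with_heavy)
```

Here the ratio jumps from 2/40 to 1002/2040 because of one user, even though four users out of five see a failure ratio of at most 0.1.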

3.2 Empirical Mean Ratio (EMR)

Note that the empirical failure ratio $r_i$ for each user $i$ is an unbiased estimator of the mean failure probability $\bar{a}$, formally $E[r_i] = E\big[E[r_i \mid a_i]\big] = E[a_i] = \bar{a}$. Therefore an intuitive summary indicator can be obtained as the arithmetic mean across all users:

$$
S_{\text{EMR}} \stackrel{\text{def}}{=} \frac{1}{I} \sum_{i=1}^{I} r_i = \frac{1}{I} \sum_{i=1}^{I} \frac{m_i}{n_i}. \qquad (3)
$$

We refer to this indicator as the Empirical Mean Ratio (EMR). From a system-level perspective, the implementation of $S_{\text{EMR}}$ is more costly than that of $S_{\text{EGR}}$: it requires the measurement platform to count requests and failures separately for each user. The additional resource consumption (memory, processing) should be compensated by better accuracy of the estimation. However, $S_{\text{EMR}}$ is a sub-optimal estimator for $\bar{a}$. The problem lies in the fact that the variance of $r_i$ (conditioned on $a_i$) is inversely proportional to the number of requests $n_i$, i.e. $\mathrm{VAR}[r_i \mid a_i] = a_i(1 - a_i)/n_i$: intuitively, the larger the sample size, the better the accuracy of the estimate. Therefore, large variability of the $n_i$'s maps to large variability of the uncertainty (variance) of the individual estimates, a case of heteroscedasticity. In the simple arithmetic mean of eq. (3), more accurate estimates (for large $n_i$) weigh the same as poor ones (for small $n_i$), and if the number of low-traffic users is high they will drive up the overall variance.
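The conditional variance quoted above follows directly from eq. (1), and the law of total variance gives the unconditional counterpart. The derivation below is a worked sketch; the symbol $\sigma_a^2$ for the variance of $a_i$ under $p(a)$ is introduced here as an assumption of notation.

$$
\begin{aligned}
\mathrm{VAR}[r_i \mid a_i] &= \frac{\mathrm{VAR}[m_i \mid a_i]}{n_i^2} = \frac{n_i a_i (1 - a_i)}{n_i^2} = \frac{a_i (1 - a_i)}{n_i}, \\
\mathrm{VAR}[r_i] &= E\big[\mathrm{VAR}[r_i \mid a_i]\big] + \mathrm{VAR}\big[E[r_i \mid a_i]\big]
= \frac{E[a_i] - E[a_i^2]}{n_i} + \sigma_a^2
= \frac{\bar{a} - \bar{a}^2 - \sigma_a^2}{n_i} + \sigma_a^2 .
\end{aligned}
$$

The first term shrinks with $n_i$ while the second does not, so the individual estimates $r_i$ have very different accuracies when the $n_i$'s are spread over several orders of magnitude.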

Similarly to $S_{\text{EGR}}$, one possible “correction” for $S_{\text{EMR}}$ is to filter out (discard) the observations associated with low-traffic users and compute the mean in eq. (3) only on the samples above a minimum sample size, $n_i \geq \gamma$. Again, the drawback of such a strategy is that a certain amount of information available in the data is simply discarded, which is a grossly suboptimal approach to the problem.
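Eq. (3) and the filtered variant just described can be sketched as follows. The function name `emr`, the default `gamma=1` and the toy counts are assumptions made for illustration.

```python
def emr(n, m, gamma=1):
    """Empirical Mean Ratio, eq. (3), restricted to users with n_i >= gamma.

    gamma=1 keeps every user, since n_i >= 1 by definition.
    """
    ratios = [m_i / n_i for n_i, m_i in zip(n, m) if n_i >= gamma]
    if not ratios:
        raise ValueError("no user satisfies n_i >= gamma")
    return sum(ratios) / len(ratios)

# Hypothetical timebin dominated by low-traffic users.
n = [1, 1, 2, 50, 200]
m = [1, 0, 0, 2, 10]

plain = emr(n, m)                # every r_i weighs the same, even for n_i = 1
filtered = emr(n, m, gamma=10)   # discards the three low-traffic users
print(plain, filtered)
```

In this toy case a single failed request from a user with $n_i = 1$ contributes $r_i = 1$ to the plain mean, while the filtered variant ignores three of the five users altogether.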

3.3 Empirical Piecewise-linear Weighted Ratio (EPWR)

We have seen that $S_{\text{EMR}}$ and $S_{\text{EGR}}$ suffer respectively from small and large users (the size of a user refers to its traffic volume $n_i$) and that a simplistic workaround would be to just discard the measurements associated with the smallest or biggest users for $S_{\text{EMR}}$ and $S_{\text{EGR}}$ respectively. This approach involves a certain loss of information. Moreover, the decimation of samples works against the goal of reducing the uncertainty of the estimate. Therefore we look for a smarter strategy: instead of selectively discarding measurement samples based on their size, one can simply weight them differently. Following this idea, in a previous work [1] we derived a simple sub-optimal estimator that takes the form of a weighted average of the $r_i$'s:

$$
S_{\text{EPWR}} \stackrel{\text{def}}{=} \sum_{i=1}^{I} w_i r_i \qquad (4)
$$

with piecewise-linear weights given by:

$$
w_i = \frac{x_i}{\sum_{i=1}^{I} x_i}, \qquad x_i = \min(n_i, \theta) \quad \forall i \qquad (5)
$$

This definition involves a single parameter $\theta$ that represents a sort of “cut-off” point dividing the users into two regions: those with $n_i < \theta$ are weighted proportionally to their size $n_i$, while those with $n_i > \theta$ are weighted equally, proportionally to $\theta$. The analysis reveals that the optimal value of the cut-off parameter is given by $\theta = \frac{\bar{a} - \bar{a}^2 - \sigma_a^2}{\sigma_a^2}$, i.e. it depends on the first two moments of the distribution of $a$. Notably, the optimal setting does not depend on $\mathbf{n}$, which is an advantage in applications where $\mathbf{n}$ varies in time. Unfortunately, in practice the optimal value of $\theta$ cannot be identified since $\bar{a}$ and $\sigma_a^2$ are unknown. However, it can be shown that the performance of $S_{\text{EPWR}}$ depends only weakly on the exact value of $\theta$, provided that it falls in a “reasonable” intermediate range away from extreme values (very small or very large), indicating that a heuristically fixed value for $\theta$ would be sufficient to achieve near-optimal performance in most practical cases. This claim, supported by simulation results in [1], will be further confirmed by the numerical results presented below in §5.
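Eqs. (4) and (5) translate into a few lines of code. The sketch below is illustrative only: the function name `epwr`, the toy counts and the value θ = 20 are assumptions, not settings taken from [1]. It also checks the two limiting cases implied by eq. (5): with θ no larger than the smallest $n_i$ all weights are equal and the estimator reduces to the EMR, while with θ at least as large as the largest $n_i$ the weights are proportional to $n_i$ and it reduces to the EGR.

```python
def epwr(n, m, theta):
    """Empirical Piecewise-linear Weighted Ratio, eqs. (4)-(5)."""
    x = [min(n_i, theta) for n_i in n]          # x_i = min(n_i, theta)
    total = sum(x)                              # normalisation of the weights w_i
    return sum(x_i * m_i / n_i for x_i, n_i, m_i in zip(x, n, m)) / total

# Hypothetical timebin with three light users, one medium user, one heavy user.
n = [1, 1, 2, 50, 5000]
m = [1, 0, 0, 2, 2500]

s_epwr = epwr(n, m, theta=20)                   # theta chosen arbitrarily
s_emr = sum(m_i / n_i for n_i, m_i in zip(n, m)) / len(n)
s_egr = sum(m) / sum(n)

# Limiting cases of the cut-off parameter.
assert abs(epwr(n, m, theta=1) - s_emr) < 1e-12
assert abs(epwr(n, m, theta=max(n)) - s_egr) < 1e-12
```

Intermediate values of θ cap the influence of the heavy user without discarding any sample, which is the design choice that distinguishes EPWR from the filtering workarounds.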

4 Bayesian estimators

Since $\mathbf{a}$ is regarded as a random vector, we can apply Bayesian techniques, which provide a theoretically well-grounded approach to the estimation problem at the cost of a somewhat higher complexity of the resolution procedure. In this section we discuss different possible approaches under the common framework of Bayesian inference.

The structure of our problem is that of a Bayesian hierarchical model [5]: the data $\mathbf{m}$ are described by a probability distribution whose parameters $\mathbf{a}$ are random variables themselves. The Bayesian approach requires the a priori distribution $p(a)$ to be specified, and this is unknown in our case. If $p(a)$ were known, the optimal estimator for $\mathbf{a}$ would be obtained by following the classical Bayes procedure, i.e. minimizing a Bayes risk. Two of the most common choices for the Bayes risk are the MAP (Maximum A Posteriori) criterion and the MMSE (Minimum Mean Square Error) criterion. In both cases the key element is the posterior distribution $p(\mathbf{a} \mid \mathbf{m})$, expressed by Bayes' Theorem as:

$$
p(\mathbf{a} \mid \mathbf{m}) = \frac{p(\mathbf{m}, \mathbf{a})}{p(\mathbf{m})} = \frac{p(\mathbf{m}, \mathbf{a})}{\int p(\mathbf{m}, \mathbf{a}) \, d\mathbf{a}}
$$

where $p(\mathbf{m}, \mathbf{a}) = p(\mathbf{m} \mid \mathbf{a}) \, p(\mathbf{a})$.

When $p(a)$ is unknown, as in our case, it is possible to choose a parametric distribution family with unknown parameters for the prior, and to resort to a procedure called Empirical Parametric Bayes [5]. The basic idea is quite simple: since the parameters of the prior distribution, called hyperparameters, are unknown, their estimates are used instead. It is then possible to derive a Bayesian estimator (MAP or MMSE) in the usual way. The estimation of the hyperparameters is preferably obtained via Maximum Likelihood (ML), although other techniques are sometimes used, e.g. the Method of Moments [5].

In some applications it is not clear which family of distributions should be adopted. In such cases, a well-established practice is to use the so-called conjugate prior, i.e. the prior distribution $p(a)$ that results in a posterior distribution $p(\mathbf{a} \mid \mathbf{m})$ belonging to the same family. In our case, the conjugate prior corresponding to the Binomial distribution is the Beta distribution, with support in $[0, 1]$, defined as:

$$
p(a_i) = \frac{1}{B(\alpha, \beta)} a_i^{\alpha - 1} (1 - a_i)^{\beta - 1}, \qquad i = 1, \ldots, I \qquad (6)
$$

where

$$
B(x, y) = \int_0^1 t^{x - 1} (1 - t)^{y - 1} \, dt \qquad (7)
$$

is the Beta function [10]. The Beta distribution is quite general and flexible: it has two positive shape parameters $\alpha, \beta$ which allow for a great variety of shapes, ranging from uniform to U-shaped, convex or concave, symmetric or skewed. Moreover, Bayesian estimation is usually found to be robust against deviations from the ideal choice of the prior distribution family. The mean of a Beta distribution is given by

$$
X \sim \mathrm{Beta}(\alpha, \beta) \;\Rightarrow\; E[X] = \frac{\alpha}{\alpha + \beta} \qquad (8)
$$

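The ML estimates of the hyperparameters are given by:

$$
\hat{\alpha}, \hat{\beta} \stackrel{\text{def}}{=} \arg\max_{\alpha, \beta} \; p(\mathbf{m} \mid \alpha, \beta) \qquad (9)
$$

where the marginal distribution $p(\mathbf{m} \mid \alpha, \beta)$ is obtained by marginalization:

$$
p(\mathbf{m} \mid \alpha, \beta) = \int_0^1 \cdots \int_0^1 p(\mathbf{m}, \mathbf{a} \mid \alpha, \beta) \, da_1 \cdots da_I . \qquad (10)
$$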

Due to independence, we can factorize the joint density function as follows:

$$
\begin{aligned}
p(\mathbf{m}, \mathbf{a} \mid \alpha, \beta) &= \prod_{i=1}^{I} p(m_i, a_i \mid \alpha, \beta) = \prod_{i=1}^{I} p(m_i \mid a_i, \alpha, \beta) \, p(a_i) \\
&= \prod_{i=1}^{I} \binom{n_i}{m_i} a_i^{m_i} (1 - a_i)^{n_i - m_i} \cdot \frac{1}{B(\alpha, \beta)} a_i^{\alpha - 1} (1 - a_i)^{\beta - 1}
\end{aligned}
$$

and recalling eq. (7) we can rewrite eq. (10) as follows:

$$
p(\mathbf{m} \mid \alpha, \beta) = \int_0^1 \cdots \int_0^1 \prod_{i=1}^{I} p(m_i, a_i \mid \alpha, \beta) \, da_i = \prod_{i=1}^{I} \binom{n_i}{m_i} \frac{B(\alpha + m_i, \beta + n_i - m_i)}{B(\alpha, \beta)} \qquad (11)
$$

i.e., the marginal distribution of $\mathbf{m}$ is the product of $I$ Beta-Binomial distributions $\mathrm{BetaBin}(n_i, \alpha, \beta)$. This expression can be used in eq. (9) to obtain the ML estimates of the hyperparameters. The solution cannot be expressed in closed form and must be obtained numerically. Notably, the function (11) is unimodal (see [13] for a proof) and its negative logarithm is convex, a consequence of the log-concavity of the likelihood function for Beta [12] [8]. Therefore standard numerical methods can be applied (e.g. the simplex method), which are likely to locate the optimum quickly and accurately.
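The ML step of eq. (9) can be sketched with the standard library alone. The numerical procedure actually used is not detailed in the paper, so the optimizer below is a swapped-in technique: a plain compass (pattern) search over $(\log\alpha, \log\beta)$, chosen only because it needs no external package. The function names, starting point, tolerances, search bounds and the synthetic data (with $a_i$ drawn from a Beta(2, 18), mean 0.1) are all assumptions.

```python
import math
import random

def log_beta(x, y):
    """Logarithm of the Beta function B(x, y) of eq. (7)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def neg_log_lik(alpha, beta, n, m):
    """Negative log of eq. (11); the binomial coefficients are dropped
    because they do not depend on alpha or beta."""
    lb = log_beta(alpha, beta)
    return -sum(log_beta(alpha + m_i, beta + n_i - m_i) - lb
                for n_i, m_i in zip(n, m))

def fit_hyperparameters(n, m, start=(1.0, 1.0), step=1.0, tol=1e-4,
                        max_evals=20000):
    """Compass search over (log alpha, log beta): an illustrative stand-in,
    not the procedure used by the authors."""
    u, v = math.log(start[0]), math.log(start[1])
    best = neg_log_lik(math.exp(u), math.exp(v), n, m)
    evals = 0
    while step > tol and evals < max_evals:
        improved = False
        for du, dv in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            if abs(u + du) > 30.0 or abs(v + dv) > 30.0:
                continue                      # keep exp() in a safe range
            cand = neg_log_lik(math.exp(u + du), math.exp(v + dv), n, m)
            evals += 1
            if cand < best:
                u, v, best, improved = u + du, v + dv, cand, True
        if not improved:
            step /= 2.0                       # refine the search mesh
    return math.exp(u), math.exp(v)

# Synthetic check with hypothetical data: a_i ~ Beta(2, 18), mean 0.1.
rng = random.Random(1)
n = [rng.randint(1, 50) for _ in range(500)]
a = [rng.betavariate(2.0, 18.0) for _ in n]
m = [sum(1 for _ in range(n_i) if rng.random() < a_i) for n_i, a_i in zip(n, a)]
alpha_hat, beta_hat = fit_hyperparameters(n, m)
```

Searching in log-space keeps both hyperparameters positive without explicit constraints. On real data a dedicated optimizer would likely be faster.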

Given the estimates of the hyperparameters, the posterior distribution

$$
p(a_i \mid m_i, \hat{\alpha}, \hat{\beta}) = \frac{p(m_i, a_i \mid \hat{\alpha}, \hat{\beta})}{p(m_i \mid \hat{\alpha}, \hat{\beta})}
$$

is obtained in a similar way, and as expected is a Beta distribution (conjugate prior):

$$
a_i \mid m_i, \hat{\alpha}, \hat{\beta} \sim \mathrm{Beta}(\hat{\alpha} + m_i, \hat{\beta} + n_i - m_i) \qquad (12)
$$

The (empirical) Bayes estimator for the generic $a_i$ is then derived in the classical way by minimizing a Bayes risk. Two of the most common criteria are MAP and MMSE.

The MAP estimator maximizes the a posteriori probability:

$$
\hat{a}_i^{\text{MAP}} \stackrel{\text{def}}{=} \arg\max_{a_i} \; p(a_i \mid m_i, \hat{\alpha}, \hat{\beta}) = \frac{\hat{\alpha} + m_i - 1}{\hat{\alpha} + \hat{\beta} + n_i - 2} \qquad (13)
$$

for $i = 1 \ldots I$, as can be easily verified by taking the derivative of $\log p(a_i \mid m_i, \hat{\alpha}, \hat{\beta})$ (ref. eqs. (6) and (12)). From the denominator of eq. (13) we observe that the MAP estimator may lead to inconsistent values for small $n_i$. This is a problem in network applications, where typically a considerable fraction of users generate only very few requests. Therefore the MAP estimator is not well suited for our purposes.
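A short numerical check illustrates the inconsistency of eq. (13) for low-traffic users. The hyperparameter values below are hypothetical (they are not fitted to any dataset), and `map_estimate` is our name for the closed-form expression.

```python
def map_estimate(m_i, n_i, alpha, beta):
    """Closed-form expression of eq. (13) for one user."""
    return (alpha + m_i - 1.0) / (alpha + beta + n_i - 2.0)

alpha, beta = 0.5, 5.0          # hypothetical hyperparameter estimates

low_traffic = map_estimate(0, 1, alpha, beta)        # one request, no failure
high_traffic = map_estimate(50, 1000, alpha, beta)   # many requests

print(low_traffic)    # negative: not a valid probability
print(high_traffic)   # close to the empirical ratio 50/1000
```

With these values the single-request user gets $(0.5 - 1)/(5.5 + 1 - 2) < 0$, whereas for the heavy user the prior terms are negligible and the expression stays near $m_i / n_i$.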

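The MMSE estimator minimizes the mean squared error, and coincides with the conditional mean [5]. Recalling eq. (8) we can write:

$$
\hat{a}_i^{\text{MMSE}} \stackrel{\text{def}}{=} E\big[a_i \,\big|\, m_i, \hat{\alpha}, \hat{\beta}\big] = \frac{\hat{\alpha} + m_i}{\hat{\alpha} + \hat{\beta} + n_i}
$$

for $i = 1 \ldots I$, and by taking the arithmetic mean we obtain the following estimator:

$$
Q_{\text{MMSE}} \stackrel{\text{def}}{=} \frac{1}{I} \sum_{i=1}^{I} \hat{a}_i^{\text{MMSE}} = \frac{1}{I} \sum_{i=1}^{I} \frac{\hat{\alpha} + m_i}{\hat{\alpha} + \hat{\beta} + n_i}. \qquad (14)
$$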

An alternative approach is to estimate $\bar{a}$ directly from eq. (8) with the ML estimates of the hyperparameters:

$$
Q_{\text{HYP}} \stackrel{\text{def}}{=} \frac{\hat{\alpha}}{\hat{\alpha} + \hat{\beta}} \qquad (15)
$$

where the subscript “HYP” indicates that the estimator is obtained solely from the hyperparameters' estimates, without the posterior information $\mathbf{m}$.

Note that in both cases $\hat{\alpha}, \hat{\beta}$ are estimated from the data vectors $\mathbf{m}, \mathbf{n}$. The difference between the two estimators is the following: $Q_{\text{MMSE}}$ is suboptimal, because it heuristically adopts an arithmetic mean, but uses the posterior information $\mathbf{m}$ for both parameters and hyperparameters, while $Q_{\text{HYP}}$ is optimal in the ML sense but uses the posterior information only for estimating the hyperparameters and not $\mathbf{a}$. Despite this difference, we found that $Q_{\text{MMSE}}$ and $Q_{\text{HYP}}$ always lead to extremely similar values when applied to our real datasets, as discussed later in §5.
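Once the hyperparameter estimates are available, eqs. (14) and (15) are one-liners. In the sketch below the function names and all numeric values are hypothetical; the hyperparameters are simply assumed to have been estimated beforehand.

```python
def q_mmse(n, m, alpha, beta):
    """Eq. (14): arithmetic mean of the per-user posterior means."""
    return sum((alpha + m_i) / (alpha + beta + n_i)
               for n_i, m_i in zip(n, m)) / len(n)

def q_hyp(alpha, beta):
    """Eq. (15): mean of the fitted Beta prior, from the hyperparameters only."""
    return alpha / (alpha + beta)

alpha_hat, beta_hat = 2.0, 18.0      # hypothetical ML estimates
n = [1, 5, 100]                      # hypothetical request counts
m = [0, 1, 12]                       # hypothetical failure counts

per_user = [(alpha_hat + m_i) / (alpha_hat + beta_hat + n_i)
            for n_i, m_i in zip(n, m)]
print(per_user, q_mmse(n, m, alpha_hat, beta_hat), q_hyp(alpha_hat, beta_hat))
```

The per-user posterior means show the shrinkage at work: the user with a single successful request has $r_i = 0$ but a posterior mean of $2/21$, close to the prior mean, while the user with 100 requests stays close to its own empirical ratio.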

Finally we remark that both estimators require a numerical procedure to be computed. However, the shape of the objective function (unimodal, convex) allows the use of fast and accurate numerical methods; for the sake of space we do not provide further details of the implemented numerical procedure here. Nonetheless, the computational gap with $S_{\text{EPWR}}$ remains large: in a MATLAB simulation with $I = 10^5$ users, the computation time of $Q_{\text{MMSE}}$ and $Q_{\text{HYP}}$ is about 10 seconds on a standard computer (Core2 Duo 2 GHz), against a few milliseconds for $S_{\text{EPWR}}$.

[Figure 1: diagram with the labels GGSN, Monitoring Probe, Radio Access Network (RAN) and Core Network (CN).]

Fig. 1. Reference measurement scenario.

5 Numerical Results from Real Datasets

In the previous sections we have presented three sub-optimal estimators that are very simple to implement and to compute ($S_{\text{EGR}}$, $S_{\text{EMR}}$ and $S_{\text{EPWR}}$) and derived two Bayesian estimators ($Q_{\text{HYP}}$ and $Q_{\text{MMSE}}$) which are more complex but optimal given the problem model at hand. Hereafter we compare their performance on a real dataset.

Our dataset is based on measurements collected from an operational 3G cellular network by the METAWIN system [7]. The measurement setting is sketched in Fig. 1: a passive monitor located on the Gn interface near the GGSN observes the TCP traffic in both directions (for more details on the 3GPP network architecture refer e.g. to [2]). We aim at revealing congestion and/or other performance glitches in the network section between the monitoring point and the mobile terminals from the observation of TCP handshaking packets between mobile clients and Internet-side servers. To this purpose we collect two datasets: DATA:INV and DATA:RTT.

In DATA:INV we count for each mobile station $i$ all SYNACK packets flowing in downlink (variable $n_i$) as well as the number of them which failed to be unambiguously associated with a corresponding uplink ACK (variable $m_i$). Such a definition includes those cases where either the SYNACK or the corresponding ACK was lost in the network section from the monitoring point to the mobile terminal, but also other cases not necessarily related to loss events. For example, when two or more identical SYNACKs are observed but only one ACK (between the same end-points), the SYNACK-to-ACK association remains ambiguous, i.e. it is not possible to decide which one of the SYNACKs triggered the ACK³. In other words, the SYNACK packet in downlink denotes a “request” event, and the presence [resp. lack] of an unambiguously corresponding ACK packet in uplink denotes the “success” [resp. “failure”] event.

³ In [3] we showed that such cases are not infrequent in GPRS/UMTS networks due to the presence of “early retransmitter” servers, with the initial retransmission timeout set to sub-second values of the same order as the Round Trip Time (RTT) in these networks.

[Figure 2: four time-series panels (EGR, EMR, EPWR, MMSE); x-axis: hours after Day 1, 00:00 (0 to 144); y-axis ticks from 0.02 to 0.07.]

Fig. 2. Estimated mean failure probability for DATA:INV dataset (missing or ambiguous SYNACK/ACK associations).

A second dataset, DATA:RTT, was obtained from the (semi-)RTT measurements: for each correctly (and unambiguously) acknowledged SYNACK, we measure the client-side RTT, i.e. the elapsed time between the timestamps of the SYNACK and the corresponding ACK. For this dataset, we mark a “failure” event when the RTT exceeds a fixed threshold $T$: we set $T = 0.5$ sec and $T = 1.5$ sec respectively for UMTS/HSPA and GPRS/EDGE. For more details about the measurement setting see [3].
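The DATA:RTT failure rule can be sketched as a per-user counting step that produces the pairs $(n_i, m_i)$ used by all the estimators. The record layout (user identifier, semi-RTT in seconds), the function name and the sample values are hypothetical; the actual METAWIN data format is not described here.

```python
from collections import Counter

def count_rtt_failures(samples, threshold_s):
    """Return {user: (n_i, m_i)} from (user, semi-RTT in seconds) records.

    A record counts as a failure when its RTT exceeds the threshold T.
    """
    n = Counter()
    m = Counter()
    for user, rtt_s in samples:
        n[user] += 1                 # every acknowledged SYNACK is a request
        if rtt_s > threshold_s:
            m[user] += 1             # late ACK: failure
    return {user: (n[user], m[user]) for user in n}

# Hypothetical records for two UMTS/HSPA users, with T = 0.5 s as in the text.
samples = [("u1", 0.12), ("u1", 0.75), ("u2", 0.30), ("u2", 0.31), ("u2", 1.90)]
counts = count_rtt_failures(samples, threshold_s=0.5)
print(counts)
```

A separate threshold (1.5 s in the text) would be passed for GPRS/EDGE users, which is why the threshold is kept as a parameter.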

The rationale for extracting such measurements is that congestion in the downlink path towards the terminals (e.g. at some SGSN or RNC) would be expected to result in an increase of downlink packet loss and/or delay, and therefore should be reflected in an increase of the “failure” rate in DATA:RTT and/or DATA:INV. Note that serious connectivity problems in the Radio Access Network might impede even the transmission of uplink SYN packets, which would translate into a reduction of SYNACKs, i.e. we would observe missing rather than unacknowledged SYNACKs. While in principle such events could be revealed by monitoring the absolute number of SYNs and corresponding SYNACKs, this aspect is left outside the scope of this work.

The measurements are binned in 5-minute intervals. For this work we consider a measurement period of one week collected in August 2010. The analysis was conducted separately for UMTS/HSPA and GPRS/EDGE users, but for the sake of space we report only results for UMTS/HSPA. The time-series computed with different estimators are reported in Fig. 2 and Fig. 3 respectively for the DATA:INV and DATA:RTT datasets. We found that $Q_{\text{HYP}}$ and $Q_{\text{MMSE}}$ always lead to almost identical results, with only negligible differences in a few timebins. For this reason and for the sake of space we report only the time-series of $Q_{\text{MMSE}}$ and skip the graphs for $Q_{\text{HYP}}$.

[Figure 3: four time-series panels (EGR, EMR, EPWR, MMSE); x-axis: hours after Day 1, 00:00 (0 to 144); y-axis ticks from 0.04 to 0.12.]

Fig. 3. Estimated mean failure probability for DATA:RTT dataset (unambiguous SYNACK/ACK pairs with semi-RTT exceeding 500 ms).

From Fig. 2 and Fig. 3 we first observe that SEGR exhibits larger fluctuations (highervariance) and occasionally large spikes due to the sporadic presence of heavy users(high ni) with many failures4 (high mi). Second, both SEMR and SEPWR perform quiteclose to the optimal reference QMMSE. To dig further, we resorted to the inspection ofthe scatterplots 〈SEMR, QMMSE〉 and 〈SEPWR, QMMSE〉 (not shown here) which revealed thatSEPWR correlates slightly better than SEMR to the reference QMMSE values: in DATA:INVthe Pearson the correlation coefficient is 0.997 and 0.954 respectively for 〈SEMR, QMMSE〉and 〈SEPWR, QMMSE〉, while in DATA:RTT SEMR exhibits a larger bias.

6 Conclusions

So-called Key Performance Indicators (KPIs) play an important role in the operation of real mobile networks, as they provide a synthetic view of the network-wide status and quality. A large class of network events can be modeled as binary REQUESTS associated to USERS (e.g. origin or destination entity) and having one of two possible outcomes, i.e. SUCCESS or FAILURE. For these, a very popular KPI is the ratio of the total number of failures to the total number of requests, regardless of per-user associations. We have shown that such a simplistic KPI, referred to as EGR in this work, suffers from the presence of heavy users. The problem is of practical relevance in real networks, where the distribution of requests across users is often heavy-tailed. In some cases, the variability of EGR makes it useless for any practical exploitation.

⁴ The “spikes” in DATA:INV are due to mobile terminals receiving a very high rate of SYNACKs to which they do not respond: they are likely involved in TCP scanning or SYN flooding.

To overcome the limitations of EGR, network operators should adopt more robust KPIs based on separate counts of successes and failures per individual user. The problem is then how to make the best possible use of such data.
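To illustrate why per-user counts matter, the sketch below contrasts the global ratio (total failures over total requests, i.e. EGR as described in the text) with two per-user alternatives on a synthetic population containing one heavy user. The function names, the synthetic counts, and in particular the capped weighting `min(n_i, cap)` are assumptions introduced for illustration; the capped variant is not claimed to be the EPWR definition.

```python
def global_ratio(counts):
    """Total failures divided by total requests (EGR in the text)."""
    total_n = sum(n for n, _ in counts)
    total_m = sum(m for _, m in counts)
    if total_n == 0:
        raise ValueError("no requests observed")
    return total_m / total_n


def mean_of_user_ratios(counts):
    """Unweighted average of the per-user ratios m_i / n_i."""
    ratios = [m / n for n, m in counts if n > 0]
    return sum(ratios) / len(ratios)


def capped_weight_ratio(counts, cap):
    """Average of m_i / n_i weighted by min(n_i, cap).

    Hypothetical weighting with a free parameter `cap`, shown only to
    illustrate how a heavy user's influence can be limited.
    """
    active = [(n, m) for n, m in counts if n > 0]
    weights = [min(n, cap) for n, _ in active]
    weighted = sum(w * m / n for w, (n, m) in zip(weights, active))
    return weighted / sum(weights)


# Synthetic (n_i, m_i) pairs: 99 light users with 10 requests and 1 failure
# each, plus one heavy user with 10000 requests and 9000 failures.
counts = [(10, 1)] * 99 + [(10_000, 9_000)]

egr = global_ratio(counts)                      # dominated by the heavy user
mean_ratio = mean_of_user_ratios(counts)        # every user counts equally
capped = capped_weight_ratio(counts, cap=20)    # heavy user's weight is capped
```

In this synthetic case the global ratio is about 0.83 even though 99 out of 100 users experience a failure ratio of 0.1, whereas both per-user averages stay close to 0.11.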

We have introduced a system model that motivates the adoption of the Mean Failure Probability (MFP) as a natural KPI. Since the MFP is unknown, it must be estimated from the observed data. We have shown that EGR can be considered an unbiased estimator for the MFP, but not the optimal one. In a previous work we had derived a more robust estimator, namely EPWR, which is very simple to implement and involves a free parameter to be tuned heuristically. In this paper we have formalized the problem in terms of Bayesian estimation, deriving two Bayesian estimators which are provably optimal (in two different senses) given the system model. The analysis of two real datasets from an operational 3G mobile network has confirmed that all the proposed estimators are considerably more stable than EGR, and that EPWR performs very close to the optimal reference provided by the Bayesian solution.

Our estimators can be adopted by network operators and/or equipment vendors as robust KPIs. Owing to the generality of the system model, and to the abstract definition of the notions of “request”, “failure” and “user” therein, the concepts and estimators proposed in this work can be applied to a wide range of different measurements, in communication networks and other application domains.

References

1. A. COLUCCIA, F. RICCIATO, P. ROMIRER: On Robust Estimation of Network-wide Packet Loss in 3G Cellular Networks, 5th IEEE Broadband Wireless Access Workshop (BWA’09), Honolulu, Nov. 2009.

2. H. KAARANEN et al.: UMTS Networks: Architecture, Mobility and Services, 2nd ed., Wiley, 2005.

3. PETER ROMIRER et al.: Network-Wide Measurements of TCP RTT in 3G, Proc. of TMA’09 workshop, Aachen, LNCS vol. 5537, May 2009.

4. A. D’ALCONZO, A. COLUCCIA, F. RICCIATO, P. ROMIRER: A Distribution-Based Approach to Anomaly Detection for 3G Mobile Network, IEEE GLOBECOM ’09.

5. E.L. LEHMANN, G. CASELLA: Theory of Point Estimation, Springer, 1998.

6. K.M. WOLTER: Introduction to Variance Estimation, Springer Series in Statistics, 2007.

7. METAWIN and DARWIN projects, http://userver.ftw.at/~ricciato/darwin/

8. T. P. MINKA: Estimating a Dirichlet distribution, Microsoft Technical Report, 2003.

9. H. ROBBINS: An Empirical Bayes Approach to Statistics, Proc. Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, Univ. of California Press, 1956, pp. 157–163.

10. M. ABRAMOWITZ, I. A. STEGUN: Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, New York, 1972.

11. A. YEREDOR: The Joint MAP-ML Criterion and its Relation to ML and to Extended Least-Squares, IEEE Trans. on Signal Processing, vol. 48, no. 12, Dec. 2000.

12. E. I. GEORGE, U. E. MAKOV, A. F. M. SMITH: Conjugate Likelihood Distributions, Scandinavian Journal of Statistics, vol. 20, no. 2, 1993, pp. 147–156.

13. B. LEVIN, J. REEDS: Compound Multinomial Likelihood functions are unimodal: proof of a conjecture of I. J. Good, The Annals of Statistics, vol. 5, no. 1, 1977, pp. 79–87.
