
Security Metrics and Security Investment∗

Rainer Böhme, Tyler Moore

September 9, 2013

Security investment modeling, the academic term for finding the right security budget, is the oldest problem studied at the intersection of information security and economics or, more precisely, information security and management. Quite naturally, it is deeply entangled with the question of measuring security in quantitative terms. Both problems have attracted the attention of researchers from accounting, business administration, economics, and computer science.

This chapter develops an exemplary security investment model that contains the most relevant features of the models in the literature but is intentionally kept as simple as possible. The core model and its notation will be reused in later chapters when security investment by individual parties is part of larger economic models. The chapter is structured in two main sections. Section 1 takes the accounting perspective, which is agnostic about technology and reduces it to monetary amounts. This perspective is most useful when defending a security budget in front of decision makers who have otherwise incomparable projects competing for the same budget. Section 2 specializes general investment models to the specifics of the domain. This perspective is most useful when allocating a given security budget among several alternative technical or organizational protection measures.

1 How Much to Invest?

Security investment is at the core of information security management. Management refers to decision making within an organization. Following the standard model of management science, the decision makers are assumed to make rational decisions in their organization’s best interest. The standard model of information security considers lawful organizations, which consequently find themselves in the position of the defender. Therefore, security investment models aim to help a rational defender optimize her protection efforts with regard to the organization’s overall objective function.

The need for protection implies that the organization operates in an adversarial environment. So the next question is how to define the threat model. We follow the majority of the security investment literature and assume an exogenous attacker whose behavior is modeled by a fixed probabilistic rule known to the defender. Note that this simplification neglects the key aspect of our security notion, namely that we defend against intelligent malice acting strategically. With this simplification, there is no difference between security investment and investments for safety, where probabilistic rules are more realistic (cf. Chapter ??, Section ??). Nevertheless, a probabilistic rule seems to be a reasonable approximation for many organizations exposed to indiscriminate attackers who do not bother to adjust their behavior in response to an individual organization’s actions. The assumption of exogenous attacks is clearly less appropriate for larger organizations threatened by targeted attackers. Hence, we shall challenge this assumption in Chapter ?? when the method of game theory has been introduced.

∗Please note that this is a working draft. Feedback is much appreciated. Please [email protected] with any comments.

The broader context of security investment is decision theory, the scientific approach to comparing different alternatives and choosing the most favorable one. This raises the need to quantify security and ideally map it to a monetary scale. While, in principle, measuring security levels on an ordinal scale is sufficient to establish comparisons, the fact that decisions are taken under uncertainty (accounting for realizations of attacks) requires cardinal scales to do basic probability calculus. Monetary scales are needed to anchor security investment options against alternatives that are unrelated to security.

Investment theory distinguishes between an ex ante and an ex post perspective. The ex ante perspective is relevant for planning. It compares hypothetical investment alternatives to select the best action. The ex post perspective is relevant for controlling. It reviews actions taken in the past to identify potential decision errors (and avoid repeating them), or to analyze the decision making process. Both perspectives apply to information security, so we point out differences where needed.

As for any other investment decision, it is useful to decompose the contribution of security investments into two components, cost of security and benefit of security. If both are measured on monetary scales, it is easy to calculate the balance (benefit minus cost) and conclude whether a security investment is worthwhile. However, it is far more difficult to obtain reliable figures for cost and benefit. Therefore, we devote a separate section to each.

1.1 Cost of Security

The high-level view on cost of security is to define a choice variable that connects spending to realized security.

Definition 1. (Cost of security, security level) The cost of security c is the amount spent to reach a security level s. No security investment (c = 0) implies s = 0, and for any c > 0, s increases monotonically in c.

The security level is an abstract concept of a latent (i.e., not directly observable) metric summarizing the security achieved by a bundle of diverse protection measures. Because, for most organizations, the number of possible security measures is large and the contribution of every measure to overall security is small, it seems justified to regard the security level s as a continuous variable.

The monotonicity assumption in Definition 1 assumes only a weak notion of effectiveness for security investments (spending more cannot make things worse). It is often convenient to strengthen this assumption by requiring that security investment is always effective. Then, the security level can be expressed on the same scale as costs, thereby eliminating one variable from the model.

Definition 2. (Effective security investment) If security investment is effective, the security level can be approximated by the cost of security, i.e., s ≈ c.

Of course, this simplification is permissible only if the effectiveness of security investments is beyond question. In practice, we often observe the contrary, leading to the paradox where organizations overspend on the cost side but underinvest in the actual security they get for their money. Therefore, it is important to clarify the perspective in debates, and to keep in mind that equating both quantities is a simplification only justified in the absence of more precise information.

Note that the cost of security is a deterministic variable in principle. Firms spend a particular amount of money on security. Consequently, unlike other quantities on the benefit side, the cost of security does not depend on the randomness used to model attacker behavior. However, measurement errors such as noise and demarcation issues from other costs may prevent us from determining c exactly in a given organization. In the following, we discuss relevant breakdowns of the cost of security.

1.1.1 Direct vs indirect

Summing up the expenses for the acquisition, deployment, and maintenance of security technology gives a lower bound of the cost of security. Yet this reflects only the direct cost. Some security measures have non-negligible indirect cost, such as time lost due to forgotten passwords after mandatory changes, the inconvenience of transferring data between security zones, or incompatibilities of security mechanisms slowing down business processes. Moreover, if security measures enforce confidentiality in an organization, some business decisions might have to be taken less informed and reach suboptimal outcomes compared to the case with full information. All these opportunity costs are indirect costs and add to the total cost of security.

1.1.2 Fixed vs variable

It is sometimes useful to express the cost of security as a function of the size of an organization’s core business. Fixed costs are independent of the activity in the core business whereas variable costs grow monotonically (but not always proportionally) with the size of the business. While it is often sufficient to assume fixed cost of security, the cost of distributing security tokens to customers or indirect costs because of delayed business processes are clearly variable costs. Furthermore, as a practical matter, many security budgets are allocated as a proportion of the IT budget, which is typically tied to a firm’s size.

1.1.3 One-off vs recurring

If a security investment amortizes over a time horizon with multiple periods, one can distinguish one-off from recurring (i.e., per-period) costs of security. The acquisition and deployment of protection measures is typically a one-off cost, whereas maintenance and most indirect costs are recurring. Whenever costs are distributed over several periods, effects of time-dependent discounting and non-linearities due to taxation can be considered. This is common practice in general investment theory. However, given the pace of development and the short-term nature of most security investments, the errors introduced by ignoring these factors seem small compared to other sources of uncertainty and do not always justify complicating the analysis further.

1.1.4 Sunk vs recoverable

Again for the case of multiple periods, it can be useful to consider sunk costs, which cannot be recovered when decommissioning protection measures. Most security equipment (e.g., firewall devices) can be sold (at a discount) or repurposed (e.g., as routers), and staff transferred or fired. But other expenses, e.g., for training, configuration, or the distribution of security tokens to customers, are sunk.

All these categories can be combined, leading to a complicated cost matrix. Depending on the size of the investment and the information available, this level of detail may be justified in practice. The models in this book remain simpler to allow a clear view on the core mechanism under examination. Unless otherwise stated, c is the total cost of security, which is assumed to be fixed. Since we consider one period only, the distinctions regarding the time horizon do not matter.

1.2 Benefit of Security

The benefit of security is given by a reduction of losses incurred in the absence of security. Recall that losses due to lack of security are not deterministic. They depend on the state of the world, notably attacker behavior, which is not observable at the time when a security investment decision is made. In this sense, security investment means spending a fixed amount at present in the anticipation of uncertain future savings.

We need some probability calculus to formally deal with the uncertainty about realized losses.

Definition 3. (Loss distribution function) Let $L_s : \mathbb{R}^+ \to [0, 1]$ be the family of probability distribution functions describing the monetary losses incurred from insecurity for a given security level s.


Using the anchor of Definition 1, $L_0$ is the loss distribution in the absence of security.

1.2.1 General treatment

The benefit of security is given by the difference between $L_0$ and $L_s$. However, as the loss is a random variable in both conditions, it is more convenient to use moment statistics as summaries. The expected value (first moment) is appropriate for risk-neutral decision makers. This leads us to the notion of loss expectancy.

Definition 4. (ALE) The annual loss expectancy $\mathrm{ALE}_s$ is the expected loss per period due to information security failures given security level s,

\[ \mathrm{ALE}_s = E(L_s) = \int_0^\infty x \cdot L_s(x) \, dx . \tag{1} \]

Note that the predicate annual in ALE suggests a multi-period view where time is discretized to units of one year, a typical planning horizon. We disregard this restriction to maintain generality, but we also prefer to use an established term, even if it is strictly inaccurate. In single-period settings, the reference to a period is irrelevant anyway.

Now it is straightforward to define metrics for the benefit of security.

Definition 5. (EBIS) The expected benefit of information security $\mathrm{EBIS}_s$ is the difference between the loss expectancy without security and the loss expectancy given security level s,

\begin{align}
\mathrm{EBIS}_s &= \mathrm{ALE}_0 - \mathrm{ALE}_s \tag{2} \\
&= E(L_0) - E(L_s) = \int_0^\infty x \cdot \bigl(L_0(x) - L_s(x)\bigr) \, dx . \tag{3}
\end{align}

EBIS quantifies the benefit of security without considering the cost to reach security level s. This is resolved in the next metric, which describes the net benefit.

Definition 6. (ENBIS) The expected net benefit of information security investment $\mathrm{ENBIS}_s$ is given by the expected benefit of information security minus the cost of the investment to reach security level s,

\[ \mathrm{ENBIS}_s = \mathrm{EBIS}_s - c = \mathrm{ALE}_0 - \mathrm{ALE}_s - c , \tag{4} \]

or, assuming effective security investment (Def. 2),

\[ \mathrm{ENBIS}_s = \mathrm{EBIS}_s - s . \tag{5} \]
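To make Definitions 4 to 6 concrete, the following minimal sketch evaluates Eq. (1) numerically and then applies Eqs. (2) and (4). The exponential loss densities, their means, and the cost of security are illustrative assumptions and do not come from the chapter; the helper names are hypothetical.

```python
import math

def exponential_density(mean):
    """Exponential loss density with the given mean (an illustrative assumption)."""
    rate = 1.0 / mean
    return lambda x: rate * math.exp(-rate * x)

def ale(density, upper=1000.0, steps=100_000):
    """Approximate Eq. (1) by the midpoint rule on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * density(x)
    return total * h

loss_without = exponential_density(mean=20.0)  # assumed L_0
loss_with = exponential_density(mean=8.0)      # assumed L_s
cost = 5.0                                     # assumed cost of security c

ale_0 = ale(loss_without)
ale_s = ale(loss_with)
ebis = ale_0 - ale_s   # Eq. (2)
enbis = ebis - cost    # Eq. (4)
print(f"ALE_0={ale_0:.2f} ALE_s={ale_s:.2f} EBIS={ebis:.2f} ENBIS={enbis:.2f}")
```

Because the expected value of an exponential distribution equals its mean, the numerical ALE values should be close to 20 and 8, which gives a quick plausibility check of the integration.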


A simple ex ante security investment rule is to require that the ENBIS of a planned security measure be positive.

Before we advance to additional investment rules in Section 1.3, let us make two short detours. The first one introduces a common simplification and the second one discusses the canonical extension for decision makers who are not risk neutral.

1.2.2 Bernoulli loss assumption

Dealing with families of continuous loss distribution functions L can be inconvenient, and it does not seem justified in analytical models where we lack information (or defendable assumptions) about the shape of the loss distribution. As a remedy, the support of L can be reduced to two elements {0, λ}, so that λ > 0 is a fixed loss amount incurred with probability $p_s = L_s(\lambda)$. With probability $1 - p_s = L_s(0)$, the organization suffers no loss.

Then, it is convenient to set λ = 1 and rescale all monetary quantities in an analytical model to the unit loss. This turns the loss distribution into a Bernoulli random variable with a single parameter $p_s$, $p_s < p_0$, and it simplifies the expressions for ALE, EBIS, and ENBIS considerably:

\begin{align}
\mathrm{ALE}_s &= p_s , \tag{6} \\
\mathrm{EBIS}_s &= p_0 - p_s , \text{ and} \tag{7} \\
\mathrm{ENBIS}_s &= p_0 - p_s - s , \tag{8}
\end{align}

where s in the last equation is rescaled to the unit loss; hence s < 1 is a necessary condition for a strictly positive ENBIS.
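The Bernoulli simplification of Eqs. (6) to (8) can be sketched in a few lines. The function name and the probabilities below are hypothetical and chosen only for illustration; all amounts are expressed in units of the loss.

```python
def bernoulli_metrics(p0, ps, s):
    """ALE, EBIS and ENBIS for a unit loss, following Eqs. (6) to (8)."""
    if not (0.0 <= ps <= p0 <= 1.0):
        raise ValueError("expected 0 <= ps <= p0 <= 1")
    ale = ps               # Eq. (6)
    ebis = p0 - ps         # Eq. (7)
    enbis = p0 - ps - s    # Eq. (8)
    return ale, ebis, enbis

# Assumed example: the investment s = 0.2 lowers the loss probability from 0.4 to 0.1.
ale, ebis, enbis = bernoulli_metrics(p0=0.4, ps=0.1, s=0.2)
print(ale, ebis, enbis)

# Even a perfect defense (p0 = 1, ps = 0) cannot pay off once s reaches the unit loss.
_, _, enbis_at_unit_cost = bernoulli_metrics(p0=1.0, ps=0.0, s=1.0)
print(enbis_at_unit_cost)
```

The second call illustrates why s < 1 is necessary for a strictly positive ENBIS: the benefit can never exceed the unit loss itself.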

1.2.3 Incorporating attitude towards risk

Following the convention in economics, we model different attitudes towards risk with a non-linear transformation of monetary outcomes to a utility space. Risk aversion is reflected in a concave utility function $U : \mathbb{R} \to \mathbb{R}$, $U' \geq 0$, $U'' \leq 0$, whereas risk-seeking requires a convex utility function, $U'' \geq 0$ (see Chapter ??, Section ??).

One important caveat is the order in which expectation operators and utility transformations are applied. Rational decision makers maximize expected utility. For non-neutral attitudes towards risk, this expectation has to be taken over all random variables in the utility space. Therefore, it is not possible to calculate summary measures over random variables, such as ALE, and then apply the utility function. Instead, we have to decompose the calculations and thereby also lose the convenience of rearranging the cost and benefit sides independently.

Starting with the general treatment of ENBIS in Eq. (4),

\begin{align}
\mathrm{ENBIS}_s &= \mathrm{ALE}_0 - \mathrm{ALE}_s - c = E(L_0) - E(L_s) - c \tag{9} \\
&= \int_0^\infty x \cdot L_0(x) \, dx - \int_0^\infty x \cdot L_s(x) \, dx - c , \tag{10}
\end{align}

we pull the cost of security inside the expression for the loss given security level s,

\[ \mathrm{ENBIS}_s = \int_0^\infty x \cdot L_0(x) \, dx - \int_0^\infty (x + c) \cdot L_s(x) \, dx , \tag{11} \]

and can now apply the utility transformation to obtain a risk-adjusted variant of ENBIS, which we call ENUBIS (expected net utility benefit of information security):

\[ \mathrm{ENUBIS}_s = - \underbrace{\int_0^\infty U(-x) \cdot L_0(x) \, dx}_{\text{expected utility without security investment}} + \underbrace{\int_0^\infty U(-x - c) \cdot L_s(x) \, dx}_{\text{expected utility with security investment}} . \tag{12} \]

The sign flips are necessary because the summands on the right-hand side of Eq. (11) are losses, but the utility function expects positive outcomes as its argument and is not sign invariant. Depending on the class of utility function in use, an additive offset for the organization’s initial wealth can be added. For notational clarity we assume that this offset is included in U. In this generality, Eq. (12) cannot be simplified further.

Here it becomes evident that simplifying assumptions like the Bernoulli loss assumption (Sect. 1.2.2) are handy to keep things tractable. A utility-transformed version of the simplified ENBIS for a unit loss in Eq. (8) follows from simple algebra:

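\begin{align}
\mathrm{ENBIS}_s &= p_0 - p_s - s \tag{13} \\
&= \underbrace{\bigl(p_0 \cdot \lambda + (1 - p_0) \cdot 0\bigr)}_{E(L_0)} - \underbrace{\bigl(p_s \cdot \lambda + (1 - p_s) \cdot 0\bigr)}_{E(L_s)} - s \tag{14}
\end{align}

(Observe that the integrals of Eq. (10) simplify to sums, which we can expand.)

\[ \mathrm{ENBIS}_s = \bigl(p_0 \cdot 1 + (1 - p_0) \cdot 0\bigr) - \bigl(p_s \cdot (1 + s) + (1 - p_s) \cdot s\bigr) . \tag{15} \]

\[ \mathrm{ENUBIS}_s = -\bigl(p_0 U(-1) + (1 - p_0) U(0)\bigr) + \bigl(p_s U(-1 - s) + (1 - p_s) U(-s)\bigr) . \tag{16} \]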

Only four values of the utility function are needed to calculate this expression for any probability of attack. We shall build on this model in Chapter ?? where we study the interplay between security investment and cyber-insurance.
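The following sketch evaluates Eq. (16) for a risk-neutral and a risk-averse decision maker. The exponential (constant absolute risk aversion) utility function, its coefficient a = 2, and the probabilities are assumptions made for illustration only; the chapter does not prescribe a particular utility function.

```python
import math

def enubis(p0, ps, s, utility):
    """Eq. (16): expected utility with investment minus expected utility without it."""
    without = p0 * utility(-1.0) + (1.0 - p0) * utility(0.0)
    with_investment = ps * utility(-1.0 - s) + (1.0 - ps) * utility(-s)
    return with_investment - without

def risk_neutral(x):
    """Identity utility; Eq. (16) then reduces to the ENBIS of Eq. (8)."""
    return x

def risk_averse(x, a=2.0):
    """Concave exponential utility; the coefficient a is an assumption."""
    return (1.0 - math.exp(-a * x)) / a

p0, ps = 0.4, 0.1  # assumed loss probabilities without and with investment

# A cheap measure (s = 0.2) is attractive under both attitudes towards risk.
cheap_neutral = enubis(p0, ps, 0.2, risk_neutral)
cheap_averse = enubis(p0, ps, 0.2, risk_averse)

# A pricier measure (s = 0.35) has negative ENBIS, yet positive ENUBIS
# for the risk-averse decision maker under these assumptions.
pricey_neutral = enubis(p0, ps, 0.35, risk_neutral)
pricey_averse = enubis(p0, ps, 0.35, risk_averse)
print(cheap_neutral, cheap_averse, pricey_neutral, pricey_averse)
```

Note that utility values are not monetary amounts, so only the sign of ENUBIS, not its magnitude, is comparable across the two utility functions.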

1.3 High-level Investment Metrics

High-level investment metrics connect the cost of security with the benefit of security. The expected net benefit of information security investment, ENBIS, already includes variables from both sides. However, accountants never seem to tire of inventing new metrics, and therefore it is useful to know the most popular ones. All metrics have in common that they draw on general investment theory and therefore may be understood by decision makers beyond the security domain. In this sense, the metrics presented here can be viewed primarily as communication tools for security engineers when talking to businesspeople.

Definition 7. (ROSI) The return on information security investment $\mathrm{ROSI}_s$ is the ratio of the expected net benefit over the cost of security,

\[ \mathrm{ROSI}_s = \frac{\mathrm{ENBIS}_s}{c} = \frac{\mathrm{ALE}_0 - \mathrm{ALE}_s - c}{c} . \tag{17} \]

It is common to report ROSI in percentage terms (of security spending).

The normalization of ENBIS in ROSI makes it possible to compare the efficiency of security investments independent of the scale of the investment. This is useful for comparisons between heterogeneous divisions or organizations. Note that ROSI is not defined in cases where the rational decision is to refrain from security investments (risk acceptance, see below), because the denominator is zero in this case. Looking at ROSI alone can also be misleading if very cheap security measures that mitigate some risk are compared to costly and more comprehensive measures. In this case, the cheap alternative can appear more attractive according to ROSI, but the metric hides that substantial portions of the risk remain unmitigated.

The following two high-level metrics aim to capture medium and long-term amortization of security measures and therefore require looking at multiple periods. Costs and benefits are indexed by discrete time steps t ∈ {0, 1, 2, ...}, where by convention t = 0 is the point in time of the initial investment; t = 1 accumulates all recurring costs and prevented losses in the first period after the investment, and so forth.

The idea behind the net present value is to discount future costs and benefits.

Definition 8. (NPV) The net present value $\mathrm{NPV}_s$ aggregates the expected net benefit of information security over multiple future periods into a monetary equivalent at present,

\[ \mathrm{NPV}_s = -c_0 + \sum_{t=1}^{\infty} \frac{\mathrm{ALE}_{0,t} - \mathrm{ALE}_{s,t} - c_t}{(1 + r)^t} , \tag{18} \]

where

• $c_0$ is the one-off cost of security at t = 0,

• $c_t$ are recurring costs of security in period t (if any),

• $\mathrm{ALE}_{s,t}$ is the loss expectancy for period t given security level s, and

• r is the discount rate.


Observe that Eq. (18) sums over an infinite time horizon. To avoid dealing with the asymptotes in practice, the sum can be cut off after the last period $t_{\max}$ in which costs or benefits occur, or if the exponential growth of the denominator renders additional terms negligible.
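The effect of truncating Eq. (18) can be illustrated with a short sketch. It assumes a constant net benefit per period, for which the infinite sum is a geometric series with value b / r; the numbers and the function name are illustrative assumptions.

```python
def npv(c0, net_benefits, r):
    """Eq. (18) truncated after len(net_benefits) periods.

    net_benefits[t - 1] stands for ALE_{0,t} - ALE_{s,t} - c_t in period t.
    """
    return -c0 + sum(b / (1.0 + r) ** t for t, b in enumerate(net_benefits, start=1))

# Assumed values: one-off cost 15, constant net benefit 2 per period, discount rate 5%.
c0, b, r = 15.0, 2.0, 0.05

# With constant flows, the infinite sum is a geometric series equal to b / r.
npv_infinite = -c0 + b / r

# Truncated sums approach the limit as the horizon grows.
truncated = {t_max: npv(c0, [b] * t_max, r) for t_max in (10, 50, 200, 500)}
for t_max, value in truncated.items():
    print(t_max, round(value, 4), round(npv_infinite - value, 4))
```

The gap to the infinite-horizon value shrinks geometrically with the horizon, which is why distant periods can often be ignored for realistic discount rates.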

The discount rate r > 0 is set to the rate of return of an alternative investment of similar risk. Therefore, the discount must be interpreted as opportunity cost of capital, i.e., how much return on capital (e.g., interest) an organization forgoes by spending money on security measures rather than investing it elsewhere. Unlike in other models with a discount factor (e.g., certain types of repeated games), NPV does not account for increasing uncertainty about the realization of distant payoffs.

In practice, it may be difficult to find the right value for r, so many organizations use ad hoc assumptions, for example 5%. Security managers are advised to compare their assumption with the ones made for alternative investment projects, in particular if NPV is used to justify a security budget.

A different way to look at this is to solve Eq. (18) for r (in general, this requires numerical methods) and then use the discount rate at which NPV equals zero as an indicator.

Definition 9. (IRR) The internal rate of return $\mathrm{IRR}_s$ is the discount rate $r^*$ at which a decision maker using NPV as a sole criterion is indifferent between making the security investment or not, i.e., $\mathrm{NPV}_s = 0$.

A simple ex ante investment rule is to require that the IRR of a planned security measure is higher than the cost of capital. Like ROSI, the IRR metric suffers a disconnect from the actual size of the investment. Therefore, it should not be used as a sole criterion for prioritizing mutually exclusive security measures.

Let us clarify the relation between NPV and IRR by example. Suppose an online merchant dealing with a large number of customer credit card numbers considers two security investment alternatives to better protect against data losses. Option 1 is a technical data loss prevention (DLP) system; think of it as a filter that checks outgoing traffic for text sequences that might be credit card numbers and automatically blocks those messages. Option 2 is a regular security awareness training for all staff potentially dealing with credit card numbers, to be repeated on an annual basis as employees come and go, or just forget. Table 1 summarizes the estimated cost and benefit for both options.

For convenience, we set $t_{\max} = 10$ for all calculations. This is consistent with, say, management expecting the online merchant’s current business model to be viable for the next ten years. Let us first look at the expected net benefit if we naively sum up costs and benefits over all ten years. Applying Eq. (4) to $t_{\max}$ periods with constant per-period values gives

\[ \mathrm{ENBIS}_s = t_{\max} \cdot (\mathrm{ALE}_0 - \mathrm{ALE}_s - c_t) - c_0 . \tag{19} \]


Table 1: Cost and benefit estimates for two hypothetical security investments

| Variable | Description | Option 1 (data loss prevention): Est. | Option 1: Remark | Option 2 (user training): Est. | Option 2: Remark |
|---|---|---|---|---|---|
| $c_0$ | Initial investment | 15 K | License and deployment | 6 K | Training material |
| $c_t$ | Recurring cost per year | 1 K | Maintenance, opportunity cost of false positives | 3 K | Fee and lost work time |
| $\mathrm{ALE}_0$ | w/o security investment | 5 K | 20 K legal settlement, probability 25% | 5 K | 20 K legal settlement, probability 25% |
| $\mathrm{ALE}_s$ | with security investment | 2 K | False negatives | 1 K | Residual risk (lapses etc.) |

For option 1 (DLP), we obtain

\[ \mathrm{ENBIS}_s^{(1)} = 10 \cdot (5 - 2 - 1)\,\mathrm{K} - 15\,\mathrm{K} = 5\,\mathrm{K} . \tag{20} \]

For option 2 (user training), we obtain

\[ \mathrm{ENBIS}_s^{(2)} = 10 \cdot (5 - 1 - 3)\,\mathrm{K} - 6\,\mathrm{K} = 4\,\mathrm{K} . \tag{21} \]

Disregarding the timing of costs and benefits suggests that data loss prevention promises a higher net benefit and therefore is the more attractive investment option. (We leave it to the reader to calculate ROSI metrics, but their message is the same.)
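The undiscounted comparison, including the ROSI exercise, can be sketched as follows. The ENBIS function follows Eq. (19) with the estimates of Table 1. Taking the total undiscounted spending $c_0 + t_{\max} \cdot c_t$ as the cost in the ROSI denominator is our assumption; the chapter defines ROSI only for a single cost c.

```python
T_MAX = 10  # planning horizon in years, as in the text

# Estimates from Table 1, in thousands (K)
OPTIONS = {
    "data loss prevention": {"c0": 15.0, "ct": 1.0, "ale_0": 5.0, "ale_s": 2.0},
    "user training": {"c0": 6.0, "ct": 3.0, "ale_0": 5.0, "ale_s": 1.0},
}

def enbis_undiscounted(c0, ct, ale_0, ale_s, t_max=T_MAX):
    """Eq. (19): naive sum of costs and benefits over t_max periods."""
    return t_max * (ale_0 - ale_s - ct) - c0

def rosi_undiscounted(c0, ct, ale_0, ale_s, t_max=T_MAX):
    """Eq. (17) with total spending c0 + t_max * ct as the cost (an assumption)."""
    total_cost = c0 + t_max * ct
    return enbis_undiscounted(c0, ct, ale_0, ale_s, t_max) / total_cost

results = {
    name: (enbis_undiscounted(**est), rosi_undiscounted(**est))
    for name, est in OPTIONS.items()
}
for name, (enbis, rosi) in results.items():
    print(f"{name}: ENBIS = {enbis:.1f} K, ROSI = {rosi:.1%}")
```

Under this reading of the cost term, data loss prevention leads on both ENBIS and ROSI, in line with the remark that the two metrics carry the same message.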

The picture becomes more complete if we calculate metrics that account for the distribution of costs and benefits over multiple periods. Figure 1 shows the NPV for both investment options as a function of the assumed discount rate r. Let us first look at what NPV tells us for a typical ad hoc assumption of r = 5%. In this case, both security investment options have a positive NPV, meaning they are worthwhile. However, in this example, the net present value of training people (1.7 K) is much higher than that of buying technology (0.4 K). This is so because DLP has high upfront costs which amortize over time, whereas the bulk of security spending for training can be deferred and hence does not block valuable capital at present. The chart (Fig. 1) shows that the NPV of DLP exceeds that of training only if the cost of capital is very low (below 2%).
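The NPV figures quoted above can be reproduced with a small sketch that truncates Eq. (18) at ten periods and uses the estimates of Table 1 (in thousands). The bisection for the crossover rate is our own illustrative helper, not a method prescribed by the chapter.

```python
def npv(c0, net_benefit, r, t_max=10):
    """Eq. (18) truncated at t_max with a constant per-period net benefit."""
    return -c0 + sum(net_benefit / (1.0 + r) ** t for t in range(1, t_max + 1))

def npv_dlp(r):
    """Option 1: c0 = 15, ALE_0 - ALE_s - c_t = 5 - 2 - 1."""
    return npv(15.0, 5.0 - 2.0 - 1.0, r)

def npv_training(r):
    """Option 2: c0 = 6, ALE_0 - ALE_s - c_t = 5 - 1 - 3."""
    return npv(6.0, 5.0 - 1.0 - 3.0, r)

def crossover_rate(lo=0.0, hi=0.15, tol=1e-8):
    """Bisection on NPV(DLP) - NPV(training), positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if npv_dlp(mid) - npv_training(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(f"NPV at 5%: DLP = {npv_dlp(0.05):.2f} K, training = {npv_training(0.05):.2f} K")
print(f"DLP has the higher NPV only below r = {crossover_rate():.2%}")
```

The bisection relies on the NPV difference being decreasing in r, which holds here because DLP has the larger per-period net benefit and the larger upfront cost.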

Finally, Figure 1 illustrates the connection between NPV and IRR at the intersections of the curves with the abscissa. The internal rate of return for DLP is estimated at 5.6%. We calculate this by solving for r when we set the net present value of DLP to zero:

\[ \mathrm{NPV}_s = -15\,\mathrm{K} + \sum_{t=1}^{10} \frac{5\,\mathrm{K} - 2\,\mathrm{K} - 1\,\mathrm{K}}{(1 + r)^t} = 0 . \tag{22} \]


[Figure 1: plot of the net present value (NPV, axis from −2 K to 4 K) against the discount rate r (axis from 1% to 15%) for Option 1 (data loss prevention) and Option 2 (user training); the IRR of each option is marked on the abscissa.]

Figure 1: Amortization of security investments: Net present value of two investment options as a function of the discount rate assumption. (See Table 1 for the numerical assumptions.)

It turns out that the IRR of 5.6% is just above the ad hoc assumption. If, instead, the assumption were replaced by something on the order of 10% to more accurately reflect the risk of this type of investment, the security manager would have a harder time defending a DLP budget in front of the chief financial officer. By contrast, the training measure has an IRR of approximately 10.5%, suggesting that it would still be worthwhile if the organization had to borrow money at an interest rate of 10%. In summary, although user training is more costly and promises a lower net benefit than data loss prevention, user training is the more attractive investment option over a longer time horizon if capital is not abundant.
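Since Eq. (22) has no convenient closed-form solution for r, the IRR is found numerically. The sketch below uses plain bisection, which is one possible numerical method and an assumption on our part; the inputs are the estimates of Table 1 in thousands.

```python
def npv(c0, net_benefit, r, t_max=10):
    """Eq. (18) truncated at t_max with a constant per-period net benefit."""
    return -c0 + sum(net_benefit / (1.0 + r) ** t for t in range(1, t_max + 1))

def irr(c0, net_benefit, t_max=10, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection for the rate r* with NPV(r*) = 0.

    Assumes NPV(lo) > 0 > NPV(hi), which holds when NPV decreases in r.
    """
    if not (npv(c0, net_benefit, lo, t_max) > 0.0 > npv(c0, net_benefit, hi, t_max)):
        raise ValueError("no sign change of NPV on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if npv(c0, net_benefit, mid, t_max) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

irr_dlp = irr(15.0, 5.0 - 2.0 - 1.0)       # Option 1, cf. Eq. (22)
irr_training = irr(6.0, 5.0 - 1.0 - 3.0)   # Option 2
print(f"IRR DLP = {irr_dlp:.2%}, IRR training = {irr_training:.2%}")
```

The investment rule of comparing IRR against the cost of capital then becomes a simple comparison of these two rates with the assumed discount rate.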

This completes the discussion of high-level investment metrics. We note in passing that, unsurprisingly, more complicated approaches exist in the literature. The one to remember is real option theory, the adaptation of financial option pricing to real investments and, more specifically, security investments. The salient advantage of real options over all metrics discussed in this section is that real options do not prescribe a fixed sequence of costs and benefits over multiple periods. Instead, they explicitly account for the possibility of changing plans in future periods. In the context of plan changes, it also makes a difference whether costs of security are sunk or recoverable. As flexibility is key in a dynamic industry, security managers with serious budgets are advised to familiarize themselves with real options.


1.4 The Gordon–Loeb Model

In this section we review one of the first, and without doubt most prominent, security investment models in the literature, proposed by Lawrence Gordon and Martin Loeb in 2002 [10]. For consistency we translate the original model to our notation and terminology.

So far, we have discussed the cost and benefit side of security investments independently. For a given set of estimates and assumptions, we also know high-level metrics to answer the question: should we invest or not? However, what budget planners really want to know is: how much to invest? To answer this question, we need a model that relates security spending to benefits in a functional form.

1.4.1 Breach probability function

Gordon and Loeb devise a single-period model. They use the Bernoulli loss assumption (Sec. 1.2.2) and define a continuous and twice differentiable breach probability function $S : \mathbb{R}^+ \times [0, 1] \to [0, 1]$, which maps the security investment c and an exogenous vulnerability v ∈ [0, 1] to the probability p of incurring a loss of size λ.¹ The vulnerability v normalizes the range of S(c, v), so that

• a (hypothetical) organization without vulnerability (v = 0) is exposed to no risk regardless of its security investment: p = S(c, 0) = 0 for all c; and

• the vulnerability determines the probability of loss of an organization which does not invest in security: p = S(0, v) = v for all v.

Although the functional form of S is not further specified within these constraints, Gordon and Loeb study two classes of breach probability functions,

\[ S^{I}(c, v) = \frac{v}{(\alpha c + 1)^{\beta}} , \tag{23} \]

which is linear in v, and a somewhat simpler form,

\[ S^{II}(c, v) = v^{\alpha c + 1} . \tag{24} \]

The parameters α > 0 and β > 1 describe the security productivity, that is, they measure how efficiently the security investment reduces the probability of loss. Another way to interpret α ∈ (0, 1] is to think of it as a linear model relating the cost of security c to the security level s. Then, we can substitute s = α · c. In analytical models, parameters α and β are typically set by assumption, although it is often difficult to justify them. Therefore, we shall reduce the number of parameters to one in the baseline models presented in Sect. 1.5 and used throughout this book. If cross-sectional data are available, these parameters can be estimated or calibrated so that the resulting shape of S has a good fit with the data.
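Both classes of breach probability functions are easy to write down in code. The sketch below implements Eqs. (23) and (24) and checks the two boundary conditions S(c, 0) = 0 and S(0, v) = v; the function names and the default parameter values are assumptions chosen for illustration.

```python
def breach_prob_class_1(c, v, alpha=1.0, beta=2.0):
    """Eq. (23): S^I(c, v) = v / (alpha * c + 1) ** beta."""
    return v / (alpha * c + 1.0) ** beta

def breach_prob_class_2(c, v, alpha=1.0):
    """Eq. (24): S^II(c, v) = v ** (alpha * c + 1)."""
    return v ** (alpha * c + 1.0)

# Boundary conditions stated in the text, checked on a few sample points.
for S in (breach_prob_class_1, breach_prob_class_2):
    for c in (0.0, 0.5, 3.0):
        assert S(c, 0.0) == 0.0   # no vulnerability, no risk
    for v in (0.0, 0.25, 0.5, 1.0):
        assert S(0.0, v) == v     # no investment leaves p = v

# Example: one unit of investment at vulnerability v = 1/2.
p1 = breach_prob_class_1(1.0, 0.5)  # 0.5 / 2 ** 2
p2 = breach_prob_class_2(1.0, 0.5)  # 0.5 ** 2
print(p1, p2)
```

Note that the class-II function leaves a fully vulnerable organization (v = 1) at p = 1 for any investment, which is one way in which the two classes differ.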

¹ The original model includes one more variable which scales λ by the threat probability. We ignore this factor here because for risk-neutral decision makers, it does not add anything to the analysis except for a linear scaling of two exogenous parameters (loss and threat).


[Figure 2: plot of the probability of loss p (axis from 0 to 1, ticks at 1/4 and 1/2) against the security investment c (axis from 0 to 5), with curves for v = 1 and v = 1/2 and for β = 5/4 and β = 2.]

Figure 2: Gordon–Loeb breach probability function $S^{I}(c, v)$ for α = 1: larger values for the security productivity β translate to a more aggressive reduction of the probability of loss p.


Figure 2 plots the shape of $S^{I}$ as a function of c for different settings of v and β. Observe that v anchors the probability of loss in the absence of security investment. In other words, $S^{I}(0, v) = v$, which is reasonable because the exogenous vulnerability is defined as the probability of loss when no security investment has been made.

The second parameter β, then, controls the rate at which the probability of loss declines as the security investment c increases. Higher values of β reduce loss probability more aggressively as c increases. That is, higher β corresponds to more effective security investment. We fixed α = 1 in the figure, but it is an easy exercise for the reader to imagine (or plot) its linear scaling effect along the abscissa.

1.4.2 Decreasing marginal returns

It is a simple exercise to express the security metrics of Section 1.2 in terms of the Gordon–Loeb model. For EBIS, cf. Eq. (3), we obtain

\[ \mathrm{EBIS} = \lambda \, (v - S(c, v)) , \tag{25} \]

leading to the following expression for the expected net benefit, cf. Eq. (4),

\[ \mathrm{ENBIS} = \lambda \, (v - S(c, v)) - c . \tag{26} \]
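Equation (26) already hints at how the question "how much to invest?" can be approached: choose c to maximize ENBIS. The sketch below does this numerically for the class-I function of Eq. (23) and compares the result with the stationary point of the first-order condition, $c^* = ((\alpha \beta v \lambda)^{1/(\beta+1)} - 1)/\alpha$, which is our own derivation and not taken from the chapter at this point. All parameter values and function names are assumptions.

```python
def breach_prob(c, v, alpha, beta):
    """Class-I breach probability function, Eq. (23)."""
    return v / (alpha * c + 1.0) ** beta

def enbis(c, v, loss, alpha, beta):
    """Eq. (26): expected net benefit for investment c and loss size `loss`."""
    return loss * (v - breach_prob(c, v, alpha, beta)) - c

def argmax_concave(f, lo, hi, tol=1e-7):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# Assumed parameters: vulnerability 1/2, loss 100, alpha = 1, beta = 2.
v, loss, alpha, beta = 0.5, 100.0, 1.0, 2.0

# Investing more than the expected loss v * loss can never pay off,
# so the search interval is [0, v * loss].
c_numeric = argmax_concave(lambda c: enbis(c, v, loss, alpha, beta), 0.0, v * loss)

# Stationary point of Eq. (26) for S^I (our derivation), clipped at zero.
c_closed = max(0.0, ((alpha * beta * v * loss) ** (1.0 / (beta + 1.0)) - 1.0) / alpha)
print(round(c_numeric, 4), round(c_closed, 4), round(enbis(c_closed, v, loss, alpha, beta), 4))
```

The ternary search relies on ENBIS being concave in c, which holds for this choice of S; for other breach probability functions a different optimizer may be needed.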

An important formal assumption about the shape of S is that for all v ∈ (0, 1) and all c > 0, S is strictly decreasing and strictly convex in c, i.e.,

\[ \partial_c S(c, v) < 0 \quad \text{and} \quad \partial_{cc} S(c, v) > 0 . \tag{27} \]

(A word on notation: $\partial_c$ is the first partial derivative with respect to c, and $\partial_{cc}$ is the second partial derivative with respect to c.)

As a consequence of Eqs. (25) and (27), EBIS is strictly concave in c, as depicted in Figure 3. This concavity has significant implications for the relation between the cost and the effectiveness of security investment. As security investment increases, the loss probability decreases (i.e., the organization is more secure) and the expected benefit of information security increases. However, both change at decreasing rates.

To see this effect graphically, consider the two example increments annotated in Figure 3. A firm that has a low initial investment level $c_1$ benefits much more from an additional security investment ∆c than does a firm with a higher initial investment $c_2$. In mathematical terms, $\Delta\mathrm{EBIS}_1 > \Delta\mathrm{EBIS}_2$. In economic terms, when ∆c = 1 is a unit increment, ∆EBIS is called the marginal return. Purchasing each ∆EBIS unit of additional protection benefit becomes more expensive the higher the baseline security investment already is. Due to these rising costs, we say that security investment has decreasing marginal returns.
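The two increments can also be compared numerically. The sketch below uses Eq. (25) with the class-I function of Eq. (23); the parameter values (v = 1, λ = 1, α = 1, β = 2) and the investment levels are assumptions chosen only to illustrate the effect.

```python
def ebis(c, v=1.0, loss=1.0, alpha=1.0, beta=2.0):
    """Eq. (25) with the class-I breach probability function of Eq. (23)."""
    return loss * (v - v / (alpha * c + 1.0) ** beta)

def marginal_return(c, delta_c=1.0):
    """Additional expected benefit from raising the investment from c to c + delta_c."""
    return ebis(c + delta_c) - ebis(c)

c1, c2 = 0.5, 3.0  # assumed low and high initial investment levels
delta_ebis_1 = marginal_return(c1)
delta_ebis_2 = marginal_return(c2)
print(round(delta_ebis_1, 4), round(delta_ebis_2, 4))

# Marginal returns of successive unit increments starting at c = 0, 1, ..., 5.
increments = [marginal_return(float(c)) for c in range(6)]
print([round(x, 4) for x in increments])
```

Every further unit of investment buys less additional expected benefit than the previous one, which is the numerical counterpart of the concave EBIS curve.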

In the Gordon–Loeb model, decreasing marginal returns emerge from assumptions about S. A natural question one might ask, then, is how realistic the assumption is in practice. We offer two arguments to defend the assumption.


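[Figure 3: plot of the breach probability S(c, v), starting at v, and of EBIS, with λv marked on the vertical axis, against the security investment c; equal increments ∆c at the investment levels $c_1$ and $c_2$ are annotated with the resulting $\Delta\mathrm{EBIS}_1$ and $\Delta\mathrm{EBIS}_2$.]

Figure 3: Decreasing marginal returns of security investment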

15

The first argument justifies the concavity of expected benefits. Contrary to what is presented in the investment model, in the real world, an organization cannot acquire protection on a continuous scale. Instead, protection comes from many discrete countermeasures, each with a price tag and some expected benefit: a packet filter here, a password policy there, and so on. A rational defender with a limited security budget would first implement the measures with the best cost-benefit ratio. As a result, if one wants to increase security further, only the less efficient protection measures remain to be implemented. This way, decreasing marginal returns emerge naturally from the prioritization of discrete, independent and heterogeneous investment options.
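The prioritization argument can be illustrated with a small sketch. The countermeasures, costs and benefits below are invented for illustration only; the point is that ranking by benefit per unit of cost produces a cumulative benefit curve whose slope never increases.

```python
from itertools import accumulate

# Hypothetical countermeasures: (name, cost, expected benefit), same monetary unit.
measures = [
    ("packet filter", 10.0, 60.0),
    ("password policy", 5.0, 20.0),
    ("disk encryption", 20.0, 30.0),
    ("security training", 15.0, 45.0),
]

# A rational defender implements measures in order of decreasing benefit per unit of cost.
ranked = sorted(measures, key=lambda m: m[2] / m[1], reverse=True)

cumulative_cost = list(accumulate(m[1] for m in ranked))
cumulative_benefit = list(accumulate(m[2] for m in ranked))
ratios = [m[2] / m[1] for m in ranked]

for (name, _, _), cost, benefit, ratio in zip(ranked, cumulative_cost,
                                              cumulative_benefit, ratios):
    print(f"{name:18s} total cost {cost:5.1f}  total benefit {benefit:6.1f}  "
          f"benefit/cost {ratio:.1f}")
```

Each additional measure contributes less benefit per unit of cost than the one before it, so the piecewise-linear curve of cumulative benefit over cumulative cost is concave.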

The second argument justifies the convexity of costs. It is based on the observation that protection measures are not independent of each other. Combining several defenses is often more costly than the sum of their individual implementation efforts. This has technical and managerial reasons. Technically, we have to account for incompatibility (think of two desktop security suites fighting for the same hook in the operating system). The effort required to test the interplay between components, such as protection measures, increases super-linearly with the number of components. Moreover, people have to be in charge of maintaining and managing the protection measures. Organization science suggests that more and diverse measures add super-linear administrative costs, for instance to pay a manager who directs several employees, each an expert in a particular protection measure.

At the time of writing, the authors are not aware of empirical evidence supporting or challenging the hypothesis of decreasing marginal returns of security investment, although it is possible that something has been reported in the literature. We would be grateful for pointers in this regard.

1.4.3 Optimal security investment

If security investment is a continuous choice variable, as in this model, security managers might want to solve the model to find the optimal amount of security investment in order to set (or defend) a security budget for the organization.

The property of decreasing marginal returns gives us a simple criterion to solve the optimization problem and uniquely identify the security investment c∗ that maximizes the net benefit. A rational decision maker will keep increasing her security investment until the additional expected benefit equals the additional cost, that is, until the marginal return equals one, because both cost and benefit are measured on the same monetary scale. At this point, the decision maker is exactly indifferent between investing more or not.

Formally, we use the first-order condition (FOC),

δc EBIS(c∗) = 1 ⇔ c∗ = arg max_c ENBIS(c) . (28)

The left hand side is the marginal benefit and the right hand side is the marginal cost. Now insert Eq. (25) and simplify:

δc (λ (v − S(c∗, v))) = 1 (29)

δc (λv − λ S(c∗, v)) = 1 (30)

−λ δc S(c∗, v) = 1 . (31)

For c∗ > 0, this condition maximizes Eq. (26) because EBIS is concave.

Figure 4 visualizes this solution approach. It plots EBIS as a function of cost c, along with c on the main diagonal. ENBIS, the difference between EBIS and cost, has a unique maximum exactly at the point c∗ where the distance between EBIS and cost is maximal. This coincides with the gradient of EBIS (dashed line) being parallel to the cost diagonal, i.e., the first derivative is equal to one. Such a point can be found for any concave function that is bounded above and has a gradient at zero greater than one.

[Figure 4: Optimal information security investment c∗. The plot shows EBIS (bounded by λv), the 45° cost diagonal, and ENBIS = EBIS − c against c; max_c ENBIS is attained at c∗.]

If the gradient of EBIS at zero is one or lower, then the security productivity of the available protection measures is so bad that the organization is called indefensible. In this case, the information security risk cannot be mitigated and one has to resort to other risk management tools (see Section 1.6 below and Chapter ??).
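To make the first-order condition concrete, the following sketch solves Eq. (31) in closed form for one specific case. It assumes the first Gordon–Loeb family S(c, v) = v / (αc + 1)^β, for which −λ δc S = λvαβ / (αc + 1)^(β+1); the function names and parameter values are hypothetical. A brute-force grid search over ENBIS serves as a cross-check.

```python
def optimal_investment(loss, v, alpha, beta):
    """Closed-form c* solving -loss * dS/dc = 1 for S(c, v) = v / (alpha*c + 1)**beta.

    Returns 0.0 when the marginal benefit at c = 0 does not exceed one,
    i.e. in the case the text calls indefensible.
    """
    marginal_at_zero = loss * v * alpha * beta  # -loss * dS/dc evaluated at c = 0
    if marginal_at_zero <= 1.0:
        return 0.0
    return (marginal_at_zero ** (1.0 / (beta + 1.0)) - 1.0) / alpha


def enbis(c, loss, v, alpha, beta):
    """Expected net benefit, Eq. (26): loss * (v - S(c, v)) - c."""
    return loss * (v - v / (alpha * c + 1.0) ** beta) - c


# Hypothetical parameters.
params = dict(loss=100.0, v=0.5, alpha=1.0, beta=1.0)
c_star = optimal_investment(**params)

# Cross-check against a grid search over c in [0, 50] with step 0.001.
grid = [i / 1000.0 for i in range(50001)]
c_grid = max(grid, key=lambda c: enbis(c, **params))

print(f"closed form c* = {c_star:.3f}, grid search c* = {c_grid:.3f}")
print("indefensible example:", optimal_investment(loss=1.0, v=0.5, alpha=1.0, beta=1.0))
```

The second call illustrates the indefensible case: with a gradient of EBIS at zero of 0.5, no positive investment pays off.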

1.4.4 A security-investment rule of thumb

After defining their model, Gordon and Loeb in [10] investigate its sensitivity to the parameter v, the vulnerability of the organization (or information asset, in their terminology). They note that it can be very hard to estimate v in practice, leaving substantial uncertainty about the validity of ex ante predictions from their model. As a remedy, they propose a handy rule of thumb that is largely independent of the exact value of the unknown parameter v.

Definition 10. (Gordon–Loeb Rule): The optimal security investment c∗ is bounded from above by λ/e, where e is the base of the natural logarithm.

In plain English: never spend more than 37 % of your expected loss on security.

The bound can be tightened by linear scaling if v is known to be lower than 1. The Gordon–Loeb rule was initially proven for the two families of breach probability functions S^I and S^II, and its generalization was conjectured. Follow-up research [12, 19, 1] presented counter-examples and finally established precise conditions for the result: if S is log-convex, then the λ/e rule follows from the Lyapunov convexity theorem. It remains to be seen whether this mathematical result will be as appreciated by practitioners as the original Gordon–Loeb rule has been.
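The rule can be checked numerically for a concrete family. The sketch below again assumes the first Gordon–Loeb family S(c, v) = v / (αc + 1)^β and scans a hypothetical parameter grid; it is a sanity check on that family only, not a proof and not a statement about other breach probability functions.

```python
import math


def optimal_investment(loss, v, alpha, beta):
    """c* for the assumed form S(c, v) = v / (alpha*c + 1)**beta (0.0 if indefensible)."""
    marginal_at_zero = loss * v * alpha * beta
    if marginal_at_zero <= 1.0:
        return 0.0
    return (marginal_at_zero ** (1.0 / (beta + 1.0)) - 1.0) / alpha


# Scan a hypothetical grid and record the largest share of the expected loss
# v * loss that the optimal investment ever consumes.
worst_ratio = 0.0
for loss in (1.0, 10.0, 100.0, 1000.0):
    for v in (0.25, 0.5, 1.0):
        for alpha in (0.01, 0.1, 1.0, 10.0):
            for beta in (0.5, 1.0, 2.0, 5.0, 20.0):
                c_star = optimal_investment(loss, v, alpha, beta)
                worst_ratio = max(worst_ratio, c_star / (v * loss))

print(f"largest c*/(v*loss) on the grid: {worst_ratio:.4f}  (1/e = {1 / math.e:.4f})")
```

On this grid the ratio stays below 1/e, in line with the bound including its linear scaling in v.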

1.5 Baseline Models

The Gordon–Loeb model is well established in the literature and certainly a good choice if the main focus of an analysis is security investment. This book covers a range of topics where security investment of more than one player is only part of a larger analysis. In these cases, it is reasonable to simplify the security investment equations to keep the models as a whole analytically tractable, and the number of parameters manageable.

First, let us combine the simplifications discussed above, such as the Bernoulli loss assumption (Sect. 1.2.2), and effective security investment (Def. 2), that is c = λs ⇒ c = s for a unit loss λ = 1. This allows us to specify the breach probability function in terms of vulnerability v and security level s.

1.5.1 Linear breach probability function

The simplest possible breach probability function is linear in the security level,

S(s, v) = v · (1− s) for s ∈ [0, 1]. (32)

A disadvantage of this form is the absence of decreasing marginal returns, which is not very realistic and makes the model prone to analytical corner solutions, i.e., s ∈ {0, 1}. If this is not a serious limitation, we can reduce the action space to two elements, secure (s = 1) and insecure (s = 0). The associated outcome distributions are given in Table 2.

Table 2: Two-state model: corner cases of the linear baseline model

State      Security s = c/λ   Probability of loss p   Expected loss E(λ)
Insecure   0                  v                       λv
Secure     1                  0                       0

We shall use this two-state model in Chapter ?? to study security games where security is a hybrid between a public and private good.

1.5.2 Exponential breach probability function

In cases where corner solutions are not acceptable and decreasing marginal returns are required, we will use a breach probability function of the form,

S(s, v) = v β^{−s} , (33)

where β > 1 is the security productivity. As in the Gordon–Loeb framework, S(s, 0) = 0 for all s and S(0, v) = v for all v. Figure 5 plots the breach probability function of Equation (33) for selected parameters. If the model in Eq. (33) is deemed to still have too many parameters, one can consider fixing the vulnerability at v = 1. This assumption can be justified by anecdotal evidence that connecting a PC to the Internet without any protection leads to an attack with certainty.


For any v ∈ [0, 1], the optimal security investment s∗ ≥ 0 can be obtained from the first-order condition of ENBIS,

δs (v − S(s, v) − s) = 0 (34)

δs (v − v β^{−s} − s) = 0 , (35)

which has an analytical solution for v > 0:

s = log(v log(β)) / log(β) . (36)

As the right hand side of Equation (36) is negative for β ∈ (1, e^{1/v}), we require

s∗ = max { log(v log(β)) / log(β), 0 } . (37)

We can interpret the interval β ∈ (1, e^{1/v}) as reflecting circumstances where the security productivity of the available technology is too low for a given vulnerability v to justify any investment. This concurs with the marginal return of security investment at s = 0 being less than one. Consequently, the organization is deemed indefensible.
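Eq. (37) and the indefensibility threshold can be evaluated directly. The following sketch uses a unit loss as in this section; the function names and the chosen values of v and β are hypothetical.

```python
import math


def optimal_security_level(v, beta):
    """Eq. (37): s* = max(log(v*log(beta)) / log(beta), 0) for v in [0, 1], beta > 1."""
    if v <= 0.0:
        return 0.0  # nothing to protect against: S(s, 0) = 0 for all s
    log_beta = math.log(beta)
    return max(math.log(v * log_beta) / log_beta, 0.0)


def enbis(s, v, beta):
    """Expected net benefit for unit loss: v - v*beta**(-s) - s, cf. Eq. (35)."""
    return v - v * beta ** (-s) - s


v = 0.5
threshold = math.exp(1.0 / v)  # for beta below this value, no investment is justified
for beta in (2.0, threshold, 20.0, 200.0):
    s_star = optimal_security_level(v, beta)
    print(f"beta = {beta:7.2f}  s* = {s_star:.4f}  ENBIS(s*) = {enbis(s_star, v, beta):.4f}")
```

For β below e^{1/v} the function returns s∗ = 0, matching the indefensible range; above it, the returned level is the interior solution of the first-order condition.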

Figure 6 plots the optimal security investment s∗ as a function of the security productivity β for three different assumptions of vulnerability v. The indefensible ranges of β are visible where s∗ = 0. Observe that s∗ has a unique maximum and decreases as the security productivity rises further. This can be seen in the graph for v = 1, but it is also true for other values of v. We leave it as an exercise to the reader to find the unique maximum for v = 1/2, 1/4. In fact,

lim_{β→∞} log(v log(β)) / log(β) = 0 . (38)

What are the implications? Well, if this model is valid, it suggests to security vendors that, all else held equal, their revenue might fall if their technology becomes too productive (in terms of improving the level of security). Finally, note that λ/e is an upper bound for v = 1, thereby confirming the Gordon–Loeb rule (Definition 10).
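One way to approach the exercise on the maximum of s∗, sketched here under the model's own assumptions (unit loss, v > 0, β > 1), is to substitute x = log β in Eq. (36) and set the derivative to zero. The substitution variable x is introduced only for this illustration.

```latex
s(x) = \frac{\log(v x)}{x}, \qquad x = \log\beta > 0 \\
\frac{\mathrm{d}s}{\mathrm{d}x} = \frac{1 - \log(v x)}{x^{2}} = 0
  \;\Longrightarrow\; v x = e
  \;\Longrightarrow\; \beta = e^{e/v} \\
s^{*}_{\max} = \frac{\log e}{e/v} = \frac{v}{e}
```

The stationary point lies above the indefensibility threshold, since e/v > 1/v. For v = 1 the maximum equals 1/e, which is the upper bound λ/e for a unit loss; for smaller v it scales down linearly, consistent with the tightened bound mentioned after Definition 10.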

As a final remark, the two-state model of Table 2 can also be seen as a limit of the exponential breach probability function, Eq. (33), where security investment is constrained to a binary action space s ∈ {0, 1} and security productivity β = ∞.

The models just presented reflect an accounting approach to managing information security. We next turn to the risk management perspective and explain how information security fits that widely used terminology.

1.6 Information Security Risk Management

With the increasing dependence of organizations on information and information technology, the borderline between security investment and general risk management has begun to blur. Therefore, it is useful to briefly review the terminology of risk management.

Risk management is embedded in a three-stage process, comprising ex ante risk analysis, the actual risk management task, and ex post risk monitoring. Risk analysis is often subdivided into two tasks, risk identification and risk quantification. Risk monitoring mainly concerns the continuous validation of assumptions and the documentation of outcomes (whether in good or bad state) to collect data for future risk analyses.

Once risks are identified and quantified, they need to be “managed”. This means the organization has to make a conscious decision on how to deal with the risks it is exposed to. There are four canonical responses.

1.6.1 Risk acceptance

Doing nothing is one option, typically referred to as risk acceptance. In practice, risk acceptance is appropriate in two situations:

• First, if the worst-case loss is small enough to be paid from proceeds or reserves. Depending on the ownership and legal status of an organization, risk acceptance may require building up or setting aside capital reserves.²

• Second, if the probability of occurrence is smaller than other business risks that threaten the organization’s survival.

This explains why there is no one-size-fits-all approach to information security. A small start-up whose probability of surviving the next six months is on the order of 50% is likely to regard some information security risks differently than, say, an established brand with a reputation to lose. The important lesson to remember is that managing information security risks requires periodic adjustment as the business transforms and exposures change. For example, it was completely rational for Facebook not to worry about data breaches back in 2006, but that would be an unwise approach today. Of course, the challenge for any growing organization is to find the right point in time to update an earlier decision.

1.6.2 Risk mitigation

If a risk is too big and probable to be accepted, risk mitigation tries to reduce the probability and severity of a loss event with protection measures. This option is exactly where security investment finds its place in risk management. Risk mitigation is efficient whenever the expected benefit of security exceeds the cost of security. Recall from the security investment models that the optimal security investment typically does not mitigate the risk completely. Therefore, other risk management options (including risk acceptance) may be necessary to deal with the residual risk, i.e., the part of the risk a given protection technology cannot efficiently mitigate. In cases where the organization is indefensible, all risk has to be managed like residual risk.

² This is why risk acceptance is sometimes called self-insurance. Organizations build a capital buffer like an insurance company, just for themselves. Unfortunately, the term self-insurance has been overloaded in the economics of information security literature. Hence, we use risk acceptance.

1.6.3 Risk avoidance

The third risk management response is risk avoidance. Like risk mitigation, it tries to reduce the severity of loss events, but with different means. A risk is avoided by stopping a risky activity, thereby incurring the opportunity cost of lost business. For example, an online shop could avoid the risk of fraudulent orders from abroad by not accepting overseas customers. Here the opportunity cost is easy to see. In another example, risk avoidance could mean disconnecting a customer database from the Internet to avoid the risk of data breaches. In this case, the opportunity cost is business lost to customers who switch to a competitor that offers the convenience of online database access.

1.6.4 Risk transfer

The final option is risk transfer, a contractual agreement with a third party to compensate the organization for losses incurred due to the realization of risk. The third party can be an insurance company who pools the risks of many insureds and counts on the law of large numbers predicting that not all risks will realize at the same time. Other constructions are possible as well, including financial markets or organizations with different attitude towards risk as counterparts. A full account of risk transfer exceeds the scope of this chapter (because it affects more than one organization and requires methodological prerequisites to be introduced in the following chapters). We come back to this topic and devote all of Chapter ?? to the ramifications of cyber-risk transfer.

Figure 7 summarizes the risk management terminology graphically.

1.7 Measuring the Security Level

Most analytical security investment models use a breach probability function to map the cost of security directly to the benefit of security, both measured on monetary scales. However, in practice it can be convenient to split this process into two independent mappings. First, the cost of security is mapped to the security level. Second, the security level is mapped to the benefit of security. Figure 8 illustrates this approach.

Splitting the breach probability function brings several conceptual advantages. For one, the first mapping (cost to level) is deterministic, and only the second one (level to benefit) is probabilistic for the uncertainty about attacker behavior. Figure 8 visualizes this by the area shaded in gray. More importantly, the first mapping is defined by the available protection technology, whereas the second mapping only depends on the organization’s risk exposure. Consequently, the separation facilitates benchmarking exercises. One reason why there is so little empirical evidence for security investment models in the Gordon–Loeb style is the difficulty of finding bases of comparison. To directly validate a breach probability function mapping cost to benefit, one would have to identify many companies choosing between the same protection technologies while exposed to the same kinds of risk. (Not to mention that they all must choose the same mix of risk management options.) With a two-step mapping, by contrast, we can compare the efficiency of security technology across various organizations largely independently of their risk exposure. In principle, we can compare the security productivity of the first mapping between an online bank and an online flower shop, who dispose of the same security technology, for example the SSL protocol. Nevertheless, the exposure to risk, and hence the benefits each organization gains from the same security level, varies substantially. This is reflected in the curvature (or slope in Fig. 8, for simplicity) of the second mapping. In practice, unfortunately, empirical data has been hard to come by even if a two-step mapping were used.

Observe that the two-step mapping in Figure 8 exhibits the different effects of two risk management options. Risk mitigation refers to the amount of security investment leading to a specific security level. Its effect materializes in positive benefits of security. Risk avoidance, by contrast, reduces the benefit of security for a fixed security level. This is so because risk avoidance tries to reduce the organization’s exposure to risk, hence any given protection technology promises less benefit. As the cost of security is not directly affected by risk avoidance, the optimal security investment may decline, too. For example, a bank whose customers primarily transact at local branches may spend less on information security than does an online-only bank. This highlights again that risk mitigation and risk avoidance have to be planned jointly; without a visual example, this insight extends to all four risk management options (see Sect. 1.6 above).

A remaining problem is how to measure the security level. We have seen that both the costs and benefits of security are hard to measure accurately, mainly because of separation issues and measurement noise. Yet in principle, both are observable monetary quantities. By contrast, we have introduced the security level as a not directly observable (i.e., latent) metric in the context of Definition 1. This implies that we must resort to indirect measurement or estimation with the help of observable security indicators.

Definition 11. (Security indicator) A security indicator is an observable signal conveying information about the security level.

Note that this definition is quite general. It does not require that the signal is measurable on a specific scale. “Conveying information” should be understood in an information-theoretic sense, meaning that the mutual information between a security indicator and the security level is strictly positive. The indicator in no way determines the security level. Rather, the information of many different security indicators can be aggregated to an estimate of the unobservable true security level. Fortunately, security indicators can usually be measured with greater ease than direct metrics such as the costs and benefits of security.


Typical security indicators are count data or ratios of observable technical or organizational events; for example: number of security incidents, mean-time between security incidents, mean-time to incident recovery, mean-time to patch, percent of configuration compliance. These and other indicators including more precise definitions are documented in a report by the Center for Internet Security [7], a non-profit organization advocating cyber-security.

Note that the aforementioned organization and other literature use the term “security metrics” broadly for what is more accurately deemed indicators. Few of the indicators fulfill the mathematical properties of a metric. Another conceptual critique is that security metrics in the colloquial sense are often mixed with measures of risk, threat, or cost of security. Our stance is to make the reader aware of this terminological ambiguity and stick to the stricter convention that indicators convey information about an underlying metric without requiring a specific scale level in their measurement model. This is well in line with the use of the term “indicator” in management science (e.g., key performance indicators) and the term “metric” in statistics, economics, and behavioral sciences (e.g., psychometrics). The metrics introduced at the start of this chapter (EBIS, etc.) are metrics in this sense of the word.

The list of proposals for security indicators is long and it is beyond the scope of this book to go into details. Julisch [15] suggested a structure of indicators by their input type, broadly separating between indicators analyzing the design process of a system, indicators analyzing its operation, and indicators analyzing maintenance and update processes. This level of granularity is already too technical for someone who takes the high-level accounting perspective on defining a security budget, but closer to the more domain-specific question on how to allocate a given budget strategically. We will present economic models to answer these questions in the next section, right after the box on measurement challenges. We close this subsection by recapitulating its bottom line, namely that a two-step security production function gives security indicators a natural place in security investment models.

Difficulty of estimating rare and extreme events

Box 1. Whether model or back-of-the-envelope calculation, the parameter values underpinning an investment decision should ideally not be drawn out of thin air, but rather be estimated from observations of reality. A specific difficulty arises in the realm of information security, because many attacks are actually rare events. This box outlines the consequences, first for the estimation of the probability of loss p, and then for the loss amount λ.

Estimating the probability of loss

According to the law of large numbers, relative frequencies converge to probabilities if the number of observations grows large. Leaving aside the problem of attacks going unnoticed, if we observe 10 attacks in 20 years, we can set p = 50% per year with some confidence. More precisely, the true p lies in the interval [30%, 70%] with 95% probability. So, to be on the safe side, we can calculate scenarios, multiplying our loss amount, say λ = US$ 1 million, with both ends of the interval and learn that in the 95%-worst case, we are 40% off. That’s unfortunate, but tolerable.

However, in practice, we deal with attacks which happen far less often, say on the order of p = 0.5%. Thus, we would have to wait on average 1/p = 200 years to see the first loss. This is about ten times longer than the existence of the commercial Internet. Even if we had observations of 200 periods and observed 1 attack (p = 0.5%), the 95% confidence interval would be [0.02%, 3%]. This means the pessimistic scenario is 600% off; clearly too much to come to any quantitative conclusion.
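The width of such intervals can be reproduced approximately with a short sketch. The text does not say which interval method underlies its figures, so the exact (Clopper–Pearson) binomial interval is swapped in here as one standard choice; the function names are hypothetical, and the results are close to, but not identical with, the rounded intervals quoted above.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def bisect(func, lo, hi, iterations=100):
    """Root of a decreasing function func on [lo, hi], found by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if func(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k, n, level=0.95):
    """Exact two-sided confidence interval for a binomial proportion."""
    tail = (1.0 - level) / 2.0
    # Lower bound solves P(X >= k | p) = tail, i.e. binom_cdf(k - 1) = 1 - tail.
    lower = 0.0 if k == 0 else bisect(
        lambda p: binom_cdf(k - 1, n, p) - (1.0 - tail), 0.0, 1.0)
    # Upper bound solves P(X <= k | p) = tail.
    upper = 1.0 if k == n else bisect(
        lambda p: binom_cdf(k, n, p) - tail, 0.0, 1.0)
    return lower, upper


for attacks, periods in ((10, 20), (1, 200)):
    low, high = clopper_pearson(attacks, periods)
    print(f"{attacks} attacks in {periods} periods: "
          f"95% interval [{low:.3%}, {high:.3%}]")
```

The relative width of the interval grows sharply in the rare-event case, which is the point made in the paragraph above.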

What can we do about it? A common but barely scientific remedy is to forget about observations from reality and set parameters by expert judgment. However, cognitive science has demonstrated over and over that humans are notoriously bad at estimating probabilities. A smarter way is to approximate attack probabilities with a set of assumptions about conditional probabilities that are easier to observe. For example, suppose a server can be compromised if there is an unpatched vulnerability in the backend and the attacker has broken a password to get around the perimeter. From patch records and security advisories we know that the backend is vulnerable 3 days per month, i.e., P(backend) = 10%. Moreover, server logs of login attempts indicate that a password compromise happens every two years and remains undetected for one week, i.e., P(perimeter) = 3.8%. Now we can calculate the joint probability p = P(backend ∧ perimeter) = 0.4%. This example combines two independent factors using the logical conjunction. The approach generalizes to k factors and to the so-called attack tree method if conjunction (and) and disjunction (or) can be combined in order to model risk arrival in a graph structure.
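The conjunction and disjunction of independent factors can be sketched in a few lines. The two probabilities for the backend and the perimeter are taken from the example above; the additional phishing path and its probability are hypothetical and only serve to show how an "or" node combines with the "and" node.

```python
from math import prod


def p_and(*probabilities):
    """Probability that all independent events occur (conjunction)."""
    return prod(probabilities)


def p_or(*probabilities):
    """Probability that at least one of several independent events occurs (disjunction)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


p_backend = 0.10     # from the text: backend vulnerable 3 days per month
p_perimeter = 0.038  # from the text: probability of a compromised password

p_server = p_and(p_backend, p_perimeter)
print(f"P(backend AND perimeter) = {p_server:.4f}")

# Hypothetical extension: a second, independent attack path via phishing.
p_phishing = 0.002
p_total = p_or(p_server, p_phishing)
print(f"P(any attack path)       = {p_total:.4f}")
```

Nesting `p_and` and `p_or` calls mirrors the structure of an attack tree, provided the independence assumption holds for the factors at each node.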

Other approaches include data sharing to accumulate observations over time and multiple entities. Chapter ?? will discuss when and why this can work or not.

Estimating loss amounts

Once we have found p, a second problem emerges: how big is the loss given an attack? If we are lucky and find data records on historical loss amounts, we notice that losses are not homogeneous, but follow a loss distribution L. All we know about L are data points, which are realizations of a random variable, as in this figure:

[Figure: loss density L(x) over loss amounts x from 0 to 4 million, showing the data points on loss events together with two candidate densities, one with an exponential tail and one with a Pareto tail.]

Typical questions to ask include: what is the mean loss amount λ? Or, what is the probability of facing a loss larger than λmax? To answer the first question, let’s take the empirical average of the five data points. It is 1.2 million. Using Chebyshev’s inequality in the absence of a distribution assumption, the true mean is in the interval [0.4, 2.0] million with 95% probability. It is desirable to narrow down these estimates by making some gentle assumptions about the loss distribution. For example, it is plausible (and visually supported by the data) that small losses are more likely than large ones. When choosing a distribution assumption for L to reflect this characteristic, the shape of its tails matters most, because this is where the large losses are. One can broadly distinguish between exponential tails, satisfying 1 − ∫_0^x L(z) dz ∝ e^{−x}, and Pareto tails. For example, the Gaussian and exponential distributions have exponential tails. Pareto tails satisfy 1 − ∫_0^x L(z) dz ∝ x^{−ν}, where ν is the tail coefficient. Smaller values of ν denote heavier tails and for ν → ∞, the distribution converges to exponential tails.

Unfortunately, it is almost impossible to identify the true shape of the tails from a small number of observations. Consider the data points (y1, y2, . . . , y5) in this example. The likelihood P(y | L) = ∏_i L(y_i) is exactly the same for L being a half-Gaussian distribution with mean one and standard deviation 1.3 (both in millions), or a half-Student-t distribution exhibiting Pareto tails with ν = 2. These two density functions are drawn in the figure. The ambiguity about the underlying distribution translates into uncertainty about the right parameter estimates. For example, estimating the mean loss with a half-Gaussian assumption yields 1.0 million, whereas the Student-t tail yields 1.4 million. This difference in the order of 16% widens substantially if we turn to out-of-sample estimates to gauge the probability of extreme events. Suppose we want to know the probability of incurring a loss of 4 million or larger. We obtain P(x > 4 million) = 0.1% for the exponential tail assumption, which is a negligible risk in many settings. By contrast, the probability jumps to almost 3% if we use the Pareto tail assumption. This illustrates the crux faced by security managers who want to apply quantitative rigor in practice.

We note that the discussion in this box has only considered estimating first-order properties of random variables. The situation is even worse when it comes to higher-order properties of multiple distributions, such as the correlation between two loss distributions driven by different but stochastically dependent sources of risk.

In summary, by and large, data availability and the difficulty to obtain accurate parameter estimates is the Achilles heel of security investment models in practice.

2 Where to Invest?

This section turns to questions of security investment which are more related to the optimal allocation of a given security budget than to defining or justifying a security budget against non-security investment alternatives. Therefore, the models and methods presented here are most appropriate for security managers who work on a tactical level and prefer quantitative rigor over gut feelings to guide their decision making.

But first we make a general remark about the state of the art in this space. As we become more domain-specific when constructing investment models, we leave the attention range of general accounting and business scholars. As a result, the number of established models is quite limited, and this sub-field is much more in development than general investment theory applied to security. Nevertheless, in the following subsection, we will review approaches to an important tactical security investment decision, namely optimal filter configuration.

Receiver operating characteristic (ROC)

Box 2. Informed decisions, whether made by humans or algorithms, are based on the aggregations of observations of reality, quantized to a discrete signal x ∈ S. In the simplest case, the signal is binary: for example, S = {0, 1} in communications, S = {undervalued, overvalued} for a stock trader, S = {reject, accept} for a hypothesis in research, and S = {benign, malicious} for a filter, such as an intrusion detection system (IDS).

In all these cases, the signal is prone to two types of errors when compared to reality, which defines the ground truth. This table illustrates the cases for an IDS.

Filter defense mechanism

             Reality
Signal       no attack   attack
benign       1 − α       β
malicious    α           1 − β

If a normal process is falsely classified as malicious, we call it error of type I and measure the associated false positive rate (or probability) by α ∈ [0, 1]. If the system fails to detect an attack, we call it error of type II and measure the associated false negative rate (or probability) by β ∈ [0, 1]. 1 − β is the detection rate and 1 − α is the correct rejection rate. As usual, rates are relative frequencies and converge to probabilities if the number of observations grows large.

We can make α arbitrarily small by programming the system to always signal ‘benign’, but this pushes β up to 100%; and vice versa. Neither of these two extremes delivers any useful information about reality, but they define the end points of a range of trade-offs between α and β. By convention, the set of feasible combinations of error rates is plotted on a curve, which is called receiver operating characteristic for its origin in communication signal reception.

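[Figure: ROC curves plotting the detection rate 1 − β against the false positive rate α, both ranging from 0 to 1, with the 45° main diagonal as reference.]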

In the figure, the dashed curve achieves fewer false negatives for any choice of the false positive rate than the solid curve. In general, ROC curves running to the left of and above alternative curves indicate better detection performance or more information about reality. A step function through the point (0, 1) (so that α = β = 0) indicates perfect information. By contrast, any point on the main diagonal is no better than random guessing. Therefore ROC curves cannot lie below the main diagonal. If this happens for empirical ROC curves, then the detector can be improved by flipping the signal output, that is, believing the opposite of what the detector reports.

A difficulty with ROC curves is that they may be asymmetric, as well as intersect with each other. Therefore, they cannot be put in a complete order, as can be done for indifference curves. Nonetheless, researchers have come up with a number of summary measures which condense the information of a ROC curve to a single scalar. One is the equal error rate (EER). This convention suggests comparing ROC curves only at the point where they intersect with the anti-diagonal, i.e., α = β. A value of EER = 0 indicates perfect detection. A downside of the EER metric is that the best operating point depends on the application and may not necessarily be where EER is measured. Another convention that also accounts for the detection power at the tail of the curves is to measure the area under the curve (AUC). Here, AUC = 1 indicates perfect detection. A caveat is that some definitions of AUC normalize to the upper triangle (i.e., AUC = 0 for random guessing) whereas others do not (i.e., AUC = 1/2 for random guessing).

In practice, the operating point on the curve can be selected by adjusting a detection threshold which quantizes observations to a binary signal. In the case of IDS, one can think of many observable attributes that each convey some information about the benign or malicious nature of a program or process. These attributes span a feature space, in which the points representing benign and malicious events overlap in certain regions. Here the notion of a detection threshold generalizes to the choice of a hyperplane in feature space which partitions the space into two regions, one for each detector output. Practical IDS should allow adjustment of the decision threshold(s) to optimize the system for the operator’s needs.
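The concepts of this box can be illustrated with a small sketch that sweeps a detection threshold over one-dimensional scores. The scores are invented, the function names are hypothetical, and the EER is only approximated by the empirical operating point where α and β are closest.

```python
def roc_points(benign_scores, malicious_scores):
    """Return (alpha, detection_rate) pairs for every threshold, from strict to lax.

    An event is flagged 'malicious' when its score is >= the threshold.
    """
    thresholds = sorted(set(benign_scores) | set(malicious_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above all scores: nothing is flagged
    for t in thresholds:
        alpha = sum(s >= t for s in benign_scores) / len(benign_scores)
        detection = sum(s >= t for s in malicious_scores) / len(malicious_scores)
        points.append((alpha, detection))
    return points


def area_under_curve(points):
    """Trapezoidal AUC, not normalized (0.5 corresponds to random guessing)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def equal_error_rate(points):
    """Approximate EER: operating point where alpha and beta = 1 - detection are closest."""
    alpha, detection = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (alpha + (1.0 - detection)) / 2.0


# Hypothetical anomaly scores; higher means more suspicious.
benign = [0.1, 0.2, 0.3, 0.4, 0.6]
malicious = [0.5, 0.7, 0.8, 0.9]

curve = roc_points(benign, malicious)
print("AUC =", round(area_under_curve(curve), 3))
print("EER =", round(equal_error_rate(curve), 3))
```

Each point of `curve` corresponds to one choice of the detection threshold, so selecting an operating point amounts to picking one element of this list.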

2.1 Optimal Filter Configuration

Most filters are binary classifiers which approximate authorization decisions with pre-defined heuristics or learned rules. Their decisions are hardly ever perfect. Therefore, finding the optimal operating point of filter-based protection measures, such as intrusion detection systems or spam filters, is a relevant problem in practice, which can be framed as an economic trade-off. It balances the opportunity cost of false positives against the losses incurred from false negatives.

Within this section (2.1), let β : [0, 1] → [0, 1] be the false negative rate as a function of the false positive rate α. We use this function to formalize the ROC curve given by the technology in use. Recall from Box 2 that we can trivially create a detector with no false positives by marking everything benign, but then we end up with 100% false negatives. Consequently, we define β(0) = 1 and use similar logic to see that β(1) = 0. We also assume for now without loss of generality that the function is twice differentiable, with β′(x) < 0 and β′′(x) ≥ 0.

Furthermore, let a, b > 0 be the costs incurred per false positive and false negative, respectively. In the example of a malware filter (antivirus), a would be the opportunity cost of business lost due to a falsely blocked message, and b would be the direct loss of a successful malware attack including follow-up costs like recovery and cleanup. We assume that the cost of installing the filter is fixed and sunk (i.e., the decision whether to invest in a filter has already been made) and that adjusting the operating point is cheap enough to ignore in our calculations. Our problem is to find the optimal α∗ which minimizes the expected cost of decision errors,

$$ \alpha^* = \arg\min_{\alpha} \; p \cdot \beta(\alpha) \cdot b + (1 - p) \cdot \alpha \cdot a , \tag{39} $$

where p is the exogenous prior probability of a message containing malware.


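The FOC of Eq. (39) is

$$ 0 = \frac{\partial}{\partial \alpha} \Bigl( p \cdot \beta(\alpha) \cdot b + (1 - p) \cdot \alpha \cdot a \Bigr) \Big|_{\alpha = \alpha^*} = p \cdot \beta'(\alpha^*) \cdot b + (1 - p) \cdot a ; \tag{40} $$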

after rearranging, we obtain:

$$ \beta'(\alpha^*) = -\frac{1 - p}{p} \cdot \frac{a}{b} . \tag{41} $$
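A small numerical check may help to see the optimality condition at work. The sketch below assumes a purely illustrative ROC curve β(α) = (1 − α)², which satisfies β(0) = 1, β(1) = 0, β′ < 0 on [0, 1) and β′′ > 0; the parameter values p, a and b are likewise hypothetical. It solves the first-order condition in closed form for this particular β and compares the result with a brute-force search over the expected cost of Eq. (39).

```python
def beta(alpha):
    """Illustrative ROC curve (an assumption, not from the chapter)."""
    return (1 - alpha) ** 2


def expected_cost(alpha, p, a, b):
    """Objective of Eq. (39): probability-weighted cost of decision errors."""
    return p * beta(alpha) * b + (1 - p) * alpha * a


def foc_solution(p, a, b):
    """Solve beta'(alpha) = -(1 - p)/p * a/b for beta(alpha) = (1 - alpha)^2."""
    slope = -(1 - p) / p * a / b
    alpha = 1 + slope / 2  # from beta'(alpha) = -2 (1 - alpha) = slope
    return min(max(alpha, 0.0), 1.0)  # corner solution if outside [0, 1]


# Hypothetical parameters: 10% malware, false negatives six times as costly.
p, a, b = 0.1, 1.0, 6.0

alpha_star = foc_solution(p, a, b)
grid = [i / 1000 for i in range(1001)]
alpha_grid = min(grid, key=lambda x: expected_cost(x, p, a, b))

print(alpha_star, alpha_grid, expected_cost(alpha_star, p, a, b))
```

With these numbers the right-hand side of the optimality condition is −1.5, and both the closed-form solution and the grid search land at a false positive rate of about 0.25.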

The right-hand side of Eq. (41) is the negative of the product of the odds of the benign event, (1 − p)/p, and the cost ratio a/b. It also defines the slope of an indifference curve, more precisely an indifference line, articulating preferences among α and β. The optimal operating point is located where the most preferable indifference line touches the technology bound, in this case the ROC curve, as illustrated in Figure 9. Note that to be consistent with the axes used by ROC curves, the decision maker is actually trading off between α and 1 − β. Consequently, the slope of the indifference line is set to the negative of Eq. (41), and is therefore positive as indicated in the figure. The solution is unique if the inequality β′′(x) ≥ 0 is strict. Otherwise the decision maker is indifferent within a closed interval of operating points.

Figure 9 displays the optimal false positive rates α∗ for two ROC curves A and B, representing different technologies. As the absolute value of the right-hand side of Eq. (41) is greater than one, the probability-weighted error costs suggest a trade-off where one tolerates higher false negative rates to keep the false positives low. Observe that both ROC curves A and B have the same equal error rate (EER) and cover the same area under the curve (AUC). Nevertheless, technology B at its optimal operating point α∗_B realizes a better indifference line for the given cost structure (a, b) and prior probability p of receiving malware.

Now we know how to find the optimal operating point for a classifier with a continuous ROC curve. But what happens if we cannot adjust the decision threshold arbitrarily? For example, some protection technology could only allow k discrete thresholds, so that the ROC looks like a step function. In the simplest case, k = 2 as depicted in Figure 10. The two thresholds are points in the (α, β)-plane marked D and E. By discarding information, we can always reach the extreme points C and F. Which of the four points should we choose given (a, b) and p?

We use the concept of randomized classifiers to tackle this question. Every empirical (i.e., discrete) ROC can be converted to a concave curve by tossing a biased coin to determine the operating point ahead of every classification. For example, every point on the dotted line connecting D and E in Figure 10 can be achieved by choosing bias q ∈ [0, 1] so that the detector operates at point D with probability q and at point E with probability 1 − q. The slope of the line gives us β′ for the optimality condition in Eq. (41), assuming for now that the operating point can be changed instantly and without setup cost.
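The coin-tossing construction can be sketched as follows. The coordinates of the two operating points and the bias q are hypothetical, and the simulation only illustrates that the empirical error rates of the randomized detector approach the convex combination of the two points.

```python
import random


def mix(point_d, point_e, q):
    """Expected (alpha, beta) when operating at D with probability q, else at E."""
    (alpha_d, beta_d), (alpha_e, beta_e) = point_d, point_e
    return (q * alpha_d + (1 - q) * alpha_e, q * beta_d + (1 - q) * beta_e)


def simulate(point_d, point_e, q, n, rng):
    """Empirical error rates over n benign and n malicious events."""
    def pick():
        # Biased coin toss ahead of every single classification.
        return point_d if rng.random() < q else point_e

    false_pos = sum(rng.random() < pick()[0] for _ in range(n))  # benign flagged
    false_neg = sum(rng.random() < pick()[1] for _ in range(n))  # malicious missed
    return false_pos / n, false_neg / n


# Hypothetical operating points (alpha, beta) and coin bias.
D, E, q = (0.1, 0.5), (0.4, 0.2), 0.3

expected = mix(D, E, q)
observed = simulate(D, E, q, 100_000, random.Random(0))
print(expected, observed)
```

Varying q between 0 and 1 moves the expected operating point along the straight line from E to D, which is the dotted segment described above.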

As the reshaped ROC curve is not necessarily differentiable at its discrete support points (D and E in the example), we have two asymptotic values for β′ at every support point, one approaching the point from below and the other from above. The optimal discrete operating point is the one where the slope of the indifference line falls into the interval between the two asymptotes,

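$$ \alpha^* = \left\{ \alpha \in \{\alpha_1, \dots, \alpha_k\} \;\middle|\; -\frac{1 - p}{p} \cdot \frac{a}{b} \in \left[ \lim_{x \to \alpha^-} \beta'(x), \; \lim_{x \to \alpha^+} \beta'(x) \right] \right\} . \tag{42} $$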

If the indifference line is not parallel to one of the ROC curve’s segments, this point is unique because of the concavity. In the example of Figure 10, the optimal α∗ is the false positive rate at point D. Otherwise, if the indifference line is parallel to any given segment(s), the decision maker is indifferent (i.e., can choose) between any of the operating points on the parallel segment(s), including their end points. Consequently, as long as the indifference curves are linear, we never need a randomized classifier in practice. The associated assumptions of instant and costless change of the operating point do not impose significant constraints.
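The selection rule of Eq. (42) can be sketched in a few lines. The coordinates chosen for C, D, E and F and the cost parameters are hypothetical; the one-sided slopes at the extreme points are taken to be −∞ (before C) and +∞ (after F), which is an assumption of this sketch rather than part of the original formulation. The result is cross-checked against a direct comparison of expected costs.

```python
def optimal_discrete_points(points, p, a, b):
    """Operating points whose one-sided slopes bracket the indifference slope.

    points: (alpha, beta) pairs sorted by alpha, with beta convex in alpha,
    including the extreme points (0, 1) and (1, 0).
    """
    target = -(1 - p) / p * a / b  # right-hand side of Eq. (41)
    slopes = [(y1 - y0) / (x1 - x0)
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    left = [float("-inf")] + slopes   # slope approaching each point from below
    right = slopes + [float("inf")]   # slope approaching each point from above
    return [pt for pt, lo, hi in zip(points, left, right) if lo <= target <= hi]


def expected_cost(point, p, a, b):
    alpha, beta = point
    return p * beta * b + (1 - p) * alpha * a


# Hypothetical coordinates for the points C, D, E, F.
roc = [(0.0, 1.0), (0.1, 0.4), (0.4, 0.2), (1.0, 0.0)]
p, a, b = 0.1, 1.0, 6.0

best = optimal_discrete_points(roc, p, a, b)
cheapest = min(roc, key=lambda pt: expected_cost(pt, p, a, b))
print(best, cheapest)
```

With these values the indifference slope of −1.5 lies between the slopes of the segments C–D and D–E, so point D is selected, in line with the direct cost comparison. Lowering b to 1 makes the indifference line steeper than the segment C–D, and the rule then returns the extreme point C.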

As a final observation, if the slope of the indifference line is steeper than the segment C–D, or flatter than the segment E–F, the optimal operating points ‘shortcut’ the filter. This connects back to the prior decision on whether it is efficient to invest in a filter at all, given its ROC.

3 Further Reading

ALE has been used in general risk management since the 1970s and was adapted for IT security risks in the FIPS publication #65 by the US National Bureau of Standards [16]. Different variants of definitions for ROSI are summarized in [4]. Readers literate in German may appreciate the NPV-based security investment model in [9].

Further topics in the literature include variations of the ROSI metric and extensions to the Gordon–Loeb model, among them empirical validation studies [18], discounting and tax shields, and the real options approach.

For an extension of the Gordon–Loeb model with risk aversion see [13].

The baseline models are adapted from [11] for the linear breach probability functions and from [3] for the exponential breach probability function.

Jaquith [14] provides a high-level introduction to security indicators (aka metrics) targeted to practitioners.

Readers who are interested in the minutiae of published security investment models are advised to start with literature surveys, which appear in regular intervals with varying focus. Examples include (in chronological order) Su [17], Böhme [2], or Demetz and Bachlechner [8].

Cavusoglu et al. [5] discuss security investment decisions for multi-stage filter cascades (firewall, IDS, manual inspection) with noisy signals. In [6], the same authors discuss optimal filter configuration in a defender-versus-attacker game which models deterrence.

The authors want to acknowledge Thomas Nowey, Rainer Böhme’s co-author of the book chapter on economic security metrics [4], from which some information was included in this chapter.


References

[1] Yuliy Baryshnikov. IT security investment and Gordon–Loeb’s 1/e rule. In Workshop on the Economics of Information Security (WEIS), Berlin, Germany, 2012.

[2] Rainer Böhme. Security metrics and security investment models. In Isao Echizen, Noboru Kunihiro, and Ryoichi Sasaki, editors, Advances in Information and Computer Security (IWSEC 2010), number 6434 in LNCS, pages 10–24, Berlin Heidelberg, 2010. Springer-Verlag.

[3] Rainer Böhme. Security audits revisited. In Angelo Keromytis, editor, Proceedings of Financial Cryptography and Data Security, volume 7397 of Lecture Notes in Computer Science, pages 129–147, Berlin Heidelberg, 2012. Springer-Verlag.

[4] Rainer Böhme and Thomas Nowey. Economic security metrics. In I. Eusgeld, F. C. Freiling, and R. Reussner, editors, Dependability Metrics, volume 4909 of Lecture Notes in Computer Science, pages 176–187, Berlin Heidelberg, 2008. Springer-Verlag.

[5] Huseyin Cavusoglu, Birendra Mishra, and Srinivasan Raghunathan. A model for evaluating IT security investments. Communications of the ACM, 47:87–92, 2004.

[6] Huseyin Cavusoglu, Birendra Mishra, and Srinivasan Raghunathan. The value of intrusion detection systems in information technology security architecture. Information Systems Research, 16(1):28–46, 2005.

[7] The CIS Security Metrics. The Center for Internet Security, 2010.

[8] Lukas Demetz and Daniel Bachlechner. To invest or not to invest? Assessing the economic viability of a policy and security configuration management tool. In Workshop on Economics and Information Security (WEIS), Berlin, Germany, 2012.

[9] Ulrich Faisst, Oliver Prokein, and Nico Wegmann. Ein Modell zur dynamischen Investitionsrechnung von IT-Sicherheitsmaßnahmen. Zeitschrift für Betriebswirtschaft, 77(5):511–538, 2007.

[10] Lawrence A. Gordon and Martin P. Loeb. The economics of information security investment. ACM Transactions on Information and System Security, 5(4):438–457, 2002.

[11] Jens Grossklags, Nicolas Christin, and John Chuang. Secure or insure? A game-theoretic analysis of information security games. In Proc. of the Int’l Conference on World Wide Web (WWW), pages 209–218, Beijing, China, 2008. ACM Press.


[12] Kjell Hausken. Returns to information security investment: The effect of alternative information security breach functions on optimal investment and sensitivity to vulnerability. Information Systems Frontiers, 8(5):338–349, 2006.

[13] Derrick C. Huang, Qing Hu, and Ravi S. Behara. An economic analysis of the optimal information security investment in the case of a risk-averse firm. International Journal of Production Economics, 114:793–804, 2008.

[14] A. Jaquith. Security Metrics: Replacing Fear, Uncertainty, and Doubt. Pearson Education, Upper Saddle River, NJ, 2007.

[15] Klaus Julisch. A unifying theory of security metrics with applications. Technical report, IBM Research, 2009.

[16] National Bureau of Standards. Guideline for Automatic Data Processing Risk Analysis, FIPS PUB 65, 1979.

[17] Xiaomeng Su. An overview of economic approaches to information security management. Technical Report TR-CTIT-06-30, University of Twente, 2006.

[18] H. Tanaka, K. Matsuura, and O. Sudoh. Vulnerability and information security investment: An empirical analysis of e-local government in Japan. Journal of Accounting and Public Policy, 24:37–59, 2005.

[19] Jan Willemson. On the Gordon & Loeb model for information security investment. In Workshop on the Economics of Information Security (WEIS), University of Cambridge, UK, 2006.

Figure 5: Baseline model: exponential breach probability function. (Plot of the probability of loss p against the security level s, with curves for several values of β and v.)

Figure 6: Baseline model: optimal security level s∗ as a function of the security productivity β. (Curves for several values of v; the plot marks the Gordon–Loeb rule of thumb λ/e and the regions labeled indefensible.)

Figure 7: Overview of the risk management terminology. (Diagram relating risk analysis, risk management, and risk monitoring to the activities identification, quantification, acceptance, mitigation, avoidance, transfer, validation, and documentation.)

Figure 8: Security production function as two-step mapping. (Diagram linking cost of security, security level, and benefit of security; labels: security productivity, risk mitigation, risk avoidance.)

Figure 9: Optimal filter configuration for continuous ROC curves. (Detection rate 1 − β on the vertical axis; ROC curves A and B, the 45° line, an indifference line with slope (1 − p)a/(p·b), and the optimal false positive rates α∗_A and α∗_B.)

Figure 10: Optimal filter configuration for a discretized ROC curve. (Detection rate 1 − β on the vertical axis; operating points C, D, E, and F, the 45° line, an indifference line with slope (1 − p)a/(p·b), and the optimal false positive rate α∗ at D.)
