efficient purely-dynamic information flow analysis · efficient purely-dynamic information flow...

12
Efficient Purely-Dynamic Information Flow Analysis Thomas H. Austin Cormac Flanagan University of California at Santa Cruz [email protected] [email protected] Abstract We present a novel approach for efficiently tracking in- formation flow in a dynamically-typed language such as JavaScript. Our approach is purely dynamic, and it detects problems with implicit paths via a dynamic check that avoids the need for an approximate static analyses while still guar- anteeing non-interference. We incorporate this check into an efficient evaluation strategy based on sparse information labeling that leaves information flow labels implicit when- ever possible, and introduces explicit labels only for values that migrate between security domains. We present experi- mental results showing that, on a range of small benchmark programs, sparse labeling provides a substantial (30%–50%) speed-up over universal labeling. Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features; D.4.6 [Operating Systems]: Security and Protection—Information flow controls General Terms Languages, Security Keywords Information flow control, dynamic analysis 1. Introduction The error-prone nature of software systems motivates the desire to separate security from functionality wherever pos- sible. For example, much current software is developed in safe languages, where memory safety is ensured by the lan- guage runtime itself, rather than being an emergent property of complex and buggy application code. Applications written in safe languages such as JavaScript are still vulnerable to other kinds of security problems, how- ever, such as loss of privacy or integrity, and particularly so in a browser setting where JavaScript code from multiple un- trusted or semi-trusted servers executes in the same process. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PLAS ’09 June 15, Dublin, Ireland. Copyright c 2009 ACM 978-1-60558-645-8/09/06. . . $10.00 For example, cross-site scripting attacks exploit confusions about the degree of authority or trust that should be granted to various code and data fragments. To help address these kinds of higher-level security prob- lems, we explore the approach of dynamically tracking in- formation flow in the language runtime. The particular lan- guage we consider is a variant of the untyped λ-calculus, but the general approach should be applicable to JavaScript and other dynamically-typed languages. We note that much prior work on type systems that enforce information flow properties [Myers 1999, Myers and Liskov 1997] is unfor- tunately not applicable to such languages. Furthermore, a static analysis approach could be problematic in a browser setting, where the analysis might need to be re-run on each browser client before program execution [Vogt et al. 2007]. Finally, dynamic analysis also allows for somewhat more flexibility in applying policies, and can allow us to hot-swap information flow policies [Chandra and Franz 2007]. This paper presents two semantics for tracking infor- mation flow. The first semantics uses a straightforward Universal Labeling representation, where every value has an associated information flow label. This explicit rep- resentation makes it straightforward to track information flows and to enforce the key correctness property of non- interference [Goguen and Meseguer 1982]. However, uni- versal labeling incurs significant overhead to allocate, track, and manipulate the labels attached to each value. In practice, programs typically exhibit a significant de- gree of label locality, where most or all items in a data structure will likely have identical labels. For example, in a browser setting, most values will likely be created and ma- nipulated within a single information flow domain. To exploit this label locality property, our second seman- tics uses a more efficient Sparse Labeling representation that leaves labels implicit (i.e., determined by context) whenever possible, and introduces explicit labels only for values that migrate between information flow domains. This strategy eliminates a significant fraction of the overhead usually asso- ciated with dynamic information flow analyses. At the same time, sparse labeling has no effect on program semantics and is observably equivalent to universal labeling. In particular,

Upload: doantram

Post on 30-May-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

Efficient Purely-Dynamic Information Flow Analysis

Thomas H. Austin Cormac FlanaganUniversity of California at Santa Cruz

[email protected] [email protected]

AbstractWe present a novel approach for efficiently tracking in-formation flow in a dynamically-typed language such asJavaScript. Our approach is purely dynamic, and it detectsproblems with implicit paths via a dynamic check that avoidsthe need for an approximate static analyses while still guar-anteeing non-interference. We incorporate this check intoan efficient evaluation strategy based on sparse informationlabeling that leaves information flow labels implicit when-ever possible, and introduces explicit labels only for valuesthat migrate between security domains. We present experi-mental results showing that, on a range of small benchmarkprograms, sparse labeling provides a substantial (30%–50%)speed-up over universal labeling.

Categories and Subject Descriptors D.3.3 [ProgrammingLanguages]: Language Constructs and Features; D.4.6[Operating Systems]: Security and Protection—Informationflow controls

General Terms Languages, Security

Keywords Information flow control, dynamic analysis

1. IntroductionThe error-prone nature of software systems motivates thedesire to separate security from functionality wherever pos-sible. For example, much current software is developed insafe languages, where memory safety is ensured by the lan-guage runtime itself, rather than being an emergent propertyof complex and buggy application code.

Applications written in safe languages such as JavaScriptare still vulnerable to other kinds of security problems, how-ever, such as loss of privacy or integrity, and particularly soin a browser setting where JavaScript code from multiple un-trusted or semi-trusted servers executes in the same process.

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. To copy otherwise, to republish, to post on servers or to redistributeto lists, requires prior specific permission and/or a fee.PLAS ’09 June 15, Dublin, Ireland.Copyright c© 2009 ACM 978-1-60558-645-8/09/06. . . $10.00

For example, cross-site scripting attacks exploit confusionsabout the degree of authority or trust that should be grantedto various code and data fragments.

To help address these kinds of higher-level security prob-lems, we explore the approach of dynamically tracking in-formation flow in the language runtime. The particular lan-guage we consider is a variant of the untyped λ-calculus,but the general approach should be applicable to JavaScriptand other dynamically-typed languages. We note that muchprior work on type systems that enforce information flowproperties [Myers 1999, Myers and Liskov 1997] is unfor-tunately not applicable to such languages. Furthermore, astatic analysis approach could be problematic in a browsersetting, where the analysis might need to be re-run on eachbrowser client before program execution [Vogt et al. 2007].Finally, dynamic analysis also allows for somewhat moreflexibility in applying policies, and can allow us to hot-swapinformation flow policies [Chandra and Franz 2007].

This paper presents two semantics for tracking infor-mation flow. The first semantics uses a straightforwardUniversal Labeling representation, where every value hasan associated information flow label. This explicit rep-resentation makes it straightforward to track informationflows and to enforce the key correctness property of non-interference [Goguen and Meseguer 1982]. However, uni-versal labeling incurs significant overhead to allocate, track,and manipulate the labels attached to each value.

In practice, programs typically exhibit a significant de-gree of label locality, where most or all items in a datastructure will likely have identical labels. For example, ina browser setting, most values will likely be created and ma-nipulated within a single information flow domain.

To exploit this label locality property, our second seman-tics uses a more efficient Sparse Labeling representation thatleaves labels implicit (i.e., determined by context) wheneverpossible, and introduces explicit labels only for values thatmigrate between information flow domains. This strategyeliminates a significant fraction of the overhead usually asso-ciated with dynamic information flow analyses. At the sametime, sparse labeling has no effect on program semantics andis observably equivalent to universal labeling. In particular,

Page 2: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

we show that sparse labeling still satisfies the key correct-ness property of non-interference.

We present experimental results showing that, on a rangeof small benchmark programs, sparse labeling provides asubstantial (30%–50%) speed-up over universal labeling.

The presentation of our results proceeds as follows. Thenext section introduces the source language that we useas the basis for our development. Section 3 and 4 presentthe universal and sparse labeling semantics, respectively,together with their non-interference proofs. Section 5 de-scribes our language implementations, benchmarks, and ex-perimental results. Section 6 discusses related work, andSection 7 concludes.

2. Information Flow in the Lambda CalculusWe assume that the set Label of information flow labelsforms a lattice with associated ordering operation v, joinoperation t, and minimal element⊥, and that this lattice hasat least two elements L and H such that L v H . Thus, His a high-confidentiality label, and L is a low-confidentialitylabel.

This lattice may of course contain additional elements.For example, in a browser setting, Label might be the powerset lattice over all web sites that the browser is communicat-ing with. If a data item is labelled with

{ good.com, evil.com }

then this label indicates that that data has been influenced bynetwork messages from both these sites. In particular, thatdata should not be sent to evil.com, since it might containprivate data from good.com. In this paper, however, our fo-cus is not so much on information flow policies, but rather onefficient mechanisms for information flow tracking, which isa prerequisite to policy enforcement.

We formalize our information flow tracking mechanismsin terms of the idealized language λinfo, which is a variantof the λ-calculus extended with imperative reference cellsand with a mechanism for tagging data with informationflow labels. The syntax of λinfo is shown in Figure 1. Termsinclude constants (c), variables (x), functions (λx.e) andfunctional application (e1 e2). In addition, the language alsosupports mutable reference cells, with operations to allocate(ref e), dereference (!e), and update (e1:= e2) a referencecell. Finally, the operation 〈k〉e attaches the information flowlabel k to the result of evaluating e.

This language λinfo is intentionally minimal, in order toclearly present our information flow evaluation strategies.However, as usual, a rich variety of additional constructs(booleans, conditionals, let-expressions, pairs, etc) can beencoded in the language, as illustrated in Figure 1. We willuse some of these encodings in example programs below.

Figure 1: The Source Language λinfo

Syntax:

e ::= Termx variablec constantλx.e abstraction(e1 e2) applicationref e reference allocation!e dereferencee:= e assignment〈k〉e labeling operation

k, l, pc Labelx, y, z Variablec Constant

Standard encodings:

true def= λx.λy.x

false def= λx.λy.y

if e1 then e2 else e3def= (e1 (λd.e2) (λd.e3)) (λx.x)

let x = e1 in e2def= (λx.e2) e1

e1 ; e2def= let x = e1 in e2, x 6∈ FV (e2)

pair e1 e2def= (λx.λy.λb. b x y) e1 e2

fst edef= e true

snd edef= e false

3. Universal Labeling Semantics for λinfo

We formalize a semantics of λinfo that tracks informationflow dynamically to enforce non-interference. In particular,if the result of program execution is public (i.e., labeled L)then that result cannot have been influenced by confidentialdata. Of course, any confidential data accessed during theexecution could influence how long that execution takes,and in the extreme could cause the program to diverge. Tode-emphasise these timing-related issues, we formulate thesemantics of λinfo as a big-step operational semantics.

In this semantics, each reference cell is allocated at an ad-dress a, and the store σ maps addresses to values. A closure(λx.e, θ) is a pair of a λ-expression and a substitution θ thatmaps variables to values. We use ∅ to denote both the emptystore and the empty substitution. A raw value r is either aconstant, an address, or a closure.

Our initial semantics uses a universal labeling strategy,where every value v has the form rk and combines a rawvalue r with an explicit information flow label k.

Page 3: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

Figure 2: Universal Labeling for λinfo

Runtime Syntax

a ∈ Addressσ ∈ Storeu = Address →p Valueu

θ ∈ Substu = Var →p Valueu

r ∈ RawValueu ::= c | a | (λx.e, θ)v ∈ Valueu ::= rk

Evaluation Rules: σ, θ, e ⇓pc σ′, v

σ, θ, c ⇓pc σ, cpc

[U-CONST]

σ, θ, (λx.e) ⇓pc σ, (λx.e, θ)pc[U-FUN]

σ, θ, x ⇓pc σ, (θ(x) t pc)[U-VAR]

σ, θ, e1 ⇓pc σ1, (λx.e, θ′)k

σ1, θ, e2 ⇓pc σ2, v2σ2, θ

′[x := v2], e ⇓pctk σ′, v

σ, θ, (e1 e2) ⇓pc σ′, v[U-APP]

σ, θ, e1 ⇓pc σ1, ck

σ1, θ, e2 ⇓pc σ2, dl

r = [[c]](d)σ, θ, (e1 e2) ⇓pc σ2, r

ktltpc [U-PRIM]

σ, θ, e ⇓pc σ′, vσ, θ, 〈k〉e ⇓pc σ′, (v t k)

[U-LABEL]

σ, θ, e ⇓pc σ′, va 6∈ dom(σ′)

σ, θ, (ref e) ⇓pc σ′[a := v], apc[U-REF]

σ, θ, e ⇓pc σ′, ak

σ, θ, !e ⇓pc σ′, (σ′(a) t k t pc)[U-DEREF]

σ, θ, e1 ⇓pc σ1, ak

σ1, θ, e2 ⇓pc σ2, vk v label(σ2(a))

σ, θ, (e1:= e2) ⇓pc σ2[a := (v t k)], v[U-ASSIGN]

We formally define the universal labeling strategy via thebig-step evaluation relation:

σ, θ, e ⇓pc σ′, v

This relation evaluates an expression e in the context of astore σ, a substitution (or environment) θ, and the currentlabel pc of the program counter, and it returns the resultingvalue v and the (possibly modified) store σ′.

This relation is defined via the evaluation rules shownin Figure 2. The rules ensure that the result value v has alabel of at least pc (since this computed value depends onthe program counter). Thus, the rule [U-CONST] evaluates aconstant c to the value cpc . The rule [U-VAR] evaluates x to(θ(x)tpc). Here, we overload the operation t to also take avalue as its left argument, and this operation strengthens thelabel on that value:

rl t k def= rltk

The rule [U-APP] evaluates the body of the called functionwith upgraded program counter label pc t k, where k is thelabel of the called closure, since the callee “knows” that thatclosure was invoked. The notation θ[x := v] denotes thesubstitution that is identical to θ except that it maps x to v.

A primitive function is a constant such as “+” that can beapplied. The rule [PRIM] evaluates applications of primitivefunctions. This rule is defined in terms of the partial func-tion:

[[·]] · : Constant × Constant →p Constant

For example:[[+]](3) = +3

[[+3]](4) = 7

The rule [U-LABEL] joins an additional label k onto acomputed value v.

The last three rules track information flow across refer-ence cells. Allocation of reference cells via [U-REF] returnsa newly-allocated address apc with label pc. When a labeledaddress ak is dereferenced via [U-DEREF], the correspond-ing value σ′(a) is retrieved from the store, and the value(σ′(a) t k t pc) is returned, since this result depends onthe address being dereferenced and on the execution of thiscode branch.

Implicit Flows Finally, we consider the tricky issue of im-perative updates, which introduces the classic problem ofimplicit flows [Denning 1976]. To illustrate this problem,suppose we used the following assignment rule, which dy-namically upgrades the label on a reference cell whenever itis updated. Note that the first antecedent in this rule ensuresthat pc v k:

σ, θ, e1 ⇓pc σ1, ak

σ1, θ, e2 ⇓pc σ2, v

σ, θ, (e1:= e2) ⇓pc σ2[a := (v t k)], v[U-ASSIGN-BAD]

Page 4: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

Unfortunately, this rule leaks information via implicit flows.To illustrate this problem, consider the function:

fdef= λx. let y = ref true in

let z = ref true inif x then y := false else skip;if !y then z := false else skip;!z

When this function is applied to confidential boolean data,the rule [U-ASSIGN-BAD] permits both of the following eval-uations:

∅, ∅, (f trueH) ⇓L [ay := falseH , az := trueL], trueL

∅, ∅, (f falseH) ⇓L [ay := trueL, az := falseL], falseL

Thus, the function f leaks the value of its confidential argu-ment (labeled with H) into its public result (labeled with L).In particular, the conditional statement

if x then y := false else skip

leaks information about the argument x into the referencecell y in both branches, but only in one of these branches isthe label on !y upgraded toH (as shown by the values for ay

in the two resulting stores). Thus, this conditional leaks halfa bit, and so the dynamic upgrade strategy illustrated by therule [U-ASSIGN-BAD] is inadequate to prevent informationleaks, essentially because the information flow label is onlyupgraded on one of the two possible branches.

The No-Sensitive-Upgrade Check Our solution to this im-plicit paths problem is to prohibit such dynamic label up-grades that are caused by a confidential program counteror a confidential address (an approach also explored byZdancewic [2002] in his dissertation). Dynamic label up-grades caused by a confidential right-hand-side are not prob-lematic, however, and so are permitted.

Our semantics formalizes this strategy via the followingrule, where the additional antecedent k v label(σ2(a))performs the required no-sensitive-upgrade check. Here, thefunction label extracts the label from a value, and is definedby label(rk) def= k.

σ, θ, e1 ⇓pc σ1, ak

σ1, θ, e2 ⇓pc σ2, vk v label(σ2(a))

σ, θ, (e1:= e2) ⇓pc σ2[a := (v t k)], v[U-ASSIGN]

If this no-sensitive-upgrade check fails, then program evalu-ation terminates with an error. (As usual, program termina-tion may leak one bit of data.)

Under this rule, the problematic programs (f trueH) and(f falseH) can no longer be evaluated with the programcounter labelL. These programs are instead considered erro-neous because they attempt to update a public reference cell

from a confidential program counter. However, the program-mer can preemptively upgrade reference cells as needed, be-fore the conditional branch, as in the following function fok:

fokdef= λx. let y = ref true in

let z = ref true iny := 〈H〉!y;if x then y := false else skip;z := 〈H〉!z;if !y then z := false else skip;!z

This revised function fok then permits the executions:

∅, ∅, (fok trueH) ⇓L [ay := falseH , az := trueH ], trueH

∅, ∅, (fok falseH) ⇓L [ay := trueH , az := falseH ], falseH

where the confidential result is now appropriately labeled.This approach of preemptively upgrading certain refer-

ence cells requires the programmer (or possibly a static anal-ysis tool) to display some foresight about when code with aconfidential program counter may update public referencecells. This foresight then avoids the traditional problem withdynamic information flow–trying to reason about what lo-cations might have been assigned on the branch-not-takenin a conditional statement. Thus, the no-sensitive-upgrademechanism enables precise information flow analysis with-out needing an expensive and conservative static analysis ofevery conditional branch.

In fact, the static analysis can be seen simply as a specialcase of our approach, one that automatically and preemp-tively upgrades appropriate variables. Interestingly, we candrop the strict requirement that the static analysis be con-servative, and use a heuristic analysis instead. If the staticanalysis preemptively upgrades too few variables, the no-sensitive-upgrade check will still prevent secret informationfrom being leaked.

From the evaluation rules for the core language, we canderive corresponding evaluation rules for the encoded con-structs: see Figure 3. Reassuringly, these derived rules matchour expected intuition.

3.1 Correctness of Universal LabelingWe now show that the universal-labeling evaluation strategyguarantees non-interference. In particular, if two programstates differ only in H-labeled data, then these differencescannot propagate into L-labeled data.

To formalize this idea, we say two values areH-equivalent(written v1 ∼H v2) if either:

1. v1 = v2, or

2. both v1 and v2 have the label at least H , or

3. v1 = (λx.e, θ1)k and v2 = (λx.e, θ2)k and θ1 ∼H θ2.

Page 5: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

Figure 3: Universal Labeling for Encodings

Abbreviations:

(v1, v2)k def= (λb. b v1 v2, θ)k

Derived Evaluation Rules:

σ, θ, e1 ⇓pc σ1, (true, θ)k

σ1, θ, e2 ⇓pctk σ′, v

σ, θ, (if e1 then e2 else e3) ⇓pc σ′, v[U-THEN]

σ, θ, e1 ⇓pc σ1, (false, θ)k

σ1, θ, e3 ⇓pctk σ′, v

σ, θ, (if e1 then e2 else e3) ⇓pc σ′, v[U-ELSE]

σ, θ, e1 ⇓pc σ1, v1σ1, θ, e2 ⇓pc σ2, v2

σ, θ, (pair e1 e2) ⇓pc σ2, (v1, v2)pc[U-PAIR]

σ, θ, e ⇓pc σ′, (v1, v2)k

σ, θ, (fst e) ⇓pc σ′, (v1 t k)[U-FST]

σ, θ, e ⇓pc σ′, (v1, v2)k

σ, θ, (snd e) ⇓pc σ′, (v2 t k)[U-SND]

σ, θ, e1 ⇓pc σ1, v1σ1, θ[x := v1], e2 ⇓pc σ′, v

σ, θ, (let x = e1 in e2) ⇓pc σ′, v[U-LET]

σ, θ, e1 ⇓pc σ1, v1σ1, θ, e2 ⇓pc σ′, v

σ, θ, (e1; e2) ⇓pc σ′, v[U-SEQ]

Similarly, two substitutions areH-equivalent (written θ1 ∼H

θ2) if they have the same domain and

∀x ∈ dom(θ1). θ1(x) ∼H θ2(x)

LEMMA 1 (Equivalence). The two ∼H relations on valuesand substitutions are equivalence relations.

We define an analogous notion of H-compatible stores:two stores σ1 and σ2 are H-compatible (written σ1 ≈H σ2)if they are H-equivalent at all common addresses, i.e.,

σ1 ≈H σ2def= ∀a∈(dom(σ1)∩dom(σ2)). σ1(a) ∼H σ2(a)

Note that the H-compatible relation on stores is not tran-sitive, i.e., σ1 ≈H σ2 and σ2 ≈H σ3 does not implyσ1 ≈H σ3, since σ1 and σ3 could have a common addressthat is not in σ2.

The evaluation rules enforce a key invariant, namely thatthe label on the result of an evaluation always includes atleast the program counter label:

LEMMA 2. If σ, θ, e ⇓pc σ′, rk then pc v k.

The following lemma formalizes that evaluation with aH-labeled program counter cannot influence L-labeled datain the store.

LEMMA 3 (Evaluation Peserves Compatibility).If σ, θ, e ⇓H σ′, v then σ ≈H σ′.

PROOF By induction on the derivation of σ, e ⇓H σ′, v andcase analysis on the final rule in the derivation.

Finally, we prove non-interference: if an expression e isexecuted twice from H-compatible stores and H-equivalentsubstitutions, then both executions will yield H-compatibleresulting stores and H-equivalent resulting values. Thus, H-labeled data never leaks into L-labeled data.

THEOREM 1 (Non-Interference for Universal Labeling).If

σ1 ≈H σ2

θ1 ∼H θ2σ1, θ1, e ⇓pc σ′1, v1σ2, θ2, e ⇓pc σ′2, v2

thenσ′1 ≈H σ′2v1 ∼H v2

PROOF By induction on the derivation σ1, θ1, e ⇓pc σ′1, v1and case analysis on the final rule. This proof is similar tothe proof of Theorem 2, shown in the appendix.

Although non-interference is an important correctnessproperty, it does not address certain sources of informationleaks, such as those caused by divergence, abrupt termi-nation, timing channels, or input-output operations, as dis-cussed in [Askarov et al. 2008]. Addressing these informa-tion leaks remains an important topic for future work.

4. Sparse Labeling Semantics for λinfo

The universal labeling strategy incurs a significant overheadto allocate, track, and manipulate the labels attached to eachvalue. Moreover, programs typically exhibit a significantamount of label locality, where most or all items in a datastructure will likely have identical labels. For example, ina browser setting, most values will likely be created andmanipulated within a single information flow domain.

We exploit this label locality property to avoid introduc-ing an explicit label on every data item. Instead, we leavelabels implicit (i.e., determined by context) whenever pos-sible, and introduce explicit labels only for values that mi-grate between information flow domains. This strategy ofsparse labeling eliminates a significant fraction of the over-head usually associated with dynamic information flow. At

Page 6: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

the same time, sparse labeling has no effect on program se-mantics and is observably equivalent to universal labeling.

Figure 4 revises our earlier operational semantics to in-corporate sparse labeling. A value v now combines a rawvalue r with an optional label k; if this label is omitted, itis interpreted as being ⊥. In addition, each value is implic-itly labeled with the current program counter label pc. Thefollowing function labelpc extracts the true label of a valuewith respect to a program counter label pc:

labelpc(r)def= pc

labelpc(rk) def= pc t k

The revised sparse labeling evaluation relation:

σ, θ, e ↓pc σ′, v

is defined via the evaluation rules shown in Figure 4. Thelabel pc is implicitly applied to all values in both θ andv. Thus, many rules (e.g., [S-CONST], [S-FUN], and [S-VAR])can ignore labeling issues entirely and incur no informationlabeling overhead.

For the other constructs, we provide two rules: a fast pathfor unlabeled values, and a slower rule that deals with explic-itly labeled values.1 For function applications, the fast pathrule [S-APP] handles applications of an unlabeled closure ina straightforward manner with no labeling overhead. If theclosure has label k, then the second rule [S-APP-SLOW] addsthat label to the program counter before invoking the callee,and also adds k to the result of the function application. Thisrule uses the operation 〈k〉pc v, which applies the label k toa value v, unless k is subsumed by the implicit label pc.

〈k〉pc rdef=

{r if k v pcrk otherwise

〈k〉pc (rl) def= rktl

The rule [S-REF] allocates a reference cell at address a tohold a value v. To avoid making the implicit label pc on v ex-plicit, each address a has an associated label label(a), whichis implicitly applied to the value at that address. Hence, byallocating an address a where label(a) = pc, we avoid ex-plicitly labeling v.2

The fast path assignment rule [S-ASSIGN] checks that thetarget address a came from the current domain pc via theantecedent pc = label(a). If this fast-path check passes, thenthe no-sensitive-upgrade rule holds, and also the implicit pclabel on the assigned value v can be left implicit.

1 A dynamically-typed language such as JavaScript already has slow pathsto deal with various exceptional situations (such as attempting to apply anon-function) so handling explicitly-labeled value might naturally fit withinthese existing slow paths.2 An implementation might represent the label on addresses by associatingan entire page of addresses with a particular label.

The slow path rule [S-ASSIGN-SLOW] handles the moregeneral case. This rule extracts k as the label on the targetaddress (where k = ⊥ if that address has no explicit label);identifies the implicit labelm for values at address a; checksthat (pc t k) is not more secret than the label on the valueat address a; and appropriately labels the new value beforestoring it at address a.

Figure 5 shows how this sparse-labeling evaluation strat-egy extends to the various encoded constructs; these derivedrules again match our expectations.

4.1 Correctness for Sparse LabelingAs before, our non-interference argument is based on thenotion ofH-equivalent values, but we now parameterize thatequivalence relation over the implicit label pc. Thus, the newH-equivalence relation v1 ∼pc

H v2 holds if either:

1. v1 = v2

2. H v labelpc(v1) and H v labelpc(v2).

3. v1 = (λx.e, θ1)k and v2 = (λx.e, θ2)k and θ1 ∼pcH θ2.

Similarly, two substitutions areH-equivalent with respect toan implicit label pc (written θ1 ∼pc

H θ2) if they have the samedomain and

∀x ∈ dom(θ1). θ1(x) ∼pcH θ2(x)

We begin by noting some straightforward properties of la-beling and H-equivalence.

LEMMA 4. pc v labelpc(v).

LEMMA 5. If H v k then v1 ∼kH v2.

LEMMA 6 (H-Equivalence). The relations ∼pcH values and

substitutions are equivalence relations.

LEMMA 7 (Monotonicity of H-Equivalence).If k v l then ∼k

H ⊆ ∼lH .

LEMMA 8 (Labeling Equivalence).If v1 ∼k

H v2 then 〈k〉pc v1 ∼pcH 〈k〉pc v2.

Two stores σ1 and σ2 are H-compatible (written σ1 ≈H

σ2) if they are H-equivalent at all common addresses, i.e.,

∀a ∈ (dom(σ1) ∩ dom(σ2)). σ1(a) ∼label(a)H σ2(a)

Note that since every address a has an implicit label label(a),the H-compatible relation is not parameterized by pc.

If an evaluation returns an address a, then the label onthat address is at least label(a).

LEMMA 9. If σ, θ, e ⇓pc σ′, ak then label(a) v (pc t k).The following lemma proves that evaluation with a H-

labeled program counter cannot influence L-labeled data.

LEMMA 10 (Evaluation Preserves Compatibility).If σ, θ, e ↓H σ′, v then σ ≈H σ′.

Page 7: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

Figure 4: Sparse Labeling Semantics for λinfo

Runtime Syntaxr ∈ RawValues ::= c | a | (λx.e, θ)v ∈ Values ::= r | rk

θ ∈ Substs = Var →p Values

σ ∈ Stores = Address →p Values

Big-Step Evaluation Rules: σ, θ, e ↓pc σ′, v

σ, θ, c ↓pc σ, c[S-CONST]

σ, θ, (λx.e) ↓pc σ, (λx.e, θ)[S-FUN]

σ, θ, x ↓pc σ, θ(x)[S-VAR]

σ, θ, e1 ↓pc σ1, (λx.e, θ′)

σ1, θ, e2 ↓pc σ2, v2σ2, θ

′[x := v2], e ↓pc σ′, v

σ, θ, (e1 e2) ↓pc σ′, v[S-APP]

σ, θ, e1 ↓pc σ1, cσ1, θ, e2 ↓pc σ2, d

r = [[c]](d)

σ, θ, (e1 e2) ↓pc σ2, r[S-PRIM]

σ, θ, e ↓pc σ′, vσ, θ, 〈k〉e ↓pc σ′, 〈k〉pc v

[S-LABEL]

σ, θ, e ↓pc σ′, va 6∈ dom(σ′)label(a) = pc

σ, θ, (ref e) ↓pc σ′[a := v], a[S-REF]

σ, θ, e ↓pc σ′, aσ, θ, !e ↓pc σ′, σ′(a)

[S-DEREF]

σ, θ, e1 ↓pc σ1, aσ1, θ, e2 ↓pc σ2, v

pc = label(a)

σ, θ, (e1:= e2) ↓pc σ2[a := v], v[S-ASSIGN]

σ, θ, e1 ↓pc σ1, (λx.e, θ′)

k

σ1, θ, e2 ↓pc σ2, v2σ2, θ

′[x := v2], e ↓pctk σ′, v

σ, θ, (e1 e2) ↓pc σ′, 〈k〉pc v[S-APP-SLOW]

σ, θ, e1 ↓pc σ1, ck

σ1, θ, e2 ↓pc σ2, dl

r = [[c]](d)

σ, θ, (e1 e2) ↓pc σ2, 〈k t l〉pc r[S-PRIM-SLOW]

σ, θ, e ↓pc σ′, ak

σ, θ, !e ↓pc σ′, 〈k〉pc σ′(a)[S-DEREF-SLOW]

σ, θ, e1 ↓pc σ1, ak

σ1, θ, e2 ↓pc σ2, vm = label(a)

(pc t k) v labelm(σ2(a)))v′ = 〈pc t k〉m v

σ, θ, (e1:= e2) ↓pc σ2[a := v′], v[S-ASSIGN-SLOW]

Page 8: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

Figure 5: Sparse Labeling for Encodings

σ, θ, e1 ↓pc σ1, trueσ1, θ, e2 ↓pc σ′, v

σ, θ, (if e1 then e2 else e3) ↓pc σ′, v[S-THEN]

σ, θ, e1 ↓pc σ1, v1σ1, θ, e2 ↓pc σ2, v2

σ, θ, (pair e1 e2) ↓pc σ2, (v1, v2)[S-PAIR]

σ, θ, e ↓pc σ′, (v1, v2)σ, θ, (fst e) ↓pc σ′, v1

[S-FST]

σ, θ, e ↓pc σ′, (v1, v2)σ, θ, (snd e) ↓pc σ′, v2

[S-SND]

σ, θ, e1 ⇓pc σ1, v1σ1, θ[x := v1], e2 ⇓pc σ′, v

σ, θ, (let x = e1 in e2) ⇓pc σ′, v[S-LET]

σ, θ, e1 ⇓pc σ1, v1σ1, θ, e2 ⇓pc σ′, v

σ, θ, (e1; e2) ⇓pc σ′, v[S-SEMI]

σ, θ, e1 ↓pc σ1, truek

σ1, θ, e2 ↓pctk σ′, v

σ, θ, (if e1 then e2 else e3) ↓pc σ′, 〈k〉pc v[S-THEN-SLOW]

σ, θ, e ↓pc σ′, (v1, v2)k

σ, θ, (fst e) ↓pc σ′, 〈k〉pc v1[S-FST-SLOW]

σ, θ, e ↓pc σ′, (v1, v2)k

σ, θ, (snd e) ↓pc σ′, 〈k〉pc v2[S-SND-SLOW]

PROOF By induction on the derivation of σ, e ⇓H σ′, v, andcase analysis on the final rule in the derivation.

• [S-CONST], [S-FUN], [S-VAR]: σ′ = σ.• [S-APP], [S-APP-SLOW], [S-LABEL], [S-PRIM], [S-PRIM-SLOW],

[S-DEREF], [S-DEREF-SLOW]: By induction.• [S-REF]: σ and σ′ agree on their common domain.• [S-ASSIGN]: Let σ′ = σ2[a := v]. From the no-sensitive-

upgrade check,H = label(a). By Lemma 5, σ2(a) ∼HH v

and so σ2 ≈H σ′. By induction, σ ≈H σ1 ≈H σ2. Also,dom(σ) ⊆ dom(σ1) ⊆ dom(σ2) = dom(σ′). Hence,σ ≈H σ′.• [S-ASSIGN-SLOW]: Similar.

We next show that non-inference holds for the sparse-labeling semantics: if e is executed twice fromH-compatiblestores and H-equivalent substitutions, then the two execu-tions yield H-compatible resulting stores and H-equivalentresulting values.

THEOREM 2 (Non-Interference for Sparse Labeling).If

σ1 ≈H σ2

θ1 ∼pcH θ2

σ1, θ1, e ↓pc σ′1, v1σ2, θ2, e ↓pc σ′2, v2

thenσ′1 ≈H σ′2v1 ∼pc

H v2

PROOF By induction on the derivation σ1, θ1, e ↓pc σ′1, v1and case analysis on the last rule used in that derivation. Thedetails of the case analysis are presented in Appendix A.

5. Experimental ResultsIn order to evaluate the relative costs of universal and sparselabeling, we developed three different language implemen-tations. The implementations all support the same language,which is an extension of λinfo with features necessary for re-alistic programming. These features include pairs and listsbuilt as a native part of the language, strings, and associatedutility functions. The three implementations are:

• NOLABEL is a traditional interpreter that performs nolabeling or information flow analysis, and so establishesour baseline for performance;• UNIVERSALLABEL, which implements the universal la-

beling semantics; and• SPARSELABEL, which implements the sparse labeling

semantics.

Page 9: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

Benchmark NOLABEL UNIVERSALLABEL SPARSELABEL

(secs/100k runs) (vs NOLABEL) (vs. NOLABEL) (vs. UNIVERSALLABEL)SumList 2.295382 1.94 0.79 0.41

UserPwdFine 1.248581 1.63 1.12 0.68UserPwdCoarse 1.251994 2.45 1.03 0.42

FileSys0 23.206768 3.38 1.07 0.32FileSys25 24.843616 3.00 1.22 0.41FileSys50 24.840610 3.54 1.27 0.36FileSys100 24.455563 4.12 1.62 0.39

FileSysExplicit 24.470711 Information leak prevented Information leak prevented Information leak preventedImplicitFlowTrue 0.028825 Information leak prevented Information leak prevented Information leak preventedImplicitFlowFalse 0.031577 1.04 1.01 0.98

Average - 2.64 1.14 0.50Table 1: Benchmark Results

We compared these implementations on the followingbenchmark programs:

• SumList: Calculates the sum for a list of 100 numbers.There are no labels so that we can show the overheadwhen information flow is not needed.• UserPwdFine: Simulates a login by looking up a user-

name and password in an association list. The passwordsstored in the list are labeled as “secret”.• UserPwdCoarse: Identical to UserPwdFine, except that

the entire association list is labeled as “secret”.• FileSys0: Reads a file from an in-memory file system

implemented in our target language, and represented asa directory tree structure. The file system contains 1023directories and 2048 regular files, and contains no non-trivial labels.• FileSys25, FileSys50, and FileSys100: Identical toFileSys0, except that 25%, 50%, and 100% of the filesand directories are labeled as “secret”, respectively.• FileSysExplicit: Identical to FileSys100, except

that this benchmark causes a information leak by an ex-plicit flow.• ImplicitFlowTrue and ImplicitFlowFalse: Imple-

ments the implicit information flow leak example dis-cussed in Section 3, where the confidential variable x isgiven values of true and false , respectively.

We ran our tests on a MacBook Pro with a 2.6 GHz IntelCore 2 Duo processor, 4 gigabytes of RAM, and runningOS X version 10.5.6. All three language implementationswere interpreters written in Objective Caml and compiled tonative code with ocamlopt version 3.10.0. All benchmarkswere run 100,000 times, and Table 1 summarizes the results.

In almost all cases, NOLABEL performs the fastest butpermits information leaks, as on FILESYSEXPLICIT andIMPLICITFLOWTRUE benchmarks. (Note that IMPLICIT-FLOWFALSE leaks one bit of termination information in allthree implementations, as expected.) Column three showsthe slowdown of UNIVERSALLABEL over NOLABEL, which

is on average more than a 2.6x slowdown, and may be unac-ceptable in many situations. In contrast, column five showsthat the SPARSELABEL running time is only 50% of theUNIVERSALLABEL running time. Thus, our results showthat the sparse labeling runs much closer to the speed ofcode with no labels.

Our tests also identified an additional, unexpected bene-fit of the sparse labeling strategy. The SPARSELABEL im-plementation was noticeably less affected by differences inthe style of programmer annotations. This is most visiblein the results of UserPwdFine and UserPwdCoarse. TheUNIVERSALLABEL implementation suffered a 50% perfor-mance penalty, even though there were less annotations.Whenever a field is pulled from a secure list, it must begiven a label matching the list. In contrast, the SPARSELA-BEL implementation’s performance was comparable on bothUserPwdFine and UserPwdCoarse. Thus, with a sparse la-beling strategy, the programmer is to some degree insulatedfrom performance concerns, and can instead focus on theproper policy from a security perspective.

While these experimental results are for a preliminary,interpreter-based implementation, these results do suggestthat sparse labeling may also provide significant benefits ina highly-optimized language implementation. We are cur-rently adding sparse labeling into the Narcissus JavaScriptimplementation [Eich] and are exploring how to incorporatethese ideas into the SpiderMonkey trace-based compiler forJavaScript [Gal et al. 2009].

6. Related WorkDenning’s seminal work [Denning 1976] outlines the gen-eral approach to dynamic information flow. Denning andDenning [1977] presents a static analysis to certify pro-grams as being information-flow secure. Sabelfeld and My-ers [2003] provide an extensive survey of prior research oninformation flow. Among other things, they discuss vari-ous covert channels, including timing channels and resourceexhaustion channels. We do not address these attacks inλinfo, and instead focus only on implicit and explicit flows.Venkatakrishnan et al. [2006] perform a hybrid of static and

Page 10: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

dynamic analysis. The static analysis is used to transformthe code in order to instrument it with the appropriate run-time checks.

Several approaches use type systems for information flowanalysis. Volpano et al. [1996] introduce a type system basedon Denning’s model and proved its soundness. Heintze andRiecke [1998] create a simple calculus and show how it canbe expanded to deal with concurrency, assignment, and in-tegrity. Pottier and Simonet [2003] introduce type inferenceto ML (specifically a variation called Core ML).

Non-interference is one of the most common correctnesscriteria for information flow analyses. Barthe et al. [2004]discuss better approaches for analyzing non-interference,which were extended by Terauchi and Aiken [2005]. McLean[1992] shows that non-interference can be proved on a trace,rather than the usual intermediate step of a state machine.Boudol [2008] argues that non-interference is not necessar-ily the best property to use, specifically because it rules outprograms that deliberately declassify information. Instead,the author suggests using an intensional notion of security.

Fenton [1974] presents a dynamic analysis for informa-tion flow; the analysis requires that each mutable variablehas a fixed security label, which is somewhat restrictive. Ourapproach allows these security labels to be dynamically up-graded, while the no-sensitive-upgrade check still prohibitsimplicit information leaks.

A few papers highlight the challenges of working withmore advanced features of languages. Banerjee and Nau-mann [2002] address complications caused by dynamicmemory allocation for information flow analysis. King et al.[2008] highlight the problems with false alarms caused byimplicit flows, and in particular exceptions. λinfo has neitherof these features; extending it to address these topics remainsan area for future work.

Web programming has become one of the central targetsfor information flow analyses. On the server side, Haldaret al. [2005] introduce dynamic taint propagation for Java.Lam et al. [2008] focus on defending against SQL injectionand cross-site scripting attacks. Zheng and Myers [2008] ad-dress web encryption specifically through static informationflow analysis. Dealing with information release (e.g., forpassword validation) is an interesting case for informationflow, since certain outputs must be declassified; Both Chongand Myers [2004] and Fournet and Rezk [2008] address this.

In addition to the server-side, there has been a great dealof interest in information flow analysis for client-side pro-gramming. Primarily, this centers around Java applets and,more recently, JavaScript. Myers and Liskov [1997] use adecentralized information control model for Java applets.JFlow [Myers 1999] has become one of the standards forinformation flow analysis on the JVM. Chandra and Franz[2007] also combine static and dynamic analysis for theJVM, but permitting the information flow policies to bechanged at runtime. Interestingly, client-side research is

more focused on confidentiality, whereas server-side pro-gramming tends to address integrity concerns in more depth.

Unlike with Java code, JavaScript is not compiled in ad-vance. Faced with this limitation,Vogt et al. [2007] add in-formation flow analysis to JavaScript that relies on dynamicanalysis whenever possible. Although we have not directlydealt with JavaScript, this is one of our central motivations.

Although static analysis has been considered indispens-able in many approaches, the benefits of dynamic analysisare becoming appreciated. Le Guernic et al. [2006] discussusing dynamic automaton-based monitoring. Askarov et al.[2008] highlight the risks with Denning-style analysis. Inparticular, they show that if intermediary output is allowed,the assumption that only one bit of information leaks withtermination is not valid. Although λinfo does not technicallypermit intermediary output, it will be a clear concern as weextend sparse labeling to more realistic languages. Malacariaand Chen [2008] also provide a framework for quantifyingexactly how much information can leak with a given model.O’Neill et al. [2006] have highlighted some of the complica-tions that are introduced through interactive programs. Sincemany web-based applications fall into this domain, this is ofparticular interest to us.

Our sparse labeling strategy is inspired by prior work oncontracts [Findler 2002] and language interoperation [Grayet al. 2005]. In particular, each security domain can beviewed as a separate “language”, and explicit labels functionas proxies that permit transparent interoperation betweenthese multiple languages.

7. ConclusionsWith the increasing importance of JavaScript and similarlanguages, fast and correct information flow analysis at runtime is essential. We have shown that, through sparse label-ing, it is possible to track information flow dynamically withreduced overhead. We believe these techniques may help tofurther safe client-side scripting.

AcknowledgmentsWe thank Brendan Eich, Andreas Gal, and Michael Franz forvaluable discussions on dynamic information flow analysis.

ReferencesAslan Askarov, Sebastian Hunt, Andrei Sabelfeld, and David

Sands. Termination-insensitive noninterference leaks more thanjust a bit. In ESORICS ’08: Proceedings of the 13th EuropeanSymposium on Research in Computer Security, pages 333–348,Berlin, Heidelberg, 2008. Springer-Verlag.

Anindya Banerjee and David A. Naumann. Secure informa-tion flow and pointer confinement in a java-like language. InIEEE Computer Security Foundations Workshop, pages 253–267. IEEE Computer Society, 2002.

Gilles Barthe, Pedro R. D’Argenio, and Tamara Rezk. Secure in-formation flow by self-composition. In IEEE Computer Security

Page 11: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

Foundations Workshop, pages 100–114. IEEE Computer Soci-ety, 2004.

Gerard Boudol. Secure information flow as a safety property. InPierpaolo Degano, Joshua D. Guttman, and Fabio Martinelli,editors, Formal Aspects in Security and Trust, volume 5491of Lecture Notes in Computer Science, pages 20–34. Springer,2008.

Deepak Chandra and Michael Franz. Fine-grained information flowanalysis and enforcement in a java virtual machine. pages 463–475, Dec. 2007.

Stephen Chong and Andrew C. Myers. Security policies for down-grading. In CCS ’04: Proceedings of the 11th ACM confer-ence on Computer and communications security, pages 198–209, New York, NY, USA, 2004. ACM.

Dorothy E. Denning. A lattice model of secure information flow.Communications of the ACM, 19(5):236–243, 1976.

Dorothy E. Denning and Peter J. Denning. Certification of pro-grams for secure information flow. Communications of the ACM,20(7):504–513, 1977.

Brendan Eich. Narcissus–JS implemented in JS. Avail-able on the web at http://mxr.mozilla.org/

mozilla/source/js/narcissus/.

J. S. Fenton. Memoryless subsystems. The Computer Journal, 17(2):143–147, 1974.

Robert Bruce Findler. Behavioral Software Contracts. PhD thesis,Rice University, 2002.

Cedric Fournet and Tamara Rezk. Cryptographically sound im-plementations for typed information-flow security. In Sympo-sium on Principles of Programming Languages, pages 323–335,2008.

Andreas Gal, Brendan Eich, Mike Shaver, David Anderson, BlakeKaplan, Graydon Hoare, David Mandelin, Boris Zbarsky, JasonOrendorff, Michael Bebenita, Mason Chang, Michael Franz, Ed-win Smith, Rick Reitmaier, and Mohammad Haghighat. Trace-based just-in-time type specialization for dynamic languages. InConference on Programming Language Design and Implemen-tation, 2009.

Joseph A. Goguen and Jose Meseguer. Security policies and se-curity models. IEEE Symposium on Security and Privacy, 0:11,1982.

Kathryn E. Gray, Robert Bruce Findler, and Matthew Flatt. Fine-grained interoperability through mirrors and contracts. In OOP-SLA ’05: Proceedings of the 20th annual ACM SIGPLAN confer-ence on Object-oriented programming, systems, languages, andapplications, pages 231–245, 2005.

Vivek Haldar, Deepak Chandra, and Michael Franz. Dynamictaint propagation for java. In ACSAC, pages 303–311. IEEEComputer Society, 2005.

Nevin Heintze and Jon G. Riecke. The slam calculus: Programmingwith secrecy and integrity. In Symposium on Principles ofProgramming Languages, pages 365–377, 1998.

Dave King, Boniface Hicks, Michael Hicks, and Trent Jaeger. Im-plicit flows: Can’t live with ’em, can’t live without ’em. In In-ternational Conference on Information Systems Security, pages56–70, 2008.

Monica S. Lam, Michael Martin, V. Benjamin Livshits, and JohnWhaley. Securing web applications with static and dynamicinformation flow tracking. In Robert Gluck and Oege de Moor,editors, ACM SIGPLAN Workshop on Partial Evaluation andProgram Manipulation, pages 3–12. ACM, 2008.

Gurvan Le Guernic, Anindya Banerjee, Thomas Jensen, and DavidSchmidt. Automata-based confidentiality monitoring. 2006.URL http://hal.inria.fr/inria-00130210/en/.

Pasquale Malacaria and Han Chen. Lagrange multipliers and max-imum information leakage in different observational models.In ACM SIGPLAN Workshop on Programming Languages andAnalysis for Security, pages 135–146, 2008.

John McLean. Proving noninterference and functional correctnessusing traces. Journal of Computer Security, 1(1):37–58, 1992.

Andrew C. Myers. Jflow: Practical mostly-static information flowcontrol. In Symposium on Principles of Programming Lan-guages, pages 228–241, 1999.

Andrew C. Myers and Barbara Liskov. A decentralized model forinformation flow control. In Symposium on Operating SystemPrinciples, pages 129–142, 1997.

Kevin R. O’Neill, Michael R. Clarkson, and Stephen Chong.Information-flow security for interactive programs. In IEEEComputer Security Foundations Workshop, pages 190–201.IEEE Computer Society, 2006.

Francois Pottier and Vincent Simonet. Information flow inferencefor ml. Transactions on Programming Languages and Systems,25(1):117–158, 2003.

Andrei Sabelfeld and Andrew C. Myers. Language-basedinformation-flow security. Selected Areas in Communications,IEEE Journal on, 21(1):5–19, Jan 2003.

Tachio Terauchi and Alexander Aiken. Secure information flow asa safety problem. In Chris Hankin and Igor Siveroni, editors,SAS, volume 3672 of Lecture Notes in Computer Science, pages352–367. Springer, 2005.

V. N. Venkatakrishnan, Wei Xu, Daniel C. DuVarney, and R. Sekar.Provably correct runtime enforcement of non-interference prop-erties. In Information and Communications Security, pages 332–351, 2006.

Philipp Vogt, Florian Nentwich, Nenad Jovanovic, En-gin Kirda, Christopher Kruegel, and Giovanni Vi-gna. Cross site scripting prevention with dynamic datatainting and static analysis. February 2007. URLhttp://www.infosys.tuwien.ac.at/Staff/ek/

papers/xss prevention.pdf.

Dennis Volpano, Cynthia Irvine, and Geoffrey Smith. A sound typesystem for secure flow analysis. Journal of Computer Security,4(2-3):167–187, 1996.

Stephan Arthur Zdancewic. Programming languages for informa-tion security. PhD thesis, Ithaca, NY, USA, 2002. Chair-Myers,,Andrew.

Lantian Zheng and Andrew C. Myers. Securing nonintrusive webencryption through information flow. In ACM SIGPLAN Work-shop on Programming Languages and Analysis for Security,pages 125–134, 2008.

Page 12: Efficient Purely-Dynamic Information Flow Analysis · Efficient Purely-Dynamic Information Flow Analysis ... a prerequisite to policy enforcement. We formalize our information flow

A. Non-Interference for Sparse LabelingRESTATEMENT OF THEOREM 2 (Non-Interference for SparseLabeling). If

σ1 ≈H σ2

θ1 ∼pcH θ2

σ1, θ1, e ↓pc σ′1, v1σ2, θ2, e ↓pc σ′2, v2

thenσ′1 ≈H σ′2v1 ∼pc

H v2

PROOF By induction on the derivation σ1, θ1, e ↓pc σ′1, v1and case analysis on the last rule used in that derivation.

Note that any derivation via the [S-APP] rule can be de-rived via the [S-APP-SLOW] rule, and similarly for the other[. . . -SLOW] rules, and so we assume without loss of general-ity that both evaluations are via the [. . . -SLOW] rules when-ever possible.

• [S-CONST]: Then e = c and σ′1 = σ1 ≈H σ2 = σ′2 andv1 = v2 = c.• [S-VAR]: Then e = x and σ′1 = σ1 ≈H σ2 = σ′2 andv1 = θ1(x) ∼pc

H θ2(x) = v2.• [S-FUN]: Then e = λx.e′ and σ′1 = σ1 ≈H σ2 = σ′2 andv1 = (λx.e′, θ1) ∼pc

H (λx.e′, θ2) = v2.• [S-APP-SLOW]: In this case, e = (ea eb), and from the

antecedents of this rule, we have that for i ∈ 1, 2:

σi, θi, ea ↓pc σ′′i , (λx.ei, θ′i)

ki

σ′′i , θi, eb ↓pc σ′′′i , v′i

σ′′′i , θ′i, ei[x := v′i] ↓pctki σ

′i, v′′i

vi = 〈ki〉pc v′′i

By induction:

σ′′1 ≈H σ′′2σ′′′1 ≈H σ′′′2(λx.e1c)k1 ∼pc

H (λx.e2c)k2

v′1 ∼pcH v′2

If k1 and k2 are both at least H (with respect to pc)then v1 ∼pc

H v2, since they both have label at least H .By Lemma 10, σ′1 ≈H σ′′′1 ≈H σ′′′2 ≈H σ′2, and weneed to conclude that σ′1 ≈H σ′2.We know that dom(σ′i) ⊇ dom(σ′′′i ), since executiononly allocates additional reference cells. Without lossof generality, we assume that the two executions allo-cate reference cells from disjoint parts of the addressspace,3 i.e.:

(dom(σ′i) \ dom(σ′′′i )) ∩ dom(σ′2−i) = ∅

3 We refer the interested reader to [Banerjee and Naumann 2002] for analternative proof argument that does use of this assumption, but whichinvolves a more complicated compatibility relation on stores.

Under this assumption, the only common addresses inσ′1 and σ′2 are also the common addresses in σ′′′1 andσ′′′2 , and hence we have that σ′1 ≈H σ′2.

If k1 and k2 are not both at least H (with respectto pc), then θ′1 ∼

pcH θ′2 and e1 = e2 and k1 = k2.

By induction, σ′1 ≈H σ′2 and v′′1 ∼pcH v′′2 , and hence

v′1 ∼pcH v′2.

• [S-PRIM-SLOW]: This case holds via a similar argument.• [S-REF]: In this case, e = ref e′. Without loss of gener-

ality, we assume that both evaluation allocate at the sameaddress a 6∈ dom(σ1) ∪ dom(σ2), and so a = v1 = v2.From the antecedents of this rule, we have that for i ∈1, 2:

σi, θi, e′ ↓pc σ′′i , v′i

σ′i = σ′′i [a := v′i]

By induction, σ′′1 ≈H σ′′2 and v′1 ∼pcH v′2, and so σ′1 ≈H

σ′2 as label(a) = pc.• [S-DEREF-SLOW]: In this case, e = !e′, and from the

antecedents of this rule, we have that for i ∈ 1, 2:

σi, θi, e ↓pc σ′i, akii

vi = 〈ki〉pc σ′i(ai)

By induction, σ′1 ≈H σ′2 and ak11 ∼

pcH ak2

2 .

If k1 and k2 are both at least H (wrt pc), then v1 ∼pcH

v2, since they both have label at least H (wrt pc).

Otherwise, a1 = a2 and k1 = k2 and σ′1(a) ∼label(a)H

σ′2(a). By Lemma 9, label(a) v k1, and so byLemma 7, σ′1(a) ∼

k1H σ′2(a). By Lemma 8, v1 ∼pc

H v2.

• [S-ASSIGN-SLOW] In this case, e = (ea:= eb), and fromthe antecedents of this rule, we have that for i ∈ 1, 2:

σi, θi, ea ↓pc σ′′i , akii

σ′′i , θi, eb ↓pc σ′′′i , vi

mi = label(ai)(pc t ki) v labelmi

(σ′′′i (ai))σ′i = σ′′′i [ai := 〈pc t ki〉mi vi]

By induction:

σ′′1 ≈H σ′′2 σ′′′1 ≈H σ′′′2ak11 ∼

pcH ak2

2 v1 ∼pcH v2

If ak11 = ak2

2 then let l = m1 = m2. By Lemma 8,〈pc〉l v1 ∼l

H 〈pc〉l v2, and hence σ′1 ≈H σ′2 from theabove.

Otherwise H v ki v labelmi(σ′′′i (ai)). Hence

σ′1 ≈H σ′2.