Central Moment Analysis for Cost Accumulators in Probabilistic Programs
Technical Report
Di Wang, Carnegie Mellon University, USA
Jan Hoffmann, Carnegie Mellon University, USA
Thomas Reps, University of Wisconsin, USA
Abstract
For probabilistic programs, it is usually not possible to automatically derive exact information about their properties, such as the distribution of states at a given program point. Instead, one can attempt to derive approximations, such as upper bounds on tail probabilities. Such bounds can be obtained via concentration inequalities, which rely on the moments of a distribution, such as the expectation (the first raw moment) or the variance (the second central moment). Tail bounds obtained using central moments are often tighter than the ones obtained using raw moments, but automatically analyzing central moments is more challenging.
This paper presents an analysis for probabilistic programs that automatically derives symbolic upper and lower bounds on variances, as well as higher central moments, of cost accumulators. To overcome the challenges of higher-moment analysis, it generalizes analyses for expectations with an algebraic abstraction that simultaneously analyzes different moments, utilizing relations between them. A key innovation is the notion of moment-polymorphic recursion, and a practical derivation system that handles recursive functions.
The analysis has been implemented using a template-based technique that reduces the inference of polynomial bounds to linear programming. Experiments with our prototype central-moment analyzer show that, despite the analyzer's upper/lower bounds on various quantities, it obtains tighter tail bounds than an existing system that uses only raw moments, such as expectations.
Keywords: Probabilistic programs, central moments, cost analysis, tail bounds
1 Introduction
Probabilistic programs [18, 25, 30] can be used to manipulate uncertain quantities modeled by probability distributions and random control flows. Uncertainty arises naturally in Bayesian networks, which capture statistical dependencies (e.g., for diagnosis of diseases [23]), and cyber-physical systems, which are subject to sensor errors and peripheral disturbances (e.g., airborne collision-avoidance systems [29]).
PLDI '21, June 20-25, 2021, Virtual, UK. 2021. ACM ISBN 978-1-4503-8391-2/21/06. $15.00. https://doi.org/10.1145/3453483.3454062
Researchers have used probabilistic programs to implement and analyze randomized algorithms [2], cryptographic protocols [3], and machine-learning algorithms [16]. A probabilistic program propagates uncertainty through a computation, and produces a distribution over results.
In general, it is not tractable to compute the result distributions of probabilistic programs automatically and precisely: composing simple distributions can quickly complicate the result distribution, and randomness in the control flow can easily lead to state-space explosion. Monte-Carlo simulation [36] is a common approach to studying result distributions, but the technique does not provide formal guarantees, and can sometimes be inefficient [5].
In this paper, we focus on a specific yet important kind of uncertain quantity: cost accumulators, which are program variables that can only be incremented or decremented through the program execution. Examples of cost accumulators include termination time [5, 11, 12, 24, 33], rewards in Markov decision processes (MDPs) [35], position information in control systems [4, 8, 37], and cash flow during bitcoin mining [43]. Recent work [8, 26, 31, 43] has proposed successful static-analysis approaches that leverage aggregate information of a cost accumulator X, such as X's expected value E[X] (i.e., X's "first moment"). The intuition why it is beneficial to compute aggregate information, in lieu of distributions, is that aggregate measures like expectations abstract distributions to a single number, while still indicating non-trivial properties. Moreover, expectations are transformed by statements in a probabilistic program in a manner similar to the weakest-precondition transformation of formulas in a non-probabilistic program [30].
One important kind of aggregate information is moments. In this paper, we show how to obtain central moments (i.e., E[(X − E[X])^k] for any k ≥ 2), whereas most previous work focused on raw moments (i.e., E[X^k] for any k ≥ 1). Central moments can provide more information about distributions. For example, the variance V[X] (i.e., E[(X − E[X])²], X's "second central moment") indicates how X can deviate from its mean, the skewness (i.e., E[(X − E[X])³]/(V[X])^{3/2}, X's "third standardized moment") indicates how lopsided the distribution of X is, and the kurtosis (i.e., E[(X − E[X])⁴]/(V[X])², X's "fourth standardized moment") measures the heaviness of the tails of the distribution of X. One application of moments is to answer
arXiv:2001.10150v3 [cs.PL] 8 Apr 2021
queries about tail bounds, e.g., assertions about probabilities of the form P[X ≥ d], via concentration-of-measure inequalities from probability theory [13]. With central moments, we find an opportunity to obtain more precise tail bounds of the form P[X ≥ d], and become able to derive bounds on tail probabilities of the form P[|X − E[X]| ≥ d].
Central moments E[(X − E[X])^k] can be seen as polynomials of raw moments E[X], ···, E[X^k]; e.g., the variance V[X] = E[(X − E[X])²] can be rewritten as E[X²] − E²[X], where E^k[X] denotes (E[X])^k. To derive bounds on central moments, we need both upper and lower bounds on the raw moments, because of the presence of subtraction. For example, to upper-bound V[X], a static analyzer needs an upper bound on E[X²] and a lower bound on E²[X].
In this work, we present and implement the first fully automatic analysis for deriving symbolic interval bounds on higher central moments for cost accumulators in probabilistic programs with general recursion and continuous distributions. One challenge is to support interprocedural reasoning to reuse analysis results for functions. Our solution makes use of a "lifting" technique from the natural-language-processing community. That technique derives an algebra for second moments from an algebra for first moments [27]. We generalize the technique to develop moment semirings, and use them to derive a novel frame rule to handle function calls with moment-polymorphic recursion (see §2.2).
Recent work has successfully automated inference of upper [31] or lower bounds [43] on the expected cost of probabilistic programs. Kura et al. [26] developed a system to derive upper bounds on higher raw moments of program runtimes. However, even in combination, existing approaches cannot solve tasks such as deriving a lower bound on the second raw moment of runtimes, or deriving an upper bound on the variance of accumulators that count live heap cells. Fig. 1(a) summarizes the features of related work on moment inference for probabilistic programs. To the best of our knowledge, our work is the first moment-analysis tool that supports all of the listed programming and analysis features. Fig. 1(b) and (c) compare our work with related work in terms of tail-bound analysis on a concrete program (see §5). The bounds are derived for the cost accumulator tick in a random-walk program that we will present in §2. It can be observed that for d ≥ 20, the most precise tail bound for tick is the one obtained via an upper bound on the variance V[tick] (tick's second central moment).
Our work incorporates ideas known from the literature:
- Using the expected-potential method (or ranking super-martingales) to derive upper bounds on the expected program runtimes or monotone costs [10-12, 14, 26, 31].
- Using the Optional Stopping Theorem from probability theory to ensure the soundness of lower-bound inference for probabilistic programs [1, 19, 38, 43].
(a) A table of supported features (loop, recursion, continuous distributions, non-monotone costs, higher moments, interval bounds) for the systems [8], [31], [26], [43], and this work; the individual check marks did not survive extraction, but this work supports all of the listed features.

(b)
                           [31, 43]             [26]                        this work
  Derived bound            E[tick] ≤ 2d + 4     E[tick²] ≤ 4d² + 22d + 28   V[tick] ≤ 22d + 28
  Moment type              raw                  raw                         central
  Concentration inequality Markov (degree = 1)  Markov (degree = 2)         Cantelli
  P[tick ≥ 4d] (as d → ∞)  → 1/2                → 1/4                       → 0

(c) [Plot: the derived tail bounds P[tick ≥ 4d] (vertical axis, gridlines at 0.25, 0.5, 0.75) against d = 20..80, for [31, 43], [26], and this work.]

Figure 1. (a) Comparison in terms of supporting features. (b) Comparison in terms of moment bounds for the running example. (c) Comparison in terms of derived tail bounds.
- Using linear programming (LP) to efficiently automate the (expected) potential method for (expected) cost analysis [20, 21, 42].
The contributions of our work are as follows:
• We develop moment semirings to compose the moments for a cost accumulator from two computations, and to enable interprocedural reasoning about higher moments.
• We instantiate moment semirings with the symbolic interval domain, use that to develop a derivation system for interval bounds on higher central moments for cost accumulators, and automate the derivation via LP solving.
• We prove the soundness of our derivation system for programs that satisfy the criterion of our recent extension to the Optional Stopping Theorem, and develop an algorithm for checking this criterion automatically.
• We implemented our analysis and evaluated it on a broad suite of benchmarks from the literature. Our experimental results show that on a variety of examples, our analyzer is able to use higher central moments to obtain tighter tail bounds on program runtimes than the system of Kura et al. [26], which uses only upper bounds on raw moments.
1  func rdwalk() begin
2    { 2(d − x) + 4 }
3    if x < d then
4      { 2(d − x) + 4 }
5      t ∼ uniform(−1, 2);
6      { 2(d − x − t) + 5 }
7      x := x + t;
8      { 2(d − x) + 5 }
9      call rdwalk;
10     { 1 }
11     tick(1)
12     { 0 }
13   fi
14 end

1  # XID := {x, d, t}
2  # FID := {rdwalk}
3  # pre-condition: {d > 0}
4  func main() begin
5    x := 0;
6    call rdwalk
7  end

Figure 2. A bounded, biased random walk, implemented using recursion. The annotations show the derivation of an upper bound on the expected accumulated cost.
2 Overview
In this section, we demonstrate the expected-potential method for both first-moment analysis (previous work) and higher central-moment analysis (this work) (§2.1), and discuss the challenges of supporting interprocedural reasoning and of ensuring the soundness of our approach (§2.2).
Example 2.1. The program in Fig. 2 implements a bounded, biased random walk. The main function consists of a single statement "call rdwalk" that invokes a recursive function. The variables x and d represent the current position and the ending position of the random walk, respectively. We assume that d > 0 holds initially. In each step, the program samples the length of the current move from a uniform distribution on the interval [−1, 2]. The statement tick(1) adds one to a cost accumulator that counts the number of steps before the random walk ends. We denote this accumulator by tick in the rest of this section. The program terminates with probability one and its expected accumulated cost is bounded by 2d + 4.
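As a quick empirical sanity check (ours, not part of the paper), the recursive rdwalk can be rendered as a loop and simulated to compare the average tick count against the claimed bound 2d + 4:

```python
import random

def rdwalk_ticks(d):
    """Loop rendering of the recursive rdwalk of Fig. 2: step by
    uniform(-1, 2) until position x reaches d, ticking once per step."""
    x, ticks = 0.0, 0
    while x < d:
        x += random.uniform(-1, 2)
        ticks += 1
    return ticks

random.seed(0)
d, n = 10, 100_000
mean = sum(rdwalk_ticks(d) for _ in range(n)) / n
print(mean)  # empirical E[tick]; stays below the bound 2*d + 4 = 24
```

Since each step advances by 1/2 on average, the empirical mean lands near 2d, comfortably within the bound.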
2.1 The Expected-Potential Method for Higher-Moment Analysis
Our approach to higher-moment analysis is inspired by the expected-potential method [31], which is also known as ranking super-martingales [10, 11, 26, 43], for expected-cost bound analysis of probabilistic programs.
The classic potential method of amortized analysis [39] can be automated to derive symbolic cost bounds for non-probabilistic programs [20, 21]. The basic idea is to define a potential function ϕ : Σ → ℝ⁺ that maps program states σ ∈ Σ to nonnegative numbers, where we assume each state σ contains a cost-accumulator component σ.α. If a program executes with initial state σ to final state σ′, then it holds that ϕ(σ) ≥ (σ′.α − σ.α) + ϕ(σ′), where (σ′.α − σ.α) describes the accumulated cost from σ to σ′. The potential method also enables compositional reasoning: if a statement S₁ executes from σ to σ′ and a statement S₂ executes from σ′ to σ″, then we have ϕ(σ) ≥ (σ′.α − σ.α) + ϕ(σ′) and ϕ(σ′) ≥ (σ″.α − σ′.α) + ϕ(σ″); therefore, we derive ϕ(σ) ≥ (σ″.α − σ.α) + ϕ(σ″) for the sequential composition S₁; S₂. For non-probabilistic programs, the initial potential provides an upper bound on the accumulated cost.
This approach has been adapted to reason about expected costs of probabilistic programs [31, 43]. To derive upper bounds on the expected accumulated cost of a program S with initial state σ, one needs to take into consideration the distribution of all possible executions. More precisely, the potential function should satisfy the following property:

ϕ(σ) ≥ E_{σ′∼⟦S⟧(σ)}[C(σ, σ′) + ϕ(σ′)],    (1)

where the notation E_{x∼μ}[f(x)] represents the expected value of f(x), where x is drawn from the distribution μ, ⟦S⟧(σ) is the distribution over final states of executing S from σ, and C(σ, σ′) := σ′.α − σ.α is the execution cost from σ to σ′.
Example 2.2. Fig. 2 annotates the rdwalk function from Ex. 2.1 with the derivation of an upper bound on the expected accumulated cost. The annotations, taken together, define an expected-potential function ϕ : Σ → ℝ⁺, where a program state σ ∈ Σ consists of a program point and a valuation for program variables. To justify the upper bound 2(d − x) + 4 for the function rdwalk, one has to show that the potential right before the tick(1) statement is at least 1. This property is established by backward reasoning on the function body:
• For call rdwalk, we apply the "induction hypothesis" that the expected cost of the function rdwalk can be upper-bounded by 2(d − x) + 4. Adding the 1 unit of potential needed by the tick statement, we obtain 2(d − x) + 5 as the pre-annotation of the function call.
• For x := x + t, we substitute x with x + t in the post-annotation of this statement to obtain the pre-annotation.
• For t ∼ uniform(−1, 2), because its post-annotation is 2(d − x − t) + 5, we compute its pre-annotation as

E_{t∼uniform(−1,2)}[2(d − x − t) + 5]
  = 2(d − x) + 5 − 2 · E_{t∼uniform(−1,2)}[t]
  = 2(d − x) + 5 − 2 · 1/2 = 2(d − x) + 4,

which is exactly the upper bound we want to justify.
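The expectation computation above can be replayed numerically; a small sketch (ours, with arbitrary example values for d and x), averaging the post-annotation over sampled values of t:

```python
import random

random.seed(1)
d, x = 10.0, 3.0  # example values; the derivation itself is symbolic
n = 500_000
# Average of the post-annotation 2(d - x - t) + 5 over t ~ uniform(-1, 2)
pre = sum(2 * (d - x - random.uniform(-1, 2)) + 5 for _ in range(n)) / n
print(pre)  # close to the symbolic pre-annotation 2*(d - x) + 4 = 18
```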
Our approach. In this paper, we focus on derivation of higher central moments. Observing that a central moment E[(X − E[X])^k] can be rewritten as a polynomial of raw moments E[X], ···, E[X^k], we reduce the problem of bounding central moments to reasoning about upper and lower bounds on raw moments. For example, the variance can be written as V[X] = E[X²] − E²[X], so it suffices to analyze the upper bound on the second moment E[X²] and the lower bound on the square of the first moment E²[X]. For higher central moments, this approach requires both upper and lower bounds on higher raw moments. For example, consider the fourth central moment of a nonnegative random variable X:
1  func rdwalk() begin
2    { ⟨1, 2(d − x) + 4, 4(d − x)² + 22(d − x) + 28⟩ }
3    if x < d then
4      { ⟨1, 2(d − x) + 4, 4(d − x)² + 22(d − x) + 28⟩ }
5      t ∼ uniform(−1, 2);
6      { ⟨1, 2(d − x − t) + 5, 4(d − x − t)² + 26(d − x − t) + 37⟩ }
7      x := x + t;
8      { ⟨1, 2(d − x) + 5, 4(d − x)² + 26(d − x) + 37⟩ }
9      call rdwalk;
10     { ⟨1, 1, 1⟩ }
11     tick(1)
12     { ⟨1, 0, 0⟩ }
13   fi
14 end

Figure 3. Derivation of an upper bound on the first and second moment of the accumulated cost.
E[(X − E[X])⁴] = E[X⁴] − 4E[X³]E[X] + 6E[X²]E²[X] − 3E⁴[X].

Deriving an upper bound on the fourth central moment requires lower bounds on the first (E[X]) and third (E[X³]) raw moments.
We now sketch the development of moment semirings. We first consider only the upper bounds on higher moments of nonnegative costs. To do so, we extend the range of the expected-potential function ϕ to real-valued vectors (ℝ⁺)^{m+1}, where m ∈ ℕ is the degree of the target moment. We update the potential inequality (1) as follows:
ϕ(σ) ≥ E_{σ′∼⟦S⟧(σ)}[⟨C(σ, σ′)^k⟩_{0≤k≤m} ⊗ ϕ(σ′)],    (2)

where ⟨v_k⟩_{0≤k≤m} denotes an (m + 1)-dimensional vector, the order ≤ on vectors is defined pointwise, and ⊗ is some composition operator. Recall that ⟦S⟧(σ) denotes the distribution over final states of executing S from σ, and C(σ, σ′) describes the cost for the execution from σ to σ′. Intuitively, for ϕ(σ) = ⟨ϕ(σ)_k⟩_{0≤k≤m} and each k, the component ϕ(σ)_k is an upper bound on the k-th moment of the cost for the computation starting from σ. The 0-th moment is the termination probability of the computation, and we assume it is always one for now. We cannot simply define ⊗ as pointwise addition because, for example, (a + b)² ≠ a² + b² in general. If we think of b as the cost for some probabilistic computation, and we prepend a constant cost a to the computation, then by linearity of expectations, we have E[(a + b)²] = E[a² + 2ab + b²] = a² + 2 · a · E[b] + E[b²], i.e., reasoning about the second moment requires us to keep track of the first moment. Similarly, we should have

ϕ(σ)₂ ≥ E_{σ′∼⟦S⟧(σ)}[C(σ, σ′)² + 2 · C(σ, σ′) · ϕ(σ′)₁ + ϕ(σ′)₂],

for the second-moment component, where ϕ(σ′)₁ and ϕ(σ′)₂ denote E[b] and E[b²], respectively. Therefore, the composition operator ⊗ for second-moment analysis (i.e., m = 2) should be defined as

⟨1, c₁, s₁⟩ ⊗ ⟨1, c₂, s₂⟩ := ⟨1, c₁ + c₂, s₁ + 2c₁c₂ + s₂⟩.    (3)
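Eq. (3) can be checked exactly on finite distributions; a minimal sketch (our own helper names) that composes the moment vectors of two independent discrete costs and compares against the moments of their sum:

```python
def otimes(u, v):
    """Second-moment composition, eq. (3): inputs are <1, E[A], E[A^2]>."""
    _, c1, s1 = u
    _, c2, s2 = v
    return (1, c1 + c2, s1 + 2 * c1 * c2 + s2)

def moments(dist):
    """Exact <1, E[X], E[X^2]> for a finite distribution [(value, prob)]."""
    return (1,
            sum(p * x for x, p in dist),
            sum(p * x * x for x, p in dist))

A = [(0, 0.5), (3, 0.5)]      # cost of the first sub-computation
B = [(1, 0.25), (2, 0.75)]    # cost of the second (independent)
AB = [(x + y, p * q) for x, p in A for y, q in B]  # distribution of A + B

print(otimes(moments(A), moments(B)))  # → (1, 3.25, 13.0)
print(moments(AB))                     # → (1, 3.25, 13.0)
```

The two printed vectors agree, illustrating why tracking the first moment is enough to compose second moments.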
Example 2.3. Fig. 3 annotates the rdwalk function from Ex. 2.1 with the derivation of an upper bound on both the first and second moment of the accumulated cost. To justify the first and second moment of the accumulated cost for the function rdwalk, we again perform backward reasoning:
• For tick(1), it transforms a post-annotation Q by λQ.(⟨1, 1, 1⟩ ⊗ Q); thus, the pre-annotation is ⟨1, 1, 1⟩ ⊗ ⟨1, 0, 0⟩ = ⟨1, 1, 1⟩.
• For call rdwalk, we apply the "induction hypothesis," i.e., the upper bound shown on line 2. We use the ⊗ operator to compose the induction hypothesis with the post-annotation of this function call:

⟨1, 2(d−x)+4, 4(d−x)²+22(d−x)+28⟩ ⊗ ⟨1, 1, 1⟩
  = ⟨1, 2(d−x)+5, (4(d−x)²+22(d−x)+28) + 2·(2(d−x)+4) + 1⟩
  = ⟨1, 2(d−x)+5, 4(d−x)²+26(d−x)+37⟩.

• For x := x + t, we substitute x with x + t in the post-annotation of this statement to obtain the pre-annotation.
• For t ∼ uniform(−1, 2), because the post-annotation involves both t and t², we compute from the definition of uniform distributions that E_{t∼uniform(−1,2)}[t] = 1/2 and E_{t∼uniform(−1,2)}[t²] = 1. Then the upper bound on the second moment is derived as follows:

E_{t∼uniform(−1,2)}[4(d−x−t)² + 26(d−x−t) + 37]
  = (4(d−x)² + 26(d−x) + 37) − (8(d−x) + 26) · E_{t∼uniform(−1,2)}[t] + 4 · E_{t∼uniform(−1,2)}[t²]
  = 4(d−x)² + 22(d−x) + 28,

which is the same as the desired upper bound on the second moment of the accumulated cost for the function rdwalk. (See Fig. 3, line 2.)
We generalize the composition operator ⊗ to moments with arbitrarily high degrees, via a family of algebraic structures, which we name moment semirings (see §3.2). These semirings are algebraic in the sense that they can be instantiated with any partially ordered semiring, not just ℝ⁺.
Interval bounds. Moment semirings not only provide a general method to analyze higher moments, but also enable reasoning about upper and lower bounds on moments simultaneously. The simultaneous treatment is also essential for analyzing programs with non-monotone costs (see §3.3).
We instantiate moment semirings with the standard interval semiring I = {[a, b] | a ≤ b}. The algebraic approach allows us to systematically incorporate the interval-valued bounds, by reinterpreting operations in eq. (3) under I:

⟨[1, 1], [c₁ᴸ, c₁ᵁ], [s₁ᴸ, s₁ᵁ]⟩ ⊗ ⟨[1, 1], [c₂ᴸ, c₂ᵁ], [s₂ᴸ, s₂ᵁ]⟩
  := ⟨[1, 1], [c₁ᴸ, c₁ᵁ] +_I [c₂ᴸ, c₂ᵁ],
      [s₁ᴸ, s₁ᵁ] +_I 2 · ([c₁ᴸ, c₁ᵁ] ·_I [c₂ᴸ, c₂ᵁ]) +_I [s₂ᴸ, s₂ᵁ]⟩
  = ⟨[1, 1], [c₁ᴸ + c₂ᴸ, c₁ᵁ + c₂ᵁ],
      [s₁ᴸ + 2·min S + s₂ᴸ, s₁ᵁ + 2·max S + s₂ᵁ]⟩,
where S := {c₁ᴸc₂ᴸ, c₁ᴸc₂ᵁ, c₁ᵁc₂ᴸ, c₁ᵁc₂ᵁ}. We then update the potential inequality eq. (2) as follows:
ϕ(σ) ⊒ E_{σ′∼⟦S⟧(σ)}[⟨[C(σ, σ′)^k, C(σ, σ′)^k]⟩_{0≤k≤m} ⊗ ϕ(σ′)],

where the order ⊑ is defined as pointwise interval inclusion.
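A sketch (ours) of the interval-valued instantiation of ⊗ for second moments, with intervals represented as (lower, upper) pairs; note how a sign-indefinite first moment forces the min/max over all four endpoint products:

```python
def iadd(a, b):
    """Interval addition on (lo, hi) pairs."""
    return (a[0] + b[0], a[1] + b[1])

def imul(a, b):
    """Interval multiplication: min/max over the endpoint products."""
    ps = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(ps), max(ps))

def iotimes(u, v):
    """Interval instantiation of the second-moment composition operator."""
    _, c1, s1 = u
    _, c2, s2 = v
    s = iadd(iadd(s1, imul((2, 2), imul(c1, c2))), s2)
    return ((1.0, 1.0), iadd(c1, c2), s)

u = ((1.0, 1.0), (-1.0, 2.0), (0.0, 5.0))   # sign-indefinite first moment
v = ((1.0, 1.0), (3.0, 4.0), (9.0, 16.0))
print(iotimes(u, v))  # → ((1.0, 1.0), (2.0, 6.0), (1.0, 37.0))
```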
Example 2.4. Suppose that the interval bound on the first moment of the accumulated cost of the rdwalk function from Ex. 2.1 is [2(d − x), 2(d − x) + 4]. We can now derive the upper bound on the variance V[tick] ≤ 22d + 28 shown in Fig. 1(b) (where we substitute x with 0 because the main function initializes x to 0 on line 5 in Fig. 2):

V[tick] = E[tick²] − E²[tick]
  ≤ (upper bnd. on E[tick²]) − (lower bnd. on E[tick])²
  = (4d² + 22d + 28) − (2d)² = 22d + 28.
In §5, we describe how we use moment bounds to derive the tail bounds shown in Fig. 1(c).
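Although the tail-bound derivation is deferred to §5, the flavor of the calculation can be sketched (our own simplified use of Cantelli's one-sided inequality, P[X − E[X] ≥ t] ≤ V[X]/(V[X] + t²), instantiated with the bounds from Fig. 1(b)):

```python
def cantelli_tail(d):
    """Bound P[tick >= 4d] using V[tick] <= 22d + 28 and E[tick] <= 2d + 4
    (the bounds from Fig. 1(b)); valid once the threshold exceeds the mean."""
    var_ub = 22 * d + 28
    gap = 4 * d - (2 * d + 4)   # threshold minus the upper bound on the mean
    if gap <= 0:
        return 1.0              # trivial bound below the mean
    return var_ub / (var_ub + gap ** 2)

for d in (20, 40, 80):
    print(d, round(cantelli_tail(d), 3))  # 0.265, 0.136, 0.068: tends to 0
```

Because the gap grows linearly while the variance bound grows linearly too, the bound behaves like O(1/d), matching the "→ 0" column of Fig. 1(b).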
2.2 Two Major Challenges
Interprocedural reasoning. Recall that in the derivation of Fig. 3, we use the ⊗ operator to compose the upper bounds on moments for call rdwalk and its post-annotation ⟨1, 1, 1⟩. However, this approach does not work in general, because the post-annotation might be symbolic (e.g., ⟨1, x, x²⟩) and the callee might mutate referenced program variables (e.g., x). One workaround is to derive a pre-annotation for each possible post-annotation of a recursive function, i.e., the moment annotations for a recursive function are polymorphic. This workaround would not be effective for non-tail-recursive functions: for example, we would need to reason about the rdwalk function in Fig. 3 with infinitely many post-annotations ⟨1, 0, 0⟩, ⟨1, 1, 1⟩, ⟨1, 2, 4⟩, . . . , i.e., ⟨1, n, n²⟩ for all n ∈ ℤ⁺.
Our solution to moment-polymorphic recursion is to introduce a combination operator ⊕ such that if ϕ₁ and ϕ₂ are two expected-potential functions, then

ϕ₁(σ) ⊕ ϕ₂(σ) ≥ E_{σ′∼⟦S⟧(σ)}[⟨C(σ, σ′)^k⟩_{0≤k≤m} ⊗ (ϕ₁(σ′) ⊕ ϕ₂(σ′))].
We then use the ⊕ operator to derive a frame rule:

  { Q₁ } S { Q′₁ }    { Q₂ } S { Q′₂ }
  ────────────────────────────────────
       { Q₁ ⊕ Q₂ } S { Q′₁ ⊕ Q′₂ }

We define ⊕ as pointwise addition, i.e., for second moments,

⟨p₁, c₁, s₁⟩ ⊕ ⟨p₂, c₂, s₂⟩ := ⟨p₁ + p₂, c₁ + c₂, s₁ + s₂⟩,    (4)

and because the 0-th-moment (i.e., termination-probability) component is no longer guaranteed to be one, we redefine ⊗ to take the termination probabilities into account:

⟨p₁, c₁, s₁⟩ ⊗ ⟨p₂, c₂, s₂⟩ := ⟨p₁p₂, p₂c₁ + p₁c₂, p₂s₁ + 2c₁c₂ + p₁s₂⟩.    (5)
Remark 2.5. As we will show in §3.2, the composition operator ⊗ and combination operator ⊕ form a moment semiring; consequently, we can use algebraic properties of semirings (e.g., distributivity) to aid higher-moment analysis. For example, a vector ⟨0, c₁, s₁⟩ whose termination-probability component is zero does not seem to make sense, because moments with respect to a zero distribution should also be zero. However, by distributivity, we have

⟨1, c₃, s₃⟩ ⊗ ⟨1, c₁ + c₂, s₁ + s₂⟩
  = ⟨1, c₃, s₃⟩ ⊗ (⟨0, c₁, s₁⟩ ⊕ ⟨1, c₂, s₂⟩)
  = (⟨1, c₃, s₃⟩ ⊗ ⟨0, c₁, s₁⟩) ⊕ (⟨1, c₃, s₃⟩ ⊗ ⟨1, c₂, s₂⟩).

If we think of ⟨1, c₁ + c₂, s₁ + s₂⟩ as a post-annotation of a computation whose moments are bounded by ⟨1, c₃, s₃⟩, the equation above indicates that we can use ⊕ to decompose the post-annotation into subparts, and then reason about each subpart separately. This fact inspires us to develop a decomposition technique for moment-polymorphic recursion.
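The distributivity used in Remark 2.5 can be checked mechanically for the second-moment instance; a sketch (ours) of eqs. (4) and (5) tested over random integer vectors:

```python
import random

def oplus(u, v):
    """Eq. (4): pointwise addition."""
    return tuple(a + b for a, b in zip(u, v))

def otimes(u, v):
    """Eq. (5): composition with explicit termination probabilities."""
    p1, c1, s1 = u
    p2, c2, s2 = v
    return (p1 * p2, p2 * c1 + p1 * c2, p2 * s1 + 2 * c1 * c2 + p1 * s2)

random.seed(0)
for _ in range(1000):
    c1, s1, c2, s2, c3, s3 = (random.randint(-5, 5) for _ in range(6))
    lhs = otimes((1, c3, s3), (1, c1 + c2, s1 + s2))
    rhs = oplus(otimes((1, c3, s3), (0, c1, s1)),
                otimes((1, c3, s3), (1, c2, s2)))
    assert lhs == rhs  # distributivity of eq. (5) over eq. (4)
print("distributivity holds on all sampled vectors")
```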
Example 2.6. With the ⊕ operator and the frame rule, we only need to analyze the rdwalk function from Ex. 2.1 with three post-annotations: ⟨1, 0, 0⟩, ⟨0, 1, 1⟩, and ⟨0, 0, 2⟩, which form a kind of "elimination sequence." We construct this sequence in an on-demand manner; the first post-annotation is the identity element ⟨1, 0, 0⟩ of the moment semiring.
For post-annotation ⟨1, 0, 0⟩, as shown in Fig. 3, we need to know the moment bound for rdwalk with the post-annotation ⟨1, 1, 1⟩. Instead of reanalyzing rdwalk with the post-annotation ⟨1, 1, 1⟩, we use the ⊕ operator to compute the "difference" between it and the previous post-annotation ⟨1, 0, 0⟩. Observing that ⟨1, 1, 1⟩ = ⟨1, 0, 0⟩ ⊕ ⟨0, 1, 1⟩, we now analyze rdwalk with ⟨0, 1, 1⟩ as the post-annotation:

1  call rdwalk;  { ⟨0, 1, 3⟩ }  # = ⟨1, 1, 1⟩ ⊗ ⟨0, 1, 1⟩
2  tick(1)       { ⟨0, 1, 1⟩ }

Again, because ⟨0, 1, 3⟩ = ⟨0, 1, 1⟩ ⊕ ⟨0, 0, 2⟩, we need to further analyze rdwalk with ⟨0, 0, 2⟩ as the post-annotation:

1  call rdwalk;  { ⟨0, 0, 2⟩ }  # = ⟨1, 1, 1⟩ ⊗ ⟨0, 0, 2⟩
2  tick(1)       { ⟨0, 0, 2⟩ }

With the post-annotation ⟨0, 0, 2⟩, we can now reason monomorphically without analyzing any new post-annotation! We can perform a succession of reasoning steps similar to what we have done in Ex. 2.2 to justify the following bounds ("unwinding" the elimination sequence):
• {⟨0, 0, 2⟩} rdwalk {⟨0, 0, 2⟩}: Directly by backward reasoning with the post-annotation ⟨0, 0, 2⟩.
• {⟨0, 1, 4(d − x) + 9⟩} rdwalk {⟨0, 1, 1⟩}: To analyze the recursive call with post-annotation ⟨0, 1, 3⟩, we use the frame rule with the post-call-site annotation ⟨0, 0, 2⟩ to derive ⟨0, 1, 4(d − x) + 11⟩ as the pre-annotation:

1  { ⟨0, 1, 4(d − x) + 11⟩ }  # = ⟨0, 1, 4(d − x) + 9⟩ ⊕ ⟨0, 0, 2⟩
2  call rdwalk;
3  { ⟨0, 1, 3⟩ }  # = ⟨0, 1, 1⟩ ⊕ ⟨0, 0, 2⟩

• {⟨1, 2(d−x)+4, 4(d−x)²+22(d−x)+28⟩} rdwalk {⟨1, 0, 0⟩}: To analyze the recursive call with post-annotation ⟨1, 1, 1⟩, we use the frame rule with the post-call-site annotation ⟨0, 1, 1⟩ to derive ⟨1, 2(d−x)+5, 4(d−x)²+26(d−x)+37⟩ as the pre-annotation:

1  { ⟨1, 2(d−x)+5, 4(d−x)²+26(d−x)+37⟩ }
2  # = ⟨1, 2(d−x)+4, 4(d−x)²+22(d−x)+28⟩ ⊕ ⟨0, 1, 4(d−x)+9⟩
3  call rdwalk;
4  { ⟨1, 1, 1⟩ }  # = ⟨1, 0, 0⟩ ⊕ ⟨0, 1, 1⟩
1  func geo() begin  { ⟨1, 2^x⟩ }
2    x := x + 1;     { ⟨1, 2^{x−1}⟩ }
3    # expected-potential method for lower bounds:
4    # 2^{x−1} < 1/2 · (2^x + 1) + 1/2 · 0
5    if prob(1/2) then  { ⟨1, 2^x + 1⟩ }
6      tick(1);         { ⟨1, 2^x⟩ }
7      call geo         { ⟨1, 0⟩ }
8    fi
9  end

Figure 4. A purely probabilistic loop with annotations for a lower bound on the first moment of the accumulated cost.
In §3.3, we present an automatic inference system for the expected-potential method that is extended with interval-valued bounds on higher moments, with support for moment-polymorphic recursion.
Soundness of the analysis. Unlike the classic potential method, the expected-potential method is not always sound when reasoning about the moments for cost accumulators in probabilistic programs.
Counterexample 2.7. Consider the program in Fig. 4, which describes a purely probabilistic loop that exits the loop with probability 1/2 in each iteration. The expected accumulated cost of the program should be one [19]. However, the annotations in Fig. 4 justify a potential function 2^x as a lower bound on the expected accumulated cost, no matter what value x has at the beginning, which is apparently unsound.
Why does the expected-potential method fail in this case? The short answer is that dualization only works for some problems: upper-bounding the sum of nonnegative ticks is equivalent to lower-bounding the sum of nonpositive ticks; lower-bounding the sum of nonnegative ticks (the issue in Fig. 4) is equivalent to upper-bounding the sum of nonpositive ticks; however, the two kinds of problems are inherently different [19]. Intuitively, the classic potential method for bounding the costs of non-probabilistic programs is a partial-correctness method, i.e., derived upper/lower bounds are sound if the analyzed program terminates [32]. With probabilistic programs, many programs do not terminate definitely, but only almost surely, i.e., they terminate with probability one, but have some execution traces that are non-terminating. The programs in Figs. 2 and 4 are both almost-surely terminating. For the expected-potential method, the potential at a program state can be seen as an average of the potentials needed for all possible computations that continue from the state. If the program state can lead to a non-terminating execution trace, the potential associated with that trace might be problematic, and as a consequence, the expected-potential method might fail.
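A simulation (ours) of the loop in Fig. 4 illustrates the unsoundness concretely: the empirical expected cost is about 1, so 2^x cannot be a valid lower bound for any x ≥ 1:

```python
import random

def geo_ticks():
    """Fig. 4's loop: each iteration continues with probability 1/2,
    ticking once per continued iteration."""
    ticks = 0
    while random.random() < 0.5:
        ticks += 1
    return ticks

random.seed(0)
n = 200_000
mean = sum(geo_ticks() for _ in range(n)) / n
print(mean)  # close to 1, far below 2**x for, e.g., x = 10
```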
Recent research [1, 19, 38, 43] has employed the Optional Stopping Theorem (OST) from probability theory to address this soundness issue. The classic OST provides a collection of sufficient conditions for reasoning about expected gain upon
S ::= skip | tick(c) | x := E | x ∼ D | call f | while L do S od
    | if prob(p) then S₁ else S₂ fi | if L then S₁ else S₂ fi | S₁; S₂
L ::= true | not L | L₁ and L₂ | E₁ ≤ E₂
E ::= x | c | E₁ + E₂ | E₁ × E₂
D ::= uniform(a, b) | ···

Figure 5. Syntax of Appl, where p ∈ [0, 1], a, b, c ∈ ℝ, a < b, x ∈ VID is a variable, and f ∈ FID is a function identifier.
termination of stochastic processes, where the expected gain at any time is invariant. By constructing a stochastic process for executions of probabilistic programs and setting the expected-potential function as the invariant, one can apply the OST to justify the soundness of the expected-potential function. In a companion paper [41], we study and propose an extension to the classic OST with a new sufficient condition that is suitable for reasoning about higher moments; in this work, we prove the soundness of our central-moment inference for programs that satisfy this condition, and develop an algorithm to check this condition automatically (see §4).
3 Derivation System for Higher Moments
In this section, we describe the inference system used by our analysis. We first present a probabilistic programming language (§3.1). We then introduce moment semirings to compose higher moments for a cost accumulator from two computations (§3.2). We use moment semirings to develop our derivation system, which is presented as a declarative program logic (§3.3). Finally, we sketch how we reduce the inference of a derivation to LP solving (§3.4).
3.1 A Probabilistic Programming Language
This paper uses an imperative arithmetic probabilistic programming language Appl that supports general recursion and continuous distributions, where program variables are real-valued. We use the following notational conventions. Natural numbers ℕ exclude 0, i.e., ℕ := {1, 2, 3, ···} ⊆ ℤ⁺ := {0, 1, 2, ···}. The Iverson brackets [·] are defined by [φ] = 1 if φ is true and otherwise [φ] = 0. We denote updating an existing binding of x in a finite map f to v by f[x ↦ v]. We will also use the following standard notions from probability theory: σ-algebras, measurable spaces, measurable functions, random variables, probability measures, and expectations. Appendix A provides a review of those notions.
Fig. 5 presents the syntax of Appl, where the metavariables S, L, E, and D stand for statements, conditions, expressions, and distributions, respectively. Each distribution D is associated with a probability measure μ_D ∈ 𝔻(ℝ). We write 𝔻(X) for the collection of all probability measures on the measurable space X. For example, uniform(a, b) describes a uniform distribution on the interval [a, b], and its corresponding probability measure is the integration of its density function: μ_uniform(a,b)(O) := ∫_O [a ≤ x ≤ b]/(b − a) dx. The statement
âđ„ ⌠đ·â is a random-sampling assignment, which draws fromthe distribution `đ· to obtain a sample value and then assignsit to đ„ . The statement âif prob(đ) then đ1 else đ2 fiâ is aprobabilistic-branching statement, which executes đ1 withprobability đ , or đ2 with probability (1 â đ).The statement âcall đ â makes a (possibly recursive) call
to the function with identifier đ â FID. In this paper, we as-sume that the functions only manipulate states that consistof global program variables. The statement tick(đ), wheređ â â is a constant, is used to define the cost model. It adds đto an anonymous global cost accumulator. Note that our im-plementation supports local variables, function parameters,return statements, as well as accumulation of non-constantcosts; the restrictions imposed here are not essential, and areintroduced solely to simplify the presentation.We use a pair âšđ, đmainâ© to represent an Appl program,
where đ is a finite map from function identifiers to theirbodies and đmain is the body of the main function. We presentan operational semantics for Appl in §4.1.
3.2 Moment Semirings

As discussed in §2.1, we want to design a composition operation $\otimes$ and a combination operation $\oplus$ to compose and combine higher moments of accumulated costs such that

$$\phi(\sigma) = \mathbb{E}_{\sigma' \sim \llbracket S \rrbracket(\sigma)}\big[\langle C(\sigma, \sigma')^k \rangle_{0 \le k \le m} \otimes \phi(\sigma')\big],$$
$$\phi_1(\sigma) \oplus \phi_2(\sigma) = \mathbb{E}_{\sigma' \sim \llbracket S \rrbracket(\sigma)}\big[\langle C(\sigma, \sigma')^k \rangle_{0 \le k \le m} \otimes (\phi_1(\sigma') \oplus \phi_2(\sigma'))\big],$$

where the expected-potential functions $\phi, \phi_1, \phi_2$ map program states to interval-valued vectors, $C(\sigma, \sigma')$ is the cost for the computation from $\sigma$ to $\sigma'$, and $m$ is the degree of the target moment. In eqs. (4) and (5), we gave definitions of $\otimes$ and $\oplus$ suitable for first and second moments, respectively. In this section, we generalize them to reason about upper and lower bounds of higher moments. Our approach is inspired by the work of Li and Eisner [27], which develops a method to "lift" techniques for first moments to techniques for second moments. Instead of restricting the elements of semirings to be vectors of numbers, we propose algebraic moment semirings that can also be instantiated with vectors of intervals, which we need for the interval-bound analysis that was demonstrated in §2.1.
Definition 3.1. The $m$-th order moment semiring $\mathcal{M}^{(m)}_{\mathcal{R}} = (|\mathcal{R}|^{m+1}, \oplus, \otimes, \underline{0}, \underline{1})$ is parametrized by a partially ordered semiring $\mathcal{R} = (|\mathcal{R}|, \le, +, \cdot, 0, 1)$, where

$$\langle u_k \rangle_{0 \le k \le m} \oplus \langle v_k \rangle_{0 \le k \le m} \overset{\text{def}}{=} \langle u_k + v_k \rangle_{0 \le k \le m}, \tag{6}$$
$$\langle u_k \rangle_{0 \le k \le m} \otimes \langle v_k \rangle_{0 \le k \le m} \overset{\text{def}}{=} \Big\langle \textstyle\sum_{i=0}^{k} \binom{k}{i} \times (u_i \cdot v_{k-i}) \Big\rangle_{0 \le k \le m}, \tag{7}$$

$\binom{k}{i}$ is the binomial coefficient, the scalar product $n \times u$ is an abbreviation for $\sum_{i=1}^{n} u$ for $n \in \mathbb{Z}^+$ and $u \in \mathcal{R}$, $\underline{0} \overset{\text{def}}{=} \langle 0, 0, \cdots, 0 \rangle$, and $\underline{1} \overset{\text{def}}{=} \langle 1, 0, \cdots, 0 \rangle$. We define the partial order $\sqsubseteq$ as the pointwise extension of the partial order $\le$ on $\mathcal{R}$.
Intuitively, the definition of $\otimes$ in eq. (7) can be seen as the multiplication of two moment-generating functions for distributions with moments $\langle u_k \rangle_{0 \le k \le m}$ and $\langle v_k \rangle_{0 \le k \le m}$, respectively. We prove a composition property for moment semirings.
Lemma 3.2. For all $u, v \in \mathcal{R}$, it holds that

$$\langle (u + v)^k \rangle_{0 \le k \le m} = \langle u^k \rangle_{0 \le k \le m} \otimes \langle v^k \rangle_{0 \le k \le m},$$

where $u^k$ is an abbreviation for $\prod_{i=1}^{k} u$, for $k \in \mathbb{Z}^+$ and $u \in \mathcal{R}$.
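When $\mathcal{R}$ is instantiated with the reals, the operations above are directly executable. The following sketch is our own illustration (not code from the paper's artifact); it implements eqs. (6) and (7) and lets one check Lemma 3.2 numerically:

```python
from math import comb

def m_zero(m):
    """Additive identity <0, 0, ..., 0> of the m-th order moment semiring."""
    return [0.0] * (m + 1)

def m_one(m):
    """Multiplicative identity <1, 0, ..., 0>."""
    return [1.0] + [0.0] * m

def m_add(u, v):
    """(+) of eq. (6): componentwise sum."""
    return [uk + vk for uk, vk in zip(u, v)]

def m_mul(u, v):
    """(x) of eq. (7): binomial convolution of the two moment vectors."""
    m = len(u) - 1
    return [sum(comb(k, i) * u[i] * v[k - i] for i in range(k + 1))
            for k in range(m + 1)]

def powers(c, m):
    """<c^0, c^1, ..., c^m>: the moment vector of the constant c."""
    return [float(c) ** k for k in range(m + 1)]
```

Lemma 3.2 then reads `powers(u + v, m) == m_mul(powers(u, m), powers(v, m))`, which mirrors the binomial theorem.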
3.3 Inference Rules

We present the derivation system as a declarative program logic that uses moment semirings to enable compositional reasoning and moment-polymorphic recursion.
Interval-valued moment semirings. Our derivation system infers upper and lower bounds simultaneously, rather than separately, which is essential for non-monotone costs. Consider a program '$\mathsf{tick}(-1); S$' and suppose that we have $\langle 1, 2, 5 \rangle$ and $\langle 1, -2, 5 \rangle$ as the upper and lower bound on the first two moments of the cost for $S$, respectively. If we only use the upper bound, we derive $\langle 1, -1, 1 \rangle \otimes \langle 1, 2, 5 \rangle = \langle 1, 1, 2 \rangle$, which is not an upper bound on the moments of the cost for the program; if the actual moments of the cost for $S$ are $\langle 1, 0, 5 \rangle$, then the actual moments of the cost for '$\mathsf{tick}(-1); S$' are $\langle 1, -1, 1 \rangle \otimes \langle 1, 0, 5 \rangle = \langle 1, -1, 6 \rangle \not\sqsubseteq \langle 1, 1, 2 \rangle$. Thus, in the analysis, we instantiate moment semirings with the interval domain $\mathcal{I}$. For the program '$\mathsf{tick}(-1); S$', its interval-valued bound on the first two moments is $\langle [1,1], [-1,-1], [1,1] \rangle \otimes \langle [1,1], [-2,2], [5,5] \rangle = \langle [1,1], [-3,1], [2,10] \rangle$.

Template-based expected-potential functions. The basic approach to automated inference using potential functions is to introduce a template for the expected-potential functions. Let us fix $m \in \mathbb{N}$ as the degree of the target moment. Because we use $\mathcal{M}^{(m)}_{\mathcal{I}}$-valued expected-potential functions whose range is vectors of intervals, the templates are vectors of intervals whose ends are represented symbolically. In this paper, we represent the ends of intervals by polynomials in $\mathbb{R}[\mathrm{VID}]$ over program variables.

More formally, we lift the interval semiring $\mathcal{I}$ to a symbolic interval semiring $\mathcal{P}_{\mathcal{I}}$ by representing the ends of the $k$-th interval by polynomials in $\mathbb{R}_{kd}[\mathrm{VID}]$, i.e., polynomials up to degree $kd$, for some fixed $d \in \mathbb{N}$. Let $\mathcal{M}^{(m)}_{\mathcal{P}_{\mathcal{I}}}$ be the $m$-th order moment semiring instantiated with the symbolic interval semiring. A potential annotation is then represented as $Q = \langle [L_k, U_k] \rangle_{0 \le k \le m} \in \mathcal{M}^{(m)}_{\mathcal{P}_{\mathcal{I}}}$, where the $L_k$'s and $U_k$'s are polynomials in $\mathbb{R}_{kd}[\mathrm{VID}]$. $Q$ defines an $\mathcal{M}^{(m)}_{\mathcal{I}}$-valued expected-potential function $\phi_Q(\sigma) \overset{\text{def}}{=} \langle [\sigma(L_k), \sigma(U_k)] \rangle_{0 \le k \le m}$, where $\sigma$ is a program state, and $\sigma(L_k)$ and $\sigma(U_k)$ are $L_k$ and $U_k$ evaluated over $\sigma$, respectively.
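The interval instantiation can be made concrete in the same style. The sketch below (again our own illustration) implements the $\otimes$ of $\mathcal{M}^{(m)}_{\mathcal{I}}$ and reproduces the '$\mathsf{tick}(-1); S$' computation from the discussion above:

```python
from math import comb

def i_add(a, b):
    """Interval addition: [a0, a1] + [b0, b1]."""
    return (a[0] + b[0], a[1] + b[1])

def i_mul(a, b):
    """Interval multiplication: min/max over the endpoint products."""
    ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(ps), max(ps))

def i_scale(n, a):
    """Scalar product n x [a0, a1] for a nonnegative integer n."""
    return (n * a[0], n * a[1])

def mi_mul(u, v):
    """(x) of the moment semiring instantiated with intervals."""
    m = len(u) - 1
    out = []
    for k in range(m + 1):
        acc = (0.0, 0.0)
        for i in range(k + 1):
            acc = i_add(acc, i_scale(comb(k, i), i_mul(u[i], v[k - i])))
        out.append(acc)
    return out
```

For the first two moments of '$\mathsf{tick}(-1); S$', `mi_mul([(1,1),(-1,-1),(1,1)], [(1,1),(-2,2),(5,5)])` yields $\langle [1,1], [-3,1], [2,10] \rangle$, matching the text.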
Inference rules. We formalize our derivation system for moment analysis in a Hoare-logic style. The judgment has
$$\frac{Q = \langle [c^k, c^k] \rangle_{0 \le k \le m} \otimes Q'}{\Delta \vdash_\ell \{\Gamma; Q\}\ \mathsf{tick}(c)\ \{\Gamma; Q'\}}\ \text{(Q-Tick)}
\qquad
\frac{\Gamma = \forall x \in \mathrm{supp}(\mu_D) : \Gamma' \qquad Q = \mathbb{E}_{x \sim \mu_D}[Q']}{\Delta \vdash_\ell \{\Gamma; Q\}\ x \sim D\ \{\Gamma'; Q'\}}\ \text{(Q-Sample)}$$

$$\frac{\Delta \vdash_\ell \{\Gamma \wedge L; Q\}\ S\ \{\Gamma; Q\}}{\Delta \vdash_\ell \{\Gamma; Q\}\ \mathsf{while}\ L\ \mathsf{do}\ S\ \mathsf{od}\ \{\Gamma \wedge \neg L; Q\}}\ \text{(Q-Loop)}
\qquad
\frac{(\Gamma; Q, \Gamma'; Q') \in \Delta_m(f)}{\Delta \vdash_m \{\Gamma; Q\}\ \mathsf{call}\ f\ \{\Gamma'; Q'\}}\ \text{(Q-Call-Mono)}$$

$$\frac{\Delta \vdash_\ell \{\Gamma; Q\}\ S_1\ \{\Gamma'; Q'\} \qquad \Delta \vdash_\ell \{\Gamma'; Q'\}\ S_2\ \{\Gamma''; Q''\}}{\Delta \vdash_\ell \{\Gamma; Q\}\ S_1; S_2\ \{\Gamma''; Q''\}}\ \text{(Q-Seq)}$$

$$\frac{\ell < m \qquad \Delta_\ell(f) = (\Gamma; Q_1, \Gamma'; Q'_1) \qquad \Delta \vdash_{\ell+1} \{\Gamma; Q_2\}\ \mathcal{D}(f)\ \{\Gamma'; Q'_2\}}{\Delta \vdash_\ell \{\Gamma; Q_1 \oplus Q_2\}\ \mathsf{call}\ f\ \{\Gamma'; Q'_1 \oplus Q'_2\}}\ \text{(Q-Call-Poly)}$$

$$\frac{\begin{array}{c}\Delta \vdash_\ell \{\Gamma; Q_1\}\ S_1\ \{\Gamma'; Q'\} \qquad \Delta \vdash_\ell \{\Gamma; Q_2\}\ S_2\ \{\Gamma'; Q'\} \qquad Q = P \oplus R \\ P = \langle [p, p], [0,0], \cdots, [0,0] \rangle \otimes Q_1 \qquad R = \langle [1-p, 1-p], [0,0], \cdots, [0,0] \rangle \otimes Q_2\end{array}}{\Delta \vdash_\ell \{\Gamma; Q\}\ \mathsf{if}\ \mathsf{prob}(p)\ \mathsf{then}\ S_1\ \mathsf{else}\ S_2\ \mathsf{fi}\ \{\Gamma'; Q'\}}\ \text{(Q-Prob)}$$

$$\frac{\Delta \vdash_\ell \{\Gamma_0; Q_0\}\ S\ \{\Gamma'_0; Q'_0\} \qquad \Gamma \models \Gamma_0 \qquad \Gamma'_0 \models \Gamma' \qquad \Gamma \models Q \sqsupseteq Q_0 \qquad \Gamma'_0 \models Q'_0 \sqsupseteq Q'}{\Delta \vdash_\ell \{\Gamma; Q\}\ S\ \{\Gamma'; Q'\}}\ \text{(Q-Weaken)}$$

Figure 6. Selected inference rules of the derivation system.
the form $\Delta \vdash_\ell \{\Gamma; Q\}\ S\ \{\Gamma'; Q'\}$, where $S$ is a statement, $\{\Gamma; Q\}$ is a precondition, $\{\Gamma'; Q'\}$ is a postcondition, $\Delta = \langle \Delta_k \rangle_{0 \le k \le m}$ is a context of function specifications, and $\ell \in \mathbb{Z}^+$ specifies some restrictions put on $Q, Q'$ that we will explain later. The logical context $\Gamma : (\mathrm{VID} \to \mathbb{R}) \to \{\top, \bot\}$ is a predicate that describes reachable states at a program point. The potential annotation $Q \in \mathcal{M}^{(m)}_{\mathcal{P}_{\mathcal{I}}}$ specifies a map from program states to the moment semiring that is used to define interval-valued expected-potential functions. The semantics of the triple $\{\cdot; Q\}\ S\ \{\cdot; Q'\}$ is that if the rest of the computation after executing $S$ has its moments of the accumulated cost bounded by $\phi_{Q'}$, then the whole computation has its moments of the accumulated cost bounded by $\phi_Q$. The parameter $\ell$ restricts all $k$-th-moment components in $Q, Q'$, such that $k < \ell$, to be $[0, 0]$. We call such potential annotations $\ell$-restricted; this construction is motivated by an observation from Ex. 2.6, where we illustrated the benefits of carrying out interprocedural analysis using an "elimination sequence" of annotations for recursive function calls, where the successive annotations have a greater number of zeros, filling from the left. Function specifications are valid pairs of pre- and post-conditions for all declared functions in a program. For each $k$, such that $0 \le k \le m$, and each function $f$, a valid specification $(\Gamma; Q, \Gamma'; Q') \in \Delta_k(f)$ is justified by the judgment $\Delta \vdash_k \{\Gamma; Q\}\ \mathcal{D}(f)\ \{\Gamma'; Q'\}$, where $\mathcal{D}(f)$ is the function body of $f$, and $Q, Q'$ are $k$-restricted. To perform context-sensitive analysis, a function can have multiple specifications.

Fig. 6 presents some of the inference rules. The rule (Q-Tick) is the only rule that deals with costs in a program. To accumulate the moments of the cost, we use the $\otimes$ operation in the moment semiring $\mathcal{M}^{(m)}_{\mathcal{P}_{\mathcal{I}}}$. The rule (Q-Sample) accounts for sampling statements. Because '$x \sim D$' randomly assigns a value to $x$ in the support of distribution $D$, we quantify $x$ out universally from the logical context. To compute $Q = \mathbb{E}_{x \sim \mu_D}[Q']$, where $x$ is drawn from distribution $D$, we assume that the moments for $D$ are well-defined and computable, and substitute $x^k$, $k \in \mathbb{N}$, with the corresponding moments in $Q'$. We can make this substitution because every component of $Q'$ is a polynomial over program variables. For example, if $D = \mathrm{uniform}(-1, 2)$, we know the following facts:

$$\mathbb{E}_{x \sim \mu_D}[x^0] = 1, \quad \mathbb{E}_{x \sim \mu_D}[x^1] = 1/2, \quad \mathbb{E}_{x \sim \mu_D}[x^2] = 1, \quad \mathbb{E}_{x \sim \mu_D}[x^3] = 5/4.$$

Then for $Q' = \langle [1, 1], [1 + x^2,\ x y^2 + x^3 y] \rangle$, by the linearity of expectations, we compute $Q = \mathbb{E}_{x \sim \mu_D}[Q']$ as follows:

$$\mathbb{E}_{x \sim \mu_D}[Q'] = \langle [1, 1], [\mathbb{E}_{x \sim \mu_D}[1 + x^2],\ \mathbb{E}_{x \sim \mu_D}[x y^2 + x^3 y]] \rangle = \langle [1, 1], [1 + \mathbb{E}_{x \sim \mu_D}[x^2],\ y^2\,\mathbb{E}_{x \sim \mu_D}[x] + y\,\mathbb{E}_{x \sim \mu_D}[x^3]] \rangle = \langle [1, 1], [2,\ 1/2 \cdot y^2 + 5/4 \cdot y] \rangle.$$
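The moment-substitution step of (Q-Sample) is easy to mechanize for univariate polynomials. In the sketch below (our illustration; the degree-to-coefficient map representation is hypothetical), `expect_in_x` replaces each $x^k$ by the $k$-th raw moment of $\mathrm{uniform}(a, b)$, exactly as in the example:

```python
from fractions import Fraction

def uniform_raw_moment(a, b, k):
    """E[x^k] for x ~ uniform(a, b): (b^(k+1) - a^(k+1)) / ((k+1)(b-a))."""
    return Fraction(b ** (k + 1) - a ** (k + 1), (k + 1) * (b - a))

def expect_in_x(terms, a, b):
    """E_{x ~ uniform(a,b)}[sum_k c_k x^k], with terms given as {k: c_k}.
    By linearity of expectation, substitute each x^k by its raw moment."""
    return sum(c * uniform_raw_moment(a, b, k) for k, c in terms.items())
```

For the upper end $x y^2 + x^3 y$ at the fixed valuation $y = 3$, the coefficients of $x$ are $\{1: 9, 3: 3\}$, and the result $33/4$ agrees with evaluating $1/2 \cdot y^2 + 5/4 \cdot y$ at $y = 3$.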
The other probabilistic rule (Q-Prob) deals with probabilistic branching. Intuitively, if the moments of the execution of $S_1$ and $S_2$ are $Q_1$ and $Q_2$, respectively, and those of the accumulated cost of the computation after the branch statement are bounded by $\phi_{Q'}$, then the moments for the whole computation should be bounded by a "weighted average" of $(Q_1 \otimes \phi_{Q'})$ and $(Q_2 \otimes \phi_{Q'})$, with respect to the branching probability $p$. We implement the weighted average by applying the combination operator $\oplus$ to $\langle [p, p], [0,0], \cdots, [0,0] \rangle \otimes Q_1$ and $\langle [1-p, 1-p], [0,0], \cdots, [0,0] \rangle \otimes Q_2$, because the 0-th moments denote probabilities.
The rules (Q-Call-Poly) and (Q-Call-Mono) handle function calls. Recall that in Ex. 2.6, we use the $\oplus$ operator to combine multiple potential functions for a function to reason about recursive function calls. The restriction parameter $\ell$ is used to ensure that the derivation system only needs to reason about finitely many post-annotations for each call site. In the rule (Q-Call-Poly), where $\ell$ is smaller than the target moment $m$, we fetch the pre- and post-condition $Q_1, Q'_1$ for the function $f$ from the specification context $\Delta_\ell$. We then combine it with a frame of $(\ell+1)$-restricted potential annotations $Q_2, Q'_2$ for the function $f$. The frame is used to account for the interval bounds on the moments for the computation after the function call for most non-tail-recursive programs. When $\ell$ reaches the target moment $m$, we use the rule (Q-Call-Mono) to reason moment-monomorphically, because setting $\ell$ to $m + 1$ implies that the frame can only be $\langle [0,0], [0,0], \cdots, [0,0] \rangle$.
The structural rule (Q-Weaken) is used to strengthen the precondition and relax the postcondition. The entailment relation $\Gamma \models \Gamma'$ states that the logical implication $\Gamma \implies \Gamma'$ is valid. In terms of the bounds on higher moments for cost accumulators, if the triple $\{\cdot; Q\}\ S\ \{\cdot; Q'\}$ is valid, then we can safely widen the intervals in the precondition $Q$ and narrow the intervals in the postcondition $Q'$.
func rdwalk() begin
  { x < d + 2;  ⟨[1, 1], [2(d−x), 2(d−x) + 4], [4(d−x)² + 6(d−x) − 4, 4(d−x)² + 22(d−x) + 28]⟩ }
  if x < d then
    { x < d;  ⟨[1, 1], [2(d−x), 2(d−x) + 4], [4(d−x)² + 6(d−x) − 4, 4(d−x)² + 22(d−x) + 28]⟩ }
    t ∼ uniform(−1, 2);
    { x < d ∧ t ≤ 2;  ⟨[1, 1], [2(d−x−t) + 1, 2(d−x−t) + 5], [4(d−x−t)² + 10(d−x−t) − 3, 4(d−x−t)² + 26(d−x−t) + 37]⟩ }
    x := x + t;
    { x < d + 2;  ⟨[1, 1], [2(d−x) + 1, 2(d−x) + 5], [4(d−x)² + 10(d−x) − 3, 4(d−x)² + 26(d−x) + 37]⟩ }
    call rdwalk;
    { ⊤;  ⟨[1, 1], [1, 1], [1, 1]⟩ }
    tick(1)
    { ⊤;  ⟨[1, 1], [0, 0], [0, 0]⟩ }
  fi
end

Figure 7. The rdwalk function with annotations for the interval bounds on the first and second moments.
Example 3.3. Fig. 7 presents the logical context and the complete potential annotation for the first and second moments of the cost accumulator tick for the rdwalk function from Ex. 2.1. Similarly to the reasoning in Ex. 2.6, we can justify the derivation using moment-polymorphic recursion and the moment bounds for rdwalk with post-annotations $\langle [0,0], [1,1], [1,1] \rangle$ and $\langle [0,0], [0,0], [2,2] \rangle$.
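As a sanity check on Fig. 7, the sampling step can be replayed: by rule (Q-Sample), taking the expectation over $t \sim \mathrm{uniform}(-1, 2)$ of the post-sampling annotation must reproduce the pre-sampling annotation. The sketch below is our own check (`z` abbreviates $d - x$) for the first- and second-moment bounds:

```python
from fractions import Fraction

# Raw moments E[t^k] of t ~ uniform(-1, 2) for k = 0..3.
MOMENTS = [Fraction(1), Fraction(1, 2), Fraction(1), Fraction(5, 4)]

def expect_t(coeffs):
    """E_t[sum_k coeffs[k] * t^k], by linearity of expectation."""
    return sum(c * MOMENTS[k] for k, c in enumerate(coeffs))

def check_sampling_step(z):
    """Replay (Q-Sample) on the rdwalk annotations, with z = d - x."""
    z = Fraction(z)
    # First-moment bounds after sampling, expanded in powers of t:
    # 2(z - t) + 1 = (2z + 1) - 2t   and   2(z - t) + 5 = (2z + 5) - 2t.
    assert expect_t([2 * z + 1, -2]) == 2 * z          # pre-sampling lower
    assert expect_t([2 * z + 5, -2]) == 2 * z + 4      # pre-sampling upper
    # Second-moment bounds after sampling:
    # 4(z-t)^2 + 10(z-t) - 3   and   4(z-t)^2 + 26(z-t) + 37.
    assert expect_t([4 * z * z + 10 * z - 3, -8 * z - 10, 4]) \
        == 4 * z * z + 6 * z - 4                       # pre-sampling lower
    assert expect_t([4 * z * z + 26 * z + 37, -8 * z - 26, 4]) \
        == 4 * z * z + 22 * z + 28                     # pre-sampling upper
```

The identities hold for every value of $z$, which is what the symbolic derivation guarantees.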
3.4 Automatic Linear-Constraint Generation

We adapt existing techniques [9, 31] to automate our inference system by (i) using an abstract interpreter to infer logical contexts, (ii) generating templates and linear constraints by inductively applying the derivation rules to the analyzed program, and (iii) employing an off-the-shelf LP solver to discharge the linear constraints. During the generation phase, the coefficients of monomials in the polynomials from the ends of the intervals in every quantitative context $Q \in \mathcal{M}^{(m)}_{\mathcal{P}_{\mathcal{I}}}$ are recorded as symbolic names, and the inequalities among those coefficients, derived from the inference rules in Fig. 6, are emitted to the LP solver.
Generating linear constraints. Fig. 8 demonstrates the generation process for some of the bounds in Fig. 3. Let $B_k$ be a vector of monomials over program variables VID of degree up to $k$. Then a polynomial $\sum_{I \in B_k} q_I \cdot I$, where $q_I \in \mathbb{R}$ for all $I \in B_k$, can be represented as the vector of its coefficients $(q_I)_{I \in B_k}$. We denote coefficient vectors by uppercase letters, and we use lowercase letters as names of the coefficients. We also assume that the degree of the polynomials for the $k$-th moments is up to $k$.
For (Q-Tick), we generate constraints that correspond to the composition operation $\otimes$ of the moment semiring. For example, the second-moment component should satisfy

$$\sum_{I \in B_2} t^{tk}_I \cdot I = \text{the second-moment component of}\ \Big( (1, 1, 1) \otimes \Big( \sum_{I \in B_0} u^{tk}_I \cdot I,\ \sum_{I \in B_1} v^{tk}_I \cdot I,\ \sum_{I \in B_2} w^{tk}_I \cdot I \Big) \Big) = \sum_{I \in B_2} w^{tk}_I \cdot I + 2 \cdot \sum_{I \in B_1} v^{tk}_I \cdot I + \sum_{I \in B_0} u^{tk}_I \cdot I.$$

Then we extract $t^{tk}_1 = w^{tk}_1 + 2 v^{tk}_1 + u^{tk}_1$ for $I = 1$, $t^{tk}_x = w^{tk}_x + 2 v^{tk}_x$ for $I = x$, etc. For (Q-Sample), we generate constraints that perform "partial evaluation" on the polynomials by substituting $t$ with the moments of $\mathrm{uniform}(-1, 2)$. As we discussed in §3.3, letting $D$ denote $\mathrm{uniform}(-1, 2)$, we have $\mathbb{E}_{t \sim D}[w^{sp}_t \cdot t] = w^{sp}_t \cdot \mathbb{E}_{t \sim D}[t] = 1/2 \cdot w^{sp}_t$ and $\mathbb{E}_{t \sim D}[w^{sp}_{t^2} \cdot t^2] = w^{sp}_{t^2} \cdot \mathbb{E}_{t \sim D}[t^2] = 1 \cdot w^{sp}_{t^2}$. We therefore generate the constraint $t^{sp}_1 = w^{sp}_1 + 1/2 \cdot w^{sp}_t + 1 \cdot w^{sp}_{t^2}$ for $t^{sp}_1$.

The loop rule (Q-Loop) involves constructing loop invariants $Q$, which is in general a non-trivial problem for automated static analysis. Instead of computing the loop invariant $Q$ explicitly, our system represents $Q$ directly as a template with unknown coefficients, uses $Q$ as the post-annotation to analyze the loop body and obtain a pre-annotation, and finally generates linear constraints stating that the pre-annotation equals $Q$.

The structural rule (Q-Weaken) can be applied at any point during the derivation. In our implementation, we apply it where the control flow has a branch, because different branches might have different costs. To handle the judgment $\Gamma \models Q \sqsupseteq Q'$, i.e., to generate constraints ensuring that one interval is always contained in another, where the ends of the intervals are polynomials, we adapt the idea of rewrite functions [9, 31]. Intuitively, to ensure that $[L_1, U_1] \supseteq [L_2, U_2]$, i.e., $L_1 \le L_2$ and $U_2 \le U_1$, under the logical context $\Gamma$, we generate constraints indicating that there exist two polynomials $T_1, T_2$ that are always nonnegative under $\Gamma$, such that $L_2 = L_1 + T_1$ and $U_1 = U_2 + T_2$. Here, $T_1$ and $T_2$ are like slack variables, except that because all quantities are polynomials, they are too (i.e., slack polynomials). In our implementation, $\Gamma$ is a set of linear constraints over program variables of the form $\mathcal{E} \ge 0$, so we can represent $T_1, T_2$ by conical combinations (i.e., linear combinations with nonnegative scalars) of the expressions $\mathcal{E}$ in $\Gamma$.
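For intuition, the coefficientwise extraction for (Q-Tick) with $c = 1$ can be sketched as follows. In the real system the template coefficients are LP unknowns; here, as our own illustration, we plug in numeric stand-ins and compute the composed second-moment coefficients $t_I = w_I + 2 v_I + u_I$ directly:

```python
from collections import defaultdict

def tick_second_moment(U, V, W, c=1):
    """Second-moment component of <c^0, c^1, c^2> (x) (U, V, W), taken
    coefficientwise; each template maps monomial names to coefficients.
    By eq. (7) this component is W + 2c*V + c^2*U."""
    T = defaultdict(int)
    for mono, w in W.items():
        T[mono] += w
    for mono, v in V.items():
        T[mono] += 2 * c * v
    for mono, u in U.items():
        T[mono] += c * c * u
    return dict(T)
```

Reading off the result coefficient by coefficient yields exactly the constraints $t^{tk}_1 = w^{tk}_1 + 2 v^{tk}_1 + u^{tk}_1$, $t^{tk}_x = w^{tk}_x + 2 v^{tk}_x$, etc.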
Solving linear constraints. The LP solver not only finds assignments to the coefficients that satisfy the constraints; it can also optimize a linear objective function. In our central-moment analysis, we construct an objective function that tries to minimize imprecision. For example, let us consider upper bounds on the variance. We randomly pick a concrete valuation of program variables that satisfies the precondition (e.g., $d > 0$ in Fig. 2), and then substitute program variables with the concrete valuation in the polynomial for the upper bound on the variance (obtained from bounds on
Constraints for (Q-Tick), from the judgment $\Delta \vdash \{\top; (P^{tk}, Q^{tk}, T^{tk})\}\ \mathsf{tick}(1)\ \{\top; (U^{tk}, V^{tk}, W^{tk})\}$:

$$p^{tk}_1 = u^{tk}_1, \quad q^{tk}_1 = u^{tk}_1 + v^{tk}_1, \quad q^{tk}_x = v^{tk}_x, \quad q^{tk}_d = v^{tk}_d, \quad t^{tk}_1 = w^{tk}_1 + 2 v^{tk}_1 + u^{tk}_1, \quad t^{tk}_x = w^{tk}_x + 2 v^{tk}_x, \quad t^{tk}_{x^2} = w^{tk}_{x^2}, \quad \cdots$$

Constraints for (Q-Sample), from the judgment $\Delta \vdash \{x < d; (P^{sp}, Q^{sp}, T^{sp})\}\ t \sim \mathrm{uniform}(-1, 2)\ \{x < d \wedge t \le 2; (U^{sp}, V^{sp}, W^{sp})\}$:

$$p^{sp}_1 = u^{sp}_1, \quad q^{sp}_1 = v^{sp}_1 + 1/2 \cdot v^{sp}_t, \quad q^{sp}_x = v^{sp}_x, \quad q^{sp}_d = v^{sp}_d, \quad q^{sp}_t = 0, \quad t^{sp}_1 = w^{sp}_1 + 1/2 \cdot w^{sp}_t + 1 \cdot w^{sp}_{t^2}, \quad t^{sp}_x = w^{sp}_x + 1/2 \cdot w^{sp}_{tx}, \quad t^{sp}_{x^2} = w^{sp}_{x^2}, \quad \cdots$$

Figure 8. Generating linear constraints, guided by the inference rules.
the raw moments). The resulting linear combination of coef-ficients, which we set as the objective function, stands for thevariance under the concrete valuation. Thus, minimizing theobjective function produces the most precise upper boundon the variance under the specific concrete valuation. Also,we can extract a symbolic upper bound on the variance usingthe assignments to the coefficients. Because the derivation ofthe bounds only uses the given pre-condition, the symbolicbounds apply to all valuations that satisfy the pre-condition.
4 Soundness of Higher-Moment Analysis

In this section, we study the soundness of our derivation system for higher-moment analysis. We first present a Markov-chain semantics for the probabilistic programming language Appl to reason about how stepwise costs contribute to the global accumulated cost (§4.1). We then formulate higher-moment analysis with respect to the semantics and prove the soundness of our derivation system for higher-moment analysis based on a recent extension to the Optional Stopping Theorem (§4.2). Finally, we sketch the algorithmic approach for ensuring the soundness of our analysis (§4.3).
4.1 A Markov-Chain Semantics

Operational semantics. We start with a small-step operational semantics with continuations, which we will use later to construct the Markov-chain semantics. We follow a distribution-based approach [7, 25] to define an operational cost semantics for Appl. Full details of the semantics are included in Appendix B. A program configuration $\sigma \in \Sigma$ is a quadruple $\langle \gamma, S, K, \alpha \rangle$ where $\gamma : \mathrm{VID} \to \mathbb{R}$ is a program state that maps variables to values, $S$ is the statement being executed, $K$ is a continuation that describes what remains to be done after the execution of $S$, and $\alpha \in \mathbb{R}$ is the global cost accumulator. An execution of an Appl program $\langle \mathcal{D}, S_{\mathrm{main}} \rangle$ is initialized with $\langle \lambda\_.0, S_{\mathrm{main}}, \mathsf{Kstop}, 0 \rangle$, and the termination configurations have the form $\langle \_, \mathsf{skip}, \mathsf{Kstop}, \_ \rangle$, where $\mathsf{Kstop}$ is the empty continuation.

Different from a standard semantics, where each program configuration steps to at most one new configuration, a probabilistic semantics may pick several different new configurations. The evaluation relation for Appl has the form $\sigma \mapsto \mu$, where $\mu \in \mathbb{D}(\Sigma)$ is a probability measure over configurations. Below are two example rules. The rule (E-Prob) constructs a distribution whose support has exactly two elements, which stand for the two branches of the probabilistic choice. We write $\delta(\sigma)$ for the Dirac measure at $\sigma$, defined as $\lambda A. [\sigma \in A]$, where $A$ is a measurable subset of $\Sigma$. We also write $p \cdot \mu_1 + (1-p) \cdot \mu_2$ for a convex combination of measures $\mu_1$ and $\mu_2$, where $p \in [0, 1]$, defined as $\lambda A.\ p \cdot \mu_1(A) + (1-p) \cdot \mu_2(A)$. The rule (E-Sample) "pushes" the probability distribution of $D$ to a distribution over post-sampling program configurations.

$$\frac{S = \mathsf{if}\ \mathsf{prob}(p)\ \mathsf{then}\ S_1\ \mathsf{else}\ S_2\ \mathsf{fi}}{\langle \gamma, S, K, \alpha \rangle \mapsto p \cdot \delta(\langle \gamma, S_1, K, \alpha \rangle) + (1-p) \cdot \delta(\langle \gamma, S_2, K, \alpha \rangle)}\ \text{(E-Prob)}$$

$$\langle \gamma, x \sim D, K, \alpha \rangle \mapsto \lambda A.\ \mu_D(\{ r \mid \langle \gamma[x \mapsto r], \mathsf{skip}, K, \alpha \rangle \in A \})\ \text{(E-Sample)}$$
Example 4.1. Suppose that a random-sampling statement is being executed, i.e., the current configuration is $\langle \{t \mapsto t_0\}, (t \sim \mathrm{uniform}(-1, 2)), K_0, \alpha_0 \rangle$. The probability measure for the uniform distribution is $\lambda O. \int_O \frac{[-1 \le x \le 2]}{3}\,dx$. Thus, by the rule (E-Sample), we derive the post-sampling probability measure over configurations:

$$\lambda A. \int_{\mathbb{R}} [\langle \{t \mapsto r\}, \mathsf{skip}, K_0, \alpha_0 \rangle \in A] \cdot \frac{[-1 \le r \le 2]}{3}\,dr.$$
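The Markov-chain semantics can also be explored by sampling traces. The following Monte-Carlo sketch (our illustration, not part of the analysis) simulates the rdwalk example from Fig. 7 and compares the empirical first moment and variance of tick against the derived bounds $2d \le \mathbb{E}[\mathit{tick}] \le 2d + 4$ and $\mathbb{V}[\mathit{tick}] \le 22d + 28$:

```python
import random

def rdwalk_trace(d, rng):
    """One terminating trace of rdwalk: step x by t ~ uniform(-1, 2)
    until x >= d, accumulating cost 1 per recursive call (tick(1))."""
    x, ticks = 0.0, 0
    while x < d:
        x += rng.uniform(-1.0, 2.0)
        ticks += 1
    return ticks

def estimate_moments(d, n, seed=42):
    """Monte-Carlo estimates of E[tick] and V[tick] over n traces."""
    rng = random.Random(seed)
    runs = [rdwalk_trace(d, rng) for _ in range(n)]
    mean = sum(runs) / n
    var = sum((r - mean) ** 2 for r in runs) / n
    return mean, var
```

With d = 10, the empirical mean lands in the derived window [20, 24], and the empirical variance stays well below the bound 22·10 + 28 = 248.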
A Markov-chain semantics. In this work, we harness Markov-chain-based reasoning [24, 33] to develop a Markov-chain cost semantics for Appl, based on the evaluation relation $\sigma \mapsto \mu$. An advantage of this approach is that it allows us to study how the cost of every single evaluation step contributes to the accumulated cost at the exit of the program. Details of this semantics are included in Appendix C.

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be the probability space where $\Omega \overset{\text{def}}{=} \Sigma^{\mathbb{Z}^+}$ is the set of all infinite traces over program configurations, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and $\mathbb{P}$ is a probability measure on $(\Omega, \mathcal{F})$ obtained from the evaluation relation $\sigma \mapsto \mu$ and the initial configuration $\langle \lambda\_.0, S_{\mathrm{main}}, \mathsf{Kstop}, 0 \rangle$. Intuitively, $\mathbb{P}$ specifies the probability distribution over all possible executions of a probabilistic program. The probability of an assertion $\theta$ with respect to $\mathbb{P}$, written $\mathbb{P}[\theta]$, is defined as $\mathbb{P}(\{\omega \mid \theta(\omega)\ \text{is true}\})$.

To formulate the accumulated cost at the exit of the program, we define the stopping time $T : \Omega \to \mathbb{Z}^+ \cup \{\infty\}$ of a probabilistic program as a random variable on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ of program traces:

$$T(\omega) \overset{\text{def}}{=} \inf\{ n \in \mathbb{Z}^+ \mid \omega_n = \langle \_, \mathsf{skip}, \mathsf{Kstop}, \_ \rangle \},$$
i.e., $T(\omega)$ is the number of evaluation steps before the trace $\omega$ reaches some termination configuration $\langle \_, \mathsf{skip}, \mathsf{Kstop}, \_ \rangle$. We define the accumulated cost $A_T : \Omega \to \mathbb{R}$ with respect to the stopping time $T$ as

$$A_T(\omega) \overset{\text{def}}{=} A_{T(\omega)}(\omega),$$

where $A_n : \Omega \to \mathbb{R}$ captures the accumulated cost at the $n$-th evaluation step for $n \in \mathbb{Z}^+$, which is defined as

$$A_n(\omega) \overset{\text{def}}{=} \alpha_n \quad \text{where}\ \omega_n = \langle \_, \_, \_, \alpha_n \rangle.$$

The $m$-th moment of the accumulated cost is given by the expectation $\mathbb{E}[A_T^m]$ with respect to $\mathbb{P}$.
4.2 Soundness of the Derivation System

Proofs for this section are included in Appendices D to F.

The expected-potential method for moment analysis. We fix a degree $m \in \mathbb{N}$ and let $\mathcal{M}^{(m)}_{\mathcal{I}}$ be the $m$-th order moment semiring instantiated with the interval semiring $\mathcal{I}$. We now define $\mathcal{M}^{(m)}_{\mathcal{I}}$-valued expected-potential functions.

Definition 4.2. A measurable map $\phi : \Sigma \to \mathcal{M}^{(m)}_{\mathcal{I}}$ is said to be an expected-potential function if
(i) $\phi(\sigma) = \underline{1}$ if $\sigma = \langle \_, \mathsf{skip}, \mathsf{Kstop}, \_ \rangle$, and
(ii) $\phi(\sigma) \sqsupseteq \mathbb{E}_{\sigma' \sim \mapsto(\sigma)}\big[\langle [(\alpha' - \alpha)^k, (\alpha' - \alpha)^k] \rangle_{0 \le k \le m} \otimes \phi(\sigma')\big]$, where $\sigma = \langle \_, \_, \_, \alpha \rangle$ and $\sigma' = \langle \_, \_, \_, \alpha' \rangle$, for all $\sigma \in \Sigma$.
Intuitively, $\phi(\sigma)$ is an interval bound on the moments for the accumulated cost of the computation that continues from the configuration $\sigma$. We define $\Phi_n$ and $Y_n$, where $n \in \mathbb{Z}^+$, to be $\mathcal{M}^{(m)}_{\mathcal{I}}$-valued random variables on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ of the Markov-chain semantics as

$$\Phi_n(\omega) \overset{\text{def}}{=} \phi(\omega_n), \qquad Y_n(\omega) \overset{\text{def}}{=} \langle [A_n(\omega)^k, A_n(\omega)^k] \rangle_{0 \le k \le m} \otimes \Phi_n(\omega).$$

In the definition of $Y_n$, we use $\otimes$ to compose the powers of the accumulated cost at step $n$ and the expected-potential function that stands for the moments of the accumulated cost for the rest of the computation.
Lemma 4.3. By the properties of potential functions, we can prove that $\mathbb{E}[Y_{n+1} \mid Y_n] \sqsubseteq Y_n$ almost surely, for all $n \in \mathbb{Z}^+$.
We call $\{Y_n\}_{n \in \mathbb{Z}^+}$ a moment invariant. Our goal is to establish that $\mathbb{E}[Y_T] \sqsubseteq \mathbb{E}[Y_0]$, i.e., the initial interval-valued potential $\mathbb{E}[Y_0] = \mathbb{E}[\underline{1} \otimes \Phi_0] = \mathbb{E}[\Phi_0]$ brackets the higher moments of the accumulated cost:

$$\mathbb{E}[Y_T] = \mathbb{E}\big[\langle [A_T^k, A_T^k] \rangle_{0 \le k \le m} \otimes \underline{1}\big] = \langle [\mathbb{E}[A_T^k], \mathbb{E}[A_T^k]] \rangle_{0 \le k \le m}.$$
Soundness. The soundness of the derivation system is proved with respect to the Markov-chain semantics. Let $\| \langle [a_k, b_k] \rangle_{0 \le k \le m} \|_\infty \overset{\text{def}}{=} \max_{0 \le k \le m} \{ \max\{ |a_k|, |b_k| \} \}$.

Theorem 4.4. Let $\langle \mathcal{D}, S_{\mathrm{main}} \rangle$ be a probabilistic program. Suppose $\Delta \vdash \{\Gamma; Q\}\ S_{\mathrm{main}}\ \{\Gamma'; \underline{1}\}$, where $Q \in \mathcal{M}^{(m)}_{\mathcal{P}_{\mathcal{I}}}$ and the ends of the $k$-th interval in $Q$ are polynomials in $\mathbb{R}_{kd}[\mathrm{VID}]$. Let $\{Y_n\}_{n \in \mathbb{Z}^+}$ be the moment invariant extracted from the Markov-chain semantics with respect to the derivation of $\Delta \vdash \{\Gamma; Q\}\ S_{\mathrm{main}}\ \{\Gamma'; \underline{1}\}$. If the following conditions hold:
(i) $\mathbb{E}[T^{md}] < \infty$, and
(ii) there exists $C \ge 0$ such that for all $n \in \mathbb{Z}^+$, $\|Y_n\|_\infty \le C \cdot (n+1)^{md}$ almost surely,
then $\langle [\mathbb{E}[A_T^k], \mathbb{E}[A_T^k]] \rangle_{0 \le k \le m} \sqsubseteq \phi_Q(\lambda\_.0)$.

The intuitive meaning of $\langle [\mathbb{E}[A_T^k], \mathbb{E}[A_T^k]] \rangle_{0 \le k \le m} \sqsubseteq \phi_Q(\lambda\_.0)$ is that the moment $\mathbb{E}[A_T^k]$ of the accumulated cost upon program termination is bounded by the interval in the $k$-th-moment component of $\phi_Q(\lambda\_.0)$, where $Q$ is the quantitative context and $\lambda\_.0$ is the initial state.
As we discussed in §2.2 and Ex. 2.7, the expected-potentialmethod is not always sound for deriving bounds on highermoments for cost accumulators in probabilistic programs.The extra conditions Thm. 4.4(i) and (ii) impose constraintson the analyzed program and the expected-potential func-tion, which allow us to reduce the soundness to the optionalstopping problem from probability theory.
Optional stopping. Let us represent the moment invariant $\{Y_n\}_{n \in \mathbb{Z}^+}$ as

$$\{ \langle [L^{(0)}_n, U^{(0)}_n], [L^{(1)}_n, U^{(1)}_n], \cdots, [L^{(m)}_n, U^{(m)}_n] \rangle \}_{n \in \mathbb{Z}^+},$$

where $L^{(k)}_n, U^{(k)}_n : \Omega \to \mathbb{R}$ are real-valued random variables on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ of the Markov-chain semantics, for $n \in \mathbb{Z}^+$ and $0 \le k \le m$. We then have the observations below as direct corollaries of Lem. 4.3:
- For any $k$, the sequence $\{U^{(k)}_n\}_{n \in \mathbb{Z}^+}$ satisfies $\mathbb{E}[U^{(k)}_{n+1} \mid U^{(k)}_n] \le U^{(k)}_n$ almost surely, for all $n \in \mathbb{Z}^+$, and we want to find sufficient conditions for $\mathbb{E}[U^{(k)}_T] \le \mathbb{E}[U^{(k)}_0]$.
- For any $k$, the sequence $\{L^{(k)}_n\}_{n \in \mathbb{Z}^+}$ satisfies $\mathbb{E}[L^{(k)}_{n+1} \mid L^{(k)}_n] \ge L^{(k)}_n$ almost surely, for all $n \in \mathbb{Z}^+$, and we want to find sufficient conditions for $\mathbb{E}[L^{(k)}_T] \ge \mathbb{E}[L^{(k)}_0]$.
These kinds of questions can be reduced to the optional-stopping problem from probability theory. Recent research [1, 19, 38, 43] has used the Optional Stopping Theorem (OST) from probability theory to establish sufficient conditions for the soundness of analyses of probabilistic programs. However, the classic OST turns out not to be suitable for higher-moment analysis. We extend the OST with a new sufficient condition that allows us to prove Thm. 4.4. We discuss the details of our extended OST in a companion paper [41]; in this work, we focus on the derivation system for central moments. We also include a brief discussion of the OST in Appendix E.
4.3 An Algorithm for Checking Soundness Criteria

Termination analysis. We reuse our system for automatically deriving higher moments, which we developed in §3.3 and §3.4, to check whether $\mathbb{E}[T^{md}] < \infty$ (Thm. 4.4(i)). To reason about termination time, we assume that every program statement increments the cost accumulator by one. For
example, the inference rule (Q-Sample) becomes

$$\frac{\Gamma = \forall x \in \mathrm{supp}(\mu_D) : \Gamma' \qquad Q = \langle 1, 1, \cdots, 1 \rangle \otimes \mathbb{E}_{x \sim \mu_D}[Q']}{\Delta \vdash \{\Gamma; Q\}\ x \sim D\ \{\Gamma'; Q'\}}$$

However, we cannot apply Thm. 4.4 to establish the soundness of the termination-time analysis, because that would introduce a circular dependence. Instead, we use a different proof technique to reason about $\mathbb{E}[T^{md}]$, taking into account the monotonicity of runtimes. Because upper-bound analysis of higher moments of runtimes has been studied by Kura et al. [26], we skip the details, but include them in Appendix G.
Boundedness of $\|Y_n\|_\infty$, $n \in \mathbb{Z}^+$. To ensure that the condition in Thm. 4.4(ii) holds, we check whether the analyzed program satisfies the bounded-update property: every (deterministic or probabilistic) assignment to a program variable updates the variable with a change bounded by a constant $C$ almost surely. Then the absolute value of every program variable at evaluation step $n$ can be bounded by $C \cdot n = O(n)$. Thus, a polynomial up to degree $\ell \in \mathbb{N}$ over program variables can be bounded by $O(n^\ell)$ at evaluation step $n$. As observed by Wang et al. [43], bounded updates are common in practice.
5 Tail-Bound Analysis

One application of our central-moment analysis is to bound the probability that the accumulated cost deviates from some given quantity. In this section, we sketch how we produce the tail bounds shown in Fig. 1(c).

There are many concentration-of-measure inequalities in probability theory [13]. Among them, one of the most important is Markov's inequality:

Proposition 5.1. If $X$ is a nonnegative random variable and $d > 0$, then $\mathbb{P}[X \ge d] \le \frac{\mathbb{E}[X^k]}{d^k}$ for any $k \in \mathbb{N}$.
Recall that Fig. 1(b) presents upper bounds on the raw moments $\mathbb{E}[\mathit{tick}] \le 2d + 4$ and $\mathbb{E}[\mathit{tick}^2] \le 4d^2 + 22d + 28$ for the cost accumulator tick. With Markov's inequality, we derive the following tail bounds:

$$\mathbb{P}[\mathit{tick} \ge 4d] \le \frac{\mathbb{E}[\mathit{tick}]}{4d} \le \frac{2d + 4}{4d} \xrightarrow{d \to \infty} \frac{1}{2}, \tag{8}$$

$$\mathbb{P}[\mathit{tick} \ge 4d] \le \frac{\mathbb{E}[\mathit{tick}^2]}{(4d)^2} \le \frac{4d^2 + 22d + 28}{16d^2} \xrightarrow{d \to \infty} \frac{1}{4}. \tag{9}$$

Note that (9) provides an asymptotically more precise bound on $\mathbb{P}[\mathit{tick} \ge 4d]$ than (8) does, when $d$ approaches infinity.
Central-moment analysis can obtain an even more precise tail bound. As presented in Fig. 1(b), our analysis derives $\mathbb{V}[\mathit{tick}] \le 22d + 28$ for the variance of tick. We can now employ concentration inequalities that involve variances of random variables. Recall Cantelli's inequality:

Proposition 5.2. If $X$ is a random variable and $d > 0$, then $\mathbb{P}[X - \mathbb{E}[X] \ge d] \le \frac{\mathbb{V}[X]}{\mathbb{V}[X] + d^2}$ and $\mathbb{P}[X - \mathbb{E}[X] \le -d] \le \frac{\mathbb{V}[X]}{\mathbb{V}[X] + d^2}$.
With Cantelli's inequality, we obtain the following tail bound, where we assume $d \ge 2$:

$$\mathbb{P}[\mathit{tick} \ge 4d] = \mathbb{P}[\mathit{tick} - (2d + 4) \ge 2d - 4] \le \mathbb{P}[\mathit{tick} - \mathbb{E}[\mathit{tick}] \ge 2d - 4] \le \frac{\mathbb{V}[\mathit{tick}]}{\mathbb{V}[\mathit{tick}] + (2d - 4)^2} = 1 - \frac{(2d - 4)^2}{\mathbb{V}[\mathit{tick}] + (2d - 4)^2} \le \frac{22d + 28}{4d^2 + 6d + 44} \xrightarrow{d \to \infty} 0. \tag{10}$$

For all $d \ge 15$, (10) gives a more precise bound than both (8) and (9). It is also clear from Fig. 1(c), where we plot the three tail bounds (8), (9), and (10), that the asymptotically most precise bound is the one obtained via variances.
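The comparison among (8), (9), and (10) is easy to reproduce numerically; the sketch below (our illustration) encodes the three bounds and confirms the crossover at $d = 15$:

```python
def markov_first(d):
    """Tail bound (8) on P[tick >= 4d], from E[tick] <= 2d + 4."""
    return (2 * d + 4) / (4 * d)

def markov_second(d):
    """Tail bound (9), from E[tick^2] <= 4d^2 + 22d + 28."""
    return (4 * d ** 2 + 22 * d + 28) / (16 * d ** 2)

def cantelli_bound(d):
    """Tail bound (10), from V[tick] <= 22d + 28; valid for d >= 2."""
    return (22 * d + 28) / (4 * d ** 2 + 6 * d + 44)
```

`cantelli_bound` tends to 0 while the raw-moment bounds tend to 1/2 and 1/4, and for $d \ge 15$ the variance-based bound is the smallest of the three.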
In general, for higher central moments, we employ Chebyshev's inequality to derive tail bounds:

Proposition 5.3. If $X$ is a random variable and $d > 0$, then $\mathbb{P}[|X - \mathbb{E}[X]| \ge d] \le \frac{\mathbb{E}[(X - \mathbb{E}[X])^{2k}]}{d^{2k}}$ for any $k \in \mathbb{N}$.

In our experiments, we use Chebyshev's inequality to derive tail bounds from the fourth central moments. We will show in Fig. 9 that these tail bounds can be more precise than those obtained from both raw moments and variances.
6 Implementation and Experiments

Implementation. Our tool is implemented in OCaml and consists of about 5,300 LOC. The tool works on imperative arithmetic probabilistic programs using a CFG-based IR [40]. The language supports recursive functions, continuous distributions, unstructured control flow, and local variables. To infer bounds on the central moments for a cost accumulator in a program, the user needs to specify the order of the analyzed moment and a maximal degree for the polynomials to be used in potential-function templates. Using APRON [22], we implemented an interprocedural numeric analysis to infer the logical contexts used in the derivation. We use the off-the-shelf solver Gurobi [28] for LP solving.
Evaluation setup. We evaluated our tool to answer the following three research questions:
1. How does the raw-moment inference part of our tool compare to existing techniques for expected-cost bound analysis [31, 43]?
2. How does our tool compare to the state of the art in tail-probability analysis (which is based only on higher raw moments [26])?
3. How scalable is our tool? Can it analyze programs with many recursive functions?

For the first question, we collected a broad suite of challenging examples from related work [26, 31, 43] with different loop and recursion patterns, as well as probabilistic branching, discrete sampling, and continuous sampling. Our tool achieved precision and efficiency comparable to the prior work on expected-cost bound analysis [31, 43]. The details are included in Appendix H.
Central Moment Analysis for Cost Accumulators in Probabilistic Programs. PLDI '21, June 20–25, 2021, Virtual, UK.
For the second question, we evaluated our tool on the complete benchmarks from Kura et al. [26]. We also conducted a case study of a timing-attack analysis for a program provided by DARPA during engagements of the STAC program [15], where central moments are more useful than raw moments for bounding the success probability of an attacker. We include the case study in Appendix I.
For the third question, we conducted case studies on two sets of synthetic benchmark programs:
• coupon-collector programs with N coupons (N ∈ [1, 10³]), where each program is implemented as a set of tail-recursive functions, each of which represents a state of coupon collection, i.e., the number of coupons collected so far; and
• random-walk programs with N consecutive one-dimensional random walks (N ∈ [1, 10³]), each of which starts at a position that equals the number of steps taken by the previous random walk to reach the ending position (the origin). Each program is implemented as a set of non-tail-recursive functions, each of which represents a random walk. The random walks in the same program can have different transition probabilities.
The largest synthetic program has nearly 16,000 LOC. We then ran our tool to derive an upper bound on the fourth (resp., second) central moment of the runtime for each coupon-collector (resp., random-walk) program.

The experiments were performed on a machine with an Intel Core i7 3.6GHz processor and 16GB of RAM under macOS Catalina 10.15.7.
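As a sanity check on the coupon-collector benchmarks, one can estimate runtime moments by Monte Carlo simulation. The sketch below simply counts coupon draws (the paper's cost model may assign costs differently), so only the shape of the estimates is meaningful.

```python
import random

def coupon_collector_steps(n, rng):
    # Draw uniformly among n coupons until every coupon has been seen at
    # least once; return the number of draws.
    collected, steps = set(), 0
    while len(collected) < n:
        collected.add(rng.randrange(n))
        steps += 1
    return steps

def central_moment(samples, k):
    # Empirical k-th central moment of a list of samples.
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** k for s in samples) / len(samples)

rng = random.Random(0)
samples = [coupon_collector_steps(2, rng) for _ in range(100_000)]
# For n = 2 the exact mean is 3 and the exact variance is 2, so the
# estimates below should be close to those values.
print(sum(samples) / len(samples), central_moment(samples, 2))
```

Such simulation estimates lower-bound the true moments only statistically; the analyzer's inferred bounds are sound upper bounds, so the two are complementary checks.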
Results. Some of the evaluation results that answer the second research question are presented in Tab. 1. The programs (1-1) and (1-2) are coupon-collector problems with a total of two and four coupons, respectively. The programs (2-1) and (2-2) are one-dimensional random walks with integer-valued and real-valued coordinates, respectively. We omit three other programs here but include the full results in Appendix H. The table contains the inferred upper bounds on the moments for the runtimes of these programs, and the running times of the analyses. We compared our results with Kura et al.'s inference tool for raw moments [26]. Our tool is as precise as, and sometimes more precise than, the prior work on all the benchmark programs. Meanwhile, our tool is able to infer an upper bound on the raw moments of degree up to four on all the benchmarks, while the prior work reports failure on some higher moments for the random-walk programs. In terms of efficiency, our tool completed each example in less than 10 seconds, while the prior work took more than a few minutes on some programs. One reason why our tool is more efficient is that we always reduce higher-moment inference with non-linear polynomial templates to efficient LP solving, whereas the prior work requires semidefinite programming (SDP) for polynomial templates.
Table 1. Upper bounds on the raw/central moments of runtimes, with comparison to Kura et al. [26]. 'T/O' stands for timeout after 30 minutes. 'N/A' means that the tool is not applicable. '-' indicates that the tool fails to infer a bound. Entries with more precise results or less analysis time are marked in bold. Full results are included in Appendix H.
program  moment       this work               Kura et al. [26]
                      upper bnd.  time (sec)  upper bnd.  time (sec)
(1-1)    2nd raw      201         0.019       201         0.015
         3rd raw      3829        0.019       3829        0.020
         4th raw      90705       0.023       90705       0.027
         2nd central  32          0.029       N/A         N/A
         4th central  9728        0.058       N/A         N/A
(1-2)    2nd raw      2357        1.068       3124        0.037
         3rd raw      148847      1.512       171932      0.062
         4th raw      11285725    1.914       12049876    0.096
         2nd central  362         3.346       N/A         N/A
         4th central  955973      9.801       N/A         N/A
(2-1)    2nd raw      2320        0.016       2320        11.380
         3rd raw      691520      0.018       -           16.056
         4th raw      340107520   0.021       -           23.414
         2nd central  1920        0.026       N/A         N/A
         4th central  289873920   0.049       N/A         N/A
(2-2)    2nd raw      8375        0.022       8375        38.463
         3rd raw      1362813     0.028       -           73.408
         4th raw      306105209   0.035       -           141.072
         2nd central  5875        0.029       N/A         N/A
         4th central  447053126   0.086       N/A         N/A
Besides raw moments, our tool is also capable of inferring upper bounds on the central moments of runtimes for the benchmarks. To evaluate the quality of the inferred central moments, Fig. 9 plots the upper bounds on the tail probabilities of runtimes T obtained by Kura et al. [26], and those obtained by our central-moment analysis. Specifically, the prior work uses Markov's inequality (Prop. 5.1), while we are also able to apply Cantelli's and Chebyshev's inequalities (Props. 5.2 and 5.3) with central moments. Our tool outperforms the prior work on programs (1-1) and (1-2), and derives better tail bounds when d is large on program (2-2), while it obtains similar curves on program (2-1).
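Cantelli's one-sided inequality (Prop. 5.2 in the paper) is the variance-based bound applied here; a minimal sketch with illustrative numbers rather than benchmark outputs:

```python
def cantelli_bound(variance, mean, d):
    # Cantelli's (one-sided Chebyshev) inequality: for d > E[T],
    # P[T >= d] <= Var[T] / (Var[T] + (d - E[T])^2).
    t = d - mean
    assert t > 0, "the bound applies only beyond the mean"
    return variance / (variance + t * t)

# With Var[T] = 4 and E[T] = 10, the tail at d = 14 is at most 0.2:
print(cantelli_bound(4.0, 10.0, 14.0))  # prints 0.2
```

In practice one takes the pointwise minimum of all available bounds (Markov at each raw-moment degree, Cantelli, Chebyshev), which is what the gray, green, and red curves in Fig. 9 compare.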
Figure 9. Upper bound of the tail probability ℙ[T ≥ d] as a function of d, with comparison to Kura et al. [26]. Each gray line is the minimum of tail bounds obtained from the raw moments of degree up to four inferred by Kura et al. [26]. Green lines and red lines are the tail bounds given by 2nd and 4th central moments inferred by our tool, respectively. We include the plots for other programs in Appendix H.

Scalability. In Fig. 10, we demonstrate the running times of our tool on the two sets of synthetic benchmark programs; Fig. 10a plots the analysis times for coupon-collector programs as a function of the independent variable N (the total number of coupons), and Fig. 10b plots the analysis times for random-walk programs as a function of N (the total number of random walks). The evaluation statistics show that our tool achieves good scalability in both case studies: the runtime is almost a linear function of the program size, which is the number of recursive functions for both case studies. Two reasons why our tool is scalable on the two sets of programs are (i) our analysis is compositional and uses function summaries to analyze function calls, and (ii) for a fixed set of templates and a fixed diameter of the call graph, the number of linear constraints generated by our tool grows linearly with the size of the program, and the LP solvers available nowadays can handle large problem instances efficiently.

Figure 10. Running times of our tool on two sets of synthetic benchmark programs: (a) Coupon Collector; (b) Random Walk. Each figure plots the runtimes (ms) as a function of the size of the analyzed program.
Discussion. Higher central moments can also provide more information about the shape of a distribution, e.g., the skewness (i.e., 𝔼[(X − 𝔼[X])³] / (𝕍[X])^(3/2)) indicates how lopsided the distribution of X is, and the kurtosis (i.e., 𝔼[(X − 𝔼[X])⁴] / (𝕍[X])²) measures the heaviness of the tails of the distribution of X. We used our tool to analyze two variants of the random-walk program (2-1). The two random walks have different transition probabilities and step lengths, but they have the same expected runtime 𝔼[T]. Tab. 2 presents the skewness and kurtosis derived from the moment bounds inferred by our tool.
Figure 11. Density estimation for the runtime T of two variants rdwalk-1 and rdwalk-2 of (2-1).
Table 2. Skewness & kurtosis.

program   skewness  kurtosis
rdwalk-1  2.1362    10.5633
rdwalk-2  2.9635    17.5823
A positive skew indicates that the mass of the distribution is concentrated on the left, and a larger skew means that the concentration lies further to the left. A larger kurtosis, on the other hand, indicates that the distribution has fatter tails. Therefore, as the derived skewness and kurtosis indicate, the distribution of the runtime T for rdwalk-2 should be more left-leaning and have fatter tails than the distribution for rdwalk-1. Density estimates for the runtime T, obtained by simulation, are shown in Fig. 11.

Our tool can also derive symbolic bounds on higher moments. The table below presents the inferred upper bounds on the variances for the random-walk benchmarks, where we replace the concrete inputs with symbolic pre-conditions.

program  pre-condition  upper bnd. on the variance
(2-1)    x ≥ 0          1920x
(2-2)    x ≥ 0          2166.6667x + 1541.6667
7 Conclusion

We have presented an automatic central-moment analysis for probabilistic programs that support general recursion and continuous distributions, by deriving symbolic interval bounds on higher raw moments for cost accumulators. We have proposed moment semirings for compositional reasoning and moment-polymorphic recursion for interprocedural analysis. We have proved the soundness of our technique using a recent extension to the Optional Stopping Theorem. The effectiveness of our technique has been demonstrated with our prototype implementation and the analysis of a broad suite of benchmarks, as well as a tail-bound analysis.

In the future, we plan to go beyond arithmetic programs and add support for more datatypes, e.g., Booleans and lists. We will also work on other kinds of uncertain quantities for probabilistic programs. Another research direction is to apply our analysis to higher-order functional programs.
Acknowledgments

This article is based on research supported, in part, by a gift from Rajiv and Ritu Batra; by ONR under grants N00014-17-1-2889 and N00014-19-1-2318; by DARPA under AA contract FA8750-18-C0092; and by the NSF under SaTC award 1801369, SHF awards 1812876 and 2007784, and CAREER award 1845514. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors, and do not necessarily reflect the views of the sponsoring agencies.
We thank Jason Breck and John Cyphert for formulating, and providing us with an initial formalization of, the problem of timing-attack analysis.
References

[1] Gilles Barthe, Thomas Espitau, Luis María Ferrer Fioriti, and Justin Hsu. 2016. Synthesizing Probabilistic Invariants via Doob's Decomposition. In Computer Aided Verif. (CAV'16). https://doi.org/10.1007/978-3-319-41528-4_3
[2] Gilles Barthe, Thomas Espitau, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. 2018. An Assertion-Based Program Logic for Probabilistic Programs. In European Symp. on Programming (ESOP'18). https://doi.org/10.1007/978-3-319-89884-1_5
[3] Gilles Barthe, Benjamin Grégoire, and Santiago Zanella Béguelin. 2009. Formal Certification of Code-Based Cryptographic Proofs. In Princ. of Prog. Lang. (POPL'09). https://doi.org/10.1145/1594834.1480894
[4] Ezio Bartocci, Laura Kovács, and Miroslav Stanković. 2019. Automatic Generation of Moment-Based Invariants for Prob-Solvable Loops. In Automated Tech. for Verif. and Analysis (ATVA'19). https://doi.org/10.1007/978-3-030-31784-3_15
[5] Kevin Batz, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. 2018. How long, O Bayesian network, will I sample thee?. In European Symp. on Programming (ESOP'18). https://doi.org/10.1007/978-3-319-89884-1_7
[6] Patrick Billingsley. 2012. Probability and Measure. John Wiley & Sons, Inc.
[7] Johannes Borgström, Ugo Dal Lago, Andrew D. Gordon, and Marcin Szymczak. 2016. A Lambda-Calculus Foundation for Universal Probabilistic Programming. In Int. Conf. on Functional Programming (ICFP'16). https://doi.org/10.1145/2951913.2951942
[8] Olivier Bouissou, Eric Goubault, Sylvie Putot, Aleksandar Chakarov, and Sriram Sankaranarayanan. 2016. Uncertainty Propagation Using Probabilistic Affine Forms and Concentration of Measure Inequalities. In Tools and Algs. for the Construct. and Anal. of Syst. (TACAS'16). https://doi.org/10.1007/978-3-662-49674-9_13
[9] Quentin Carbonneaux, Jan Hoffmann, Thomas Reps, and Zhong Shao. 2017. Automated Resource Analysis with Coq Proof Objects. In Computer Aided Verif. (CAV'17). https://doi.org/10.1007/978-3-319-63390-9_4
[10] Aleksandar Chakarov and Sriram Sankaranarayanan. 2013. Probabilistic Program Analysis with Martingales. In Computer Aided Verif. (CAV'13). https://doi.org/10.1007/978-3-642-39799-8_34
[11] Krishnendu Chatterjee, Hongfei Fu, and Amir Kafshdar Goharshady. 2016. Termination Analysis of Probabilistic Programs Through Positivstellensatz's. In Computer Aided Verif. (CAV'16). https://doi.org/10.1007/978-3-319-41528-4_1
[12] Krishnendu Chatterjee, Hongfei Fu, Petr Novotný, and Rouzbeh Hasheminezhad. 2016. Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs. In Princ. of Prog. Lang. (POPL'16). https://doi.org/10.1145/2837614.2837639
[13] Devdatt P. Dubhashi and Alessandro Panconesi. 2009. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press. https://doi.org/10.1017/CBO9780511581274
[14] Luis María Ferrer Fioriti and Holger Hermanns. 2015. Probabilistic Termination: Soundness, Completeness, and Compositionality. In Princ. of Prog. Lang. (POPL'15). https://doi.org/10.1145/2676726.2677001
[15] Dustin Fraze. 2015. Space/Time Analysis for Cybersecurity (STAC). Available on https://www.darpa.mil/program/space-time-analysis-for-cybersecurity.
[16] Zoubin Ghahramani. 2015. Probabilistic machine learning and artificial intelligence. Nature 521 (May 2015). https://doi.org/10.1038/nature14541
[17] Michèle Giry. 1982. A Categorical Approach to Probability Theory. In Categorical Aspects of Topology and Analysis. https://doi.org/10.1007/BFb0092872
[18] Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. 2014. Probabilistic Programming. In Future of Softw. Eng. (FOSE'14). https://doi.org/10.1145/2593882.2593900
[19] Marcel Hark, Benjamin Lucien Kaminski, Jürgen Giesl, and Joost-Pieter Katoen. 2019. Aiming Low Is Harder: Induction for Lower Bounds in Probabilistic Program Verification. Proc. ACM Program. Lang. 4, POPL (December 2019). https://doi.org/10.1145/3371105
[20] Jan Hoffmann and Martin Hofmann. 2010. Amortized Resource Analysis with Polymorphic Recursion and Partial Big-Step Operational Semantics. In Asian Symp. on Prog. Lang. and Systems (APLAS'10). https://doi.org/10.1007/978-3-642-17164-2_13
[21] Martin Hofmann and Steffen Jost. 2003. Static Prediction of Heap Space Usage for First-Order Functional Programs. In Princ. of Prog. Lang. (POPL'03). https://doi.org/10.1145/604131.604148
[22] Bertrand Jeannet and Antoine Miné. 2009. Apron: A Library of Numerical Abstract Domains for Static Analysis. In Computer Aided Verif. (CAV'09). https://doi.org/10.1007/978-3-642-02658-4_52
[23] Xia Jiang and Gregory F. Cooper. 2010. A Bayesian spatio-temporal method for disease outbreak detection. J. American Medical Informatics Association 17, 4 (July 2010). https://doi.org/10.1136/jamia.2009.000356
[24] Benjamin Lucien Kaminski, Joost-Pieter Katoen, Christoph Matheja, and Federico Olmedo. 2016. Weakest Precondition Reasoning for Expected Run-Times of Probabilistic Programs. In European Symp. on Programming (ESOP'16). https://doi.org/10.1007/978-3-662-49498-1_15
[25] Dexter Kozen. 1981. Semantics of Probabilistic Programs. J. Comput. Syst. Sci. 22, 3 (June 1981). https://doi.org/10.1016/0022-0000(81)90036-2
[26] Satoshi Kura, Natsuki Urabe, and Ichiro Hasuo. 2019. Tail Probability for Randomized Program Runtimes via Martingales for Higher Moments. In Tools and Algs. for the Construct. and Anal. of Syst. (TACAS'19). https://doi.org/10.1007/978-3-030-17465-1_8
[27] Zhifei Li and Jason Eisner. 2009. First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests. In Empirical Methods in Natural Language Processing (EMNLP'09). https://dl.acm.org/doi/10.5555/1699510.1699517
[28] Gurobi Optimization LLC. 2020. Gurobi Optimizer Reference Manual. Available on https://www.gurobi.com.
[29] Guido Manfredi and Yannick Jestin. 2016. An Introduction to ACAS Xu and the Challenges Ahead. In Digital Avionics Systems Conference (DASC'16). https://doi.org/10.1109/DASC.2016.7778055
[30] Annabelle K. McIver and Carroll C. Morgan. 2005. Abstraction, Refinement and Proof for Probabilistic Systems. Springer Science+Business Media, Inc. https://doi.org/10.1007/b138392
[31] Van Chan Ngo, Quentin Carbonneaux, and Jan Hoffmann. 2018. Bounded Expectations: Resource Analysis for Probabilistic Programs. In Prog. Lang. Design and Impl. (PLDI'18). https://doi.org/10.1145/3192366.3192394
[32] Van Chan Ngo, Mario Dehesa-Azuara, Matthew Fredrikson, and Jan Hoffmann. 2017. Verifying and Synthesizing Constant-Resource Implementations with Types. In Symp. on Sec. and Privacy (SP'17). https://doi.org/10.1109/SP.2017.53
[33] Federico Olmedo, Benjamin Lucien Kaminski, Joost-Pieter Katoen, and Christoph Matheja. 2016. Reasoning about Recursive Probabilistic Programs. In Logic in Computer Science (LICS'16). https://doi.org/10.1145/2933575.2935317
[34] Prakash Panangaden. 1999. The Category of Markov Kernels. Electr. Notes Theor. Comp. Sci. 22 (1999). https://doi.org/10.1016/S1571-0661(05)80602-4
[35] Martin L. Puterman. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc. https://dl.acm.org/doi/book/10.5555/528623
[36] Reuven Y. Rubinstein and Dirk P. Kroese. 2016. Simulation and the Monte Carlo Method. John Wiley & Sons, Inc. https://doi.org/10.1002/9781118631980
[37] Sriram Sankaranarayanan, Aleksandar Chakarov, and Sumit Gulwani. 2013. Static Analysis for Probabilistic Programs: Inferring Whole Program Properties from Finitely Many Paths. In Prog. Lang. Design and Impl. (PLDI'13). https://doi.org/10.1145/2491956.2462179
[38] Anne Schreuder and Luke Ong. 2019. Polynomial Probabilistic Invariants and the Optional Stopping Theorem. https://arxiv.org/abs/1910.12634
[39] Robert Endre Tarjan. 1985. Amortized Computational Complexity. SIAM J. Algebraic Discrete Methods 6, 2 (August 1985). https://doi.org/10.1137/0606031
[40] Di Wang, Jan Hoffmann, and Thomas Reps. 2018. PMAF: An Algebraic Framework for Static Analysis of Probabilistic Programs. In Prog. Lang. Design and Impl. (PLDI'18). https://doi.org/10.1145/3192366.3192408
[41] Di Wang, Jan Hoffmann, and Thomas Reps. 2021. Expected-Cost Analysis for Probabilistic Programs and Semantics-Level Adaption of Optional Stopping Theorems. https://arxiv.org/abs/2103.16105
[42] Di Wang, David M. Kahn, and Jan Hoffmann. 2020. Raising Expectations: Automating Expected Cost Analysis with Types. Proc. ACM Program. Lang. 4, ICFP (August 2020). https://doi.org/10.1145/3408992
[43] Peixin Wang, Hongfei Fu, Amir Kafshdar Goharshady, Krishnendu Chatterjee, Xudong Qin, and Wenjun Shi. 2019. Cost Analysis of Nondeterministic Probabilistic Programs. In Prog. Lang. Design and Impl. (PLDI'19). https://doi.org/10.1145/3314221.3314581
[44] David Williams. 1991. Probability with Martingales. Cambridge University Press. https://doi.org/10.1017/CBO9780511813658
A Preliminaries on Measure Theory

Interested readers can refer to textbooks and notes in the literature [6, 44] for more details.
A.1 Basics

A measurable space is a pair (X, S), where X is a nonempty set, and S is a σ-algebra on X, i.e., a family of subsets of X that contains ∅ and is closed under complement and countable unions. The smallest σ-algebra that contains a family A of subsets of X is said to be generated by A, denoted by σ(A). Every topological space (X, τ) admits a Borel σ-algebra, given by σ(τ). This gives canonical σ-algebras on ℝ, ℚ, ℕ, etc. A function f : X → Y, where (X, S) and (Y, T) are measurable spaces, is said to be (S, T)-measurable if f⁻¹(B) ∈ S for each B ∈ T. If Y = ℝ, we tacitly assume that the Borel σ-algebra is defined on Y, and we simply call f measurable, or a random variable. Measurable functions form a vector space, and products, maxima, and limiting operations preserve measurability.

A measure μ on a measurable space (X, S) is a mapping from S to [0, ∞] such that (i) μ(∅) = 0, and (ii) for all pairwise-disjoint {Aₙ}ₙ∈ℤ⁺ in S, it holds that μ(⋃ₙ∈ℤ⁺ Aₙ) = Σₙ∈ℤ⁺ μ(Aₙ). The triple (X, S, μ) is called a measure space. A measure μ is called a probability measure if μ(X) = 1. We denote the collection of probability measures on (X, S) by 𝔻(X, S). For each x ∈ X, the Dirac measure δ(x) is defined as λA.[x ∈ A]. For measures μ and ν, we write μ + ν for the measure λA.μ(A) + ν(A). For a measure μ and a scalar c ≥ 0, we write c · μ for the measure λA.c · μ(A).
The integral of a measurable function f on A ∈ S with respect to a measure μ on (X, S) is defined following Lebesgue's theory and is denoted by μ(f; A), ∫_A f dμ, or ∫_A f(x) μ(dx). If μ is a probability measure, we call the integral the expectation of f, written 𝔼_{x∼μ}[f; A], or simply 𝔼[f; A] when the scope is clear in the context. If A = X, we tacitly omit A from the notations. For each A ∈ S, it holds that μ(f; A) = μ(f · 𝕀_A), where 𝕀_A is the indicator function for A. We denote the collection of integrable functions on (X, S, μ) by L¹(X, S, μ).
A kernel from a measurable space (X, S) to another (Y, T) is a mapping κ from X × T to [0, ∞] such that: (i) for each x ∈ X, the function λB.κ(x, B) is a measure on (Y, T), and (ii) for each B ∈ T, the function λx.κ(x, B) is measurable. We write κ : (X, S) ⇝ (Y, T) to declare that κ is a kernel from (X, S) to (Y, T). Intuitively, kernels describe measure transformers from one measurable space to another. A kernel κ is called a probability kernel if κ(x, Y) = 1 for all x ∈ X. We denote the collection of probability kernels from (X, S) to (Y, T) by 𝕂((X, S), (Y, T)). If the two measurable spaces coincide, we simply write 𝕂(X, S). We can "push forward" a measure μ on (X, S) to a measure on (Y, T) through a kernel κ : (X, S) ⇝ (Y, T) by integration:¹

μ >>= κ ≝ λB. ∫_X κ(x, B) μ(dx).

¹ We use a monad bind notation >>= here. Indeed, the category of measurable spaces admits a monad with sub-probability measures [17, 34].
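For discrete measures the bind operation has a direct computational reading. A sketch where distributions are dictionaries from outcomes to probabilities, and a kernel is a function from an outcome to such a dictionary:

```python
def bind(mu, kernel):
    # (mu >>= kernel)({y}) = sum over x of mu({x}) * kernel(x)({y}),
    # the discrete instance of the push-forward integral above.
    out = {}
    for x, p in mu.items():
        for y, q in kernel(x).items():
            out[y] = out.get(y, 0.0) + p * q
    return out

# A fair coin pushed through a kernel that re-flips on heads:
coin = {"H": 0.5, "T": 0.5}
step = lambda x: coin if x == "H" else {"T": 1.0}
print(bind(coin, step))  # prints {'H': 0.25, 'T': 0.75}
```

The result is again a probability distribution, matching the fact that probability kernels preserve total mass 1.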
A.2 Product Measures

The product of two measurable spaces (X, S) and (Y, T) is defined as (X, S) ⊗ (Y, T) ≝ (X × Y, S ⊗ T), where S ⊗ T is the smallest σ-algebra that makes the coordinate maps measurable, i.e., σ({π₁⁻¹(A) | A ∈ S} ∪ {π₂⁻¹(B) | B ∈ T}), where πᵢ is the i-th coordinate map. If μ₁ and μ₂ are two probability measures on (X, S) and (Y, T), respectively, then there exists a unique probability measure μ on (X, S) ⊗ (Y, T), by Fubini's theorem, called the product measure of μ₁ and μ₂, written μ₁ ⊗ μ₂, such that μ(A × B) = μ₁(A) · μ₂(B) for all A ∈ S and B ∈ T.

If μ is a probability measure on (X, S) and κ : (X, S) ⇝ (Y, T) is a probability kernel, then we can construct a probability measure on (X, S) ⊗ (Y, T) that captures all transitions from μ through κ: μ ⊗ κ ≝ λ(A, B). ∫_A κ(x, B) μ(dx). If μ is a probability measure on (X₀, S₀) and κᵢ : (Xᵢ₋₁, Sᵢ₋₁) ⇝ (Xᵢ, Sᵢ) is a probability kernel for i = 1, ..., n, where n ∈ ℤ⁺, then we can construct a probability measure on ⨂ᵢ₌₀ⁿ (Xᵢ, Sᵢ), i.e., the space of sequences of n transitions, by iteratively applying the kernels to μ:

μ ⊗ ⨂ᵢ₌₁⁰ κᵢ ≝ μ,
μ ⊗ ⨂ᵢ₌₁ᵐ⁺¹ κᵢ ≝ (μ ⊗ ⨂ᵢ₌₁ᵐ κᵢ) ⊗ κₘ₊₁, for 0 ≤ m < n.

Let (Xᵢ, Sᵢ), i ∈ I, be a family of measurable spaces. Their product, denoted by ⨂ᵢ∈I (Xᵢ, Sᵢ) = (∏ᵢ∈I Xᵢ, ⨂ᵢ∈I Sᵢ), is the product space with the smallest σ-algebra such that for each i ∈ I, the coordinate map πᵢ is measurable. The theorem below is widely used to construct a probability measure on an infinite product via probability kernels.
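In the discrete case the product measure is just the outer product of the two probability tables; a minimal sketch:

```python
def product_measure(mu1, mu2):
    # (mu1 ⊗ mu2)({(x, y)}) = mu1({x}) * mu2({y}) for discrete measures,
    # mirroring mu(A × B) = mu1(A) * mu2(B) on rectangles.
    return {(x, y): p * q for x, p in mu1.items() for y, q in mu2.items()}

die = {i: 1 / 6 for i in range(1, 7)}
pair = product_measure(die, die)
# The product of two fair dice assigns probability 1/36 to each pair:
assert abs(pair[(1, 1)] - 1 / 36) < 1e-12
assert abs(sum(pair.values()) - 1.0) < 1e-9
```

Uniqueness on rectangles A × B, guaranteed by Fubini's theorem in the general setting, is immediate here because every discrete event is a disjoint union of singletons.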
Proposition A.1 (Ionescu-Tulcea). Let (Xᵢ, Sᵢ), i ∈ ℤ⁺, be a sequence of measurable spaces. Let μ₀ be a probability measure on (X₀, S₀). For each n ∈ ℕ, let κₙ : ⨂ᵢ₌₀ⁿ⁻¹ (Xᵢ, Sᵢ) ⇝ (Xₙ, Sₙ) be a probability kernel. Then there exists a sequence of probability measures μₙ ≝ μ₀ ⊗ ⨂ᵢ₌₁ⁿ κᵢ, n ∈ ℤ⁺, and there exists a uniquely defined probability measure μ on ⨂ᵢ₌₀^∞ (Xᵢ, Sᵢ) such that μₙ(A) = μ(A × ∏ᵢ₌ₙ₊₁^∞ Xᵢ) for all n ∈ ℤ⁺ and A ∈ ⨂ᵢ₌₀ⁿ Sᵢ.
A.3 Conditional Expectations

Let (Ω, F, ℙ) be a measure space, where ℙ is a probability measure. We write L¹ for L¹(Ω, F, ℙ). Let X ∈ L¹ be an integrable random variable, and let G ⊆ F be a sub-σ-algebra of F. Then there exists a random variable Y ∈ L¹ such that: (i) Y is G-measurable, and (ii) for each G ∈ G, it holds that 𝔼[Y; G] = 𝔼[X; G]. Such a random variable Y is said to be a version of the conditional expectation 𝔼[X | G] of X given G. Conditional expectations admit almost-sure uniqueness. On the other hand, if X is a nonnegative random variable, the conditional expectation 𝔼[X | G] is also defined and almost-surely unique.
Intuitively, for some ω ∈ Ω, 𝔼[X | G](ω) is the expectation of X given the set of values Z(ω) for every G-measurable random variable Z. For example, if G = {∅, Ω}, which contains no information, then 𝔼[X | G](ω) = 𝔼[X]. If G = F, which contains full information, then 𝔼[X | G](ω) = X(ω).
We review some useful properties of conditional expectations below.
Proposition A.2. Let X ∈ L¹, and let G, H be sub-σ-algebras of F.
(1) If Y is any version of 𝔼[X | G], then 𝔼[X] = 𝔼[Y].
(2) If X is G-measurable, then 𝔼[X | G] = X, a.s.
(3) If H is a sub-σ-algebra of G, then 𝔼[𝔼[X | G] | H] = 𝔼[X | H], a.s.
(4) If Z is G-measurable and Z · X ∈ L¹, then 𝔼[Z · X | G] = Z · 𝔼[X | G], a.s. On the other hand, if X is a nonnegative random variable and Z is a nonnegative G-measurable random variable, then the property also holds, with the understanding that both sides might be infinite.
A.4 Convergence Theorems

We review two important convergence theorems for sequences of random variables.
Proposition A.3 (Monotone convergence theorem). If {fₙ}ₙ∈ℤ⁺ is a non-decreasing sequence of nonnegative measurable functions on a measure space (X, S, μ), and {fₙ}ₙ∈ℤ⁺ converges to f pointwise, then f is measurable and limₙ→∞ μ(fₙ) = μ(f) ≤ ∞.

Further, the theorem still holds if f is chosen as a measurable function and "{fₙ}ₙ∈ℤ⁺ converges to f pointwise" holds almost everywhere, rather than everywhere.
Proposition A.4 (Dominated convergence theorem). If {fₙ}ₙ∈ℤ⁺ is a sequence of measurable functions on a measure space (X, S, μ), {fₙ}ₙ∈ℤ⁺ converges to f pointwise, and {fₙ}ₙ∈ℤ⁺ is dominated by a nonnegative integrable function g (i.e., |fₙ(x)| ≤ g(x) for all n ∈ ℤ⁺, x ∈ X), then f is integrable and limₙ→∞ μ(fₙ) = μ(f).

Further, the theorem still holds if f is chosen as a measurable function and "{fₙ}ₙ∈ℤ⁺ converges to f pointwise and is dominated by g" holds almost everywhere.
B Operational Cost Semantics

We follow a distribution-based approach [7, 25] to define reduction rules for the semantics of Appl. A probabilistic semantics steps a program configuration to a probability distribution on configurations. To formally describe these distributions, we need to construct a measurable space of program configurations. Our approach is to construct a measurable space for each of the four components of configurations, and then use their product measurable space as the semantic domain.
• Valuations γ : VID → ℝ are finite real-valued maps, so we define (V, 𝒱) ≝ (ℝ^VID, B(ℝ^VID)) as the canonical structure on a finite-dimensional space.
• The executing statement S can contain real numbers, so we need to "lift" the Borel σ-algebra on ℝ to program statements. Intuitively, statements with exactly the same structure can be treated as vectors of parameters that correspond to their real-valued components. Formally, we achieve this by constructing a metric space on statements and then extracting a Borel σ-algebra from the metric space. Fig. 12 presents a recursively defined metric d_S on statements, as well as metrics d_E, d_L, and d_D on expressions, conditions, and distributions, respectively, as they are required by d_S. We denote the resulting measurable space by (S, 𝒮).
• A continuation K is either an empty continuation Kstop, a loop continuation Kloop L S K, or a sequence continuation Kseq S K. Similarly, we construct a measurable space (K, 𝒦) on continuations by extracting it from a metric space. Fig. 12 shows the definition of a metric d_K on continuations.
• The cost accumulator α ∈ ℝ is a real number, so we define (W, 𝒲) ≝ (ℝ, B(ℝ)) as the canonical measurable space on ℝ.
Then the semantic domain is defined as the product measurable space of the four components: (Σ, 𝒪) ≝ (V, 𝒱) ⊗ (S, 𝒮) ⊗ (K, 𝒦) ⊗ (W, 𝒲).

Fig. 13 presents the rules of the evaluation relation σ ↦ μ for Appl, where σ is a configuration and μ is a probability distribution on configurations. Note that in Appl, expressions E and conditions L are deterministic, so we define a standard big-step evaluation relation for them, written γ ⊢ E ⇓ r and γ ⊢ L ⇓ b, where γ is a valuation, r ∈ ℝ, and b ∈ {⊤, ⊥}. Most of the rules in Fig. 13, except (E-Sample) and (E-Prob), are also deterministic, as they step to a Dirac measure.

The evaluation relation ↦ can be interpreted as a distribution transformer. Indeed, ↦ can be seen as a probability kernel.
Lemma B.1. Let γ : VID → ℝ be a valuation.
• Let E be an expression. Then there exists a unique r ∈ ℝ such that γ ⊢ E ⇓ r.
• Let L be a condition. Then there exists a unique b ∈ {⊤, ⊥} such that γ ⊢ L ⇓ b.

Proof. By induction on the structure of E and L. □

Lemma B.2. For every configuration σ ∈ Σ, there exists a unique μ ∈ 𝔻(Σ, 𝒪) such that σ ↦ μ.

Proof. Let σ = ⟨γ, S, K, α⟩. The proof proceeds by case analysis on the structure of S, followed by a case analysis on the structure of K if S = skip. The rest of the proof appeals to Lem. B.1. □

Theorem B.3. The evaluation relation ↦ defines a probability kernel on program configurations.
d_E(x, x) ≝ 0
d_E(c₁, c₂) ≝ |c₁ − c₂|
d_E(E₁₁ + E₁₂, E₂₁ + E₂₂) ≝ d_E(E₁₁, E₂₁) + d_E(E₁₂, E₂₂)
d_E(E₁₁ × E₁₂, E₂₁ × E₂₂) ≝ d_E(E₁₁, E₂₁) + d_E(E₁₂, E₂₂)
d_E(E₁, E₂) ≝ ∞, otherwise

d_L(true, true) ≝ 0
d_L(not L₁, not L₂) ≝ d_L(L₁, L₂)
d_L(L₁₁ and L₁₂, L₂₁ and L₂₂) ≝ d_L(L₁₁, L₂₁) + d_L(L₁₂, L₂₂)
d_L(E₁₁ ≤ E₁₂, E₂₁ ≤ E₂₂) ≝ d_E(E₁₁, E₂₁) + d_E(E₁₂, E₂₂)
d_L(L₁, L₂) ≝ ∞, otherwise

d_D(uniform(a₁, b₁), uniform(a₂, b₂)) ≝ |a₁ − a₂| + |b₁ − b₂|
d_D(D₁, D₂) ≝ ∞, otherwise

d_S(skip, skip) ≝ 0
d_S(tick(c₁), tick(c₂)) ≝ |c₁ − c₂|
d_S(x := E₁, x := E₂) ≝ d_E(E₁, E₂)
d_S(x ∼ D₁, x ∼ D₂) ≝ d_D(D₁, D₂)
d_S(call f, call f) ≝ 0
d_S(if prob(p₁) then S₁₁ else S₁₂ fi, if prob(p₂) then S₂₁ else S₂₂ fi) ≝ |p₁ − p₂| + d_S(S₁₁, S₂₁) + d_S(S₁₂, S₂₂)
d_S(if L₁ then S₁₁ else S₁₂ fi, if L₂ then S₂₁ else S₂₂ fi) ≝ d_L(L₁, L₂) + d_S(S₁₁, S₂₁) + d_S(S₁₂, S₂₂)
d_S(while L₁ do S₁ od, while L₂ do S₂ od) ≝ d_L(L₁, L₂) + d_S(S₁, S₂)
d_S(S₁₁; S₁₂, S₂₁; S₂₂) ≝ d_S(S₁₁, S₂₁) + d_S(S₁₂, S₂₂)
d_S(S₁, S₂) ≝ ∞, otherwise

d_K(Kstop, Kstop) ≝ 0
d_K(Kloop L₁ S₁ K₁, Kloop L₂ S₂ K₂) ≝ d_L(L₁, L₂) + d_S(S₁, S₂) + d_K(K₁, K₂)
d_K(Kseq S₁ K₁, Kseq S₂ K₂) ≝ d_S(S₁, S₂) + d_K(K₁, K₂)
d_K(K₁, K₂) ≝ ∞, otherwise

Figure 12. Metrics for expressions, conditions, distributions, statements, and continuations.
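The recursive structure of these metrics is straightforward to transliterate. A sketch for the expression metric d_E alone, with expressions encoded as nested tuples (an encoding assumed here for illustration, not fixed by the paper):

```python
import math

# Expression encoding: ("var", name) | ("const", c)
#                    | ("add", e1, e2) | ("mul", e1, e2)
def d_E(e1, e2):
    # Recursive metric following Fig. 12: terms with the same shape are
    # compared component-wise; terms with different shapes (or different
    # variables) are infinitely far apart.
    if e1[0] != e2[0]:
        return math.inf
    if e1[0] == "var":
        return 0.0 if e1[1] == e2[1] else math.inf
    if e1[0] == "const":
        return abs(e1[1] - e2[1])
    return d_E(e1[1], e2[1]) + d_E(e1[2], e2[2])

# x + 1.0 and x + 3.5 differ in exactly one real parameter:
print(d_E(("add", ("var", "x"), ("const", 1.0)),
          ("add", ("var", "x"), ("const", 3.5))))  # prints 2.5
```

The finite-distance classes of this metric are exactly the concretizations of a single skeleton, which is the property the measurability proof below exploits.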
Proof. Lem. B.2 tells us that ↦ can be seen as a function ↦̂ defined as follows:

↦̂(σ, A) ≝ μ(A), where σ ↦ μ.

It is clear that λA.↦̂(σ, A) is a probability measure. On the other hand, to show that λσ.↦̂(σ, A) is measurable for any A ∈ 𝒪, we need to prove that for B ∈ B(ℝ), it holds that Θ(A, B) ≝ (λσ.↦̂(σ, A))⁻¹(B) ∈ 𝒪.

We introduce skeletons of programs to separate real numbers from discrete structures:

Ŝ ::= skip | tick(□ℓ) | x := Ê | x ∼ D̂ | call f
    | if prob(□ℓ) then Ŝ₁ else Ŝ₂ fi | if L̂ then Ŝ₁ else Ŝ₂ fi
    | while L̂ do Ŝ od | Ŝ₁; Ŝ₂
L̂ ::= true | not L̂ | L̂₁ and L̂₂ | Ê₁ ≤ Ê₂
Ê ::= x | □ℓ | Ê₁ + Ê₂ | Ê₁ × Ê₂
D̂ ::= uniform(□ℓa, □ℓb)
K̂ ::= Kstop | Kloop L̂ Ŝ K̂ | Kseq Ŝ K̂

The holes □ℓ are placeholders for real numbers parameterized by locations ℓ ∈ LOC. We assume that the holes in a program structure are always pairwise distinct. Let η : LOC → ℝ be a map from holes to real numbers, and let η(Ŝ) (resp., η(L̂), η(Ê), η(D̂), η(K̂)) be the instantiation of a statement (resp., condition, expression, distribution, continuation) skeleton obtained by substituting η(ℓ) for □ℓ. One important property of skeletons is that the "distance" between any concretizations of two different skeletons is always infinite with respect to the metrics in Fig. 12.

Observe that

Θ(A, B) = ⋃_{Ŝ, K̂} (Θ(A, B) ∩ {⟨γ, η(Ŝ), η(K̂), α⟩ | any γ, α, η})
PLDI '21, June 20–25, 2021, Virtual, UK — Di Wang, Jan Hoffmann, and Thomas Reps
γ ⊱ E ⇓ r   "the expression E evaluates to a real value r under the valuation γ"

(E-Var)
  γ(x) = r
  ─────────────
  γ ⊱ x ⇓ r

(E-Const)
  ─────────────
  γ ⊱ c ⇓ c

(E-Add)
  γ ⊱ E₁ ⇓ r₁    γ ⊱ E₂ ⇓ r₂    r = r₁ + r₂
  ─────────────────────────────────────────
  γ ⊱ E₁ + E₂ ⇓ r

(E-Mul)
  γ ⊱ E₁ ⇓ r₁    γ ⊱ E₂ ⇓ r₂    r = r₁ · r₂
  ─────────────────────────────────────────
  γ ⊱ E₁ × E₂ ⇓ r

γ ⊱ L ⇓ b   "the condition L evaluates to a Boolean value b under the valuation γ"

(E-Top)
  ─────────────────
  γ ⊱ true ⇓ ⊀

(E-Neg)
  γ ⊱ L ⇓ b
  ─────────────────
  γ ⊱ not L ⇓ ÂŹb

(E-Conj)
  γ ⊱ L₁ ⇓ b₁    γ ⊱ L₂ ⇓ b₂
  ──────────────────────────────
  γ ⊱ L₁ and L₂ ⇓ b₁ ∧ b₂

(E-Le)
  γ ⊱ E₁ ⇓ r₁    γ ⊱ E₂ ⇓ r₂
  ──────────────────────────────
  γ ⊱ E₁ ≀ E₂ ⇓ [r₁ ≀ r₂]

⟚γ, S, K, α⟩ ↩→ Ό   "the configuration ⟚γ, S, K, α⟩ steps to a probability distribution Ό on configurations ⟚γ′, S′, K′, α′⟩"

(E-Skip-Stop)
  ────────────────────────────────────────────────
  ⟚γ, skip, Kstop, α⟩ ↩→ ÎŽ(⟚γ, skip, Kstop, α⟩)

(E-Skip-Loop)
  γ ⊱ L ⇓ b
  ─────────────────────────────────────────────────────────────────────────────────────────
  ⟚γ, skip, Kloop L S K, α⟩ ↩→ [b] · ÎŽ(⟚γ, S, Kloop L S K, α⟩) + [ÂŹb] · ÎŽ(⟚γ, skip, K, α⟩)

(E-Skip-Seq)
  ────────────────────────────────────────────
  ⟚γ, skip, Kseq S K, α⟩ ↩→ ÎŽ(⟚γ, S, K, α⟩)

(E-Tick)
  ────────────────────────────────────────────────
  ⟚γ, tick(c), K, α⟩ ↩→ ÎŽ(⟚γ, skip, K, α + c⟩)

(E-Assign)
  γ ⊱ E ⇓ r
  ────────────────────────────────────────────────
  ⟚γ, x ≔ E, K, α⟩ ↩→ ÎŽ(⟚γ[x ↩ r], skip, K, α⟩)

(E-Sample)
  ─────────────────────────────────────────────────────────────
  ⟚γ, x ∌ D, K, α⟩ ↩→ ÎŒ_D ≫= λr. ÎŽ(⟚γ[x ↩ r], skip, K, α⟩)

(E-Call)
  ────────────────────────────────────────────
  ⟚γ, call f, K, α⟩ ↩→ ÎŽ(⟚γ, 𝒟(f), K, α⟩)

(E-Prob)
  ────────────────────────────────────────────────────────────────────────────────────────────
  ⟚γ, if prob(p) then S₁ else S₂ fi, K, α⟩ ↩→ p · ÎŽ(⟚γ, S₁, K, α⟩) + (1 − p) · ÎŽ(⟚γ, S₂, K, α⟩)

(E-Cond)
  γ ⊱ L ⇓ b
  ────────────────────────────────────────────────────────────────────────────────────────────
  ⟚γ, if L then S₁ else S₂ fi, K, α⟩ ↩→ [b] · ÎŽ(⟚γ, S₁, K, α⟩) + [ÂŹb] · ÎŽ(⟚γ, S₂, K, α⟩)

(E-Loop)
  ──────────────────────────────────────────────────────────────
  ⟚γ, while L do S od, K, α⟩ ↩→ ÎŽ(⟚γ, skip, Kloop L S K, α⟩)

(E-Seq)
  ──────────────────────────────────────────────────
  ⟚γ, S₁; S₂, K, α⟩ ↩→ ÎŽ(⟚γ, S₁, Kseq S₂ K, α⟩)

Figure 13. Rules of the operational semantics of Appl.
and that S̃, K̃ range over the countable families of statement and continuation skeletons. Thus it suffices to prove that every set in the union, which we denote by O(A, B) ∩ 𝒞(S̃, K̃) later in the proof, is measurable. Note that 𝒞(S̃, K̃) itself is indeed measurable. Further, the skeletons S̃ and K̃ determine the evaluation rule for all concretized configurations. Thus we can proceed by a case analysis on the evaluation rules.

To aid the case analysis, we define a deterministic evaluation relation ↩→_det by getting rid of the ÎŽ(·) notations in the rules in Fig. 13, except for the probabilistic rules (E-Sample) and (E-Prob). Obviously, ↩→_det can be interpreted as a measurable function on configurations.

‱ If the evaluation rule is deterministic, then we have

O(A, B) ∩ 𝒞(S̃, K̃)
= {σ | σ ↩→ ÎŒ, ÎŒ(A) ∈ B} ∩ 𝒞(S̃, K̃)
= {σ | σ ↩→_det σ′, [σ′ ∈ A] ∈ B} ∩ 𝒞(S̃, K̃)
= 𝒞(S̃, K̃)                          if {0, 1} ⊆ B
= ↩→_det⁻Âč(A) ∩ 𝒞(S̃, K̃)            if 1 ∈ B and 0 ∉ B
= ↩→_det⁻Âč(A^c) ∩ 𝒞(S̃, K̃)          if 0 ∈ B and 1 ∉ B
= ∅                                   if {0, 1} ∩ B = ∅.

The sets in all of the cases are measurable, and so is the set O(A, B) ∩ 𝒞(S̃, K̃).
‱ (E-Prob): Consider B of the form (−∞, t] with t ∈ ℝ. If t ≄ 1, then O(A, B) = Σ. Otherwise, let us assume t < 1. Let S̃ = if prob(◻ℓ) then S̃₁ else S̃₂ fi. Then we have

O(A, B) ∩ 𝒞(S̃, K̃)
= {σ | σ ↩→ ÎŒ, ÎŒ(A) ∈ B} ∩ 𝒞(S̃, K̃)
= {σ | σ ↩→ p · ÎŽ(σ₁) + (1 − p) · ÎŽ(σ₂), p · [σ₁ ∈ A] + (1 − p) · [σ₂ ∈ A] ∈ B} ∩ 𝒞(S̃, K̃)
= 𝒞(S̃, K̃) ∩ {⟚γ, if prob(p) then S₁ else S₂ fi, K, α⟩ |
      p · [⟚γ, S₁, K, α⟩ ∈ A] + (1 − p) · [⟚γ, S₂, K, α⟩ ∈ A] ≀ t}
= 𝒞(S̃, K̃) ∩ {⟚γ, if prob(p) then S₁ else S₂ fi, K, α⟩ |
      (⟚γ, S₁, K, α⟩ ∈ A ∧ ⟚γ, S₂, K, α⟩ ∉ A ∧ p ≀ t) √
      (⟚γ, S₂, K, α⟩ ∈ A ∧ ⟚γ, S₁, K, α⟩ ∉ A ∧ 1 − p ≀ t)}.

The set above is measurable because A and A^c are measurable, and because {p ≀ t} and {p ≄ 1 − t} are measurable in ℝ.
‱ (E-Sample): Consider B of the form (−∞, t] with t ∈ ℝ. Similar to the previous case, we assume that t < 1. Without loss of generality, let S̃ = x ∌ uniform(◻ℓ_a, ◻ℓ_b). Then we have

O(A, B) ∩ 𝒞(S̃, K̃)
= {σ | σ ↩→ ÎŒ, ÎŒ(A) ∈ B} ∩ 𝒞(S̃, K̃)
= {σ | σ ↩→ ÎŒ_D ≫= Îș_σ, ∫ Îș_σ(r)(A) ÎŒ_D(dr) ≀ t} ∩ 𝒞(S̃, K̃)
= 𝒞(S̃, K̃) ∩ {σ | σ ↩→ ÎŒ_D ≫= Îș_σ, ÎŒ_D({r | ⟚γ[x ↩ r], skip, K, α⟩ ∈ A}) ≀ t}
= 𝒞(S̃, K̃) ∩ {⟚γ, x ∌ uniform(a, b), K, α⟩ | a < b,
      ÎŒ_uniform(a,b)({r | ⟚γ[x ↩ r], skip, K, α⟩ ∈ A}) ≀ t},

where Îș_⟚γ,S,K,α⟩ ≔ λr. ÎŽ(⟚γ[x ↩ r], skip, K, α⟩). For fixed Îł, K, α, the set {r | ⟚γ[x ↩ r], skip, K, α⟩ ∈ A} is measurable in ℝ. For the distributions considered in this paper, there is a sub-probability kernel Îș_D : ℝ^{ar(D)} ⇝ ℝ. For example, Îș_uniform(a, b) is defined to be ÎŒ_uniform(a,b) if a < b, and 0 otherwise. Therefore, λ(a, b). Îș_uniform(a, b)({r | ⟚γ[x ↩ r], skip, K, α⟩ ∈ A}) is measurable, and its inverse image of (−∞, t] is a measurable set of distribution parameters (a, b). Hence the set above is measurable. □
C Trace-Based Cost Semantics

To reason about moments of the accumulated cost, we follow the Markov-chain-based reasoning [24, 33] to develop a trace-based cost semantics for Appl. Let (Ω, 𝔉) ≔ ∏_{n=0}^{∞} (ÎŁ, O) be a measurable space of infinite traces over program configurations. Let {𝔉_n}_{n∈℀₊} be a filtration, i.e., an increasing sequence 𝔉₀ ⊆ 𝔉₁ ⊆ ⋯ ⊆ 𝔉 of sub-σ-algebras of 𝔉, generated by the coordinate maps X_n(ω) ≔ ω_n for n ∈ â„€â‚Š. Let ⟹𝒟, S_main⟩ be an Appl program, and let ÎŒ₀ ≔ ÎŽ(⟚λ_. 0, S_main, Kstop, 0⟩) be the initial distribution. Let ℙ be the probability measure on infinite traces induced by Prop. A.1 and Thm. B.3. Then (Ω, 𝔉, ℙ) forms a probability space over infinite traces of the program.
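To make the trace space concrete, the following sketch samples finite prefixes of traces — pairs of stopping time and accumulated tick cost — for a toy random walk in the spirit of the paper's examples. The program, constants, and helper names are our illustrative assumptions, not the paper's artifact.

```python
import random

# Toy Appl-like program: while x < d: { tick(1); x ~ x + uniform(-1, 2) }.
# Each sampled trace yields (T, A_T): the stopping time and the cost at it.

def sample_trace(d=10.0, max_steps=10_000, rng=random):
    """Run one trace; return (stopping time T, accumulated cost A_T)."""
    x, cost = 0.0, 0.0
    for n in range(max_steps):
        if x >= d:                      # termination configuration reached
            return n, cost
        cost += 1.0                     # tick(1)
        x += rng.uniform(-1.0, 2.0)     # x ~ x + uniform(-1, 2)
    return max_steps, cost              # truncated (unfinished) trace

random.seed(0)
samples = [sample_trace() for _ in range(10_000)]
mean_cost = sum(c for _, c in samples) / len(samples)
# The walk drifts right by 0.5 per step, so E[A_T] is roughly d / 0.5 here.
print(round(mean_cost, 1))
```

Averaging A_T over many sampled traces is a Monte Carlo estimate of the first moment that the semantics assigns to the accumulated cost under ℙ.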
D Trace-Based Reasoning on Expectations

Recall that we define the stopping time T : Ω → â„€â‚Š âˆȘ {∞} as

T(ω) ≔ inf{n ∈ â„€â‚Š | ω_n = ⟹_, skip, Kstop, _⟩},

and random variables {A_n}_{n∈℀₊}, {Ω_n}_{n∈℀₊}, {Y_n}_{n∈℀₊} as

A_n(ω) ≔ α_n,   Ω_n(ω) ≔ ϕ(ω_n),   Y_n(ω) ≔ A_n(ω) + Ω_n(ω),

where ω_n = ⟹_, _, _, α_n⟩ and ϕ : ÎŁ → ℝ is an expected-potential function for expected-cost bound analysis. Taking the stopping time into consideration, we define the stopped versions of these random variables as A_T(ω) ≔ A_{T(ω)}(ω), Ω_T(ω) ≔ Ω_{T(ω)}(ω), and Y_T(ω) ≔ Y_{T(ω)}(ω). Note that A_T, Ω_T, Y_T are not guaranteed to be well-defined everywhere.

Lemma D.1. If ℙ[T < ∞] = 1, i.e., the program terminates almost surely, then A_T is well-defined almost surely and ℙ[lim_{n→∞} A_n = A_T] = 1. Further, if {A_n}_{n∈℀₊} is pointwise non-decreasing, then lim_{n→∞} 𝔌[A_n] = 𝔌[A_T].

Proof. By the property of the operational semantics, for ω ∈ Ω such that T(ω) < ∞, we have A_n(ω) = A_T(ω) for all n ≄ T(ω). Then we have

ℙ[lim_{n→∞} A_n = A_T]
= ℙ({ω | lim_{n→∞} A_n(ω) = A_T(ω)})
≄ ℙ({ω | lim_{n→∞} A_n(ω) = A_T(ω) ∧ T(ω) < ∞})
= ℙ({ω | A_{T(ω)}(ω) = A_T(ω) ∧ T(ω) < ∞})
= ℙ({ω | T(ω) < ∞})
= 1.

Now let us assume that {A_n}_{n∈℀₊} is pointwise non-decreasing. By the property of the operational semantics, we know that A₀ = 0. Therefore, the A_n's are nonnegative random variables, and their expectations 𝔌[A_n] are well-defined. By Prop. A.3, we have lim_{n→∞} 𝔌[A_n] = 𝔌[lim_{n→∞} A_n]. We then conclude by the fact that lim_{n→∞} A_n = A_T, a.s., which we just proved. □
We reformulate the martingale property using the filtration {𝔉_n}_{n∈℀₊}.

Lemma D.2. For all n ∈ â„€â‚Š, it holds that

𝔌[Y_{n+1} | 𝔉_n] = Y_n, a.s.,

i.e., the expectation of Y_n conditioned on the execution history is an invariant of n ∈ â„€â‚Š.

Proof. We say that a sequence of random variables {X_n}_{n∈℀₊} is adapted to a filtration {𝔉_n}_{n∈℀₊} if for each n ∈ â„€â‚Š, X_n is 𝔉_n-measurable. Then {Ω_n}_{n∈℀₊} and {A_n}_{n∈℀₊} are adapted to the coordinate-generated filtration {𝔉_n}_{n∈℀₊}, as Ω_n(ω) and A_n(ω) depend only on ω_n. Then we have

𝔌[Y_{n+1} | 𝔉_n](ω)
= 𝔌[A_{n+1} + Ω_{n+1} | 𝔉_n](ω)
= 𝔌[(A_{n+1} − A_n) + Ω_{n+1} + A_n | 𝔉_n](ω)
= 𝔌[(A_{n+1} − A_n) + Ω_{n+1} | 𝔉_n](ω) + A_n(ω)
= 𝔌[(α_{n+1} − α_n) + ϕ(ω_{n+1}) | 𝔉_n](ω) + A_n(ω)
= 𝔌_{σâ€Č∌↩→(ω_n)}[(α′ − α_n) + ϕ(σ′)] + A_n(ω)
= ϕ(ω_n) + A_n(ω)
= Ω_n(ω) + A_n(ω)
= Y_n(ω), a.s.
Furthermore, we have the following corollary: 𝔌[Y_{n+1}] = 𝔌[𝔌[Y_{n+1} | 𝔉_n]] = 𝔌[Y_n] for each n ∈ â„€â‚Š, and thus 𝔌[Y_n] = 𝔌[Y₀] for all n ∈ â„€â‚Š. □
Now we can prove the soundness of the extended OST.

Theorem D.3. If 𝔌[|Y_n|] < ∞ for all n ∈ â„€â‚Š, then 𝔌[Y_T] exists and 𝔌[Y_T] = 𝔌[Y₀] in the following situation: there exist ℓ ∈ ℕ and C ≄ 0 such that 𝔌[T^ℓ] < ∞ and, for all n ∈ â„€â‚Š, |Y_n| ≀ C · (n + 1)^ℓ almost surely.

Proof. From 𝔌[T^ℓ] < ∞ with ℓ ≄ 1, we know that ℙ[T < ∞] = 1. Then, similar to the proof of Lem. D.1, we know that Y_T is well-defined almost surely and ℙ[lim_{n→∞} Y_n = Y_T] = 1. On the other hand, we have

|Y_n| = |Y_{min(T,n)}| ≀ C · (min(T, n) + 1)^ℓ ≀ C · (T + 1)^ℓ, a.s.

Recall that 𝔌[T^ℓ] < ∞. Then 𝔌[(T + 1)^ℓ] = 𝔌[T^ℓ + O(T^{ℓ−1})] < ∞. By Prop. A.4, with the function g set to λω. C · (T(ω) + 1)^ℓ, we know that lim_{n→∞} 𝔌[Y_n] = 𝔌[Y_T]. By Lem. D.2, we have 𝔌[Y_n] = 𝔌[Y₀] for all n ∈ â„€â‚Š; thus we conclude that 𝔌[Y₀] = 𝔌[Y_T]. □
E Trace-Based Reasoning on Moments

We start with the fundamental composition property of moment semirings.

Lemma (Lem. 3.2). For all u, v ∈ R, it holds that

⟹1, (u + v), (u + v)ÂČ, ⋯, (u + v)^m⟩ = ⟹1, u, uÂČ, ⋯, u^m⟩ ⊗ ⟹1, v, vÂČ, ⋯, v^m⟩,

where u^k abbreviates ∏_{i=1}^{k} u for k ∈ â„€â‚Š, u ∈ R.
Proof. Let RHS_k denote the k-th component of the right-hand side. Observe that

RHS_k = ∑_{i=0}^{k} C(k, i) × (u^i · v^{k−i}),

where C(k, i) is the binomial coefficient (with the convention C(k, −1) = C(k, k + 1) = 0). We prove by induction on k that (u + v)^k = RHS_k.

‱ k = 0: Then (u + v)⁰ = 1. On the other hand, we have RHS₀ = C(0, 0) × (u⁰ · v⁰) = 1 × (1 · 1) = 1.

‱ Suppose that (u + v)^k = RHS_k. Then

(u + v)^{k+1}
= (u + v) · (u + v)^k
= (u + v) · ∑_{i=0}^{k} C(k, i) × (u^i · v^{k−i})
= ∑_{i=0}^{k} C(k, i) × (u^{i+1} · v^{k−i}) + ∑_{i=0}^{k} C(k, i) × (u^i · v^{k−i+1})
= ∑_{i=1}^{k+1} C(k, i − 1) × (u^i · v^{k−i+1}) + ∑_{i=0}^{k} C(k, i) × (u^i · v^{k−i+1})
= ∑_{i=0}^{k+1} (C(k, i − 1) + C(k, i)) × (u^i · v^{k−i+1})
= ∑_{i=0}^{k+1} C(k + 1, i) × (u^i · v^{k+1−i})
= RHS_{k+1}.  □
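The composition property can also be checked numerically. Below is a small Python sketch (our own illustration) of the ⊗ convolution over the reals, verifying that composing the power vectors of u and v yields the power vector of u + v, as Lem. 3.2 states.

```python
from math import comb

# The k-th component of u⃗ ⊗ v⃗ is ∑_{i=0}^{k} C(k, i) · u_i · v_{k-i}.
def otimes(us, vs):
    m = len(us) - 1
    return [sum(comb(k, i) * us[i] * vs[k - i] for i in range(k + 1))
            for k in range(m + 1)]

def powers(u, m):
    """The moment vector ⟹1, u, uÂČ, ..., u^m⟩."""
    return [u ** k for k in range(m + 1)]

m, u, v = 4, 1.5, -0.25
lhs = powers(u + v, m)                     # ⟹1, (u+v), ..., (u+v)^m⟩
rhs = otimes(powers(u, m), powers(v, m))   # power vector of u ⊗ power vector of v
assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
print(lhs)
```

The semiring unit ⟹1, 0, ⋯, 0⟩ behaves as expected under this convolution, which is the algebraic fact the analysis exploits when costs accumulate across sequential composition.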
We also show that ⊕ and ⊗ are monotone if the operations of the underlying semiring are monotone.

Lemma E.1. Let R = (|R|, ⊑, +, ·, 0, 1) be a partially ordered semiring. If + and · are monotone with respect to ⊑, then ⊕ and ⊗ in the moment semiring M_R^(m) are also monotone with respect to ⊑ (extended componentwise).

Proof. It is straightforward to show that ⊕ is monotone. For the rest of the proof, without loss of generality, we show that ⟹u_k⟩_{0≀k≀m} ⊗ ⟹v_k⟩_{0≀k≀m} ⊑ ⟹u_k⟩_{0≀k≀m} ⊗ ⟹w_k⟩_{0≀k≀m} if ⟹v_k⟩_{0≀k≀m} ⊑ ⟹w_k⟩_{0≀k≀m}. By the definition of ⊑, we know that v_k ⊑ w_k for all k = 0, 1, ⋯, m. Then for each k, we have

(⟹u_k⟩_{0≀k≀m} ⊗ ⟹v_k⟩_{0≀k≀m})_k
= ∑_{i=0}^{k} C(k, i) × (u_i · v_{k−i})
⊑ ∑_{i=0}^{k} C(k, i) × (u_i · w_{k−i})
= (⟹u_k⟩_{0≀k≀m} ⊗ ⟹w_k⟩_{0≀k≀m})_k.

Then we conclude by the definition of ⊑. □
As we allow potential functions to be interval-valued, we show that the interval semiring I satisfies the monotonicity required in Lem. E.1.

Lemma E.2. The operations +_I and ·_I are monotone with respect to ≀_I.

Proof. It is straightforward to show that +_I is monotone. For the rest of the proof, it suffices to show that [a, b] ·_I [c, d] ≀_I [a′, b′] ·_I [c, d] if [a, b] ≀_I [a′, b′], i.e., [a, b] ⊆ [a′, b′], i.e., a ≄ a′ and b ≀ b′.

We claim that the lower endpoints satisfy

min{ac, ad, bc, bd} ≄ min{a′c, a′d, b′c, b′d}.

‱ If 0 ≀ c ≀ d: Then ac ≀ bc, ad ≀ bd, a′c ≀ b′c, a′d ≀ b′d, so it suffices to show that min{ac, ad} ≄ min{a′c, a′d}. Because d ≄ c ≄ 0 and a ≄ a′, we conclude that ac ≄ a′c and ad ≄ a′d.

‱ If c < 0 ≀ d: Then bc ≀ ac, ad ≀ bd, b′c ≀ a′c, a′d ≀ b′d, so it suffices to show that min{bc, ad} ≄ min{b′c, a′d}. Because d ≄ 0 > c, a ≄ a′, and b ≀ b′, we conclude that bc ≄ b′c and ad ≄ a′d.

‱ If c ≀ d < 0: Then bc ≀ ac, bd ≀ ad, b′c ≀ a′c, b′d ≀ a′d, so it suffices to show that min{bc, bd} ≄ min{b′c, b′d}. Because 0 > d ≄ c and b ≀ b′, we conclude that bc ≄ b′c and bd ≄ b′d.

In a similar way, we can also prove that max{ac, ad, bc, bd} ≀ max{a′c, a′d, b′c, b′d}. Therefore, ·_I is monotone. □
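A hedged sketch of the interval operations and a randomized check of the monotonicity claim of Lem. E.2; the encoding of intervals as Python pairs is our own illustration.

```python
import random
from itertools import product

def add_i(x, y):
    """+_I: componentwise addition of interval endpoints."""
    return (x[0] + y[0], x[1] + y[1])

def mul_i(x, y):
    """·_I: min and max over the four endpoint products."""
    ps = [a * c for a, c in product(x, y)]
    return (min(ps), max(ps))

def contains(big, small):
    """small ≀_I big in the proof's order, i.e., small ⊆ big."""
    return big[0] <= small[0] and small[1] <= big[1]

# Randomized check: enlarging one operand (in ⊆) can only enlarge the product.
random.seed(1)
for _ in range(1000):
    a, b = sorted(random.uniform(-5, 5) for _ in range(2))
    c, d = sorted(random.uniform(-5, 5) for _ in range(2))
    wide = (a - random.random(), b + random.random())   # [a, b] ⊆ wide
    assert contains(mul_i(wide, (c, d)), mul_i((a, b), (c, d)))
print("·_I monotone on 1000 random cases")
```

The sign-based case split in the proof above is exactly what determines which two of the four endpoint products realize the min and max here.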
Lemma E.3. Suppose that {[a_n, b_n]}_{n∈℀₊} is a monotone sequence in I, i.e., [a₀, b₀] ≀_I [a₁, b₁] ≀_I ⋯ ≀_I [a_n, b_n] ≀_I ⋯, and that [a_n, b_n] ≀_I [a, b] for all n ∈ â„€â‚Š. Let [c, d] = lim_{n→∞} [a_n, b_n] (the limit is well-defined by the monotone convergence theorem). Then [c, d] ≀_I [a, b].

Proof. By the definition of ≀_I, we know that {a_n}_{n∈℀₊} is non-increasing and {b_n}_{n∈℀₊} is non-decreasing. Because a_n ≄ a for all n ∈ â„€â‚Š, we conclude that c = lim_{n→∞} a_n ≄ a. Because b_n ≀ b for all n ∈ â„€â‚Š, we conclude that d = lim_{n→∞} b_n ≀ b. Thus we conclude that [c, d] ≀_I [a, b]. □
Recall that we extend the notions of A_n, Ω_n, Y_n with intervals for higher moments as follows:

A_n(ω) ≔ ⟹[α_n^k, α_n^k]⟩_{0≀k≀m}, where ω_n = ⟹_, _, _, α_n⟩,
Ω_n(ω) ≔ ϕ(ω_n),
Y_n(ω) ≔ A_n(ω) ⊗ Ω_n(ω).

Note that in the definition of Y_n, we use ⊗ to compose the powers of the accumulated cost at step n with the potential function that stands for the moments of the accumulated cost of the rest of the computation.

We now extend some of the previous results on first moments to higher moments with intervals.

Lemma E.4. Let X, Y : Ω → M_I^(m) be integrable. Then 𝔌[X ⊕ Y] = 𝔌[X] ⊕ 𝔌[Y].

Proof. Appeal to the linearity of expectations and the fact that ⊕ is the pointwise extension of +_I, which is itself the pointwise extension of numeric addition. □
Lemma E.5. If X : Ω → M_I^(m) is 𝒱-measurable and bounded almost surely, and every component of X is a point interval, i.e., X(ω) = ⟹[x_k(ω), x_k(ω)]⟩_{0≀k≀m} for all ω ∈ Ω, then 𝔌[X ⊗ Y | 𝒱] = X ⊗ 𝔌[Y | 𝒱] almost surely.

Proof. Fix ω ∈ Ω. Let Y(ω) = ⟹[y_k(ω), z_k(ω)]⟩_{0≀k≀m}. Then we have (all sums below are taken with respect to +_I)

𝔌[X ⊗ Y | 𝒱](ω)
= 𝔌[⟹[x_k, x_k]⟩_{0≀k≀m} ⊗ ⟹[y_k, z_k]⟩_{0≀k≀m} | 𝒱](ω)
= 𝔌[⟹∑_{i=0}^{k} C(k, i) ×_I ([x_i, x_i] ·_I [y_{k−i}, z_{k−i}])⟩_{0≀k≀m} | 𝒱](ω)
= ⟹𝔌[∑_{i=0}^{k} C(k, i) ×_I ([x_i, x_i] ·_I [y_{k−i}, z_{k−i}]) | 𝒱](ω)⟩_{0≀k≀m}
= ⟹∑_{i=0}^{k} C(k, i) ×_I 𝔌[[x_i, x_i] ·_I [y_{k−i}, z_{k−i}] | 𝒱](ω)⟩_{0≀k≀m}.

On the other hand, we have

X(ω) ⊗ 𝔌[Y | 𝒱](ω)
= ⟹X₀(ω), ⋯, X_m(ω)⟩ ⊗ 𝔌[⟹Y₀, ⋯, Y_m⟩ | 𝒱](ω)
= ⟹X₀(ω), ⋯, X_m(ω)⟩ ⊗ ⟚𝔌[Y₀ | 𝒱](ω), ⋯, 𝔌[Y_m | 𝒱](ω)⟩
= ⟹∑_{i=0}^{k} C(k, i) ×_I (X_i(ω) ·_I 𝔌[Y_{k−i} | 𝒱](ω))⟩_{0≀k≀m}
= ⟹∑_{i=0}^{k} C(k, i) ×_I ([x_i(ω), x_i(ω)] ·_I 𝔌[[y_{k−i}, z_{k−i}] | 𝒱](ω))⟩_{0≀k≀m}.

Thus, it suffices to show that for each i, 𝔌[[x_i, x_i] ·_I [y_{k−i}, z_{k−i}] | 𝒱] = [x_i, x_i] ·_I 𝔌[[y_{k−i}, z_{k−i}] | 𝒱] almost surely.

For ω such that x_i(ω) ≄ 0:

𝔌[[x_i, x_i] ·_I [y_{k−i}, z_{k−i}] | 𝒱](ω)
= 𝔌[[x_i y_{k−i}, x_i z_{k−i}] | 𝒱](ω)
= [𝔌[x_i y_{k−i} | 𝒱](ω), 𝔌[x_i z_{k−i} | 𝒱](ω)]
= [x_i(ω) · 𝔌[y_{k−i} | 𝒱](ω), x_i(ω) · 𝔌[z_{k−i} | 𝒱](ω)], a.s.,
= [x_i(ω), x_i(ω)] ·_I 𝔌[[y_{k−i}, z_{k−i}] | 𝒱](ω).

For ω such that x_i(ω) < 0:

𝔌[[x_i, x_i] ·_I [y_{k−i}, z_{k−i}] | 𝒱](ω)
= 𝔌[[x_i z_{k−i}, x_i y_{k−i}] | 𝒱](ω)
= [𝔌[x_i z_{k−i} | 𝒱](ω), 𝔌[x_i y_{k−i} | 𝒱](ω)]
= [x_i(ω) · 𝔌[z_{k−i} | 𝒱](ω), x_i(ω) · 𝔌[y_{k−i} | 𝒱](ω)], a.s.,
= [x_i(ω), x_i(ω)] ·_I 𝔌[[y_{k−i}, z_{k−i}] | 𝒱](ω).  □
Lemma (Lem. 4.3). For all n ∈ â„€â‚Š, it holds that 𝔌[Y_{n+1} | 𝔉_n] ⊑ Y_n, a.s.

Proof. Similar to the proof of Lem. D.2, we know that {A_n}_{n∈℀₊} and {Ω_n}_{n∈℀₊} are adapted to {𝔉_n}_{n∈℀₊}. By the property of the operational semantics, we know that α_n(ω) ≀ C · n almost surely for some C ≄ 0. Then, using Lem. E.5, we have

𝔌[Y_{n+1} | 𝔉_n](ω)
= 𝔌[A_{n+1} ⊗ Ω_{n+1} | 𝔉_n](ω)
= 𝔌[A_n ⊗ ⟹[(α_{n+1} − α_n)^k, (α_{n+1} − α_n)^k]⟩_{0≀k≀m} ⊗ Ω_{n+1} | 𝔉_n](ω)
= A_n(ω) ⊗ 𝔌[⟹[(α_{n+1} − α_n)^k, (α_{n+1} − α_n)^k]⟩_{0≀k≀m} ⊗ Ω_{n+1} | 𝔉_n](ω), a.s.,
= A_n(ω) ⊗ 𝔌[⟹[(α_{n+1} − α_n)^k, (α_{n+1} − α_n)^k]⟩_{0≀k≀m} ⊗ ϕ(ω_{n+1}) | 𝔉_n](ω).

Recall the property of the expected-potential function ϕ in Defn. 4.2. Then, by Lem. E.1 together with Lem. E.2, we have

𝔌[Y_{n+1} | 𝔉_n](ω) ⊑ A_n(ω) ⊗ ϕ(ω_n) = A_n(ω) ⊗ Ω_n(ω) = Y_n(ω), a.s.

As a corollary, we have 𝔌[Y_n] ⊑ 𝔌[Y₀] for all n ∈ â„€â‚Š. □
Now we prove the following extension of the OST to deal with interval-valued potential functions. Define

‖⟹[a_k, b_k]⟩_{0≀k≀m}‖_∞ ≔ max_{0≀k≀m} max{|a_k|, |b_k|}.

Theorem E.6. If 𝔌[‖Y_n‖_∞] < ∞ for all n ∈ â„€â‚Š, then 𝔌[Y_T] exists and 𝔌[Y_T] ⊑ 𝔌[Y₀] in the following situation: there exist ℓ ∈ ℕ and C ≄ 0 such that 𝔌[T^ℓ] < ∞ and, for all n ∈ â„€â‚Š, ‖Y_n‖_∞ ≀ C · (n + 1)^ℓ almost surely.
Proof. From 𝔌[T^ℓ] < ∞ with ℓ ≄ 1, we know that ℙ[T < ∞] = 1. Then, similar to the proof of Lem. D.1, we know that Y_T is well-defined almost surely and ℙ[lim_{n→∞} Y_n = Y_T] = 1. On the other hand, Y_n(ω) can be treated as a vector of real numbers. Let Z_n : Ω → ℝ be a real-valued component of Y_n. Because 𝔌[‖Y_n‖_∞] < ∞ and ‖Y_n‖_∞ ≀ C · (n + 1)^ℓ almost surely, we know that 𝔌[|Z_n|] ≀ 𝔌[‖Y_n‖_∞] < ∞ and |Z_n| ≀ ‖Y_n‖_∞ ≀ C · (n + 1)^ℓ almost surely. Therefore,

|Z_n| = |Z_{min(T,n)}| ≀ C · (min(T, n) + 1)^ℓ ≀ C · (T + 1)^ℓ, a.s.

Recall that 𝔌[T^ℓ] < ∞. Then 𝔌[(T + 1)^ℓ] = 𝔌[T^ℓ + O(T^{ℓ−1})] < ∞. By Prop. A.4, with the function g set to λω. C · (T(ω) + 1)^ℓ, we know that lim_{n→∞} 𝔌[Z_n] = 𝔌[Z_T]. Because Z_n is an arbitrary real-valued component of Y_n, we know that lim_{n→∞} 𝔌[Y_n] = 𝔌[Y_T]. By Lem. 4.3, we know that 𝔌[Y_n] ⊑ 𝔌[Y₀] for all n ∈ â„€â‚Š. By Lem. E.3, we conclude that lim_{n→∞} 𝔌[Y_n] ⊑ 𝔌[Y₀], i.e., 𝔌[Y_T] ⊑ 𝔌[Y₀]. □
Discussion. The classic OST from probability theory has been used to reason about probabilistic programs:

Proposition E.7. Let {X_n}_{n∈℀₊} be a supermartingale (resp., submartingale). If 𝔌[|X_n|] < ∞ for all n ∈ â„€â‚Š, then 𝔌[X_T] exists and 𝔌[X_T] ≀ 𝔌[X₀] (resp., 𝔌[X_T] ≄ 𝔌[X₀]) in each of the following situations:
(a) T is almost surely bounded;
(b) 𝔌[T] < ∞ and for some C ≄ 0, 𝔌[|X_{n+1} − X_n| | 𝔉_n] ≀ C almost surely, for all n ∈ â„€â‚Š;
(c) for some C ≄ 0, |X_n| ≀ C almost surely, for all n ∈ â„€â‚Š.

Note that from item (a) to item (c) in Prop. E.7, the constraint on the stopping time T becomes weaker, while the constraint on {X_n}_{n∈℀₊} becomes stronger. However, the classic OST is not suitable for higher-moment analysis: item (a) cannot handle almost-surely terminating programs whose stopping times are unbounded; item (b) relies on bounded differences, which is too restrictive, because the change in the second moments can be non-constant even if the change in the first moments is bounded by a universal constant; and item (c) essentially requires that the accumulated cost be bounded by a universal constant, which is also not practical.

Counterexample E.8. Recall the random-walk program in Fig. 3. Below, we present the pre- and post-conditions of the random-sampling statement:
{ ⟹1, 2(d−x)+4, 4(d−x)ÂČ + 22(d−x) + 28⟩ }
t ∌ uniform(−1, 2);
{ ⟹1, 2(d−x−t)+5, 4(d−x−t)ÂČ + 26(d−x−t) + 37⟩ }

Suppose that we want to apply the classic OST to ensure that the second-moment component is sound. Let us use Ω′_n and Y′_n to denote the over-approximations of the second moment in Ω_n and Y_n, respectively. It has been shown that for such a random walk, the stopping time T is not bounded, but 𝔌[T] < ∞ [26, 31]. Thus, the applicable OST criterion would be Prop. E.7(b), which requires some C > 0 such that |Y′_{n+1} − Y′_n| ≀ C for all n ∈ â„€â‚Š.
Consider the case where the n-th evaluation step of a trace ω executes the random-sampling statement. Let α_n ≔ A_n(ω). By the definition Y_n(ω) ≔ ⟹[α_n^k, α_n^k]⟩_{0≀k≀m} ⊗ Ω_n(ω), we haveÂČ

Y′_n(ω) = α_nÂČ + 2 · α_n · (2(d−x)+4) + (4(d−x)ÂČ + 22(d−x) + 28),
Y′_{n+1}(ω) = α_nÂČ + 2 · α_n · (2(d−x−t)+5) + (4(d−x−t)ÂČ + 26(d−x−t) + 37),

and thus

Y′_{n+1}(ω) − Y′_n(ω) = 2 · α_n · (−2t + 1) + (−4t(2d − 2x − t) + 4(d−x) − 26t + 9),

where t ∈ [−1, 2] and d is a constant, but α_n and x are in general unbounded; thus |Y′_{n+1} − Y′_n| cannot be bounded by a constant.
F Soundness of Bound Inference

Fig. 14 presents the inference rules of the derivation system. Note that we prove soundness for a more declarative system in which the two rules for function calls are merged into one. The validity of a context Δ of function specifications is then established by the validity of all specifications in Δ, denoted by ⊹ Δ.

In addition to the rules for the judgments for statements and function specifications, we also include rules for continuations and configurations, which are used in the operational semantics. A continuation K is valid with a pre-condition {Γ; Q}, written Δ ⊱ {Γ; Q} K, if ϕ_Q describes a bound on the moments of the accumulated cost of the computation represented by K, on the condition that the valuation before K satisfies Γ. Validity for configurations, written Δ ⊱ {Γ; Q} ⟚γ, S, K, α⟩, is established by the validity of the statement S and the continuation K, together with the requirement that the valuation γ satisfies the pre-condition Γ. Here, ϕ_Q also describes an interval bound on the moments of the accumulated cost of the computation that continues from the configuration ⟚γ, S, K, α⟩.

The rules (Q-Weaken) and (QK-Weaken) are used to strengthen the pre-condition and relax the post-condition. In terms of the bounds on the moments of the accumulated cost, if the triple {·; Q} S {·; Q′} is valid, then we can safely widen the intervals in the pre-condition Q and narrow the intervals in the post-condition Q′. To handle the judgment Γ ⊹ Q ⊒ Q′ in constraint generation, we adapt the idea of rewrite functions [9, 31]. Intuitively, to ensure that [L₁, U₁] ⊒_PI [L₂, U₂], i.e., L₁ ≀ L₂ and U₂ ≀ U₁, under the logical context Γ, we generate constraints indicating that there exist two polynomials T₁, T₂ that are always nonnegative under Γ, such that L₂ = L₁ + T₁ and U₁ = U₂ + T₂. In our implementation, Γ is a set of linear constraints over program variables of the

ÂČ When we use a program variable x in Y_n(ω), it represents γ(x), where ω_n = ⟚γ, _, _, _⟩.
(Valid-Ctx)
  ∀f ∈ dom(Δ): ∀(Γ; Q, Γ′; Q′) ∈ Δ(f): Δ ⊱ {Γ; Q} 𝒟(f) {Γ′; Q′}
  ───────────────────────────────────────────────────────────────
  ⊹ Δ

(Q-Skip)
  ──────────────────────────────
  Δ ⊱ {Γ; Q} skip {Γ; Q}

(Q-Tick)
  Q = ⟹[c^k, c^k]⟩_{0≀k≀m} ⊗ Q′
  ────────────────────────────────
  Δ ⊱ {Γ; Q} tick(c) {Γ; Q′}

(Q-Assign)
  Γ = [E/x]Γ′    Q = [E/x]Q′
  ────────────────────────────────
  Δ ⊱ {Γ; Q} x ≔ E {Γ′; Q′}

(Q-Sample)
  Γ = ∀x ∈ supp(ÎŒ_D): Γ′    Q = 𝔌_{x∌Ό_D}[Q′]
  ──────────────────────────────────────────────
  Δ ⊱ {Γ; Q} x ∌ D {Γ′; Q′}

(Q-Loop)
  Δ ⊱ {Γ ∧ L; Q} S₁ {Γ; Q}
  ──────────────────────────────────────────────
  Δ ⊱ {Γ; Q} while L do S₁ od {Γ ∧ ÂŹL; Q}

(Q-Cond)
  Δ ⊱ {Γ ∧ L; Q} S₁ {Γ′; Q′}    Δ ⊱ {Γ ∧ ÂŹL; Q} S₂ {Γ′; Q′}
  ────────────────────────────────────────────────────────────
  Δ ⊱ {Γ; Q} if L then S₁ else S₂ fi {Γ′; Q′}

(Q-Seq)
  Δ ⊱ {Γ; Q} S₁ {Γ′; Q′}    Δ ⊱ {Γ′; Q′} S₂ {Γ″; Q″}
  ─────────────────────────────────────────────────────
  Δ ⊱ {Γ; Q} S₁; S₂ {Γ″; Q″}

(Q-Call)
  ∀i: (Γ; Q_i, Γ′; Q′_i) ∈ Δ(f)
  ──────────────────────────────────────────────
  Δ ⊱ {Γ; ⊕_i Q_i} call f {Γ′; ⊕_i Q′_i}

(Q-Prob)
  Δ ⊱ {Γ; Q₁} S₁ {Γ′; Q′}    Δ ⊱ {Γ; Q₂} S₂ {Γ′; Q′}
  Q = P ⊕ R    P = ⟹[p, p], [0, 0], ⋯, [0, 0]⟩ ⊗ Q₁    R = ⟹[1−p, 1−p], [0, 0], ⋯, [0, 0]⟩ ⊗ Q₂
  ──────────────────────────────────────────────────────────────────────────────────────────────
  Δ ⊱ {Γ; Q} if prob(p) then S₁ else S₂ fi {Γ′; Q′}

(Q-Weaken)
  Δ ⊱ {Γ₀; Q₀} S {Γ′₀; Q′₀}    Γ ⊹ Γ₀    Γ′₀ ⊹ Γ′    Γ ⊹ Q ⊒ Q₀    Γ′₀ ⊹ Q′₀ ⊒ Q′
  ──────────────────────────────────────────────────────────────────────────────────
  Δ ⊱ {Γ; Q} S {Γ′; Q′}

(Valid-Cfg)
  γ ⊹ Γ    Δ ⊱ {Γ; Q} S {Γ′; Q′}    Δ ⊱ {Γ′; Q′} K
  ───────────────────────────────────────────────────
  Δ ⊱ {Γ; Q} ⟚γ, S, K, α⟩

(QK-Stop)
  ────────────────────────
  Δ ⊱ {Γ; Q} Kstop

(QK-Loop)
  Δ ⊱ {Γ ∧ L; Q} S {Γ; Q}    Δ ⊱ {Γ ∧ ÂŹL; Q} K
  ───────────────────────────────────────────────
  Δ ⊱ {Γ; Q} Kloop L S K

(QK-Seq)
  Δ ⊱ {Γ; Q} S {Γ′; Q′}    Δ ⊱ {Γ′; Q′} K
  ──────────────────────────────────────────
  Δ ⊱ {Γ; Q} Kseq S K

(QK-Weaken)
  Δ ⊱ {Γ′; Q′} K    Γ ⊹ Γ′    Γ ⊹ Q ⊒ Q′
  ──────────────────────────────────────────
  Δ ⊱ {Γ; Q} K

Figure 14. Inference rules of the derivation system.
form E ≄ 0; then we can represent T₁ and T₂ by conical combinations (i.e., linear combinations with nonnegative scalars) of the expressions E in Γ.

To reduce the soundness proof to the extended OST for interval-valued bounds, we construct an annotated transition kernel from the validity judgments ⊹ Δ and Δ ⊱ {Γ; Q} S_main {Γ′; Q′}.

Lemma F.1. Suppose that ⊹ Δ and Δ ⊱ {Γ; Q} S_main {Γ′; Q′}. An annotated program configuration has the form ⟹Γ, Q, Îł, S, K, α⟩, where Δ ⊱ {Γ; Q} ⟚γ, S, K, α⟩. Then there exists a probability kernel Îș over annotated program configurations such that for all σ = ⟹Γ, Q, Îł, S, K, α⟩ ∈ dom(Îș), it holds that:

(i) Îș is the same as the evaluation relation ↩→ if the annotations are omitted, i.e.,

Îș(σ) ≫= λ⟚_, _, γ′, S′, K′, α′⟩. ÎŽ(⟚γ′, S′, K′, α′⟩) = ↩→(⟚γ, S, K, α⟩),

and
(ii) ϕ_Q(γ) ⊒ 𝔌_{σâ€Č∌Îș(σ)}[⟹[(α′ − α)^k, (α′ − α)^k]⟩_{0≀k≀m} ⊗ ϕ_{Q′}(γ′)], where σ′ = ⟹_, Q′, γ′, _, _, α′⟩.

Before proving the soundness, we show that the derivation system for bound inference admits a relaxation rule.
Lemma F.2. Suppose that ⊹ Δ. If Δ ⊱ {Γ; Q₁} S {Γ′; Q′₁} and Δ ⊱ {Γ; Q₂} S {Γ′; Q′₂}, then the judgment Δ ⊱ {Γ; Q₁ ⊕ Q₂} S {Γ′; Q′₁ ⊕ Q′₂} is derivable.

Proof. By nested induction on the derivation of the judgment Δ ⊱ {Γ; Q₁} S {Γ′; Q′₁}, followed by inversion on Δ ⊱ {Γ; Q₂} S {Γ′; Q′₂}. We assume that the two derivations have the same shape and the same logical contexts; in practice, we can ensure this by explicitly inserting weakening at the same positions, e.g., at all branching points, and by doing a first pass to obtain the logical contexts.
‱ (Q-Skip): The conclusion is Δ ⊱ {Γ; Q₁} skip {Γ; Q₁}. By inversion, we have Q₂ = Q′₂. By (Q-Skip), we immediately have Δ ⊱ {Γ; Q₁ ⊕ Q₂} skip {Γ; Q₁ ⊕ Q₂}.

‱ (Q-Tick): The premise is Q₁ = ⟹[c^k, c^k]⟩_{0≀k≀m} ⊗ Q′₁. By inversion, we have Q₂ = ⟹[c^k, c^k]⟩_{0≀k≀m} ⊗ Q′₂. By distributivity, we have ⟹[c^k, c^k]⟩_{0≀k≀m} ⊗ (Q′₁ ⊕ Q′₂) = (⟹[c^k, c^k]⟩_{0≀k≀m} ⊗ Q′₁) ⊕ (⟹[c^k, c^k]⟩_{0≀k≀m} ⊗ Q′₂) = Q₁ ⊕ Q₂. Then we conclude by (Q-Tick).
‱ (Q-Assign): The premises are Γ = [E/x]Γ′ and Q₁ = [E/x]Q′₁. By inversion, we have Q₂ = [E/x]Q′₂. Then we know that [E/x](Q′₁ ⊕ Q′₂) = [E/x]Q′₁ ⊕ [E/x]Q′₂ = Q₁ ⊕ Q₂. Then we conclude by (Q-Assign).

‱ (Q-Sample): The premises are Γ = ∀x ∈ supp(ÎŒ_D): Γ′ and Q₁ = 𝔌_{x∌Ό_D}[Q′₁]. By inversion, we have Q₂ = 𝔌_{x∌Ό_D}[Q′₂]. By Lem. E.4, we know that 𝔌_{x∌Ό_D}[Q′₁ ⊕ Q′₂] = 𝔌_{x∌Ό_D}[Q′₁] ⊕ 𝔌_{x∌Ό_D}[Q′₂] = Q₁ ⊕ Q₂. Then we conclude by (Q-Sample).
‱ (Q-Call): The premise is ∀i: (Γ; Q_{1i}, Γ′; Q′_{1i}) ∈ Δ(f), with Q₁ = ⊕_i Q_{1i} and Q′₁ = ⊕_i Q′_{1i}. By inversion, Q₂ = ⊕_i Q_{2i} and Q′₂ = ⊕_i Q′_{2i}, where (Γ; Q_{2i}, Γ′; Q′_{2i}) ∈ Δ(f) for each i. Then by (Q-Call), we have Δ ⊱ {Γ; ⊕_i Q_{1i} ⊕ ⊕_i Q_{2i}} call f {Γ′; ⊕_i Q′_{1i} ⊕ ⊕_i Q′_{2i}}.

‱ (Q-Prob): The premises are Δ ⊱ {Γ; Q₁₁} S₁ {Γ′; Q′₁} and Δ ⊱ {Γ; Q₁₂} S₂ {Γ′; Q′₁}, with Q₁ = P₁ ⊕ R₁, P₁ = ⟹[p, p], [0, 0], ⋯, [0, 0]⟩ ⊗ Q₁₁, and R₁ = ⟹[1−p, 1−p], [0, 0], ⋯, [0, 0]⟩ ⊗ Q₁₂. By inversion, we know that Q₂ = P₂ ⊕ R₂ for some Q₂₁, Q₂₂ such that Δ ⊱ {Γ; Q₂₁} S₁ {Γ′; Q′₂}, Δ ⊱ {Γ; Q₂₂} S₂ {Γ′; Q′₂}, P₂ = ⟹[p, p], [0, 0], ⋯, [0, 0]⟩ ⊗ Q₂₁, and R₂ = ⟹[1−p, 1−p], [0, 0], ⋯, [0, 0]⟩ ⊗ Q₂₂. By the induction hypothesis, we have Δ ⊱ {Γ; Q₁₁ ⊕ Q₂₁} S₁ {Γ′; Q′₁ ⊕ Q′₂} and Δ ⊱ {Γ; Q₁₂ ⊕ Q₂₂} S₂ {Γ′; Q′₁ ⊕ Q′₂}. Then

⟹[p, p], ⋯, [0, 0]⟩ ⊗ (Q₁₁ ⊕ Q₂₁)
= (⟹[p, p], ⋯, [0, 0]⟩ ⊗ Q₁₁) ⊕ (⟹[p, p], ⋯, [0, 0]⟩ ⊗ Q₂₁)
= P₁ ⊕ P₂,
⟹[1−p, 1−p], ⋯, [0, 0]⟩ ⊗ (Q₁₂ ⊕ Q₂₂)
= (⟹[1−p, 1−p], ⋯, [0, 0]⟩ ⊗ Q₁₂) ⊕ (⟹[1−p, 1−p], ⋯, [0, 0]⟩ ⊗ Q₂₂)
= R₁ ⊕ R₂,
(P₁ ⊕ P₂) ⊕ (R₁ ⊕ R₂) = (P₁ ⊕ R₁) ⊕ (P₂ ⊕ R₂) = Q₁ ⊕ Q₂.

Thus we conclude by (Q-Prob).
‱ (Q-Cond): The premises are Δ ⊱ {Γ ∧ L; Q₁} S₁ {Γ′; Q′₁} and Δ ⊱ {Γ ∧ ÂŹL; Q₁} S₂ {Γ′; Q′₁}. By inversion, we have Δ ⊱ {Γ ∧ L; Q₂} S₁ {Γ′; Q′₂} and Δ ⊱ {Γ ∧ ÂŹL; Q₂} S₂ {Γ′; Q′₂}. By the induction hypothesis, we have Δ ⊱ {Γ ∧ L; Q₁ ⊕ Q₂} S₁ {Γ′; Q′₁ ⊕ Q′₂} and Δ ⊱ {Γ ∧ ÂŹL; Q₁ ⊕ Q₂} S₂ {Γ′; Q′₁ ⊕ Q′₂}. Then we conclude by (Q-Cond).

‱ (Q-Loop): The premise is Δ ⊱ {Γ ∧ L; Q₁} S {Γ; Q₁}, and the conclusion is Δ ⊱ {Γ; Q₁} while L do S od {Γ ∧ ÂŹL; Q₁}. By inversion, we have Δ ⊱ {Γ ∧ L; Q₂} S {Γ; Q₂}. By the induction hypothesis, we have Δ ⊱ {Γ ∧ L; Q₁ ⊕ Q₂} S {Γ; Q₁ ⊕ Q₂}. Then we conclude by (Q-Loop).

‱ (Q-Seq): The premises are Δ ⊱ {Γ; Q₁} S₁ {Γ″; Q″₁} and Δ ⊱ {Γ″; Q″₁} S₂ {Γ′; Q′₁}. By inversion, there exists Q″₂ such that Δ ⊱ {Γ; Q₂} S₁ {Γ″; Q″₂} and Δ ⊱ {Γ″; Q″₂} S₂ {Γ′; Q′₂}. By the induction hypothesis, we have Δ ⊱ {Γ; Q₁ ⊕ Q₂} S₁ {Γ″; Q″₁ ⊕ Q″₂} and Δ ⊱ {Γ″; Q″₁ ⊕ Q″₂} S₂ {Γ′; Q′₁ ⊕ Q′₂}. Then we conclude by (Q-Seq).
‱ (Q-Weaken): The premises are Δ ⊱ {Γ₀; Q₀} S {Γ′₀; Q′₀}, Γ ⊹ Γ₀, Γ′₀ ⊹ Γ′, Γ ⊹ Q₁ ⊒ Q₀, and Γ′₀ ⊹ Q′₀ ⊒ Q′₁. By inversion, there exist Q₃, Q′₃ such that Γ ⊹ Q₂ ⊒ Q₃, Γ′₀ ⊹ Q′₃ ⊒ Q′₂, and Δ ⊱ {Γ₀; Q₃} S {Γ′₀; Q′₃}. By the induction hypothesis, we have Δ ⊱ {Γ₀; Q₀ ⊕ Q₃} S {Γ′₀; Q′₀ ⊕ Q′₃}. To apply (Q-Weaken), we need to show that Γ ⊹ Q₁ ⊕ Q₂ ⊒ Q₀ ⊕ Q₃ and Γ′₀ ⊹ Q′₀ ⊕ Q′₃ ⊒ Q′₁ ⊕ Q′₂. Then appeal to Lemmas E.1 and E.2. □
Now we can construct the annotated transition kernel toreduce the soundness proof to OST.
Proof of Lem. F.1. Let a def= âŠâ(âšđŸ, đ, đŸ, đŒâ©). By inversion
on Î âą {Î;đ} âšđŸ, đ, đŸ, đŒâ©, we know that đŸ |= Î, Î âą{Î;đ} đ {ÎâČ;đ âČ}, and Î âą {ÎâČ;đ âČ} đŸ for some ÎâČ, đ âČ. Weconstruct a probability measure ` as ^ (âšÎ, đ,đŸ, đ, đŸ, đŒâ©) byinduction on the derivation of Î âą {Î;đ} đ {ÎâČ;đ âČ}.
âą(Q-Skip)
Î âą {Î;đ} skip {Î;đ}By induction on the derivation of Î âą {Î;đ} đŸ .
â
(QK-Stop)
Î âą {Î;đ} KstopWe have a = đż (âšđŸ, skip,Kstop, đŒâ©). Then we set ` =
đż (âšÎ, đ,đŸ, skip,Kstop, đŒâ©). It is clear that đđ (đŸ) = 1 âđđ (đŸ).
â
(QK-Loop)Î âą {Π⧠đż;đ} đ {Î;đ} Î âą {Π⧠đż;đ} đŸ
Î âą {Î;đ} Kloop đż đ đŸLet đ â {â„,â€} be such that đŸ âą đż â đ.If đ = â€, then a = đż (âšđŸ, đ,Kloop đż đ đŸ, đŒâ©). We set` = đż (âšÎ ⧠đż,đ,đŸ, đ,Kloop đż đ đŸ, đŒâ©). In this case, weknow that đŸ |= Π⧠đż. By the premise, we know thatÎ âą {Π⧠đż;đ} đ {Î;đ}. It then remains to show thatÎ âą {Î;đ} Kloop đż đ đŸ . By (QK-Loop), it suffices toshow that Î âą {Îâ§đż;đ} đ {Î;đ} and Î âą {Îâ§ÂŹđż;đ}đŸ .Then appeal to the premise.If đ = â„, then ` = đż (âšđŸ, skip, đŸ, đŒâ©). We set ` = đż (âšÎ â§ÂŹđż,đ,đŸ, skip, đŸ, đŒâ©). In this case, we know that đŸ |=Îâ§ÂŹđż. By (Q-Skip), we have Î âą {Îâ§ÂŹđż;đ} skip {Îâ§
26
Central Moment Analysis for Cost Accumulators in Probabilistic Programs PLDI â21, June 20â25, 2021, Virtual, UK
¬L; Q}. It then remains to show that Δ ⊢ {Γ ∧ ¬L; Q} K. Then appeal to the premise.
    In both cases, γ and Q do not change, thus we conclude that ϕ_Q(γ) = 1 ⊗ ϕ_Q(γ).
  – (QK-Seq)
    Δ ⊢ {Γ; Q} S {Γ′; Q′}    Δ ⊢ {Γ′; Q′} K
    ───────────────────────────────────────
    Δ ⊢ {Γ; Q} Kseq S K
    We have ν = δ(⟨γ, S, K, α⟩). Then we set μ = δ(⟨Γ, Q, γ, S, K, α⟩). By the premise, we know that Δ ⊢ {Γ; Q} S {Γ′; Q′} and Δ ⊢ {Γ′; Q′} K. Because γ and Q do not change, we conclude that ϕ_Q(γ) = 1 ⊗ ϕ_Q(γ).
  – (QK-Weaken)
    Δ ⊢ {Γ′; Q′} K    Γ ⊨ Γ′    Γ ⊨ Q ⊒ Q′
    ───────────────────────────────────────
    Δ ⊢ {Γ; Q} K
    Because γ ⊨ Γ and Γ ⊨ Γ′, we know that γ ⊨ Γ′. Let μ′ be obtained from the induction hypothesis on Δ ⊢ {Γ′; Q′} K. Then ϕ_{Q′}(γ) ⊒ 𝔼_{σ′∼μ′}[⟨[(α′ − α)^m, (α′ − α)^m]⟩_{0≤m≤d} ⊗ ϕ_{Q″}(γ′)], where σ′ = ⟨_, Q″, γ′, _, _, α′⟩. We set μ = μ′. Because Γ ⊨ Q ⊒ Q′ and γ ⊨ Γ, we conclude that ϕ_Q(γ) ⊒ ϕ_{Q′}(γ).
• (Q-Tick)
  Q = ⟨[c^m, c^m]⟩_{0≤m≤d} ⊗ Q′
  ─────────────────────────────
  Δ ⊢ {Γ; Q} tick(c) {Γ; Q′}
  We have ν = δ(⟨γ, skip, K, α + c⟩). Then we set μ = δ(⟨Γ, Q′, γ, skip, K, α + c⟩). By (Q-Skip), we have Δ ⊢ {Γ; Q′} skip {Γ; Q′}. Then by the assumption, we have Δ ⊢ {Γ; Q′} K. It remains to show that ϕ_Q(γ) ⊒ ⟨[c^m, c^m]⟩_{0≤m≤d} ⊗ ϕ_{Q′}(γ). Indeed, we have ϕ_Q(γ) = ⟨[c^m, c^m]⟩_{0≤m≤d} ⊗ ϕ_{Q′}(γ) by the premise.
• (Q-Assign)
  Γ = [E/x]Γ′    Q = [E/x]Q′
  ──────────────────────────
  Δ ⊢ {Γ; Q} x ≔ E {Γ′; Q′}
  Let r ∈ ℝ be such that γ ⊢ E ⇓ r. We have ν = δ(⟨γ[x ↦ r], skip, K, α⟩). Then we set μ = δ(⟨Γ′, Q′, γ[x ↦ r], skip, K, α⟩). Because γ ⊨ Γ, i.e., γ ⊨ [E/x]Γ′, we know that γ[x ↦ r] ⊨ Γ′. By (Q-Skip), we have Δ ⊢ {Γ′; Q′} skip {Γ′; Q′}. Then by the assumption, we have Δ ⊢ {Γ′; Q′} K. It remains to show that ϕ_Q(γ) = 1 ⊗ ϕ_{Q′}(γ[x ↦ r]). By the premise, we have Q = [E/x]Q′, thus ϕ_Q(γ) = ϕ_{[E/x]Q′}(γ) = ϕ_{Q′}(γ[x ↦ r]).
• (Q-Sample)
  Γ = ∀x ∈ supp(μ_D) : Γ′    Q = 𝔼_{x∼μ_D}[Q′]
  ─────────────────────────────────────────────
  Δ ⊢ {Γ; Q} x ∼ D {Γ′; Q′}
  We have ν = μ_D ≫= λr. δ(⟨γ[x ↦ r], skip, K, α⟩). Then we set μ = μ_D ≫= λr. δ(⟨Γ′, Q′, γ[x ↦ r], skip, K, α⟩). For all r ∈ supp(μ_D), because γ ⊨ ∀x ∈ supp(μ_D) : Γ′, we know that γ[x ↦ r] ⊨ Γ′. By (Q-Skip), we have Δ ⊢ {Γ′; Q′} skip {Γ′; Q′}. Then by the assumption, we have Δ ⊢ {Γ′; Q′} K. It remains to show that ϕ_Q(γ) ⊒ 𝔼_{r∼μ_D}[1 ⊗ ϕ_{Q′}(γ[x ↦ r])]. Indeed, because Q = 𝔼_{x∼μ_D}[Q′], we know that ϕ_Q(γ) = ϕ_{𝔼_{x∼μ_D}[Q′]}(γ) = 𝔼_{r∼μ_D}[ϕ_{Q′}(γ[x ↦ r])].
• (Q-Call)
  ∀i : (Γ; Q_i, Γ′; Q′_i) ∈ Δ(f)
  ─────────────────────────────────────────
  Δ ⊢ {Γ; ⊕_i Q_i} call f {Γ′; ⊕_i Q′_i}
  We have ν = δ(⟨γ, D(f), K, α⟩). Then we set μ = δ(⟨Γ, ⊕_i Q_i, γ, D(f), K, α⟩). By the premise, we know that Δ ⊢ {Γ; Q_i} D(f) {Γ′; Q′_i} for each i. By Lem. F.2 and simple induction, we know that Δ ⊢ {Γ; ⊕_i Q_i} D(f) {Γ′; ⊕_i Q′_i}. Because γ and ⊕_i Q_i do not change, we conclude that ϕ_{⊕_i Q_i}(γ) = 1 ⊗ ϕ_{⊕_i Q_i}(γ).
• (Q-Prob)
  Δ ⊢ {Γ; Q₁} S₁ {Γ′; Q′}    Δ ⊢ {Γ; Q₂} S₂ {Γ′; Q′}
  Q = P ⊕ R    P = ⟨[p, p], [0, 0], …, [0, 0]⟩ ⊗ Q₁
  R = ⟨[1 − p, 1 − p], [0, 0], …, [0, 0]⟩ ⊗ Q₂
  ──────────────────────────────────────────────────
  Δ ⊢ {Γ; Q} if prob(p) then S₁ else S₂ fi {Γ′; Q′}
  We have ν = p · δ(⟨γ, S₁, K, α⟩) + (1 − p) · δ(⟨γ, S₂, K, α⟩). Then we set μ = p · δ(⟨Γ, Q₁, γ, S₁, K, α⟩) + (1 − p) · δ(⟨Γ, Q₂, γ, S₂, K, α⟩). From the assumption and the premise, we know that γ ⊨ Γ, Δ ⊢ {Γ′; Q′} K, and Δ ⊢ {Γ; Q₁} S₁ {Γ′; Q′}, Δ ⊢ {Γ; Q₂} S₂ {Γ′; Q′}. It remains to show that ϕ_Q(γ)_m ⊒_I (p · ϕ_{Q₁}(γ)_m) +_I ((1 − p) · ϕ_{Q₂}(γ)_m), where the scalar product is c · [a, b] ≝ [ca, cb] for c ≥ 0. On the other hand, from the premise, we have Q_m = P_m +_I R_m and P_m = ((⟨[p, p], …, [0, 0]⟩) ⊗ Q₁)_m = (m choose 0) ×_I ([p, p] ·_I (Q₁)_m) = p · (Q₁)_m, as well as R_m = (1 − p) · (Q₂)_m. Therefore, we have ϕ_Q(γ)_m = ϕ_{⟨P_m +_I R_m⟩_{0≤m≤d}}(γ)_m = p · ϕ_{Q₁}(γ)_m +_I (1 − p) · ϕ_{Q₂}(γ)_m.
• (Q-Cond)
  Δ ⊢ {Γ ∧ L; Q} S₁ {Γ′; Q′}    Δ ⊢ {Γ ∧ ¬L; Q} S₂ {Γ′; Q′}
  ──────────────────────────────────────────────────────────
  Δ ⊢ {Γ; Q} if L then S₁ else S₂ fi {Γ′; Q′}
  Let b ∈ {⊤, ⊥} be such that γ ⊢ L ⇓ b.
  If b = ⊤, then ν = δ(⟨γ, S₁, K, α⟩). We set μ = δ(⟨Γ ∧ L, Q, γ, S₁, K, α⟩). In this case, we know that γ ⊨ Γ ∧ L. By the premise and the assumption, we know that Δ ⊢ {Γ ∧ L; Q} S₁ {Γ′; Q′} and Δ ⊢ {Γ′; Q′} K.
  If b = ⊥, then ν = δ(⟨γ, S₂, K, α⟩). We set μ = δ(⟨Γ ∧ ¬L, Q, γ, S₂, K, α⟩). In this case, we know that γ ⊨ Γ ∧ ¬L. By the premise and the assumption, we know that Δ ⊢ {Γ ∧ ¬L; Q} S₂ {Γ′; Q′} and Δ ⊢ {Γ′; Q′} K.
  In both cases, γ and Q do not change, thus we conclude that ϕ_Q(γ) = 1 ⊗ ϕ_Q(γ).
• (Q-Loop)
  Δ ⊢ {Γ ∧ L; Q} S {Γ; Q}
  ──────────────────────────────────────
  Δ ⊢ {Γ; Q} while L do S od {Γ ∧ ¬L; Q}
  We have ν = δ(⟨γ, skip, Kloop L S K, α⟩). Then we set μ = δ(⟨Γ, Q, γ, skip, Kloop L S K, α⟩). By (Q-Skip), we have Δ ⊢ {Γ; Q} skip {Γ; Q}. Then by the assumption Δ ⊢ {Γ ∧ ¬L; Q} K and the premise, we know that Δ ⊢ {Γ; Q} Kloop L S K by (QK-Loop). Because γ and Q do not change, we conclude that ϕ_Q(γ) = 1 ⊗ ϕ_Q(γ).
• (Q-Seq)
  Δ ⊢ {Γ; Q} S₁ {Γ′; Q′}    Δ ⊢ {Γ′; Q′} S₂ {Γ″; Q″}
  ──────────────────────────────────────────────────
  Δ ⊢ {Γ; Q} S₁; S₂ {Γ″; Q″}
  We have ν = δ(⟨γ, S₁, Kseq S₂ K, α⟩). Then we set μ = δ(⟨Γ, Q, γ, S₁, Kseq S₂ K, α⟩). By the first premise, we have Δ ⊢ {Γ; Q} S₁ {Γ′; Q′}. By the assumption Δ ⊢ {Γ″; Q″} K and the second premise, we know that Δ ⊢ {Γ′; Q′} Kseq S₂ K by (QK-Seq). Because γ and Q do not change, we conclude that ϕ_Q(γ) = 1 ⊗ ϕ_Q(γ).
• (Q-Weaken)
  Δ ⊢ {Γ₀; Q₀} S {Γ′₀; Q′₀}
  Γ ⊨ Γ₀    Γ′₀ ⊨ Γ′    Γ ⊨ Q ⊒ Q₀    Γ′₀ ⊨ Q′₀ ⊒ Q′
  ──────────────────────────────────────────────────
  Δ ⊢ {Γ; Q} S {Γ′; Q′}
  By γ ⊨ Γ and Γ ⊨ Γ₀, we know that γ ⊨ Γ₀. By the assumption Δ ⊢ {Γ′; Q′} K and the premises Γ′₀ ⊨ Γ′ and Γ′₀ ⊨ Q′₀ ⊒ Q′, we derive Δ ⊢ {Γ′₀; Q′₀} K by (QK-Weaken). Thus let μ₀ be obtained by the induction hypothesis on Δ ⊢ {Γ₀; Q₀} S {Γ′₀; Q′₀}. Then ϕ_{Q₀}(γ) ⊒ 𝔼_{σ′∼μ₀}[⟨[(α′ − α)^m, (α′ − α)^m]⟩_{0≤m≤d} ⊗ ϕ_{Q″}(γ′)], where σ′ = ⟨_, Q″, γ′, _, _, α′⟩. We set μ = μ₀. By the premise Γ ⊨ Q ⊒ Q₀ and γ ⊨ Γ, we conclude that ϕ_Q(γ) ⊒ ϕ_{Q₀}(γ). □
Therefore, we can use the annotated kernel κ above to re-construct the trace-based moment semantics in appendix C. Then we can define the potential function on annotated program configurations as ϕ(σ) ≝ ϕ_Q(γ) where σ = ⟨_, Q, γ, _, _, _⟩.
The next step is to apply the extended OST for interval bounds (Thm. E.6). Recall that the theorem requires that for some ℓ ∈ ℕ and C ≥ 0, ‖Y_n‖_∞ ≤ C · (n + 1)^ℓ almost surely for all n ∈ ℤ⁺. One sufficient condition for the requirement is to assume the bounded-update property, i.e., every (deterministic or probabilistic) assignment to a program variable updates the variable with a bounded change. As observed by Wang et al. [43], bounded updates are common in practice. We formulate the idea as follows.

Lemma F.3. If there exists C₀ ≥ 0 such that for all n ∈ ℤ⁺ and x ∈ VID, it holds that ℙ[|γ_{n+1}(x) − γ_n(x)| ≤ C₀] = 1, where ω is an infinite trace, ω_n = ⟨γ_n, _, _, _⟩, and ω_{n+1} = ⟨γ_{n+1}, _, _, _⟩, then there exists C ≥ 0 such that for all n ∈ ℤ⁺, ‖Y_n‖_∞ ≤ C · (n + 1)^{md} almost surely.
Proof. Let C₁ ≥ 0 be such that for all tick(c) statements in the program, |c| ≤ C₁. Then for all n, if ω_n = ⟨_, _, _, α_n⟩, then |α_n| ≤ n · C₁. On the other hand, we know that ℙ[|γ_n(x) − γ_0(x)| ≤ C₀ · n] = 1 for any variable x. As we assume all the program variables are initialized to zero, we know that ℙ[|γ_n(x)| ≤ C₀ · n] = 1. From the construction in the proof of Lem. F.1, we know that all the templates used to define the interval-valued potential function have almost-surely bounded coefficients. Let C₂ ≥ 0 be such a bound. Also, the k-th component in a template is a polynomial in ℝ_{kd}[VID]. Therefore, Φ_n(ω) = ϕ(ω_n) = ϕ_{Q_n}(γ_n), and

  |ϕ_{Q_n}(γ_n)_k| ≤ Σ_{j=0}^{kd} C₂ · |VID|^j · |C₀ · n|^j ≤ C₃ · (n + 1)^{kd}, a.s.,

for some sufficiently large constant C₃. Thus

  |(Y_n)_m| = |(A_n ⊗ Φ_n)_m|
            = |Σ_{k=0}^{m} (m choose k) ×_I ((A_n)_k ·_I (Φ_n)_{m−k})|
            ≤ Σ_{k=0}^{m} (m choose k) · (n · C₁)^k · (C₃ · (n + 1))^{(m−k)d}
            ≤ C₄ · (n + 1)^{md}, a.s.,

for some sufficiently large constant C₄. Therefore ‖Y_n‖_∞ ≤ C₅ · (n + 1)^{md}, a.s., for some sufficiently large constant C₅. □
Now we prove the soundness of bound inference.
Theorem (Thm. 4.4). Suppose Δ ⊢ {Γ; Q} S_main {Γ′; 1} and ⊢ Δ. Then 𝔼[A_T] ⊑ ϕ_Q(λ_.0), i.e., the moments 𝔼[A_T] of the accumulated cost upon program termination are bounded by intervals in ϕ_Q(λ_.0), where Q is the quantitative context and λ_.0 is the initial valuation, if both of the following properties hold:
(i) 𝔼[T^{md}] < ∞, and
(ii) there exists C₀ ≥ 0 such that for all n ∈ ℤ⁺ and x ∈ VID, it holds almost surely that |γ_{n+1}(x) − γ_n(x)| ≤ C₀, where ⟨γ_n, _, _, _⟩ = ω_n and ⟨γ_{n+1}, _, _, _⟩ = ω_{n+1} of an infinite trace ω.

Proof. By Lem. F.3, there exists C ≥ 0 such that ‖Y_n‖_∞ ≤ C · (n + 1)^{md} almost surely for all n ∈ ℤ⁺. By the assumption, we also know that 𝔼[T^{md}] < ∞. Thus by Thm. E.6, we conclude that 𝔼[Y_T] ⊑ 𝔼[Y₀], i.e., 𝔼[A_T] ⊑ 𝔼[Φ₀] = ϕ_Q(λ_.0). □
G Termination Analysis
In this section, we develop a technique to reason about upper bounds on higher moments 𝔼[T^m] of the stopping time T. We adapt the idea of expected-potential functions, but rely on a simpler convergence proof. In this section, we assume R = ([0, ∞], ≤, +, ·, 0, 1) to be a partially ordered semiring on extended nonnegative real numbers.

Definition G.1. A map ϕ : Σ → M^{(d)}_R is said to be an expected-potential function for upper bounds on stopping time if
(i) ϕ(σ)₀ = 1 for all σ ∈ Σ,
(ii) ϕ(σ) = 1⃗ if σ = ⟨_, skip, Kstop, _⟩, and
(iii) ϕ(σ) ⊒ 𝔼_{σ′∼↦(σ)}[⟨1, 1, …, 1⟩ ⊗ ϕ(σ′)] for all nonterminating configurations σ ∈ Σ.
Intuitively, ϕ(σ) is an upper bound on the moments of the number of evaluation steps upon termination for the computation that continues from the configuration σ. We define A_n and Φ_n, where n ∈ ℤ⁺, to be random variables on the probability space (Ω, F, ℙ) of the trace semantics as A_n(ω) ≝ ⟨n^m⟩_{0≤m≤d} and Φ_n(ω) ≝ ϕ(ω_n). Then we define A_T(ω) ≝ A_{T(ω)}(ω). Note that A_T = ⟨T^m⟩_{0≤m≤d}.
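The composition ⊗ used throughout these definitions can be made concrete with a small Python model (an illustrative sketch, not the implementation): compose realizes (u ⊗ v)_m = Σ_k C(m,k) · u_k · v_{m−k}, the moment sequence of a sum of independent quantities, and ⊗-composing n copies of the all-ones vector ⟨1, 1, …, 1⟩ (the moments of the constant 1) yields ⟨n^m⟩_{0≤m≤d}, matching A_n.

```python
from math import comb
from functools import reduce

def compose(u, v):
    """(u ⊗ v)_m = sum_k C(m,k) * u_k * v_{m-k}: moments of a sum of
    independent quantities with moment sequences u and v."""
    d = len(u) - 1
    return [sum(comb(m, k) * u[k] * v[m - k] for k in range(m + 1))
            for m in range(d + 1)]

d = 4
ones = [1] * (d + 1)   # moments of the constant 1: 1^m = 1
unit = [1] + [0] * d   # moments of the constant 0 (the ⊗-unit)

# ⊗-composing n copies of `ones` gives the moments of n, i.e. <n^m>_{0<=m<=d}.
n = 3
A_n = reduce(compose, [ones] * n, unit)
print(A_n)  # [1, 3, 9, 27, 81]
```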
Table 4. Upper bounds on the raw/central moments of runtimes, with comparison to Kura et al. [26]. "T/O" stands for timeout after 30 minutes. "N/A" means that the tool is not applicable. "-" indicates that the tool fails to infer a bound. Entries with more precise results or less analysis time are marked in bold.
| program | moment | this work: upper bnd. | time (sec) | Kura et al. [26]: upper bnd. | time (sec) |
|---|---|---|---|---|---|
| (1-1) | 2nd raw | 201 | 0.019 | 201 | 0.015 |
| | 3rd raw | 3829 | 0.019 | 3829 | 0.020 |
| | 4th raw | 90705 | 0.023 | 90705 | 0.027 |
| | 2nd central | 32 | 0.029 | N/A | N/A |
| | 4th central | 9728 | 0.058 | N/A | N/A |
| (1-2) | 2nd raw | 2357 | 1.068 | 3124 | 0.037 |
| | 3rd raw | 148847 | 1.512 | 171932 | 0.062 |
| | 4th raw | 11285725 | 1.914 | 12049876 | 0.096 |
| | 2nd central | 362 | 3.346 | N/A | N/A |
| | 4th central | 955973 | 9.801 | N/A | N/A |
| (2-1) | 2nd raw | 2320 | 0.016 | 2320 | 11.380 |
| | 3rd raw | 691520 | 0.018 | - | 16.056 |
| | 4th raw | 340107520 | 0.021 | - | 23.414 |
| | 2nd central | 1920 | 0.026 | N/A | N/A |
| | 4th central | 289873920 | 0.049 | N/A | N/A |
| (2-2) | 2nd raw | 8375 | 0.022 | 8375 | 38.463 |
| | 3rd raw | 1362813 | 0.028 | - | 73.408 |
| | 4th raw | 306105209 | 0.035 | - | 141.072 |
| | 2nd central | 5875 | 0.029 | N/A | N/A |
| | 4th central | 447053126 | 0.086 | N/A | N/A |
| (2-3) | 2nd raw | 3675 | 0.039 | 6710 | 48.662 |
| | 3rd raw | 618584 | 0.049 | 19567045 | 0.039 |
| | 4th raw | 164423336 | 0.055 | - | T/O |
| | 2nd central | 3048 | 0.053 | N/A | N/A |
| | 4th central | 196748763 | 0.123 | N/A | N/A |
| (2-4) | 2nd raw | 6625 | 0.035 | 10944 | 216.352 |
| | 3rd raw | 742825 | 0.048 | - | 453.435 |
| | 4th raw | 101441320 | 0.072 | - | 964.579 |
| | 2nd central | 6624 | 0.051 | N/A | N/A |
| | 4th central | 313269063 | 0.215 | N/A | N/A |
| (2-5) | 2nd raw | 21060 | 0.045 | - | 216.605 |
| | 3rd raw | 9860940 | 0.063 | - | 467.577 |
| | 4th raw | 7298339760 | 0.101 | - | 1133.947 |
| | 2nd central | 20160 | 0.068 | N/A | N/A |
| | 4th central | 8044220161 | 0.271 | N/A | N/A |
Table 3. Upper bounds of the expectations of runtimes, with comparison to Kura et al. [26].

| program | upper bound by our tool | upper bound by Kura et al. [26] |
|---|---|---|
| (1-1) | 13 (deg 2, 0.106s) | 13 (deg 1, 0.011s) |
| (1-2) | 44.6667 (deg 4, 0.417s) | 68 (deg 1, 0.020s) |
| (2-1) | 20 (deg 1, 0.096s) | 20 (deg 1, 0.011s) |
| (2-2) | 75 (deg 1, 0.116s) | 75 (deg 1, 0.015s) |
| (2-3) | 42 (deg 1, 0.260s) | 62 (deg 1, 0.017s) |
| (2-4) | 73 (deg 1, 0.143s) | 96 (deg 1, 0.013s) |
| (2-5) | 90 (deg 1, 0.244s) | 90 (deg 1, 0.017s) |
We now show that a valid potential function for stopping time always gives a sound upper bound.
Theorem G.2. 𝔼[A_T] ⊑ 𝔼[Φ₀].

Proof. Let C_n(ω) ≝ ⟨1, 1, …, 1⟩ if n < T(ω), otherwise C_n(ω) ≝ ⟨1, 0, …, 0⟩. Then A_T = ⊗_{n=0}^{∞} C_n. By Prop. A.3, we know that 𝔼[A_T] = lim_{N→∞} 𝔼[⊗_{n=0}^{N} C_n]. Thus it suffices to show that for all N ∈ ℤ⁺, 𝔼[⊗_{n=0}^{N} C_n] ⊑ 𝔼[Φ₀].
Observe that {C_n}_{n∈ℤ⁺} is adapted to {F_n}_{n∈ℤ⁺}, because the event {T ≤ n} is F_n-measurable. Then we have

  𝔼[⊗_{n=0}^{N} C_n ⊗ Φ_{N+1} | F_N]
  = ⊗_{n=0}^{N−1} C_n ⊗ 𝔼[C_N ⊗ Φ_{N+1} | F_N], a.s.,
  ⊑ ⊗_{n=0}^{N−1} C_n ⊗ Φ_N.

Therefore, 𝔼[⊗_{n=0}^{N} C_n ⊗ Φ_{N+1}] ⊑ 𝔼[Φ₀] for all N ∈ ℤ⁺ by a simple induction. Because Φ_{N+1} ⊒ 1⃗, we conclude that

  𝔼[⊗_{n=0}^{N} C_n] ⊑ 𝔼[⊗_{n=0}^{N} C_n ⊗ Φ_{N+1}] ⊑ 𝔼[Φ₀]. □
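To illustrate Thm. G.2 on a concrete (hypothetical) example: for a loop that stops with probability p at each step, the stopping time T is geometric, and its moment sequence satisfies the potential-style equation M = ⟨1, …, 1⟩ ⊗ (p · 1⃗ + (1 − p) · M), i.e., one step is charged and then the loop either stops or continues. A minimal Python sketch that iterates this equation to its fixpoint:

```python
from math import comb

def compose(u, v):
    """(u ⊗ v)_m = sum_k C(m,k) * u_k * v_{m-k}."""
    d = len(u) - 1
    return [sum(comb(m, k) * u[k] * v[m - k] for k in range(m + 1))
            for m in range(d + 1)]

def mix(p, u, v):
    """Pointwise convex combination: moments of a probabilistic branch."""
    return [p * a + (1 - p) * b for a, b in zip(u, v)]

d, p = 2, 0.5
ones = [1.0] * (d + 1)    # moments of the constant 1 (one loop step)
stop = [1.0] + [0.0] * d  # moments of 0 remaining steps (the ⊗-unit)

# Fixpoint iteration of  M = ones ⊗ (p·stop + (1-p)·M)
M = stop[:]
for _ in range(200):
    M = compose(ones, mix(p, stop, M))
print(M)  # converges to [1.0, 2.0, 6.0]: E[T] = 1/p = 2, E[T^2] = (2-p)/p^2 = 6
```

The iteration is monotone from below, mirroring the limit argument in the proof of Thm. G.2.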
H Experimental Evaluation
Tab. 4 and Fig. 15 are the complete versions of the statistics in Tab. 1 and the plots in Fig. 9, respectively.
Tab. 3 compares our tool with Kura et al. [26] on their benchmarks for upper bounds on the first moments of runtimes. The programs (1-1) and (1-2) are coupon-collector problems with a total of two and four coupons, respectively. The other five are variants of random walks. The first three are one-dimensional random walks: (2-1) is integer-valued, (2-2) is real-valued with continuous sampling, and (2-3) exhibits adversarial nondeterminism. The programs (2-4) and (2-5) are two-dimensional random walks. In the parentheses after the bounds, we record the degree of polynomials for the templates and the running time of the analysis. All analyses completed in less than half a second, and our tool derives bounds that are at least as tight as, and sometimes tighter than, those of the compared tool [26].
Tab. 5 compares our tool with Absynth by Ngo et al. [31] on their benchmarks for upper bounds on the first moments of monotone cost accumulators. Both tools are able to infer symbolic polynomial bounds. Absynth uses a finer-grained set of base functions, and it supports bounds of the form |[x, y]|, which is defined as max(0, y − x). Our tool is as precise as, and sometimes more precise than, Absynth,³ but it is less efficient than Absynth. One reason for this could be that we use a full-fledged numerical abstract domain to derive the logical contexts, while Absynth employs less powerful but more efficient heuristics for that step. Nevertheless, all the analyses completed in less than one second.
Tab. 6 compares our tool with Wang et al. [43] on their benchmarks for lower and upper bounds on the first moments of accumulated costs. All the benchmark programs

³We compared the precision by first simplifying the bounds derived by Absynth using the pre-conditions shown in Tab. 5, and then comparing the polynomials.
[Fig. 15 comprises seven tail-probability plots, one per benchmark (1-1), (1-2), (2-1)–(2-5); each panel plots the upper bound on ℙ[T ≥ d] against d, comparing the bound from raw moments [26] with the bounds from the 2nd and 4th central moments.]
Figure 15. Upper bound of the tail probability ℙ[T ≥ d] as a function of d, with comparison to Kura et al. [26]. Each gray line is the minimum of tail bounds obtained from the raw moments of degree up to four inferred by Kura et al. [26]. Green lines and red lines are the tail bounds given by the 2nd and 4th central moments inferred by our tool, respectively.
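The central-moment curves of Fig. 15 can be recomputed from the tables, assuming (our reading, stated here as an assumption) that the plotted bounds are Cantelli's inequality on the variance and Markov's inequality on the 4th central moment. A sketch for program (1-1), with mean 13 (Tab. 3), variance 32 and 4th central moment 9728 (Tab. 4):

```python
def tail_cantelli(mu, var, d):
    """P[T >= d] <= Var/(Var + (d - mu)^2) for d > mu (Cantelli's inequality)."""
    a = d - mu
    return var / (var + a * a)

def tail_markov4(mu, m4, d):
    """P[T >= d] <= P[|T - mu| >= d - mu] <= m4/(d - mu)^4 (Markov on (T-mu)^4)."""
    a = d - mu
    return m4 / a ** 4

# Program (1-1): mean 13 (Tab. 3), 2nd central moment 32, 4th central 9728 (Tab. 4).
mu, var, m4 = 13, 32, 9728
for d in (20, 30, 50):
    print(d, min(1.0, tail_cantelli(mu, var, d)), min(1.0, tail_markov4(mu, m4, d)))
```

For small d the 2nd-central-moment bound is tighter; for large d the 4th-central-moment bound wins, matching the crossover visible in the plots.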
satisfy the bounded-update property. To ensure soundness, our tool has to perform the extra termination check required by Thm. 4.4. Our tool derives similar symbolic lower and upper bounds, compared to the results of Wang et al. [43]. Meanwhile, our tool is more efficient than the compared tool. One source of slowdown for Wang et al. [43] could be that they use Matlab as the LP backend, and the initialization of Matlab has significant overhead.
I Case Study: Timing-Attack Analysis
We motivate our work on central-moment analysis using a probabilistic program with a timing-leak vulnerability, and demonstrate how the results from an analysis can be used to bound the success rate of an attack program that attempts to exploit the vulnerability. The program is extracted and modified from a web application provided by DARPA during engagements as part of the STAC program [15]. In essence, the program models a password checker that compares an input guess with an internally stored password secret, represented as two N-bit vectors. The program in Fig. 16(a) is the interface of the checker, and Fig. 16(b) is the comparison function compare, which carries out most of the computation. The statements of the form "tick(·)" represent a cost model for the running time of compare, which is assumed to be observable by the attacker. compare iterates over the bits from high index to low index, and the running time expended during the processing of bit i depends on the current comparison result (stored in cmp), as well as on the values of the i-th bits of guess and secret. Because the running time of compare might leak information about the relationship between guess and secret, compare introduces some random delays to add noise to its running time. However, we will see shortly that such a countermeasure does not protect the program from a timing attack.
We now show how the moments of the running time of
compare (the kind of information provided by our central-moment analysis (§3)) are useful for analyzing the success probability of the attack program given in Fig. 16(c). Let T be the random variable for the running time of compare. A standard timing attack for such programs is to guess the bits of secret successively. The idea is the following: Suppose that we have successfully obtained the bits secret[i + 1] through secret[N]; we now guess that the next bit, secret[i], is 1 and set guess[i] ≔ 1. Theoretically, if the following two conditional expectations

  𝔼[T₁] ≝ 𝔼[T | ⋀_{j=i+1}^{N} (secret[j] = guess[j]) ∧ (secret[i] = 1 ∧ guess[i] = 1)]   (11)

  𝔼[T₀] ≝ 𝔼[T | ⋀_{j=i+1}^{N} (secret[j] = guess[j]) ∧ (secret[i] = 0 ∧ guess[i] = 1)]   (12)

have a significant difference, then there is an opportunity to check our guess by running the program multiple times, using the average running time as an estimate of 𝔼[T], and choosing the value of guess[i] according to whichever of (11) and (12) is closest to our estimate. However, if the difference between 𝔼[T₁] and 𝔼[T₀] is not significant enough, or the program produces a large amount of noise in its running time, the attack might not be realizable in practice. To determine whether the timing difference represents an exploitable vulnerability, we need to reason about the attack program's success rate.
Table 5. Upper bounds of the expectations of monotone costs, with comparison to Ngo et al. [31].

| program | pre-condition | upper bound by our tool | upper bound by Absynth [31] |
|---|---|---|---|
| 2drdwalk | d < n | 2(n − d + 1) (deg 1, 0.269s) | 2\|[d, n + 1]\| (deg 1, 0.170s) |
| C4B_t09 | x > 0 | 17x (deg 1, 0.061s) | 17\|[0, x]\| (deg 1, 0.014s) |
| C4B_t13 | x > 0 ∧ y > 0 | 1.25x + y (deg 1, 0.060s) | 1.25\|[0, x]\| + \|[0, y]\| (deg 1, 0.008s) |
| C4B_t15 | x > y ∧ y > 0 | 1.1667x (deg 1, 0.072s) | 1.3333\|[0, x]\| (deg 1, 0.014s) |
| C4B_t19 | k > 100 ∧ i > 0 | i + 2k − 49 (deg 1, 0.059s) | \|[0, 51 + i + k]\| + 2\|[0, i]\| (deg 1, 0.010s) |
| C4B_t30 | x > 0 ∧ y > 0 | 0.5x + 0.5y + 2 (deg 1, 0.044s) | 0.5\|[0, x + 2]\| + 0.5\|[0, y + 2]\| (deg 1, 0.007s) |
| C4B_t61 | l ≥ 8 | 1.4286l (deg 1, 0.046s) | \|[0, l]\| + 0.5\|[1, l]\| (deg 1, 0.007s) |
| bayesian_network | n > 0 | 4n (deg 1, 0.264s) | 4\|[0, n]\| (deg 1, 0.057s) |
| ber | x < n | 2(n − x) (deg 1, 0.035s) | 2\|[x, n]\| (deg 1, 0.004s) |
| bin | n > 0 | 0.2(n + 9) (deg 1, 0.036s) | 0.2\|[0, n + 9]\| (deg 1, 0.030s) |
| condand | n > 0 ∧ m > 0 | 2m (deg 1, 0.042s) | 2\|[0, m]\| (deg 1, 0.004s) |
| cooling | mt > st ∧ t > 0 | mt − st + 0.42t + 2.1 (deg 1, 0.071s) | 0.42\|[0, t + 5]\| + \|[st, mt]\| (deg 1, 0.017s) |
| coupon | ⊤ | 11.6667 (deg 4, 0.066s) | 15 (deg 1, 0.016s) |
| cowboy_duel | ⊤ | 1.2 (deg 1, 0.030s) | 1.2 (deg 1, 0.004s) |
| cowboy_duel_3way | ⊤ | 2.0833 (deg 1, 0.110s) | 2.0833 (deg 1, 0.142s) |
| fcall | x < n | 2(n − x) (deg 1, 0.053s) | 2\|[x, n]\| (deg 1, 0.004s) |
| filling_vol | volToFill > 0 | 0.6667·volToFill + 7 (deg 1, 0.082s) | 0.3333\|[0, volToFill + 10]\| + 0.3333\|[0, volToFill + 11]\| (deg 1, 0.079s) |
| geo | ⊤ | 5 (deg 1, 0.061s) | 5 (deg 1, 0.003s) |
| hyper | x < n | 5(n − x) (deg 1, 0.035s) | 5\|[x, n]\| (deg 1, 0.005s) |
| linear01 | x > 2 | 0.6x (deg 1, 0.034s) | 0.6\|[0, x]\| (deg 1, 0.008s) |
| prdwalk | x < n | 1.1429(n − x + 4) (deg 1, 0.037s) | 1.1429\|[x, n + 4]\| (deg 1, 0.011s) |
| prnes | y > 0 ∧ n < 0 | −68.4795n + 0.0526(y − 1) (deg 1, 0.071s) | 68.4795\|[n, 0]\| + 0.0526\|[0, y]\| (deg 1, 0.016s) |
| prseq | y > 9 ∧ x − y > 2 | 1.65x − 1.5y (deg 1, 0.061s) | 1.65\|[y, x]\| + 0.15\|[0, y]\| (deg 1, 0.011s) |
| prspeed | x + 1 < n ∧ y < m | 2(m − y) + 0.6667(n − x) (deg 1, 0.065s) | 2\|[y, m]\| + 0.6667\|[x, n]\| (deg 1, 0.010s) |
| race | h < t | 0.6667(t − h + 9) (deg 1, 0.039s) | 0.6667\|[h, t + 9]\| (deg 1, 0.027s) |
| rdseql | x > 0 ∧ y > 0 | 2.25x + y (deg 1, 0.059s) | 2.25\|[0, x]\| + \|[0, y]\| (deg 1, 0.008s) |
| rdspeed | x + 1 < n ∧ y < m | 2(m − y) + 0.6667(n − x) (deg 1, 0.062s) | 2\|[y, m]\| + 0.6667\|[x, n]\| (deg 1, 0.010s) |
| rdwalk | x < n | 2(n − x + 1) (deg 1, 0.035s) | 2\|[x, n + 1]\| (deg 1, 0.004s) |
| rejection_sampling | n > 0 | 2n (deg 2, 0.071s) | 2\|[0, n]\| (deg 1, 0.350s) |
| rfind_lv | ⊤ | 2 (deg 1, 0.033s) | 2 (deg 1, 0.004s) |
| rfind_mc | k > 0 | 2 (deg 1, 0.047s) | \|[0, k]\| (deg 1, 0.007s) |
| robot | n > 0 | 0.2778(n + 7) (deg 1, 0.047s) | 0.3846\|[0, n + 6]\| (deg 1, 0.017s) |
| roulette | n < 10 | −4.9333n + 98.6667 (deg 1, 0.057s) | 4.9333\|[n, 20]\| (deg 1, 0.073s) |
| sprdwalk | x < n | 2(n − x) (deg 1, 0.036s) | 2\|[x, n]\| (deg 1, 0.004s) |
| trapped_miner | n > 0 | 7.5n (deg 1, 0.081s) | 7.5\|[0, n]\| (deg 1, 0.015s) |
| complex | y, w, n, m > 0 | 4nm + 2n + w + 0.6667(y + 1) (deg 2, 0.142s) | (4\|[0, m]\| + 2)\|[0, n]\| + (\|[0, w]\| + 0.3333\|[0, y]\|)\|[0, y + 1]\| + 0.6667 (deg 2, 0.451s) |
| multirace | n > 0 ∧ m > 0 | 2nm + 4n (deg 2, 0.080s) | 2\|[0, m]\|\|[0, n]\| + 4\|[0, n]\| (deg 2, 0.692s) |
| pol04 | x > 0 | 4.5x² + 10.5x (deg 2, 0.052s) | 1.5\|[0, x]\|² + 3\|[1, x]\|\|[0, x]\| + 10.5\|[0, x]\| (deg 2, 0.101s) |
| pol05 | x > 0 | 0.6667(x² + 3x) (deg 2, 0.059s) | 2\|[0, x + 1]\|\|[0, x]\| (deg 2, 0.133s) |
| pol07 | n > 1 | 1.5n² − 4.5n + 3 (deg 2, 0.057s) | 1.5\|[1, n]\|\|[2, n]\| (deg 2, 0.203s) |
| rdbub | n > 0 | 3n² (deg 2, 0.055s) | 3\|[0, n]\|² (deg 2, 0.030s) |
| recursive | lo < hi | 0.25(hi − lo)² + 1.75(hi − lo) (deg 2, 0.313s) | 0.25\|[lo, hi]\|² + 1.75\|[lo, hi]\| (deg 2, 0.398s) |
| trader | mp > 0 ∧ sp > mp | 1.5(sp² − mp² + sp − mp) (deg 2, 0.073s) | 1.5\|[mp, sp]\|² + 3\|[mp, sp]\|\|[0, mp]\| + 1.5\|[mp, sp]\| (deg 2, 0.192s) |
Table 6. Upper and lower bounds of the expectation of (possibly) non-monotone costs, with comparison to Wang et al. [43].

| program | pre-cond. | termination | | bounds by our tool | bounds by Wang et al. [43] |
|---|---|---|---|---|---|
| Bitcoin Mining | x ≥ 1 | 𝔼[T] < ∞ (deg 1, 0.080s) | ub. | −1.475x (deg 1, 0.078s) | −1.475x + 1.475 (deg 2, 3.705s) |
| | | | lb. | −1.5x (deg 1, 0.088s) | −1.5x (deg 2, 3.485s) |
| Bitcoin Mining Pool | y ≥ 0 | 𝔼[T²] < ∞ (deg 4, 0.168s) | ub. | −7.375y² − 66.375y (deg 2, 0.127s) | −7.375y² − 41.625y + 49 (deg 2, 5.936s) |
| | | | lb. | −7.5y² − 67.5y (deg 2, 0.134s) | −7.5y² − 67.5y (deg 2, 6.157s) |
| Queueing Network | n > 0 | 𝔼[T²] < ∞ (deg 2, 0.208s) | ub. | 0.0531n (deg 2, 0.134s) | 0.0492n (deg 3, 69.669s) |
| | | | lb. | 0.028n (deg 2, 0.139s) | 0.0384n (deg 3, 68.849s) |
| Running Example | x ≥ 0 | 𝔼[T²] < ∞ (deg 2, 0.064s) | ub. | 0.3333(x² + x) (deg 2, 0.063s) | 0.3333x² + 0.3333x (deg 2, 3.766s) |
| | | | lb. | 0.3333(x² + x) (deg 2, 0.059s) | 0.3333x² + 0.3333x − 0.6667 (deg 2, 3.555s) |
| Nested Loop | m ≥ 0 | 𝔼[T²] < ∞ (deg 4, 0.127s) | ub. | 0.3333m² + m (deg 2, 0.117s) | 0.3333m² + m (deg 2, 28.398s) |
| | | | lb. | 0.3333m² + m (deg 2, 0.115s) | 0.3333m² − m (deg 2, 7.299s) |
| Random Walk | x ≤ n | 𝔼[T] < ∞ (deg 1, 0.063s) | ub. | 2.5x − 2.5n − 2.5 (deg 1, 0.064s) | 2.5x − 2.5n (deg 2, 4.536s) |
| | | | lb. | 2.5x − 2.5n − 2.5 (deg 1, 0.068s) | 2.5x − 2.5n − 2.5 (deg 2, 4.512s) |
| 2D Robot | x ≥ y | 𝔼[T²] < ∞ (deg 2, 0.145s) | ub. | 1.7280(x − y)² + 31.4539(x − y) + 126.5167 (deg 2, 0.132s) | 1.7280(x − y)² + 31.4539(x − y) + 126.5167 (deg 2, 7.133s) |
| | | | lb. | 1.7280(x − y)² + 31.4539(x − y) + 29.7259 (deg 2, 0.121s) | 1.7280(x − y)² + 31.4539(x − y) (deg 2, 7.040s) |
| Good Discount | n ≤ 30 ∧ d ≥ 1 | 𝔼[T²] < ∞ (deg 2, 0.093s) | ub. | −0.5d − 3.6667n + 117.3333 (deg 2, 0.093s) | 0.0067nd − 0.7d − 3.8035n + 0.0022d² + 119.4351 (deg 2, 5.272s) |
| | | | lb. | −0.005n² − 0.5n (deg 2, 0.092s) | 0.0067nd − 0.7133d − 3.8123n + 0.0022d² + 112.3704 (deg 2, 5.323s) |
| Pollutant Disposal | n ≥ 0 | 𝔼[T²] < ∞ (deg 2, 0.091s) | ub. | −0.2n² + 50.2n (deg 2, 0.092s) | −0.2n² + 50.2n (deg 2, 5.851s) |
| | | | lb. | −0.2n² + 50.2n − 435.6 (deg 2, 0.094s) | −0.2n² + 50.2n − 482 (deg 2, 5.215s) |
| Species Fight | a ≥ 5 ∧ b ≥ 5 | 𝔼[T²] < ∞ (deg 2, 0.042s) | ub. | 40ab − 180a − 180b + 810 (deg 2, 0.045s) | 40ab − 180a − 180b + 810 (deg 3, 5.545s) |
| | | | lb. | N/A | N/A |
1  func compare(guess, secret) begin
2    i ≔ N; cmp ≔ 0;
3    while i > 0 do
4      tick(2);
5      while i > 0 do
6        if prob(0.5) then break fi
7        tick(5);
8        if cmp > 0 ∨ (cmp = 0 ∧ guess[i] > secret[i])
9        then cmp ≔ 1
10       else
11         tick(5);
12         if cmp < 0 ∨ (cmp = 0 ∧ guess[i] < secret[i])
13         then cmp ≔ −1
14         fi
15       fi
16       tick(1);
17       i ≔ i − 1
18     od
19   od;
20   return cmp
21 end

(b)
1 func check(guess) begin
2   cmp ≔ compare(guess, secret);
3   if cmp = 0 then
4     login()
5   fi
6 end

(a)
1  guess ≔ 0⃗;
2  i ≔ N;
3  while i > 0 do
4    next ≔ guess;
5    next[i] ≔ 1;
6    est ≔ estimateTime(K, check(next));
7    if est ≤ 13N − 1.5i then
8      guess[i] ≔ 0
9    else
10     guess[i] ≔ 1
11   fi;
12   i ≔ i − 1
13 od

(c)
Figure 16. (a) The interface of the password checker. (b) A function that compares two bit vectors, adding some random noise. (c) An attack program that attempts to exploit the timing properties of compare to find the value of the password stored in secret.
Toward this end, we can analyze the failure probability for setting guess[i] incorrectly, which happens when, due to an unfortunate fluctuation, the running-time estimate est is closer to one of 𝔼[T₁] and 𝔼[T₀], but the truth is actually the other. For instance, suppose that 𝔼[T₀] < 𝔼[T₁] and est < (𝔼[T₀] + 𝔼[T₁])/2; the attack program would pick T₀ as the truth, and set guess[i] to 0. If such a choice is incorrect, then the actual distribution of est on the i-th round of the attack program satisfies 𝔼[est] = 𝔼[T₁], and the probability of this failure event is

  ℙ[est < (𝔼[T₀] + 𝔼[T₁])/2]
  = ℙ[est − 𝔼[T₁] < (𝔼[T₀] − 𝔼[T₁])/2]
  = ℙ[est − 𝔼[est] < (𝔼[T₀] − 𝔼[T₁])/2]

under the condition given by the conjunction in (11). This formula has exactly the same shape as a tail probability, which makes it possible to utilize moments and concentration-of-measure inequalities [13] to bound the probability.
The attack program is parameterized by K > 0, which represents the number of trials it performs for each bit position to obtain an estimate of the running time. Assume that we have applied our central-moment-analysis technique (§3), and obtained the following inequalities on the mean (i.e., the first moment), the second moment, and the variance (i.e., the second central moment) of the quantities (11) and (12):

  𝔼[T₁] ≥ 13N,  𝔼[T₁] ≤ 15N,  𝕍[T₁] ≤ 26N² + 42N,   (13)
  𝔼[T₀] ≥ 13N − 5i,  𝔼[T₀] ≤ 13N − 3i,  𝕍[T₀] ≤ 8N − 36i² + 52Ni + 24i.   (14)
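These interval bounds can be sanity-checked by simulating the tick-cost model of compare directly. The following harness is hypothetical (it is not part of the analysis, and the trial count and seed are our own choices): it estimates 𝔼[T₁] and 𝔼[T₀] empirically and compares them with (13) and (14).

```python
import random

def compare_cost(guess, secret, N, rng):
    """Tick-cost of compare from Fig. 16(b) on N-bit vectors (bit i at index i-1)."""
    i, cmp, cost = N, 0, 0
    while i > 0:
        cost += 2                       # tick(2) on (re-)entering the outer loop
        while i > 0:
            if rng.random() < 0.5:      # prob(0.5): bounce back to the outer loop
                break
            cost += 5                   # tick(5)
            if cmp > 0 or (cmp == 0 and guess[i - 1] > secret[i - 1]):
                cmp = 1
            else:
                cost += 5               # tick(5) in the else-branch
                if cmp < 0 or (cmp == 0 and guess[i - 1] < secret[i - 1]):
                    cmp = -1
            cost += 1                   # tick(1)
            i -= 1
    return cost

rng = random.Random(0)
N, trials = 32, 20000

# Condition of (11): all bits agree and secret[i] = guess[i] = 1.
guess = [1] * N
mean_T1 = sum(compare_cost(guess, guess, N, rng) for _ in range(trials)) / trials

# One instance of the condition of (12) at bit i = 16: higher bits agree,
# secret[16] = 0, guess[16] = 1 (the bits below i do not affect the cost).
i = 16
secret0 = [1] * N
secret0[i - 1] = 0
mean_T0 = sum(compare_cost(guess, secret0, N, rng) for _ in range(trials)) / trials
print(mean_T1, mean_T0)  # near 13N = 416 and 13N - 5i = 336, within the bounds
```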
To bound the probability that the attack program makes an incorrect guess for the i-th bit, we do a case analysis:
• Suppose that secret[i] = 1, but the attack program assigns guess[i] ≔ 0. The truth, with respect to the actual distribution of the running time T of compare for the i-th bit, is that 𝔼[est] = 𝔼[T₁], but the attack program in Fig. 16(c) executes the then-branch of the conditional statement. Thus, our task reduces to that of bounding ℙ[est < 13N − 1.5i]. The estimate est is the average of K i.i.d. random variables drawn from a distribution with mean 𝔼[T₁] and variance 𝕍[T₁]. We derive the following, using the inequalities from (13):

  𝔼[est] = 𝔼[T₁] ≥ 13N,  𝕍[est] = 𝕍[T₁]/K ≤ (26N² + 42N)/K.   (15)

We now use Cantelli's inequality.

Proposition (Cantelli's inequality). If X is a random variable and a > 0, then we have ℙ[X − 𝔼[X] ≤ −a] ≤ 𝕍[X]/(𝕍[X] + a²) and ℙ[X − 𝔼[X] ≥ a] ≤ 𝕍[X]/(𝕍[X] + a²).
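For intuition, Cantelli's bound is attained by two-point distributions; a quick exact check on X uniform over {0, 1} (a sanity sketch, not from the paper):

```python
def mean(dist):
    return sum(p * x for x, p in dist)

def var(dist):
    mu = mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist)

def cantelli_upper(dist, a):
    """Upper bound on P[X - E[X] >= a] for a > 0."""
    v = var(dist)
    return v / (v + a * a)

# X is 0 or 1 with probability 1/2 each: mu = 0.5, var = 0.25.
dist = [(0, 0.5), (1, 0.5)]
exact = sum(p for x, p in dist if x - mean(dist) >= 0.5)  # P[X - mu >= 0.5]
bound = cantelli_upper(dist, 0.5)                         # 0.25/(0.25 + 0.25)
print(exact, bound)  # 0.5 0.5: the bound is attained
```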
We are now able to derive an upper bound on ℙ[est < 13N − 1.5i] as follows:

  ℙ[est ≤ 13N − 1.5i]
  = ℙ[est − 13N ≤ −1.5i]
  ≤ ℙ[est − 𝔼[est] ≤ −1.5i]      (since 𝔼[est] ≥ 13N by (15))
  ≤ 𝕍[est]/(𝕍[est] + (1.5i)²)    (Cantelli's inequality)
  = (26N² + 42N)/(26N² + 42N + 2.25Ki²).

• The other case, in which secret[i] = 0 but the attack program chooses to set guess[i] ≔ 1, can be analyzed in a similar way to the previous case, and the bound obtained is the following:

  ℙ[est > 13N − 1.5i]
  ≤ ℙ[est ≥ 13N − 1.5i]
  ≤ (8N − 36i² + 52Ni + 24i)/(8N − 36i² + 52Ni + 24i + 2.25Ki²).
Let F_i^1 and F_i^0, respectively, denote the two upper bounds on the failure probabilities for the i-th bit.
For the attack program to succeed, it has to succeed for all bits. If the number of bits is N = 32, and in each iteration the number of trials that the attack program uses to estimate the running time is K = 10⁴, we derive a lower bound on the success rate of the attack program from the upper bounds on the failure probabilities derived above:

  ℙ[Success] ≥ ∏_{i=1}^{32} (1 − max(F_i^1, F_i^0)) ≥ 0.219413,
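This number can be reproduced directly from the two Cantelli bounds; the following sketch assumes F_i^1 and F_i^0 are exactly the bounds derived in this section, with N = 32 and K = 10⁴:

```python
N, K = 32, 10_000

def F1(i):
    """Failure bound when secret[i] = 1 (Cantelli with the variance bound from (13))."""
    v = 26 * N * N + 42 * N
    return v / (v + 2.25 * K * i * i)

def F0(i):
    """Failure bound when secret[i] = 0 (Cantelli with the variance bound from (14))."""
    v = 8 * N - 36 * i * i + 52 * N * i + 24 * i
    return v / (v + 2.25 * K * i * i)

success_all = 1.0
for i in range(1, N + 1):
    success_all *= 1 - max(F1(i), F0(i))
print(success_all)  # close to the stated lower bound 0.219413
```

Note how the bound for small i (a long shared prefix) dominates the product, which is exactly the observation made below about the last few bits.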
which is low, but not insignificant. However, the somewhat low probability is caused by a property of compare: if guess and secret share a very long prefix, then the running-time behavior on different values of guess becomes indistinguishable. However, if instead we bound the success rate for all but the last six bits, we obtain:

  ℙ[Success for all but the last six bits] ≥ 0.830561,

which is a much higher probability! The attack program can enumerate all the possibilities to resolve the last six bits. (Moreover, by brute-forcing the last six bits, a total of 10,000 × 26 + 64 = 260,064 calls to check would be performed, rather than 320,000.)
Overall, our analysis concludes that the check and compare procedures in Fig. 16 are vulnerable to a timing attack.