checkdp: an automated and integrated approach for proving
TRANSCRIPT
CheckDP: An Automated and Integrated Approach for ProvingDifferential Privacy or Finding Precise Counterexamples
Yuxin Wang, Zeyu Ding, Daniel Kifer, Danfeng ZhangThe Pennsylvania State University
{yxwang,zyding}@psu.edu,{dkifer,zhang}@cse.psu.edu
ABSTRACTWe propose CheckDP, an automated and integrated approach forproving or disproving claims that a mechanism is differentiallyprivate. CheckDP can find counterexamples for mechanisms withsubtle bugs for which prior counterexample generators have failed.Furthermore, it was able to automatically generate proofs for correctmechanisms for which no formal verification was reported before.CheckDP is built on static program analysis, allowing it to be moreefficient and precise in catching infrequent events than samplingbased counterexample generators (which run mechanisms hun-dreds of thousands of times to estimate their output distribution).Moreover, its sound approach also allows automatic verification ofcorrect mechanisms. When evaluated on standard benchmarks andnewer privacy mechanisms, CheckDP generates proofs (for cor-rect mechanisms) and counterexamples (for incorrect mechanisms)within 70 seconds without any false positives or false negatives.
CCS CONCEPTS⢠Security and privacyâ Logic and verification; ⢠Theory ofcomputationâ Program analysis.
KEYWORDSDifferential privacy; formal verification; counterexample detectionACM Reference Format:Yuxin Wang, Zeyu Ding, Daniel Kifer, Danfeng Zhang. 2020. CheckDP: AnAutomated and Integrated Approach for Proving Differential Privacy orFinding Precise Counterexamples. In Proceedings of the 2020 ACM SIGSACConference on Computer and Communications Security (CCS â20), November9â13, 2020, Virtual Event, USA. ACM, New York, NY, USA, 20 pages. https://doi.org/10.1145/3372297.3417282
1 INTRODUCTIONDifferential privacy [27] has been adopted in major data sharinginitiatives by organizations such as Google [15, 29], Apple [48],Microsoft [22], Uber [36] and the U.S. Census Bureau [1, 17, 35, 41].It allows these organizations to collect and share data with provablebounds on the information that is leaked about any individual.
Crucial to any differentially private system is the correctness ofprivacy mechanisms, the underlying privacy primitives in larger
Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than theauthor(s) must be honored. Abstracting with credit is permitted. To copy otherwise, orrepublish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected] â20, November 9â13, 2020, Virtual Event, USA© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.ACM ISBN 978-1-4503-7089-9/20/11. . . $15.00https://doi.org/10.1145/3372297.3417282
privacy-preserving algorithms. Manually developing the necessaryrigorous proofs that a mechanism correctly protects privacy is asubtle and error-prone process. For example, detailed explanationsof significant errors in peer-reviewed papers and systems can befound in [21, 40, 42]. Such mistakes have led to research in the ap-plication of formal verification for proving that mechanisms satisfydifferential privacy [3, 5, 7, 9â11, 51, 52]. However, if a mechanismhas a bug making its privacy claim incorrect, these techniques can-not disprove the privacy claims â a counterexample detector mustbe used instead [14, 23, 34]. Finding a counterexample is typicallya two-phase process that (1) first searches an infinitely large spacefor candidate counterexamples and then (2) uses an exact symbolicprobabilistic solver like PSI [33] to verify that the counterexam-ple is indeed valid. The search phase currently presents the mostproblems (i.e., large runtimes or failure to find counterexamples aremost often attributed to the search phase). Earlier search techniqueswere based on sampling (running a mechanism hundreds of thou-sands of times), which made them slow and inherently imprecise:even with enormous amounts of samples, they can still fail if aprivacy-violating section of code is not executed frequently enoughor if the actual privacy cost is slightly higher than the privacy claim.Recently, static program analyses were proposed to accomplishboth goals [4, 30]. However, they either only analyze a non-trivialbut restricted class of programs [4], or rely on heuristic strategieswhose effectiveness on many sutble mechanisms is unclear [30].
In this paper, we present CheckDP, an automated and integratedtool for proving or disproving the correctness of a mechanism thatclaims to be differentially private. Significantly, CheckDP automati-cally finds counterexamples via static analysis, making it unneces-sary to run the mechanism. Like prior work [14], CheckDP still usesPSI [33] at the end. However, replacing sampling-based search withstatic analysis enables CheckDP to find violations in a few seconds,while previous sampling-based methods [14, 23] may fail even afterrunning for hours. Furthermore, sampling-based methods may stillrequire manual setting of some program inputs (e.g., DP-Finder [14]requires additional arguments to be set manually for Sparse VectorTechnique in our evaluation) while CheckDP is fully automated.Furthermore, the integrated approach of CheckDP allows it to effi-ciently analyze a larger class of differentially privacy mechanisms,compared with concurrent work using static analyses [4, 30].
Meanwhile, CheckDP still offers state-of-the-art verification ca-pability compared with existing language-based verifiers and isfurther able to automatically generate proofs for 3 mechanismsfor which no formal verification was reported before. CheckDPtakes the source code of a mechanism along with its claimed levelof privacy and either generates a proof of correctness or a verifi-able counterexample (a pair of related inputs and a feasible output).
CheckDP is built upon a proof technique called randomness align-ment [24, 51, 52], which recasts the task of proving differentialprivacy into one of finding alignments between random variablesused by two related runs of the mechanism. CheckDP uses a novelverify-invalidate loop that alternatively improves tentative proofs(in the form of alignments), which are then used to improve ten-tative counterexamples (and vice versa) until either the tentativeproof has no counterexample, or the tentative counterexample hasno alignment. We evaluated CheckDP on correct/incorrect versionsof existing benchmarks and newly proposed mechanisms. It gener-ated a proof for each correct mechanism within 70 seconds and acounterexample for each incorrect mechanism within 15 seconds.
In summary, this paper makes the following contributions:(1) CheckDP, one of the first automated tools (with concurrent
work [4, 30]) that generates both proofs for correct mechanismsand counterexamples for incorrect mechanisms (Section 2.4).
(2) A syntax-directed translation from the probabilistic mechanismbeing checked to non-probabilistic target code with explicitproof obligations (Section 3).
(3) An alignment template generation algorithm (Section 3.4).(4) A novel verify-invalidate loop that incrementally improves
tentative proofs and counterexamples (Section 4).(5) Case studies and experimental comparisons between CheckDP
and existing tools using correct/incorrect versions of existingbenchmarks and newly proposed mechanisms. For incorrectmechanisms, CheckDP automatically found counterexamples inall cases, even in cases where competing methods [14, 23] failed.For correct mechanisms, CheckDP automatically generatedproofs of privacy, including proofs for 3 mechanisms for whichno formal verification was reported before (Section 5).
2 PRELIMINARIES AND RUNNING EXAMPLE2.1 Differential PrivacyAmong several popular variants of differential privacy [16, 26, 27,43], we focus on pure differential privacy [27]. The goal of differ-ential privacy is to hide the effect of any personâs record on theoutput of an algorithm. This is achieved by considering all pairs ofdatasets ð· and ð· â² that differ on one record. We call such datasetsadjacent and denote it by ð· ⌠ð· â². To offer privacy, a differentiallyprivate algorithm injects carefully calibrated random noise dur-ing its computation. Given a pair of datasets (ð·,ð· â²), we call theexecution of an algorithm on ð· the original execution and the ex-ecution on (neighboring) ð· â² the related execution. Intuitively, wesay a randomized algorithm is differentially private if the outputdistribution of the original execution and its related execution arehard to distinguish for all such dataset pairs:
Definition 1 (Pure Differential Privacy [25]). Let ð ⥠0. Aprobabilistic computation ð : D â O is ð-differentially private iffor every pair of neighboring datasets ð· ⌠ð· â² â D and every outputð â O, P[ð (ð·) = ð] †ðð P[ð (ð· â²) = ð] .
Often, a differentially private algorithmð interacts with a datasetð· through a list of queries ð1, ð2, . . .: it iteratively runs a query ððon ð· to get an exact answer ðð , then performs some randomizedcomputation on the set of query answers {ð ð | ð †ð}. We call thevector (ð1, ð2, . . .) along with other data-independent parameters to
ð (e.g., privacy parameter ð) an input toð . The notion of adjacentdatasets translates into the notion of sensitivity on those queries:
Definition 2 (Global Sensitivity [28]). The global sensitivityof a query ð is Îð = supð·âŒð·â²
ᅵᅵð (ð·) â ð (ð· â²)ᅵᅵ.
We say two inputs ððð = {(ð1, ð2, . . .), params} and ððð â² =
{(ðâ²1, ðâ²2, . . .), params} are adjacentwith respect to the queries ð1, ð2, . . .,
and write ððð ⌠ððð â², if the params are the same and there existtwo adjacent datasets ð· and ð· â² such that (ð1 (ð·), ð2 (ð·), . . . ) =
(ð1, ð2, . . .) and (ð1 (ð· â²), ð2 (ð· â²), . . . ) = (ðâ²1, ðâ²2, . . .). Note that this
implies thatᅵᅵᅵðð â ðâ²ð ᅵᅵᅵ †Îðð ,âð . It follows that differential privacy
can be proved by showing that for all pair of inputs ððð ⌠ððð â² andall outputs ð â O, P[ð (ððð) = ð] †ðð P[ð (ððð â²) = ð]. As stan-dard, we assume that the sensitivity of inputs are either manuallyspecified or computed by sensitivity analysis tools (e.g., [32, 44]).
Many mechanisms are built on top of the Laplace Mechanism[27] which adds Laplace noise to query answers:
Theorem 1 (Laplace Mechanism [27]). Let ð > 0, let ð· bea dataset, let ð be a query with sensitivity Îð and let ð = ð (ð·).The Laplace Mechanism which, on input ð, outputs ð + [ (where[ is sampled from the Laplace distribution with mean 0 and scaleparameter Îð /ð) satisfies ð-differential privacy.
We sometimes abuse notation and refer to the sensitivity Îð of anumerical value ð â we always take this to mean as the sensitivityof the function that produced ð.
2.2 Randomness AlignmentRandomness alignment is a simple yet powerful proof techniquethat underpins the verification tools LightDP [52] and its successorShadowDP [51]. Precise reasoning using this proof technique wasused to improve a variety of algorithms, allowing them to releasestrictly more information at the same privacy cost [24]. Given twoexecutions of a randomized algorithmð onð· andð· â² respectively, arandomness alignment is a mapping between the random variablesin the first execution to random variables in the second executionthat will cause the second execution to always produce the sameoutput as the first. Upper bounds on privacy parameters depend onhow much the random variables change under this mapping [52].
We use the Laplace Mechanism [28] to illustrate the key ideasbehind randomness alignment. Let ð· ⌠ð· â² be a pair of neighboringdatasets and let ð be a query with sensitivity Îð . Let ð = ð (ð·)and ðâ² = ð (ð· â²) be the respective query answers. If we use theLaplace Mechanism to answer these queries with privacy, on inputð (resp. ðâ²) it will output ð + [ (resp. ðâ² + [ â²) where [ (resp. [ â²) is aLaplace random variable with scale Îð /ð . In order for the LaplaceMechanism to produce the same output in both executions, weneed ð + [ = ðâ² + [ â² and therefore [ â² = [ + ð â ðâ². This createsa âmappingâ between the values of random noises: if we changethe input from ð to ðâ², we need to adjust the random noise by anamount of ð â ðâ² (i.e., this is the distance we need to move [ â² to getto [). Clearly
ᅵᅵð â ðâ²ï¿œï¿œ †Îð by definition of sensitivity. The privacyproof follows from the fact that if two random samples [ and [ â²
(from the Laplace distribution with scale Îð /ð) are at most distanceÎð apart, the ratio of their probabilities is at most ðð . Hence, theprivacy cost, the natural log of this ratio, is bounded by ð .
Thus randomness alignment can be viewed in terms of distancesthat we need tomove random variables. Letð ⌠ðâ² be query answersfrom neighboring datasets andð be a randomized algorithm whichuses a set of random noisesð» =
{[}. We associate to every random
variable [ a numeric value [ which tracks precisely the amount invalue we need to change [ in order to obtain the same output whenthe input toð is changed from ð to ðâ². In other words, the outputof ð with input ð and random values
{[}is the same as that of
ð with input ðâ² and random values{[ + [
}. Taking ð to be the
Laplace Mechanism, then the alignment in the previous paragraphis{[ = ð â ðâ²
}. Note that the alignment is a function that depends
onð as well as ð and ðâ².If all of the random variables are Laplace, the cost of an alignment
is the summation of distancenoise scale for each random variable. To find
the overall privacy cost (e.g., the ð in differential privacy), we thenfind an upper bound on the alignment cost for all related ð and ðâ².
2.3 Privacy Proof and CounterexampleNot all randomness alignments serve as proofs of differential pri-vacy. To form a proof, one must show that (1) the alignment forcesthe two related executions to produce the same output, (2) the pri-vacy cost of an alignment must be bounded by the promised level ofprivacy, and (3) the alignment is injective. Hence, in this paper, an(alignment-based) privacy proof refers to a randomness alignmentthat satisfies these requirements.
On the other hand, to show that an algorithm violates differentialprivacy, it suffices to demonstrate the existence of a counterexample.Formally, if an algorithmð claims to satisfy ð-differential privacy,a counterexample to this claim is a triple (ððð, ððð â², ð) such thatððð ⌠ððð â² and P[ð (ððð) = ð] > ðð P[ð (ððð â²) = ð].
Challenges. LightDP [52] and ShadowDP [51] can check if amanually generated alignment is an alignment-based privacy proof.On the other hand, an exact symbolic probabilistic solver, such asPSI [33], can check if a counterexample, either generated manuallyor via a sampling-based generator, witnesses violation of differ-ential privacy. To the best of our knowledge, CheckDP is the firsttool that automatically generates alignment-based proofs/counterex-amples via static program analysis.1 To do so, a key challenge isto tackle the infinite search space of proofs (i.e., alignments) andcounterexamples. CheckDP uses a novel proof template generationalgorithm to reduce the search space of candidate alignments (Sec-tion 3) and uses a novel verify-invalidate loop (Section 4) to findtentative proofs, counterexamples showing their privacy cost is toohigh, improved proofs, improved counterexamples, etc.
2.4 Running ExamplesTo illustrate our approach, we now discuss two variants of theSparse Vector Technique [28], one correct and one incorrect. Usingthe two variants, we sketch how CheckDP automatically proves/dis-proves (as appropriate) their claimed privacy properties.
Sparse Vector Technique (SVT) [28]. Apowerfulmechanism provento satisfy differential privacy. It can be used as a building block for
1Prior work [3] automatically generates coupling proofs, an alternative language-basedproof technique for differential privacy. But all existing verifiers using alignment-basedproofs[51, 52] require manually provided alignments.
many advanced differentially private algorithms. This mechanismis designed to solve the following problem: given a series of queriesand a preset public threshold, we want to identify the firstð querieswhose answers are above the threshold, but in a privacy-preservingmanner. To achieve this, it adds independent Laplace noise both tothe threshold and each query answer, then it returns the identitiesof the first ð queries whose noisy answers are above the noisythreshold. The standard implementation of SVT outputs true forthe above-threshold queries and false for the others (and termi-nates when there are a total of ð outputs equal to true). We usetwo variants of SVT for an overview of CheckDP.
GapSVT. This is an improved (and correct) variant of SVT whichprovides numerical information about some queries. When a noisyquery exceeds the noisy threshold, it outputs the difference betweenthese noisy values; otherwise it returns false. This provides anestimate for how much higher a query is compared to the threshold.The algorithm was first proposed and verified in [51]; its pseudocode is shown in Figure 1. Here, Lap (2/ð) draws one sample fromLaplace distribution with mean 0 and scale factor of 2/ð . This ran-dom value is then added to the public threshold ð (stored as noisythresholdð[ ). For each query answer, another independent Laplacenoise [2 = Lap (4ð /ð) is added. If the noisy query answer q[i] +[2 is above the noisy threshold ð[ , the gap between them (q[i] +[2 âð[ ) is added to the output list out, otherwise 0 is added.
One key observation from the manual proofs of SVT and itsvariants [21, 24, 28, 40] is that the privacy cost is only paid forthe queries whose noisy answers are above the noisy threshold. Inother words, outputting false does not incur any new privacy cost.Correspondingly, the correct alignment for GapSVT [24, 51] (that is,the distance that [1 and [2 need to be moved to ensure the outputis the same when the input changes from ð [ð] to ðâ²[ð] â¡ ð [ð] + ᅵᅵ [ð],for all ð) is: [1 : 1 and [2 : q[i] + [2 ⥠ð[ ? (1 - q[i]) : 0.
Note that [2 is aligned with non-zero distance only under thetrue branch; hence, no privacy cost is paid in the other branch. Itis easy to verify that if every query has sensitivity 1, the cost of thisalignment is bounded by ð .
BadGapSVT. We also consider a variant of SVT (and GapSVT)that incorrectly tries to release numerical information. When anoisy query answer is larger than the noisy threshold, the variantreleases that noisy query answer (that is, it does not subtract from itthe noisy threshold); otherwise it outputs false. This is an incorrectvariant of SVT [45] that was reported in [40] and was called iSVT4in [23]. More precisely, BadGapSVT replaces line 7 of GapSVTwith out := (q[i] + [2)::out;. This small change makes itnot ð-differentially private [40]. The reason why is subtle, but theintuition is the following. Suppose BadGapSVT returns a noisyquery answer q[i] + [2 = 3, the attacker is able to deduce thatð[ †3. Once this information is leaked, outputting false in theelse branch is no longer âfreeâ; every output incurs a privacy cost.
2.5 Approach OverviewWe use GapSVT and BadGapSVT to illustrate how CheckDP gener-ates proofs and counterexamples.
Code Transformation (Section 3). CheckDP first takes the proba-bilistic algorithm being checked, written in the CheckDP language
function GapSVT (T,N,size : num0 ,q : list numâ )
returns (out : list num0 ), check(ð)precondition â i. â1 †q[i] †11 [1 := Lap (2/ð)2 ð[ := ð + [1;3 count := 0; i := 0;4 while (count < N ⧠i < size)5 [2 := Lap (4ð /ð)6 if (q[i] + [2 ⥠ð[ ) then7 out := (q[i] + [2 - ð[ )::out;8 count := count + 1;9 else
10 out := false::out;11 i := i + 1;
function Transformed GapSVT (T,N,size,q, q, ð ððððð , \ )returns (out)12 vð := 0; idx = 0;
13 [1 := ð ððððð[idx]; idx := idx + 1;
14 vð := vð + |A1 | Ã ð/2; [1 := A1;
15 ð[ := ð + [1;
16 ð[ := [1;
17 count := 0; i := 0;18 while (count < N ⧠i < size)19 [2 := ð ððððð[idx]; idx := idx + 1;
20 vð := vð + |A2 | Ã ð/4ð; [2 := A2;
21 if (q[i] + [2 ⥠ð[ ) then
22 assert(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[);
23 assert(q[i] + [2 - ð[ == 0);
24 out := (q[i] + [2 - ð[ )::out;25 count := count + 1;26 else
27 assert(¬(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[));
28 out := false::out;29 i := i + 1;30 assert(vð †ð);
Figure 1: GapSVT and its transformed code, where under-lined parts are added by CheckDP. The transformed codecontains two alignment templates for [1 and [2: A1 = \ [0]and A2 = (ð [ð] + [2 ⥠ð[ ) ? (\ [1] + \ [2] à ð[ + \ [3] à ᅵᅵ [ð]) :(\ [4] +\ [5] Ãð[ +\ [6] à ᅵᅵ [ð]). The random variables and \ areinserted as part of the function input.
(Section 3.1), and generates the non-probabilistic target code withassertions and alignment templates (i.e. templates for possible align-ments). The bottom of Figure 1 shows the transformed code ofGapSVT with alignment templates. The transformed code is dis-tinguished from the source code in a few important ways: (1) Theprobabilistic sampling commands (at lines 1 and 5) are replacedby non-probabilistic counterparts that read samples from the in-strumented function input ð ððððð . (2) An alignment template (e.g.,A1, A2) is generated for each sampling command; each templatecontains a few holes, i.e., \ , which is also instrumented as func-tion input. (3) A distinguished variable vð is added to track theoverall privacy cost and lines 14 and 20 update the cost variable ina sound way. (4) Assertions are inserted in the transformed code
(lines 22,23,27,30) to ensure the following soundness property:ifð (ððð) is transformed toð â²(ððð, ððð, ð ððððð, \ ), then
â\ . âððð, ððð, ð ððððð. all assertions inð â² pass
=â ð is differentially private
We note that the transformed code forms the basis for both proofand counterexample generation in CheckDP.
Proof/Counterexample Generation (Section 4). Inspired by theCounterexample Guided Inductive Synthesis (CEGIS) [46] tech-nique, originally proposed for program synthesis, CheckDP uses averify-invalidate loop to simultaneously generate proofs and coun-terexamples. Unlike CEGIS, however, the verify-invalidate loopis bidirectional, in the sense that it internally records all previouscounterexamples (resp. proofs) to generate one proof (resp. coun-terexample) as the algorithm output. On the other hand, the CEGISloop is unidirectional: it only collects and uses a set of inputs toguide synthesis internally. At a high level, the verify-invalidateloop of CheckDP includes two integrated sub-loops, one for proofgeneration and the other for counterexample generation.
Verify Sub-loop. Its goal is to generate a proof (i.e., an instantia-tion of \ ) such that âððð, ððð, ð ððððð. all assertions inð â² pass
This is done by two iterative phases:(1) Generating invalidating inputs: Given a proof candidate (i.e.,
an instantiation of \ ), it is incorrect ifâððð, ððð, ð ððððð. some assertion inð â² fails
We use ðŒ to denote a triple of ððð, ððð, ð ððððð . Hence, given anyinstantiation of \ , we use an off-the-shelf symbolic executiontool such as KLEE [18] to find invalidating inputs when possible.
(2) Generating proof candidates: with a set of invalidating inputsfound so far ðŒ1, · · · , ðŒð , we can try to generate a new proofcandidate to satisfy â\ . ð â²(ðŒ1, \ ) ⧠· · · â§ð â²(ðŒð , \ ).Starting from a default instantiation (e.g., one that sets âð . \ [ð] =
0), CheckDP iteratively repeats Phases 1 and 2. Since CheckDP usesall invalidating inputs found so far in Phase 2, the proof candidateafter each iteration is improving.When Phase 1 gets stuck, CheckDPobtains a proof candidate \ which is a privacy proof if
âððð, ððð, ð ððððð. ð â²(ððð, ððð, ð ððððð, \ )due to the soundness property above. Hence, a proof (alignment) canbe validated by program verification tools such as CPAChecker [13].For GapSVT, CheckDP generates and verifies (via CPAChecker) that\ = {1, 1, 0,â1, 0, 0, 0} results in a proof that GapSVT satisfies ð-differential privacy.
Invalidate Sub-loop. While the verify sub-loop is conceptuallysimilar to a CEGIS loop [46], CheckDP also employs an invalidatesub-loop (integrated with the verify sub-loop); its goal is to generateone invalidating input ðŒ such that â\ . some assertion inð â² fail. Thisis done by two iterative phases:2
(1) Generating proof candidates: Given an invalidating input ðŒ ,it is incorrect if â\ . ð â²(ðŒ , \ ). Hence, given any ðŒ , we can useKLEE [18] to find an alignment when possible.
2Note that a set of invalidating inputs ðŒ1, · · · , ðŒð , generated from Phase 2 of the verifysub-loop is not a counterexample candidate, since by definition, a differential privacycounterexample consists of only one invalidating input.
Reals ð â RBooleans ð â {true, false}Vars ð¥ â VRand Vars [ â HLinear Ops â ::= + | âOther Ops â ::= à | /Comparators â ::= < |> |=|â€|â¥Bool Exprs b ::= true | false | ð¥ | ¬b | n1 â n2Num Exprs n ::= ð | ð¥ | [ | n1 â n2 | n1 â n2 | b ? n1 : n2Expressions ð ::= n | b | ð1 :: ð2 | ð1 [ð2]Commands ð ::= skip | ð¥ := ð | [ := ð | ð1; ð2 |
if ð then (ð1) else (ð2) |while ð do (ð) | return ð
Rand Exps ð ::= Lap ðTypes ð ::= numd | bool | list ðDistances d ::= 0 | â
Figure 2: CheckDP: language syntax.
(2) Generating counterexamples: with a set of previously foundalignments \1, · · · , \ð , we try to find a new invalidating inputto satisfy âðŒ .¬ð â²(ðŒ , \1) ⧠· · · ⧠¬ð â²(ðŒ , \ð ).
To integrate with the verify sub-loop, Phase 1 of the invalidatesub-loop starts when Phase 2 of the verify sub-loop gets stuck witha set of invalidating inputs ðŒ1, · · · , ðŒð ; it uses ðŒð to proceed since it isthe most promising one. When Phase 1 of invalidate sub-loop getsstuck, CheckDP obtains a counterexample candidate, which can bevalidated by PSI [33] (this is necessary since a mechanism mightbe differentially private even if no alignment-based proof exists).
For example, the counterexample found for BadGapSVT setsthe threshold ð = 0, ð = 1 (max number of outputs equal totrue before termination), neighboring inputs ð = [0, 0, 0, 0, 0] andðâ² = [1, 1, 1, 1,â1], and the following output to examine [0, 0, 0, 0, 1].PSI confirms that the probability of this output when ð is an inputis ⥠ðð times the probability of this output when ðâ² is the input.
When Phase 1 of the invalidate sub-loop generates a new align-ment \ , which happens in our empirical study (Section 5), Phase2 follows to generate an âimprovedâ invalidating input, which isthen used to start Phase 2 of the validate sub-loop.
3 PROGRAM TRANSFORMATIONCheckDP takes a probablistic program along with an adjacencyspecification (i.e., how much two adjacent inputs can differ) andthe claimed level of differential privacy as inputs. It translates thesource code into a non-probabilistic program with assertions toensure differential privacy. The transformed code forms the basisof finding a proof or a counterexample (Section 4).
3.1 SyntaxThe syntax of CheckDP source code is listed in Figure 2. Most ofthe syntax is standard with the following features:⢠Real numbers, booleans and their standard operations;⢠Ternary expressions b ? n1 : n2, it returns n1 when b evaluatesto true or n2 otherwise;⢠List operations: ð1 :: ð2 appends element ð1 to list ð2, and ð1 [ð2]gets the ðth2 element of list ð1;⢠Loop with keyword while and branch with keyword if;⢠A final return command return ð .
We now introduce other interesting parts that are needed fordeveloping differentially private algorithms.
Random Expressions. Differential privacy relies heavily on prob-abilistic computations: many mechanisms achieve differential pri-vacy by adding appropriate random noise to variables. Tomodel thisbehavior, we embed a sampling command [ := Lap ð in CheckDP,which draws a sample from the Laplace distribution with mean 0and scale of ð . In this paper, we only focus on the most interestingsampling command Lap ð (which is used in Laplace Mechanismand GapSVT in Section 2). However, we note that it is fairly easyto add new sampling distributions to CheckDP.
For clarity, we distinguish variables holding random values, de-noted by [ â H, from other ones, denoted by ð¥ â V.
Types with Distances. To enable alignment-based proof, one im-portant aspect of the type system in CheckDP is the ability tocompute and track the distances for each program variable. Moti-vated by verification tools using alignments (e.g., LightDP [52] andShadowDP [51]), types in the source language of CheckDP havethe form of B0 or Bâ, where B is the base type such as numerics(num), booleans (bool) and lists (list ð ). The subscript of each typeis the key to alignment-based proofs: it explicitly tracks the exactdifference between the value of a variable in two related runs.
In the source language of CheckDP, the distances can eitherbe 0 or â: the former indicates the variables stay the same in therelated runs; the latter means that the variable might hold differentvalues in two related runs and the value difference is stored in adistinguished variable ᅵᅵ added by the program transformation (i.e.,a syntactic sugar for dependent sum type
â(ᅵᅵ : num0) Bᅵᅵ ). For ex-
ample, inputs T,N,size are annotated with distance 0 in Figure 1,meaning that they are public parameters to the algorithm; query an-swers ð are annotated with distance â, meaning that each ð [ð] differby exactly ᅵᅵ [ð] in two related runs. The type system distinguisheszero-distance variables as an optimization: as we show shortly, ithelps to reduce the code size for later stages (Section 3.3) as well asaids proof template generation (Section 3.4).
Note that boolean types (bool) and list types (list ð) cannotbe associated with numeric distances, hence omitted in the syntax.However, nested cases such as list numâ still accurately track thedistances of the elements inside the list.
The semantics of CheckDP follows the standard definitions ofprobabilistic programs [37]; the formal semantics can be found inthe full version of this paper [50]. Finally, CheckDP also supportsshadow execution, a technique that underpins ShadowDP [51] andis crucial to the verification of challenging mechanisms such asReport Noisy Max [25]. However, in order to focus on the mostinteresting parts of CheckDP, we first present the transformationwithout shadow execution, and later discuss how to support it.
3.2 Program TransformationCheckDP is equipped with a flow-sensitive type system whosetyping rules are shown in Figure 3. At command level, each rule hasthe following format: ⢠Π{ð â ð â²} Îâ² where a typing environmentÎ tracks for each program variable its typewith distance, ð and ð â² arethe source and target programs respectively, and the flow-sensitivetype system also updates typing environment to Îâ² after command
Transformation rules for expressions with form Π⢠ð : Bn
Π⢠ð : num0 | true(T-Num)
Π⢠ð : bool | true (T-Boolean)Î, ð¥ : B0 ⢠ð¥ : B0 | true
(T-VarZero)
Î, ð¥ : Bâ ⢠ð¥ : Bð¥ | true(T-VarStar)
Π⢠ð : bool | CΠ⢠¬ð : bool | C (T-Neg)
Π⢠ð1 : Bn1 | C1 Π⢠ð2 : Bn2 | C2Π⢠ð1 â ð2 : Bn1ân2 | C1 ⧠C2
(T-OPlus)
Π⢠ð1 : numn1 | C1 Π⢠ð2 : numn2 | C2Π⢠ð1 â ð2 : num0 | C1 ⧠C2 ⧠(n1 = n2 = 0) (T-OTimes)
Π⢠ð1 : numn1 | C1 Π⢠ð1 : numn2 | C2
Π⢠ð1 â ð2 : bool | C1 ⧠C2 â§(ð1 â ð2) â
(ð1 + n1) â (ð2 + n2)(T-ODot)
Π⢠ð1 : Bn1 | C1 Π⢠ð2 : list Bn2 | C2Π⢠ð1 :: ð2 : list Bn | C1 ⧠C2 ⧠(n1 = n2 = 0) (T-Cons)
Π⢠ð1 : list ð | C1 Π⢠ð2 : numn | C2Π⢠ð1 [ð2 ] : ð | C1 ⧠C2 ⧠(n = 0) (T-Index)
Π⢠ð1 : bool | C1 Π⢠ð2 : Bn1 | C2 Π⢠ð3 : Bn2 | C3Π⢠ð1 ? ð2 : ð3 : Bn1 | C1 ⧠C2 ⧠C3 ⧠(n1 = n2)
(T-Select)
Transformation rules for commands with form ⢠Π{ð â ðâ² } Îâ²
Π⢠ð : Bn | C âšd, ð â© ={âš0, skipâ©, if n == 0,âšâ, ᅵᅵ := nâ©, otherwise
⢠Π{ð¥ := ð ;â assert(C) ;ð¥ := ð ;ð } Î [ð¥ âŠâ Bd ](T-Asgn)
⢠Π{ð1 â ðâ²1 } Î1 ⢠Î1 {ð2 â ðâ²2 } Î2⢠Π{ð1;ð2 â ðâ²1;ð
â²2 } Î2
(T-Seq)
Π⢠ð : Bn | C⢠Π{return ð â assert(C ⧠n = 0) ; return ð } Î (T-Return) ⢠Π{skip â skip} Î (T-Skip)
Π⢠ð : bool | C ⢠Πâ Îð {ð â ðâ² } Îð Î, Î â Îð â ðð Îð , Î â Îð â ðâ²â²
⢠Π{while ð do ð â ðð ; (while ð do (assert(C) ;ðâ²;ðâ²â²)) } Î â Îð(T-While)
Π⢠ð : bool | C ⢠Π{ðð â ðâ²ð } Îð Îð , Î1 â Î2 â ðâ²â²ð ð â {1, 2}⢠Π{if ð then ð1 else ð2 â if ð then (assert(C) ;ðâ²1;ðâ²â²1 ) else (assert(C) ;ðâ²2;ðâ²â²2 ) } Î1 â Î2
(T-If)
A = GenerateTemplate(Î,All Assertions) ðð = assert( ( ([ + A) {[1/[ } = ([ + A) {[2/[ } â [1 = [2))
⢠Π{ðð ;[ := Lap ð â [ := ð ððððð [ððð¥ ]; ððð¥ := ððð¥ + 1; vð := vð + | A |/ð ; [ := A ; } Î [[ âŠâ numâ ](T-Laplace)
Transformation rules for merging environments
Î1 â Î2 ð = {ᅵᅵ := 0 | Î1 (ð¥) = num0 ⧠Î2 (ð¥) = numâ }Î1, Î2 â ð
Figure 3: Program transformation rules. Distinguished variable vð and assertions are added to ensure differential privacy.
ð . At a high-level, the type system transforms the probabilisticsource code ð into the non-probabilistic target code ð â² in a way thatif all assertions in ð â² holds, then ð is differentially private.
CheckDPâs program transformation is motivated by those ofLightDP and ShadowDP [51, 52], all built on randomness alignmentproof. However, there are a few important differences:⢠CheckDP generates an alignment template for each sampling in-struction, rather than requiring manually provided alignments.⢠CheckDP defers all privacy-related checks to assertions. This iscrucial since information needed for proof and counterexamplegeneration is unavailable in a lightweight static type system.⢠CheckDP only tracks if a variable has the same value in tworelated runs (with distance 0) or not (with distance â). Thisdesign aids alignment template generation and reduces the sizeof transformed code.
Checking Expressions. Each typing rule for expression ð computesthe correct distance for its resulting value: Π⢠ð : Bn | C, whichreads as: expression ð has type B and distance n under the typingenvironment Î if the constraints C are satisfied. The reason to
collect constraints ð¶ instead of statically checking them, is to deferall privacy-related checks to later stages.
Most of the expression rules are straightforward: they check thebase types (just like a traditional type system) and compute thedistance of ðâs value in two related runs. For example, all constantsmust be identical (Rules (T-Num,T-Boolean)) and the distance of avariable is retrieved from the environment (T-VarZero,T-VarStar)(note that rule (T-VarStar) just desugers the â notation). For linearoperation (â), the distance of the result is computed in a preciseway (Rule (T-OPlus)), while the other operations are treated in amore conservative way: constraints are generated to ensure thatthe result is identical in Rules (T-OTimes, T-ODot). For example,(T-ODot) ensures boolean value of ð1 â ð2 will be the same in tworelated runs by adding a constraint
(ð1 â ð2) â (ð1 + n1) â (ð2 + n2)(T-Cons) restricts constructed list elements to have 0-distance (notethat the restriction does not apply to input lists), while (T-Index)requires the index to have zero-distance. Rule (T-Select) restrictsð1 and ð2 to have the same distance. The constraints gathered in the
expression rules will later be explicitly instrumented as assertionsin the translated programs, which we will explain shortly.
3.3 Checking CommandsFor each program statement, the type system updates the typing en-vironment and if necessary, instruments code to update ᅵᅵ variablesto the correct distances. Moreover, it ensures that the two relatedruns take the same branch in if-statement and while-statement.
Flow-Sensitivity. Each typing rule updates the typing environ-ment to track if a variable has zero-distance. When a variable hasnon-zero distance, it instruments the source code to properly main-tain the corresponding ᅵᅵ variables. The most interesting rules are:rule (T-Asgn) properly promotes the type of ð¥ to be Bâ (tracked bydistance variables) in Îâ² if the distance of ð is not 0. Meanwhile itoptimizes away updates to x and properly downgrades type to B0if ð has a zero-distance. For example, line 16 in GapSVT (Figure 1)is instrumented to update distance of ð[ , according to the distanceofð +[1. Moreover, variable count in GapSVT always has the typenum0; therefore its distance variable never appears in the translatedprogram due to the optimization in (T-Asgn).
Rule (T-If) and (T-While) are more complicated since they bothneed to merge environments. In rule (T-If), as ð1 and ð2 mightupdate Î to Î1 and Î2 respectively, we need to merge them in anatural way: the distance of a type form a two-level lattice with0 â â. Thus we define a union operator â for distances d as:
d1 â d2 â
{d1 if d1 = d2â otherwise
therefore the union operator for two environments are defined asfollows: Î1 â Î2 = _ð¥. Î1 [ð¥] â Î2 [ð¥].
Moreover, we use an auxiliary function Î1, Î2 â ð to âpromoteâa variable to star type. For example, with Î(ð¥) = â, Î(ðŠ) = â andÎ(ð) = 0, rule (T-If) translates the source codeif ð then ð¥ := ðŠ else ð¥ := 1 to the following:if ð then (ð¥ := ðŠ; ᅵᅵ := ᅵᅵ; ) else (ð¥ := 1; ᅵᅵ := 0)where ᅵᅵ := ᅵᅵ is instrumented by (T-Asgn) and ᅵᅵ := 0 is instru-mented due to the promotion.
Similarly, the typing environments are merged in rule (T-While),except that it requires a fixed point Îð such that ⢠Πâ Îð {ð} Îð .We follow the construction in [51] to compute a fixed point, notingthat the computation always terminates since all of the translationrules are monotonic and the lattice only has two levels.
Assertion Generation. To ensure differential privacy, the typesystem inserts assertion in various rules:⢠To ensure that two related runs take the same control flow,(T-If) and (T-While) asserts constraints gathered from makingsure that the boolean has zero-distance (e.g., from (T-ODot)).As an optimization, we use the branch condition to simplifyconstraints when possible. For example, constraints ð1 â ð2 â(ð1 +n1) â (ð2 +n2) are simplified as (ð1 +n1) â (ð2 +n2) (resp.¬((ð1 + n1) â (ð2 + n2))) in the true (resp. false) branch.⢠To ensure that the final output value is differentially private,rule (T-Return) asserts that its distance is zero (i.e., identicalin two related runs).
⢠To ensure all constraints collected in the expression rules aresatisfied, assignment rules (T-Asgn) and (T-AsgnStar) alsoinsert corresponding assertions.
3.4 Checking Sampling CommandsRule (T-Laplace) performs a few important tasks:
Replacing Sampling Command. Rule (T-Laplace) removes thesampling instruction and assign to [ the next (unknown) samplevalue sample[idx], where sample is a parameter of type list numadded to the transformed code. The typing rule also increments idxso that the next sampling command will read out the next value.
Checking Injectivity. T-Laplace adds an assertion ðð to check theinjectivity of the generated alignment (a fundamental requirementof alignment-based proofs): the same aligned value of [ implies thesame value of [ in the original execution.
Tracking Privacy Cost. A distinguished privacy cost variable vðis also instrumented to track the cost for aligning the random vari-ables in the program. Due to the properties of Laplace distribution,for a sampling command [ := Lap ð with alignment template A,we have P([)/P([ + A) †ð|A |/ð . Hence, the privacy cost foraligning [ by A is |A| /ð . Note that the symbols in gray, includ-ing A, are placeholders when the rule is applied, since functionGenerateTemplate takes all assertions in the transformed code asinputs. Once translation is complete, the placeholders are filled inby the algorithm that we discuss in Section 4.
Alignment Template Generation. For each sampling command[ := Lap ð , an alignment of [ is needed in a randomness alignmentproof. In its most flexible form, the alignment can be written asany numerical expression n, which is prohibitive for our goal ofautomatic proof generation. On the other hand, using simple heuris-tics such as only considering constant alignment does not work:for example, the correct alignment for [2 in GapSVT is written asâ(q[i] + [2 ⥠ð[ ) ? (1 â q[i]) : 0â, where the alignment actuallydepends on which branch is taken during the execution.
To tackle the challenges, CheckDP generates an alignment tem-plate for each sampling instruction; a template is a numerical ex-pression with âholesâ whose values are to be searched for in laterstages. For example, the template generated for [2 in GapSVT is
(q[i] + [2 ⥠ð[ )?(\ [0] + \ [1] Ãð[ + \ [2] à q[i]):
(\ [3] + \ [4] Ãð[ + \ [5] Ã q[i])where \ [0] â \ [5] are symbolic coefficients to be found later.
In general, for each sampling command [ = Lap ð , CheckDPfirst uses static program analysis to find a set of relevant programexpressions, denoted by E, and a set of relevant program variables,denoted by V (as described shortly). Second, it generates an align-ment template as follows:
AE ::=
{ð0 ? AE\{ð0 } :AE\{ð0 } , when E = {ð0, · · · }\0 +
âð£ð âV \ð à ð£ðwith fresh \0, · · · , \ |V | , otherwise
where \ denotes coefficients (âholesâ) to be filled out out by laterstages and each of them is generated fresh.
To find proper E and V, our insight is that the alignments serveto âcancel outâ the differences between two related runs (i.e., to
make all assertions pass). Algorithm 1 follows the insight to com-pute E and V for each sampling instruction: it takes Îð , the typingenvironment right before the sampling instruction and ðŽ, all asser-tions in the transformed code, as inputs. It also assumes an oracleDepends(ð, ð¥) which returns true whenever the expression ð de-pends on the variable ð¥ . We note that the oracle can be implementedas standard program dependency analysis [2, 31] or informationflow analysis [12]; hence, we omit the details in this paper.
Algorithm 1: Template generation for [ := Lap ð
input :Îð : typing environment at sampling commandðŽ: set of the generated assertions in the program
1 function GenerateTemplate(Îð , ðŽ):2 Eâ â , Vâ â 3 foreach assert(ð) â ðŽ do4 if Depends(ð, [) then5 if assert(ð) is generated by (T-If) then6 ð â² â the branch condition of if7 Eâ E ⪠{ð â²}8 foreach ð£ â ðððð ⪠{ð1 [ð2] |ð1 [ð2] â ð} do9 if Îð ⢠ð£ : B0 ⧠Depends(ð, ð£) then10 Vâ V ⪠{ð£}
11 foreach ð â E ⪠V do12 remove ð from E and V if not in scope13 return E,V;
The algorithm first checks (at line 4) if aligning [ has a chance tomake an assertion pass. If so, it will increment E and V as follows.For E, we notice that only for the assertions generated by rule (T-If),depending on the branch condition allows the alignment to havedifferent values under different branches. Hence, we add the branchcondition to E in this case. For V, our goal is to use the alignment toâcancelâ the differences caused by other variables and array elementssuch as ð [ð] used in ð . Hence, we only need to consider ᅵᅵ if (1) ð£is different between two related runs (i.e., Îð ⢠ð£ : B0) and (2) ð£contributes the assertion (i.e., ð depends on ð£).
Finally, the algorithm performs a âscope checkâ: if any element inE orV contains out-of-scope variables, then the element is excluded;for example, [1 should not depend on ð [ð] in GapSVT since ð [ð],essentially an iterator of ð, is not in scope at that point.
Consider [1 and [2 in GapSVT. The assertions in the translatedprograms are (we only list the assertion in the true branch sincethe constraint in false branch is symmetric) :(1) assert(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[)
(2) assert(q[i] + [2 - ð[ = 0)
For [1, we have Îð = {q : â} (we omit the base types and the vari-ables that have 0 distance for brevity) and both assertions dependon [1. Since both assertions depend on [1 and q[i], Algorithm 1adds q[i] into V. Moreover, assertion (1) is generated by rule (T-If).Thus,the algorithm adds q[i] + [2 ⥠ð[ into E. Finally, sinceq[i] is out of scope at the sampling instruction, expression usingq[i] and variable q[i] are excluded, resulting V ={ } and E ={ }.
For [2, we have Îð = {q: *,ð[ : *}. Since both assertions depend on[2 and q[i] and ð , Algorithm 1 adds q[i] and ð[ into V. Similarto [1, the algorithm also adds q[i] + [2 ⥠ð[ into E. Finally, all
expressions and variable are in scope, resulting V ={q[i], ð[ } andE ={q[i] + [2 ⥠ð[ }.
3.5 Function Signature RewriteFinally, CheckDP rewrites the function signature to reflect theextra parameters and holes introduced in the transformed code.In general, ð (ððð) is transformed to a new function signatureð â²(ððð, ððð, ð ððððð, \ ) where ððð are the distance variables asso-ciated with inputs whose distance is not zero (e.g., ᅵᅵ is associatedwith ð in GapSVT), ð ððððð is a list of random values used in ð ,and \ are the missing holes in alignment templates.
3.6 Shadow ExecutionTo tackle challenging mechanisms such as Report Noisy Max [25],CheckDP uses shadow execution [51]. Intuitively, the shadow exe-cution tracks another program execution where the injected noisesare always the same as those in the original execution. Therefore,values computed in the shadow execution incur no privacy cost.The aligned execution can then switch to shadow execution whencertain conditions are met, allowing extra permissiveness [51].
Supporting shadow execution only requires a few modifications:(1) Expressions will have a pair of distances (âšdâŠ, dâ â©), where the
extra distance dâ tracks the distance in the shadow execution;(2) Since the branches and loop conditions in shadow execution
are not aligned, they might diverge from the original execution.Hence, a separate shadow branch/loop is generated to correctlyupdate the shadow distances for the variables.Since the extended transformation rules largely follow the cor-
responding typing rules of ShadowDP, we present the complete setof rules with detailed explanations in the Appendix.
3.7 SoundnessCheckDP enforces a fundamental property: supposeð (ððð) is trans-formed toð â²(ððð, ððð, ð ððððð, \ ), thenð (ððð) is differentially pri-vate if there is a list of values of \ , such that all assertions in ð â²
hold for all ððð, ððð, ð ððððð . Recall that an alignment template Ais a function of \ . Hence, we have a concrete alignment A(\ ) (i.e.,a proof) when such values of \ exist.
We build the soundness of CheckDP based on that of Shad-owDP [51]. The main difference is that ShadowDP requires everysampling command [ := Lap ð to be manually annotated. Thus,we can easily rewrite a programð in CheckDP to a program ᅵᅵ inShadowDP by adding the following annotations:[ := Lap ð â [ := Lap ð ; âŠ; A[ (\ ) (CheckDP to ShadowDP)where A[ is the alignment template for [. We formalize the mainsoundness results next; the full proof can be found in the full versionof this paper [50].
Theorem 2 (Soundness). Let ð be a mechanism written inCheckDP. With a list of concrete values of \ , let ᅵᅵ be the correspond-ing mechanism in ShadowDP by rule (CheckDP to ShadowDP). If(1) ð type checks, i.e., ⢠Π{ð â ð â²} Îâ² and (2) the assertions inð â² hold for all inputs. Then ᅵᅵ type checks in ShadowDP, and theassertions in ᅵᅵ â² (transformed from ᅵᅵ by ShadowDP) pass.
Theorem 3 (Privacy). With exactly the same notation and as-sumption as Theorem 2,ð satisfies ð-differential privacy.
Invalidating InputGeneration
Proof GenerationðŒ!,⯠, ðŒ"
ð"
Counterexample Generation
Verifier
ð"
CheckDP
Restart
PSI
ð!,⯠,ð"
ðŒ!"#
Verify Sub-loop Exit
Unknown
Proof
ð¶
ð! ðŒ"
ð"
ðŒ"
Counterexample
Invalidate Sub-loop
Figure 4: Overviewof the verify-invalidate loop ofCheckDP.
4 PROOF AND COUNTEREXAMPLEGENERATION
Recall that the transformed source code has the form of the fol-lowing:ð â²(ððð, ððð, ð ððððð, \ ). For brevity, Let ðŒ denote a triple of(ððð, ððð, ð ððððð), and ð¶ denote a counterexample in the form ofð¶ = (ððð, ððð â², ð) as defined in Section 2.3. Proof/counterexamplegeneration is divided into two tasks:⢠Proof Generation: find an instantiation of \ such that assertionsinð â² never fail for any input ðŒ , or⢠Counterexample Generation: find an instantiation of ðŒ , suchthat no \ exists to make all assertions in ð â² pass, and thenconstruct a counterexample ð¶ based on ðŒ .The key challenge here is the infinite search space of both \
and ðŒ . Our insight is to use a verify-invalidate loop, as depicted inFigure 4, to improve \ and ðŒ after each iteration. At a high-level,the iterative process involves two sub-loops: the (green) verifysub-loop generates proofs, and the (blue) invalidate sub-loop gener-ates counterexamples. Moreover, the two sub-loops are integrated:starting from a default \0 where \0 [ð] = 0. âð , the procedure gen-erates sequences of proofs and invalidating inputs in the form of\0, ðŒ0, \1, ðŒ1, · · · . The final \ð or ðŒð is used to construct proof orcounterexamples correspondingly.
4.1 Verify Sub-LoopThe verify sub-loop that involves Invalidating Input Generationand Proof Generation components is responsible of generating asequence of improving alignments \0, \1, · · · , \ð such that, if themechanism is correct, \ð is a privacy proof (i.e, âðŒ . ð â²(ðŒ , \ð )).
Invalidating Input Generation. This component takes a proofcandidate \ð and then tries to find an input ðŒð such that ¬ð â²(ðŒð , \ð )(meaning that at least one assertion inð â² fails).
Intuitively, \ð is the currently âbestâ proof candidate (initially,a default null proof \0 = [0, · · · ] is used to bootstrap the process)that is able to validate all previously found inputs (ðŒ0, · · · , ðŒðâ1). Aninput ðŒð , if any, shows that \ð is in fact not a valid proof (recallthat a proof needs to ensure ð â²(ðŒ , \ð ) âðŒ ). Hence, we call such ðŒðan invaliding input of \ð and feed it with all previously identifiedinvalidating inputs to the Proof Generation component followingthe âVerify Sub-loopâ edge.
Take GapSVT (Figure 1) for example. Since the initial null proof\0 = [0, · · · ] does not align any random variable, any input, say ðŒ0,that diverges on the branch ð [ð] + [2 ⥠ð will trigger an assertion
Input Space
ðŒ0 ð1ðŒ$ð2
(a) A case where \ð cannot beimproved.
Input Space
ðŒ0ð1
ðŒ$ âŠ
ð2
(b) Iteratively improving thealignment \ð .
Figure 5: Tentative alignments and invalidating inputs.
violation at Line 22. Hence, the identified invalidating input ðŒ0 isfed to the Proof Generation component.
Proof Generation. This component takes in a series of invalidat-ing inputs ðŒ0, · · · , ðŒð seen so far, and tries to find an proof candidate\ð such that:ð â²(ðŒ0, \ð ) ⧠· · · â§ð â²(ðŒð , \ð ).
Intuitively, the goal is to find a proof candidate \ð that success-fully âcoversâ all invalidating inputs seen so far. Most likely, animproved proof candidate \ð that is able to align randomness formore inputs is generated by the component. Then \ð is fed back tothe Invalidating Input Generation component, closing the loop.
Consider the GapSVT example again. In order to align random-ness for the invalidating input ðŒ0, one possible \1 is to align therandom variable [2 by âᅵᅵ [ð] to cancel out the difference introducedby ð [ð]. Note that this tentative proof \1 does not work for all pos-sible inputs: it only serves as the âbestâ proof given ðŒ0. With theVerify Sub-loop, such imperfect proof candidates enable the gen-eration of more invalidating inputs, such as an invalidating inputðŒ1 where the query answers are mostly below the threshold ð (ðŒ1invalidates \1 since a privacy cost incurs whenever any branch istaken, which eventually exhausts the given privacy budget). There-fore, a more general proof that leverages the conditional expressionð [ð] + [2 ⥠ð ? ⢠: ⢠in the alignment template can be discoveredby Proof Generation. For GapSVT, the Verify sub-loop eventuallyterminates with a correct proof (Section 5).
Exit Edges. The verify loop has two exit edges. First, when noinvalidating input is generated, \ð is likely a valid proof. Hence, \ðis passed to a verifier with the following condition: âðŒ . ð â²(ðŒ , \ð ) .Due to the soundness result (Theorem 3), we have a proof of dif-ferential privacy when the verifier passes (the âExitâ edge fromVerifier component). Otherwise, CheckDP uses the counterexamplereturned by the verifier to construct ðŒð (the âVerify Sub-loopâ edge).We note that the verification step is required since KLEE, the sym-bolic executor that we use to find invalidating inputs, is unsound(i.e., it might miss an invalidating input) in theory; however, we didnot experience any such unsound case of KLEE in our experience.
Second, the Proof Generation component might fail to find analignment for ðŒ0, · · · , ðŒð , a case that will eventually occur for incor-rect mechanisms. This exit edge leads to the invalidate sub-loopthat we discuss next.
4.2 Invalidate Sub-LoopThe invalidate sub-loop involves Counterexample Generation andRestart; it is responsible of generating one single invalidating inputðŒ such that, if the mechanism is incorrect, ðŒ cannot be aligned (i.e,ï¿œ\ . ð â²(ðŒ , \ )). At first glance, it could be attempting to directly useðŒð from the Verify Sub-Loop. However, this is problematic both in
theory and in practice: no alignment for ðŒ0, · · · , ðŒð does not imply noalignment of ðŒð alone. In practice, we found such a naive approachfails for BadSmartSum and BadGapSVT in Section 5.
Counterexample Generation. This component takes an invalidat-ing input ðŒð and then tries to find an alignment\ð such thatð â²(ðŒð , \ð )(meaning that ðŒð is not a counterexample since it can be aligned by\ð ). For example, consider a corner case in Figure 5a, where ProofGeneration fails to find a common proof of both ðŒ0 and ðŒ1, but eachof ðŒ0 and ðŒ1 has a proof (illustrated by the two solid circles aroundthem). Mostly likely, this occurs when the program being analyzedis incorrect (hence, no common proof) but neither ðŒ1 nor ðŒ2 is agood candidate for counterexample of differential privacy, sinceeach of them can be aligned in isolation.
Restart. This component is symmetric to the Invalidating InputGeneration component in the verify sub-loop: it takes all previouslyfound proof candidates \1, · · · , \ð and tries to find an invalidatinginput ðŒð+1 such that: ¬ð â²(ðŒð+1, \1) ⧠· · · ¬ð â²(ðŒð+1, \ð ) .
If found, ðŒð+1 will intuitively be out of scope of all found proofsand serve as a âbetterâ invalidating input. In theory, we can closethe invalidate sub-loop by feeding ðŒð+1 back to CounterexampleGeneration. However, doing sowill make proof and counterexamplegeneration isolated tasks. Instead, we take an integrated approach,which we discuss shortly, where the verify and invalidate sub-loopscommunicate to generate proofs and counterexamples in a moreefficient and simultaneous way.
Exit Edges. If no \ is found to prove ðŒð = (ððð, inp, ð ððððð), acounterexample ð¶ = (ððð, ððð + ððð,ð â²(ððð, ððð, ð ððððð, \0)) canbe formed and sent to an external exact probabilistic solver PSI [33]for validation. In theory, the Restart component might fail to find anew invalidating input given \1, · · · , \ð . However, this âunknownâstate never showed up in our experience.
4.3 Integrating Verify and Invalidate Sub-LoopsWe integrate the verify and invalidate sub-loops as follows: follow-ing the âInvalidate Sub-loopâ edge of the Proof Generation com-ponent, the latest invalidating input ðŒð (i.e., the âbestâ invalidatinginput so far) is passed to the Counterexample Generation compo-nent to start the invalidate sub-loop. Moreover, the newly generatedinvalidating input ðŒð from the Restart component is fed back to theProof Generation component to start the verify sub-loop.
We note that by the design of the verify-invalidate loop, it alter-natively runs Invalidating Input Generation and Proof Generationcomponents. By doing so, the proof keeps improving while the in-validating inputs are getting closer to a true counterexample (sincethe most recent one violates a âbetterâ proof). More intuitively,consider an invalidating input ðŒ0 as a point in the entire input space,illustrated in Figure 5b. A proof candidate \1 is able to prove thealgorithm for a subset of inputs including ðŒ0 (indicated by the circlearound ðŒ0). The Invalidating Input Generation component then triesto find another invalidating ðŒ1 that violates \1 (falls outside of the\1 circle). Next, the Proof Generation component finds better proofcandidate \2 which proves (âcoversâ) both ðŒ0 and ðŒ1.
We also note that it is crucial to consider all invalidating inputsso far rather than the last input ðŒð in the Proof Generation compo-nent: the efficiency of our approach crucially relies on âimprovingâ
the proofs quantified by validating more invalidating inputs. With-out the improving proofs, the iterative procedure might fail toterminate in case shown in Figure 5a: the procedure might repeatðŒ0, \1, ðŒ1, \2, ðŒ0, \1, · · · . This is confirmed in our empirical study.
Unknown State. Due to the soundness result (Theorem 3), theprogram being analyzed is verified whenever CheckDP returns witha proof. Moreover, a validated counterexample by PSI disprovesan incorrect mechanism. However, two reasons might lead to theâunknownâ state in the Figure 4: the generated counterexampleis invalid or the Restart component fails to find a new invalidat-ing input. However, for all the correct and incorrect examples weexplored, the unknown state never showed up.
5 IMPLEMENTATION AND EVALUATIONWe implemented CheckDP in Python3. The Program Transforma-tion phase is implemented as a trans-compiler from CheckDP code(Figure 2) to C code. Following the transformation rules in Fig-ure 3, the trans-compiler tracks the typing environment, gathersthe needed constraints for the expressions, and more importantly,instruments corresponding statements when appropriate. More-over, it adds a final assertion assert(vð †ðð ) before each returncommand, where ðð is the annotated privacy bound to be checked.Once all assertions are generated, the trans-compiler generates onealignment template for each sampling instruction as described inAlgorithm 1. For the Proof and Counterexample Generation phase(i.e., verify-invalidate loop in Section 4), we used an efficient sym-bolic executor KLEE [18] for most tasks. Due to limited support ofunbounded lists in KLEE, we fix the length of lists to be 5 in ourevaluation. Also, to speed up the search, KLEE is configured to exitonce an assertion is hit. We note that the use of KLEE is to discoveralignments and counterexamples, where alignments are eventu-ally verified by our sound Verifier component with arbitrary arraylength; counterexamples are confirmed by PSI. Moreover, CheckDPautomatically extends the array length until either a verified proofor verified counterexample is produced.
Finally, we deploy a verification tool CPAChecker [13] for theVerifier component in CheckDP, which is capable of automaticallyverifying C programs with given configuration (predicateAnalysisis used). Note that CPAChecker is able to generate counterexam-ples for a failed verification. If the verification fails (which did nothappen in our evaluation), CheckDP can feed the counterexampleback to the Proof and Counterexample Generation component.
5.1 Case StudiesAside from GapSVT, we also evaluate CheckDP on the standardbenchmark used in previous mechanism verifiers [3, 51, 52] andcounterexample generators [14, 23],4 including correct ones such asNumSVT, PartialSum, and SmartSum, as well as the incorrect vari-ants of SVT reported in [40] and BadPartialSum. To show the powerof CheckDP and expressiveness of our template generation algo-rithm, we also evaluate on a couple of correct/incorrect mechanismsthat, to the best of our knowledge, have not been proved/disproved
3Publically available at https://github.com/cmla-psu/checkdp.4We note that like all tools designed for privacy mechanisms (e.g., [3, 14, 23, 51, 52]),the benchmark do not include iterative programs that are built on those privacymechanisms, such as k-means clustering, k-medians, since they are out of scope.
Table 1: Detected counterexamples for the incorrect algorithms and comparisons with other sampling-based counterexampledetectors. #ð¡ stands for true and #ð stands for false.
Mechanism q qâ² Extra Args Output Iterations Time(s) StatDP [23] DP-Finder [14] DiPC [4]BadNoisyMax [0, 0, 0, 0, 0] [â1, 1, 1, 1, 1] N/A 0 3 5.7 11.2 2561.5 N/ABadSVT1 [0, 0, 0, 0, 1] [1, 1, 1, 1, 0] ð : 0, ð : 1 [#ð , #ð , #ð , #ð , #ð¡ ] 4 3.2 4.9 3847.5 (Semi-Manual) N/ABadSVT2 [0, 0, 0, 0, 1] [1, 1, 1, 1,â1] ð : 0, ð : 1 [#ð , #ð , #ð , #ð , #ð¡ ] 4 2.0 15.6 4126.1 (Semi-Manual) N/ABadSVT3 [0, 0, 0, 0, 1] [1, 1, 1, 1,â1] ð : 0, ð : 1 [#ð , #ð , #ð , #ð , #ð¡ ] 4 2.1 9.1 3476.2 (Semi-Manual) 269
BadGapSVT [0, 0, 0, 0, 0] [1, 1, 1, 1,â1] ð : 0, ð : 1 [0, 0, 0, 0, 1] 4 5.7 10.6 11611.6 (Semi-Manual) N/ABadAdaptiveSVT [0, 0, 0, 0, 2] [1, 1, 1, 1,â1] ð : 0, ð : 1 [0, 0, 0, 0, 17] 8 14.2 Search Failed Search Failed N/AImprecise SVT [0, 0, 0, 0, 1] [1, 1, 1, 1,â1] ð : 0, ð : 1 [#ð , #ð , #ð , #ð , #ð¡ ] 4 8.6 Search Failed Search Failed N/ABadSmartSum [0, 0, 0, 0, 0] [0, 0, 0, 1, 0] ð : 3,ð : 4 [0, 0, 0, 0, 0] 4 6.3 22.4 (Semi-Manual) Search Failed N/ABadPartialSum [0, 0, 0, 0, 0] [0, 0, 0, 0, 1] N/A 0 3 3.7 3.8 1128.5 N/A
Table 2: Alignments found for the correct algorithms. Ωâ stands for the branch condition in each mechanism, where Ωðð =
ð [ð] + [ > ðð âš ð = 0, Ωððð = ð [ð] + [2 ⥠ð[ , Ωððð = ð [ð] + [2 âð[ ⥠ð , Ωðððððð = ð [ð] + [3 âð[ ⥠0
Mechanism Alignment Iterations Time (s) ShadowDP [51] Coupling [3] DiPC [4][1 [2 [3
ReportNoisyMax Ωðð ? 1 â ᅵᅵ [ð ] : 0 N/A N/A 10 69.3 Manual 22 193PartialSum âð ð¢ð N/A N/A 2 5.6 Manual 14 N/ASmartSum âð ð¢ð â ᅵᅵ [ð ] âᅵᅵ [ð ] N/A 6 6.8 Manual 255 N/A
SVT 1 Ωððð ? 1 â ᅵᅵ [ð ] : 0 N/A 4 6.2 Manual 580 825Monotone SVT (Increase) 0 Ωððð ? 1 â ᅵᅵ [ð ] : 0 N/A 8 18.4 N/A N/A N/AMonotone SVT (Decrease) 0 Ωððð ? âᅵᅵ [ð ] : 0 N/A 8 20.5 N/A N/A N/A
GapSVT 1 Ωððð ? 1 â ᅵᅵ [ð ] : 0 N/A 6 13.5 Manual N/A N/ANumSVT 1 Ωððð ? 2 : 0 âᅵᅵ [ð ] 4 8.8 Manual 5 N/A
AdaptiveSVT 1 Ωððð ? 1 â ᅵᅵ [ð ] : 0 Ωðððððð ? 1 â ᅵᅵ [ð ] : 0 10 25.6 N/A N/A N/A
by existing verifiers and counterexample generators. This set ofmechanisms include: Sparse Vector with monotonic queries [40],AdaptiveSVT (called Adaptive Sparse Vector with Gap in [24]) aswell as new incorrect variants of SVT, AdaptiveSVT and SmartSum.For all mechanisms we explore, CheckDP is able to: (1) provide aproof if it satisfies differential privacy, or (2) provide a counterexam-ple if it violates the claimed level of privacy. Neither false positivesnor false negatives were observed. In this section, we discuss thenew cases; detailed explanations can be found in the Appendix.
Sparse Vector with Monotonic Queries. The queries in some usagesof SVT are monotonic. In such cases, a Lap 2ð /ð noise (instead ofLap 4ð /ð in SVT) is sufficient for ð-privacy [40].
AdaptiveSVT, BadAdaptiveSVT and BadSmartSum. Ding et al. [24]recently proposed a new variant of SVT which adaptively allocatesprivacy budget, saving privacy cost when noisy query answers aremuch larger than the noisy threshold. The difference from standard(correct) SVT is that it first draws a [2 := Lap 8ð /ð noise (insteadof Lap 4ð /ð in SVT) and checks if the gap between noisy query andnoisy threshold ð[ is larger than a preset hyper-parameter ð (ifð [ð] + [2 -ð[ ⥠ð). If the test succeeds, the gap is directly returned,hence costing only ð/(8ð ) (instead of ð/(4ð )) privacy budget. Oth-erwise, it draws [3 := Lap 4ð /ð and follows the same procedure asSVT. We also create an incorrect variant called BadAdaptiveSVT.It directly releases the noisy query answer instead of the gap afterthe first test. Sampling-based methods can have difficulty detectingthe privacy leakage because the privacy-violating branch of theBadAdaptiveSVT code is not executed frequently. We also createan incorrect variant of SmartSum by releasing a noise-less sumof queries in an infrequent branch. Details of SmartSum and thisvariant can be found in the Appendix.
SVT with Wrong Privacy Claims (Imprecise SVT). We also studyanother interesting yet quite challenging violation of differential
privacy: suppose a mechanism satisfies 1.1-differential privacy butclaims to be 1-differentially private. This slight violation requiresprecise reasoning about the privacy cost and poses challenges forprior sampling-based approaches.We thus evaluate a variant of SVT,referred to as Imprecise SVT, which is ð = 1.1-differentially privatebut with an incorrect claim of ð = 1 (check(1) in the signature).
5.2 ExperimentsWe evaluate CheckDP on a Intel® Xeon® E5-2620 v4 CPU machinewith 64 GB memory. To compare CheckDP with the state-of-the-arttools, we either directly run tools on the benchmark when theyare publicly available (including ShadowDP [51], StatDP [23] andDP-Finder [14]), or cite the reported results from the correspondingpapers (including Coupling [3] and DiPC [4]).5 For the latter case,we note that the numbers are for reference only, due to differentsettings, including hardware, used in the experiments.
Counterexample Generation. Table 1 lists the counterexamples(i.e., a pair of related inputs and a feasible output that witness theviolation of claimed level of privacy) automatically generated byCheckDP for the incorrect algorithms. For all incorrect algorithms,CheckDP is able to provide a counterexample (validated by PSI [33])in 15 seconds and 8 iterations.6
Notably, both StatDP and DP-Finder fail to find the privacy viola-tions in BadSmartSum and BadAdaptiveSVT, as well as the violationof ð = 1-privacy in Imprecise SVT after hours of searching.7 Thisis due to the limitations of sampling-based approaches. In certaincases, we can help these sampling-based algorithms by manually
5Default settings are used in our evaluation: 100K/500K samples for event selection/hy-pothesis testing components of StatDP; 50 iterations for sampling and optimizationcomponents of DP-Finder where each iteration collects 409,600 samples on average.6We note that the counterexample of BadSmartSum is validated on a slightly modifiedalgorithm since PSI does not support modulo operation.7For StatDP, we use 1000X of the default number of samples to confirm the failure.
providing proper values for the extra arguments that some of themechanisms require (4ð¡â column of Table 1). This extra advantage(labeled Semi-Manual in the table) sometimes allows the sampling-based methods to find counterexamples. We note that CheckDP, incontrast, generates all inputs automatically.
Verification. Table 2 lists the automatically generated proofs (i.e.,alignments) for each random variable in the correct algorithms.Due to the soundness of CheckDP, all returned proofs are valid.We note that correct algorithms on average take more iterations(and hence, time) to verify; still all of them are verified within 70seconds. . Report Noisy Max is the only example that uses shadowexecution; the selector generated is S = ð [ð] +[2 ⥠ððâš ð = 0?â :âŠ,the same as the manually generated one in [51].
Performance. We note that all examples finish within 10 itera-tions. We contribute the efficiency to the reduced search space ofAlgorithm 1 (e.g., the alignment template for GapSVT only con-tains 7 âholesâ) as well as our novel verify-invalidate loop thatallows verification and counterexample generation componentsto communicate in meaningful ways. Compared with StatDP andDP-Finder, CheckDP is more efficient on the cases where they dofind counterexamples. Compared with static tools [3, 4], we notethat CheckDP is much faster on BadSVT3, SmartSum and SVT. Insummary, CheckDP is mostly more efficient compared to coun-terexample detectors and automated provers.
6 RELATEDWORKProving and Disproving Differential Privacy. Concurrent works [4,
30] also target both proving and disproving differential privacy.Barthe et al. [4] identify a non-trivial class of programs wherechecking differential privacy is decidable. Their work also supportsapproximate differential privacy. However, the decidable programsonly allow finite inputs and outputs, while CheckDP is applicable toa larger class of programs. Moreover, CheckDP is more scalable, asobserved in our evaluation. Farina [30] builds a relational symbolicexecution framework, which when combined with probabilisticcouplings, is able to prove differential privacy or generate failingtraces for SVT and its two incorrect variants. However, it is unclearif the employed heuristic strategies work on other mechanisms,such as Report Noisy Max. Moreover, CheckDP is likely to be morescalable since their approach treats both program inputs and proofsin a symbolic way, whereas in the novel verify-invalidate loop ofCheckDP, either program inputs or proofs are concrete.
Formal Verification of Differential Privacy. From the verificationperspective, CheckDP is mostly related to LightDP [52] and Shad-owDP [51] â all use randomness alignment. The type system ofCheckDP is directly inspired by that of [51, 52]. However, the mostimportant difference is that CheckDP is the first that automati-cally generates alignment-based proofs; both LightDP and Shad-owDP assume manually-provided proofs. As discussed in Section 3,CheckDP also simplifies the previous type systems and defers allprivacy-related checks to later stages. Both changes are importantfor automatically generating proofs and counterexamples.
Besides alignment-based proofs, probabilistic couplings and lift-ings [3, 7, 9] have also been used in language-based verificationof differential privacy. Most notably, Albarghouthi and Hsu [3]
proposed the first automated tool capable of generating couplingproofs for complex mechanisms. Coupling proofs are known to bemore general than alignment-based proofs, while alignment-basedproofs are more light-weight. Since CheckDP and [3] are built ondifferent proof techniques, the proof generation algorithm in [3]is not directly applicable in our context. Moreover, [3] does notgenerate counterexamples and we do not see an obvious way toextend the Synthesize-Verify loop of [3] to do so .
With verified privacymechanisms, such as SVT and Report NoisyMax, we still need to verify that the larger program built on top ofthem is differentially private. An early line of work [8, 10, 11, 32, 44]uses (variations of) relational Hoare logic and linear indexed typesto derive differential privacy guarantees. For example, Fuzz [44]and its successor DFuzz[32] combine linear indexed types and light-weight dependent types to allow rich sensitivity analysis and thenuse the composition theorem to prove overall system privacy. Wenote that CheckDP and those systems are largely orthogonal: thosesystems rely on trusted mechanisms (e.g., SVT and Report NoisyMax) without verifying them, while CheckDP is likely less scalable;they can be combined for sophisticated verification tasks.
Counterexample Generation. Ding et al. [23] and Bichsel et al.[14] proposed counterexample generators that rely on sampling ârunning an algorithm hundreds of thousands of times to estimatethe output distribution of mechanisms (this information is thenused to find counterexamples). The strength of these methods isthat they do not rely on external solvers, andmore importantly, theyare not tied to (the limitation of) any particular proof technique(e.g., randomness alignment and coupling). However, sampling alsomake the counterexample detectors imprecise and more likely tofail in some cases, as confirmed in the evaluation.
7 CONCLUSIONS AND FUTUREWORKWe proposed CheckDP, an integrated tool based on static analysisfor automatically proving or disproving that a mechanism satisfiesdifferential privacy. Evaluation shows that CheckDP is able to pro-vide proofs for a number of algorithms, as well as counterexamplesfor their incorrect variants within 2 to 70 seconds. Moreover, allgenerated proofs and counterexamples are validated.
For future work, CheckDP relies on the underlying randomnessalignment technique; hence it is subject to its limitations, includinglack of support for (ð, ð¿)-differential privacy and renyi differentialprivacy [43]. We plan to extend the underlying proof technique forother variants of differential privacy.
Moreover, subtle mechanisms such as PrivTree [53] and privateselection [39], where the costs of intermediate results are dependenton the data but the cost of sum is data-independent, is still out ofreach for formal verification (including CheckDP).
Finally, CheckDP is designed for DP mechanisms, rather thanlarger programs built on top of them. An interesting area of futurework is integrating CheckDP with tools like DFuzz [32], which aremore efficient on programs built on top of DP mechanisms (butdonât verify the mechanisms themselves).
ACKNOWLEDGMENTSWe thank the anonymous reviewers for their insightful feedbacks.This work was supported by NSF Awards CNS-1702760.
REFERENCES[1] John M. Abowd. 2018. The U.S. Census Bureau Adopts Differential Privacy.
In Proceedings of the 24th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining (London, United Kingdom) (KDD â18). ACM, NewYork, NY, USA, 2867â2867.
[2] Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman. 1986. Compilers, principles,techniques. Addison wesley 7, 8 (1986), 9.
[3] Aws Albarghouthi and Justin Hsu. 2017. Synthesizing Coupling Proofs of Differ-ential Privacy. Proceedings of ACM Programming Languages 2, POPL, Article 58(Dec. 2017), 30 pages.
[4] Gilles Barthe, Rohit Chadha, Vishal Jagannath, A. Prasad Sistla, and MaheshViswanathan. 2020. Deciding Differential Privacy for Programs with FiniteInputs and Outputs. In Proceedings of the 35th Annual ACM/IEEE Symposium onLogic in Computer Science (SaarbrÃŒcken, Germany) (LICS â20). Association forComputing Machinery, New York, NY, USA, 141â154. https://doi.org/10.1145/3373718.3394796
[5] Gilles Barthe, George Danezis, Benjamin Gregoire, Cesar Kunz, and SantiagoZanella-Beguelin. 2013. Verified Computational Differential Privacy with Appli-cations to Smart Metering. In Proceedings of the 2013 IEEE 26th Computer SecurityFoundations Symposium (CSF â13). IEEE Computer Society, Washington, DC, USA,287â301.
[6] Gilles Barthe, Pedro R. DâArgenio, and Tamara Rezk. 2004. Secure InformationFlow by Self-Composition. In Proceedings of the 17th IEEE Workshop on ComputerSecurity Foundations (CSFW â04). IEEE Computer Society, Washington, DC, USA,100â.
[7] Gilles Barthe, Noémie Fong, Marco Gaboardi, Benjamin Grégoire, Justin Hsu,and Pierre-Yves Strub. 2016. Advanced Probabilistic Couplings for DifferentialPrivacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer andCommunications Security (Vienna, Austria) (CCS â16). ACM, New York, NY, USA,55â67.
[8] Gilles Barthe, Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, CésarKunz, and Pierre-Yves Strub. 2014. Proving Differential Privacy in Hoare Logic.In Proceedings of the 2014 IEEE 27th Computer Security Foundations Symposium(CSF â14). IEEE Computer Society, Washington, DC, USA, 411â424.
[9] Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-YvesStrub. 2016. Proving Differential Privacy via Probabilistic Couplings. In Proceed-ings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science (NewYork, NY, USA) (LICS â16). ACM, New York, NY, USA, 749â758.
[10] Gilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella Béguelin. 2012.Probabilistic Relational Reasoning for Differential Privacy. In Proceedings of the39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages (Philadelphia, PA, USA) (POPL â12). ACM, New York, NY, USA, 97â110.
[11] Gilles Barthe and Federico Olmedo. 2013. Beyond Differential Privacy: Compo-sition Theorems and Relational Logic for f-divergences Between ProbabilisticPrograms. In Proceedings of the 40th International Conference on Automata, Lan-guages, and Programming - Volume Part II (Riga, Latvia) (ICALPâ13). Springer-Verlag, Berlin, Heidelberg, 49â60.
[12] Jean-Francois Bergeretti and Bernard A. Carré. 1985. Information-flow and Data-flow Analysis of While-programs. ACM Trans. Program. Lang. Syst. 7, 1 (Jan.1985), 37â61. https://doi.org/10.1145/2363.2366
[13] Dirk Beyer and M. Erkan Keremoglu. 2011. CPACHECKER: A Tool for Config-urable Software Verification. In Proceedings of the 23rd International Conferenceon Computer Aided Verification (Snowbird, UT) (CAVâ11). Springer-Verlag, Berlin,Heidelberg, 184â190.
[14] Benjamin Bichsel, TimonGehr, DanaDrachsler-Cohen, Petar Tsankov, andMartinVechev. 2018. DP-Finder: Finding Differential Privacy Violations by Sampling andOptimization. In Proceedings of the 2018 ACM SIGSAC Conference on Computerand Communications Security (Toronto, Canada) (CCS â18). ACM, New York, NY,USA, 508â524.
[15] Andrea Bittau, Ãlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghu-nathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and BernhardSeefeld. 2017. Prochlo: Strong Privacy for Analytics in the Crowd. In Proceedingsof the 26th Symposium on Operating Systems Principles (Shanghai, China) (SOSPâ17). ACM, New York, NY, USA, 441â459. https://doi.org/10.1145/3132747.3132769
[16] Mark Bun and Thomas Steinke. 2016. Concentrated Differential Privacy: Sim-plifications, Extensions, and Lower Bounds. In Proceedings, Part I, of the 14thInternational Conference on Theory of Cryptography - Volume 9985. Springer-VerlagNew York, Inc., New York, NY, USA, 635â658.
[17] U. S. Census Bureau. 2019. On The Map: Longitudinal Employer-HouseholdDynamics. https://lehd.ces.census.gov/applications/help/onthemap.html#!confidentiality_protection
[18] Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted andAutomatic Generation of High-coverage Tests for Complex Systems Programs.In Proceedings of the 8th USENIX Conference on Operating Systems Design andImplementation (San Diego, California) (OSDIâ08). USENIX Association, Berkeley,CA, USA, 209â224. http://dl.acm.org/citation.cfm?id=1855741.1855756
[19] T.-H. Hubert Chan, Elaine Shi, and Dawn Song. 2011. Private and ContinualRelease of Statistics. ACM Trans. Inf. Syst. Secur. 14, 3, Article 26 (Nov. 2011),
24 pages.[20] Rui Chen, Qian Xiao, Yu Zhang, and Jianliang Xu. 2015. Differentially Private
High-Dimensional Data Publication via Sampling-Based Inference. In Proceedingsof the 21th ACM SIGKDD International Conference on Knowledge Discovery andData Mining (Sydney, NSW, Australia) (KDD â15). ACM, New York, NY, USA,129â138. https://doi.org/10.1145/2783258.2783379
[21] Yan Chen and Ashwin Machanavajjhala. 2015. On the Privacy Properties ofVariants on the Sparse Vector Technique. http://arxiv.org/abs/1508.07306.
[22] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. 2017. Collecting TelemetryData Privately. In Proceedings of the 31st International Conference on NeuralInformation Processing Systems (Long Beach, California, USA) (NIPSâ17). CurranAssociates Inc., USA, 3574â3583. http://dl.acm.org/citation.cfm?id=3294996.3295115
[23] Zeyu Ding, Yuxin Wang, Guanhong Wang, Danfeng Zhang, and Daniel Kifer.2018. Detecting Violations of Differential Privacy. In Proceedings of the 2018 ACMSIGSAC Conference on Computer and Communications Security (Toronto, Canada)(CCS â18). ACM, New York, NY, USA, 475â489.
[24] Zeyu Ding, Yuxin Wang, Danfeng Zhang, and Daniel Kifer. 2019. Free Gap Infor-mation from the Differentially Private Sparse Vector and Noisy Max Mechanisms.PVLDB 13, 3 (2019), 293â306. https://doi.org/10.14778/3368289.3368295
[25] Cynthia Dwork. 2006. Differential Privacy. In Proceedings of the 33rd InternationalConference on Automata, Languages and Programming - Volume Part II (Venice,Italy) (ICALPâ06). Springer-Verlag, Berlin, Heidelberg, 1â12. https://doi.org/10.1007/11787006_1
[26] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, andMoni Naor. 2006. Our Data, Ourselves: Privacy via Distributed Noise Generation.In Proceedings of the 24th Annual International Conference on The Theory andApplications of Cryptographic Techniques (St. Petersburg, Russia) (EUROCRYPTâ06).Springer-Verlag, Berlin, Heidelberg, 486â503.
[27] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Cali-brating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography,Shai Halevi and Tal Rabin (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg,265â284.
[28] Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differ-ential privacy. Theoretical Computer Science 9, 3â4 (2014), 211â407.
[29] Ãlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Ran-domized Aggregatable Privacy-Preserving Ordinal Response. In Proceedings ofthe 2014 ACM SIGSAC Conference on Computer and Communications Security(Scottsdale, Arizona, USA) (CCS â14). ACM, New York, NY, USA, 1054â1067.
[30] Gian Pietro Farina. 2020. Coupled Relational Symbolic Execution. Ph.D. Disserta-tion. State University of New York at Buffalo.
[31] Jeanne Ferrante, Karl J Ottenstein, and Joe D Warren. 1987. The program de-pendence graph and its use in optimization. ACM Transactions on ProgrammingLanguages and Systems (TOPLAS) 9, 3 (1987), 319â349.
[32] Marco Gaboardi, Andreas Haeberlen, Justin Hsu, Arjun Narayan, and Benjamin C.Pierce. 2013. Linear Dependent Types for Differential Privacy. In Proceedings ofthe 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of ProgrammingLanguages (Rome, Italy) (POPL â13). ACM, New York, NY, USA, 357â370. https://doi.org/10.1145/2429069.2429113
[33] Timon Gehr, Sasa Misailovic, and Martin Vechev. 2016. PSI: Exact SymbolicInference for Probabilistic Programs. In Computer Aided Verification, SwaratChaudhuri and Azadeh Farzan (Eds.). Springer International Publishing, Cham,62â83.
[34] Anna Gilbert and Audra McMillan. 2018. Property Testing for Differential Privacy.arXiv:1806.06427 [cs.CR]
[35] Samuel Haney, Ashwin Machanavajjhala, John M. Abowd, Matthew Graham,Mark Kutzbach, and Lars Vilhuber. 2017. Utility Cost of Formal Privacy forReleasing National Employer-Employee Statistics. In Proceedings of the 2017ACM International Conference on Management of Data (Chicago, Illinois, USA)(SIGMOD â17). ACM, New York, NY, USA, 1339â1354. https://doi.org/10.1145/3035918.3035940
[36] Noah Johnson, Joseph PNear, andDawn Song. 2018. Towards practical differentialprivacy for SQL queries. Proceedings of the VLDB Endowment 11, 5 (2018), 526â539.
[37] Dexter Kozen. 1981. Semantics of probabilistic programs. J. Comput. System Sci.22, 3 (1981), 328 â 350.
[38] Jaewoo Lee and Christopher W. Clifton. 2014. Top-k Frequent Itemsets via Dif-ferentially Private FP-trees. In Proceedings of the 20th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining (New York, New York, USA)(KDD â14). ACM, New York, NY, USA, 931â940. https://doi.org/10.1145/2623330.2623723
[39] Jingcheng Liu and Kunal Talwar. 2019. Private Selection from Private Candidates.In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing(Phoenix, AZ, USA) (STOC 2019). Association for Computing Machinery, NewYork, NY, USA, 298â309. https://doi.org/10.1145/3313276.3316377
[40] Min Lyu, Dong Su, and Ninghui Li. 2017. Understanding the sparse vectortechnique for differential privacy. Proceedings of the VLDB Endowment 10, 6(2017), 637â648.
[41] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber. 2008. Privacy:Theory meets Practice on the Map. In 2008 IEEE 24th International Conference onData Engineering. IEEE, Piscataway, NJ, USA, 277â286. https://doi.org/10.1109/ICDE.2008.4497436
[42] Frank McSherry. 2018. Uberâs differential privacy .. probably isnât. https://github.com/frankmcsherry/blog/blob/master/posts/2018-02-25.md (retrieved11/15/2019).
[43] I. Mironov. 2017. Rényi Differential Privacy. In 2017 IEEE 30th Computer SecurityFoundations Symposium (CSF). IEEE, Piscataway, NJ, USA, 263â275. https://doi.org/10.1109/CSF.2017.11
[44] Jason Reed and Benjamin C. Pierce. 2010. Distance Makes the Types GrowStronger: A Calculus for Differential Privacy. In Proceedings of the 15th ACM SIG-PLAN International Conference on Functional Programming (Baltimore, Maryland,USA) (ICFP â10). ACM, New York, NY, USA, 157â168. https://doi.org/10.1145/1863543.1863568
[45] Aaron Roth. 2011. The Sparse Vector Technique. http://www.cis.upenn.edu/~aaroth/courses/slides/Lecture11.pdf.
[46] Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and VijaySaraswat. 2006. Combinatorial Sketching for Finite Programs. In Proceedings of the12th International Conference on Architectural Support for Programming Languagesand Operating Systems (San Jose, California, USA) (ASPLOS XII). Association forComputing Machinery, New York, NY, USA, 404â415. https://doi.org/10.1145/1168857.1168907
[47] Ben Stoddard, Yan Chen, and Ashwin Machanavajjhala. 2014. DifferentiallyPrivate Algorithms for Empirical Machine Learning. arXiv:1411.5428 [cs.LG]
[48] Apple Differential Privacy Team. 2017. Learning with Privacy at Scale. https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html
[49] Tachio Terauchi and Alex Aiken. 2005. Secure information flow as a safetyproblem. In International Static Analysis Symposium. Springer, 352â367.
[50] Yuxin Wang, Zeyu Ding, Daniel Kifer, and Danfeng Zhang. 2020. CheckDP: AnAutomated and Integrated Approach for Proving Differential Privacy or FindingPrecise Counterexamples. arXiv:2008.07485 [cs.PL]
[51] YuxinWang, Zeyu Ding, GuanhongWang, Daniel Kifer, and Danfeng Zhang. 2019.Proving Differential Privacy with Shadow Execution. In Proceedings of the 40thACM SIGPLAN Conference on Programming Language Design and Implementation(Phoenix, AZ, USA) (PLDI 2019). ACM, New York, NY, USA, 655â669. https://doi.org/10.1145/3314221.3314619
[52] Danfeng Zhang andDaniel Kifer. 2017. LightDP: Towards AutomatingDifferentialPrivacy Proofs. In Proceedings of the 44th ACM SIGPLAN Symposium on Principlesof Programming Languages (Paris, France) (POPL 2017). ACM, New York, NY,USA, 888â901.
[53] Jun Zhang, Xiaokui Xiao, and Xing Xie. 2016. PrivTree: A Differentially PrivateAlgorithm for Hierarchical Decompositions. In Proceedings of the 2016 Interna-tional Conference on Management of Data (San Francisco, California, USA) (SIG-MOD â16). Association for Computing Machinery, New York, NY, USA, 155â170.https://doi.org/10.1145/2882903.2882928
A SHADOW EXECUTIONWe show how to extend the program transformation in Figure 3to support shadow execution. At a high level, the extension en-codes the selectors (which requires manual annotations in Shad-owDP [51]) and integrates themwith the generated templates. Withthe extra âholesâ in the templates, the verify-invalidate loop will au-tomatically find alignments (including selectors)/counterexamples.The complete set of transformation rules with shadow execution isshown in Figure 7, where the extensions are highlighted in gray.
Syntax and Expressions. Since a new shadow execution is tracked,types for each variable would be expanded to include a pair of dis-tances âšdâŠ, dâ â©. More specifically, the types should now be definedas: ð ::= numâšdâŠ,dâ â© | bool | list ð .
With the modified types, corresponding modifications to thetransformation rules for expressions are straightforward and mini-mal: the handling of shadow distances are essentially the same asthat of aligned distances.
Lð, ÎMâ = ð Ltrue, ÎMâ = true Lfalse, ÎMâ = false
Lð¥, ÎMâ ={ð¥ + nâ , if Π⢠ð¥ : numâšnâŠ,nâ â©ð¥ , else
Lð1 op ð2, ÎMâ = Lð1, ÎMâ op Lð2, ÎMâ where op = â ⪠â ⪠â
Lð1 [ð2 ], ÎMâ ={ð1 [ð2 ] + ð1â [ð2 ] , if Îâ ⢠ð1 : list numâð1 [ð2 ] , else
Lð1 :: ð2, ÎMâ = Lð1, ÎMâ :: Lð2, ÎMâ L¬ð, ÎMâ = ¬Lð, ÎMâ
Lð1 ? ð2 : ð3, ÎMâ = Lð1Mâ ? Lð2, ÎMâ : Lð3, ÎMâ
Lskip, ÎMâ = skipLð1; ÎMâ = ðâ²1 Lð2; ÎMâ = ðâ²2
Lð1;ð2, ÎMâ = ðâ²1;ðâ²2
Lð¥ := ð, ÎMâ = (ᅵᅵâ := Lð, ÎMâ â ð¥)
Lðð , ÎMâ = ðâ²ð ð â {1, 2}Lif ð then ð1 else ð2, ÎMâ = if Lð, ÎMâ then ðâ²1 else ð
â²2
Lð, ÎMâ = ðâ²
Lwhile ð do ð, ÎMâ = while Lð, ÎMâ do ðâ²
Figure 6: Shadow execution for expressions and commands.
Normal Commands. Following the type system of ShadowDP,a program counter ðð â {â€,â¥} is introduced to each transforma-tion rule for commands to capture potential divergence of shadowexecution. Specifically, ðð ⢠ð â ð â². ðð = †(resp. â¥) means thatthe branch / loop command might diverge in the shadow execu-tion (resp. must stay the same). The value of ðð is used to guidehow each rule should handle the shadow distances (e.g., (T-Asgn)),which we will explain shortly. Therefore, another auxiliary functionupdatePC is added to track the value of pc.
Compared with the type system of ShadowDP, the first majordifference is in (T-Asgn). If ðð = â¥, shadow distances are handledas the aligned distances. However, when ðð = †(shadow executiondiverges), it updates the shadow distance of the variable to makesure the value in shadow execution (i.e., ð¥ + ᅵᅵâ ) remains the same af-ter the assignment. For example, Line 20 in Figure 8 is instrumentedto maintain the value of bq in the shadow execution (bq + bqâ ), sothat the branch at Line 26 is not affected by the new assignment ofbq.
As previously explained, a separate shadow branch / loop hasto be generated to correctly track the shadow distances of thevariables. More specifically, Rule (T-If) and (T-While) is extendedto include an extra shadow execution command ðâ when pctransitsfrom ⥠to â€. The shadow execution is constructed by an auxiliaryfunction Lð, ÎMâ , which is the same as the ones in ShadowDP [51],hence omitted due to space constraints. It essentially replaces eachvariable with its correspondence (e.g., variable ð¥ to ð¥ + ᅵᅵâ ), as isstandard in self-composition [6, 49].
Transformation rules for expressions with form Π⢠ð : BâšnâŠ,nâ â©
Π⢠ð : numâš0,0â© | true(T-Num)
Π⢠ð : bool | true (T-Boolean)Π⢠ð : bool | CΠ⢠¬ð : bool | C (T-Neg)
Î (ð¥) = BâšdâŠ,dâ â© nâ =
{ᅵᅵâ if dâ = â
0 otherwise â â {âŠ, â }
Π⢠ð¥ : BâšnâŠ,nâ â© | true(T-Var)
Π⢠ð1 : numâšn1,n2 â© | C1 Π⢠ð2 : numâšn3,n4 â© | C2Π⢠ð1 â ð2 : numâšn1ân3,n2ân4 â© | C1 ⧠C2
(T-OPlus)
Π⢠ð1 : numâšn1,n2â© | C1 Π⢠ð2 : numâšn3,n4 â© | C2
Π⢠ð1 â ð2 : numâš0,0â© | C1 ⧠C2 â§(n1 = n2 =
n3 = n4 = 0)(T-OTimes)
Π⢠ð1 : numâšn1,n2 â© | C1 Π⢠ð1 : numâšn3,n4 â© | C2Π⢠ð1 â ð2 : bool | C1 ⧠C2 ⧠(ð1 â ð2) â (ð1 + n1) â (ð2 + n3)
(T-ODot)
Π⢠ð1 : Bâšn1,n2 â© | C1 Π⢠ð2 : list Bâšn3,n4 â© | C2
Π⢠ð1 :: ð2 : list Bâšn3,n4 â© | C1 ⧠C2 â§(n1 = n2 =n3 = n4 = 0)
(T-Cons) Π⢠ð1 : list ð | C1 Π⢠ð2 : numâšn1,n2 â© | C2Π⢠ð1 [ð2 ] : ð | C1 ⧠C2 ⧠(n1 = n2 = 0) (T-Index)
Π⢠ð1 : bool | C1 Π⢠ð2 : Bâšn1,n2 â© | C2 Π⢠ð3 : Bâšn3,n4 â© | C3Π⢠ð1 ? ð2 : ð3 : Bâšn1,n2 â© | C1 ⧠C2 ⧠C3 ⧠(n1 = n2= n3 = n4 )
(T-Select)
Transformation rules for commands with form ðð ⢠Π{ð â ðâ² } Îâ²
Π⢠ð : BâšnâŠ,nâ â© âšdâŠ, ð⊠⩠=
{âš0, skipâ©, if n⊠== 0,âšâ, ᅵᅵ⊠:= n⊠â©, otherwise âšdâ , ðâ , ðâ²â© =
âš0, skip, skipâ©, if ðð = ⥠⧠nâ = 0âšâ, ᅵᅵâ := nâ , skipâ©, if ðð = ⥠⧠nâ â 0âšâ, skip, ᅵᅵâ := ð¥ + nâ â ð â©, otherwise
ðð ⢠Π{ð¥ := ð â ðâ² ;ð¥ := ð ;ðâŠ; ðâ } Î [ð¥ âŠâ BâšdâŠ,dâ â© ](T-Asgn)
ðð ⢠Π{ð1 â ðâ²1 } Î1 ðð ⢠Î1 {ð2 â ðâ²2 } Î2ðð ⢠Π{ð1;ð2 â ðâ²1;ð
â²2 } Î2
(T-Seq)ðð ⢠Π{skip â skip} Î (T-Skip)
Π⢠ð : BâšnâŠ,nâ â© | Cðð ⢠Π{return ð â assert(C ⧠n⊠= 0) ; return ð } Î (T-Return)
Π⢠ð : bool | Cðð ⢠Πâ Îð {ð â ðâ² } Îð
ððâ² = updatePC(ðð, Î, ð)Î, Î â Îð , ðð
â² â ðð Îð , Î â Îð , ðð â ðâ²â²
ðâ =
{skip, if (ðð = †⚠ððâ² = â¥)Lwhile ð do ð, Î â Îð Mâ , else
ðð ⢠Π{while ð do ð â ðð ; (while ð do (assert(C) ;ðâ²;ðâ²â²)) ; ðâ } Î â Îð(T-While)
Π⢠ð : bool | Cðð ⢠Π{ðð â ðâ²ð } Îð
ððâ² = updatePC(ðð, Î, ð)Îð , Î1 â Î2, ðð
â² â ðâ²â²ð ð â {1, 2} ðâ =
{skip, if (ðð = †⚠ððâ² = â¥)Lif ð then ð1 else ð2, Î1 â Î2Mâ , else
ðð ⢠Π{if ð then ð1 else ð2 â (if ð then (assert(C) ;ðâ²1;ðâ²â²1 ) else (assert(C) ;ðâ²2;ðâ²â²2 )) ; ðâ } Î1 â Î2(T-If)
A, S = GenerateTemplate(Î,All Assertions)ðð = assert( ( ([ + A) {[1/[ } = ([ + A) {[2/[ } â [1 = [2))
Îâ = _ð¥. âšnâ ,nâ â© where Π⢠ð¥ : BâšnâŠ,nâ â©
Îâ² = Î â Îâ
ðâ² = {ᅵᅵ⊠:= nâ | Îâ² (ð¥) = numâšâ,nâ â© }ðd = (if S then ðâ² else skip)
ðð ⢠Π{[ := Lap ð â ðð ; [ := ð ððððð [ððð¥ ]; ððð¥ := ððð¥ + 1; vð := (S ? vð : 0) + |A |/ð ; [ := A; ðd } Îâ² [[ âŠâ numâšâ,0â© ](T-Laplace)
Transformation rules for merging environments
Î1 â Î2
ð⊠= {ᅵᅵ⊠:= 0 | Î1 (ð¥) = numâš0,d1⩠⧠Î2 (ð¥) = numâšâ,d2â© }ðâ = {ᅵᅵâ := 0 | Î1 (ð¥) = numâšd1,0⩠⧠Î2 (ð¥) = numâšd2,ââ© }
ðâ² =
{ðâŠ;ðâ if ðð = â¥ð⊠if ðð = â€
Î1, Î2, ðð â ðâ²
PC update functionupdatePC(ðð, Î, ð) =
{⥠, if ðð = ⥠⧠Π⢠ð : numâšâ,0â©â€ , else
Figure 7: Rules for transforming probabilistic programs into deterministic ones with shadow execution extension. Differencesthat shadow execution introduce are marked in gray boxes.
Sampling Commands. The most interesting rule is (T-Laplace).In order to enable the automatic discovery of the selectors, ourGenerateTemplate algorithm needs to be extended to return aselector template S. Intuitively, a selector expression S with thefollowing syntax decides if the aligned or shadow execution ispicked:
Var Versions ð â {âŠ, â }Selectors S ::= ð ? S1 : S2 | ð
The definition of the selector template is then similar to thealignment template, where the value can depend on the branchconditions:
SE ::=
{ð0 ? SE\{ð0 } : SE\{ð0 } , when E = {ð0, · · · }\ with fresh \ , otherwise
Compared with other holes (\ ) in the alignment template (AE),the only difference is that \ in SE has Boolean values representingwhether to switch to shadow execution (â ) or not (âŠ).
To embed shadow execution into CheckDP, the type systemdynamically instruments an auxiliary command (ðd) according tothe selector templateS. Once a switch is made (S = â ), the distancesof all variables are replaced with their shadow versions by thiscommand. Moreover, the privacy cost vð will also be properly resetaccording to the selector.
B EXTRA CASE STUDIESIn this section we list the pseudo-code of the algorithms we eval-uated in the paper for completeness. The incorrect part for theincorrect algorithms is marked with a box.
B.1 Report Noisy MaxReport Noisy Max [25]. This is an important building block for de-
veloping differentially private algorithms. It generates differentiallyprivate synthetic data by finding the identity with the maximum(noisy) score in the database. Here we present this mechanism in asimplified manner: for a series of query answers q, where each ofthem can differ at most one in the adjacent underlying database,its goal is to return the index of the maximum query answer in aprivacy-preserving way. To achieve differential privacy, the mecha-nism first adds [ = Lap 2/ð noise to each of the query answer, thenreturns the index of the maximum noisy query answers q[i] + [,instead of the true query answers q[i]. The pseudo code of thismechanism is shown in Figure 8.
To prove its correctness using randomness alignment technique,we need to align the only random variable [ in the mechanism(Line 3). Therefore, a corresponding privacy cost of aligning [
would be incurred for each iteration of the loop. However, manualproof [25] suggests that we only need to align the random variableadded to the actual maximum query answer. In other words, weneed an ability to âresetâ the privacy cost upon seeing a new currentmaximum noisy query answer.
Bad Noisy Max. We also created an incorrect variant of ReportNoisy Max. This variant directly returns the maximum noisy queryanswer, instead of the index.
More specifically, it can be obtained by changing Line 5 in Fig-ure 8 from max := i to max := q[i] + [. CheckDP is then ableto find a counterexample for this incorrect variant.
function NoisyMax (size : numâš0,0â© , q : list numâšâ,ââ© )
returns max : numâš0,ââ©
precondition â i. â1 †qâŠ[i] †1 ⧠qâ [i] = qâŠ[i]
1 i := 0; bq := 0; max := 0;2 while (i < size)3 [ := Lap (2/ð);4 if (q[i] + [ > bq âš i = 0)5 max := i;6 bq := q[i] + [;7 i := i + 1;
function Transformed NoisyMax (size,q, q, ð ððððð , \ )returns (max)8 vð := 0; idx := 0;
9 i := 0; bq := 0; max := 0;
10 bqâŠ:= 0; bq
â := 0; max⊠:= 0; maxâ := 0;
11 while (i < size)12 [ := ð ððððð[idx]; vð := (S ? vð : 0) + |A | Ã ð;13 [ := A;
14 if (S) bqâŠ:= bq
â ; max⊠:= maxâ ;
15 if (q[i] + [ > bq ⚠i = 0)16 assert(q[i] + q[i] + [ + [⊠> bq + bq⊠⚠i = 0);17 assert(max⊠= 0);
18 max := i;19 max⊠:= 0;
20 bqâ := bq + bq
â - (q[i] + [);
21 bq := q[i] + [;
22 bqâŠ:= qâŠ[i] + [
âŠ;
23 else24 assert(¬(q[i] + q[i] + [ + [⊠> bq + bq⊠⚠i = 0));25 // shadow execution
26 if (q[i] + qâ [i] + [ > bq + bqâ âš i = 0)
27 bqâ := q[i] + qâ [i] + [ â bq;
28 maxâ := i - max;
29 i := i + 1;
Figure 8: Report NoisyMax and its transformed code, whereS = q[i] + [ > bq âš i = 0 ? \ [0] : \ [1] and A = \ [3] + \ [4] ÃqâŠ[i] + \ [5] à bqâŠ
B.2 Variants of Sparse Vector TechniqueSVT. We first show a correctly-implemented standard version
of SVT [40]. This standard implementation is less powerful thanrunning example GapSVT, as it outputs true instead of the gapbetween noisy query answer and noisy threshold. This can beobtained by changing Line 7 in Figure 1 from out := (q[i] +[2)::out; to out := true::out;.
SVT with Monotonic Queries. There exist use cases with SVTwhere the queries are monotonic. More formally, queries are mono-tonic if for related queries ð ⌠ðâ², âð . ðð †ðâ²
ðor âð . ðð ⥠ðâ²
ð. As
shown in [40]. When the queries are monotonic, it suffices to add[2 := Lap 2ð /ð to each queries (Line 5 in Figure 1) and the algo-rithm still satisfies ð-DP.
Thanks to the flexibility of CheckDP, it only requires one changein the function specification in order to verify this variant: mod-ify the constraint on q[i] in the precondition. Specifically, the
function SVT (T,N,size : num0 , ð : list numâ )
returns (out : list bool ), check(ð)precondition â i. â1 †q[i] †11 [1 := Lap (2/ð)2 ð[ := ð + [1;3 count := 0; i := 0;4 while (count < N ⧠i < size)5 [2 := Lap (4ð /ð)6 if (q[i] + [2 ⥠ð[ ) then7 out := true::out;8 count := count + 1;9 else
10 out := false::out;11 i := i + 1;
function Transformed SVT (T,N,size,q, q, ð ððððð , \ )returns (out)12 vð := 0; idx = 0;
13 [1 := ð ððððð[idx]; idx := idx + 1;
14 vð := vð + |A1 | Ã ð/2; [1 := A1;
15 ð[ := ð + [1;
16 ð[ := [1;
17 count := 0; i := 0;18 while (count < N ⧠i < size)19 [2 := ð ððððð[idx]; idx := idx + 1;
20 vð := vð + |A2 | Ã ð/4ð ; [2 := A2;
21 if (q[i] + [2 ⥠ð[ ) then22 assert(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[);
23 out := true::out;24 count := count + 1;25 else26 assert(¬(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[));
27 out := false::out;28 i := i + 1;29 assert(vð †ð);
Figure 9: Standard Sparse Vector Technique and its trans-formed code, where underlined parts are added by CheckDP.The transformed code contains two alignment templates for[1 and [2: A1 = \ [0] and A2 = (q[i] + [2[i] ⥠ð[ ) ? (\ [1] +\ [2] Ãð[ + \ [3] à q[i]) : (\ [4] + \ [5] Ãð[ + \ [6] à q[i]).
new precondition for SVT with monotonic queries becomes âi. 0 †q[i] †1 for the âð . ðð †ðâ²
ðand â i. â1 †q[i] †0
for the other case. The final found alignment by CheckDP is thesame as the ones reported in the manual randomness alignmentbased proofs [24]:
[1 : 0 [2 :
{q[i] + [2 ⥠ð[ ? 1 â q[i] : 0, if âð . ðð †ðâ²
ð
q[i] + [2 ⥠ð[ ? âq[i] : 0, otherwise
To the best of our knowledge, no prior verification works haveautomatically verified this variant.
Adaptive SVT. As mentioned in Section 5, we list the pseudocode of Adaptive SVT in Figure 10.
function AdaptiveSVT (T,N,size : num0 ,q : list numâ )
returns (out : list num0 ), check(ð)precondition â i. â1 †q[i] †11 cost := 0;2 [1 := Lap (2/ð);3 cost := cost + ð/2;4 ð[ := ð + [1;5 i := 0;6 while (cost †ð - 4ð /ð ⧠i < size)7 [2 := Lap (8ð /ð);8 if (q[i] + [2 âð[ ⥠ð) then9 out := (q[i] + [2 - ð[ )::out;
10 cost := cost + ð/(8ð );11 else12 [3 := Lap (4ð /ð);13 if (q[i] + [3 âð[ ⥠0) then14 out := (q[i] + [3 - ð[ )::out;15 cost := cost + ð/(4ð );16 else17 out := 0::out;18 i := i + 1;
function Transformed AdaptiveSVT (T,N,size,q, q, ð ððððð , \ )returns (out)12 vð := 0; idx = 0;
13 [1 := ð ððððð[idx]; idx := idx + 1;
14 vð := vð + |A1 | Ã ð/2; [1 := A1;
15 ð[ := ð + [1;
16 ð[ := [1;
17 count := 0; i := 0;18 while (cost †ð - 4ð /ð ⧠i < size)19 [2 := ð ððððð[idx]; idx := idx + 1;
20 vð := vð + |A2 | Ã ð/8ð; [2 := A2;
21 if (q[i] + [2 - ð[ ⥠ð) then
22 assert(q[i] + [2 + q[i] + [2 - (ð[ + ð[) ⥠ð);
23 assert(q[i] + [2 - ð[ == 0);
24 out := (q[i] + [2 - ð[ )::out;25 cost := cost + ð/(8ð );26 else
27 assert(¬(q[i] + [2 + q[i] + [2 - (ð[ + ð[) ⥠ð));
28 [3 := ð ððððð[idx]; idx := idx + 1;
29 vð := vð + |A3 | Ã ð/4ð ; [2 := A3;
30 if (q[i] + [3 - ð[ ⥠0)
31 assert(q[i] + [3 + q[i] + [3 - (ð[ + ð[ ⥠0) ;
32 assert(q[i] + [3 - ð[ == 0);
33 out := (q[i] + [3 - ð[ )::out;34 cost := cost + ð/(4ð );35 else
36 assert(¬(q[i] + [3 + q[i] + [3 - (ð[ + ð[) ⥠0));
37 out := false::out;38 i := i + 1;39 assert(vð †ð);
Figure 10: Adaptive SVT and its transformed code, where un-derlined parts are added by CheckDP. The transformed codecontains three alignment templates for [1 and [2:A1 = \ [0],A2 = Ωððð ? (\ [1] + \ [2] à ð[ + \ [3] à q[i]) : (\ [4] + \ [5] Ãð[ + \ [6] à q[i]) and A3 = Ωðððððð ? (\ [1] + \ [2] Ãð[ + \ [3] Ãq[i]) : (\ [4] + \ [5] Ãð[ + \ [6] à q[i]), where Ωâ denotes thecorresponding branch condition at Line 8 and 13.
function NumSVT (T,N,size : num0 , ð : list numâ )
returns (out : list bool ), check(ð)precondition â i. â1 †q[i] †11 [1 := Lap (3/ð)2 ð[ := ð + [1;3 count := 0; i := 0;4 while (count < N ⧠i < size)5 [2 := Lap (6ð /ð)6 if (q[i] + [2 ⥠ð[ ) then7 [3 := Lap (3ð /ð);8 out := (q[i] + [3)::out;9 count := count + 1;
10 else11 out := false::out;12 i := i + 1;
function Transformed NumSVT (T,N,size,q, q, ð ððððð , \ )returns (out)12 vð := 0; idx = 0;
13 [1 := ð ððððð[idx]; idx := idx + 1;
14 vð := vð + |A1 | Ã ð/3; [1 := A1;
15 ð[ := ð + [1;
16 ð[ := [1;
17 count := 0; i := 0;18 while (count < N ⧠i < size)19 [2 := ð ððððð[idx]; idx := idx + 1;
20 vð := vð + |A2 | Ã ð/6ð ; [2 := A2;
21 if (q[i] + [2 ⥠ð[ ) then22 assert(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[);
23 [3 := ð ððððð[idx]; idx := idx + 1;
24 vð := vð + |A3 | Ã ð/3ð ; [3 := A3;
25 assert(q[i] + [3 = 0);
26 out := (q[i] + [3)::out;27 count := count + 1;28 else29 assert(¬(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[));
30 out := false::out;31 i := i + 1;32 assert(vð †ð);
Figure 11: Numerical Sparse Vector Technique and itstransformed code, where underlined parts are added byCheckDP. The transformed code contains three alignmenttemplates for [1, [2 and [3 respectively: A1 = \ [0], A2 =
(q[i] + [2[i] ⥠ð[ ) ? (\ [1] + \ [2] Ãð[ + \ [3] à q[i]) : (\ [4] +\ [5] Ãð[ + \ [6] à q[i]), A3 = \ [7] + \ [8] Ãð[ + \ [9] à q[i])
NumSVT. Numerical Sparse Vector (NumSVT) [28] is anotherinteresting correct variant of SVT which outputs a numerical an-swer when the input query is larger than the noisy threshold. Itfollows the same procedure as Sparse Vector Technique, the dif-ference is that it draws a fresh noise [3 in the true branch, andoutputs ð [ð] + [3 instead of true. Note that this is very similarto our running example GapSVT and BadGapSVT, the key differ-ence is that the freshly-drawn random noise hides the informationabout ð[ , unlike the BadGapSVT. This variant can be obtained bymaking the following changes in Figure 1: (1) Line 1 is changedfrom Lap 2/ð to Lap 3/ð; (2) Line 5 is changed from Lap 4ð /ð toLap 6ð /ð; (3) Line 7 is change from out := (q[i] + [)::out;to â[3 := Lap (3ð /ð); out := (q[i] + [3)::out;â. CheckDP
finds the same alignment as shown in [52] with which CPACheckeris able to verify the algorithm with this generated alignment.
function BadSVT1 (T,N,size : num0 , ð : list numâ )
returns (out : list bool ), check(ð)precondition â i. â1 †q[i] †11 [1 := Lap (2/ð);2 ð[ := ð + [1;3 count := 0; i := 0;
4 while ( i < size )
5 [2 := 0 ;
6 if (q[i] + [2 ⥠ð[ ) then7 out := true::out;8 count := count + 1;9 else
10 out := false::out;11 i := i + 1;
function Transformed BadSVT1 (T,N,size,q, q, ð ððððð , \ )returns (out)12 vð := 0; idx = 0;
13 [1 := ð ððððð[idx]; idx := idx + 1;
14 vð := vð + |A1 | Ã ð/2; [1 := A1;
15 ð[ := ð + [1;
16 ð[ := [1;
17 count := 0; i := 0;18 while (i < size)19 [2 := 0;20 if (q[i] + [2 ⥠ð[ ) then21 assert(q[i] + [2 + q[i] ⥠ð[ + ð[);
22 out := true::out;23 count := count + 1;24 else25 assert(¬(q[i] + [2 + q[i] ⥠ð[ + ð[));
26 out := false::out;27 i := i + 1;28 assert(vð †ð);
Figure 12: BadSVT1 and its transformed code, where under-lined parts are added by CheckDP. The transformed codecontains a alignment template for [1: A1 = \ [0].
BadSVT1 - 3. We now study other three incorrect variants ofSVT collected from [40]. All three variants are based on the classicSVT algorithm we have seen (i.e., Line 7 in Figure 1 is out :=true::out;).
BadSVT1 [47] adds no noise to the query answers and has nobounds on the number of trueâs it can output. This variant isobtained by changing Line 4 from while (count<Nâ§i<size) towhile (i<size) and Line 5 from Lap 4ð /ð to 0. Another variantBadSVT2 [20] has no bounds on the number of trueâs it can outputas well. It keeps outputting true even if the given privacy budgethas been exhausted. Moreover, the noise added to the queries doesnot scale with parameter N. Specifically, based on BadSVT1, Line 5 ischanged to Lap 2/ð . BadSVT3 [38] is an interesting case since it triesto spend its privacy budget in a different allocation strategy betweenthe threshold T and the query answers q[i] (1 : 3 instead of 1 : 1).However, the noise added to [2 does not scale with parameter N.The 3/4 privacy budget is allocated to each of the queries where itshould be shared among them. To get this variant, based on SVT
function BadSVT2 (T,N,size : num0 , ð : list numâ )
returns (out : list bool ), check(ð)precondition â i. â1 †q[i] †11 [1 := Lap (2/ð);2 ð[ := ð + [1;3 count := 0; i := 0;
4 while ( i < size )
5 [2 := Lap (2/ð) ;
6 if (q[i] + [2 ⥠ð[ ) then7 out := true::out;8 count := count + 1;9 else
10 out := false::out;11 i := i + 1;
function Transformed BadSVT2 (T,N,size,q, q, ð ððððð , \ )returns (out)12 vð := 0; idx = 0;
13 [1 := ð ððððð[idx]; idx := idx + 1;
14 vð := vð + |A1 | Ã ð/2; [1 := A1;
15 ð[ := ð + [1;
16 ð[ := [1;
17 count := 0; i := 0;18 while (i < size)19 [2 := ð ððððð[idx]; idx := idx + 1;
20 vð := vð + |A2 | Ã ð/2; [2 := A2;
21 if (q[i] + [2 ⥠ð[ ) then22 assert(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[);
23 out := true::out;24 count := count + 1;25 else26 assert(¬(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[));
27 out := false::out;28 i := i + 1;29 assert(vð †ð);
Figure 13: BadSVT2 and its transformed code, where under-lined parts are added by CheckDP. The transformed codecontains two alignment templates for [1 and [2: A1 = \ [0]and A2 = (q[i] + [2 ⥠ð[ ) ? (\ [1] + \ [2] Ãð[ + \ [3] à q[i]) :(\ [4] + \ [5] Ãð[ + \ [6] à q[i]).
algorithm, the noise generation commands (Line 1 and Line 5) arechanged to [1 := Lap 4/ð and [2 := Lap 4/(3 Ã ð) , respectively.
Note that apart from BadSVT1, which does not sample [2, thegenerated templates are identical to the GapSVT since they all havesimilar typing environments.
Interestingly, since the errors are very similar among them(no bounds on number of outputs / wrong scale of added noise),CheckDP finds a common counterexample [0, 0, 0, 0, 0], [1, 1, 1, 1,â1]where ð = 0 and ð = 1 within 6 seconds, and this counterexampleis further validated by PSI.
B.3 Partial SumNext, we study a simple algorithm PartialSum (Figure 15) whichoutputs the sum of queries in a privacy-preserving manner: it di-rectly computes sum of all queries and adds a Lap 1/ð to the finaloutput sum. Note that similar to SmartSum, it has the same adja-cency requirement (only one query can differ by at most one). Thealignment is easily found for [ by CheckDP which is to âcancel
outâ the distance of sum variable (i.e., -sum). With the alignmentCPAChecker verifies this algorithm.
An incorrect variant for PartialSum called BadPartialSum is cre-ated where Line 5 is changed from 1/ð to 1/(2Ãð), therefore makingit fail to satisfy ð-differential privacy (though it actually satisfies2ð-differential privacy). A counterexample [0, 0, 0, 0, 0], [0, 0, 0, 0, 1]is found by CheckDP and further validated by PSI.
function BadSVT3 (T,N,size : num0 , ð : list numâ )
returns (out : list bool ), check(ð)precondition â i. â1 †q[i] †1
1 [1 := Lap (4/ð) ;
2 ð[ := ð + [1;3 count := 0; i := 0;4 while (count < N ⧠i < size)
5 [2 := Lap (4/3ð) ;
6 if (q[i] + [2 ⥠ð[ ) then7 out := true::out;8 count := count + 1;9 else
10 out := false::out;11 i := i + 1;
function Transformed BadSVT3 (T,N,size,q, q, ð ððððð , \ )returns (out)12 vð := 0; idx = 0;
13 [1 := ð ððððð[idx]; idx := idx + 1;
14 vð := vð + |A1 | Ã ð/4; [1 := A1;
15 ð[ := ð + [1;
16 ð[ := [1;
17 count := 0; i := 0;18 while (count < N ⧠i < size)19 [2 := ð ððððð[idx]; idx := idx + 1;
20 vð := vð + |A2 | Ã 3ð/4; [2 := A2;
21 if (q[i] + [2 ⥠ð[ ) then22 assert(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[);
23 out := true::out;24 count := count + 1;25 else26 assert(¬(q[i] + [2 + q[i] + [2 ⥠ð[ + ð[));
27 out := false::out;28 i := i + 1;29 assert(vð †ð);
Figure 14: BadSVT3 and its transformed code, where under-lined parts are added by CheckDP. The transformed codecontains two alignment templates for [1 and [2: A1 = \ [0]and A2 = (q[i] + [2 ⥠ð[ ) ? (\ [1] + \ [2] Ãð[ + \ [3] à q[i]) :(\ [4] + \ [5] Ãð[ + \ [6] à q[i]).
B.4 SmartSum and BadSmartSumSmartSum [19] continually releases aggregated statistics with pri-vacy protections. For a finite sequence of queriesð [0], ð[1], · · · , ð[ð ],where ð is the length of ð, the goal of SmartSum is to release theprefix sum: ð [0], ð[0] + ð [1], · · · ,âð
ð=0 ð [ð] in a private way. Toachieve differential privacy, SmartSum first divides the sequenceinto non-overlapping blocks ðµ0, · · · , ðµð with sizeð , then maintainsthe noisy version of each query and noisy version of the block sum,
function PartialSum (size : num0 , q : list numâ )returns (out : num0 ), check(ð)precondition âi. â1 †q[i]†1â§(âi. (q[i]â 0)â (âj. q[j] = 0))1 sum := 0; i := 0;2 while (i < size)3 sum := sum + q[i];4 i := i + 1;5 [ = Lap (1/ð);6 out := sum + [;
function Transformed PartialSum (size,q,q, ð ððððð , \ )returns (out)7 vð := 0; sum := 0;
8 sum := 0; i := 0;9 while (i < size)
10 sum := sum + q[i];11 sum := sum + q[i];
12 i := i + 1;13 vð := vð + |A| à ð; [ := ðœ;
14 assert(sum + [);
15 out := sum + [;16 assert(vð †ð);
Figure 15: PartialSum and its transformation usingCheckDP, where A = \[0] + \[1] Ã sum + \[2] Ã q[i].
both by directly adding Lap 1/ð noise. Then to compute the ðth
component of the prefix sum sequenceâðð=0 ð [ð], it only has to add
up the noisy block sum that covers before ð , plus the remaining(ð + 1) modð noisy queries. The pseudo code is shown in Figure 16.The if branch is responsible for dividing the queries and summingup the block sums (stored in sum variable), where else branch addsthe remaining noisy queries.
Notably, SmartSum satisfies 2ð-differential privacy instead ofð-differential privacy. Moreover, the adjacency requirement of theinputs is that only one of the queries can differ by at most one.These two requirements are specified in the function signature(check(2ð) and precondition).
An incorrect variant of SmartSum, called BadSmartSum, is ob-tained by changing Line 4 to [1 :=0 in Figure 16. It directly releasessum + q[i] without adding any noise (since [1 = 0), where sumstores the accurate, non-noisy sum of queries (at Line 11), hencebreaking differential privacy. Interestingly, the violation only hap-pens in a rare branch if ((i + 1) mod M = 0), where the accuratesum is added to the output list out. In other words, out containsmostly private data with only a few exceptions. This rare eventmakes it challenging for sampling-based tools to find the violation.
function SmartSum (M,T,size : num0 , q : list numâ )
returns (out : list num0 ), check(2ð)precondition âi. â1 †q[i]†1â§(âi. (q[i]â 0)â (âj. q[j] = 0))1 next := 0; i := 0; sum := 0;2 while (i < size ⧠i †T)3 if ((i + 1) mod M = 0) then4 [1 := Lap (1/ð);5 next := sum + q[i] + [1;6 sum := 0;7 out := next::out;8 else9 [2 := Lap (1/ð);
10 next:= next + q[i] + [2;11 sum := sum + q[i];12 out := next::out;13 i := i + 1;
function Transformed SmartSum (M,T,size,q, q, ð ððððð , \ )returns (out)14 vð := 0; idx := 0;
15 next := 0; i := 0; sum := 0;
16 sum := 0; ï¿œnext := 0;
17 while (i < size ⧠i †T)18 if ((i + 1) mod M = 0) then19 [1 := ð ððððð[idx]; idx := idx + 1;
20 vð := vð + |A1| Ã ð; [1 := A1;
21 next := sum + q[i] + [1;
22 ï¿œnext := sum + q[i] + [1;
23 sum := 0;24 sum := 0;
25 assert(ï¿œnext = 0);
26 out := next::out;27 else28 [2 := ð ððððð[idx]; idx := idx + 1;
29 vð := vð + |A2| Ã ð; [2 := A2;
30 next := next + q[i] + [2;
31 ï¿œnext := ï¿œnext + q[i] + [2;
32 sum := sum + q[i];33 sum := sum + q[i];
34 assert(ï¿œnext = 0);
35 out := next::out;36 i := i + 1;37 assert(vð †2ð);
Figure 16: SmartSum and its transformed code. Underlinedparts are added by CheckDP. A1 = \ [0] + \ [1] Ã sum + \ [2] Ãq[i] + \ [3] Ãï¿œnext and A2 = \ [4] + \ [5] Ã sum + \ [6] Ã q[i] +\ [7] Ãï¿œnext.