
Learning Generative Adversarial RePresentations (GAP) under Fairness and Censoring Constraints

Jiachun Liao [email protected]
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281, USA

Chong Huang [email protected]
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281, USA

Peter Kairouz [email protected]
Google AI, Seattle, WA 94305, USA

Lalitha Sankar [email protected]
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281, USA

Editor:

Abstract

We present Generative Adversarial rePresentations (GAP) as a data-driven framework for learning censored and/or fair representations. GAP leverages recent advancements in adversarial learning to allow a data holder to learn universal representations that decouple a set of sensitive attributes from the rest of the dataset. Under GAP, finding the optimal decorrelating encoder (decorrelator) is formulated as a constrained minimax game between a data encoder and an adversary. We show that for appropriately chosen adversarial loss functions, GAP provides censoring guarantees against strong information-theoretic adversaries and enforces demographic parity. We also evaluate the performance of GAP on multi-dimensional Gaussian mixture models and real datasets, and show how a designer can certify that representations learned under an adversary with a fixed architecture perform well against more complex adversaries.

Keywords: Data censoring, Fairness, Adversarial Learning, Generative Adversarial Networks, Minimax Games, Information Theory

1. Introduction

The use of deep learning algorithms for data analytics has recently seen unprecedented success for a variety of problems such as image classification, natural language processing, and prediction of consumer behavior, electricity use, and political preferences, to name a few. The success of these algorithms hinges on the availability of large datasets that often contain sensitive information, and thus may facilitate learning models that inherit societal biases


leading to unintended algorithmic discrimination against legally protected groups such as race or gender. This, in turn, has led to censoring and fairness concerns and a growing body of research focused on developing representations of the dataset with fairness and/or censoring guarantees. These techniques predominantly involve designing randomizing schemes, and in recent years, distinct approaches with provable statistical censoring or fairness guarantees have emerged.

In the context of censoring, preserving the utility of published datasets while simultaneously providing provable censoring guarantees is a well-known challenge. While context-free censoring solutions, such as differential censoring (Dwork et al., 2006b,a; Dwork, 2008; Dwork and Roth, 2014), provide strong worst-case censoring guarantees, they often lead to a significant reduction in utility. In contrast, context-aware censoring solutions, e.g., mutual information censoring (Rebollo-Monedero et al., 2010; Calmon and Fawaz, 2012; Sankar et al., 2013; Salamatian et al., 2015; Basciftci et al., 2016), achieve improved censoring-utility tradeoffs, but assume that the data holder has access to dataset statistics.

In the context of fairness, machine learning models seek to maximize predictive accuracy. Fairness concerns arise when models learned from datasets that include patterns of societal bias and discrimination inherit such biases. Thus, there is a need for actively decorrelating sensitive and non-sensitive data. In the context of publishing datasets or meaningful representations that can be "universally" used for a variety of learning tasks, modifying the training data is the most appropriate approach and is the focus of this work. Fairness can then be achieved by carefully designing objective functions that approximate a specific fairness definition while simultaneously ensuring maximal utility (Zemel et al., 2013; Calmon et al., 2017; Ghassami et al., 2018). This, in turn, requires dataset statistics.

Adversarial learning approaches for context-aware censoring and fairness have been studied extensively (Edwards and Storkey, 2015; Abadi and Andersen, 2016; Raval et al., 2017; Huang et al., 2017; Tripathy et al., 2017; Beutel et al., 2017; Madras et al., 2018; Zhang et al., 2018). They allow the data curator to cleverly decorrelate the sensitive attributes from the rest of the dataset. These approaches overcome the lack of statistical knowledge by taking a data-driven approach that leverages recent advancements in generative adversarial networks (GANs) (Goodfellow et al., 2014; Mirza and Osindero, 2014). However, most existing efforts focus on extensive empirical studies without theoretical verification and predominantly provide guarantees for a specific classification task.

Figure 1: Generative adversarial model for censoring and fairness


1.1 Our Contributions

This work introduces a general framework for context-aware censoring and fairness that we call generative adversarial representations (GAP) (see Figure 1). We provide precise connections to information-theoretic censoring and fairness formulations and derive game-theoretically optimal decorrelation schemes to compare against those learned directly from the data. While our framework can be generalized to learn an arbitrary representation using an encoder-decoder structure, this paper primarily focuses on learning private/fair representations of the data (of the same dimension). We list our main contributions below.

1. We introduce GAP, a framework for creating private/fair representations of data using an adversarially trained conditional generative model. Unlike existing works, GAP can exploit different models (e.g., Figure 2) to create representations that are useful for a variety of classification tasks, without requiring the designer to model these tasks at training time. We validate this observation via experiments on the GENKI (Whitehill and Movellan, 2012), HAR (Anguita et al., 2013) and COMPAS (ProPublica, 2016) datasets using the two GAP architectures presented on the left of Figure 2.

2. We show that, via the choice of the adversarial loss function, our framework can capture a rich class of statistical and information-theoretic adversaries. This allows us to compare data-driven approaches directly against strong inferential adversaries (e.g., a maximum a posteriori probability (MAP) adversary with access to dataset statistics). We also show that by carefully designing the loss functions in the GAP framework, we can enforce demographic parity.

3. We make a precise comparison between data-driven censoring/fairness methods and the minimax game-theoretic GAP formulation. For Gaussian mixture data, we derive game-theoretically optimal decorrelation schemes and compare them with those that are directly learned in a data-driven fashion to show that the gap between theory and practice is negligible. Furthermore, we propose using mutual information estimators to verify that no adversary (regardless of its computational power) can reliably infer the sensitive attribute from the learned representation.

1.2 Related Work

In the context of publishing datasets with censoring and utility guarantees, a number of similar approaches have been recently considered. We briefly review them and clarify how our work is different. DP-based obfuscators for data publishing have been considered in (Hamm, 2016; Liu et al., 2017). The author in (Hamm, 2016) considers a deterministic, compressive mapping of the input data with differentially private noise added either before or after the mapping. The approach in (Liu et al., 2017) relies on using deep auto-encoders to determine the relevant feature space in which to add differentially private noise, thereby eliminating the need to add noise to the original data. These novel approaches leverage minimax filters and deep auto-encoders to allow non-malicious entities to learn some public features from the filtered data, while preventing malicious entities from learning other sensitive features. Both approaches incorporate a notion of context-aware censoring and achieve better censoring-utility tradeoffs while using DP to enforce censoring. However, DP can still incur a significant


Figure 2: Different GAP models (neural network decorrelators, a probabilistic transition matrix decorrelator, and a variational autoencoder decorrelator)

utility loss since it assumes worst-case dataset statistics. Our approach models a rich class of randomization-based schemes via a generative model that allows the generative decorrelator to tailor the noise to the dataset.

Our work is closely related to adversarial neural cryptography (Abadi and Andersen, 2016), learning censored representations (Edwards and Storkey, 2015), censoring-preserving image sharing (Raval et al., 2017), censoring-preserving adversarial networks (Tripathy et al., 2017), and adversarially learned fair representations (Madras et al., 2018), in which adversarial learning is used to protect communications by encryption, hide or remove sensitive information, or generate fair representations of the data. Similar to these works, our model includes a minimax formulation and uses adversarial neural networks to learn decorrelation schemes. However, in (Edwards and Storkey, 2015; Raval et al., 2017; Madras et al., 2018), the authors use non-generative auto-encoders to remove sensitive information. Instead, we use a GAN-like approach to learn decorrelation schemes that prevent an adversary from inferring the sensitive variable. Furthermore, these formulations use a weighted combination of different loss functions to balance censoring with utility. We go beyond this by formulating a game-theoretic setting subject to a distortion constraint. These approaches are not equivalent because of the non-convexity (resp. concavity) of the minimax problem with respect to the decorrelator (resp. adversary) neural network parameters, and enforcing the distortion constraint during the training process requires new methods. The distortion constraint allows us to directly limit the amount of distortion added to learn the private/fair representation for a variety of learning tasks, which is crucial for preserving the utility of the learned representation. Moreover, we compare the performance of the decorrelation schemes learned in an adversarial fashion with the game-theoretically optimal ones for canonical synthetic data models, thereby providing formal verification of decorrelation schemes that are learned by competing against computational adversaries. Finally, we propose using mutual information as a criterion to certify that the representations we learned adversarially against


an attacker with a fixed architecture generalize against unseen attackers with (possibly) more complex architectures.

Fair representations using information-theoretic objective functions and constrained optimization have been proposed in (Calmon et al., 2017; Ghassami et al., 2018). However, both approaches require knowledge of dataset statistics, which is very difficult to obtain for real datasets. We overcome the issue of statistical knowledge by taking a data-driven approach, i.e., learning the representation from the data directly via adversarial models. In contrast to in-processing approaches that modify learning algorithms to ensure fair predictions (e.g., using linear programs in (Dwork et al., 2012; Fish et al., 2016) or via an adversarial learning approach in (Zhang et al., 2018)), we focus on a pre-processing approach to ensure fairness for a variety of learning tasks.

Generative adversarial networks (GANs) have recently received a lot of attention in the machine learning community (Goodfellow et al., 2014; Mirza and Osindero, 2014). Ultimately, deep generative models hold the promise of discovering and efficiently internalizing the statistics of the target signal to be generated. Using GANs to generate synthetic non-sensitive attributes and labels that ensure fairness while preserving the utility of the data (predicting the label) has been studied in (Xu et al., 2018; Sattigeri et al., 2018). The goal there is to develop a conditional GAN-based model that ensures fairness by learning to generate a fairer synthetic dataset via an unconstrained minimax game with carefully designed loss functions corresponding to both fairness and utility. The synthetic data is generated by a conditional GAN which generates the non-sensitive attribute-label pair given the noise variable and the sensitive attribute. Utility is preserved by generating data that is very similar to the original data. To ensure fairness, the generator produces data samples such that an auxiliary classifier trained to predict the sensitive attribute from the synthetic data performs as poorly as possible. The methods presented in these papers differ from ours since we focus on creating fair/private representations of the original data while preserving the utility of the representation for a variety of learning tasks. There are different ways of enforcing fairness, and our work presents a framework that aids in achieving this goal; more work remains to be done in this area.

1.3 Outline

The remainder of our paper is organized as follows. We review preliminaries in Section 2 and formally present our GAP model in Section 3. In Section 4, we present results for Gaussian mixture dataset models. Finally, we showcase the performance of GAP on the GENKI, HAR and COMPAS datasets in Section 5. All proofs and algorithms are deferred to the appendices in the accompanying supplementary materials.

2. Preliminaries

We consider a dataset D with n entries, where each entry is denoted as (S, X, Y): S ∈ S is a collection of sensitive features, X ∈ X is a collection of non-sensitive features, and Y ∈ Y is the collection of target (non-sensitive) features to be learned. Let Ŷ ∈ Y be a prediction of Y. We note that S, X, and Y can each be a collection of features or labels (e.g., S can be gender, race, and sexual orientation, while Y could be age, facial expression, etc.); for


ease of writing, we use the term variable to denote both single and multiple features/labels. Instances of X, S, and Y are denoted by x, s and y, respectively. We assume that each entry (X, S, Y) is independent and identically distributed (i.i.d.) according to P(X, S, Y).

Recent results on fairness in learning applications guarantee that, for a specific target variable, the prediction of a machine learning model is accurate with respect to (w.r.t.) the target variable but unbiased w.r.t. a sensitive variable. The three oft-used fairness measures are demographic parity, equalized odds, and equality of opportunity. Demographic parity imposes the strongest fairness requirement via complete independence of the prediction of the target variable and the sensitive variable, and thus (for correlated target and sensitive variables) is the least favorable to utility (Hardt et al., 2016). Equalized odds ensures this independence conditioned on the target variable, thereby ensuring equal true and false positive rates (wherein the target variable is binary) for all demographics. Equality of opportunity ensures equalized odds for the true positive case alone (Hardt et al., 2016).

For the sake of completeness, we review these definitions briefly. We note that these definitions are often aimed at sensitive (S) and target (Y) features that are binary, and in reviewing these definitions below, we make this assumption too. However, these definitions can be generalized to the non-binary setting; indeed, our own generalizations of these definitions as applied to the representation setting do not make such an assumption.

Definition 1 ((Hardt et al., 2016)) A predictor Ŷ = f(S, X) satisfies

• demographic parity w.r.t. the sensitive variable S, if Ŷ and S are independent, i.e.,

Pr(Ŷ = 1|S = 1) = Pr(Ŷ = 1|S = 0); (1)

• equalized odds w.r.t. the sensitive variable S and target variable Y, if Ŷ and S are independent conditioned on Y, i.e.,

Pr(Ŷ = 1|S = 1, Y = y) = Pr(Ŷ = 1|S = 0, Y = y), y ∈ {0, 1}; (2)

• equality of opportunity w.r.t. the sensitive variable S and target variable Y, if Ŷ and S are independent conditioned on Y = 1, i.e.,

Pr(Ŷ = 1|S = 1, Y = 1) = Pr(Ŷ = 1|S = 0, Y = 1). (3)
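Since all three conditions are stated directly in terms of conditional probabilities, they can be estimated empirically from a predictor's outputs. Below is a minimal sketch of such estimators, assuming binary 0/1 numpy arrays for Ŷ, S, and Y; the function names are ours, not part of any standard library.

```python
import numpy as np

def demographic_parity_gap(y_hat, s):
    """|Pr(Yhat=1|S=1) - Pr(Yhat=1|S=0)|; zero under demographic parity."""
    return abs(y_hat[s == 1].mean() - y_hat[s == 0].mean())

def equalized_odds_gap(y_hat, s, y):
    """Max over y in {0,1} of the parity gap conditioned on Y = y."""
    return max(
        abs(y_hat[(s == 1) & (y == yy)].mean() - y_hat[(s == 0) & (y == yy)].mean())
        for yy in (0, 1)
    )

def equal_opportunity_gap(y_hat, s, y):
    """Parity gap conditioned on Y = 1 only (true positive rates)."""
    return abs(y_hat[(s == 1) & (y == 1)].mean() - y_hat[(s == 0) & (y == 1)].mean())
```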

We begin by first defining the notions of censoring and fairness for representations. In particular, in the censoring context, our goal is to introduce a definition that ensures that the censored representation limits leakage of sensitive variables to adversaries that can potentially learn them from the released data.

Definition 2 (Censored Representations) A representation Xr of X is censored w.r.t. the sensitive variable S against a learning adversary h(·), whose performance is evaluated via a loss function ℓ(h(Xr), S), if for the optimal adversarial strategy h* = argmin_h E[ℓ(h(Xr), S)],

E[ℓ(h*(g(X)), S)] ≤ E[ℓ(h*(Xr), S)], (4)

where g(·) is any (randomized) function of X and the expectation is over h, g, X, and S.


To motivate the generation of fair representations, we now extend the definition of demographic parity to representations. Indeed, it is known that fair representations can be used to ensure fair classification (see, for example, (Hardt et al., 2016)). We formally define fair representations and prove that such representations ensure fair classification.

Definition 3 (Fair Representations) Let Xr and S be the supports of Xr and S, respectively. A representation Xr of X satisfies demographic parity w.r.t. the sensitive variable S if Xr and S are independent, i.e., for any xr ∈ Xr and s, s′ ∈ S,

Pr(Xr = xr|S = s) = Pr(Xr = xr|S = s′). (5)

Given this definition, we now prove that a fair representation in the sense of demographic parity guarantees that any downstream learning algorithm making use of the fair representation is fair (in the sense of demographic parity) w.r.t. the sensitive label S.

Theorem 4 (Fair Learning via Fair Representations) Given a dataset consisting of sensitive, non-sensitive, and target variables (S, X, Y), respectively, if a fair representation Xr = g(X) satisfies demographic parity w.r.t. S, then any learning algorithm f : Xr → Y satisfies demographic parity w.r.t. S.

The proof of Theorem 4 is in Appendix A.

Remark 5 Note that the definitions of equalized odds and equality of opportunity in Def. 1 explicitly involve a downstream learning application, and therefore, the design of a fair representation needs to include a classifier explicitly. In contrast to the universal representation setting considered here, such targeted representations and the ensuing fair classifiers provide guarantees only for those targeted Y features. In this limited context, however, one can still define a representation Xr as ensuring equalized odds (w.r.t. S) in classifying Y if the predicted output learned from Xr, i.e., Ŷ(Xr), is independent of S conditioned on Y.

One simple approach to obtaining a fair/censored representation Xr is choosing Xr = N, where N is a random variable independent of X and S. However, such an Xr has no downstream utility (quantified, for example, via downstream task accuracy). More generally, the design of Xr has to ensure utility, and thus, there is a tradeoff between guaranteeing fairness/censoring and assuring a desired level of utility. GAP enables quantifying these tradeoffs formally, as described in the next section.

3. Generative Adversarial Representations (GAP) for Censoring/Fairness

Formally, GAP involves two components, an encoder and an adversary, as shown in Fig. 1. The goal of the encoder g : X × S → Xr is to actively decorrelate S from X, while that of the adversary h : Xr → S is to infer S. Thus, in general, g(X, S) is a randomized mapping that outputs a representation Xr = g(X, S). Note that the design of g(·) depends on both X and S; however, S may not necessarily be an input to the encoder, though it will always affect the design of g(·) via the adversarial training process. In contrast, the role of the adversary is captured via h(Xr), the adversarial decision rule (classifier) for inferring


the sensitive variable S as Ŝ = h(g(X)) from the representation g(X, S). In general, the hypothesis h can be a hard decision rule, under which h(g(X)) is a direct estimate of S, or a soft decision rule, under which h(g(X)) = P_h(·|g(X)) is a distribution over S.

To quantify the adversary's performance, we use a loss function ℓ(h(g(X = x)), S = s) defined for every pair (x, s). Thus, the adversary's expected loss w.r.t. X and S is L(h, g) ≜ E[ℓ(h(g(X)), S)], where the expectation is taken over P(X, S) and the randomness in g and h.

Intuitively, the generative (since it randomizes to decorrelate) encoder would like to minimize the adversary's ability to reliably learn S from Xr. This can be trivially achieved by releasing an Xr independent of X. However, such an approach provides no utility for data analysts who want to learn non-sensitive variables Y from Xr. To overcome this issue, we capture the loss incurred by perturbing the original data via a distortion function d(xr, x), which measures how far the original data X = x is from the processed data Xr = xr. Ensuring statistical utility in turn requires constraining the average distortion E[d(g(X), X)], where the expectation is taken over P(X, S) and the randomness in g.

3.1 Theoretical Results of GAP

To publish a censored or fair representation Xr, the data curator wishes to learn an encoder g that guarantees both censoring/fairness (in the sense that it is difficult for the adversary to learn S from Xr) as well as utility (in the sense that it does not distort the original data too much). In contrast, for a fixed encoder g, the adversary would like to find a (potentially randomized) function h that minimizes its expected loss, which is equivalent to maximizing the negative of the expected loss. This leads to a constrained minimax game between the encoder and the adversary given by

min_{g(·)} max_{h(·)} −L(h, g), s.t. E[d(g(X), X)] ≤ D, (6)

where the constant D ≥ 0 determines the allowable distortion for the generative decorrelator and the expectation is taken over P(X, S) and the randomness in g and h.

Our GAP framework places no restrictions on the adversary. Indeed, different loss functions and decision rules lead to different adversarial models (see Table 1). In what follows, we consider the general α-loss function

ℓ(h(g(X)), s) = (α/(α − 1))(1 − P_h(s|g(X))^{1−1/α}), α > 1,

introduced in (?). We show that α-loss can capture various information-theoretic adversaries, ranging from a hard-decision adversary under the 0-1 loss function ℓ(h(g(X)), s) = 1{h(g(X)) ≠ s} to a soft-decision adversary under the log-loss function ℓ(h(g(X)), s) = −log P_h(s|g(X)).

Theorem 6 Under α-loss, the optimal adversary decision rule is the 'α-tilted' conditional distribution

P_h*(s|g(X)) = P(s|g(X))^α / Σ_{s∈S} P(s|g(X))^α.

The optimization in (6) reduces to min_{g(·)} −H^A_α(S|g(X)), where H^A_α(·|·) is the Arimoto conditional entropy.

Under hard decision rules in which the adversary uses a 0-1 loss function, the optimal adversarial strategy simplifies to using a MAP decision rule that maximizes P(s|g(X)). For a soft-decision adversary under log-loss, the optimal adversarial strategy h* is P(s|g(X)), and the GAP minimax problem in (6) simplifies to min_{g(·)} I(g(X); S) subject to E[d(g(X), X)] ≤ D, where I(g(X); S) is the mutual information (MI) between g(X) and S.
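To make the α-tilted rule concrete, the following numpy sketch (the helper name is ours) computes P_h*(s|g(X)) from a posterior vector and illustrates the two limiting regimes discussed above.

```python
import numpy as np

def alpha_tilted(posterior, alpha):
    """P*(s|x) proportional to P(s|x)^alpha (Theorem 6)."""
    tilted = posterior ** alpha
    return tilted / tilted.sum()

p = np.array([0.7, 0.2, 0.1])      # P(s | g(X)) for a 3-valued S
print(alpha_tilted(p, 1.001))      # alpha -> 1: recovers the posterior (log-loss adversary)
print(alpha_tilted(p, 50.0))       # alpha -> inf: mass concentrates on the argmax (MAP adversary)
```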


Loss function ℓ(h(g(X)), s) | Optimal adversarial strategy h* | Adversary type
Squared loss: (h(g(X)) − S)² | E[S|g(X)] | MMSE adversary
0-1 loss: 1{h(g(X)) ≠ S} | argmax_{s∈S} P(s|g(X)) | MAP adversary
Log-loss: log(1/P_h(s|g(X))) | P(s|g(X)) | Belief-refining adversary; leads to MI-censoring
α-loss: (α/(α − 1))(1 − P_h(s|g(X))^{1−1/α}) | P(s|g(X))^α / Σ_{s∈S} P(s|g(X))^α | Generalized belief-refining adversary

Table 1: GAP under various loss functions

Corollary 7 Using α-loss, we can obtain a continuous interpolation between a hard-decision adversary under the 0-1 loss (α → ∞) and a soft-decision adversary under the log-loss (α → 1).

Theorem 8 For a sufficiently large distortion bound D, the constrained minimax optimization in (6) generates a representation Xr that is censored w.r.t. S.

Theorem 9 Under log-loss, GAP enforces fairness subject to the distortion constraint. As the distortion increases, the ensuing fairness guarantee approaches ideal demographic parity.

The proofs of Theorem 6, Corollary 7 and Theorem 9 are presented in Appendix B. Many notions of fairness rely on computing probabilities to ensure independence of sensitive and target variables, which is not easy to optimize in a data-driven fashion. In Theorem 9, we propose log-loss (modeled in practice via cross-entropy) in GAP as a proxy for enforcing fairness.

3.2 Data-driven GAP

Thus far, we have focused on a setting where the data holder has access to P(X, S). When P(X, S) is known, the data holder can simply solve the constrained minimax optimization problem in (6) (the game-theoretic version of GAP) to obtain a decorrelation scheme that would perform best against a chosen type of adversary. In the absence of P(X, S), we propose a data-driven version of GAP that allows the data holder to learn decorrelation schemes directly from a dataset D = {(x^(i), s^(i))}_{i=1}^n. Under the data-driven version of GAP, we represent the decorrelation scheme via a generative model g(X; θ_p) parameterized by θ_p. This generative model takes X as input and outputs the representation X̂. In the training phase, the data holder learns the optimal parameters θ_p by competing against a computational adversary: a classifier modeled by a neural network h(g(X; θ_p); θ_a) parameterized by θ_a. In the evaluation phase, the performance of the learned decorrelation scheme can be tested under a strong adversary that is computationally unbounded and has access to the dataset statistics. We follow this procedure in the next section.

In theory, the functions h and g can be arbitrary. However, in practice, we need to restrict them to a rich hypothesis class. Figure 1 shows an example of the GAP model in which the generative decorrelator and adversary are modeled as deep neural networks. For


a fixed h and g, if S is binary, we can quantify the adversary's empirical loss using the cross-entropy

L_n(θ_p, θ_a) = −(1/n) Σ_{i=1}^n [s^(i) log h(g(x^(i); θ_p); θ_a) + (1 − s^(i)) log(1 − h(g(x^(i); θ_p); θ_a))].

It is easy to generalize the cross-entropy to the multi-class case using the softmax function. The optimal model parameters are the solutions to

min_{θ_p} max_{θ_a} −L_n(θ_p, θ_a), s.t. E_D[d(g(X; θ_p), X)] ≤ D, (7)

where the expectation is over D and the randomness in g.

The minimax optimization in (7) is a two-player non-cooperative game between the generative decorrelator and the adversary with strategies θ_p and θ_a, respectively. In practice, we can learn the equilibrium of the game using an iterative algorithm (see Algorithm 1 in Appendix C). We first maximize the negative of the adversary's loss function in the inner loop to compute the parameters of h for a fixed g. Then, we minimize the decorrelator's loss function, which is modeled as the negative of the adversary's loss function, to compute the parameters of g for a fixed h. Observe that the distortion constraint in (7) makes our minimax problem different from what is extensively studied in previous works. To incorporate the distortion constraint, we use the penalty method (Lillo et al., 1993) to replace the constrained optimization problem by a series of unconstrained ones that add a penalty to the objective function. The penalty consists of a penalty parameter ρ_t multiplied by a measure of violation of the constraint at the t-th iteration. The constrained optimization problem of the generative decorrelator can thus be approximated by a series of unconstrained optimization problems with the loss function −L_n(θ_p, θ_a) + ρ_t (max{0, E_D[d(g(X; θ_p), X)] − D})², where ρ_t is a penalty coefficient that increases with the number of iterations t. The algorithm and the penalty method are detailed in Appendix C.
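For concreteness, the following is a minimal TensorFlow sketch of one round of this alternating procedure with the penalty term. It assumes the decorrelator and adversary are Keras models with binary S; the names (train_step, k_inner, rho_t) are ours and this is a sketch of the idea rather than the paper's Algorithm 1 verbatim.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def train_step(decorrelator, adversary, opt_p, opt_a, x, s, D, rho_t, k_inner=5):
    # Inner loop: the adversary maximizes -L_n, i.e. minimizes the cross-entropy.
    for _ in range(k_inner):
        with tf.GradientTape() as tape:
            xr = decorrelator(x, training=True)
            adv_loss = bce(s, adversary(xr, training=True))
        grads = tape.gradient(adv_loss, adversary.trainable_variables)
        opt_a.apply_gradients(zip(grads, adversary.trainable_variables))

    # Outer step: the decorrelator minimizes -L_n plus the squared penalty
    # rho_t * (max{0, empirical distortion - D})^2 for violating E[d] <= D.
    with tf.GradientTape() as tape:
        xr = decorrelator(x, training=True)
        adv_loss = bce(s, adversary(xr, training=False))
        distortion = tf.reduce_mean(tf.reduce_sum(tf.square(xr - x), axis=-1))
        penalty = rho_t * tf.square(tf.maximum(0.0, distortion - D))
        dec_loss = -adv_loss + penalty
    grads = tape.gradient(dec_loss, decorrelator.trainable_variables)
    opt_p.apply_gradients(zip(grads, decorrelator.trainable_variables))
    return adv_loss, distortion
```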

4. GAP for Gaussian Mixture Models

In this section, we focus on a setting where S ∈ {0, 1} and X is an m-dimensional Gaussian mixture random vector whose mean depends on S. Let P(S = 1) = q, and let X|S = 0 ∼ N(−µ, Σ) and X|S = 1 ∼ N(µ, Σ), where µ = (µ_1, ..., µ_m); without loss of generality, we assume that X|S = 0 and X|S = 1 have the same covariance Σ.

We consider a MAP adversary who has access to P(X, S) and the censoring mechanism. The privatizer's goal is to privatize X in a way that minimizes the adversary's probability of correctly inferring S from X̂. In order to have a tractable model for the privatizer, we mainly focus on linear (precisely, affine) GAP mechanisms X̂ = g(X) = X + Z + β, where Z is an independently generated noise vector. This linear GAP mechanism enables controlling both the mean and covariance of the privatized data. To quantify the utility of the privatized data, we use the ℓ_2 distance between X̂ and X as a distortion measure, yielding the distortion constraint E_{X,X̂}‖X̂ − X‖² ≤ D.

4.1 Game-Theoretical Approach

Consider the setup where both the privatizer and the adversary have access to P(X, S). Further, let Z be a zero-mean multi-dimensional Gaussian random vector. Although other distributions can be considered, we choose additive Gaussian noise for tractability reasons.


Without loss of generality, we assume that β = (β_1, ..., β_m) is a constant parameter vector and Z ∼ N(0, Σ_p). Following an analysis similar to that in Gallager (2013), we can show that the adversary's probability of detection is given by

P_d^(G) = q Q(−α/2 + (1/α) ln((1 − q)/q)) + (1 − q) Q(−α/2 − (1/α) ln((1 − q)/q)), (8)

where α = √((2µ)^T (Σ + Σ_p)^{−1} (2µ)). Furthermore, since E_{X,X̂}[d(X, X̂)] = E_{X,X̂}‖X̂ − X‖² = E‖Z + β‖² = ‖β‖² + tr(Σ_p), the distortion constraint implies that ‖β‖² + tr(Σ_p) ≤ D. To make the problem more tractable, we assume both X and Z are independent multi-dimensional Gaussian random vectors with diagonal covariance matrices.
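Equation (8) is straightforward to evaluate numerically. The sketch below (function name ours) computes the MAP adversary's detection probability for given dataset and noise parameters, using the Gaussian tail function Q(x) = 1 − Φ(x).

```python
import numpy as np
from scipy.stats import norm

def map_accuracy(q, mu, Sigma, Sigma_p):
    """Eq. (8): MAP adversary's detection probability under X + Z + beta with beta = 0."""
    Q = norm.sf  # Q(x) = 1 - Phi(x)
    alpha = np.sqrt((2 * mu) @ np.linalg.inv(Sigma + Sigma_p) @ (2 * mu))
    t = np.log((1 - q) / q) / alpha
    return q * Q(-alpha / 2 + t) + (1 - q) * Q(-alpha / 2 - t)
```

As a sanity check, for q = 0.5 the log term vanishes and the accuracy reduces to Q(−α/2), which approaches the prior 0.5 as the added noise grows (α → 0).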

Theorem 10 Consider GAP mechanisms given by g(X) = X + Z + β, where X|S and Z are multi-dimensional Gaussian random vectors with diagonal covariance matrices Σ and Σ_p. Let {σ_1², ..., σ_m²} and {σ_{p_1}², ..., σ_{p_m}²} be the diagonal entries of Σ and Σ_p, respectively. The parameters of the minimax optimal censoring mechanism are

β_i* = 0, σ_{p_i}*² = max(|µ_i|/√λ_0* − σ_i², 0), ∀i ∈ {1, 2, ..., m},

where λ_0* is chosen such that Σ_{i=1}^m max(|µ_i|/√λ_0* − σ_i², 0) = D. For this optimal mechanism, the accuracy of the MAP adversary is given by (8) with

α = 2 √(Σ_{i=1}^m µ_i² / (σ_i² + max(|µ_i|/√λ_0* − σ_i², 0))).

The proof of Theorem 10 is provided in Appendix C.1. We observe that when σ_i² is greater than the threshold |µ_i|/√λ_0*, no noise is added to this dimension due to its high variance. When σ_i² is smaller than |µ_i|/√λ_0*, the amount of noise added to this dimension is proportional to |µ_i|; this is intuitive since a large |µ_i| indicates the two conditionally Gaussian distributions are further apart on this dimension, and thus more distinguishable. Thus, more noise needs to be added in order to reduce the MAP adversary's inference accuracy.
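The threshold structure in Theorem 10 resembles water-filling, and λ_0* can be found by a one-dimensional search, since the total added noise is monotonically decreasing in λ_0. A minimal numpy sketch (helper names ours), assuming µ and σ² are given as arrays:

```python
import numpy as np

def optimal_noise(mu, sigma2, D, iters=100):
    """Bisect on lambda_0 so that sum_i max(|mu_i|/sqrt(lambda_0) - sigma_i^2, 0) = D."""
    def total_noise(lam):
        return np.maximum(np.abs(mu) / np.sqrt(lam) - sigma2, 0.0).sum()

    lo, hi = 1e-12, 1e12        # total noise is decreasing in lambda_0
    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # geometric bisection to cover the wide range
        if total_noise(mid) > D:
            lo = mid            # too much noise: increase lambda_0
        else:
            hi = mid
    lam = np.sqrt(lo * hi)
    return np.maximum(np.abs(mu) / np.sqrt(lam) - sigma2, 0.0)  # sigma_{p_i}^{*2}
```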

4.2 Data-driven Approach

For the data-driven linear GAP mechanism, we assume the privatizer only has access to the dataset D with n data samples, but not the actual distribution of (X, S). Computing the optimal censoring mechanism then becomes a learning problem. In the training phase, the data holder learns the parameters of the GAP mechanism by competing against a computational adversary modeled by a multi-layer neural network. When convergence is reached, we evaluate the performance of the learned mechanism by comparing it with the one obtained from the game-theoretic approach. To quantify the performance of the learned GAP mechanism, we compute the accuracy of inferring S under a strong MAP adversary that has access to both the joint distribution of (X, S) and the censoring mechanism.


Figure 3: Neural network structure of the linear GAP mechanism for Gaussian mixture data

Since the sensitive variable S is binary, we measure the training loss of the adversary network by the empirical log-loss function

L_n(θ_p, θ_a) = −(1/n) Σ_{i=1}^n [s^(i) log h(g(x^(i); θ_p); θ_a) + (1 − s^(i)) log(1 − h(g(x^(i); θ_p); θ_a))]. (9)

For a fixed privatizer parameter θ_p, the adversary learns the optimal θ_a* by minimizing (9) (i.e., maximizing the negative of the empirical log-loss). For a fixed θ_a, the privatizer learns the optimal θ_p* by minimizing −L_n(h(g(X; θ_p); θ_a), S) subject to the distortion constraint E_{X,X̂}‖X̂ − X‖² ≤ D.

As shown in Figure 3, the privatizer is modeled by a two-layer neural network with parameters θ_p = {β_0, ..., β_m, σ_{p_0}, ..., σ_{p_m}}, where β_k and σ_{p_k} represent the mean and standard deviation for each dimension k ∈ {1, ..., m}, respectively. The random noise Z is drawn from an m-dimensional independent zero-mean standard Gaussian distribution with covariance Σ_1. Thus, we have X̂_k = X_k + β_k + σ_{p_k} Z_k. The adversary, whose goal is to infer S from the privatized data X̂, is modeled by a three-layer neural network classifier with leaky ReLU activations.
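This privatizer is simple enough to write out directly; below is a TensorFlow sketch (class name ours). Note that parameterizing σ_{p_k} through its logarithm to keep it positive is our choice, not a detail from the paper.

```python
import tensorflow as tf

class LinearPrivatizer(tf.keras.layers.Layer):
    """X_hat = X + beta + sigma_p * Z with Z ~ N(0, I); beta, sigma_p learned per dimension."""
    def __init__(self, m):
        super().__init__()
        self.beta = self.add_weight(shape=(m,), initializer="zeros", name="beta")
        self.log_sigma_p = self.add_weight(shape=(m,), initializer="zeros", name="log_sigma_p")

    def call(self, x):
        z = tf.random.normal(tf.shape(x))  # fresh standard Gaussian noise per call
        return x + self.beta + tf.exp(self.log_sigma_p) * z
```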

To incorporate the distortion constraint into the learning process, we add a penalty term to the objective of the privatizer. Thus, the training loss function of the privatizer is given by

L(θ_p, θ_a) = −L_n(θ_p, θ_a) + ρ max{0, (1/n) Σ_{i=1}^n d(g(x^(i); θ_p), x^(i)) − D}, (10)

where ρ is a penalty coefficient which increases with the number of iterations. The added penalty consists of the penalty parameter ρ multiplied by a measure of violation of the constraint, which is non-zero only when the constraint is violated.

4.3 Illustration of Results

We use synthetic datasets to evaluate the performance of the learned GAP mechanisms. Each dataset contains 20K training samples and 2K test samples. Each data entry is sampled from an independent multi-dimensional Gaussian mixture model. We consider two


categories of synthetic datasets with P(S = 1) equal to 0.75 and 0.5, respectively. Both the privatizer and the adversary in the GAP framework are trained in TensorFlow (Abadi et al., 2016) using the Adam optimizer with a learning rate of 0.005 and a minibatch size of 1000. The distortion constraint is enforced by the penalty method as detailed in supplement B (see (18)).

Figure 4 illustrates the performance of the learned GAP scheme against a strong theoretical MAP adversary for a 32-dimensional Gaussian mixture model with P(S = 1) = 0.75 and 0.5. We observe that the inference accuracy of the MAP adversary decreases as the distortion increases and asymptotically approaches (as expected) the prior on the sensitive variable. The decorrelation scheme obtained via the data-driven approach performs very well when pitted against the MAP adversary (maximum accuracy difference around 0.7% compared to the theoretical optimum). Furthermore, the estimated mutual information decreases as the distortion increases. In other words, for data generated by a Gaussian mixture model with a binary sensitive variable, the data-driven version of GAP can learn decorrelation schemes that perform as well as those computed under the theoretical version of GAP, in which the generative decorrelator has access to the statistics of the dataset.

Figure 4: Performance of GAP for Gaussian mixture models. (a) Sensitive variable classification accuracy; (b) estimated mutual information between S and X̂.

5. GAP for Real Datasets

We apply our GAP framework to real-world datasets to demonstrate its effectiveness. The GENKI dataset consists of 1,740 training and 200 test samples. Each data sample is a 16×16 greyscale face image with varying facial expressions. Both the training and test sets contain 50% male and 50% female subjects; within each gender, 50% of the faces are smiling and 50% are not. We consider gender as the sensitive variable S and the image pixels as the public variable X. The HAR dataset consists of 561 features of motion sensor data collected by a smartphone from 30 subjects performing six activities (walking, walking upstairs, walking downstairs, sitting, standing, laying). We choose subject identity as the sensitive variable S and the features of the motion sensor data as the public variable X. The dataset is randomly partitioned into 8,000 training and 2,299 test samples. The COMPAS dataset includes COMPAS risk scores, recidivism records, race, gender, and other relevant attributes. We consider race as


the sensitive variable S, and age category, juvenile felony count, juvenile misdemeanor count, prior count, charge degree, and charge description as the public variable. The dataset is partitioned into 6,000 training and 1,215 test samples. We train our models based on the data-driven GAP presented in Section 3 using TensorFlow (Abadi et al., 2016).

5.1 Generative Decorrelator and Adversary Model

For the GENKI dataset, we consider two different decorrelator architectures: the feedforward neural network decorrelator (FNND) and the transposed convolution neural network decorrelator (TCNND). The FNND architecture uses a feedforward multi-layer neural network to combine the low-dimensional random noise (100×1) and the original image (Figure 5). The TCNND takes low-dimensional random noise and generates high-dimensional noise using a multi-layer transposed convolution neural network; the generated high-dimensional noise is added to each pixel of the original image to produce the processed image (Figure 6). For the HAR dataset, we use the FNND architecture modeled by multi-layer feedforward neural networks.

5.2 GAP Architectures for Different Datasets

Figure 5: Feedforward neural network decorrelator

Figure 6: Transposed convolution neural network decorrelator


Figure 7: Convolutional neural network adversary

The FNND is modeled by a four-layer feedforward neural network. We first reshape each image to a vector (256×1), and then concatenate it with a 100×1 Gaussian random noise vector, each entry of which is sampled independently from a standard Gaussian distribution. We feed the entire vector to a four-layer fully connected (FC) neural network; each layer has 256 neurons with a leaky ReLU activation function. Finally, we reshape the output of the last layer to a 16×16 image. To model the TCNND, we first generate a 100×1 Gaussian random vector and use a linear projection to map the noise vector to a 4×4×256 feature tensor. The feature tensor is then fed to an initial transposed convolution layer (DeCONV) with 128 filters (filter size 3×3, stride 2) and a ReLU activation, followed by another DeCONV layer with 1 filter (filter size 3×3, stride 2) and a tanh activation. The output of this DeCONV layer is added to the original image to generate the processed data. For both decorrelators, we add batch normalization (Ioffe and Szegedy, 2015) to each hidden layer to prevent covariate shift and help gradients flow. We model the adversary using convolutional neural networks (CNNs); this architecture outperforms most other models for image classification (Krizhevsky et al., 2012; Szegedy et al., 2015).
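As one concrete reading of the TCNND description, the following Keras sketch reproduces the stated layer shapes (4×4×256 → 8×8×128 → 16×16×1). The 'same' padding is our assumption to make the stated shapes line up, and the function name is ours.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_tcnnd():
    """Sketch of the TCNND: noise -> transposed convs -> additive mask on the image."""
    noise = layers.Input(shape=(100,))
    image = layers.Input(shape=(16, 16, 1))
    h = layers.Dense(4 * 4 * 256)(noise)              # linear projection to a feature tensor
    h = layers.Reshape((4, 4, 256))(h)
    h = layers.Conv2DTranspose(128, 3, strides=2, padding="same")(h)  # 8x8x128 DeCONV
    h = layers.BatchNormalization()(h)
    h = layers.ReLU()(h)
    mask = layers.Conv2DTranspose(1, 3, strides=2, padding="same",
                                  activation="tanh")(h)               # 16x16x1 noise mask
    return tf.keras.Model([noise, image], layers.Add()([image, mask]))
```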

Figure 7 illustrates the architecture of the adversary. The processed images are fed to two convolution layers (CONV) of sizes 3×3×32 and 3×3×64, respectively. Each convolution layer is followed by a ReLU activation and batch normalization. The output of each convolution layer is fed to a 2×2 maxpool layer (POOL) to extract features for classification. The second maxpool layer is followed by two fully connected layers, each containing 1024 neurons with batch normalization and a ReLU activation. Finally, the output of the last fully connected layer is mapped to the output layer, which contains two neurons capturing the belief of the subject being male or female.
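A Keras sketch of this adversary follows; the softmax output and the input shape convention are our assumptions.

```python
from tensorflow.keras import layers, models

def build_adversary():
    """Sketch of the CNN adversary in Figure 7."""
    return models.Sequential([
        layers.Input(shape=(16, 16, 1)),
        layers.Conv2D(32, 3, padding="same"), layers.ReLU(), layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same"), layers.ReLU(), layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(1024), layers.BatchNormalization(), layers.ReLU(),
        layers.Dense(1024), layers.BatchNormalization(), layers.ReLU(),
        layers.Dense(2, activation="softmax"),  # belief over {male, female}
    ])
```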

For the HAR dataset, we first concatenate the original data with a 100×1 Gaussian random noise vector. We then feed the entire 661×1 vector to a feedforward neural network with three hidden fully connected (FC) layers. Each hidden layer has 512 neurons with a leaky ReLU activation. Finally, we use another FC layer with 561 neurons to generate the processed data. For the adversary, we use a five-layer feedforward neural network whose hidden layers have 512, 512, 256, and 128 neurons with leaky ReLU activations, respectively. The output of the last hidden layer is mapped to the output layer, which contains 30 neurons capturing the belief of the subject's identity. For both the decorrelator and the adversary, we add batch normalization after the output of each hidden layer.

Since the COMPAS dataset consists of categorical data, we first perform one-hot encoding on the data and store the mapping function from the one-hot encoding to the categorical data. Then, we concatenate the encoded data with a 100×1 standard Gaussian


random noise vector. We feed the entire vector to a feedforward neural network with two hidden FC layers. Each hidden layer has 256 neurons with a leaky ReLU activation. We feed the output of the last hidden layer to another FC layer to generate a real-valued feature vector. For each feature vector, we select the highest value within the one-hot encoding index range corresponding to each attribute of the original data to generate the processed one-hot encoding. Finally, we use the stored mapping function to map the processed one-hot encoding back to categorical data. For the adversary, we use a two-layer feedforward neural network; each hidden layer has 256 neurons with leaky ReLU activations. We first one-hot encode the data and feed the encoded data to the adversary neural network. The output of the last hidden layer is mapped to the output layer, which captures the belief of the subject's race. We also add batch normalization after the output of each hidden layer.
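The per-attribute argmax step that maps the generator's real-valued output back to a valid one-hot encoding can be sketched as follows; the helper name and argument structure are ours.

```python
import numpy as np

def decode_onehot(features, group_slices, categories):
    """Per-attribute argmax over the generator's real-valued output vector.

    group_slices: list of (start, end) index ranges, one per categorical attribute.
    categories:   list of category arrays, one per attribute (the stored mapping).
    """
    decoded = []
    for (start, end), cats in zip(group_slices, categories):
        idx = np.argmax(features[start:end])  # highest-scoring category wins
        decoded.append(cats[idx])
    return decoded
```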

5.3 Illustration of Results

The GENKI Dataset. Figure 8a illustrates the gender classification accuracy of the adversary for different values of distortion. It can be seen that the adversary's accuracy in classifying the sensitive variable (gender) decreases progressively as the distortion increases. For the same distortion value, FNND achieves lower gender classification accuracy than TCNND. An intuitive explanation is that the FNND uses both the noise vector and the original image to generate the processed image, whereas the TCNND generates a noise mask that is independent of the original image pixels and adds it to the original image in the final step.

Differential censoring has emerged as the gold standard for data censoring. In this experiment, we vary the variance of the Laplace and Gaussian noise to connect the noise-adding mechanisms to local differential censoring (Duchi et al., 2013; Kairouz et al., 2016). In particular, we compute the differential censoring guarantees provided by independent Laplace and Gaussian noise-adding mechanisms for different distortion values; the details are provided in Appendix E. In Table 2, we observe that even if a huge amount of noise is added to each pixel of the image, the censoring risk (ε) remains high. Furthermore, adding such large noise deteriorates the expression classification accuracy.

Figure 8: Censoring/fairness-utility tradeoff and mutual information estimation for GENKI. (a) Gender vs. expression classification accuracy; (b) normalized mutual information estimation.

To demonstrate the effectiveness of the learned GAP schemes, we compare the gender classification accuracy of the learned GAP schemes with that of adding uniform or Laplace noise.


Distortion D                      | 1       | 2       | 3       | 4       | 5       | 100    | 1000
Laplace mechanism: ε              | 5792.61 | 4096    | 3344.36 | 2896.31 | 2590.53 | 579.26 | 183.17
Gaussian mechanism (δ = 10⁻⁶): ε  | 1918.24 | 1354.08 | 1107.57 | 959.18  | 857.76  | 191.82 | 60.66

Table 2: Differential censoring risk for different distortion values

Expression classification | Original data  | D = 1          | D = 3          | D = 5
                          | Male  | Female | Male  | Female | Male  | Female | Male  | Female
False positive rate       | 0.04  | 0.14   | 0.10  | 0.18   | 0.18  | 0.16   | 0.16  | 0.14
False negative rate       | 0.16  | 0.02   | 0.20  | 0.08   | 0.26  | 0.12   | 0.24  | 0.24

Table 3: Error rates for expression classification using the representation learned by FNND

Expression classification | Original data  | D = 1          | D = 3          | D = 5
                          | Male  | Female | Male  | Female | Male  | Female | Male  | Female
False positive rate       | 0.04  | 0.14   | 0.04  | 0.16   | 0.06  | 0.12   | 0.08  | 0.16
False negative rate       | 0.16  | 0.02   | 0.20  | 0.08   | 0.20  | 0.14   | 0.18  | 0.16

Table 4: Error rates for expression classification using the representation learned by TCNND

Figure 8a shows that, for the same distortion, the learned GAP schemes achieve much lower gender classification accuracies than adding uniform or Laplace noise. Furthermore, the estimated mutual information I(X̂; S), normalized by I(X; S), also decreases as the distortion increases (Figure 8b).

To evaluate the influence of GAP on other non-sensitive variable (Y) classification tasks, we train another CNN (see Figure 7) to perform facial expression classification on datasets processed by the different decorrelation schemes. The trained model is then tested on the original test data. In Figure 8a, we observe that the expression classification accuracy decreases gradually as the distortion increases. Even for a large distortion value (5 per image), the expression classification accuracy only decreases by 10%. Furthermore, the estimated normalized mutual information I(X̂; Y)/I(X; Y) decreases much more slowly than I(X̂; S)/I(X; S) as the distortion increases (Figure 8b).

Tables 3 and 4 present the error rates of facial expression classifiers trained on data representations created by the two decorrelator architectures. We observe that, as the distortion increases, the differences in error rates across sensitive groups decrease. This implies the classifier's decisions are less biased w.r.t. the sensitive variable when trained on the processed data; when D = 5, the differences are already very small. Furthermore, we notice that the FNND architecture performs better at enforcing fairness but suffers from a higher error rate. Images processed by FNND are shown in Figure 9; the decorrelator mostly changes the eyes, nose, mouth, beard, and hair.

The HAR Dataset. Figure 10a illustrates the activity and identity classification accuracy for different values of distortion. The adversary's sensitive variable (identity) classification accuracy decreases progressively as the distortion increases. When the distortion is


Figure 9: Perturbed images with different per pixel distortion using FNND

small (D = 2), the adversary's classification accuracy is already around 27%. If we increase the distortion to 8, the classification accuracy further decreases to 3.8%. Figure 10a shows that even for a large distortion value (D = 8), the activity classification accuracy only decreases by 18% at most. Furthermore, Figure 10b shows that the estimated normalized mutual information also decreases as the distortion increases.

Figure 10: Censoring/fairness-utility tradeoff and mutual information estimation for HAR. (a) Identity vs. activity classification accuracy; (b) normalized mutual information estimation.

The COMPAS Dataset. Figure 11 illustrates the race and recidivism classification accuracy for different values of distortion. We observe that the adversary's sensitive variable (race) classification accuracy decreases progressively as the distortion increases, and the non-sensitive variable (recidivism) classification accuracy decreases as well. We also observe that the utility of the decorrelated dataset drops significantly even if only a small amount of distortion is added to the data. One possible reason is that the trained adversary's race classifier is more resilient to small perturbations than the recidivism classifier. On the one hand, it is difficult for the decorrelator to reduce the adversary's race classification accuracy


given a small amount of distortion; on the other hand, the classification accuracy on the recidivism label decreases significantly due to the distortion added to the data.

Figure 11: Race vs. recidivism classification accuracy

6. Conclusion

We have introduced a novel adversarial learning framework for creating private/fair representations of data with verifiable guarantees. GAP allows the data holder to learn the decorrelation scheme directly from the dataset (to be published) without requiring access to dataset statistics. Under GAP, finding the optimal decorrelation scheme is formulated as a game between two players: a generative decorrelator and an adversary. We have shown that for appropriately chosen loss functions, GAP can provide guarantees against strong information-theoretic adversaries, such as MAP and MI adversaries. It can also enforce fairness, quantified via demographic parity, by using the log-loss function. We have also validated the performance of GAP on Gaussian mixture models and real datasets. There are several fundamental questions that we seek to address. An immediate one is to develop techniques to rigorously benchmark data-driven results for large datasets against computable theoretical guarantees. More broadly, it will be interesting to investigate the robustness and convergence speed of decorrelation schemes learned in a data-driven fashion. In this paper, we connect our objective function in GAP with demographic parity. Since there is no single metric for fairness, this leaves room for designing objective functions that link to other fairness metrics such as equalized odds and equal opportunity.

Acknowledgments

We would like to acknowledge support for this project from the National Science Foundation via grants CIF-1815361, CCF-1350914, and CIF-1422358. We would also like to thank Prof. Ram Rajagopal and Xiao Chen at Stanford for their help with estimating mutual information for the GENKI and HAR datasets.


Appendix A. Proof of Theorem 4

An encoder g(X) that satisfies demographic parity ensures that Xr is independent of S, i.e., the mutual information I(S; Xr) = 0. Further, the downstream learning algorithm acts only on Xr to predict Ŷ; combining these, we have that (S, X) − Xr − Ŷ forms a Markov chain. From Definition 3, since Xr is independent of S, I(S; Xr) = 0. From the data processing inequality and the non-negativity of mutual information, we have

0 ≤ I(S; Ŷ) ≤ I(S; Xr) = 0. (11)

Therefore, S is independent of Ŷ, and thus, from Definition 1, Ŷ satisfies demographic parity w.r.t. S. Note that S can be a collection of sensitive features. From the fact that I(S; Xr) = 0, as well as the chain rule and non-negativity of mutual information, we have I(Xr; St) = 0 for any subset of features St ⊂ S. Therefore, Ŷ satisfies demographic parity w.r.t. any subset of features in S.

Appendix B. Theoretical Results of GAP

Our GAP framework places no restrictions on the adversary. Indeed, different loss functions and decision rules lead to different adversarial models. In what follows, we discuss a variety of loss functions under hard and soft decision rules, and show how our GAP framework can recover several popular information-theoretic censoring notions. We also show that we can obtain a continuous interpolation between a hard-decision adversary under the 0-1 loss function and a soft-decision adversary under the log-loss function using the α-loss function.

Hard Decision Rules. When the adversary adopts a hard decision rule, $h(g(X))$ is an estimate of $S$. Under this setting, we can choose $\ell(h(g(X)), S)$ in a variety of ways. For instance, if $S$ is continuous, the adversary can attempt to minimize the difference between the estimated and true sensitive variable values. This can be achieved by considering a squared loss function

$$\ell(h(g(X)), S) = (h(g(X)) - S)^2, \quad (12)$$

which is known as the $\ell_2$ loss. In this case, one can verify that the adversary's optimal decision rule is $h^* = \mathbb{E}[S|g(X)]$, which is the conditional mean of $S$ given $g(X)$. Furthermore, under the adversary's optimal decision rule, the minimax problem in (6) simplifies to

$$\min_{g(\cdot)} -\mathrm{mmse}(S|g(X)) = -\max_{g(\cdot)} \mathrm{mmse}(S|g(X)),$$

subject to the distortion constraint. Here $\mathrm{mmse}(S|g(X))$ is the resulting minimum mean square error (MMSE) under $h^* = \mathbb{E}[S|g(X)]$. Thus, under the $\ell_2$ loss, GAP provides censoring guarantees against an MMSE adversary. On the other hand, when $S$ is discrete (e.g., age, gender, political affiliation, etc.), the adversary can attempt to maximize its classification accuracy. This is achieved by considering a 0-1 loss function (Nguyen and Sanner, 2013) given by

$$\ell(h(g(X)), S) = \begin{cases} 0 & \text{if } h(g(X)) = S \\ 1 & \text{otherwise.} \end{cases} \quad (13)$$



In this case, one can verify that the adversary's optimal decision rule is the maximum a posteriori probability (MAP) decision rule: $h^* = \operatorname{argmax}_{s\in\mathcal{S}} P(s|g(X))$, with ties broken uniformly at random. Moreover, under the MAP decision rule, the minimax problem in (6) reduces to

$$\min_{g(\cdot)} -\Big(1 - \max_{s\in\mathcal{S}} P(s, g(X))\Big) = \min_{g(\cdot)} \max_{s\in\mathcal{S}} P(s, g(X)) - 1, \quad (14)$$

subject to the distortion constraint. Thus, under a 0-1 loss function, the GAP formulation provides censoring guarantees against a MAP adversary.
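To make the two hard-decision adversaries above concrete, the following sketch (our illustration, not part of the original formulation; the toy joint distribution is hypothetical) computes the optimal decision rules and their induced values on a small discrete example: the conditional mean $\mathbb{E}[S|g(X)]$ with its MMSE under the $\ell_2$ loss, and the MAP rule with its accuracy under the 0-1 loss.

```python
import numpy as np

# Hypothetical toy joint distribution P(s, z) over a binary sensitive attribute
# S in {0, 1} (rows) and a released representation Z = g(X) in {0, 1, 2} (columns).
P = np.array([[0.30, 0.10, 0.05],
              [0.05, 0.20, 0.30]])

P_z = P.sum(axis=0)      # marginal P(z)
post = P / P_z           # posterior P(s | z); each column sums to 1

# Optimal adversary under squared loss: the conditional mean h*(z) = E[S | z].
h_mmse = np.array([0.0, 1.0]) @ post
mmse = np.sum(P_z * post[0] * post[1])   # E_z[Var(S|z)] for binary S

# Optimal adversary under 0-1 loss: the MAP rule h*(z) = argmax_s P(s | z).
h_map = post.argmax(axis=0)
map_accuracy = np.sum(P.max(axis=0))     # E_z[max_s P(s|z)] = sum_z max_s P(s, z)

print("E[S|z]:", h_mmse, " MMSE:", mmse)
print("MAP decisions:", h_map, " MAP accuracy:", map_accuracy)
```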

Soft Decision Rules. Instead of a hard decision rule, we can also consider a broader class of soft decision rules where $h(g(X))$ is a distribution over $\mathcal{S}$; i.e., $h(g(X)) = P_h(s|g(X))$ for $s \in \mathcal{S}$. In this context, we can analyze the performance under a log-loss

$$\ell(h(g(X)), s) = \log \frac{1}{P_h(s|g(X))}. \quad (15)$$

In this case, the objective of the adversary simplifies to

$$\max_{h(\cdot)} -\mathbb{E}\left[\log \frac{1}{P_h(s|g(X))}\right] = -H(S|g(X)),$$

and the maximization is attained at $P_h^*(s|g(X)) = P(s|g(X))$. Therefore, the optimal adversarial decision rule is determined by the true conditional distribution $P(s|g(X))$, which we assume is known to the data holder in the game-theoretic setting. Thus, under the log-loss function, the minimax optimization problem in (6) reduces to

$$\min_{g(\cdot)} -H(S|g(X)) = \min_{g(\cdot)} I(g(X); S) - H(S),$$

subject to the distortion constraint. Thus, under the log-loss in (15), GAP is equivalent to using MI as the censoring metric (Calmon and Fawaz, 2012).

The 0-1 loss captures a strong guessing adversary; in contrast, the log-loss (or information loss) models a belief-refining adversary.
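The equivalence between the soft-decision adversary's log-loss and the conditional entropy can be checked numerically. A minimal sketch (ours; it reuses the hypothetical toy joint from the previous example) verifies that the expected log-loss is minimized by the true posterior, with optimal value $H(S|g(X))$.

```python
import numpy as np

P = np.array([[0.30, 0.10, 0.05],
              [0.05, 0.20, 0.30]])    # same hypothetical joint P(s, z) as above
post = P / P.sum(axis=0)              # true posterior P(s | z)

def expected_log_loss(Q):
    """E[log 1/Q(s|z)] under the joint P, for a soft decision rule Q(s|z)."""
    return -(P * np.log(Q)).sum()

H_S_given_Z = expected_log_loss(post)   # equals H(S | g(X)) at the optimum

# Any other belief does worse: mix the posterior with a uniform distribution.
Q_perturbed = 0.9 * post + 0.1 * 0.5
print(H_S_given_Z, "<=", expected_log_loss(Q_perturbed))
```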

B.1 Proof of Theorem 6

Consider the $\alpha$-loss function (?)
$$\ell(h(g(X)), s) = \frac{\alpha}{\alpha - 1}\left(1 - P_h(s|g(X))^{1 - \frac{1}{\alpha}}\right), \quad (16)$$

for any $\alpha > 1$. Denoting $H_\alpha^a(S|g(X))$ as the Arimoto conditional entropy of order $\alpha$, one can verify that

$$\max_{h(\cdot)} -\mathbb{E}\left[\frac{\alpha}{\alpha - 1}\left(1 - P_h(s|g(X))^{1 - \frac{1}{\alpha}}\right)\right] = -H_\alpha^a(S|g(X)),$$

which is achieved by an '$\alpha$-tilted' conditional distribution

$$P_h^*(s|g(X)) = \frac{P(s|g(X))^\alpha}{\sum_{s\in\mathcal{S}} P(s|g(X))^\alpha}.$$



Under this choice of a decision rule, the objective of the minimax optimization in (6) reduces to

$$\min_{g(\cdot)} -H_\alpha^a(S|g(X)) = \min_{g(\cdot)} I_\alpha^a(g(X); S) - H_\alpha(S), \quad (17)$$

where $I_\alpha^a$ is the Arimoto mutual information and $H_\alpha$ is the Rényi entropy.

B.2 Proof of Corollary 7

For large $\alpha$ ($\alpha \to \infty$), this loss approaches that of the 0-1 (MAP) adversary in the limit. As $\alpha$ decreases, the convexity of the loss function encourages the estimator $\hat{S}$ to be probabilistic, as it increasingly rewards correct inferences of less and less likely outcomes (in contrast to a hard decision by a MAP adversary on the most likely outcome) conditioned on the revealed data. As $\alpha \to 1$, (16) yields the logarithmic loss, and the optimal belief $P_{\hat{S}}$ is simply the posterior belief. Therefore, using the $\alpha$-loss, we obtain a continuous interpolation between a hard-decision adversary under the 0-1 loss ($\alpha \to \infty$) and a soft-decision adversary under the log-loss ($\alpha \to 1$).
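A small sketch (our illustration) of the $\alpha$-loss in (16) and its $\alpha$-tilted optimal belief makes the interpolation explicit; the posterior values below are hypothetical.

```python
import numpy as np

def alpha_loss(p_true, alpha):
    """alpha-loss (16) of a belief assigning probability p_true to the true label."""
    if np.isclose(alpha, 1.0):
        return -np.log(p_true)                       # log-loss limit as alpha -> 1
    return alpha / (alpha - 1.0) * (1.0 - p_true ** (1.0 - 1.0 / alpha))

def tilted_posterior(post, alpha):
    """Optimal soft decision under alpha-loss: P(s|z)^alpha, renormalized."""
    t = post ** alpha
    return t / t.sum(axis=0, keepdims=True)

# Hypothetical posterior P(s | z) for a binary S and two values of z.
post = np.array([[0.7, 0.4],
                 [0.3, 0.6]])

for alpha in [1.001, 2.0, 10.0, 100.0]:
    print(alpha, tilted_posterior(post, alpha)[:, 0], alpha_loss(0.7, alpha))
# As alpha -> infinity the tilted belief concentrates on the MAP outcome
# (hard decision); as alpha -> 1 it recovers the posterior itself (log-loss).
```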

B.3 Proof of Proposition 9

Let us consider an arbitrary target variable $Y$ which a user is interested in learning from the data. The objective of the learning task is to train a good model that takes $\hat{X}$ to predict $Y$. Thus, we have the Markov chain $S \to X \to \hat{X} \to \hat{Y}$, where $\hat{Y}$ is an estimate of $Y$ from the trained machine learning model. By the data processing inequality, we have $I(S;\hat{X}) \geq I(S;\hat{Y})$. As shown in the analysis above, for the log-loss function, the objective of GAP is equivalent to minimizing $I(S;\hat{X})$, which is an upper bound on $I(S;\hat{Y})$. Notice that demographic parity requires $S$ and $\hat{Y}$ to be independent, which is equivalent to $I(S;\hat{Y}) = 0$. Since mutual information is non-negative, GAP ensures fairness by minimizing an upper bound on $I(S;\hat{Y})$ subject to the distortion constraint under the log-loss function. As the distortion increases, the ensuing fairness guarantee approaches ideal demographic parity by enforcing $I(S;\hat{Y}) \leq I(S;\hat{X}) = 0$.

Appendix C. Alternate Minimax Algorithm

In this section, we present the alternating minimax algorithm used to learn the GAP scheme from a dataset; it is given in Algorithm 1. To incorporate the distortion constraint into the learning algorithm, we use the penalty method (Lillo et al., 1993) and the augmented Lagrangian method (Eckstein and Yao, 2012) to replace the constrained optimization problem by a series of unconstrained problems whose solutions asymptotically converge to the solution of the constrained problem. Under the penalty method, the unconstrained optimization problem is formed by adding a penalty to the objective function. The added penalty consists of a penalty parameter $\rho_t$ multiplied by a measure of violation of the constraint; this measure is non-zero when the constraint is violated and zero otherwise. Therefore, in Algorithm 1, the constrained optimization problem of the decorrelator can be approximated by a series of unconstrained optimization problems with the loss function given in (18) below.



Algorithm 1 Alternating minimax censoring preserving algorithm

Input: dataset $\mathcal{D}$, distortion parameter $D$, iteration number $T$
Output: optimal generative decorrelator parameter $\theta_p$

procedure AlternateMinimax($\mathcal{D}$, $D$, $T$)
  Initialize $\theta_p^1$ and $\theta_a^1$
  for $t = 1, \dots, T$ do
    Draw a random minibatch of $M$ datapoints $\{x^{(1)}, \dots, x^{(M)}\}$ from the full dataset
    Generate $\{\hat{x}^{(1)}, \dots, \hat{x}^{(M)}\}$ via $\hat{x}^{(i)} = g(x^{(i)}, s^{(i)}; \theta_p^t)$
    Update the adversary parameter $\theta_a^{t+1}$ by stochastic gradient ascent for $j$ epochs:
      $\theta_a^{t+1} = \theta_a^t + \alpha_t \nabla_{\theta_a^t} \frac{1}{M}\sum_{i=1}^M -\ell(h(\hat{x}^{(i)}; \theta_a^t), s^{(i)}), \quad \alpha_t > 0$
    Compute the descent direction $\nabla_{\theta_p^t} \ell(\theta_p^t, \theta_a^{t+1})$, where
      $\ell(\theta_p^t, \theta_a^{t+1}) = -\frac{1}{M}\sum_{i=1}^M \ell\big(h(g(x^{(i)}, s^{(i)}; \theta_p^t); \theta_a^{t+1}), s^{(i)}\big)$
      subject to $\frac{1}{M}\sum_{i=1}^M d(g(x^{(i)}, s^{(i)}; \theta_p^t), x^{(i)}) \leq D$
    Perform line search along $\nabla_{\theta_p^t} \ell(\theta_p^t, \theta_a^{t+1})$ and update
      $\theta_p^{t+1} = \theta_p^t - \alpha_t \nabla_{\theta_p^t} \ell(\theta_p^t, \theta_a^{t+1})$
    Exit if solution converged
  return $\theta_p^{t+1}$

$$\ell(\theta_p^t, \theta_a^{t+1}) = -\frac{1}{M}\sum_{i=1}^M \ell\big(h(g(x^{(i)}; \theta_p^t); \theta_a^{t+1}), s^{(i)}\big) + \rho_t\left(\max\left\{0,\; \frac{1}{M}\sum_{i=1}^M d(g(x^{(i)}; \theta_p^t), x^{(i)}) - D\right\}\right)^2, \quad (18)$$



where $\rho_t$ is a penalty coefficient that increases with the number of iterations $t$. For convex optimization problems, the solution to the series of unconstrained problems eventually converges to the solution of the original constrained problem (Lillo et al., 1993).
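A minimal sketch of the penalty-method objective (18), assuming the per-sample adversary losses and distortions have already been computed on a minibatch (the function and variable names are ours):

```python
import numpy as np

def penalty_loss(adv_losses, distortions, rho_t, D):
    """Decorrelator objective (18): negated adversary loss plus a quadratic
    penalty that activates only when the minibatch distortion exceeds D.

    adv_losses  -- per-sample adversary losses l(h(g(x)), s)
    distortions -- per-sample distortions d(g(x), x)
    rho_t       -- penalty coefficient, increased over iterations t
    D           -- distortion budget
    """
    violation = max(0.0, float(np.mean(distortions)) - D)
    return -float(np.mean(adv_losses)) + rho_t * violation ** 2

# Hypothetical minibatch statistics: mean distortion 1.1 > D = 1.0, so the
# penalty term contributes rho_t * 0.1^2.
print(penalty_loss(np.array([0.6, 0.9]), np.array([0.8, 1.4]), rho_t=10.0, D=1.0))
```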

The augmented Lagrangian method is another approach to enforce equality constraints by penalizing the objective function whenever the constraints are not satisfied. Different from the penalty method, the augmented Lagrangian method combines the use of a Lagrange multiplier with a quadratic penalty term. Since this method is designed for equality constraints, we introduce a slack variable $\delta$ to convert the inequality distortion constraint into an equality constraint. Using the augmented Lagrangian method, the constrained optimization problem of the decorrelator can be replaced by a series of unconstrained problems with the loss function given by

$$\ell(\theta_p^t, \theta_a^{t+1}, \delta) = -\frac{1}{M}\sum_{i=1}^M \ell\big(h(g(x^{(i)}; \theta_p^t); \theta_a^{t+1}), s^{(i)}\big) + \frac{\rho_t}{2}\left(\frac{1}{M}\sum_{i=1}^M d(g(x^{(i)}; \theta_p^t), x^{(i)}) + \delta - D\right)^2 - \lambda_t\left(\frac{1}{M}\sum_{i=1}^M d(g(x^{(i)}; \theta_p^t), x^{(i)}) + \delta - D\right), \quad (19)$$

where $\rho_t$ is a penalty coefficient that increases with the number of iterations $t$, and $\lambda_t$ is updated according to the rule $\lambda_{t+1} = \lambda_t - \rho_t\big(\frac{1}{M}\sum_{i=1}^M d(g(x^{(i)}; \theta_p^t), x^{(i)}) + \delta - D\big)$. For convex optimization problems, the solution to the series of unconstrained problems formulated by the augmented Lagrangian method also converges to the solution of the original constrained problem (Eckstein and Yao, 2012).
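Analogously, a sketch of the augmented Lagrangian objective (19) and its multiplier update (the function names are ours, under the same minibatch assumptions as the penalty sketch above):

```python
import numpy as np

def augmented_lagrangian_loss(adv_losses, distortions, delta, rho_t, lam_t, D):
    """Decorrelator objective (19): the slack variable delta turns the
    inequality distortion constraint into the equality mean(d) + delta - D = 0."""
    c = float(np.mean(distortions)) + delta - D
    return -float(np.mean(adv_losses)) + 0.5 * rho_t * c ** 2 - lam_t * c

def update_multiplier(lam_t, rho_t, distortions, delta, D):
    """Multiplier update: lambda_{t+1} = lambda_t - rho_t (mean(d) + delta - D)."""
    return lam_t - rho_t * (float(np.mean(distortions)) + delta - D)
```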

C.1 Proof of Theorem 10

Let us consider $\hat{X} = X + Z + \beta$, where $\beta \in \mathbb{R}^m$ and $\Sigma_p$ is a diagonal covariance matrix whose diagonal entries are $\{\sigma_{p_1}^2, \dots, \sigma_{p_m}^2\}$. Since $\mathbb{E}_{X,\hat{X}}[d(X,\hat{X})] = \mathbb{E}_{X,\hat{X}}\|X - \hat{X}\|^2 = \mathbb{E}\|Z + \beta\|^2 = \|\beta\|^2 + \mathrm{tr}(\Sigma_p)$, the distortion constraint implies that $\|\beta\|^2 + \mathrm{tr}(\Sigma_p) \leq D$. Given the MAP adversary's optimal inference accuracy in (8), the objective of the decorrelator is
$$\min_{\beta, \Sigma_p} P_d^{(G)} \quad (20)$$
$$\text{s.t. } \|\beta\|^2 + \mathrm{tr}(\Sigma_p) \leq D.$$



Define $\frac{1-q}{q} = \eta$. The gradient of $P_d^{(G)}$ w.r.t. $\alpha$ is given by
$$\frac{\partial P_d^{(G)}}{\partial \alpha} = p\left(-\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(-\frac{\alpha}{2} + \frac{1}{\alpha}\ln\eta\right)^2}\right)\left(-\frac{1}{2} - \frac{1}{\alpha^2}\ln\eta\right) + (1-p)\left(-\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(-\frac{\alpha}{2} - \frac{1}{\alpha}\ln\eta\right)^2}\right)\left(-\frac{1}{2} + \frac{1}{\alpha^2}\ln\eta\right) \quad (21)$$
$$= \frac{1}{2\sqrt{2\pi}}\left(p\, e^{-\frac{1}{2}\left(-\frac{\alpha}{2} + \frac{1}{\alpha}\ln\eta\right)^2} + (1-p)\, e^{-\frac{1}{2}\left(-\frac{\alpha}{2} - \frac{1}{\alpha}\ln\eta\right)^2}\right) + \frac{\ln\eta}{\alpha^2\sqrt{2\pi}}\left(p\, e^{-\frac{1}{2}\left(-\frac{\alpha}{2} + \frac{1}{\alpha}\ln\eta\right)^2} - (1-p)\, e^{-\frac{1}{2}\left(-\frac{\alpha}{2} - \frac{1}{\alpha}\ln\eta\right)^2}\right). \quad (22)$$
Note that
$$\frac{p\, e^{-\frac{1}{2}\left(-\frac{\alpha}{2} + \frac{1}{\alpha}\ln\eta\right)^2}}{(1-p)\, e^{-\frac{1}{2}\left(-\frac{\alpha}{2} - \frac{1}{\alpha}\ln\eta\right)^2}} = \frac{p}{1-p}\, e^{\frac{1}{2}\left[\left(-\frac{\alpha}{2} - \frac{1}{\alpha}\ln\eta\right)^2 - \left(-\frac{\alpha}{2} + \frac{1}{\alpha}\ln\eta\right)^2\right]} = \frac{p}{1-p}\, e^{\frac{2\ln\eta}{2}} = \frac{p}{1-p}\, e^{\ln\eta} = 1. \quad (23)$$

Therefore, the second term in (22) is 0. Furthermore, the first term in (22) is always positive. Thus, $P_d^{(G)}$ is monotonically increasing in $\alpha$. As a result, the optimization problem in (20) is equivalent to

$$\min_{\beta, \Sigma_p} (2\mu)^T (\Sigma + \Sigma_p)^{-1} 2\mu \quad (24)$$
$$\text{s.t. } \|\beta\|^2 + \mathrm{tr}(\Sigma_p) \leq D.$$

The objective function in (24) can be written as
$$2\begin{bmatrix} \mu_1 & \mu_2 & \cdots & \mu_m \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1^2 + \sigma_{p_1}^2} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2^2 + \sigma_{p_2}^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_m^2 + \sigma_{p_m}^2} \end{bmatrix} 2\begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_m \end{bmatrix} = \sum_{i=1}^m \frac{4\mu_i^2}{\sigma_i^2 + \sigma_{p_i}^2}.$$

Thus, the optimization problem in (24) is equivalent to
$$\min_{\beta, \sigma_{p_1}^2, \dots, \sigma_{p_m}^2} \sum_{i=1}^m \frac{\mu_i^2}{\sigma_i^2 + \sigma_{p_i}^2} \quad (25)$$
$$\text{s.t. } \|\beta\|^2 + \mathrm{tr}(\Sigma_p) \leq D, \quad \sigma_{p_i}^2 \geq 0 \;\; \forall i \in \{1, 2, \dots, m\}.$$

Since a non-zero $\beta$ does not affect the objective function but results in positive distortion, the optimal scheme satisfies $\beta = (0, \dots, 0)$. Furthermore, the Lagrangian of the above optimization problem is given by

$$\mathcal{L}(\sigma_{p_1}^2, \dots, \sigma_{p_m}^2, \lambda) = \sum_{i=1}^m \frac{\mu_i^2}{\sigma_i^2 + \sigma_{p_i}^2} + \lambda_0\left(\sum_{i=1}^m \sigma_{p_i}^2 - D\right) - \sum_{i=1}^m \lambda_i \sigma_{p_i}^2, \quad (26)$$



where $\lambda = \{\lambda_0, \dots, \lambda_m\}$ denotes the Lagrange multipliers associated with the constraints. Taking the derivative of $\mathcal{L}(\sigma_{p_1}^2, \dots, \sigma_{p_m}^2, \lambda)$ with respect to $\sigma_{p_i}^2$, $\forall i \in \{1, \dots, m\}$, we have
$$\frac{\partial \mathcal{L}(\sigma_{p_1}^2, \dots, \sigma_{p_m}^2, \lambda)}{\partial \sigma_{p_i}^2} = -\frac{\mu_i^2}{(\sigma_i^2 + \sigma_{p_i}^2)^2} + \lambda_0 - \lambda_i. \quad (27)$$

Notice that the objective function in (24) is decreasing in $\sigma_{p_i}^2$, $\forall i \in \{1, \dots, m\}$. Thus, the optimal solution $\sigma_{p_i}^{*2}$ satisfies $\sum_{i=1}^m \sigma_{p_i}^{*2} = D$. By the KKT conditions, we have
$$\frac{\partial \mathcal{L}(\sigma_{p_1}^2, \dots, \sigma_{p_m}^2, \lambda)}{\partial \sigma_{p_i}^2}\bigg|_{\sigma_{p_i}^2 = \sigma_{p_i}^{*2},\; \lambda = \lambda^*} = -\frac{\mu_i^2}{(\sigma_i^2 + \sigma_{p_i}^{*2})^2} + \lambda_0^* - \lambda_i^* = 0. \quad (28)$$

Since $\lambda_i^*$, $i \in \{0, 1, \dots, m\}$, are dual feasible, we have $\lambda_i^* \geq 0$, $i \in \{0, 1, \dots, m\}$. Therefore
$$\lambda_0^* \geq \frac{\mu_i^2}{(\sigma_i^2 + \sigma_{p_i}^{*2})^2}.$$
If $\lambda_0^* > \frac{\mu_i^2}{\sigma_i^4}$, we have $\lambda_0^* > \frac{\mu_i^2}{(\sigma_i^2 + \sigma_{p_i}^{*2})^2}$, which implies $\lambda_i^* > 0$. Thus, by complementary slackness, $\sigma_{p_i}^{*2} = 0$. On the other hand, if $\lambda_0^* < \frac{\mu_i^2}{\sigma_i^4}$, we have $\sigma_{p_i}^{*2} > 0$. Furthermore, by the complementary slackness condition, $\lambda_i^* \sigma_{p_i}^{*2} = 0$ for all $i$, which implies $\lambda_i^* = 0$ for all $\sigma_{p_i}^{*2} > 0$. As a result, for all $\sigma_{p_i}^{*2} > 0$, we have
$$\frac{|\mu_i|}{\sqrt{\lambda_0^*}} = \sigma_i^2 + \sigma_{p_i}^{*2}. \quad (29)$$

Therefore, $\sigma_{p_i}^{*2} = \max\left\{\frac{|\mu_i|}{\sqrt{\lambda_0^*}} - \sigma_i^2,\; 0\right\} = \left(\frac{|\mu_i|}{\sqrt{\lambda_0^*}} - \sigma_i^2\right)^+$ with $\sum_{i=1}^m \sigma_{p_i}^{*2} = D$. Substituting this optimal solution into (8) with $\alpha = \sqrt{(2\mu)^T (\Sigma + \Sigma_p)^{-1} 2\mu}$, we obtain the accuracy of the MAP adversary.

We observe that when $\sigma_i^2$ is greater than the threshold $\frac{|\mu_i|}{\sqrt{\lambda_0^*}}$, no noise is added to the data in this dimension due to its high variance. When $\sigma_i^2$ is smaller than $\frac{|\mu_i|}{\sqrt{\lambda_0^*}}$, the amount of noise added to this dimension increases with $|\mu_i|$; this is intuitive since a large $|\mu_i|$ indicates that the two conditionally Gaussian distributions are farther apart in this dimension, and thus more distinguishable. Consequently, more noise needs to be added in order to reduce the MAP adversary's inference accuracy.
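The KKT solution above has a water-filling flavor: one can solve for $\lambda_0^*$ numerically by bisection on the budget constraint. A sketch of this computation (ours; it assumes all $\mu_i \neq 0$, and the inputs are hypothetical):

```python
import numpy as np

def optimal_noise_variances(mu, sigma2, D, iters=100):
    """Water-filling allocation sigma*_pi^2 = (|mu_i|/sqrt(lam0*) - sigma_i^2)+,
    with lam0* found by bisection so that the variances sum to the budget D."""
    mu = np.abs(np.asarray(mu, dtype=float))
    sigma2 = np.asarray(sigma2, dtype=float)

    def total(t):  # total allocated variance when t = 1/sqrt(lam0)
        return np.maximum(mu * t - sigma2, 0.0).sum()

    lo, hi = 0.0, (D + sigma2.sum()) / mu.min() + 1.0  # total(hi) >= D by construction
    for _ in range(iters):                             # bisection on t
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if total(mid) < D else (lo, mid)
    return np.maximum(mu * hi - sigma2, 0.0)

# Hypothetical 3-dimensional example: the high-variance dimension gets no noise,
# and the allocation grows with |mu_i| elsewhere (output approx [1.6, 0.0, 0.4]).
print(optimal_noise_variances(mu=[2.0, 0.5, 1.0], sigma2=[0.2, 5.0, 0.5], D=2.0))
```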

Appendix D. Mutual Information Estimation

Our GAP framework offers a scalable way to find a (local) equilibrium of the constrained minimax optimization under particular attacks (e.g., attacks based on a neural network). Yet the privatized data produced by our approach should be immune to general attacks, ultimately achieving the goal of decreasing the correlation between the privatized data and



the sensitive labels. Therefore, we use the estimated mutual information to certify that the sensitive data is indeed protected by our framework.

We use the $k$-th nearest neighbor method (Kraskov et al., 2004) to estimate the entropy $\hat{H}$, given by
$$\hat{H}(X) = \psi(N) - \psi(k) + \log(c_d) + \frac{d}{N}\sum_{i=1}^N \log r_i \quad (30)$$
where $r_i$ is the distance of the $i$-th sample $x_i$ to its $k$-th nearest neighbor, $\psi$ is the digamma function, $c_d = \frac{\pi^{d/2}}{\Gamma(1 + d/2)}$ in the Euclidean norm, and $N$ is the number of samples. Notice that $\hat{X}$ is the learned representation and $S$ is the sensitive variable. We then calculate the mutual information via $\hat{I}(\hat{X}; S) = \hat{H}(\hat{X}) - \hat{H}(\hat{X}|S)$.

For a binary sensitive variable, we can simplify the empirical MI to
$$\hat{I}(\hat{X}; S) = \hat{H}(\hat{X}) - \big(P(S=1)\hat{H}(\hat{X}|S=1) + P(S=0)\hat{H}(\hat{X}|S=0)\big), \quad (31)$$

where $P(S=1)$ and $P(S=0)$ can be approximated by their empirical probabilities.

One noteworthy difficulty is that $\hat{X}$ usually lives in high dimensions (e.g., each image has 256 dimensions in the GENKI dataset), which makes it almost impossible to calculate the empirical entropy from the raw data due to the sample complexity. Thus, we train a neural network that classifies the sensitive variable from the learned data representation in order to reduce the dimension of the data. We choose the layer before the softmax outputs (denoted by $X_f$) as the feature embedding: it has a much lower dimension than the original $\hat{X}$ and also captures the information about the sensitive variable. We use $X_f$ as a surrogate of $\hat{X}$ for estimating the entropy. The resulting approximate MI is

$$\hat{I}(X_f; S) = \hat{H}(X_f) - \hat{H}(X_f|S) = \hat{H}(X_f) - \big(P(S=1)\hat{H}(X_f|S=1) + P(S=0)\hat{H}(X_f|S=0)\big).$$

In the same manner, the MI between the learned representation $\hat{X}$ and the label $Y$ is approximated by $\hat{I}(X_f; Y)$, where $X_f$ is the feature embedding that represents a privatized image $\hat{X}$.
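A minimal sketch of the estimator in (30)-(31) using SciPy's digamma function and a k-d tree (our implementation; the small constant added inside the logarithm to guard against duplicated samples is our choice):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(x, k=3):
    """k-th nearest neighbor entropy estimate, eq. (30), in nats."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # Distance of each sample to its k-th neighbor; the first hit is the point itself.
    r = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    log_cd = (d / 2) * np.log(np.pi) - gammaln(1 + d / 2)   # log of c_d
    return digamma(n) - digamma(k) + log_cd + d * np.mean(np.log(r + 1e-12))

def knn_mi_binary(x, s, k=3):
    """Empirical I(X;S) for a binary sensitive variable S, eq. (31)."""
    x, s = np.asarray(x, dtype=float), np.asarray(s)
    p1 = np.mean(s == 1)
    h_cond = p1 * knn_entropy(x[s == 1], k) + (1 - p1) * knn_entropy(x[s == 0], k)
    return knn_entropy(x, k) - h_cond
```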

For the GENKI dataset, we construct a CNN consisting of two conv blocks, followed by two fully connected (FC) layers, and ending with two neurons with softmax activations. Each conv block contains a convolution layer with $3 \times 3$ filters and stride 1, a $2 \times 2$ max-pooling layer with stride 2, and a ReLU activation function. The two conv blocks have 32 and 64 filters, respectively. We flatten the output of the second conv block, yielding a 256-dimensional vector. The extracted features from the second conv block are passed through the first FC layer, with batch normalization and ReLU activation, to obtain an 8-dimensional vector, followed by the second FC layer, which outputs a 2-dimensional vector to which the softmax function is applied. The aforementioned 8-dimensional vector is the feature embedding vector $X_f$ in our empirical MI estimation.
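A Keras sketch of this classifier is given below (ours, not the exact training code). The layer widths follow the description above; the input size and the 'valid' padding are our assumptions, chosen so that flattening the second conv block indeed yields the stated 256-dimensional vector.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_genki_classifier(input_shape=(16, 16, 1)):
    """Sketch of the GENKI sensitive-attribute classifier described above.
    With 16x16 inputs and 'valid' padding, flattening the second conv block
    yields a 256-dimensional vector (our assumption about the input size)."""
    x_in = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, strides=1, padding="valid", activation="relu")(x_in)
    x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    x = layers.Conv2D(64, 3, strides=1, padding="valid", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    x = layers.Flatten()(x)                      # 256-dimensional vector
    emb = layers.Dense(8)(x)                     # first FC layer
    emb = layers.BatchNormalization()(emb)
    emb = layers.ReLU(name="feature_embedding")(emb)   # X_f for MI estimation
    out = layers.Dense(2, activation="softmax")(emb)   # second FC layer
    return Model(x_in, [out, emb])
```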

Estimating mutual information for the HAR dataset poses a slightly different challenge, as the size of the alphabet of the sensitive label (i.e., identity) is 30. Thus, it requires at least 30 neurons prior to the output layer of the corresponding classification task. In fact, we use 128 neurons before the final softmax output layer in order to get reasonably good classification accuracy. Using the 128-dimensional vector as our feature embedding to calculate mutual information is almost impossible due to the curse of dimensionality. Therefore, we apply Principal Component Analysis (PCA), shown in Figure 12, and pick the first 12 components to circumvent this issue. The resulting 12-dimensional vector is treated as an approximate feature embedding that encapsulates the major information of the processed data.

Figure 12: PCA for processed data in HAR. (a) Top 32 principal components out of the 561 features with different distortion $D$.
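A short scikit-learn sketch of the PCA step described above (ours; the embedding array below is a random stand-in for the learned $X_f$ vectors):

```python
import numpy as np
from sklearn.decomposition import PCA

# Reduce the 128-dimensional HAR feature embedding to 12 principal components
# before k-NN entropy estimation (the embeddings here are hypothetical placeholders).
embeddings = np.random.randn(1000, 128)   # stand-in for the learned X_f vectors

pca = PCA(n_components=12)
reduced = pca.fit_transform(embeddings)   # (1000, 12) approximate embedding
print(pca.explained_variance_ratio_.sum())
```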

Appendix E. Computing the Differential Censoring Risk for the Laplace and Gaussian Noise-Adding Mechanisms

In the local differential censoring setting, we consider the value of each pixel as the output of the query function. Since the pixel values of the GENKI dataset are normalized between 0 and 1, we can use the dimension of the image to bound the sensitivities for both the Laplace mechanism and the Gaussian mechanism. For the Laplace mechanism, let $X$ and $X'$ be two different image vectors. Since the value of $X$ and $X'$ in each dimension is between 0 and 1, $|X_i - X_i'| \leq 1$ for the $i$-th dimension. Thus, the $L_1$ sensitivity can be bounded by the dimension of the image $d$. Notice that the variance of the Laplace noise parameterized by $\lambda$ is $2\lambda^2$. Since we are adding independent Laplace noise to each pixel, given the per-image distortion $D$, we have
$$\frac{D}{d} = 2\frac{d^2}{\varepsilon^2} \implies \varepsilon = d\sqrt{\frac{2d}{D}}.$$



Similarly, for the Gaussian mechanism, we can bound the $L_2$ sensitivity by $\sqrt{d}$. For a fixed $\delta$, the censoring risk $\varepsilon$ for adding Gaussian noise with variance $\sigma^2$ is given by
$$\varepsilon = \frac{2\sqrt{d}}{\sigma}\sqrt{\ln\left(\frac{1.25}{\delta}\right)}.$$
Since we are adding Gaussian noise independently to each pixel, the variance of the per-pixel noise is given by $\sigma^2 = D/d$. As a result, we have
$$\varepsilon = \frac{2d}{\sqrt{D}}\sqrt{\ln\left(\frac{1.25}{\delta}\right)}.$$
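Both conversions can be packaged in a few lines (our sketch; the example values of $d$, $D$, and $\delta$ are hypothetical):

```python
import numpy as np

def laplace_epsilon(d, D):
    """Per-image epsilon for the Laplace mechanism with L1 sensitivity d
    and per-image distortion budget D (formula derived above)."""
    return d * np.sqrt(2 * d / D)

def gaussian_epsilon(d, D, delta):
    """Per-image epsilon for the Gaussian mechanism with L2 sensitivity sqrt(d),
    per-pixel noise variance D/d, and a fixed delta (formula derived above)."""
    return 2 * d / np.sqrt(D) * np.sqrt(np.log(1.25 / delta))

# GENKI-style example: 16x16 images (d = 256) with distortion budget D = 5
# (the numbers here are hypothetical, for illustration only).
print(laplace_epsilon(256, 5.0), gaussian_epsilon(256, 5.0, 1e-5))
```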

References

Martín Abadi and David G. Andersen. Learning to protect communications with adversarial neural cryptography. arXiv preprint arXiv:1610.06918, 2016.

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge Luis Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. In ESANN, 2013.

Y. O. Basciftci, Y. Wang, and P. Ishwar. On privacy-utility tradeoffs for constrained data release mechanisms. In 2016 Information Theory and Applications Workshop (ITA), pages 1-6, Jan 2016. doi: 10.1109/ITA.2016.7888175.

F. P. Calmon and N. Fawaz. Privacy against statistical inference. In Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on, pages 1401-1408, 2012.

Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R. Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, pages 3992-4001, 2017.

John Duchi, Michael Jordan, and Martin Wainwright. Local privacy and statistical minimax rates. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 429-438. IEEE, 2013.

Cynthia Dwork. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation, pages 1-19. Springer, 2008.

Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211-407, August 2014. ISSN 1551-305X. doi: 10.1561/0400000042. URL http://dx.doi.org/10.1561/0400000042.



Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 486-503. Springer, 2006a.

Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265-284. Springer, 2006b.

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214-226. ACM, 2012.

Jonathan Eckstein and W. Yao. Augmented Lagrangian and alternating direction methods for convex optimization: A tutorial and some illustrative computational results. RUTCOR Research Reports, 32, 2012.

Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.

Benjamin Fish, Jeremy Kun, and Ádám D. Lelkes. A confidence-based approach for balancing fairness and accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 144-152. SIAM, 2016.

Robert G. Gallager. Stochastic Processes: Theory for Applications. Cambridge University Press, 2013.

AmirEmad Ghassami, Sajad Khodadadian, and Negar Kiyavash. Fairness in supervised learning: An information theoretic approach. arXiv preprint arXiv:1801.04378, 2018.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672-2680, 2014.

Jihun Hamm. Minimax filter: Learning to preserve privacy from inference attacks. arXiv preprint arXiv:1610.03577, 2016.

Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315-3323, 2016.

Chong Huang, Peter Kairouz, Xiao Chen, Lalitha Sankar, and Ram Rajagopal. Context-aware generative adversarial privacy. Entropy, 19(12):656, 2017.

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.

Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Extremal mechanisms for local differential privacy. Journal of Machine Learning Research, 17(1):492-542, January 2016. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=2946645.2946662.



Alexander Kraskov, Harald Stögbauer, and Peter Grassberger. Estimating mutual information. Physical Review E, 69(6):066138, 2004.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.

Walter E. Lillo, Mei Heng Loh, Stefen Hui, and Stanislaw H. Zak. On solving constrained optimization problems with neural networks: A penalty method approach. IEEE Transactions on Neural Networks, 4(6):931-940, 1993.

Changchang Liu, Supriyo Chakraborty, and Prateek Mittal. Deeprotect: Enabling inference-based access control on mobile sensing applications. arXiv preprint arXiv:1702.06159, 2017.

David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309, 2018.

Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

Tan Nguyen and Scott Sanner. Algorithms for direct 0-1 loss optimization in binary classification. In International Conference on Machine Learning, pages 1085-1093, 2013.

ProPublica. COMPAS recidivism risk score data and analysis, 2016. URL https://github.com/propublica/compas-analysis. Accessed 02/07/2019.

Nisarg Raval, Ashwin Machanavajjhala, and Landon P. Cox. Protecting visual secrets using adversarial nets. In CVPR Workshop Proceedings, 2017.

D. Rebollo-Monedero, J. Forne, and J. Domingo-Ferrer. From t-closeness-like privacy to postrandomization via information theory. IEEE Transactions on Knowledge and Data Engineering, 22(11):1623-1636, November 2010. ISSN 1041-4347. doi: 10.1109/TKDE.2009.190.

S. Salamatian, A. Zhang, F. P. Calmon, S. Bhamidipati, N. Fawaz, B. Kveton, P. Oliveira, and N. Taft. Managing your private and public data: Bringing down inference attacks against your privacy. IEEE Journal of Selected Topics in Signal Processing, 9(7):1240-1255, 2015. doi: 10.1109/JSTSP.2015.2442227. URL http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7118663.

L. Sankar, S. R. Rajagopalan, and H. V. Poor. Utility-privacy tradeoffs in databases: An information-theoretic approach. IEEE Transactions on Information Forensics and Security, 8(6):838-852, 2013. doi: 10.1109/TIFS.2013.2253320.

Prasanna Sattigeri, Samuel C. Hoffman, Vijil Chenthamarakshan, and Kush R. Varshney. Fairness GAN. arXiv preprint arXiv:1805.09910, 2018.

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, et al. Going deeper with convolutions. In CVPR, 2015.



Ardhendu Tripathy, Ye Wang, and Prakash Ishwar. Privacy-preserving adversarial networks. arXiv preprint arXiv:1712.07008, 2017.

Jacob Whitehill and Javier Movellan. Discriminately decreasing discriminability with learned image filters. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2488-2495. IEEE, 2012.

Depeng Xu, Shuhan Yuan, Lu Zhang, and Xintao Wu. FairGAN: Fairness-aware generative adversarial networks. arXiv preprint arXiv:1805.11202, 2018.

Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325-333, 2013.

Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. arXiv preprint arXiv:1801.07593, 2018.
