
Property Inference Attacks on Neural Networks using Dimension Reduction Representations

Tianhao Wang

Dec 2019

1 Introduction

Deep learning (DL) has gained widespread adoption across a large number of application fields. For example, DL models have shown great promise for sensitive tasks such as facial recognition [1, 2], speech recognition [3], medical diagnosis [4, 5], and many others. However, training deep neural networks (DNNs) is expensive in terms of both computational power and the amount of carefully labeled training data. Hence, the sharing of trained models is becoming increasingly important in many communities [6, 7]. In the future, digital distribution platforms for the purchase and sale of trained models may appear, by analogy with Google Play or the App Store.

Nevertheless, the logic behind a neural network often lacks transparency and interpretability. It is still unclear why certain hyper-parameter tuning tricks work while others do not. This limitation means DNNs are often treated as black boxes that can only produce an outcome. At the same time, it raises a privacy concern in model sharing: do we really know what we are sharing when we share a trained model?

Property inference is the task of inferring properties of a machine learning model regarding its training dataset, learning algorithm, or learning target, using only the parameters of the trained model as prior knowledge. The target properties could range from the brightness of an image training dataset to the optimization algorithm used for model training. This information could be useful for a malicious adversary. As a motivation, recent work [8, 9, 10, 11, 12, 13] has explored different watermarking techniques to protect pre-trained deep neural networks from potential copyright infringement. A common way of embedding digital watermarks into neural networks is to add a special regularization term to the loss function during the training process [10, 11, 12, 13]. An adversary may wish to infer the existence of a watermark using a property inference attack before attempting to remove the watermark. Another example is an ML-based virus detection model trained on the execution signatures of both malicious and benign software programs. An adversary may wish to use this model to learn properties of the sandbox testing environment in order to design viruses that can evade detection.


This type of inference attack was first formulated by Ateniese et al. [14] and then improved by Ganju et al. [15, 16]. The basic idea involves training another machine learning model, formally referred to as a meta-classifier, to infer whether the target model has a property P or not. The effectiveness of the proposed inference scheme has been demonstrated on Hidden Markov Models (HMMs) and Support Vector Machine (SVM) classifiers when using a flattened parameter vector as the feature representation of the models.

However, the proposed property inference scheme does not work well when applied to deep neural networks. Because of the over-parameterization of DNNs and the complexity of their architectures, it is much more challenging to train a meta-classifier for them than for simpler models such as HMMs. The large number of equivalent forms of a neural network increases the difficulty of training meta-classifiers when naive flattened parameter vectors are used as the models' feature representations. Therefore, we refine the inference scheme by investigating different data processing and feature engineering methods for feed-forward neural networks. Our goal is to reduce the difficulty of training meta-classifiers.

In this paper, we first review and extend previous work [15, 16, 12] on equivalent neural networks. We demonstrate that a deep neural network can be transformed into another (approximately) equivalent network through simple operations such as neuron permutation or constant multiplication. These equivalent networks significantly increase the difficulty for a meta-classifier to identify common features of DNNs with property P. We then present two categories of techniques to address the problem. The first, hand-engineered category uses the naive flattened parameter vector as the feature representation but first preprocesses or augments the meta-dataset. The second category aims to automatically extract features and reduce the dimension of the feature representations of target models by using feature extraction techniques such as principal component analysis (PCA) [17] or autoencoders [18].

We evaluate these strategies on both synthetic and real-world datasets. We show that both the hand-engineered and the automatic methods improve the performance of inferring various model properties.

Our contributions can be summarized as follows:

• We review the limitations of the original property inference attack scheme, which uses the flattened parameter vector as the model feature representation.

• We survey and summarize hand-engineered data processing methods to refine property inference attacks; we also introduce new feature representations of neural networks using automatic feature extraction techniques: PCA and Autoencoder.

• We evaluate these strategies for different datasets and target properties. We demonstrate that PCA and autoencoders are effective alternatives to the hand-engineered approaches.


2 Related Work

Recent research has shown that the deep learning-based intelligent systems around us suffer from a number of privacy concerns, especially leaking sensitive information about the training data. Model inversion attacks [19, 20, 21] reveal possible training data samples that a deep learning model could have been trained on. Membership inference attacks [21, 22, 23] try to predict whether a specific data sample was in the model's training dataset. Further work demonstrates how to use membership inference attacks to determine whether a model was trained using any individual user's personal information [24]. These research directions focus on the privacy of individual records in the dataset and are highly important. However, the above-mentioned works are orthogonal to ours: property inference attacks [14, 15] focus on inferring sensitive global properties of the training dataset. Besides, property inference attacks can also be applied to infer sensitive properties of the training process.

Beyond data inference, deep learning models have been shown to be subject to a variety of other attacks, such as adversarial attacks on object detectors [25], malicious algorithms that poison the training data [26], and Trojan attacks that mislead the model into misbehaving in the presence of an input trigger [27]. Property inference attacks on deep neural networks add to this growing set of risks that users of machine learning must consider.

3 Background

This section provides a formal definition of property inference attacks on deep neural networks. We first provide background knowledge on deep neural networks. We then briefly review the general property inference scheme proposed in [14, 15].

3.1 Neural Networks

Neural networks are the most popular models for many machine learning tasks nowadays. A neural network is composed of multiple layers of computational units that process the output from the previous layer and produce input for the following layer. The first layer receives input from a special layer known as the input layer. The last layer is referred to as the output layer. The layers between the input and output layers are referred to as hidden layers. Formally, the computation of a neural network f for input x can be represented as

f(x) = F_k(F_{k−1}(· · · F_2(F_1(x))))

where k is the total number of computational layers (hidden layers and the output layer) and F_i represents the function computed by the i-th layer of the neural network.

The most commonly used computational layers are fully-connected layers, which are composed of multiple nodes (also called neurons or perceptrons) [28]. We use |F_i| to represent the number of nodes in layer F_i. Each fully-connected layer F_i has a multiplicative weight matrix W_i ∈ R^{|F_i| × |F_{i−1}|}, an additive bias b_i ∈ R^{|F_i|}, and a non-linear activation function σ. Each column of the weight matrix W_i is formed by the outgoing weights of the neurons in layer F_{i−1}, while each row of W_i is formed by the incoming weights of the neurons in layer F_i. For a fully-connected layer F_i with input o_{i−1}, the output of that layer is

o_i = F_i(o_{i−1}) = σ(W_i o_{i−1} + b_i)

The non-linear activation function σ is used to introduce non-linearity to the model, which largely increases the representational power of the neural network. Commonly used activation functions are ReLU, LeakyReLU and Sigmoid:

ReLU(x) = \begin{cases} 0 & x < 0 \\ x & x \ge 0 \end{cases}    (1)

LeakyReLU(x) = \begin{cases} \alpha x & x < 0 \\ x & x \ge 0 \end{cases}    (2)

Sigmoid(x) = \frac{1}{1 + e^{-x}}    (3)

Given a training dataset, in order to find the optimal set of parameters W for the model f so that it produces meaningful output, the neural network training minimizes a loss function L(y, f(x)) which penalizes the mismatches between the true labels y and the predicted labels f(x). Given a loss function, a training algorithm A is applied to reduce the loss and consequently optimize the objective function. A training algorithm A might also modify the loss function L by adding a regularization term to avoid overfitting. Stochastic gradient descent (SGD) [29] and its variants are commonly used as the optimizer in A.

3.2 Property Inference

Consider a model owner who trains a neural network f on the training set D for some classification task. After completing the training, the model owner sells this trained model to customers so that they do not need to train their own model. Often, the customers can further fine-tune the model to improve its performance with less computational resources [30, 31]. A property inference attack aims to answer the following question: given only the model f, can an adversary infer some properties of the training dataset D or training algorithm A that the model owner does not intend to share?

In this paper, we assume that the adversary has white-box access to the target model, i.e., the adversary has full knowledge of the parameters and architecture of model f. This assumption is reasonable in practice. With the fast development of deep learning applications, there are many online model sharing platforms nowadays, e.g., Caffe Model Zoo [7] and OpenML [32], where the parameters of models are also provided. Even if the model consumers are only given black-box access to the model, where only query results are provided, the internal information of the black-box model can potentially be recovered through reverse engineering methods [33, 34]. The model architecture and parameters can be exposed from a sequence of queries, and the consumers effectively gain white-box access.

Property inference exploits the idea that machine learning models trained on similar datasets or with similar learning algorithms will represent similar functions. The similarity of these functions should be reflected in some common, inherent patterns of the model parameters. We show the general scheme of the property inference attack proposed in [14] in Figure 1. The objective of the adversary is to recognize the special patterns within the target model parameters that reveal properties the model producer might not want to release. To do this, the adversary trains a meta-classifier f_meta, which is another machine learning model, to recognize the patterns. As Figure 1 shows, to build a meta-classifier that predicts whether a target model has the property P or not (denoted P̄), the adversary first trains a set of shadow models on the same task as the target model; each shadow model f_i is trained on a dataset D′ similar to that of the target model, but constructed explicitly to either have or not have the property P. After training the shadow classifiers, the adversary obtains the feature representation F_i for each shadow model f_i, which serves as the training set for the meta-classifier. For example, the feature representation of a logistic regression model could be a vector containing all of the coefficients of the features in the learned function. In the case of neural networks, the simplest model encoding flattens the weight matrix W_i of each layer into a vector and appends all of these vectors together with the biases b_i as the feature representation of the shadow model. However, we show in the next section that this simple feature representation is problematic. The meta-training set D_meta = {(F_1, l_1), ..., (F_k, l_k)} is used to train the meta-classifier f_meta, where each sample label l_i is P or P̄. Algorithm 1 summarizes the process of training the meta-classifier to infer the training set property P as described above. Algorithm 1 can easily be extended to infer properties of the training algorithm.

Algorithm 1 Training of meta-classifier (for inferring dataset property P)

Input: D = (D_1, ..., D_k): the array of training sets, each constructed to have property P or P̄
       l = (l_1, ..., l_k): the array of labels, where each l_i ∈ {P, P̄}
Output: the meta-classifier f_meta
1: Initialize meta-training set D_meta = ∅
2: for i = 1 to k do
3:     f_i ← train(D_i)
4:     F_i ← getFeatureVectors(f_i)
5:     D_meta = D_meta ∪ {(F_i, l_i)}
6: end for
7: f_meta ← train(D_meta)
8: return f_meta
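To make the procedure concrete, the following Python sketch mirrors Algorithm 1 on a toy task where property P is "the training features are sorted". The scikit-learn models, the helper names, and all dataset and model sizes are illustrative assumptions rather than the setup used in our experiments.

```python
# A minimal sketch of Algorithm 1: train shadow models with and without
# property P, flatten their parameters, and train a meta-classifier.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_dataset(has_property):
    # Property P here: the 20 features of every sample are sorted.
    X = rng.normal(0.0, 1.0, size=(200, 20))
    if has_property:
        X = np.sort(X, axis=1)
    y = (X.mean(axis=1) > 0).astype(int)
    return X, y

def flatten_params(model):
    # Flattened weight/bias vector = feature representation of the model.
    return np.concatenate([w.ravel() for w in model.coefs_] +
                          [b.ravel() for b in model.intercepts_])

meta_X, meta_y = [], []
for label in (0, 1):                              # label 1 <=> property P holds
    for _ in range(20):                           # shadow models per class
        Xs, ys = make_dataset(bool(label))
        shadow = MLPClassifier(hidden_layer_sizes=(20,), max_iter=300).fit(Xs, ys)
        meta_X.append(flatten_params(shadow))
        meta_y.append(label)

f_meta = LogisticRegression(max_iter=1000).fit(np.array(meta_X), meta_y)
```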


Figure 1: General Scheme of Property Inference Attack

4 Equivalence of Neural Networks

While the prior work on property inference attacks has been demonstrated to work well for models like HMMs or SVMs when using parameter vectors as the model feature representation [14], the attack scheme described in Section 3.2 does not perform well on deep neural networks. We conjecture that a simple flattened parameter vector is not a good feature representation for neural networks. For a fully-connected neural network, simple operations can transform it into other equivalent models that have different vector representations. These properties also help explain why the loss functions of neural networks are non-convex and usually contain thousands of local minima. Since two equivalent neural networks can have very different vector representations, it is difficult for a meta-classifier to capture the common patterns among them and infer the target properties.

4.1 Neuron Permutation

Consider the two neural networks shown in Figure 2. A node permutation on a hidden layer of a neural network f does not change its functionality. However, the permuted networks will have different flattened parameter vector representations. Formally, we can define the neuron permutation equivalence property of fully-connected neural networks using group theory, as follows:

Proposition 1. Consider a 2-layer neural network f and let H denote the set of nodes in its hidden layer. Define a transformation on neural networks (node permutation) Φ_σ : f → f′ by applying a permutation σ to the set of hidden-layer nodes. Then for any σ ∈ S_H (the symmetric group over H) and for every input x, we have f(x) = f′(x).

The intuition behind this proposition is clear and the formal proof is omitted here. Recall that the number of neurons in each layer F_i is denoted by |F_i|. Then for each hidden layer there are |F_i|! valid permutations. Hence, for a neural network f with k layers, there will be a total of \prod_{i=1}^{k−1} |F_i|! permutation equivalents of f, including itself. By Stirling's approximation [35], the number of equivalents of f grows exponentially with its width. If the permutation equivalence property is not taken into account, all of these equivalent models will have different flattened representations, making it difficult for a naive meta-classifier to learn useful patterns from them.
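A minimal numerical check of Proposition 1, assuming a 2-layer ReLU network stored as raw weight tensors (all sizes arbitrary): permuting the hidden neurons permutes the rows of W_1 and b_1 and the columns of W_2, yet leaves the output unchanged.

```python
# Sketch: permuting hidden neurons does not change a 2-layer ReLU network.
import torch

torch.manual_seed(0)
W1, b1 = torch.randn(20, 10), torch.randn(20)     # hidden layer: 20 neurons
W2, b2 = torch.randn(3, 20), torch.randn(3)       # output layer: 3 units

def forward(x, W1, b1, W2, b2):
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

perm = torch.randperm(20)
W1p, b1p = W1[perm], b1[perm]     # permute rows (incoming weights and biases)
W2p = W2[:, perm]                 # permute columns (outgoing weights)

x = torch.randn(5, 10)
print(torch.allclose(forward(x, W1, b1, W2, b2),
                     forward(x, W1p, b1p, W2p, b2), atol=1e-5))   # True
```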


Figure 2: Two equivalent networks whose neurons have different orders

Figure 3: Two equivalent neurons obtained by weights multiplication


4.2 Weights Multiplication

Apart from the neuron permutation equivalence, network layers with ReLU [36] or LeakyReLU [37] activation functions also have a multiplicative equivalence property. This property is illustrated in Figure 3. If we divide the incoming weights of a neuron by some constant β and multiply the outgoing weights of that neuron by the same constant, the function of the neural network remains the same as long as the activation is ReLU or LeakyReLU.

We can easily see that in this case the number of equivalents of a given neural network f becomes infinite. Hence it is difficult for the meta-classifier to learn this invariance by itself without being trained in a way that specifically makes it recognize the invariance.
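A minimal sketch of this multiplicative equivalence, using the same toy 2-layer ReLU setup as before; note that in this sketch the neuron's bias is scaled together with its incoming weights so that the identity holds exactly.

```python
# Sketch: rescaling one ReLU neuron's incoming and outgoing weights by beta > 0
# leaves the network's function unchanged (cf. Figure 3).
import torch

torch.manual_seed(0)
W1, b1 = torch.randn(20, 10), torch.randn(20)
W2, b2 = torch.randn(3, 20), torch.randn(3)

def forward(x, W1, b1, W2, b2):
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

beta, j = 4.0, 7                       # scale hidden neuron j by beta
W1s, b1s, W2s = W1.clone(), b1.clone(), W2.clone()
W1s[j] /= beta                         # incoming weights divided by beta
b1s[j] /= beta                         # bias scaled along with incoming weights
W2s[:, j] *= beta                      # outgoing weights multiplied by beta

x = torch.randn(5, 10)
print(torch.allclose(forward(x, W1, b1, W2, b2),
                     forward(x, W1s, b1s, W2s, b2), atol=1e-5))   # True
```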


5 Hand-engineered Methods

Training a meta-classifier f_meta that is robust to the equivalent neural networks discussed in Section 4 is analogous to similar requirements in computer vision tasks. For example, in facial recognition tasks, the classifier should recognize the same face after a rotation or a change in image brightness. In this section, we review two commonly used hand-engineered methods, data preprocessing and data augmentation, to tackle the problem.

5.1 Data Preprocessing

Data preprocessing is a data mining technique used to transform raw data into a useful and efficient format. For example, a frequently applied approach in the computer vision community to deal with rotated images is to first align the image into a canonical pose [38]. Similarly, in our case, we can preprocess each weight matrix W_i by transforming it into a canonical form in order to make the inference task easier.

To address the node permutation equivalence problem described in Section 4.1, we sort the neurons in each of the hidden layers based on a metric that ensures all permutation equivalents have the same flattened parameter vector representation. In our experiments, we use the magnitude of the sum of all outgoing weights of a node as the sorting metric; other metrics such as the norm of the weights are also valid.

To address the weights multiplication equivalence problem described in Section 4.2, we normalize the incoming weights of all neurons in each layer, in the order F_1, F_2, ..., F_k. After each normalization, we multiply the corresponding normalizing constants into the weights of all outgoing connections of those nodes.
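A sketch of this canonicalization for a 2-layer network stored as numpy arrays; folding the bias into the normalization and the exact sort key are assumptions consistent with, but not necessarily identical to, the procedure above.

```python
# Sketch of canonical-form preprocessing for a 2-layer ReLU network
# given as (W1, b1, W2, b2), with W1 of shape (hidden, in), W2 of (out, hidden).
import numpy as np

def canonicalize(W1, b1, W2, b2):
    # Weights-multiplication equivalence: give every hidden neuron's incoming
    # weights (and bias) unit norm and push the scale into its outgoing weights.
    scale = np.linalg.norm(np.hstack([W1, b1[:, None]]), axis=1)
    scale[scale == 0] = 1.0
    W1, b1, W2 = W1 / scale[:, None], b1 / scale, W2 * scale[None, :]
    # Permutation equivalence: sort hidden neurons by the magnitude of the sum
    # of their outgoing weights.
    order = np.argsort(np.abs(W2.sum(axis=0)))
    return W1[order], b1[order], W2[:, order], b2
```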

5.2 Data Augmentation

Data augmentation is a strategy that significantly increases the diversity of data available for training models without actually collecting new samples [39, 40]. The basic idea is to deliberately generate different versions of the same sample to increase the total size of the training dataset, in the hope that the classifier will learn to be invariant to these changes. The fundamental difference between data preprocessing and data augmentation is that the former tries to reduce the difficulty for the classifier, while the latter expects the classifier itself to capture these changes given more examples.

In our experiments in Section 7, we augment the meta-dataset D_meta for the meta-classifier by randomly generating 9 equivalent models for each shadow neural network using random node permutation and weights multiplication. The size of D_meta is thereby increased by a factor of 10. When generating multiplicative equivalent models, we constrain the randomly chosen multiplicative constant β in Figure 3 to lie between 0.1 and 10 to avoid generating models with extremely small or large weights.
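A sketch of this augmentation step under the same toy parameter layout; the helper random_equivalent is hypothetical and only illustrates the two transformations being combined.

```python
# Sketch: generate random functionally equivalent copies of a shadow model
# (2-layer ReLU network stored as numpy arrays) to augment the meta-dataset.
import numpy as np

rng = np.random.default_rng(0)

def random_equivalent(W1, b1, W2, b2):
    # Random hidden-neuron permutation.
    perm = rng.permutation(W1.shape[0])
    W1, b1, W2 = W1[perm], b1[perm], W2[:, perm]
    # Random per-neuron scaling with beta in [0.1, 10] (valid for ReLU layers).
    beta = rng.uniform(0.1, 10.0, size=W1.shape[0])
    return W1 / beta[:, None], b1 / beta, W2 * beta[None, :], b2

def augment(params, n_copies=9):
    # 9 extra equivalents per shadow model -> meta-dataset grows by 10x.
    return [params] + [random_equivalent(*params) for _ in range(n_copies)]
```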


6 Automatic Feature Extraction Methods

In the training of deep learning models, we often run into the curse of dimensionality, where the number of data records is not significantly larger than the number of features. Training a neural network with many parameters on a scarce dataset can easily lead to overfitting and poor generalization. High dimensionality also means longer training time. In our case of training a meta-classifier f_meta, the number of available samples is often limited, since training shadow models requires a large amount of time and computational resources, while the number of features is usually very large, since a neural network has many parameters. Therefore, we introduce automatic feature extraction methods, PCA and Autoencoder, to address these problems.

6.1 Principal Component Analysis

Principal component analysis (PCA) is a common technique for many data preparation tasks [41]. It aims to decompose high-dimensional data into a low-dimensional subspace component and a noise component, which is useful for data compression as well as denoising (feature extraction). The main idea of PCA is to identify the most "meaningful" basis to re-express a dataset, in the hope that this new basis will filter out the noise and reveal hidden structure in the observed dataset. According to Hotelling [42], the most common definition of PCA is that, for a given set of observed data vectors x_i, i ∈ 1..n, we find d orthonormal axes that explain the most variance in the observed dataset. We explain the association between the principal components and the eigenvectors of the covariance matrix in Section 6.1.1, and then illustrate how to efficiently implement PCA via the SVD algorithm in Section 6.1.2.

6.1.1 Eigen-decomposition Explanation

The top d principal components of the dataset can be obtained directly by performing an eigenvalue decomposition of the covariance matrix S of the data matrix X and selecting the d normalized eigenvectors associated with the d largest eigenvalues of S. We explain why the principal components are eigenvectors of S by walking through how PCA chooses the first principal component U_1.

Stack all observations x_i into the rows of an n × p matrix X, where each row corresponds to a p-dimensional data representation. There are n observations in total. The dataset X is then centered by subtracting from each column its mean. PCA takes the first principal component U_1 to be a linear combination of the columns of X with weights w = [w_1, ..., w_p], i.e.

U_1 = Xw

We want the projection of the observed data onto this principal axis to have the most variance in the observed dataset.

Var(U_1) = Var(Xw) = w^T Var(X) w = w^T S w


where S = X^T X/(n − 1) is the p × p covariance matrix of X. Clearly, Var(U_1) can be made arbitrarily large by increasing the magnitude of w. To make the optimization problem well defined, we constrain w to have unit length. Therefore, we rewrite PCA as a constrained optimization problem:

max_w  w^T S w    subject to  w^T w = 1

To solve this optimization problem, we introduce a Lagrange multiplier λ:

L(w, λ) = w^T S w − λ(w^T w − 1)

Differentiating L with respect to w and λ gives p + 1 equations:

Sw = λw (4)

w^T w − 1 = 0 (5)

Multiplying both sides of Equation 4 by w^T, we get

w^T S w = λ w^T w = λ

Hence Var(U_1) is maximized only if λ is the largest eigenvalue of the covariance matrix S, and the optimal weight vector w is the corresponding normalized eigenvector. A simple extension shows that the first d principal components are given by the normalized eigenvectors associated with the d largest eigenvalues of the covariance matrix S of the observed dataset.

6.1.2 SVD Implementation

PCA is usually explained via an eigenvalue decomposition of the covariance matrix, as in the previous section. However, the Singular Value Decomposition (SVD) is usually a much more direct and efficient way to compute the principal component vectors.

Let the centered data matrix X be of shape n × p, where n is the number of observed samples and p is the number of features. Then the p × p covariance matrix S is calculated by S = X^T X/(n − 1). Since S is a symmetric matrix, it is diagonalizable:

S = V D V^T

where D is a diagonal matrix with the eigenvalues λ_i in decreasing order on its diagonal, and V is the matrix of the corresponding normalized eigenvectors. Clearly, V contains the principal components of X we aim to obtain. However, algorithms for eigenvalue decomposition have time complexity close to O(p^3) [43]. Since p is usually very large in practice, it would be very expensive to compute the eigenvalue decomposition of S directly.

Instead, we perform a singular value decomposition of X and obtain the decomposition:

X = U Σ V^T


where U is a unitary matrix and Σ is the diagonal matrix of singular values s_i. We can easily see that

S = (U Σ V^T)^T (U Σ V^T)/(n − 1) = V Σ^T U^T U Σ V^T/(n − 1) = V \frac{Σ^2}{n − 1} V^T

Hence we know that the right singular vectors V are the principal directions we aim to obtain, and that the singular values are related to the eigenvalues of the covariance matrix via λ_i = s_i^2/(n − 1). The implementation of PCA via SVD is much more efficient and is able to handle sparse matrices. In addition, there are reduced forms of SVD that are even more economical to compute.

In our experiments in Section 7, we apply PCA to the flattened parameter vector to construct new feature representations for neural networks. We use the PCA package in scikit-learn [44] to perform the PCA decomposition. The library function uses the SVD method described above to implement PCA feature extraction.
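For illustration, a minimal scikit-learn sketch of this step; the matrix of flattened parameter vectors below is random placeholder data and the target dimension of 64 is an arbitrary choice.

```python
# Sketch: low-dimensional feature representations of shadow models via PCA,
# which scikit-learn implements with the SVD route described above.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
flat_params = rng.normal(size=(800, 5000))   # 800 shadow models, 5000 weights each

pca = PCA(n_components=64)                   # reduced dimension chosen by tuning
features = pca.fit_transform(flat_params)    # center, then project onto top PCs
print(features.shape)                        # (800, 64)

# The same fitted PCA is then applied to the target model's parameter vector.
target_features = pca.transform(rng.normal(size=(1, 5000)))
```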

6.2 Autoencoder

An autoencoder [18] is a nonlinear dimension reduction method that uses an adaptive, multilayer encoder network f_e to transform the high-dimensional data into a low-dimensional encoding and a similar decoder network f_d to recover the full data from the encoding, as visualized in Figure 4. We can take an unlabeled data point x and construct autoencoder training as a supervised learning task where the label is exactly the same as the instance x. The autoencoder outputs x̂, a reconstruction of the original input x.

x_e = f_e(x) = F_k(F_{k−1}(· · · F_2(F_1(x))))

x̂ = f_d(x_e) = F′_k(F′_{k−1}(· · · F′_2(F′_1(x_e)))) = f_d(f_e(x))

where x_e denotes the encoded representation of the data x. This network can then be trained by minimizing the reconstruction error L(x, x̂), which measures the difference between the original input and the reconstruction. The required gradients are easily obtained by using the chain rule to backpropagate error derivatives first through the decoder network and then through the encoder network. In our experiments, we use the mean squared error (MSE) to measure the discrepancy between the original and reconstructed parameter vectors.

A neural autoencoder can be thought of as a more powerful, nonlinear generalization of PCA. It is capable of learning nonlinear relationships because of nonlinear activation functions such as ReLU or Sigmoid. In fact, if we were to construct a linear network, i.e., without using nonlinear activation functions at any layer, we would observe a dimensionality reduction similar to that of PCA, as discussed by Hinton [45].

In our experiments, we use an autoencoder to construct new feature representations for neural networks from their original flattened parameter vectors. We choose the dimension of the new encodings through extensive trials.
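A sketch of such an autoencoder in PyTorch; the hidden width of 512, the code dimension of 64, and the random placeholder parameter vectors are assumptions for illustration only.

```python
# Sketch: an autoencoder over flattened parameter vectors, trained with MSE
# reconstruction loss; the encoder output is the new feature representation.
import torch
import torch.nn as nn

class ParamAutoencoder(nn.Module):
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

flat_params = torch.randn(800, 5000)            # flattened shadow-model weights
ae = ParamAutoencoder(in_dim=5000, code_dim=64)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                          # reconstruction error L(x, x_hat)

for epoch in range(20):                         # full-batch training, for brevity
    opt.zero_grad()
    loss = loss_fn(ae(flat_params), flat_params)
    loss.backward()
    opt.step()

codes = ae.encoder(flat_params).detach()        # new feature representations
```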


Figure 4: Detailed visualization of an autoencoder. First, the input of dimension 6 passes through the encoder f_e, which is a fully-connected NN, to produce the compressed code of dimension 2. The decoder f_d, which has a similar NN structure, reconstructs the output using only the code. The goal is to get an output identical to the input.


7 Experiment

We evaluate the performance of the different adjustments to the original property inference attack algorithm. In this section, we describe the datasets and the properties to be inferred, followed by the training settings and results.

7.1 Dataset

7.1.1 Synthetic Dataset

We create a dataset where each data point contains 20 values independently drawn from a Gaussian distribution G(0, σ), where the standard deviation σ is either 1, 3, or 10. The 20 values of each data item can be either sorted or unsorted. Each binary classifier predicts whether the input comes from a Gaussian distribution with σ less than c. In the experiments, c is set to either 2 (i.e., try to distinguish G(0, 1) samples from the others) or 5 (i.e., try to distinguish G(0, 10) from the others). The following are the properties of the trained neural networks we try to infer (a short data-generation sketch follows the list):

• Property of training: predicting if the target model was trained for 5 or 20 epochs.

• Property of data: predicting if the training data are sorted or not.

• Property of target problem: predicting whether the classifier sets the threshold c at 2 or 5 for the standard deviation.
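The sketch below generates the synthetic data described above; the helper name and the fixed sample count are illustrative.

```python
# Sketch: synthetic data with 20 Gaussian draws per sample; sort_features and
# the threshold c are the knobs behind the properties to be inferred.
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic_dataset(n, c=2.0, sort_features=False):
    sigmas = rng.choice([1, 3, 10], size=n)              # per-sample std dev
    X = rng.normal(0.0, sigmas[:, None], size=(n, 20))   # 20 draws per sample
    if sort_features:
        X = np.sort(X, axis=1)
    y = (sigmas < c).astype(int)                         # label: sigma < c ?
    return X, y

X, y = make_synthetic_dataset(1000, c=2.0, sort_features=True)
```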

7.1.2 MNIST Dataset

MNIST [46] is a widely used hand-written digit recognition dataset. Each data item in the dataset is a 28×28 grayscale image. For each trained model, we want to infer whether or not it was trained on images whose brightness was randomly adjusted. To do this, for each digit image in MNIST, we randomly choose a brightness factor from 0.1 to 1.9 and change the brightness of the image accordingly. Figure 5 shows the same digit '5' with different brightness after processing. If the brightness factor is less than 1, the image becomes darker (Figure 5(b)); the image becomes brighter (Figure 5(c)) if the brightness factor is greater than 1.
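A sketch of this brightness jitter using torchvision; wrapping it as a transform object is an implementation choice for illustration, not necessarily how our pipeline applies it.

```python
# Sketch: randomly adjust the brightness of each MNIST digit with a factor
# drawn uniformly from [0.1, 1.9] (<1 darkens, >1 brightens).
import random
import torchvision.transforms.functional as TF
from torchvision import datasets, transforms

class RandomBrightness:
    def __call__(self, img):
        factor = random.uniform(0.1, 1.9)
        return TF.adjust_brightness(img, factor)

transform = transforms.Compose([RandomBrightness(), transforms.ToTensor()])
mnist = datasets.MNIST("data", train=True, download=True, transform=transform)
```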

7.2 Experiment Settings

In the evaluation, we compare the 4 approaches (data preprocessing, data augmentation, PCA, Autoencoder) used to improve the original attack. We train all the neural network models using PyTorch [47]. The experiments were conducted on Google Colaboratory with 25 GB of RAM and a Tesla K80 GPU.


Target Model  For the synthetic dataset, we use a 2-layer fully-connected neural network with 20 neurons in the hidden layer as the target model architecture. All of the parameters are used for the inference task. For the MNIST dataset, we use neural networks with two convolutional layers followed by two fully-connected layers of sizes 320 and 50. The parameters of the last two fully-connected layers are used for the inference attack. We use ReLU as the activation function for all hidden fully-connected layers. For all of the experiments, we generate 400 shadow models for each property (800 in total) as the meta-training dataset, and 100 shadow models for each property (200 in total) serve as the meta-test dataset. In all model training, we use the Adam [48] optimizer with a learning rate of 0.001. Both the synthetic and MNIST datasets are easy to train on; all of the shadow models reach 90% accuracy after 5 epochs of training.
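For reference, a PyTorch sketch of the two target-model architectures; the convolution channel counts and kernel sizes of the MNIST model are assumptions, since only the fully-connected sizes 320 and 50 are fixed above.

```python
# Sketch of the target-model architectures used in the experiments.
import torch.nn as nn
import torch.nn.functional as F

class SyntheticNet(nn.Module):              # 2-layer FC net, 20 hidden neurons
    def __init__(self):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(20, 20), nn.Linear(20, 2)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

class MNISTNet(nn.Module):                  # 2 conv layers + FC(320) + FC(50)
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)   # channel counts assumed
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1, self.fc2 = nn.Linear(320, 50), nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))     # 28x28 -> 12x12
        x = F.relu(F.max_pool2d(self.conv2(x), 2))     # 12x12 -> 4x4, 20 channels
        x = x.view(-1, 320)                            # 20 * 4 * 4 = 320
        return self.fc2(F.relu(self.fc1(x)))
```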

Meta-Classifier  In the inference attack, the meta-classifiers used are simple feed-forward neural networks. The exact number of nodes in each layer is determined by hyper-parameter tuning, and different tasks and adjustments use different meta-classifier structures. Table 3 shows the size of the meta-classifiers in different settings. As we can see, the MNIST task requires slightly larger meta-classifier networks, since the dimension of the parameter vector is larger and the inference task is more difficult. The dimensions of the PCA and Autoencoder representations are also determined by hand-tuning.

7.3 Attack Effectiveness

We report the inference accuracy with the different adjustments for target models trained on the synthetic dataset in Table 1, with one column for each property to be inferred. As we can see, the different strategies have different performance on different kinds of inference task. For example, the augmentation technique performs best at inferring the number of training epochs; the autoencoder wins in the task of inferring the ordering of the training data; augmentation and PCA perform best at inferring the training target. Moreover, all of the adjustments perform better than the baseline.

Inferring the brightness of MNIST images is clearly much harder than the previous tasks on the synthetic dataset. As Table 2 shows, no adjustment has absolutely better performance than the other techniques, while they all provide marginal gains over the baseline. We can also see from Table 3 that the PCA and Autoencoder methods require far fewer parameters in the meta-classifiers to achieve better results than the hand-engineered methods. This is mainly due to the reduction in the dimension of the feature representations from the PCA and Autoencoder techniques.

Figure 6 shows the inference performance using PCA and Autoencoder with different dimensions of the feature representation. As the figure illustrates, PCA performs better when the number of features is small, while the Autoencoder performs better when the number of features is larger.


Figure 5: Change of brightness for an MNIST image: (a) original, (b) darkened, (c) brightened.

Table 1: Comparison of Different Modifications on Synthetic Dataset

Meta-classifier    Training Process    Training Data    Training Target
Baseline           65.0%               82.0%            64.0%
Preprocessing      88.5%               89.0%            67.0%
Augmentation       91.5%               88.5%            71.5%
PCA                86.5%               86.0%            72.0%
Autoencoder        79.0%               92.0%            68.0%

7.4 Attack Efficiency

As described in Section 3.2, we need to train a set of shadow models to build a training dataset for the meta-classifier. This costs a large amount of time and computational resources. In Figure 7, we show the accuracy of the meta-classifier while varying the size of its training dataset. We can see that the data augmentation approach can train a meta-classifier with high accuracy using far less training data than the other three approaches. This is likely because the size of the training set increases by a factor of 10 after augmentation.

Table 2: Comparison of Different Modifications on MNIST Dataset

Meta-classifier    Accuracy of Inferring Jittered Brightness
Baseline           69.0%
Preprocessing      77.5%
Augmentation       75.0%
Autoencoder        75.5%
PCA                76.5%


Table 3: Number of Parameters in Meta-classifier

Meta-classifier    Synthetic Dataset    MNIST Dataset
Hand-engineered    54.0K                21.1M
Autoencoder        33.3K                263.1K
PCA                8449                 131.5K

Figure 6: Prediction accuracy for the PCA and Autoencoder adjustments with different dimensions (Synthetic Dataset, Training Process)

Figure 7: Inference accuracy with varying meta-training set size (Synthetic Dataset, Training Process)


8 Discussion

8.1 Comparison

On both the synthetic and MNIST datasets, all of the discussed extensions perform better than the baseline on the explored tasks. The automatic feature extraction methods (PCA and Autoencoder) perform well and require much smaller meta-classifiers to achieve good inference accuracy compared with the hand-engineered methods. Moreover, the automatic methods are usually simpler to implement than the two hand-engineered methods. We recommend automatic feature extraction methods for anyone performing inference tasks on neural networks who does not want to spend too much time hand-processing the training data for the meta-classifier. Interestingly, we find that combining data processing methods with automatic feature extraction methods (e.g., applying both data preprocessing and PCA) does not improve the attack performance.

8.2 Limitations

Training Data for Shadow Models  To train the meta-classifier, the attacker first needs training data to train the shadow classifiers. In this work, we assume that the adversary has access to the same or a similar training dataset D′ as that used for the target model f, which might not be the case in practice. However, there are many existing approaches to facilitate the generation of training data. For example, the attacker could generate synthetic training data for the shadow models using model inversion attacks.

Computational Cost of Generating Shadow Models  To perform a property inference attack, the attacker needs to train hundreds of shadow models, which requires huge computational resources. Augmentation is one approach to increase the size of the data for the meta-classifier. We plan to further study methods for efficiently generating new model representations to facilitate inference attacks.

8.3 Potential Defense

Model Compression  Model compression, i.e., the removal of connections between some neurons in the neural network, is a common post-processing operation for DNNs. We use the parameter pruning approach proposed in [49] to compress the target model, which is done by setting the α% of parameters in W with the smallest absolute values to zero. Table 4 shows the inference accuracy on 10% compressed models. We can see that the inference accuracy drops significantly, coming very close to 50% (random guessing), while the model accuracy remains above 95%.
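A sketch of this magnitude-based pruning on a flattened weight vector; whether it is applied per layer or globally is left open here.

```python
# Sketch: set the alpha fraction of smallest-magnitude weights to zero,
# following the pruning approach of [49].
import numpy as np

def prune(flat_weights, alpha=0.10):
    w = flat_weights.copy()
    k = int(alpha * w.size)                  # number of weights to zero out
    idx = np.argsort(np.abs(w))[:k]          # indices of the smallest |w|
    w[idx] = 0.0
    return w

pruned = prune(np.random.randn(10_000), alpha=0.10)   # 10% compression
```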

Fine-tuning  Deep neural networks have a huge capacity for "memorizing" arbitrary information [30]. Fine-tuning involves updating the model parameters to learn some other dataset. We think it could instead be employed as a potential defense against property inference attacks. Model producers could use fine-tuning techniques to encode some other information (or noise) into the model, making the parameters look different from those of the shadow classifiers, which might increase the difficulty of the property inference task.


Table 4: Compression Attack (Synthetic Dataset, inferring training process)

                      Baseline    Preprocessing    Augmentation    Autoencoder    PCA
Original              56.5%       88.5%            91.5%           86.5%          79.0%
Compression Attack    50.0%       56.5%            62.0%           53.5%          50.5%


9 Conclusion and Future Work

We considered the problem of property inference attacks on fully-connected neural network models. We reviewed the hand-engineered data processing methods introduced in [15]. We then developed two automatic feature extraction methods to address the complexity of the feature representations of neural networks. Specifically, we applied the PCA and Autoencoder methods to handle the complex structure of neural networks. We showed that the automatic approaches are as effective as the hand-engineered methods at inferring various data properties on both synthetic and real-world datasets, while requiring fewer parameters in the meta-classifiers.

We also discussed the limitations of and potential defense strategies against property inference attacks, which provide some directions for future work, including generating training data for shadow models, reducing the computational burden of generating datasets for meta-classifiers, and developing countermeasures such as model compression and fine-tuning against property inference attacks.

10 Acknowledgements

I would like to express my gratitude to Professor Chris Rycroft for his valuable advice and support throughout the whole semester. This project would have been impossible without his brilliant suggestion of combining scientific computing, machine learning, and computer security.

I also would like to thank my friends, especially Wil Tan and Hui Li, for all the discussions we have had, which have aided me a lot in my research.


References

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.

[2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[3] G. Hinton, L. Deng, D. Yu, G. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, B. Kingsbury, et al., "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Processing Magazine, vol. 29, 2012.

[4] J. Powles and H. Hodson, "Google deepmind and healthcare in an age of algorithms," Health and Technology, vol. 7, no. 4, pp. 351–367, 2017.

[5] A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, and J. Dean, "A guide to deep learning in healthcare," Nature Medicine, vol. 25, no. 1, pp. 24–29, 2019.

[6] M. Ribeiro, K. Grolinger, and M. A. Capretz, "Mlaas: Machine learning as a service," in 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), pp. 896–902, IEEE, 2015.

[7] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia, MM '14, (New York, NY, USA), pp. 675–678, ACM, 2014.

[8] Y. Adi, C. Baum, M. Cisse, B. Pinkas, and J. Keshet, "Turning your weakness into a strength: Watermarking deep neural networks by backdooring," in 27th USENIX Security Symposium (USENIX Security 18), pp. 1615–1631, 2018.

[9] J. Zhang, Z. Gu, J. Jang, H. Wu, M. P. Stoecklin, H. Huang, and I. Molloy, "Protecting intellectual property of deep neural networks with watermarking," in Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pp. 159–172, ACM, 2018.

[10] Y. Uchida, Y. Nagai, S. Sakazawa, and S. Satoh, "Embedding watermarks into deep neural networks," in Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, ICMR '17, (New York, NY, USA), pp. 269–277, ACM, 2017.

[11] T. Wang and F. Kerschbaum, "Attacks on digital watermarks for deep neural networks," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2622–2626, May 2019.


[12] T. Wang and F. Kerschbaum, "Robust and undetectable white-box watermarks for deep neural networks," arXiv preprint arXiv:1910.14268, 2019.

[13] B. Darvish Rouhani, H. Chen, and F. Koushanfar, "Deepsigns: An end-to-end watermarking framework for ownership protection of deep neural networks," in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 485–497, ACM, 2019.

[14] G. Ateniese, G. Felici, L. V. Mancini, A. Spognardi, A. Villani, and D. Vitali, "Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers," arXiv preprint arXiv:1306.4447, 2013.

[15] K. Ganju, Q. Wang, W. Yang, C. A. Gunter, and N. Borisov, "Property inference attacks on fully connected neural networks using permutation invariant representations," in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 619–633, ACM, 2018.

[16] K. Ganju, Inferring properties of neural networks with intelligent designs. PhD thesis, 2018.

[17] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometrics and Intelligent Laboratory Systems, vol. 2, no. 1-3, pp. 37–52, 1987.

[18] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[19] M. Fredrikson, S. Jha, and T. Ristenpart, "Model inversion attacks that exploit confidence information and basic countermeasures," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333, ACM, 2015.

[20] M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and T. Ristenpart, "Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing," in 23rd USENIX Security Symposium (USENIX Security 14), pp. 17–32, 2014.

[21] S. Basu, R. Izmailov, and C. Mesterharm, "Membership model inversion attacks for deep networks," arXiv preprint arXiv:1910.04257, 2019.

[22] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, "Membership inference attacks against machine learning models," in 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18, IEEE, 2017.

[23] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha, "Privacy risk in machine learning: Analyzing the connection to overfitting," in 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 268–282, IEEE, 2018.


[24] C. Song and V. Shmatikov, "The natural auditor: How to tell if someone used your words to train their model," arXiv preprint arXiv:1811.00513, 2018.

[25] J. Lu, H. Sibai, and E. Fabry, "Adversarial examples that fool detectors," arXiv preprint arXiv:1712.02494, 2017.

[26] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, "Targeted backdoor attacks on deep learning systems using data poisoning," arXiv preprint arXiv:1712.05526, 2017.

[27] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang, "Trojaning attack on neural networks," 2017.

[28] F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychological Review, vol. 65, no. 6, p. 386, 1958.

[29] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proceedings of COMPSTAT'2010, pp. 177–186, Springer, 2010.

[30] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806–813, 2014.

[31] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?," in Advances in Neural Information Processing Systems, pp. 3320–3328, 2014.

[32] J. Vanschoren, J. N. Van Rijn, B. Bischl, and L. Torgo, "Openml: Networked science in machine learning," ACM SIGKDD Explorations Newsletter, vol. 15, no. 2, pp. 49–60, 2014.

[33] Z. Lin, X. Zhang, and D. Xu, "Automatic reverse engineering of data structures from binary execution," in Proceedings of the 11th Annual Information Security Symposium, p. 5, CERIAS-Purdue University, 2010.

[34] S. J. Oh, B. Schiele, and M. Fritz, "Towards reverse-engineering black-box neural networks," in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, pp. 121–144, Springer, 2019.

[35] R. Kreminski and J. Graham-Eagle, "Simpson's rule for estimating n! (and proving Stirling's formula, almost)," International Journal of Mathematical Education in Science and Technology, vol. 32, no. 3, pp. 466–475, 2001.

[36] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814, 2010.


[37] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, vol. 30, p. 3, 2013.

[38] Wikipedia contributors, "Data pre-processing — Wikipedia, the free encyclopedia," 2019. [Online; accessed 8-December-2019].

[39] Wikipedia contributors, "Data preparation — Wikipedia, the free encyclopedia," 2019. [Online; accessed 8-December-2019].

[40] L. Perez and J. Wang, "The effectiveness of data augmentation in image classification using deep learning," arXiv preprint arXiv:1712.04621, 2017.

[41] T. P. Minka, "Automatic choice of dimensionality for PCA," in Advances in Neural Information Processing Systems, pp. 598–604, 2001.

[42] H. Hotelling, "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, vol. 24, no. 6, p. 417, 1933.

[43] B. Ghojogh, F. Karray, and M. Crowley, "Eigenvalue and generalized eigenvalue problems: Tutorial," arXiv preprint arXiv:1903.11240, 2019.

[44] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[45] G. Hinton, "Neural networks for machine learning, lecture 15a," 2019.

[46] Y. LeCun, "The MNIST database of handwritten digits," http://yann.lecun.com/exdb/mnist/, 1998.

[47] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," 2017.

[48] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.

[49] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," arXiv preprint arXiv:1510.00149, 2015.
