
Improving the adversarial robustness of ConvNets by reduction of input dimensionality

Akash V. Maharaj
Department of Physics, Stanford University

[email protected]

Abstract

We show that the adversarial robustness of deep neural networks can be improved by reducing the dimensionality of input data. Adversarial examples are generated by small perturbations of regular input data, and while these changes are typically imperceptible to the human eye, they can result in catastrophic losses of accuracy in neural networks. This has been shown to be a generic property of quasi-linear classifiers in high dimensional spaces; thus, it is intuitively possible that reducing the dimensionality of the inputs leaves less ‘space’ for adversarial examples to exist. We present detailed results demonstrating this improved robustness (with small sacrifices in accuracy) for 5-layer networks on the CIFAR10 dataset, as well as for the deeper CaffeNet and GoogLeNet architectures on the Tiny-Imagenet-200 dataset. In particular, a reduction in the number of allowed pixel intensities from 256 to 4, which corresponds to robustness to perturbations of up to 32 units of intensity, results in an accuracy reduction of only 6% for CIFAR10 and 8% for Tiny-Imagenet.

1. Introduction and related work

Convolutional neural networks (ConvNets) have by now exhibited better-than-human performance [1] in vision classification problems. It was therefore both surprising, and a bit depressing, when Szegedy et al. [2] discovered that these classifiers are remarkably vulnerable to so-called adversarial examples: small, deliberate changes of the pixel intensities, designed to fool the classifier. While such small changes are virtually imperceptible to humans, state-of-the-art ConvNets typically classify such images incorrectly, but with very high confidence. Even more worryingly, ConvNets can be convincingly fooled by tuning the pixel intensities of random noise [3]. Furthermore, adversarial examples tend to be similarly misclassified by different, separately trained ConvNets, leading to fears for both the security and generalizability of deep neural networks.

From the subsequent explosion of literature investigating this problem, several themes have emerged. First, susceptibility to adversarial examples has been shown to be a property of high dimensional dot products [4], and so will plague any sufficiently linear model, as many ConvNets are thought to be. Second, while including adversarial examples in the training phase can act as a form of regularization, there are relatively few successful strategies for combating adversarial examples in vanilla ConvNets trained on natural images [4].

Finally, as emphasized by [3], at the heart of the issue is the difference between discriminative and generative models. ConvNets are discriminative models, which for a label vector y and input vector x learn the probability distribution p(y|x). Essentially, these models partition a very high dimensional input space into some set number of classes, with class boundaries partitioning the space into hyper-volumes that are much larger than the volume occupied by the training examples, as demonstrated in Fig. 1(a). Because natural input images occupy a very restricted manifold in this high dimensional space, the class predictions for vast regions of the input space are completely arbitrary. Thus, adversarial examples are essentially synthetic images which end up ‘far’ from typical data, and will be incorrectly classified with a high degree of confidence. What is needed is a generative model, where the complete joint density p(y, x) is known; in such a case the prior probability p(x) may be used to rule out adversarial examples as being too improbable. These considerations have led to the development of Generative Adversarial Nets (GANs) [5].

Given the above discussion of dimensionality, an alternative (and, we believe, hitherto unexplored) possibility is to discretize the manifold from which images are classified. Eliminating portions of the input manifold can be viewed as an inflexible prior; we are essentially reducing the regions into which data can be perturbed and misclassified by reducing the hyper-volume which must be partitioned. The resulting robustness to adversarial examples is then trivial: there is now a minimum allowed distortion of the image (due to the discretization, as shown in Fig. 1(b)), and so a proposed adversarial image will be perceptibly different from a natural one.



Figure 1: A cartoon demonstration of the intuition behind discretizing inputs. True data (dots) lie on a narrow manifold in the high dimensional space, and the classifier is trained to separate these data into categories (shaded regions). However, the decision boundaries in the vast swathes of “unpopulated” space are arbitrary and untrained. Adversarial examples (panel (a)) are images perturbed into this naturally empty region and are thus misclassified. Discretizing inputs introduces a minimum allowable distortion (panel (b)), and helps prevent small adversarial perturbations.

From a human perspective, this discretization is perhaps more intuitive than one may at first imagine. For example, humans can typically classify grayscale images, suggesting that all three color channels may not be necessary for categorical data. Indeed, the higher convolutional layers of ConvNets have been shown to be efficient feature extractors [6], which should be robust to color. Furthermore, there is a coarse-graining procedure that is inherent to our vision: the presence of displaced pixels is irrelevant to our perception of an image. Thus, while color is certainly important to classification tasks, knowledge of all 256³ RGB colors is probably unnecessary. Both of these ‘intuitive’ observations lead to the hypothesis that not every single dimension of the input space is relevant.

Discretization has the added attraction of improved computational efficiency and decreased memory requirements, which is naturally a desirable feature for any real world deployment of deep neural networks. Indeed, in a very recent preprint, Courbariaux et al. [7] introduced and discussed deep binary neural networks (BNNs), where all the weights and activations are binary. Despite the severe non-analyticities introduced to the loss function by such ‘binarization’, the authors report near state-of-the-art results with impressive speedups in the training of such networks. While Courbariaux et al. [7] did not discuss discretization of inputs as we do here, this report can be viewed as a natural generalization of the BNN concept.

In Sec. 2.1 of this report, we describe how we achieve the reduction in dimensionality of the input by quantizing RGB pixel intensities into a variety of allowed discretizations, ranging from 2² to the full 2⁸ = 256 allowed intensities. For the CIFAR10 dataset [8], we train a 5-layer neural network from scratch for each discretization, as described in Sec. 2.4.1, and report in detail on the changes in accuracy and robustness of the network in Sec. 3.2. We then generalize these methods to the deeper, pretrained CaffeNet [9] (a modified AlexNet [10]) and GoogLeNet [11] architectures. We employ transfer learning [12] to fine tune these networks (Sec. 2.4.2) for discretizations of 2², 2⁵, and 2⁸, and report the results in Sec. 3.3. In addition to these quantitative metrics, we also include visualizations of the effect of these adversarial perturbations on images from Tiny-Imagenet [13] in Sec. 3.3. We end with a broader discussion of the potential role for dimensional reduction techniques in image classification, and discuss future avenues of research in Sec. 4.

2. Methods

Here we discuss the discretization of inputs as well as the different network architectures and training methods used throughout this report.

2.1. Discretization of pixel intensities

While there are 256³ ≈ 2 × 10⁷ possible colors which can be represented by an image array, it is thought that the human eye can only perceive around half of these. Furthermore, everyday experience suggests that there is a huge amount of redundancy in these colors: one does not need to know the difference between cobalt and azure (shades of blue) to recognize an image of the ocean. A simple way to represent this redundancy is by discretizing the intensities of each color channel. Throughout this work, we convert the raw integer pixel intensities 0 ≤ I ≤ 255 to δ possible values I_δ using the equation

I_δ = ⌊I / (256/δ)⌋ (256/δ) + (1/2)(256/δ),    (1)

where ⌊·⌋ denotes integer (floored) division. For an RGB image of size N × N pixels, this reduces the input manifold from 256^(3N²) to δ^(3N²) possible images. Some examples of what discretization does to input images are shown in Fig. 2.
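As a concrete illustration, Eq. 1 amounts to snapping each intensity to the midpoint of its bin. A minimal NumPy sketch of this quantization (the function and variable names are ours, not taken from the original implementation):

```python
import numpy as np

def discretize(image, delta):
    """Quantize 8-bit intensities to `delta` allowed values, as in Eq. 1.

    Each intensity I in [0, 255] is snapped to the midpoint of its bin of
    width 256/delta, so two distinct allowed values differ by at least
    256/delta units of intensity.
    """
    bin_width = 256 // delta                        # e.g. delta = 4 gives bins of width 64
    return (image // bin_width) * bin_width + bin_width // 2

# Example: delta = 4 leaves only the intensities 32, 96, 160 and 224.
img = np.random.randint(0, 256, size=(32, 32, 3))
print(np.unique(discretize(img, 4)))
```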

2.2. Generating adversarial examples

In this work, we generate adversarial examples by the ‘fast gradient’ method [4], which computes the gradient of the loss function L(θ, y, x) with respect to the input image x:

x′ = x + (ε × 256) sign[∇_x L(θ, y, x)],    (2)

where y is the classification vector and θ are the parameters (weights, biases, etc.) of the network, and L is the loss function. Note that ∇_x L(θ, y, x) is the direction of steepest ascent of the loss, and as such a distortion of the input image in this direction causes the fastest increase in the loss function (and hence misclassification). The parameter ε quantifies the magnitude of the distortion, and we plot accuracies as a function of this parameter to quantify the robustness of the network. Note that values of ε ∼ 0.1 have been found to generate error rates of 87% on CIFAR10 [4].

Figure 2: The effect of increasing discretization of pixel intensities (δ is the number of allowed intensities) on two randomly chosen (upscaled) 224 × 224 RGB images from the Tiny-Imagenet dataset. Panels: (a) δ = 256, (b) δ = 32, (c) δ = 4.

Figure 3: Same as Fig. 2, but for the CIFAR10 dataset (32 × 32 RGB images).
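A minimal sketch of this fast gradient step follows. The `loss_gradient` callable is a placeholder standing in for a backward pass through the trained network, and the final clamp to the valid intensity range is our own addition; neither is taken from the original Caffe code.

```python
import numpy as np

def fast_gradient_example(x, loss_gradient, epsilon):
    """Generate an adversarial image via the fast gradient method (Eq. 2).

    x: image with intensities in [0, 255].
    loss_gradient(x): assumed to return dL/dx for the correct label,
        e.g. obtained from a backward pass through the trained ConvNet.
    epsilon: distortion strength; the step size is epsilon * 256 intensity units.
    """
    step = (epsilon * 256.0) * np.sign(loss_gradient(x))
    return np.clip(x.astype(np.float64) + step, 0, 255)

# Toy usage with a random placeholder gradient (a real run would query the network).
x = np.random.randint(0, 256, size=(32, 32, 3)).astype(np.float64)
placeholder_grad = lambda img: np.random.randn(*img.shape)
x_adv = fast_gradient_example(x, placeholder_grad, epsilon=0.1)
```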

Adversarial examples x′ are quantified by the magnitude of their distortion |r| from an original image x (both of which are vectors of dimension D) using the expression

|r| = (1/D) Σ_{n=1}^{D} ||x′_n − x_n||²₂.    (3)

Because a given distortion of the image involves changing all pixel values by ε × 256 according to Equation 2, this measure of the distortion corresponds well to a human's perception of the distortion to the image (as opposed to displacing a single pixel by a huge amount, which would not affect our perception of the image but would correspond to a large |r|).
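A short sketch of this distortion measure (names ours):

```python
import numpy as np

def distortion_magnitude(x_adv, x):
    """Mean squared per-pixel distortion |r| of Eq. 3."""
    diff = x_adv.astype(np.float64) - x.astype(np.float64)
    return np.mean(diff ** 2)
```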

2.3. Dataset

We use two standard computer vision datasets throughout this project. Tests were initially carried out on the CIFAR10 data set [8], which contains 50 000 training and 10 000 test images corresponding to 10 categories. Each image has a resolution of 32 × 32 pixels, and some examples are shown in Fig. 3.

For subsequent analysis using deeper networks, we used the Tiny-Imagenet dataset [13], which consists of 100 000 training and 10 000 validation images corresponding to 200 categories. For use with the CaffeNet and GoogLeNet architectures, we first upscaled all images to a resolution of 224 × 224 using SciPy's bilinear interpolation scheme [14]. Some example images with different discretizations are shown in Fig. 2.
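The exact SciPy call used for the upscaling is not shown in the report; one way to perform a bilinear (first-order) upscaling from Tiny-Imagenet's native 64 × 64 resolution is scipy.ndimage.zoom, sketched below (names and the choice of routine are our assumptions):

```python
import numpy as np
from scipy import ndimage

def upscale_to_224(image_64):
    """Upscale a 64x64x3 image to 224x224x3 with first-order (bilinear) interpolation."""
    factors = (224 / image_64.shape[0], 224 / image_64.shape[1], 1)
    return ndimage.zoom(image_64.astype(np.float64), factors, order=1)

img = np.random.randint(0, 256, size=(64, 64, 3))
print(upscale_to_224(img).shape)  # (224, 224, 3)
```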

2.4. Network architectures and training

All networks were implemented in Caffe [9], with training done using Nvidia GPUs with 4 GB of memory (on AWS servers). The details of each network are presented here.

2.4.1 5-Layer ConvNet on CIFAR10

For the CIFAR10 dataset we use a five-layer network, consisting of three repeats of a [CONV3-32 → BATCHNORM → RELU → POOL2] block, followed by two fully connected layers. The outputs of the top fully connected layer are 10 numbers f_j, j ∈ [0, 10), which we treat as unnormalized log probabilities, and so we use a softmax classifier to compute the loss function L(y, x) = (1/M) Σ_i L_i, where i indexes each of the M images, and

L_i(y_i, x_i) = −log( e^{f_{y_i}} / Σ_j e^{f_j} ),    (4)

where y_i is the correct class label of image x_i.
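A minimal NumPy sketch of the softmax loss of Eq. 4, vectorized over a mini-batch (names ours):

```python
import numpy as np

def softmax_loss(scores, labels):
    """Average softmax loss (Eq. 4) over a mini-batch.

    scores: array of shape (M, 10), the unnormalized log probabilities f_j.
    labels: array of shape (M,), the integer class labels y_i.
    """
    shifted = scores - scores.max(axis=1, keepdims=True)   # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

scores = np.random.randn(100, 10)             # a batch of M = 100 images, 10 classes
labels = np.random.randint(0, 10, size=100)
print(softmax_loss(scores, labels))
```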

For each discretization of the input images, we use batch gradient descent with the ‘Adam’ update rule [15]. This is currently an industry-recommended default learning rate method [16] because of its per-parameter tuning of the learning rate and its successful combination of several previous update rules like RMSProp and Momentum. Batch sizes of 100 images were used, and we decreased the learning rate by 90% every 2 epochs. Good results were obtained after training for 10 epochs (5000 iterations).

2.4.2 Transfer learning with CaffeNet and GoogLeNet

For the richer Tiny-Imagenet dataset, a deeper network architecture is necessary. While training deep networks usually takes on the order of weeks, a commonly used strategy is transfer learning. We essentially take a pre-trained network and replace the data and the final fully connected layer with our data set and a new fully connected layer. The pre-trained layers are used as feature extractors, and within a couple of epochs of training we are able to achieve high top-5 accuracies on the 10k validation images of Tiny-Imagenet.

Figure 4: Loss and accuracy evolution of CaffeNet in the training phase. (a) Evolution of the loss (vs. iteration) for input discretizations δ = 256, 32 (offset by +1), and 4 (offset by +2). (b) Top-5 accuracy (vs. iteration) for δ = 256, 32, and 4.

We first experimented with CaffeNet, a modified version of AlexNet in which pooling is done before normalization. There are five convolutional layers and two fully connected layers, with a softmax classifier once more. We implement transfer learning for 5 epochs (5000 iterations at a batch size of 100), using the Adam update rule once again.

Next, we applied transfer learning to GoogLeNet, which is a much deeper network. We trained only the three fully connected layers corresponding to the three different outputs of the network, for just over 1 epoch (5000 iterations with a batch size of 24).
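To make the transfer learning recipe concrete, the sketch below trains only a new final fully connected (softmax) layer on features produced by a frozen, pre-trained network. The `frozen_features` function is a random-projection placeholder for a CaffeNet/GoogLeNet forward pass, and all names, sizes, and hyperparameters are illustrative assumptions rather than the settings used here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder "frozen" feature extractor: a fixed random projection standing in
# for the pre-trained convolutional layers of CaffeNet/GoogLeNet.
PROJ = rng.standard_normal((64 * 64 * 3, 256)) / 110.0

def frozen_features(images):
    return images.reshape(len(images), -1) @ PROJ

def train_new_fc(images, labels, num_classes=200, lr=1e-3, steps=200):
    """Fit only the new fully connected (softmax) layer on frozen features."""
    feats = frozen_features(images)
    W = np.zeros((feats.shape[1], num_classes))
    for _ in range(steps):
        scores = feats @ W
        scores -= scores.max(axis=1, keepdims=True)
        probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
        probs[np.arange(len(labels)), labels] -= 1.0       # d(loss)/d(scores) for the softmax loss
        W -= lr * feats.T @ probs / len(labels)            # gradient step on the new layer only
    return W

# Toy usage on random data in place of Tiny-Imagenet images.
images = rng.integers(0, 256, size=(50, 64, 64, 3)).astype(np.float64)
labels = rng.integers(0, 200, size=50)
W = train_new_fc(images, labels)
```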

Examples of the loss function and accuracy vs. iterations for both CaffeNet and GoogLeNet are shown in Figs. 4 and 5.

Figure 5: Loss and accuracy evolution of GoogLeNet in the training phase. (a) Evolution of the loss (vs. iteration) for input discretizations δ = 256, 32 (offset by +1), and 4 (offset by +2). (b) Top-5 accuracy (vs. iteration) for δ = 256, 32, and 4.

We note that due to the significant expense of training these networks on AWS clusters, there was little room for experimentation with many different hyperparameters. The strategy for picking the overall learning rate (typically near 0.001) was to tune its magnitude until a decreasing loss function was obtained. Limited experimentation with randomly chosen learning rates produced roughly the same final top-5 accuracies, and the best results are reported here. For the same reasons, we only tested three different discretizations, corresponding to 2², 2⁵, and 2⁸ allowed pixel intensities. No cross-validation was employed.

3. Results and Discussion

3.1. Metrics used

Because we are focused on the adversarial robustness of classifiers, our primary metric is the accuracy of the various networks used, as a function of both the discretization δ and the strength of the adversarial perturbation. For the CIFAR10 data set we present the top-1 (regular) accuracy on the test set of 10 000 images, while for Tiny-Imagenet we use both top-5 and top-1 accuracies on the validation set of 10 000 images. For qualitative results (where we show adversarial distortions), we also include the magnitude of the distortion |r| as defined in Eq. 3.

Figure 6: Evolution of the accuracy of the ConvNet as the discretization δ is changed. A remarkable outcome is the fact that with only 16 allowed intensities, the classification accuracy has only dropped by 1%. This corresponds to a minimum allowed perturbation of 8 units of intensity.

3.2. Fully trained ConvNets on CIFAR10

The results on CIFAR10 are encouraging. As shown in Fig. 6, the accuracy of the classifier drops by only 6% on going from all 256 allowed intensities to just 4. Furthermore, there is a broad regime from 16 to 256 allowed intensities where the accuracy varies by less than 1%. This clearly validates the hypothesis that the input dimension of images as typically used is unnecessarily large, and can be reduced significantly.

In Fig. 7 we show the classification accuracy on the test set as the strength of the distortion in the fast gradient method is increased. As explained in the caption of this figure, the discretized images are ‘trivially’ robust to small perturbations; this is an explicit feature of our discretization of the inputs. One obvious feature of these plots is the immediate collapse of the accuracy down to 10% (consistent with random draws) upon perturbing beyond the minimum allowed distortion. While unfortunate, this can be expected on general grounds, following our earlier discussion of the difference between discriminative and generative models. This is still a discriminative method, and so there are still vast regions of the input space which are being classified (despite their lack of natural data). Thus, as soon as images are perturbed into these regions (with the minimum allowed perturbation), the classifier fails completely. A more subtle feature of Fig. 7 is the seeming recovery of the accuracy (jumps in accuracy) just beyond the minimum allowed perturbation. This makes little physical sense, and will be the subject of future investigation.

Figure 7: The adversarial robustness upon discretization: evolution of the test accuracy of the classifiers as the parameter ε (the magnitude of the distortion strength in Eq. 2) is increased. Panels: (a) allowed pixel intensities δ = 4, 8, 16; (b) δ = 32, 64, 256 (original). Note that the discretized images are trivially robust: for example, for δ = 4, it is only when ε = 0.125 (a perturbation of 0.125 × 256 = 32 intensity units) that a change is allowed; below this, any small pixel perturbation is clamped back to zero. Note that above ε = 0.125, the classifiers all fail completely, with the accuracy of 10% being equivalent to random luck.
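To make this ‘trivial’ robustness concrete, the following sketch (reusing the discretization function from Sec. 2.1; names ours) shows that a perturbation smaller than half a bin width is erased when the perturbed image is re-discretized:

```python
import numpy as np

def discretize(image, delta):
    bin_width = 256 // delta
    return (image // bin_width) * bin_width + bin_width // 2

x_disc = discretize(np.random.randint(0, 256, size=(32, 32, 3)), 4)   # allowed values: 32, 96, 160, 224
small = np.random.choice([-16, 16], size=x_disc.shape)                # |perturbation| < 64/2 = 32
# Re-discretizing snaps every perturbed pixel back to its original allowed value.
print(np.array_equal(discretize(x_disc + small, 4), x_disc))          # True
```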

Perhaps the most intriguing feature of the CIFAR10 data is shown in Fig. 8. Here, we show the accuracies of ConvNets trained on images with one discretization and tested on images with another discretization. There is a broad region from 16 allowed intensities upwards where the accuracies are roughly the same. This is further confirmation of the unnecessarily large dimension of images: for classification tasks, only a few colors are necessary, and discretization acts as a form of regularization.



Figure 8: The accuracies (%) for classifiers trained on one level of discretization and tested on another. Classifiers trained on inputs with 16 or more allowed intensities all appear to produce similar accuracies (indicating that the weights and biases in these models are similar!).

Training δ \ Test δ      4      8     16     32     64    128    256
256                  65.32  74.23  79.42  80.42  80.69  80.77  80.74
128                  60.11  71.20  78.38  80.26  80.40  80.37  80.32
64                   63.20  74.00  79.15  80.76  80.78  80.63  80.64
32                   62.94  74.52  79.49  80.34  80.28  80.37  80.39
16                   66.95  77.03  79.77  80.01  79.82  79.78  79.82
8                    70.79  78.45  78.47  77.77  77.37  77.39  77.34
4                    75.17  74.78  73.48  72.90  72.84  72.77  72.79

3.3. Transfer learning on Tiny-ImageNet 200

For the Tiny-ImageNet data set, the results are remarkably similar, despite the 20-fold increase in the number of categories being classified. In Fig. 9 we show the top-5 and top-1 accuracies for both CaffeNet and GoogLeNet trained via transfer learning. First note that GoogLeNet outperforms CaffeNet in both top-5 and top-1 accuracies, as can be expected: it is a deeper network, and when trained on Imagenet it achieved a top-5 accuracy of 89% (as opposed to 80% for CaffeNet).

However, note that there is a remarkable increase in the accuracies for CaffeNet upon going from 256 to 32 allowed pixel intensities, while the eventual decrease in top-5 accuracy at 4 allowed pixel intensities is moderate. While a more comprehensive exploration of training hyperparameters may change this result, initial attempts to do so were unsuccessful. This possibly supports the notion of discretization as a form of regularization. On the other hand, GoogLeNet's top-5 and top-1 accuracies were barely different between 256 and 32 allowed intensities, with a moderate (but not catastrophic) drop in accuracy upon going to 4 allowed intensities.

Finally, a useful qualitative evaluation of the effectiveness of discretization against adversarial examples is shown in Fig. 10, where we show the raw images, the minimum perturbation as defined by Eq. 2, and the resulting distorted image. It is clear that the minimum distortion is largest for the discretized dataset, and the resulting distorted image is least recognizable (to humans) for this large discretization. This is a concrete manifestation of the adversarial robustness we have achieved by discretization.

Figure 9: Accuracies for CaffeNet and GoogLeNet upon changing the discretization of input images (from 2² to 2⁸ allowed pixel intensities). (a) Top-5 accuracies. (b) Top-1 accuracies.

Figure 10: The minimum allowed distortion for different discretizations of a randomly chosen Tiny-Imagenet validation image, with the probabilities as assigned by GoogLeNet. The values of δ are (a) 256, (b) 32, and (c) 4; the corresponding distortion magnitudes |r| are 1, 4, and 32. The noise images have been right-shifted by 128.

4. Conclusions and Future Work

We have demonstrated that the accuracies of deep neural networks are not severely affected by discretization of the input data. For a reduction of input pixel intensities from 256 to 16, there is a minimal drop in accuracy (∼1% for CIFAR10), and for 4 allowed intensities the accuracy drops by less than 10%. This type of discretization then renders images trivially robust to adversarial perturbations: because only discrete values of intensity are permitted, there is a minimum allowed distortion to each image before it can change. There is then a natural tradeoff between increasing the level of discretization (i.e. reducing the number of allowed pixel intensities) and the decreasing accuracy of the ConvNet. The broad regime of similar performance that is evident in the CIFAR10 results (and supported by the Tiny-Imagenet experiments) suggests that a ‘sweet spot’ of discretization around 16-32 allowed pixel intensities will result in minimal loss of accuracy and maximal robustness to adversarial perturbations.

More broadly, it is clear that this discretization procedure has acted as a form of regularization. By reducing the dimensionality of the input data, we have even improved the top-5 accuracy of CaffeNet and GoogLeNet on Tiny-ImageNet. This reduction in input dimensionality may also be viewed as an inflexible prior on the input data distribution. We note that unlike other forms of input dimensional reduction (e.g. PCA), the discrete form of the data is crucial, as it results in stability to perturbations.

Future areas of investigation include completing this analysis on Imagenet data, and investigating more values of the discretization to search for an optimal value which maximizes robustness without sacrificing accuracy. This form of discrete compression and improved robustness of images may also be valuable for industries where security and high throughput of data are important (e.g. in self-driving cars). More generally, techniques for learning the true underlying distributions of image data in high dimensional spaces (perhaps with the inclusion of dimensional reduction techniques such as the one presented here) appear to be an important avenue for research into deep neural networks.

References

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.

[2] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[3] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on, pages 427-436. IEEE, 2015.

[4] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[5] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672-2680, 2014.

[6] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[7] Matthieu Courbariaux and Yoshua Bengio. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830, 2016.

[8] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images, 2009.

[9] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia, pages 675-678. ACM, 2014.

[10] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.

[11] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9, 2015.

[12] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, 2009.

[13] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3):211-252, 2015.

[14] Eric Jones, Travis Oliphant, Pearu Peterson, et al. SciPy: Open source scientific tools for Python, 2001-.

[15] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[16] Andrej Karpathy, Justin Johnson, and L. Fei-Fei. CS231n lecture notes, 2015.
