Convolutional Neural Networks
Master's Computer Vision by Neuromation
Sergey Nikolenko, Alex Davydow
Harbour Space University, Barcelona, Spain
May 27, 2019
Random facts:
• on May 27, 1703, Peter the Great founded Saint Petersburg, soon to be capital of the Russian empire and still a wonderful city
• on May 27, 1931, Auguste Piccard and Paul Kipfer took off on a balloon from Augsburg and became the first human beings to enter the stratosphere, reaching a record altitude of 15,781 m
• on May 27, 1933, Walt Disney released the cartoon Three Little Pigs, with its hit song Who's Afraid of the Big Bad Wolf?
• on May 27, 1960, a military coup removed the Turkish President Celâl Bayar and the rest of the democratic government from office
• on May 27, 1977, Virgin released a Sex Pistols single God Save the Queen; the song was immediately banned on British radio but still reached #1 on the charts
Modern CNN architectures
ResNet
• Residual learning: let's train the differences (residues) between one layer and the next.
• Then the gradients will be able to flow with no obstacle.
• A function implemented by a residual unit looks like
y(𝑘) = 𝐹(x(𝑘)) + x(𝑘),
where x(𝑘) is the input vector of layer 𝑘, 𝐹(x) is the function computed by the layer, and y(𝑘) is the output of the residual layer that will then become x(𝑘+1) for the next layer.
• Now the gradient can pass through and does not vanish when 𝐹 becomes saturated:

𝜕y(𝑘)/𝜕x(𝑘) = 1 + 𝜕𝐹(x(𝑘))/𝜕x(𝑘).
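A residual unit of this form can be sketched in PyTorch; this is a minimal illustration assuming a basic two-convolution 𝐹 with batch normalization, not the exact block from the paper:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A minimal residual unit: y = relu(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions with batch norm, as in the basic ResNet block
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # the identity shortcut carries the gradient past F unchanged
        return self.relu(self.f(x) + x)

x = torch.randn(2, 64, 32, 32)
y = ResidualBlock(64)(x)
print(y.shape)  # same channels and spatial size as the input
```

Because 𝐹 preserves the shape of x, the addition needs no projection; deeper ResNets add a 1 × 1 convolution on the shortcut when the shapes differ.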
• This has allowed for very deep networks.
• Another similar approach – highway networks by Jürgen Schmidhuber.
• We again represent y(𝑘), the output of layer 𝑘, as a linear combination of x(𝑘) and 𝐹(x(𝑘)), but in a different way:

y(𝑘) = 𝐶(x(𝑘))x(𝑘) + 𝑇(x(𝑘))𝐹(x(𝑘)),

where 𝐶 is the carry gate and 𝑇 is the transform gate; usually it's a convex combination, 𝐶 = 1 − 𝑇.
• Practice shows that the residual connections should be as "straight" as possible.
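A highway layer with 𝐶 = 1 − 𝑇 can be sketched as follows; this is a minimal fully-connected illustration, and the negative gate bias is an assumption in the spirit of the original paper, not something taken from these slides:

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """Sketch of a highway layer: y = T(x) * F(x) + (1 - T(x)) * x."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Linear(dim, dim)  # the transform F(x)
        self.t = nn.Linear(dim, dim)  # the transform gate T(x)
        # bias the gate towards carrying the input through early in training
        nn.init.constant_(self.t.bias, -2.0)

    def forward(self, x):
        t = torch.sigmoid(self.t(x))  # gate values in (0, 1)
        # carry gate C = 1 - T, so this is a convex combination of F(x) and x
        return t * torch.relu(self.f(x)) + (1.0 - t) * x

h = HighwayLayer(16)
print(h(torch.randn(4, 16)).shape)  # shape is preserved, like the input
```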
ResNet: variations
Revolution of Depth (Kaiming He)
ResNeXt
• ResNeXt (Xie et al., 2016): let's replace ResNet units with "split-transform-merge" units, similar to Inception.
• The input is divided into blocks w.r.t. channels, and every block gets its own convolutions.
• The idea is similar to group convolutions, used already in AlexNet for parallelization:
• They do yield a kind of specialization in the results:
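The parameter savings from grouping can be checked directly with the `groups` argument of PyTorch's `Conv2d`; the channel count of 256 and cardinality of 32 are illustrative choices matching a typical ResNeXt configuration:

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# An ordinary 3x3 convolution: every output channel sees all 256 input channels.
dense = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)

# The same convolution split into 32 groups: each group of 8 output channels
# only sees its own 8 input channels, as in ResNeXt with cardinality 32.
grouped = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=32, bias=False)

print(n_params(dense))    # 256*256*3*3 = 589824
print(n_params(grouped))  # 32 * (8*8*3*3) = 18432, i.e. 32x fewer
```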
Inception v4 and Inception ResNet
• Another classic paper (Szegedy et al., 2016) introduced Inception v4 and Inception ResNet.
• Inception v4 – let's standardize everything and simplify the units. First, the "stem":
• Second, we now have three basic blocks A, B, and C:
• And special reduction blocks to reduce the dimensions:
• Inception ResNet adds residual connections to these blocks:
• There is no pooling now but there are still reduction blocks:
• As a result, the architecture has become even simpler; Inception v4 (top), Inception ResNet (bottom):
• Inception ResNet v2:
• And it works quite well:
SqueezeNet
• SqueezeNet (Iandola et al., 2017) – how to reduce the number of parameters:
  • replace 3 × 3 filters with 1 × 1;
  • reduce the number of inputs for 3 × 3 convolutions;
  • delay downsampling as late as possible to increase the size of activation maps.
• Fire module:
  • squeeze convolutional layer (1 × 1 only);
  • expand layer (1 × 1 and 3 × 3).
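A Fire module can be sketched in PyTorch as follows; the channel sizes are illustrative, borrowed from a typical early SqueezeNet stage, and this is a sketch rather than the reference implementation:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Sketch of a SqueezeNet Fire module: squeeze to s1x1 channels,
    then expand with parallel 1x1 and 3x3 convolutions."""
    def __init__(self, in_ch, s1x1, e1x1, e3x3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, s1x1, kernel_size=1)  # 1x1 only
        self.expand1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)
        self.expand3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # concatenate the two expand branches along the channel dimension
        return torch.cat([self.relu(self.expand1(x)),
                          self.relu(self.expand3(x))], dim=1)

y = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
print(y.shape)  # torch.Size([1, 128, 55, 55])
```

The squeeze layer keeps the number of inputs to the 3 × 3 convolutions small, which is exactly the second strategy above.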
• General SqueezeNet architecture:
• We get 50x fewer parameters than AlexNet. But:
MobileNet
• MobileNet (Howard et al., 2017): networks for mobile devices.
• Depthwise separable convolutions: let's decompose a convolution into a depthwise convolution (one filter for each channel) and a 1 × 1 convolution.
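This decomposition and its parameter savings can be illustrated in PyTorch; the channel counts here are arbitrary illustrative choices:

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

in_ch, out_ch = 128, 256

# Standard 3x3 convolution: every output channel mixes all input channels.
standard = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)

# Depthwise separable: one 3x3 filter per input channel (groups=in_ch),
# then a 1x1 "pointwise" convolution to mix the channels.
separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),  # depthwise
    nn.Conv2d(in_ch, out_ch, 1, bias=False),                          # pointwise
)

print(n_params(standard))   # 128*256*3*3 = 294912
print(n_params(separable))  # 128*3*3 + 128*256 = 33920, roughly 8.7x fewer
```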
• Then the structure of a layer will be more complex (but with fewer weights), and the overall architecture is not so deep:
• We see that we can save a lot of parameters at the price of a small decrease in quality:
Adversarial examples
• Interesting feature of neural networks: you can fool any network with a picture that is completely indistinguishable from the original to the naked eye.
• But how? Any ideas?..
• Let's do gradient descent not along the weights 𝜃 but along the input x!
• We only need to control that the new example x̂ remains similar to the original x, e.g., ‖x̂ − x‖∞ ≤ 𝜖 (or some other condition). How?
• Moreover, we can try to make x̂ stable to transformations such as rotation.
• How would we do that?
• Intriguing properties of neural networks (Szegedy et al., 2013). A very intriguing paper indeed...
• For instance, the authors analyzed the activations of neurons.
• I.e., supposedly, if we analyze the last-layer neurons, they will form a nice basis in the latent space where it is easy to find the semantics.
• Right?..
• ...not really:
• I.e., regular CNNs don't have any reasonable disentanglement; the latent space is good, but the basis is as good as random.
• The same paper introduced adversarial attacks; for AlexNet, everything on the right is an ostrich:
• Further in (Goodfellow, Shlens, Szegedy, 2014); all highlighted pictures are airplanes:
• Conclusions (Goodfellow, Shlens, Szegedy, 2014):
  • for a linear classifier it's clear what to do: for x̂ = x + z we want to shift w⊤x̂ = w⊤x + w⊤z, i.e., we take z = sign(w) and apply constraints on the norm of x̂;
  • the same can be done in any network; by taking the gradient we do a linear approximation in a neighborhood:

z = 𝜖 sign(∇x𝐿(𝜃, x, 𝑦));

  • i.e., this is not because our models are very nonlinear, it's because they are too linear;
  • the shift direction is important, not any specific point; i.e., we can even generalize an adversarial shift to different examples;
  • and we can try to regularize against it by adding the adversarial shift to the objective function:

𝐿′(𝜃, x, 𝑦) = 𝛼𝐿(𝜃, x, 𝑦) + (1 − 𝛼)𝐿(𝜃, x + 𝜖 sign(∇x𝐿(𝜃, x, 𝑦)), 𝑦).
• But that's not the end of the story either...
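The gradient-sign attack above can be sketched in PyTorch; this is a minimal illustration on a toy linear classifier, and the function name, model, and sizes are made up for the example:

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps):
    """One-step gradient-sign attack: x_adv = x + eps * sign(grad_x L(theta, x, y))."""
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()  # populates x.grad with the gradient along the input
    return (x + eps * x.grad.sign()).detach()

# Toy usage with a hypothetical linear "image" classifier
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)
y = torch.tensor([3])
x_adv = fgsm(model, x, y, eps=0.1)
print((x_adv - x).abs().max())  # perturbation bounded by eps in the infinity norm
```

By construction the perturbation satisfies ‖x̂ − x‖∞ ≤ 𝜖, which is exactly the similarity condition above.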
• Lots of different attacks:
  • DeepFool attack (Moosavi-Dezfooli et al., 2016): shift the example to the hyperplane that divides the classes, z = (𝑓(x0)/‖w‖₂²) w for a linear classifier and z𝑖 = (𝑓(x𝑖)/‖∇𝑓(x𝑖)‖₂²) ∇𝑓(x𝑖) for an arbitrary function;
  • (Carlini, Wagner, 2016): find minimal changes based on the 𝐿0, 𝐿2, and 𝐿∞ norms, still some of the best attacks;
  • one can also look not for a direction but for specific features;
  • (Papernot et al., 2016): find out which pixels are the most important and shift them;
• and a lot more, hundreds of papers already...
• There are different approaches to defense too:
  • (Bastani et al., 2016): formalized the notion of robustness to adversarial attacks and proposed methods for evaluating it;
  • (Lyu et al., 2015; Roth et al., 2018): other variations on gradient regularization;
  • (Shaham et al., 2015; Madry et al., 2017): let's train on "adversarial" examples, choosing the worst example in a neighborhood;
  • (Brendel, Bethge, 2017): the more nonzero (small) gradients we have, the worse for attacks, so we can use simple numerical instability as a regularizer;
  • DeepCloak defense (Gao et al., 2017): let's remove features that are not needed for classification;
• (Kurakin, Goodfellow, Bengio, 2016): attacks in the real world! Moreover, black-box attacks: we attack one model and test on another.
• There is an app that changes a photo adversarially:
• Even better – you can print out an adversarial example, and it still works!
• It's still unclear how realistic all this is, but quite possibly an important direction for AI security in the future.
Thank you!
Thank you for your attention!