efficient representation learning › ~shapiro › ee596 › notes › cnn_19.pdfguidelines for...

40
Sachin Mehta Joint work with Mohammad Rastegari, Linda Shapiro, and Hannaneh Hajishirzi Efficient Machine Learning for Visual and Textual Data https://github.com/sacmehta/ https://sacmehta.github.io/ https://h2lab.cs.washington.edu/

Upload: others

Post on 07-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Sachin Mehta

Joint work with Mohammad Rastegari, Linda Shapiro, and Hannaneh Hajishirzi

Efficient Machine Learning for Visual and Textual Data

https://github.com/sacmehta/

https://sacmehta.github.io/

https://h2lab.cs.washington.edu/

Page 2: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

IntroductionAccuracy improves

Speed reduces

Energy consumption increases

8 Layers 22 Layers 151 Layers

Network Depth

Page 3: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Introduction

80

85

90

95

100

AlexNet GoogLeNet Human ResNet

Top

-5 A

ccu

racy

ImageNet Classification

20142012 2016

Year

8 Layers 22 Layers 151 Layers

Depth

Accuracy improves

Speed reduces

Energy consumption increases

These models cannot be used on Resource-Constrained Devices

Resource-constrained devices

Embedded Devices Mobile phones

Limited energy

overhead

Restrictive memory

constraints

Limited compute

Page 4: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

My Research

Light-weight, Low latency, and SOTA Neural Networks

Page 5: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

My Research

Light-weight, Low latency, and SOTA Neural Networks

Different Tasks

Object Detection Semantic Segmentation Language Modeling

ESPNets: ECCV’18, CVPR’19PRU: EMNLP’18

Page 6: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

My Research

Light-weight, Low latency, and SOTA Neural Networks

Different Tasks

• Object Detection• Semantic Segmentation• Language Modeling• …..

Medical Imaging• JAMA Network Open’19• MICCAI’18 • WACV’18

Different Modalities

Natural Images Whole Slide Images 3D MRIsSketches (Iconary@AI2)

Page 7: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

My Research

Light-weight, Low latency, and SOTA Neural Networks

Different Tasks

• Object Detection• Semantic Segmentation• Language Modeling• …..

Different Modalities• Natural Images• Medical Images• Text• …..

Different Devices

NVIDIA TX2 Mobile Phone

Page 8: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

My Research

Light-weight, Low latency, and SOTA Neural Networks

Different Tasks

• Object Detection• Semantic Segmentation• Language Modeling• …..

Different Modalities• Natural Images• Medical Images• Text• …..

Different Devices• Desktop• Embedded Devices• Mobile Devices• …..

Faster Training Ours: 1 - 1.5 days

SOTA: 4 - 5 days

Page 9: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Outline

• Brief overview about convolutions

• Image Classification• VGG Unit

• ResNet Unit

• ESPNet Unit

• Semantic Segmentation• Vanilla Encoder-Decoder

• U-Net Encoder-Decoder

• Results

Page 10: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolutions • A widely used operation in computer vision

• Traditionally, kernels are manually designed• Sobel filter for edge detection• Gaussian filter

• Nowadays, we learn kernels from the data• Convolutional neural networks (CNNs)

• A 𝑛 × 𝑛 kernel• Has 𝑛2 parameters• Performs 𝑛2𝑊𝐻 multiplication-addition

operations

Page 11: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolutions• Higher receptive field

• Inserts zeros between kernel elements

• A 𝑛 × 𝑛 kernel with a dilation rate of 𝑟• Has a receptive field of

𝑛 − 1 𝑟 + 1 2

• Learns 𝑛2 parameters

• Performs 𝑛2𝑊𝐻 multiplication-addition operations

Page 12: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolution in 3D

12

• Both Input and Kernel are 3D

• A 𝑛 × 𝑛 × 𝐷 kernel • Learns 𝑛2𝐷 parameters

• Performs 𝑛2𝐷𝑊𝐻 operations to produce an output plane

Page 13: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolution in 3D

13

• Both Input and Kernel are 3D

• A 𝑛 × 𝑛 × 𝐷 kernel • Learns 𝑛2𝐷 parameters

• Performs n2D𝑊𝐻 operations to produce an output plane

• Multiple independent kernels are applied to produce high-dimensional output

• 𝐾 𝑛 × 𝑛 × 𝐷 kernels• Learns 𝑛2𝐷𝐾 parameters

• Performs 𝑛2𝑊𝐻𝐷𝐾 operations

Page 14: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Depth-wise Convolution

• Improves the efficiency of standard convolutions

• Each convolutional filter is applied per spatial plane

• 𝐷 𝑛 × 𝑛 kernels• Learns 𝑛2𝐷 parameters

• Performs 𝑛2𝑊𝐻𝐷 operations

Page 15: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Image Classification

Page 16: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Image ClassificationCU

DU GAP

FCConvolutional

Unit

Down-sampling Unit

Fully-connectedOr Linear Layer

Global Avg.Pooling

Page 17: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Image Classification

28 x 28 = [28]2

CU DU CU

[28]2 [14]2 [14]2

DU CU

[7]2 [7]2

GAP FC

[1]2

Cat

CU

DU GAP

FCConvolutional

Unit

Down-sampling Unit

Fully-connectedOr Linear Layer

Global Avg.Pooling

Page 18: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolutional Unit (CU) - VGG

3x3 Conv Layer

3x3 Conv Layer

VGG: Simonyan, Karen, and Andrew Zisserman. "Verydeep convolutional networks for large-scale imagerecognition." ICLR, 2015.

Page 19: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolutional Unit (CU) - VGG

3x3 Conv Layer

3x3 Conv Layer

Image Source (Inception): Szegedy, Christian, et al. "Rethinkingthe inception architecture for computer vision." CVPR. 2016.

Page 20: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolutional Unit (CU) - ResNet

3x3 Conv Layer

3x3 Conv Layer

ResNet: He, Kaiming, et al. "Deep residual learning forimage recognition." CVPR. 2016.

Page 21: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolutional Unit (CU) - ResNet

3x3 Conv Layer

3x3 Conv Layer

• Element-wise addition of input and output

• Often referred as Residual Connection

• Improves gradient flow and accuracy

• Computationally expensive• Hard to train very deep networks (101-

151 layers)

ResNet: He, Kaiming, et al. "Deep residual learning forimage recognition." CVPR. 2016.

Page 22: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolutional Unit (CU) - ResNet

3x3 Conv Layer

1x1 Conv Layer

• Bottleneck unit

• Exercise?• Validate this unit is efficient

1x1 Conv Layer

ResNet: He, Kaiming, et al. "Deep residual learning forimage recognition." CVPR. 2016.

Page 23: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolutional Unit (CU) - ResNet

3x3 Depth-Conv Layer

1x1 Conv Layer

• Bottleneck unit with Depth-wise convs• MobileNetv2• ShuffleNetv2

1x1 Conv Layer

• MobileNetv2: Sandler, Mark, et al. "Mobilenetv2: Invertedresiduals and linear bottlenecks." CVPR, 2018.

• ShuffleNetv2: Ma, Ningning, et al. "Shufflenet v2: Practicalguidelines for efficient cnn architecture design." ECCV, 2018.

Page 24: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Convolutional Unit - ESPNet

1x1 Conv Layer

3x3 DilatedConv Layer

3x3 DilatedConv Layer

3x3 DilatedConv Layer

Concat

ESPNet: Mehta, Sachin, et al. “ESPNet: Efficient spatial pyramidof dilated convolutions for semantic segmentation." ECCV, 2018.

Page 25: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

ESPNet Unit

Page 26: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Hierarchical Feature Fusion in ESPNet

1x1 Conv Layer

3x3 DilatedConv Layer

3x3 DilatedConv Layer

3x3 DilatedConv Layer

Concat

Input

Output of ESP module

Page 27: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Gridding Artifact in Dilated Convolutions

Standard convolution

Dilated convolution27

Page 28: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Hierarchical Feature Fusion in ESPNet

1x1 Conv Layer

3x3 DilatedConv Layer

3x3 DilatedConv Layer

3x3 DilatedConv Layer

Concat

Input

Output of ESP module

AddAdd

Page 29: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

• ESPNetv2• 3x3 dilated convolutions are replaced

by 3x3 depth-wise dilated convolutions

Convolutional Unit – ESPNetv2

• ESPNet: Mehta, Sachin, et al. “ESPNet: Efficient spatial pyramid of dilatedconvolutions for semantic segmentation." ECCV, 2018.

• ESPNetv2: Mehta, Sachin, et al. "Espnetv2: A light-weight, power efficient,and general purpose convolutional neural network." CVPR, 2019.

Page 30: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Semantic Segmentation

Page 31: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Image Classification

CU DU CU DU CU GAP FC Cat

CU DU GAPFCConvolutional

UnitDown-sampling

UnitFully-connectedOr Linear Layer

Global Avg.Pooling

Page 32: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Encoder-Decoder

CU DU CU DU CU

CU DU GAPFCConvolutional

UnitDown-sampling

UnitFully-connectedOr Linear Layer

Global Avg.Pooling

Page 33: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Encoder-Decoder

CU DU CU DU CU

CU DU GAPFCConvolutional

UnitDown-sampling

UnitFully-connectedOr Linear Layer

Global Avg.Pooling

UUCUUUCU

UUUp-sampling

Unit

Page 34: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Encoder-Decoder with Skip-Connections (U-Net)

CU DU GAPFCConvolutional

UnitDown-sampling

UnitFully-connectedOr Linear Layer

Global Avg.Pooling

UUUp-sampling

Unit

CU DU CU DU CU

UUCUUUCU

Page 35: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Papers for Semantic Segmentation

• FCN: Long et al. "Fully convolutional networks for semantic segmentation." CVPR, 2015.

• SegNet: Badrinarayanan et al. "Segnet: A deep convolutional encoder-decoder architecture for image segmentation”, PAMI, 2017

• DeepLab: • Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep

convolutional nets, atrous convolution, and fully connected crfs." PAMI, 2017.• Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for

semantic image segmentation." ECCV, 2018

• UNet: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." MICCAI, 2015.

Page 36: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Results

Page 37: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Semantic Segmentation

Page 38: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Semantic SegmentationWSIs 3D MRIs

Page 39: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Object Detection

Page 40: Efficient Representation Learning › ~shapiro › EE596 › notes › cnn_19.pdfguidelines for efficient cnn architecture design." ECCV, 2018. Convolutional Unit - ESPNet 1x1 Conv

Thanks!!

https://github.com/sacmehta/

https://sacmehta.github.io/