


・We modified [1] so that a single network can learn multiple styles at the same time
− by adding a fusion layer and a style input stream (inspired by [3])

・Training
− We feed sample images to the content stream and style images to the style stream (the training method is the same as in [1])

・We shrunk the ConvDeconvNetwork compared to [1]
− added one down-sampling layer and one up-sampling layer
− replaced the 9x9 kernels with smaller 5x5 kernels in the first and last convolutional layers
− reduced the five Residual Elements to three
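The fusion of the two streams can be sketched in NumPy: the style stream's global feature vector is duplicated across every spatial position of the content feature map, concatenated along channels, and mixed by a 1x1 convolution. This is an illustrative sketch in the spirit of the fusion layer of [3]; the function and weight names (`fuse_style`, `w`, `b`) are ours, and the exact mixing operation of the poster's network is not specified here.

```python
import numpy as np

def fuse_style(content_feat: np.ndarray, style_vec: np.ndarray,
               w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Fuse a global style vector into a content feature map.

    content_feat: (N, N, C) feature map from the content stream.
    style_vec:    (C,) global-average-pooled output of the style stream.
    w, b:         hypothetical 1x1-conv weights mapping 2C -> C channels.
    """
    n = content_feat.shape[0]
    # Duplicate the 1x1xC style vector across all NxN spatial positions.
    style_map = np.broadcast_to(style_vec, (n, n, style_vec.shape[0]))
    # Concatenate along channels, then mix with a 1x1 convolution (matmul).
    stacked = np.concatenate([content_feat, style_map], axis=-1)  # (N, N, 2C)
    return stacked @ w + b                                        # (N, N, C)

# Toy shapes matching the poster's 128-channel streams.
feat = np.random.rand(16, 16, 128)
style = np.random.rand(128)
w = np.random.rand(256, 128) * 0.01
b = np.zeros(128)
out = fuse_style(feat, style, w, b)
print(out.shape)  # (16, 16, 128)
```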

2. Proposed System

・Multi-Style Transfer App (iOS only)

・Recommended only for iPhone 7/6s/SE and iPad Pro

DeepStyleCam: A Real-time Style Transfer App on iOS

Ryosuke Tanno, Shin Matsuo, Wataru Shimoda, Keiji Yanai
The University of the Electro-Communications, Tokyo, Japan

1. Introduction

5. Multiple Style Transfer App

[1] J. Johnson et al.: Perceptual Losses for Real-Time Style Transfer and Super-Resolution, ECCV, 2016.
[2] L. A. Gatys et al.: Image Style Transfer Using Convolutional Neural Networks, CVPR, 2016.
[3] S. Iizuka et al.: Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification, SIGGRAPH, 2016.
[4] K. Yanai et al.: Efficient Mobile Implementation of a CNN-based Object Recognition System, ACM MM, 2016.
[5] L. A. Gatys et al.: Preserving Color in Neural Artistic Style Transfer, arXiv:1606.05897, 2016.

4. Demo Video

Ex. Image size: 250x250
Computation: 1,303,800,800 operations (≈1.3 billion)
Parameter count: 1,250,835
Processing time: 175 ms (iPhone 7 Plus), 180 ms (iPad Pro), 200 ms (iPhone SE)
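The per-layer figures behind totals like these follow the standard convolution cost formulas. A hedged sketch (the helper names are ours; layer shapes are taken from the architecture diagram, and we assume one bias per output channel and one multiply-accumulate per kernel tap and output position):

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    # weights k*k*c_in*c_out plus one bias per output channel
    return k * k * c_in * c_out + c_out

def conv_macs(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    # one multiply-accumulate per kernel tap, per output position
    return k * k * c_in * c_out * h * w

# e.g. the first content-stream layer, conv_1 (5x5x16, stride 1),
# applied to a 250x250 RGB input:
print(conv_params(5, 3, 16))          # 1216
print(conv_macs(5, 3, 16, 250, 250))  # 75000000
```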


[Figure: ConvDeconvNet with Style Input — content image and style image as inputs]

Content stream:
− conv_1: 5x5x16, stride 1
− conv_2: 4x4x32, stride 2
− conv_3: 4x4x64, stride 2
− conv_4: 4x4x128, stride 2
− three Residual Elements, each a pair of 3x3x128 convolutions with a short-cut connection
− conv_7: 4x4x64, stride 1/2
− conv_8: 4x4x32, stride 1/2
− conv_9: 4x4x16, stride 1/2
− conv_10: 5x5x3, stride 1
(stride 1/2 is implemented as unpooling)

Style stream:
− conv2_1: 4x4x32, stride 2
− conv2_2: 4x4x64, stride 2
− conv2_3: 4x4x64, stride 2
− conv2_4: 4x4x128, stride 2
− Global Average Pooling → 1x1x128
− the 1x1x128 style vector is duplicated to NxNx128 and merged with the content stream in the Fusion layer

Output modes: Normal mode / Color-Preserving mode [5]
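The "stride 1/2" up-sampling layers double the spatial resolution before each convolution. A minimal nearest-neighbour unpooling can be sketched as follows (whether the app uses exactly this operator or a transposed convolution is not stated on the poster; `unpool` is our name):

```python
import numpy as np

def unpool(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbour unpooling: repeat each pixel factor x factor times.

    x: (H, W, C) feature map -> (H*factor, W*factor, C).
    """
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

x = np.arange(8.0).reshape(2, 2, 2)
y = unpool(x)
print(y.shape)  # (4, 4, 2)
```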

・In 2015, Gatys et al. [2] proposed an algorithm for neural artistic style transfer
− it synthesizes an image which has the style of a given style image and the contents of a given content image using a CNN

・However, since the method of Gatys et al. [2] requires many forward and backward computations, the processing time tends to be long (several seconds), even on a GPU

・Several methods that realize style transfer with only feed-forward CNN computation have since been proposed
− Johnson et al. [1] proposed perceptual loss functions to train a ConvDeconvNetwork as a feed-forward style transfer network

・However, a ConvDeconvNetwork trained by their method can handle only one fixed style
− to transfer ten kinds of styles, we would have to train ten different ConvDeconvNetworks independently
− this is not good for mobile implementation in terms of the required memory size

・We therefore modified Johnson et al.'s method so that one ConvDeconvNetwork can learn multiple styles at the same time
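The perceptual losses of [1] compare CNN feature statistics rather than pixels; the style term matches Gram matrices of feature maps, following [2]. A minimal sketch of that style term (feature extraction itself is omitted; `gram` and `style_loss` are our names, and the normalization constant varies between implementations):

```python
import numpy as np

def gram(feat: np.ndarray) -> np.ndarray:
    """Gram matrix of an (H, W, C) feature map, normalized by H*W*C."""
    h, w, c = feat.shape
    f = feat.reshape(h * w, c)
    return f.T @ f / (h * w * c)

def style_loss(feat_out: np.ndarray, feat_style: np.ndarray) -> float:
    """Squared Frobenius distance between the two Gram matrices."""
    g1, g2 = gram(feat_out), gram(feat_style)
    return float(np.sum((g1 - g2) ** 2))

a = np.random.rand(8, 8, 4)
print(style_loss(a, a))  # 0.0
```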

3. Example

[Figure: style/content pairs and output images for Single Style, Mixed Style, and Partial Style transfer]
− Mixed Style: the style image used in the training phase is a weighted sum of multiple style images
− Partial Style: we create the style image by horizontally concatenating the 4 images on the left side

Download: https://goo.gl/vmkg2j
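The mixed and partial styles above are built from ordinary image arrays: a partial-style input horizontally concatenates four style images, while a mixed style is a weighted sum of them. An illustrative sketch (the image sizes and the weights used by the actual app are not given on the poster; the values below are placeholders):

```python
import numpy as np

styles = [np.random.rand(64, 64, 3) for _ in range(4)]

# Partial style: concatenate the four style images side by side.
partial = np.hstack(styles)  # (64, 256, 3)

# Mixed style: per-pixel weighted sum, with weights summing to 1.
wts = np.array([0.4, 0.3, 0.2, 0.1])
mixed = sum(w * s for w, s in zip(wts, styles))  # (64, 64, 3)

print(partial.shape, mixed.shape)  # (64, 256, 3) (64, 64, 3)
```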