Hiroshi Ishikawa - NTHU
TRANSCRIPT
Hiroshi Ishikawa
Joint work with: Satoshi Iizuka, Edgar Simo-Serra, and Kazuma Sasaki
Department of Computer Science & Engineering, Waseda University
Colorization of Black‐and‐white Pictures
Fully‐automatic colorization
Colorization of Old Films
Related Work
- Scribble-based [Levin+ 2004; Yatziv+ 2004; An+ 2009; Xu+ 2013; Endo+ 2016]: colors are specified with user scribbles; requires manual input.
- Reference-image-based [Chia+ 2011; Gupta+ 2012]: transfers colors from a reference image; requires a very similar reference image.
(Examples: [Levin+ 2004]; [Gupta+ 2012] with input, reference, and output.)
Related Work
- Automatic colorization with hand-crafted features [Cheng+ 2015]: uses multiple existing image features and computes chrominance via a shallow neural network. It depends on the performance of semantic segmentation and only handles simple outdoor scenes.
(Pipeline: input → image features → neural network → chroma → output.)
In our work
- An end-to-end network jointly learns global and local features for automatic image colorization.
- A fusion layer merges the global and local features.
- Classification labels are exploited during learning.

Iizuka, Simo-Serra, Ishikawa: "Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification", SIGGRAPH 2016.
Network model
- Two branches: local features and global features.
- Composed of four networks (Low-Level Features, Global Features, Mid-Level Features, and Colorization), joined by a Fusion Layer.
(Diagram: input luminance → low-level features → mid-level and global features → fusion layer → colorization network → upsampling → output chrominance, with a scaled copy of the input feeding the global branch.)
Low-Level Features Network
- Convolutional layers with 3×3 kernels.
- No pooling: the resolution is lowered with strides > 1.
- Weights are shared between the two branches.
(Diagram: resolutions 112×112, 56×56, 28×28 at 1/2, 1/4, 1/8 scale, with 128, 256, and 512 feature maps.)
Scaling by Strides
- Down-convolution: stride > 1 lowers the resolution.
- Flat-convolution: stride 1 keeps the resolution.
- Up-convolution: fractional stride raises the resolution (not used in colorization; used in the sketch simplification).
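As a rough illustration of the scaling arithmetic above, the standard convolution output-size formulas can be checked directly. The kernel sizes and padding below are illustrative assumptions, not the paper's exact settings:

```python
def conv_out_size(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution (floor division)."""
    return (size + 2 * pad - kernel) // stride + 1

def upconv_out_size(size, kernel=4, stride=2, pad=1):
    """Spatial output size of a transposed ("up") convolution."""
    return (size - 1) * stride - 2 * pad + kernel

# Down-convolution: stride 2 halves the resolution (replacing pooling).
print(conv_out_size(224, kernel=3, stride=2, pad=1))   # 112
# Flat-convolution: stride 1 keeps the resolution.
print(conv_out_size(112, kernel=3, stride=1, pad=1))   # 112
# Up-convolution: stride 2 doubles the resolution back.
print(upconv_out_size(112, kernel=4, stride=2, pad=1)) # 224
```

This is why strides alone can replace pooling for changing resolution inside a fully convolutional network.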
Global Features Network
- Computes a global 256-dimensional vector representation of the image.
(Diagram: 512-channel feature maps at 28×28, 14×14, and 7×7, followed by fully connected layers of 1024, 512, and 256 units, yielding the 256-dim vector.)
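A minimal NumPy sketch of this step: the lowest-resolution feature maps are flattened and passed through fully connected layers to obtain the 256-dim global vector. The shapes follow the slide; the random weights and ReLU activations are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes from the slide: 7x7x512 feature maps -> FC 1024 -> 512 -> 256.
feat = rng.standard_normal((7, 7, 512))
W1 = rng.standard_normal((7 * 7 * 512, 1024)) * 0.01
W2 = rng.standard_normal((1024, 512)) * 0.01
W3 = rng.standard_normal((512, 256)) * 0.01

x = feat.reshape(-1)            # flatten the spatial grid to one vector
h1 = np.maximum(x @ W1, 0.0)    # ReLU
h2 = np.maximum(h1 @ W2, 0.0)
g = h2 @ W3                     # 256-dim global feature vector
print(g.shape)                  # (256,)
```

The fully connected layers discard spatial layout, which is what makes the result a single image-level descriptor.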
Mid-Level Features Network
- Extracts mid-level features such as texture.
(Diagram: convolutional layers at 1/8 scale taking 512 channels down to 256.)
Fusion Layer
- Combines the global and mid-level features.
(Diagram: the 256-dim global vector from the Global Features Network is combined with the 256-channel mid-level features at 1/8 scale, giving a 512-channel representation that is passed to the Colorization Network.)
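The fusion step can be sketched as replicating the global vector at every spatial position of the mid-level feature grid and concatenating the two. A minimal NumPy illustration; the 28×28 resolution (1/8 of a 224×224 input), the random weights, and the 1×1 mixing layer are assumptions for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 28                                # mid-level resolution (illustrative)
mid = rng.standard_normal((H, W, 256))    # mid-level features, 256 channels
g = rng.standard_normal(256)              # global 256-dim vector

# Replicate the global vector at every spatial position and concatenate,
# giving 512 channels per location.
g_map = np.broadcast_to(g, (H, W, 256))
fused_in = np.concatenate([mid, g_map], axis=-1)   # (H, W, 512)

# A 1x1 "fusion" layer then mixes the 512 channels back down to 256.
W_fuse = rng.standard_normal((512, 256)) * 0.01
fused = fused_in @ W_fuse
print(fused.shape)                        # (28, 28, 256)
```

The point of the design is that every local position now sees the same image-level context, so local color decisions can be conditioned on the scene as a whole.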
Colorization Network
- Computes chrominance from the fused features.
- Upsampling restores the result to the input resolution; the predicted chrominance is combined with the input luminance to produce the output.
Training
- Mean squared error (MSE) as the loss function for chrominance.
- Cross-entropy loss for the classification network.
- Optimization using ADADELTA [Zeiler 2012], which adaptively sets the learning rate.
(Diagram: forward pass from input to output; the MSE against the ground truth drives the backward pass.)
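The joint objective can be sketched as a chrominance MSE term plus a weighted classification cross-entropy term. Everything below (shapes, the toy label index, and the weight `alpha`) is illustrative, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Predicted vs. ground-truth chrominance (2 channels, e.g. a*/b*).
pred_chroma = rng.uniform(-1, 1, (112, 112, 2))
gt_chroma = rng.uniform(-1, 1, (112, 112, 2))
mse = np.mean((pred_chroma - gt_chroma) ** 2)

# Classification branch: log-softmax over scene labels + cross-entropy.
logits = rng.standard_normal(205)   # 205 scene classes (MIT Places)
label = 42                          # ground-truth class index (toy)
logp = logits - np.log(np.sum(np.exp(logits - logits.max()))) - logits.max()
xent = -logp[label]

alpha = 1.0 / 300.0                 # relative weight of the class term (assumed)
loss = mse + alpha * xent
print(loss)
```

Both terms backpropagate through the shared low-level and global branches, which is how the classification labels shape the colorization features.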
Joint Training with Classification
- The network is trained for classification jointly with the colorization.
- The classification network is connected to the global features.
(Example predicted labels: 20.60% formal garden, 16.13% arch, 13.50% abbey, 7.07% botanical garden, 6.53% golf course.)
Dataset
- MIT Places scene dataset [Zhou+ 2014]: 2.3M training images with 205 scene labels.
- Example classes: abbey, airport terminal, aquarium, baseball field, dining room, forest road, gas station, gift shop.
Computational Time
- Colorizes an image within a few seconds; the fastest reported time is 80 ms.
Colorization of MIT Places Dataset
Comparisons
(Columns: input, [Cheng+ 2015], ours with global features, ours without global features.)
Effectiveness of Global Features
(Columns: input, with global features, without global features.)
Colorization of Historical Photographs
Mount Moran, 1941; Scott's Run, 1937; Youngsters, 1912; Burns Basement, 1910
California National Park, 1936; Homes, 1936; Spinners, 1910; Doffer Boys, 1909
Norris Dam, 1933; North Dome, 1936; Miner, 1937; Community Center, 1936
Limitations
- Saturation tends to be low.
- Does not recover the true colors.
(Columns: input, ground truth, output.)
Rough Sketch Simplification
Rough Sketch Simplification (input → output)
Simo-Serra, Iizuka, Sasaki, Ishikawa: "Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup", SIGGRAPH 2016.
Rough Sketch Simplification (example rough/target pairs)
Related work: sketch simplification
- Progressive online modification: Igarashi et al. 1997; Bae et al. 2008; Grimm and Joshi 2012; Fišer et al. 2015.
- Stroke reduction: Deussen and Strothotte 2000; Wilson and Ma 2004; Grabli et al. 2004; Cole et al. 2006.
- Stroke grouping: Barla et al. 2005; Orbay and Kara 2011; Liu et al. 2015. These require vector input.
- Vectorization: model fitting (Bezier, ...) and gradient-based approaches require fairly clean input sketches [Liu et al. 2015; Noris et al. 2015].
Network model
- 23 convolutional layers.
- The output has the same resolution as the input.
- Built from down-convolutions, flat-convolutions, and up-convolutions (down to 1/2, 1/4, 1/8 scale and back up).
Learning
- Trained from scratch (about 3 weeks) using 424×424 px patches.
- Weighted mean squared error loss.
- Batch normalization [Ioffe and Szegedy 2015] is critical.
- Optimized with ADADELTA [Zeiler 2012].
(Figure: input, output, target.)
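A weighted MSE can be sketched by giving the rare ink pixels a larger weight than the abundant white background pixels, so the loss does not collapse to predicting blank pages. The specific weights and toy data below are illustrative, not the paper's actual weighting scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

pred = rng.uniform(0, 1, (424, 424))     # network output (grayscale sketch)
# Toy clean target: mostly white (1.0) with ~10% dark ink pixels (0.0).
target = (rng.uniform(0, 1, (424, 424)) > 0.1).astype(float)

# Weight map: ink pixels (target near 0) count more than background.
w = np.where(target < 0.5, 4.0, 1.0)     # 4x weight on ink (assumed value)
loss = np.mean(w * (pred - target) ** 2)
print(loss)
```

Without such weighting, a 90%-white target lets a trivial all-white output achieve a deceptively low plain MSE.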
Sketch dataset
- 68 pairs of rough and target sketches from 5 illustrators.
- Training patches are extracted from the dataset.
Inverse Dataset Creation
- Data quality is critical.
- Creating target sketches from rough sketches introduces misalignments.
- Creating rough sketches from target sketches keeps the pairs properly aligned.
(Figure: standard vs. inverse creation.)
Data augmentation
- 68 pairs are insufficient, so the data is augmented.
- Geometric: scaling, random cropping, flipping, and rotation.
- Additional: tone, slur, and noise perturbations.
(Figure: input with tone, slur, and noise variants.)
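The listed augmentations can be sketched as random transforms applied identically to the rough/target pair, with tone and noise perturbations applied to the rough input only. All parameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(rough, target, size=424):
    """One random augmentation of a rough/target pair (illustrative sketch)."""
    h, w = rough.shape
    # Random crop, same window for both images.
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    rough, target = rough[y:y+size, x:x+size], target[y:y+size, x:x+size]
    # Random horizontal flip.
    if rng.random() < 0.5:
        rough, target = rough[:, ::-1], target[:, ::-1]
    # Random 90-degree rotation.
    k = int(rng.integers(0, 4))
    rough, target = np.rot90(rough, k), np.rot90(target, k)
    # Tone shift and noise on the rough input only.
    rough = np.clip(rough * rng.uniform(0.8, 1.2)
                    + rng.normal(0, 0.02, rough.shape), 0, 1)
    return rough, target

r, t = augment(rng.uniform(0, 1, (600, 600)), rng.uniform(0, 1, (600, 600)))
print(r.shape, t.shape)   # (424, 424) (424, 424)
```

Applying the geometric transforms to both images keeps the pair aligned, while the photometric perturbations simulate scanning artifacts on the rough side only.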
Sketch Simplification: Experimental Environment
- Intel Core i7-5960X CPU @ 3.00 GHz; NVIDIA GeForce TITAN X GPU; Torch framework.
- About 3 weeks for training.
- Simplification time: (timings shown on the slide).
Comparison
(Columns: input, ours, Adobe Live Trace, Potrace.)
Comparison
(Rows: input, [Liu et al. 2015] with vector input, ours with raster input. Examples: (a) Fairy, (b) Mouse, (c) Duck, (d) Car.)
Results
Conclusion
- Automatic image colorization: fuses global and local information; joint training of colorization and classification.
- Automatic rough sketch simplification: convolutional networks are well suited to image processing; proper data is crucial for training.