unsupervised visual representation learning by context prediction
TRANSCRIPT
![Page 1: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/1.jpg)
Unsupervised Visual Representation Learning by Context Prediction
Most slides in this presentation are adapted from the authors' original presentation at ICCV 2015
Berkan Demirel
![Page 2: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/2.jpg)
ImageNet + Deep Learning
Beagle
- Image Retrieval
- Detection (RCNN)
- Segmentation (FCN)
- Depth Estimation
- …
![Page 3: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/3.jpg)
ImageNet + Deep Learning
Beagle
Do we need semantic labels?
Pose? Boundaries? Geometry? Parts? Materials?
![Page 4: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/4.jpg)
Context as Supervision [Collobert & Weston 2008; Mikolov et al. 2013]
Deep Net
![Page 5: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/5.jpg)
Context Prediction for Images
A B
? ? ?
??
? ? ?
![Page 6: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/6.jpg)
Semantics from a non-semantic task
![Page 7: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/7.jpg)
Randomly Sample Patch
Sample Second Patch
CNN CNN
Classifier
Relative Position Task: 8 possible locations
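
The "8 possible locations" can be read as an 8-way classification target: the second patch sits in one of the eight grid cells around the first. A minimal sketch of such a label encoding follows. The names `NEIGHBOR_OFFSETS`, `offset_to_label`, and `label_to_offset` are hypothetical, and the ordering of the eight classes is an assumption made for illustration; the slides do not specify one.

```python
# The eight neighbor cells around a center patch, as (row, col) offsets.
# The class ordering below (row-major, skipping the center) is an assumption.
NEIGHBOR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1),           (0, 1),
    (1, -1),  (1, 0),  (1, 1),
]


def offset_to_label(d_row, d_col):
    """Map the grid offset of the second patch to a class index in 0..7."""
    return NEIGHBOR_OFFSETS.index((d_row, d_col))


def label_to_offset(label):
    """Inverse mapping: class index back to a (row, col) grid offset."""
    return NEIGHBOR_OFFSETS[label]
```

The center offset `(0, 0)` is deliberately absent, so asking for its label raises a `ValueError`.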
![Page 8: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/8.jpg)
CNN CNN
Classifier
Patch Embedding
Input, Nearest Neighbors
CNN
Note: connects across instances!
![Page 9: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/9.jpg)
Architecture
Patch 1, Patch 2
Each patch passes through its own stack, with Tied Weights between the two stacks:
Convolution, Max Pooling, LRN
Convolution, Max Pooling, LRN
Convolution, Convolution, Convolution
Max Pooling
Fully connected
The outputs of the two stacks are then combined:
Fully connected
Fully connected
Softmax loss
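
To make "tied weights" concrete, here is a minimal NumPy sketch of the two-stack layout. It is not the network from the slides: the whole convolution/pooling/LRN/fully-connected stack is replaced by a single linear layer with a ReLU, and all names and sizes (`init_params`, `stack`, `forward`, `softmax_loss`, the layer widths) are hypothetical. The point it illustrates is only that both patches go through the same parameters before their features are concatenated and classified into 8 classes.

```python
import numpy as np


def init_params(in_dim, feat_dim, hidden_dim, n_classes=8, seed=0):
    """Create one set of stack weights; both patches will share it."""
    rng = np.random.default_rng(seed)
    return {
        "W_stack": rng.normal(0.0, 0.01, (in_dim, feat_dim)),
        "W_fuse": rng.normal(0.0, 0.01, (2 * feat_dim, hidden_dim)),
        "W_out": rng.normal(0.0, 0.01, (hidden_dim, n_classes)),
    }


def stack(params, patches):
    """Stand-in for the per-patch stack: flatten, one linear layer, ReLU."""
    flat = patches.reshape(patches.shape[0], -1)
    return np.maximum(flat @ params["W_stack"], 0.0)


def forward(params, patch1, patch2):
    """Apply the SAME stack to both patches, then fuse and classify."""
    f1 = stack(params, patch1)  # tied weights: same params["W_stack"]
    f2 = stack(params, patch2)
    fused = np.concatenate([f1, f2], axis=1)
    hidden = np.maximum(fused @ params["W_fuse"], 0.0)
    return hidden @ params["W_out"]  # logits over the 8 positions


def softmax_loss(logits, labels):
    """Mean cross-entropy of integer labels under a softmax over logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Because the stack parameters are shared, either stack can be used alone at test time to embed a single patch.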
![Page 10: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/10.jpg)
Avoiding Trivial Shortcuts
Include a gap
Jitter the patch locations
![Page 11: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/11.jpg)
Position in Image
A Not-So “Trivial” Shortcut
![Page 12: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/12.jpg)
Chromatic Aberration
![Page 13: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/13.jpg)
Solutions
Color Dropping: Randomly drop 2 of the 3 color channels from each patch, then replace the dropped channels with Gaussian noise (standard deviation ~1/100 the standard deviation of the remaining channel).
Projection: Shift green and magenta (red + blue) towards gray.
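
The color-dropping step above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: the function name `drop_colors` is hypothetical, the patch is assumed to be a float array of shape (H, W, 3), and the noise is assumed to be zero-mean (the slide gives only its standard deviation).

```python
import numpy as np


def drop_colors(patch, rng, noise_ratio=0.01):
    """Keep one random color channel; fill the other two with Gaussian noise.

    patch: float array of shape (H, W, 3).
    noise_ratio: noise std as a fraction of the kept channel's std
                 (~1/100 on the slide).
    Returns the new patch and the index of the kept channel.
    """
    keep = int(rng.integers(3))
    sigma = noise_ratio * patch[..., keep].std()
    # Zero-mean noise is an assumption; the slide only states the std.
    out = rng.normal(0.0, sigma, size=patch.shape)
    out[..., keep] = patch[..., keep]
    return out, keep
```

Usage: `out, keep = drop_colors(patch, np.random.default_rng(0))`. With only one channel carrying signal, the network cannot compare channels to detect chromatic aberration.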
![Page 14: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/14.jpg)
Implementation Details
• Train on the ImageNet 2012 training set (1.3M images), using only the images and discarding the labels.
• Resize each image to between 150K and 450K total pixels, preserving the aspect ratio.
• Sample patches at resolution 96-by-96.
• Sample the patches from a grid-like pattern. Each sampled patch can participate in as many as 8 separate pairings.
• Allow a gap of 48 pixels between the sampled patches in the grid, but also jitter the location of each patch in the grid by –7 to 7 pixels in each direction.
• Preprocess patches by (1) mean subtraction, (2) projecting or dropping colors, (3) randomly downsampling some patches to as little as 100 total pixels and then upsampling them, to build robustness to pixelation.
• Use batch normalization, without the scale and shift.
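
The sampling geometry in the list above (96-pixel patches, a 48-pixel gap, jitter of up to 7 pixels per patch) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it draws one random pair per call instead of laying out a full grid, and the names `sample_pair_coords`, `sample_pair`, and `NEIGHBOR_OFFSETS` (including the class ordering) are hypothetical.

```python
import numpy as np

PATCH = 96              # patch side in pixels (from the slide)
GAP = 48                # gap between neighboring grid patches (from the slide)
JITTER = 7              # maximum jitter per patch and direction (from the slide)
STRIDE = PATCH + GAP    # distance between neighboring grid positions

# Assumed class ordering for the eight neighbor cells, as (row, col) offsets.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                    (0, 1), (1, -1), (1, 0), (1, 1)]


def sample_pair_coords(height, width, rng):
    """Return top-left corners (r0, c0), (r1, c1) of a patch pair and its label."""
    margin = STRIDE + JITTER  # room for any neighbor plus jitter
    if height < 2 * margin + PATCH or width < 2 * margin + PATCH:
        raise ValueError("image too small for a patch pair with gap and jitter")
    # Un-jittered grid position of the first patch.
    gr = int(rng.integers(margin, height - PATCH - margin + 1))
    gc = int(rng.integers(margin, width - PATCH - margin + 1))
    label = int(rng.integers(8))
    d_row, d_col = NEIGHBOR_OFFSETS[label]
    # Independent jitter for each patch, in each direction.
    j = rng.integers(-JITTER, JITTER + 1, size=4)
    r0, c0 = gr + int(j[0]), gc + int(j[1])
    r1 = gr + d_row * STRIDE + int(j[2])
    c1 = gc + d_col * STRIDE + int(j[3])
    return r0, c0, r1, c1, label


def sample_pair(image, rng):
    """Crop the two patches from an (H, W, C) image array."""
    r0, c0, r1, c1, label = sample_pair_coords(image.shape[0], image.shape[1], rng)
    first = image[r0:r0 + PATCH, c0:c0 + PATCH]
    second = image[r1:r1 + PATCH, c1:c1 + PATCH]
    return first, second, label
```

The gap and jitter together mean the two patches never share an edge at a fixed offset, so the network cannot solve the task by matching low-level boundary texture or counting pixels.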
![Page 15: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/15.jpg)
Experiments
• Chromatic Aberration
• Nearest-Neighbor Matching
• Object Detection
• Geometry Estimation
• Visual Data Mining
• Layout Prediction
![Page 16: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/16.jpg)
Chromatic Aberration
CNN
![Page 17: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/17.jpg)
Chromatic Aberration
CNN
![Page 18: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/18.jpg)
Nearest-Neighbor Matching
• Features are taken from the fc6 layer, using only one of the two stacks.
• fc7 and higher layers are removed.
• Normalized cross-correlation is used to find similar patches.
• Randomly selected 96x96 patches are used in the comparison.
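
The matching step above can be sketched as follows, assuming the features have already been extracted into plain vectors. The function names `normalized_cross_correlation` and `nearest_neighbors` are hypothetical, and the definition used here (mean-subtract each vector, scale to unit length, take dot products) is one common form of normalized cross-correlation; the slides do not spell out the exact variant.

```python
import numpy as np


def normalized_cross_correlation(query, database):
    """Score a query feature vector (D,) against database features (N, D).

    Each vector is mean-subtracted and scaled to unit length, so the
    scores lie in [-1, 1] and ignore per-vector offset and gain.
    """
    q = query - query.mean()
    d = database - database.mean(axis=1, keepdims=True)
    q = q / (np.linalg.norm(q) + 1e-12)
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-12)
    return d @ q


def nearest_neighbors(query, database, k=5):
    """Return indices and scores of the k best-matching database rows."""
    scores = normalized_cross_correlation(query, database)
    order = np.argsort(-scores)[:k]
    return order, scores[order]
```

Usage: `idx, scores = nearest_neighbors(features[i], features, k=5)`, where `features` would hold one row per sampled patch.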
![Page 19: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/19.jpg)
What is learned?
Input | Ours | Random Initialization | ImageNet AlexNet
![Page 20: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/20.jpg)
Still don’t capture everything
Input | Ours | Random Initialization | ImageNet AlexNet
You don’t always need to learn!
Input | Ours | Random Initialization | ImageNet AlexNet
![Page 21: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/21.jpg)
Object Detection
Pre-train on relative-position task, w/o labels
[Girshick et al. 2014]
![Page 22: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/22.jpg)
Object Detection
[Girshick et al. 2014]
![Page 23: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/23.jpg)
Object Detection
[Girshick et al. 2014]
![Page 24: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/24.jpg)
Multi-Task Training?
![Page 25: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/25.jpg)
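Surface-normal Estimation

| Method | Error (Lower Better) | Error (Lower Better) | % Good Pixels (Higher Better) | % Good Pixels (Higher Better) | % Good Pixels (Higher Better) |
|---|---|---|---|---|---|
| No Pretraining | 38.6 | 26.5 | 33.1 | 46.8 | 52.5 |
| Unsup. Track. | 34.2 | 21.9 | 35.7 | 50.6 | 57.0 |
| Ours | 33.2 | 21.3 | 36.0 | 51.2 | 57.8 |
| ImageNet Labels | 33.3 | 20.8 | 36.7 | 51.7 | 58.1 |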
![Page 26: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/26.jpg)
Visual Data Mining
• Sample a constellation of four adjacent patches from an image (we use four to reduce the likelihood of a matching spatial arrangement happening by chance).
• Find the top 100 images which have the strongest matches for all four patches, ignoring spatial layout.
• Use a form of geometric verification to filter away the images where the four matches are not geometrically consistent.
• Apply the described mining algorithm to Pascal VOC 2011.
![Page 27: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/27.jpg)
Visual Data Mining
…
Via Geometric Verification
Simplified from [Chum et al. 2007]
![Page 28: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/28.jpg)
Mined from Pascal VOC 2011
![Page 29: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/29.jpg)
Layout Prediction
Visual Data Mining Algorithm results for 15,000 Street View images from Paris
![Page 30: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/30.jpg)
Purity Test
![Page 31: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/31.jpg)
So, do we need semantic labels?
![Page 32: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/32.jpg)
Source Code & Supplementary Materials
• Magic Init
• Unsupervised Visual Representation Learning by Context Prediction
• Visual Data Mining Results on unlabeled PASCAL VOC 2011 Images
• Nearest Neighbors on PASCAL VOC 2007
• More
![Page 33: Unsupervised Visual Representation Learning by Context Prediction](https://reader034.vdocuments.site/reader034/viewer/2022052219/58a2d1241a28ab02228b6745/html5/thumbnails/33.jpg)
THANK YOU!