FlowNet: Learning Optical Flow with Convolutional Networks
Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox
Presented by: Nimish Srivastava
For: CSE 291 - Advances in 3D Reconstruction, Winter 2017, UCSD
The big picture:
• Modeling optical flow with CNNs is not straightforward: CNNs are good at learning high-level features, but the optical flow task needs finer (pixel-wise) computations.
• To this end the authors propose 2 different CNN architectures that learn optical flow.
• They also point out the lack of big datasets for optical flow, which may be required to train neural networks, and provide a synthetic dataset, Flying Chairs, to make up for this shortfall.
• For testing their architectures they use a more realistic dataset, MPI-Sintel, and it is surprising to observe how well a network trained only on synthetic data generalizes to real-life datasets.
Supplementary Material
• Flow Field Color Coding (Figure 1): direction: color, magnitude: intensity
• Convolutional Filters:
- First layer: not completely converged, coarse gradients are still visible (Figure 3)
- Filters applied to the output of the correlation layer: very visible structure (Figure 5)
• Demo Video: http://goo.gl/YmMOkR
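The flow field color coding listed above (direction as color, magnitude as intensity) can be sketched as follows. This is a minimal illustration, not the exact color wheel used in the paper's figures: the HSV mapping (hue from the flow angle, brightness from the normalized magnitude, full saturation) and the name flow_to_rgb are assumptions made here.

```python
import numpy as np

def flow_to_rgb(flow, max_mag=None):
    """Map a flow field of shape (H, W, 2) to an RGB image with values in [0, 1].

    Assumed convention: hue encodes direction, brightness encodes magnitude.
    """
    u, v = flow[..., 0], flow[..., 1]
    mag = np.hypot(u, v)
    if max_mag is None:
        max_mag = max(mag.max(), 1e-12)  # avoid division by zero for all-zero flow
    hue = (np.arctan2(v, u) + np.pi) / (2 * np.pi)  # direction mapped to [0, 1]
    val = np.clip(mag / max_mag, 0.0, 1.0)          # magnitude mapped to [0, 1]
    # HSV -> RGB with saturation fixed at 1
    h6 = hue * 6.0
    r = np.clip(np.abs(h6 - 3.0) - 1.0, 0.0, 1.0)
    g = np.clip(2.0 - np.abs(h6 - 2.0), 0.0, 1.0)
    b = np.clip(2.0 - np.abs(h6 - 4.0), 0.0, 1.0)
    return np.stack([r, g, b], axis=-1) * val[..., None]
```

With this mapping, pixels with zero flow come out black and the fastest-moving pixels are the brightest.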
Supplementary Material
• Generating Flying Chairs
• Simple: Throw chairs at each other in front of a high-speed camera. Warning: May hurt!!
• Mathematical (also less risky, hopefully):
- 964 images of 1024 x 768 pixels from Flickr for the background
- 809 chair models, 62 views each (31 azimuths x 2 elevations), sampled uniformly
- number of chairs sampled uniformly from [16, 24]
- location of chairs sampled uniformly over the whole image
- chair sizes: Gaussian(200, 200^2)
- transformations: zoom, rotate, translate (roughly matching those of Sintel)
- transformation parameters sampled from a mixture of 2 distributions: constant mu with probability 1-p and a power of a Gaussian with probability p (the choice modeled by a Bernoulli variable beta)
• Given all these parameters it is straightforward to generate image pairs from one image, together with the ground-truth flow fields and occlusion maps.
• Later each image is further divided into 4 quarters with corresponding pairs.
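To see why the ground-truth flow comes for free once the transformation parameters are sampled, here is a minimal sketch of the dense flow induced by one zoom + rotation + translation. The function name affine_flow, the choice of rotating about the image center and the parameter names are assumptions for illustration; the actual generator also handles several objects, backgrounds and occlusions.

```python
import numpy as np

def affine_flow(height, width, zoom, angle_deg, shift):
    """Dense flow (H, W, 2) induced by zoom and rotation about the image
    center followed by a translation `shift` = (dx, dy)."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    t = np.deg2rad(angle_deg)
    a = zoom * np.array([[np.cos(t), -np.sin(t)],
                         [np.sin(t),  np.cos(t)]])
    x0, y0 = xs - cx, ys - cy                      # coordinates relative to the center
    x1 = a[0, 0] * x0 + a[0, 1] * y0 + cx + shift[0]
    y1 = a[1, 0] * x0 + a[1, 1] * y0 + cy + shift[1]
    return np.stack([x1 - xs, y1 - ys], axis=-1)   # displacement of every pixel
```

For example, affine_flow(384, 512, 1.0, 0.0, (3, -2)) is a constant field of (3, -2) pixels.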
Flying Chairs: Example
Related Work
• Variational Approaches
• DeepMatching and DeepFlow
- convolutions and max pooling
- does not perform any learning
• This work uses a variational approach only to refine the optical flow predicted by the network
• Machine Learning Approaches
• Gaussian Mixture Models
• Principal Component Analysis: basis flows
• Unsupervised learning:
- multiplicative interactions between image pairs
- Boltzmann Machines
- Autoencoders: “synchrony autoencoder”
Optical Flow and CNNs
• Needs precise per-pixel localization
• Requires finding correspondences
• Learning to match image features at different locations
• Not mentioned: localization of pixels to different layers/objects
• To exploit the ability of CNNs to learn strong features at multiple levels of scale and abstraction:
- Network Architecture with Correlation Layer
- Standard Architecture (image on next slide with explanation discussed)
Optical Flow and CNNs (discuss architecture)
Network Architecture
• Convolutional Networks
• We need networks similar to the ones used for depth prediction, edge detection etc., i.e. for per-pixel prediction
• ‘Sliding window’: compute a single prediction for each input image patch by applying a CNN to each patch like a sliding window
- high computational cost
- doesn’t account for global properties
• ‘Upsample’ feature maps and stack them to get a per-pixel feature vector
• Use of an ‘upconvolution layer’: coarse prediction + feature map from the contractive part of the network
• End-to-end learning
Network Architecture
• FlowNetSimple:
- stack both images together and feed them through a generic network (combine, then produce representations)
- large network; local gradient optimization like SGD can work
• FlowNetCorr:
- two separate, yet identical processing streams (almost like a Siamese network)
- first produce meaningful representations, then combine them
- correlation layer: can be thought of as convolving the data with a filter that is itself the data corresponding to the first image
- the correlation layer has no trainable weights (it can also be thought of as a direct dot product of 256-dimensional feature vectors)
Network Architecture
• FlowNetCorr (contd):
- Correlation is computed between square patches centered at x1 and x2 in the first and the second image respectively, with edge length K = 2k+1
• Each patch comparison takes O(K^2) computations, and comparing all patch pairs over the entire plane involves w^2 x h^2 such comparisons, where w and h are the width and height of the planes. Thus, to reduce computation, the comparison is limited to a neighborhood of size D x D (D in x and D in y) in the second image, where D = 2d+1.
• We end up with O(D^2) computations for each 256-dimensional feature vector at x1, or w x h x D^2 correlation values in total (441 layers of w x h when D = 21)
• Further, strides s1 and s2 are used to quantize x1 and x2.
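The correlation layer described above can be sketched in NumPy for the single-pixel patch case (k = 0), where each output value is just a dot product of two feature vectors. This is an illustrative sketch, not the authors' implementation: the function name, the zero padding at the borders, the channel ordering and the use of s1 = 1 are assumptions. With d = 20 and s2 = 2 the output has (2 * 20/2 + 1)^2 = 441 channels, matching the count on the slide.

```python
import numpy as np

def correlation(f1, f2, d=20, s2=2):
    """Correlate feature maps f1, f2 of shape (C, H, W) with patch size K = 1.

    Returns an array of shape (D*D, H, W), D = 2*(d // s2) + 1, holding the
    dot product between f1 at x1 and f2 at x1 + (dx, dy) * s2.
    """
    c, h, w = f1.shape
    r = d // s2
    D = 2 * r + 1
    out = np.zeros((D * D, h, w), dtype=f1.dtype)
    # Zero-pad f2 so that every displaced window stays inside the array.
    padded = np.pad(f2, ((0, 0), (d, d), (d, d)))
    idx = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            oy, ox = d + dy * s2, d + dx * s2
            shifted = padded[:, oy:oy + h, ox:ox + w]  # f2 displaced by (dx, dy) * s2
            out[idx] = (f1 * shifted).sum(axis=0)      # dot product over channels
            idx += 1
    return out
```

No weights appear anywhere in the function, which is the sense in which the layer has nothing to train.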
Network Architecture
• Refinement:
• Pooling results in reduced resolution
• We need to refine the coarse pooled representation to get a dense flow
• Upconvolution: unpooling + convolution
• Concatenate this with the feature map from the contractive part and the coarser flow prediction
• Preserves both coarser information from the final layer (upsampled) and finer information from lower network layers
• Resolution doubles after each step, and this is applied 4 times => ¼ of the input resolution
• Further increase in resolution doesn’t help much compared to bilinear upsampling from this ¼ resolution
• A variational approach can be used instead of bilinear upsampling; it is computationally more expensive as it runs iteratively (20 iterations to reach full resolution and 5 more)
• Gives smooth, subpixel-accurate flow fields.
• Results reported with ‘+v’
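The resolution bookkeeping of the refinement stage can be checked with a few lines. This is only arithmetic under stated assumptions: six stride-2 layers in the contractive part (as listed in the training details), four doubling steps, and a hypothetical 384 x 512 input chosen because it divides evenly.

```python
def refinement_sizes(input_hw=(384, 512), stride2_layers=6, upconv_steps=4):
    """Feature-map sizes from the coarsest layer through each upconvolution step."""
    h, w = input_hw
    factor = 2 ** stride2_layers            # total downsampling of the contractive part
    sizes = [(h // factor, w // factor)]
    for _ in range(upconv_steps):
        ph, pw = sizes[-1]
        sizes.append((2 * ph, 2 * pw))      # each refinement step doubles the resolution
    return sizes

sizes = refinement_sizes()
```

Here sizes runs from (6, 8) up to (96, 128), i.e. 1/64 of the input up to ¼ of the input, after which bilinear upsampling or the variational step covers the remaining factor of 4.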
Training Data
• Middlebury
- only 8 image pairs for training
- displacements are very small
• KITTI
- very special motion types
- assumes rigid scene and moving observer
• MPI Sintel
- realistically rendered artificial scenes
- dense ground-truth flow
- still small
• Flying Chairs (discussed)
• Data Augmentation
- augmentation was observed to be crucial even for the large Flying Chairs dataset
- augmentation was done online while training
- geometric transformations, Gaussian noise, changes in image properties like brightness, contrast etc.
- these were applied to both images
- additional smaller transformations were applied independently to the images, with the flow field being adapted accordingly
- translation: [-20%, 20%], rotation: [-17°, 17°], scaling: [0.9, 2.0], contrast: [-0.8, 0.4], color changes: [0.5, 2], gamma: [0.7, 1.5], brightness: Gaussian(0, 0.2^2)
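A minimal sketch of drawing one set of augmentation parameters from the ranges listed above. The dictionary keys and the function name are hypothetical, the rotation range is read as [-17°, 17°], translation is assumed to be a fraction of the image width and height, and all ranges except brightness are assumed to be sampled uniformly. Applying the parameters to the images and adapting the flow field is not shown.

```python
import numpy as np

def sample_augmentation(rng, width, height):
    """Draw one random set of augmentation parameters (assumed uniform sampling)."""
    return {
        "translate_x": rng.uniform(-0.2, 0.2) * width,   # pixels
        "translate_y": rng.uniform(-0.2, 0.2) * height,  # pixels
        "rotation_deg": rng.uniform(-17.0, 17.0),
        "scale": rng.uniform(0.9, 2.0),
        "contrast": rng.uniform(-0.8, 0.4),
        "color": rng.uniform(0.5, 2.0, size=3),          # one multiplier per RGB channel
        "gamma": rng.uniform(0.7, 1.5),
        "brightness": rng.normal(0.0, 0.2),              # additive, sigma = 0.2
    }
```

Sampling these online, once per training pair, is what keeps the network from seeing the same example twice.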
Experiments: Network and Training Details
• Network:
- 9 convolutional layers, 6 of them with stride 2, and a ReLU non-linearity after each layer
- no fully connected layers: input images can be of arbitrary size
- convolutional filters: 7x7 (first layer), 5x5 (second and third layers), 3x3 (remaining layers)
- number of feature maps increases in deeper layers, roughly doubling after each layer of stride 2
- correlation layer: k=0, d=20, s1=1, s2=2
- Endpoint Error (EPE): Euclidean distance between predicted and ground-truth flow
- Adam optimizer
- learning rate: 1e-4 (FlowNetSimple, annealed by a factor of 2 after 100k iterations); 1e-6 (FlowNetCorr, for the first 10k iterations, then the FlowNetSimple schedule)
• Data:
- Flying Chairs: 22,232 (training), 640 (test)
- Sintel: 908 (training), 133 (test)
- Upscaling input images may improve performance: used only for FlowNetCorr, with scale = 1.25
• Fine-tuning:
- fine-tune to the target dataset
- KITTI: small, so fine-tuning is done only on Sintel (Clean and Final)
- learning rate: 1e-6 for several thousand iterations
- a validation set can be used to find the optimal number of these iterations
- results reported with ‘+ft’
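The endpoint error used throughout the results can be written in a couple of lines. A minimal sketch, assuming flow fields stored as (H, W, 2) arrays and an average over all pixels; the function name is chosen here for illustration.

```python
import numpy as np

def endpoint_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    return np.linalg.norm(pred - gt, axis=-1).mean()
```

For instance, predicting zero flow everywhere when the true flow is (3, 4) at every pixel gives an EPE of 5 pixels.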
Experiments: Results
Experiments: Results
• MPI-Sintel:
- FlowNetCorr is better on Sintel Clean
- FlowNetSimple is better on Sintel Final
- Keep in mind: training was done on a synthetic dataset, so FlowNetCorr may have overfit to data that lacked realistic features
- noisy, non-smooth output -> larger endpoint error (although results are visually appealing)
• KITTI:
- transformations are very different from Sintel / Flying Chairs
- raw network output is fairly good
- the Sintel fine-tuned net improves results for KITTI
- variational refinement also boosts the result
- FlowNetSimple performed better
• Flying Chairs:
- FlowNetCorr outperforms FlowNetSimple and they both outperform others
- shows the significance of training on a dataset close to the one used for testing
- no improvement from variational refinement (maybe because the network learnt better)
• Timings:
- best among real-time methods (twice as fast at most times)
- accuracy slightly below the state of the art
Experiments: Results
Experiments: Analysis
• Training Data:
- in this experiment training was done on Sintel (not Flying Chairs)
- aggressive data augmentation required
- EPE roughly 1 pixel higher than for the network trained on Flying Chairs and fine-tuned on Sintel
- training on Flying Chairs without augmentation results in an EPE almost 2 pixels higher when testing on Sintel
• Comparing Architectures:
- FlowNetCorr slightly overfits the synthetic dataset compared to FlowNetSimple (performance on Sintel Final)
- FlowNetCorr seems to have problems with large displacements (performance on KITTI)
- s40+: 43.3 px (FlowNetSimple), 48 px (FlowNetCorr)
- explanation: the maximum displacement is bounded in the correlation layer; the bound can be increased (requires further study)
Key Takeaways:
• CNNs can be trained to model optical flow
• Use of a correlation layer in the CNN and of variational refinement
• CNNs trained on synthetic datasets can generalize to real datasets (but this could be because the transformations in Flying Chairs were kept similar to Sintel)
• Need for large real-world optical flow datasets
EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow
Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid
Presented by: Nimish Srivastava
For: CSE 291 - Advances in 3D Reconstruction, Winter 2017, UCSD
The big picture:
• What is optical flow? -> the pattern of apparent motion of objects/surfaces/edges in a visual scene caused by relative motion between camera and scene
• Coarse-to-Fine Optical Flow:
• Demo: Static Scene & Moving Camera; Static Camera & Moving Object
(from: CSE 252A, Fall 2016, UCSD) uses the Lucas-Kanade algorithm at each level to estimate flow
The big picture:
• Coarse-to-fine optical flow has trouble modeling finer details, and this error propagates from coarser to finer levels.
Also, since it is based on correspondence matching, it does not work well with large displacements, occlusions or discontinuities!!
• The authors propose an algorithm for optical flow estimation that is robust to large displacements with occlusions.
It works in a stepwise manner, beginning from a dense edge-preserving approximation of the flow and using this to initialize a variational energy minimization. (details later)
Related Work
• Energy Minimization:
- can get stuck in local minima
- struggles with large displacements
• Coarse-to-Fine:
- variational approach, using descriptor matching
- error propagation to finer scales: details lost
- no theoretical guarantee of convergence
• Penalization (to tackle the above issues):
- difference between flow and HOG matches
- DeepMatching: similarities of non-rigid patches
- segment features and keypoints
- still coarse-to-fine: details lost, error propagates
• PatchMatch using SLIC superpixels:
- better respects image boundaries
- nearest neighbor fields
- SLIC superpixels are only locally aware
• NNF and RANSAC:
- motion segmentation / layered model
- multi-label graph cut
- a small error in assignment has drastic effects
• Edge-Based Affinities:
- piecewise affine flow
- discretization valid only for small displacements
• Global Non-convex Matching:
- independent affine transform calculated for each pixel based on neighborhood matches
- weighted by an estimate of occluded areas: binary classifier (learning)
- expensive minimization; in contrast, an approximate edge-aware geodesic distance is used in this work
Major Contributions:
• Proposal: EpicFlow
- novel sparse-to-dense interpolation based on an edge-aware distance
- robust to boundaries / occlusions / large displacements
• Approximate Geodesic Distance
- significant speedup without loss of accuracy
• Empirical Evidence
• 2 steps in EpicFlow:
1) Sparse-to-dense interpolation of the flow using an approximate geodesic distance
2) Energy minimization using this estimated flow as the starting point to obtain the final flow
Sparse to Dense Interpolation
• Sparse Correspondences: off the shelf, state of the art
a) DeepMatching
b) Nearest neighbor field
- 1024 x 436 image => 5000 matches (1 match per 90 pixels)
- each match is a pair (pm, p’m), where pm is in the first image and p’m is in the second image
• Sparse-to-Dense Interpolation: we estimate a dense correspondence field F: I -> I’
a) Nadaraya-Watson (NW) estimation: sum of matches weighted by their proximity to a pixel p
Sparse to Dense Interpolation
• Sparse-to-Dense Interpolation
b) Locally-weighted Affine (LA) estimation: the estimate at p is Ap p + tp, where Ap and tp are derived as the least-squares fit to an overdetermined system
c) Local interpolation: restrict the set of matches used in the interpolation at a pixel p to its K nearest neighbors according to the distance metric D.
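The two interpolators and the K-nearest-neighbor restriction can be sketched together. This is a minimal illustration under stated assumptions: the Euclidean distance stands in for the metric D (EpicFlow uses the edge-aware distance introduced next), the proximity weight is taken to be an exponential kernel exp(-a * D), and all function and parameter names are chosen here.

```python
import numpy as np

def _neighbors(p, src, k, a):
    """Indices and kernel weights of the k matches closest to pixel p."""
    dist = np.linalg.norm(src - p, axis=1)  # Euclidean stand-in for D
    nn = np.argsort(dist)[:k]
    return nn, np.exp(-a * dist[nn])        # assumed kernel exp(-a * D)

def nw_estimate(p, src, dst, k=3, a=1.0):
    """Nadaraya-Watson: kernel-weighted average of the matched positions."""
    nn, wgt = _neighbors(np.asarray(p, float), src, k, a)
    return (wgt[:, None] * dst[nn]).sum(axis=0) / wgt.sum()

def la_estimate(p, src, dst, k=3, a=1.0):
    """Locally-weighted affine: least-squares fit of dst ~ A src + t, evaluated at p."""
    p = np.asarray(p, float)
    nn, wgt = _neighbors(p, src, k, a)
    X = np.hstack([src[nn], np.ones((len(nn), 1))])  # rows [x, y, 1]
    # Each equation is scaled by its kernel weight, then solved in the least-squares sense.
    sol, *_ = np.linalg.lstsq(wgt[:, None] * X, wgt[:, None] * dst[nn], rcond=None)
    return np.append(p, 1.0) @ sol                   # A p + t
```

Here src holds the positions pm as an (M, 2) array and dst the positions p’m. When the matches follow an exact affine motion, la_estimate reproduces it at any pixel, whereas nw_estimate only blends the neighboring matched positions.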
• Edge-Preserving Distance
Geodesic distance DG(p, q): the minimum, over all possible paths between p and q, of the cost C(ps) accumulated along the path, where C(ps) is the cost of crossing pixel ps
This works on the principle that a pixel belonging to one motion layer is close to all pixels on that layer and far from pixels on other layers (using the cost function)
Sparse to Dense Interpolation
• Edge-Preserving Distance
assumption: image edges are a superset of motion boundaries
using the geodesic distance DG, neighbors are found on the same object/part.
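A discrete version of the edge-preserving distance can be sketched with Dijkstra's algorithm on the pixel grid. The discretization is an assumption made here for illustration: 4-connected pixels, with the cost of a step taken as the mean of the two pixel costs. Starting from several seed pixels at once also yields, for every pixel, the index of its geodesically closest seed.

```python
import heapq
import numpy as np

def geodesic_voronoi(cost, seeds):
    """Multi-source Dijkstra on a 4-connected grid.

    cost:  (H, W) array, cost of crossing each pixel (high on image edges).
    seeds: list of (y, x) seed pixels, e.g. the match positions.
    Returns (dist, label): geodesic distance to the closest seed and its index.
    """
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    label = np.full((h, w), -1, dtype=int)
    heap = []
    for i, (y, x) in enumerate(seeds):
        dist[y, x] = 0.0
        label[y, x] = i
        heapq.heappush(heap, (0.0, y, x))
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue  # stale queue entry
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + 0.5 * (cost[y, x] + cost[ny, nx])  # assumed step cost
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    label[ny, nx] = label[y, x]
                    heapq.heappush(heap, (nd, ny, nx))
    return dist, label
```

A pixel that sits next to a strong edge is assigned to the seed on its own side of the edge, even when a seed on the other side is closer in Euclidean terms.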
Sparse to Dense Interpolation
• Fast Approximation:
- neighboring pixels are often interpolated similarly
a) Geodesic Voronoi Diagram:
- cluster the pixels around the matches: L(p) = argmin over pm of DG(p, pm)
b) Approximate Geodesic Distance:
- build the neighborhood graph of the matches: edges connect matches whose clusters are adjacent, and each edge weight is the corresponding geodesic distance; shortest paths on this graph are computed with Dijkstra’s algorithm
- this gives an approximate geodesic distance between any pixel p and any match pm: the distance from p to its closest match, plus the graph distance from that match to pm
c) Piecewise Field:
- the distance between a pixel p and a match pn is the same as the distance between pm and pn, up to a constant independent of pn (pm being the match closest to p)
- hence the nearest neighbors of p are the nearest neighbors of pm
- as distances are added in the exponent, the kernel weight for (p, pn) factorizes into a term for (p, pm) times a term for (pm, pn)
Sparse to Dense Interpolation
• Fast Approximation:
c) Piecewise Field: following this, it suffices to compute the fields only for the matches and to propagate them to the other pixels (similar calculations for LA, i.e. multiply by a coefficient on LHS and RHS)
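This leaves us with a piecewise field: the interpolation is computed once per match and shared by all pixels in that match's cluster.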
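The propagation step is then a simple lookup. A minimal sketch, assuming the cluster assignment L(p) is stored as an integer array label of shape (H, W) and that the per-match results are already computed; the function names and array layouts are chosen here for illustration.

```python
import numpy as np

def propagate_nw(label, match_estimates):
    """Piecewise-constant field: every pixel copies the estimate of its closest match.

    label: (H, W) int array, match_estimates: (M, 2) array.
    """
    return match_estimates[label]

def propagate_la(label, A, t):
    """Piecewise-affine field: every pixel applies the affine parameters of its closest match.

    A: (M, 2, 2) matrices, t: (M, 2) translations.
    """
    h, w = label.shape
    ys, xs = np.mgrid[0:h, 0:w]
    p = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2) pixel coordinates (x, y)
    return np.einsum("hwij,hwj->hwi", A[label], p) + t[label]
```

The expensive part (one interpolation per match, about 5000 of them) is thus decoupled from the number of pixels.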
Optical Flow Estimation
• Coarse-to-fine vs. EpicFlow:
Q) Why aren’t we using coarse-to-fine optical flow as the starting estimate?
• No theoretical guarantee about convergence or energy minimization
• The cost map C becomes irrelevant at coarser scales, so edge-aware geodesic calculations would take a hit
• We already operate at full image scale, with efficient approximations, so there is no coarse-to-fine error propagation.
Optical Flow Estimation
• Variational Energy Minimization:
- energy: data term + smoothness term
- data term: color constancy and gradient constancy
- smoothness term: penalizes the flow gradient norm
- initialization: the sparse-to-dense optical flow estimated previously
- 5 fixed-point iterations: the flow is updated 5 times iteratively
- each fixed-point iteration runs 30 iterations of the successive over-relaxation (SOR) method
(all this was derived from previous literature, which shows that these methods work)
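The inner solver named above, successive over-relaxation, can be illustrated on a small linear system. This is a generic textbook sketch, not the EpicFlow energy: the matrix is a toy stand-in for the linearized system solved at each fixed-point iteration, and the relaxation factor omega = 1.2 is an assumption.

```python
import numpy as np

def sor_solve(A, b, omega=1.2, iters=30, x0=None):
    """Approximately solve A x = b with `iters` sweeps of successive over-relaxation."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    for _ in range(iters):
        for i in range(len(b)):
            # Sum over all other unknowns, using values already updated in this sweep.
            sigma = A[i] @ x - A[i, i] * x[i]
            gauss_seidel = (b[i] - sigma) / A[i, i]
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
    return x
```

In the scheme on the slide, a solver of this kind would be called 5 times, once per fixed-point iteration, each time warm-started from the current flow through x0.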
Experiments: Datasets Used
• MPI-Sintel: only the final version, which has realistic rendering of blur, motion and atmospheric effects like fog
• KITTI: large displacements, non-Lambertian surfaces, varied lighting conditions
• Middlebury: limited range of displacements, complex motion
• Parameter optimization: on 20% of the Sintel training set
• Error reported: Average Endpoint Error (AEE; average over dimensions of flow vector)
Experiments: Input Matches
• Subsampling (doesn’t get lossy): 5000 matches per 1024 x 436 image
• DeepMatching (Figure 6):
- images downscaled by a factor of 2
- implicit reciprocal verification
• Kd-trees and Local Propagation:
- computes dense correspondences
- noisy: small patches without global regularization
- explicit reciprocal matching
• Pruning of Matches:
- reciprocal matching: removes occlusions
- low-saliency patches: eigenvalues of the autocorrelation matrix
- outliers: consistency check. How?
• Run the NW estimator on the initial matches and prune those that differ from the estimate by more than 5 pixels.
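Two of the pruning steps above can be sketched in a few lines. Both functions are illustrative: the index-array representation of forward and backward matches and the function names are assumptions, and the interpolated estimate passed to the consistency check is taken as given (for example from an NW interpolation of the initial matches).

```python
import numpy as np

def reciprocal_matches(fwd, bwd):
    """Keep match i only if matching back from image 2 returns to i.

    fwd[i]: index in image 2 matched to patch i of image 1.
    bwd[j]: index in image 1 matched to patch j of image 2.
    """
    fwd, bwd = np.asarray(fwd), np.asarray(bwd)
    idx = np.arange(len(fwd))
    return idx[bwd[fwd] == idx]

def consistent_matches(dst, estimate, max_diff=5.0):
    """Boolean mask of matches whose target lies within max_diff pixels of the estimate."""
    return np.linalg.norm(dst - estimate, axis=1) <= max_diff
```

Matches that fail the reciprocal test typically fall in occluded regions, while the distance threshold removes isolated outliers that disagree with their neighborhood.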
Experiments: Impact of Different Parameters
• Matches and Interpolators:
• DeepMatching outperforms (except for Middlebury)
• Locally affine interpolation with the approximate geodesic distance performs better (KITTI: planar surfaces -> affine transformations)
• Robust to neighborhood size (results not displayed explicitly in this table)
Experiments: Impact of Different Parameters
• Sparse-to-dense vs. EpicFlow:
• Results after energy minimization are better for every dataset
• Figure 7: results look similar, but on close observation minimization gives a smoother and more refined flow, preserving fine details
Experiments: Impact of Different Parameters
• Edge-Aware and Euclidean Distances:
• Negligible impact of using the approximate geodesic distance instead of the exact geodesic distance
• Significant performance gain compared to the Euclidean distance (AEE as well as runtime for the approximate geodesic distance)
• Mixed Approach: neighbors found using the Euclidean distance, then the geodesic distance is used. Slightly worse off: Figure 6: Euclidean neighbors are not on the same object/region
Experiments: Impact of Different Parameters
• Impact of Contour Detector:
• gPb: drop in performance, holes in contours
• Canny edges: like Euclidean distance
• Norm of the image’s gradient
• Ground-Truth Boundaries: only for MPI-Sintel (KITTI and Middlebury GT values weren’t dense enough): performance improvement
Experiments: EpicFlow vs Coarse-to-fine
• EpicFlow: leads on KITTI (affine approximations) and MPI-Sintel
• Coarse-to-fine: leads on Middlebury (small displacements)
• EpicFlow preserves motion boundaries and details like limbs and occlusions (geodesic distance), Figure 7.
Experiments: EpicFlow vs Coarse-to-fine
• Sensitivity to Matching Quality
• To this end synthetic matches were created from ground-truth flows by removing occlusions, subsampling to obtain the desired density and then adding noise to get the desired matching error
• Parameter Optimization: MPI-Sintel (20%)
• EpicFlow yields better results with sufficiently dense matching
• The interpolation-based heuristic also recovers from matching failures for sufficiently dense matching (top right of Fig 8)
• Density: #matches / #non-occluded pixels
• Matching Error: #false matches / #matches
Experiments: Comparison to State of the Art
• MPI-Sintel: AEE improvement for large displacements and occlusions; overall performance improvement
• KITTI and Middlebury: competitive performance
• This could be because KITTI lacked dense optical flow regions in the ground truth itself
Experiments: Comparison to State of the Art
• Timing:
- DeepMatching: 15 s (91%)
- Extracting SED edges: 0.15 s
- Dense interpolation: 0.25 s
- Variational minimization: 1 s
• Improvements / faster matching algorithms required
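The 91% figure follows directly from the four timings listed on the slide, as this short check shows (the dictionary keys are just labels chosen here):

```python
timings = {
    "DeepMatching": 15.0,
    "SED edges": 0.15,
    "dense interpolation": 0.25,
    "variational minimization": 1.0,
}  # seconds per image pair, as listed on the slide

total = sum(timings.values())
share = {name: seconds / total for name, seconds in timings.items()}
```

The total is about 16.4 s, of which matching takes roughly 91%, so the interpolation and minimization steps themselves account for under 1.5 s.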
• Failure Cases:
- Errors in sparse matches: matches missing on thin elements (horns & spear)
- Errors in contour extraction: incorrect contours (hand)
Key Take-aways
• Coarse-to-fine optical flow: no convergence guarantee, error propagation, fails for large or occluded displacements
• EpicFlow
• A better initial heuristic (/matches) improves results
• Use of geodesic information is key to optical flow estimation
• It’s not always the neural nets that spawn magic!!