flownet: learning optical flow with convolutional...

FlowNet: Learning Optical Flow with Convolutional Networks

Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox

Presented by: Nimish Srivastava
For: CSE 291 - Advances in 3D Reconstruction, Winter 2017, UCSD

The big picture:

• Modeling optical flow with CNNs is not straightforward: although CNNs are good at learning high-level features, the optical flow task needs finer (pixel-wise) computations.
• To this end the authors propose 2 different CNN architectures which learn optical flow.
• They also point out the lack of big datasets for optical flow, which may be required to train neural networks, and provide a synthetic dataset, Flying Chairs, to make up for this shortfall.
• For testing their architectures they use a more realistic dataset, MPI-Sintel, and it is surprising to observe how well a network trained only on synthetic data generalizes to real-life datasets.

Supplementary Material

• Flow Field Color Coding (Figure 1): direction: color, magnitude: intensity
• Convolutional Filters:
  - First layer: not completely converged, coarse gradients are still visible (Figure 3)
  - Filters applied to the output of the correlation layer: very visible structure (Figure 5)
• Demo Video: http://goo.gl/YmMOkR
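The flow field color coding listed above (direction as color, magnitude as intensity) can be sketched in a few lines. This is only an illustration, not the authors' visualization code: the name `flow_to_rgb`, the HSV mapping with full saturation, and the normalization by the largest magnitude are assumptions made here.

```python
import colorsys

import numpy as np


def flow_to_rgb(flow, max_mag=None):
    """Map an (H, W, 2) flow field to an (H, W, 3) RGB image in [0, 1].

    Hypothetical helper: hue encodes direction, value encodes magnitude.
    """
    u, v = flow[..., 0], flow[..., 1]
    mag = np.hypot(u, v)
    if max_mag is None:
        # Normalize by the largest magnitude (assumption); avoid dividing by 0.
        max_mag = mag.max() if mag.max() > 0 else 1.0
    hue = (np.arctan2(v, u) + np.pi) / (2 * np.pi)   # direction -> [0, 1]
    val = np.clip(mag / max_mag, 0.0, 1.0)           # magnitude -> intensity
    rgb = np.zeros(flow.shape[:2] + (3,))
    for idx in np.ndindex(hue.shape):
        rgb[idx] = colorsys.hsv_to_rgb(hue[idx] % 1.0, 1.0, val[idx])
    return rgb
```

With this mapping a pixel with zero flow comes out black and the fastest pixel is drawn at full brightness.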

Supplementary Material
• Generating Flying Chairs
• Simple: Throw chairs at each other in front of a high-speed camera. Warning: May hurt!!

• Mathematical (also less risky, hopefully):
  - 964 images of 1024×768 pixels from Flickr for the background
  - 809 chair models: 62 views (31 azimuths, 2 elevations), sampled uniformly
  - number of chairs sampled uniformly: [16, 24]
  - location of chairs sampled uniformly over the whole image
  - chair sizes: Gaussian(200, 200²)
  - transformations: zoom, rotate, translate (roughly matching those of Sintel)
  - transformations sampled from a mixture of 2 distributions: constant mu with probability 1-p and power of a Gaussian with probability p (modeled by a Bernoulli beta)

• Given all these parameters it is straightforward to generate image pairs from one image, along with the ground truth flow fields and occlusion maps.

• Later each image is further divided into 4 quarters with corresponding pairs.

Flying Chairs: Example

Related Work

• Variational Approaches
• DeepMatching and DeepFlow
  - convolutions and max pooling
  - does not perform any learning

• This work uses a variational approach only to refine the optical flow predicted by the network

• Machine Learning Approaches
• Gaussian Mixture Models
• Principal Component Analysis:
  - basis flows
• Unsupervised learning:
  - multiplicative interactions between image pairs
  - Boltzmann Machines
  - Autoencoders: “synchrony autoencoder”

Optical Flow and CNNs

• Needs precise per-pixel localization
• Requires finding correspondences
• Learning to match image features at different locations
• Not mentioned: Localization of pixels to different layers/objects

• To exploit the ability of CNNs to learn strong features at multiple levels of scale and abstraction
• Network Architecture with Correlation Layer
• Standard Architecture (image on next slide with explanation discussed)

Optical Flow and CNNs (discuss architecture)

Network Architecture
• Convolutional Networks
• We need networks similar to the ones used for depth prediction, edge detection, etc. for per-pixel prediction
• ‘Sliding window’: compute a single prediction for each input image patch by applying a CNN on each patch like a sliding window
  - computational costs
  - doesn’t account for global properties
• ‘Upsample’ feature maps, stack them to get a per-pixel feature vector
• Use of an ‘upconvolution layer’
  - coarse prediction + feature map from the contractive part of the network
• End-to-end learning

Network Architecture
• FlowNetSimple:
  - stack both images together and feed them through a generic network (combine and then produce representations)
  - large network, local gradient optimization like SGD can work
• FlowNetCorr:
  - two separate, yet identical processing streams (almost like Siamese)
  - first produce meaningful representations, then combine them
  - correlation layer: can be thought of as convolving the data with a filter which is nothing but the data corresponding to the first image
  - correlation layer has no trainable weights (it can also be thought of as a direct dot product of 256-dimensional feature vectors)

Network Architecture
• FlowNetCorr (contd):
  - Correlation is computed between square patches centered at x1 and x2 in the first and the second image respectively, each with edge length K = 2k+1

• Each patch comparison takes O(K²) computations, and comparing all patch combinations over the entire plane involves w² × h² such comparisons, where w and h are the width and height of the planes. Thus, to reduce computations, limit this operation to a neighborhood of size D² (D in x & D in y) in the second image, where D = 2d+1
• We end up with O(D²) computations for each 256-dimensional feature vector at x1, OR w × h × D² computations in total (441 layers of w × h, when D = 21)
• Further, strides s1 and s2 are used to quantize x1 and x2.
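A rough NumPy sketch of the correlation layer described above may help. It assumes patch size K = 1 (k = 0) and a stride s2 on the displacements in the second image; the function name `correlation`, the zero padding at the borders, and the (C, H, W) layout are choices made for this sketch, not details from the slides. With d = 20 and s2 = 2 the displacement grid has 21 positions per axis, which gives the 441 output layers.

```python
import numpy as np


def correlation(f1, f2, d=2, s2=1):
    """Correlate two (C, H, W) feature maps with single-pixel patches (k = 0).

    Returns a (D*D, H, W) volume, D = 2 * (d // s2) + 1: one layer per
    displacement (dy, dx), each holding the dot product of feature vectors.
    """
    c, h, w = f1.shape
    r = d // s2
    offsets = [o * s2 for o in range(-r, r + 1)]
    # Zero-pad the second map so every displacement stays in bounds (assumption).
    padded = np.pad(f2, ((0, 0), (d, d), (d, d)), mode="constant")
    out = np.empty((len(offsets) ** 2, h, w))
    i = 0
    for dy in offsets:
        for dx in offsets:
            shifted = padded[:, d + dy:d + dy + h, d + dx:d + dx + w]
            out[i] = (f1 * shifted).sum(axis=0)   # no trainable weights
            i += 1
    return out
```

The loop makes the w × h × D² cost visible: one full-plane dot product per displacement.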

Network Architecture
• Refinement:
• Pooling results in reduced resolution
• We need to refine the coarse pooled representation to get a dense flow
• Upconvolution: unpooling + convolution
• Concatenate this with the feature map from the contractive part and the coarser flow prediction
• Preserves both coarser information from the final layer (upsampled) and finer information from lower network layers
• Resolution doubles after each step, and this is applied 4 times => ¼ of input resolution
• Further increase in resolution doesn’t help much compared to bilinear upsampling from this ¼ resolution

• A variational approach can be used instead of bilinear upsampling; it is computationally more expensive as it runs iteratively (20 iterations to reach full resolution and 5 more)

• Gives smooth, subpixel-accurate flow fields.

• Results reported with ‘+v’
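The resolution bookkeeping of the refinement stage can be checked with a little arithmetic. The sketch below assumes 6 stride-2 layers in the contracting part and 4 upconvolution steps; the 384×512 input size and the function name `flownet_resolutions` are illustrative assumptions.

```python
def flownet_resolutions(height, width, stride2_layers=6, upconv_steps=4):
    """Return (coarsest, refined) feature-map sizes for a given input size."""
    down = 2 ** stride2_layers            # each stride-2 layer halves the size
    coarse = (height // down, width // down)
    up = 2 ** upconv_steps                # each upconvolution doubles it
    refined = (coarse[0] * up, coarse[1] * up)
    return coarse, refined


coarse, refined = flownet_resolutions(384, 512)
print(coarse, refined)   # (6, 8) (96, 128), i.e. 1/4 of the input resolution
```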

Training Data
• Middlebury
  - only 8 image pairs for training
  - displacements are very small

• KITTI
  - very special motion types
  - assumes rigid scene and moving observer

• MPI Sintel
  - realistically rendered artificial scenes
  - dense ground truth flow
  - still small

• Flying Chairs (discussed)

• Data Augmentation
  - augmentation was observed to be crucial even for the large Flying Chairs dataset
  - augmentation was done online while training
  - geometric transformations, Gaussian noise, changes in image properties like brightness, contrast, etc.
  - these were applied to both images
  - additional smaller transformations were applied independently to the images, with the flow field being adapted accordingly
  - translation: [-20%, 20%], rotation: [-17°, 17°], scaling: [0.9, 2.0], contrast: [-0.8, 0.4], color changes: [0.5, 2], gamma: [0.7, 1.5], brightness: Gaussian(0, 0.2²)

Experiments: Network and Training Details

• Network:
  - 9 convolutional layers, 6 having a stride of 2, with a ReLU non-linearity after each layer
  - no fully connected layers: input images can be of arbitrary size
  - Convolutional filters: 7×7 (first layer), 5×5 (second and third layers), 3×3 (remaining layers)
  - number of feature maps increases in deeper layers, roughly doubling after each layer with stride 2
  - correlation layer: k = 0, d = 20, s1 = 1, s2 = 2
  - Endpoint Error (EPE): Euclidean distance between predicted and ground truth flow vectors, averaged over all pixels
  - Adam optimizer
  - learning rate: 1e-4 (FlowNetSimple, annealed by a factor of 2 after 100k iters); 1e-6 (FlowNetCorr, for the first 10k iters, and then the FlowNetSimple schedule)

• Data:
  - Flying Chairs: 22,232 (training), 640 (test)
  - Sintel: 908 (training), 133 (test)
  - Upscaling input images may improve performance: was used only for FlowNetCorr with scale = 1.25

• Fine-tuning:
  - fine-tune to the target dataset
  - KITTI: small, so fine-tuning only on Sintel (Clean and Final)
  - learning rate: 1e-6 for several thousand iterations
  - a validation set can be used to find the optimal number of these iterations
  - results reported with ‘+ft’
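The endpoint error used throughout the experiments is simple to compute. A minimal sketch, assuming flow fields stored as (H, W, 2) arrays; the name `endpoint_error` is chosen here for illustration:

```python
import numpy as np


def endpoint_error(pred, gt):
    """Average Euclidean distance between predicted and ground truth flow vectors."""
    return np.linalg.norm(pred - gt, axis=-1).mean()


pred = np.zeros((2, 3, 2))
gt = np.full((2, 3, 2), [3.0, 4.0])
print(endpoint_error(pred, gt))   # 5.0
```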

Experiments: Results

Experiments: Results
• MPI-Sintel:
  - FlowNetCorr is better on Sintel Clean
  - FlowNetSimple is better on Sintel Final
  - Keep in mind: training was done on a synthetic dataset, so it may be that FlowNetCorr overfit data which lacked realistic features
  - noisy, non-smooth output -> larger endpoint error (although results are visually appealing)

• KITTI:
  - transformations are very different from Sintel/Flying Chairs
  - raw network output is fairly good
  - the Sintel fine-tuned net improves results for KITTI
  - variational refinement also boosts the result
  - FlowNetSimple performed better

• Flying Chairs:
  - FlowNetCorr outperforms FlowNetSimple and they both outperform the others
  - shows the significance of training on a dataset close to the one used for testing
  - no improvement from variational refinement (maybe because the network learnt better)

• Timings:
  - best among real-time methods (twice as fast at most times)
  - accuracy slightly below the state of the art

Experiments: Results

Experiments: Analysis

• Training Data:
  - a network was also trained on Sintel only (not Flying Chairs)
  - aggressive data augmentation required
  - its EPE is roughly 1 pixel higher than that of the network trained on Flying Chairs and fine-tuned on Sintel
  - training on Flying Chairs without augmentation results in an EPE almost 2 pixels higher when testing on Sintel

• Comparing Architectures:
  - FlowNetCorr slightly overfits the synthetic dataset compared to FlowNetSimple (performance on Sintel Final)
  - FlowNetCorr seems to have problems with large displacements (performance on KITTI)
  - s40+: 43.3 px (FlowNetSimple), 48 px (FlowNetCorr)
  - explanation: the maximum displacement is bounded in the correlation layer; this bound can be increased (requires further study)

Key Takeaways:

• CNNs can be trained to model optical flow
• Use of a correlation layer in the CNNs and variational refinement
• CNNs trained on synthetic datasets can generalize to real datasets (but this could be because the transformations in Flying Chairs were kept similar to Sintel)
• Need for large optical flow datasets from the real world

EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow

Jerome Revaud, Philippe Weinzaepfel, Zaid Harchaoui, Cordelia Schmid

Presented by: Nimish Srivastava
For: CSE 291 - Advances in 3D Reconstruction, Winter 2017, UCSD

The big picture:
• What is optical flow? -> pattern of apparent motion of objects/surfaces/edges in a visual scene caused by relative motion between camera and scene

• Coarse-to-Fine Optical Flow:

• Demo: Static Scene & Moving Camera; Static Camera & Moving Object

(from: CSE 252A, Fall 2016, UCSD) uses the Lucas-Kanade algorithm at each level to estimate flow

The big picture:

• Coarse-to-fine optical flow has trouble modeling finer details, and this error propagates from coarser to finer levels.

ALSO, since it's based on correspondence matching, it does not work well with large displacements/occlusions/discontinuities!!

• The authors propose an algorithm for optical flow estimation which is robust to large displacements with occlusions.

It works in a stepwise fashion, beginning with a dense edge-preserving approximation of the flow and using this to initialize a variational energy minimization. (details later)

Related Work

• Energy Minimization:
  - can get stuck in local minima
  - susceptible to large displacements

• Coarse-to-Fine:
  - variational approach, using descriptor matching
  - error propagation to finer scales: details lost
  - no theoretical guarantee of convergence

• Penalization: (to tackle the above issues)
  - Difference between flow and HOG matches
  - DeepMatching: similarities of non-rigid patches
  - Segment features and keypoints
  - Coarse-to-fine: details lost, error propagates

• PatchMatch using SLIC superpixels:
  - better respects image boundaries
  - nearest neighbor fields
  - SLIC superpixels are only locally aware

• NNF and RANSAC:
  - motion segmentation/layered model
  - multi-label graph cut
  - small error in assignment: drastic

• Edge-Based Affinities:
  - piecewise affine flow
  - discretization valid only for small displacements

• Global Non-convex Matching
  - independent affine transform calculation for each pixel based on neighborhood matches
  - weighted by an estimate of occluded areas: binary classifier (learning)
  - expensive minimization; Contrast: approximate edge-aware geodesic distance used in this work

Major Contributions:
• Proposal: EpicFlow
  - novel, sparse-to-dense interpolation based on an edge-aware distance
  - robust to boundaries/occlusions/large displacements

• Approximate Geodesic Distance
  - significant speedup without loss of accuracy

• Empirical Evidence

• 2 steps in EpicFlow:
  1) Sparse-to-dense interpolation of flow using an approximate geodesic distance
  2) Energy minimization using this estimated flow as the starting point to obtain the final flow.

Sparse to Dense Interpolation

• Sparse Correspondences: off the shelf, state of the art
  a) DeepMatching
  b) Nearest neighbor field
  - 1024×436 image => 5000 matches (1 match per 90 pixels)
  - Each match is defined as a pair (p_m, p'_m), where p_m is in the first image and p'_m is in the second image

• Sparse-to-Dense Interpolation
We estimate a dense correspondence field F: I -> I'
  a) Nadaraya-Watson Estimation: sum of matches weighted by their proximity to a pixel p

Sparse to Dense Interpolation
• Sparse-to-Dense Interpolation
  b) Locally-weighted Affine (LA) Estimation: F_LA(p) = A_p p + t_p, where A_p and t_p are derived as the least-squares fit to an overdetermined system

  c) Local Interpolation: restrict the set of matches used in the interpolation at a pixel p to its K nearest neighbors according to the distance metric D.
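The three interpolation variants above can be sketched compactly. The snippet assumes an exponential kernel k_D(p_m, p) = exp(-a D(p_m, p)) and, for simplicity, a plain Euclidean distance in place of the edge-aware distance D; the function names, the parameter `a`, and the choice to average displacements p'_m - p_m in the Nadaraya-Watson variant are assumptions of this sketch.

```python
import numpy as np


def kernel_weights(p, src, a):
    """exp(-a * D(p_m, p)); D is plain Euclidean distance in this sketch."""
    return np.exp(-a * np.linalg.norm(src - p, axis=1))


def nadaraya_watson(p, src, dst, a=0.1):
    """Estimate F(p) as p plus the kernel-weighted average displacement."""
    w = kernel_weights(p, src, a)
    return p + (w[:, None] * (dst - src)).sum(axis=0) / w.sum()


def locally_affine(p, src, dst, a=0.1, k=None):
    """Estimate F(p) = A_p p + t_p via a weighted least-squares affine fit."""
    w = kernel_weights(p, src, a)
    if k is not None:
        # Local interpolation: keep only the K nearest (largest-weight) matches.
        nearest = np.argsort(-w)[:k]
        src, dst, w = src[nearest], dst[nearest], w[nearest]
    X = np.hstack([src, np.ones((len(src), 1))])          # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(w[:, None] * X, w[:, None] * dst, rcond=None)
    return np.append(p, 1.0) @ params
```

The affine variant needs at least three non-collinear matches; when the matches follow an exact affine motion it recovers that motion, while the Nadaraya-Watson variant only recovers locally constant flow.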

• Edge-Preserving Distance

D_G(p, q) = inf over paths Γ from p to q of ∫_Γ C(p_s) dp_s

where C(p_s) is the cost of crossing pixel p_s and the infimum is taken over all possible paths between p and q

This works on the principle that a pixel belonging to one motion layer is close to all pixels on that layer and far from pixels on other layers (using the cost function)
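A discrete version of this distance can be computed with Dijkstra's algorithm on the pixel grid. The sketch below is an illustration under several assumptions: 4-connectivity, a step cost equal to the mean cost of the two pixels involved, and the hypothetical name `geodesic_voronoi`. It starts from all matches at once, so it also reports which match is geodesically closest to each pixel.

```python
import heapq

import numpy as np


def geodesic_voronoi(cost, seeds):
    """Multi-source Dijkstra on a 4-connected pixel grid.

    cost  : (H, W) array, cost C of crossing each pixel
    seeds : list of (row, col) match positions
    Returns (dist, label): geodesic distance to the closest seed and its index.
    """
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    label = np.full((h, w), -1, dtype=int)
    heap = []
    for i, (r, c) in enumerate(seeds):
        dist[r, c] = 0.0
        label[r, c] = i
        heapq.heappush(heap, (0.0, r, c, i))
    while heap:
        d, r, c, i = heapq.heappop(heap)
        if d > dist[r, c]:
            continue                      # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + 0.5 * (cost[r, c] + cost[nr, nc])
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    label[nr, nc] = i
                    heapq.heappush(heap, (nd, nr, nc, i))
    return dist, label
```

With a high-cost column standing in for an image edge, a pixel just across the edge is assigned to the match on its own side even when a match on the other side is closer in Euclidean terms.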

Sparse to Dense Interpolation

• Edge-Preserving Distance
assumption: image edges are a superset of motion boundaries

using the geodesic distance D_G, neighbors are found on the same object/parts:

Sparse to Dense Interpolation
• Fast Approximation:
  - neighboring pixels are often interpolated similarly
  a) Geodesic Voronoi Diagram:
    - cluster the pixels around the matches: L(p) = argmin_{p_m} D_G(p, p_m)
  b) Approximate Geodesic Distance:
    - build the neighborhood graph of the matches, in which edges connect matches with adjacent clusters and each edge weight is the corresponding geodesic distance; shortest paths on this graph, computed with Dijkstra's algorithm, give an approximate geodesic distance for any pixel p: the distance from p to its closest match L(p) plus the graph distance from L(p) to p_m

  c) Piecewise Field:
    - the distance between a pixel p and a match p_n is the same as the distance between p_m = L(p) and p_n, up to a constant independent of p_n
    - also the nearest neighbors of p are the nearest neighbors of p_m (explain.), so, as distances are added in the exponent term:

Sparse to Dense Interpolation
• Fast Approximation:
  c) Piecewise Field: Following this, we can see that it suffices to compute the fields only for the matches and to propagate them to the other pixels (similar calculations for LA, i.e. multiply with a coeff. on LHS and RHS)

This leaves us with:

Optical Flow Estimation

• Coarse-to-fine vs. EpicFlow:
Q) Why aren’t we using coarse-to-fine optical flow as the starting estimate?
• No theoretical guarantee about convergence or energy minimization
• The cost map C becomes irrelevant at coarser scales, so edge-aware geodesic calculations would take a hit
• We already operate at full image scale, with efficient approximations, so there is no coarse-to-fine error propagation.

Optical Flow Estimation

• Variational Energy Minimization:
  - energy: data term + smoothness term
    data term: color constancy and gradient constancy
    smoothness term: penalizes the flow gradient norm
  - initialize: sparse-to-dense optical flow estimated previously
  - 5 fixed-point iterations: the flow is updated 5 times iteratively
  - each iteration runs 30 iterations of the successive over-relaxation (SOR) method
(all this was derived from previous literature which shows that these methods work)
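For reference, one successive over-relaxation solve looks roughly as follows. This is a generic SOR solver for a linear system A x = b, not the EpicFlow energy itself: in the variational step each fixed-point iteration would build such a system from the linearized energy (not shown here) and run 30 sweeps. The relaxation factor `omega = 1.2` and the function name `sor` are assumptions.

```python
import numpy as np


def sor(A, b, omega=1.2, iters=30, x0=None):
    """Run `iters` sweeps of successive over-relaxation on A x = b."""
    x = np.zeros(len(b)) if x0 is None else np.array(x0, dtype=float)
    for _ in range(iters):
        for i in range(len(b)):
            # Sum of off-diagonal terms, using already-updated entries of x.
            sigma = A[i] @ x - A[i, i] * x[i]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x


A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = sor(A, b)   # A @ x is close to b after 30 sweeps for this small system
```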

Experiments: Datasets Used

• MPI-Sintel: only the final sequence, which has realistic rendering of blur/motion/atmospheric effects like fog.
• KITTI: large displacements, non-Lambertian surfaces, lighting conditions
• Middlebury: limited range of displacements, complex motion
• Optimize parameters: 20% of Sintel training
• Error Reported: Average Endpoint Error (average over dimensions of flow vector)

Experiments: Input Matches

• Subsample (doesn’t get lossy): 5000 matches per 1024×436 image

• DeepMatching (Figure 6):
  - images downscaled by a factor of 2
  - implicit reciprocal verification

• Kd-trees and Local Propagation:
  - computes dense correspondences
  - noisy: small patches without global regularization
  - explicit reciprocal matching

• Pruning of Matches:
  - reciprocal matching: occlusions
  - low-saliency patches: eigenvalues of the autocorrelation matrix
  - outliers: consistency check - How?

• NW estimator on the initial matches, pruning those with a difference of more than 5 pixels.
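The consistency check can be sketched as follows. The slides only state the idea (Nadaraya-Watson estimate on the initial matches, prune when the difference exceeds 5 pixels); leaving each match out of its own estimate, using the K = 5 nearest matches, the Euclidean distance, and the kernel parameter `a` are assumptions of this sketch, as is the name `prune_matches`.

```python
import numpy as np


def prune_matches(src, flow, a=0.1, k=5, thresh=5.0):
    """Return a boolean mask of matches whose flow agrees with their neighbors."""
    keep = []
    for m in range(len(src)):
        d = np.linalg.norm(src - src[m], axis=1)
        d[m] = np.inf                        # leave the match itself out
        nn = np.argsort(d)[:k]               # K nearest other matches
        w = np.exp(-a * d[nn])
        est = (w[:, None] * flow[nn]).sum(axis=0) / w.sum()
        keep.append(np.linalg.norm(est - flow[m]) <= thresh)
    return np.array(keep)
```

A match whose displacement differs from the estimate built from its neighbors by more than `thresh` pixels is dropped; isolated outliers are removed while coherent regions survive.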

Experiments: Impact of Different Parameters

• Matches and Interpolators:
• DeepMatching outperforms (except for Middlebury)

• Locally Affine geodesic approximation performs better (KITTI: planar surfaces -> affine transformations)

• Robust to neighborhood size (results not displayed explicitly in this table)

Experiments: Impact of Different Parameters
• Sparse-to-dense vs. EpicFlow:
• Results after energy minimization are better for every dataset

• Figure 7: results look similar, but on close observation, minimization gives a smoother and more refined flow, preserving fine details


• Edge-Aware and Euclidean Dist.:
• Negligible impact of the approximate geodesic distance compared to the exact geodesic distance

• Significant performance gain compared to Euclidean distance (AEE as well as runtime for approximate geodesic distance)

• Mixed Approach: neighbors found using Euclidean distance, then geodesic distance used. Slightly worse off: Figure 6: Euclidean neighbors are not on the same object/region


• Impact of Contour Detector:
• gPb: drop in perf., holes in contours
• Canny Edge: like Euclidean distance
• Norm of the image’s gradient
• Ground Truth Boundaries: only for MPI-Sintel (KITTI and Middlebury GT values weren’t dense enough): performance improvement

Experiments: EpicFlow vs Coarse-to-fine
• EpicFlow: leads on KITTI (affine approximations) and MPI-Sintel
• Coarse-to-fine: leads on Middlebury (small displacements)
• EpicFlow preserves motion boundaries, details like limbs, occlusions (geodesic distance). Figure 7.

Experiments: EpicFlow vs Coarse-to-fine

• Sensitivity to Matching Quality
• To this end synthetic matches were created from ground truth flows by removing occlusions, subsampling to obtain the desired density, and then adding noise to get the desired matching error

• Parameter Optimization: MPI-Sintel (20%)
• EpicFlow yields better results with sufficiently dense matching

• The interpolation-based heuristic also recovers from matching failures for sufficiently dense matching (top right of Fig. 8)

• Density: #matches / #non-occluded pixels
• Matching Error: #false-matches / #matches

Experiments: Comparison to State of the Art

• MPI-Sintel: performance improvement; AEE improves for large displacements and occlusions

• KITTI and Middlebury: competitive performance

• This could be because KITTI lacked dense optical flow regions in the ground truth itself

Experiments: Comparison to State of the Art

• Timing:
• DeepMatching: 15 s (91%)
• Extracting SED Edges: 0.15 s
• Dense Interpolation: 0.25 s
• Variational Minimization: 1 s

• Improvements/faster matching algorithms required

• Failure Cases:
• Errors in sparse matches: missing on thin elements (horns & spear)

• Errors in Contour Extraction: incorrect contour extraction (hand)

Key Take-aways

• Coarse-to-fine optical flow: no convergence guarantee, error propagation, fails for large or occluded displacements
• EpicFlow
• Better initial heuristic (/matches) improves results
• Use of geodesic information is key to optical flow estimation
• It’s not always the neural nets that spawn magic!!
