manmohan chandraker - cseweb.ucsd.educseweb.ucsd.edu/~mkchandraker/classes/cse291/... · microsoft...
Defining computer vision
• Old: Computer programs that can
  – Process image information
  – Recognize instances of objects
  – Find distances of objects
• Modern: Understanding the world based on visual cues
  – Determining factors that govern image formation
  – Recognition across variations
  – Estimate semantic properties of a scene
  – Recognize complex actions
  – Predict long-term behaviors
Studying computer vision
• Images are everywhere around us
• Rapidly emerging technologies
  – Autonomous driving
  – Gaming
  – Smart homes
  – Factory automation
Studying computer vision
• Images are everywhere around us
• Rapidly emerging technologies
• Deep and attractive scientific problems
  – How do animals recognize objects?
  – Why do newborn babies respond to face-like shapes?
  – Beautiful marriage of math, physics, biology, CS, engineering
simply prefer 'top-heaviness' (figure 16(b)). Thus, it remains unclear whether this is a general preference (perhaps with no practical significance) or a face-specific orienting response to prime the infant in bootstrapping its nascent face recognition system. Even if this preference really is an innate face-orienting mechanism, it may be more for the benefit of the mother (e.g., to form the mother-child bond) than the infant's face processing capabilities. A simple arrangement of three dots within an oval may serve as an appropriate template for detecting faces in the bootstrapping stages of a face-learning system. Similar templates have been used with reasonable success in some applications (for example, Sinha, 2002) of face detection.
Figure 16. (a) Newborns preferentially orient their gaze to the face-like pattern on the left, rather than the one shown on the right, suggesting some innately specified representation for faces. (From Johnson et al, 1991.) (b) As a counterpoint to the idea of innate preferences for faces, Simion et al (2001) have shown that newborns consistently prefer top-heavy patterns (left column) over bottom-heavy ones (right column). It is unclear whether this is the same preference exhibited in earlier work, and if it is, whether it is face-specific or some other general-purpose or artifactual preference.
Result 17: The visual system progresses from a piecemeal to a holistic strategy over the first several years of life
As discussed in Result 8, normal adults show a remarkable deficit in recognition of inverted faces, but no such deficit for inverted images of non-face objects such as houses. A number of studies have shown, however, that this pattern of results takes many years to develop (Carey and Diamond, 1977; Hay and Cox, 2000; Maurer et al, 2002; Mondloch et al, 2002, 2003; Pellicano and Rhodes, 2003; Schwarzer, 2003). Six-year-old children are not affected by inversion when it comes to recognizing seen faces in a seen-unseen pair [16]; 8-year-olds show some inversion effect and 10-year-olds exhibit near adult-like performance (see Fig. 17). Experimenters in (Mondloch et al, 2002) selectively manipulated spacing (moving the location of features on a face) versus features (taking eyes or mouth from different faces) and found that it is specifically sensitivity to spacing manipulations that is impaired when faces are inverted. Interestingly, although six year
Vision to explore other worlds
Image from NASA’s Mars Exploration Rover Spirit
• Panorama stitching
• Stereo imaging
• Navigation
• ....
Broad classes of vision applications
Sense Understand Interface
Human-Human Human-Machine Machine-Machine
Computer vision is also riding the wave
• Autonomous driving (Google, Tesla, Mobileye, ....)
• Augmented reality (HoloLens, Oculus, Magic Leap)
• Social networks (Google, Facebook, ....)
• Mobile applications
• Surveillance
Vision in augmented reality devices
• Gaze tracking
• Head pose estimation
• Object detection
• Semantic segmentation
• Depth estimation
• Material and lighting estimation
The hardness of the problem
• Finding locations
• Localize objects
• Estimate distances
• Understand relations
• Be aware of traffic rules
• Predict future behaviors
• Understand intentions
• Interdependent decisions
Novel deep learning frameworks for self-driving
Weakly supervised segmentation
High quality localization of object parts. Synthetic training, robust to occlusions
Distillation networks for object detection
Distillation for compressed CNN (student) to mimic uncompressed CNN (teacher), to achieve greater accuracy at the same speed.
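The slide does not say how the student is trained to mimic the teacher. As a minimal sketch, assuming the classic soft-target formulation (a temperature-softened softmax matched between teacher and student, plus a cross-entropy term on the true label), the loss could look like the following. The function names, the temperature of 2.0, and the weight `alpha` are illustrative assumptions, not values from the lecture.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term (assumed form)."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Cross-entropy between the softened teacher and student distributions.
    soft_loss = -sum(t * math.log(s) for t, s in zip(soft_teacher, soft_student))
    # Ordinary cross-entropy of the student against the ground-truth label.
    hard_loss = -math.log(softmax(student_logits)[true_label])
    # The T^2 factor keeps the two terms on a comparable gradient scale.
    return alpha * temperature ** 2 * soft_loss + (1.0 - alpha) * hard_loss
```

Under this assumed loss, a student whose logits agree with the teacher receives a lower value than one that disagrees, which is the signal that pushes the compressed network toward the teacher's behavior.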
Simulations for 3D reconstruction
Multimodal future behavior prediction
DESIRE: Deep Stochastic IOC RNN Encoder-Decoder
• Deep CVAE (autoencoder) to generate diverse hypotheses.
• RNN to rank predictions based on motion, scene and interactions.
• Deep inverse optimal control for long-term future rewards.
Sparse labeling at one-tenth the expense. Similar accuracy with weak supervision.
Real-time monocular 3D scene understanding
Need and benefits of monocular
              LIDAR      Stereo     Monocular
Cost          ~$70,000   ~$1,000    ~$200
Maintenance   Hard       Moderate   Easy
Fundamental challenge: scale drift
3D points
Dense stereo
Object detection
Key technique: learn per-frame adaptive covariances
Large-scale, real-time SFM and 3D localization with a single camera.
Accuracy comparable to stereo systems.
Challenge: Limited views
• Want to relate information across multiple images
• With limited views
  – Have to “guess” missing coverage
  – Correspondence across wide baseline
  – Deal with appearance changes
Why is computer vision difficult?
• Intra-class variation
• Background clutter
• Motion
• Occlusion
(Source: S. Lazebnik)
Machine learning
[Raquel Urtasun]
• Typically in CS: write a program to execute a set of rules
• Computer vision: sometimes very hard to specify rules
• Machine learning: develop own program based on examples
• Training data: input-output pairs
Machine learning is a key player
• What is it?
  – Object and scene recognition
• Who is it?
  – Identity recognition
• Where is it?
  – Object detection
• What are they doing?
  – Activities
• All of these are classification problems
  – Choose one class from a list of possible candidates
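Discriminative models
• Direct modeling of the ratio $\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$
• A decision boundary separates zebra from non-zebra
Generative models
• Model $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$
• Likelihoods for the two example images on the slide:
  $p(\text{image} \mid \text{zebra})$    $p(\text{image} \mid \text{no zebra})$
  Low                 Middle
  High                Middle → Low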
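The two views are linked by Bayes' rule: given the generative likelihoods and a class prior, the discriminative posterior follows. A minimal sketch, assuming a two-class problem and an illustrative prior of 0.5 (the function names and all numbers are hypothetical, not from the slides):

```python
def posterior_zebra(lik_zebra, lik_no_zebra, prior_zebra=0.5):
    """Return p(zebra | image) from the two generative likelihoods via Bayes' rule."""
    joint_zebra = lik_zebra * prior_zebra
    joint_no_zebra = lik_no_zebra * (1.0 - prior_zebra)
    return joint_zebra / (joint_zebra + joint_no_zebra)


def classify(lik_zebra, lik_no_zebra, prior_zebra=0.5):
    """Pick the class with the larger posterior, i.e. a posterior ratio above or below 1."""
    p = posterior_zebra(lik_zebra, lik_no_zebra, prior_zebra)
    return "zebra" if p > 0.5 else "no zebra"
```

With equal priors, an image whose likelihood is high under the zebra model and low under the non-zebra model is labeled "zebra"; a strongly skewed prior can flip that decision even when the two likelihoods are equal.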
Traditional image categorization: Training phase
Training: Training images → Image features → Classifier training (with training labels) → Trained classifier
Slide credit: Jia-Bin Huang
Traditional image categorization: Testing phase
Training: Training images → Image features → Classifier training (with training labels) → Trained classifier
Testing: Test image → Image features → Trained classifier → Prediction (“Outdoor”)
Slide credit: Jia-Bin Huang
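The training and testing phases above can be sketched in a few lines. This is only an illustration under stated assumptions: the hand-crafted feature (overall brightness and top-versus-bottom contrast), the nearest-centroid classifier, and the toy 2×2 "images" are all hypothetical stand-ins, not the features or classifier used in the lecture.

```python
def mean(values):
    return sum(values) / len(values)


def extract_features(image):
    """Hypothetical hand-crafted feature: (overall brightness, top minus bottom brightness)."""
    half = len(image) // 2
    top = [pixel for row in image[:half] for pixel in row]
    bottom = [pixel for row in image[half:] for pixel in row]
    return (mean(top + bottom), mean(top) - mean(bottom))


def train_classifier(images, labels):
    """Training phase: compute one feature centroid per class label."""
    grouped = {}
    for image, label in zip(images, labels):
        grouped.setdefault(label, []).append(extract_features(image))
    return {
        label: tuple(mean(dim) for dim in zip(*features))
        for label, features in grouped.items()
    }


def predict(classifier, image):
    """Testing phase: same features, then pick the nearest class centroid."""
    feature = extract_features(image)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feature, centroid))

    return min(classifier, key=lambda label: squared_distance(classifier[label]))


# Toy data (made up): "outdoor" images have a bright top half, like sky.
train_images = [
    [[0.9, 0.9], [0.3, 0.3]],
    [[0.8, 1.0], [0.2, 0.4]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.4, 0.6], [0.5, 0.5]],
]
train_labels = ["outdoor", "outdoor", "indoor", "indoor"]
classifier = train_classifier(train_images, train_labels)
```

The key point the sketch mirrors is that the same feature extractor is applied in both phases; only the classifier parameters are learned from the labeled training set.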
Features have been key
• SIFT [Lowe IJCV 04]
• HOG [Dalal and Triggs CVPR 05]
• SPM [Lazebnik et al. CVPR 06]
• Textons
• and many others: SURF, MSER, LBP, GLOH, …..
Learning a Hierarchy of Feature Extractors
• Hierarchical and expressive feature representations
• Trained end-to-end, rather than hand-crafted for each task
• Remarkable in transferring knowledge across tasks
Deep learning has opened new areas
• Availability of large-scale image and video data
• Availability of computational power
  – Better and cheaper GPUs
  – Cloud computing resources
• Better understanding of how to train deep neural networks
• Advantages available for many areas of computer vision
  – Recognize objects across shape and appearance variations
  – Data-driven priors for 3D reconstruction
  – Predict long-term future behaviors in complex scenes
  – End-to-end training rather than expensive feature design.
Limits of deep learning
• Large scale labeled data is not always available
• Lack of generalization to unseen domains
• Good at narrow “classification”, not at broad “reasoning”
• Lack of interpretability
• Lack of reliability or security guarantees
Data: hardware and models scale more than labels
[Sun et al. 2017]
• More data helps
• 4 TB of data per day from a car
• Training effort
• Rare events matter more
• Purely supervised methods not scalable
Interpretability of outputs
[Figure: accuracy versus interpretability for linear regression, nearest neighbors, SVM, decision forests, CNNs and boosting]
Automobile industry wants models built by combining validated components
Trade-offs for various learning approaches
Generative and discriminative methods
New approaches to overcome limits
• Weak supervision
• Semi-supervised learning
• Self-supervision
• Domain adaptation
• Adversarial learning
• Physical modeling
Generative Adversarial Networks
[Diagram labels: Input for generator, Generator, Real data, Discriminator]
A two-player game between the generator and the discriminator
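The slide does not write out the game itself. For reference, the standard minimax objective from the original GAN formulation can be stated as follows, where the symbols are assumptions introduced here: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the real-data distribution, and $p_z$ the distribution of the generator's input noise.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

The discriminator maximizes $V$ by scoring real samples near 1 and generated samples near 0, while the generator minimizes it by producing samples that the discriminator scores as real.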
Semantically Meaningful Representations
Gradually and meaningfully transform semantic object parts within and across categories
Shape arithmetic: high-level reconfiguration of semantic object parts
Large-scale optimization
• Internet images pose challenges of scale and outliers
• Reconstructions with millions of images
• Choices to handle data
• Specific optimization approaches
Figure from Agarwal et al.
Real-time computation
• Mobile platforms, embedded systems (IoT devices)
• Stringent demands on computational resources
• Low power platforms (wattage) for automobile ECUs
• Carefully designed and multithreaded architectures
Newcombe et al., CVPR 2015; Song and Chandraker, CVPR 2014
Take-home message
• Computer vision is a key branch of AI
• Enables several modern applications around us
• A lot of highly visible and high-impact activity
• Huge industry interest
• This is a great time to study computer vision!
Course details
• Each class will cover papers in computer vision
• Examples of topics
  – Correspondence estimation
  – Optical flow
  – Stereo
  – Structure from Motion
  – Image classification
  – Face recognition
  – Object detection
  – Semantic segmentation
  – Action recognition
  – Behavior prediction
  – Human pose estimation
  – Material estimation
  – Adversarial learning
  – Domain adaptation
Course details
• Presentation instructions
  – Two students to present in each class
  – Discuss topic with instructor one week in advance
  – Send slides by 9pm two days before the class
  – Allow speaking time of 30 minutes each (about 20 slides)
  – Presentation should be well-organized and thoughtful
  – Ask questions and encourage discussions along the way
• Each student does 1-2 presentations
Course details
• Presentation contents
  – Summarize the topic and how the papers address it
  – Why the topic is interesting, or difficulty of the problem
  – Motivate with applications
  – Key technical ideas, why they are interesting
  – Strengths and weaknesses of proposed methods
  – Detailed analysis of experiments
  – If possible, include own analysis based on author code
  – Open problems, extensions, likely follow-up papers
Course details
• Reviews
  – Day before class, send a brief review of 1 paper (email, plain text)
  – Email subject (exactly): “Wi18:CSE291:<PaperTitle>”
• Review format (follow exactly)
  – Summary of the paper (3-5 sentences)
  – Strengths
  – Weaknesses
  – Critique of experiments
  – Suggestions for improvement
  – Possible extensions or follow-ups (come up with at least one)
• Presenters need not send in review for that class
• Ask questions, answer them, engage in discussions
Course details
• Final project
  – Pick any topic in computer vision
  – Discuss with instructor to finalize topic and scope
  – Groups of three are recommended
  – Deadline: week 4
• Options for project
  – Pick a research topic and implement
  – Extend existing papers in interesting ways
  – Improve implementation of existing code from papers
  – Conduct critical survey, identifying possible extensions.
Course details
• Class webpage:
  – http://cseweb.ucsd.edu/~mkchandraker/classes/CSE291/Winter2018/
• Emails:
  – Instructor: [email protected]
  – TA: Shashank Shastry ([email protected])
• Grading
  – 20% presentation
  – 20% reviews
  – 30% project
  – 20% final exam
  – 10% participation
• Aim is to learn together, discuss and have fun!