Lecture 0: Introduction CSE 291: Advances in Computer Vision Manmohan Chandraker



Defining computer vision

Wall-E: Fact and Fiction (Minh Do, Princeton University)

• Old: Computer programs that can
  – Process image information
  – Recognize instances of objects
  – Find distances of objects
• Modern: Understanding the world based on visual cues
  – Determining factors that govern image formation
  – Recognition across variations
  – Estimate semantic properties of a scene
  – Recognize complex actions
  – Predict long-term behaviors

Studying computer vision
• Images are everywhere around us (Source: Domo)
• Rapidly emerging technologies
  – Autonomous driving
  – Gaming
  – Smart homes
  – Factory automation
• Deep and attractive scientific problems
  – How do animals recognize objects?
  – Why do newborn babies respond to face-like shapes?
  – Beautiful marriage of math, physics, biology, CS, engineering

simply prefer “top-heaviness” (figure 16(b)). Thus, it remains unclear whether this is a general preference (perhaps with no practical significance) or a face-specific orienting response to prime the infant in bootstrapping its nascent face recognition system. Even if this preference really is an innate face-orienting mechanism, it may be more for the benefit of the mother (e.g., to form the mother-child bond) than the infant’s face processing capabilities. A simple arrangement of three dots within an oval may serve as an appropriate template for detecting faces in the bootstrapping stages of a face-learning system. Similar templates have been used with reasonable success in some applications (for example, Sinha, 2002) of face detection.

Figure 16. (a) Newborns preferentially orient their gaze to the face-like pattern on the left, rather than the one shown on the right, suggesting some innately specified representation for faces. (From Johnson et al, 1991.) (b) As a counterpoint to the idea of innate preferences for faces, Simion et al (2001) have shown that newborns consistently prefer top-heavy patterns (left column) over bottom-heavy ones (right column). It is unclear whether this is the same preference exhibited in earlier work, and if it is, whether it is face-specific or some other general-purpose or artifactual preference.

Result 17: The visual system progresses from a piecemeal to a holistic strategy over the first several years of life

As discussed in Result 8, normal adults show a remarkable deficit in recognition of inverted faces, but no such deficit for inverted images of non-face objects such as houses. A number of studies have shown, however, that this pattern of results takes many years to develop (Carey and Diamond, 1977; Hay and Cox, 2000; Maurer et al, 2002; Mondloch et al, 2002, 2003; Pellicano and Rhodes, 2003; Schwarzer, 2003). Six year old children are not affected by inversion when it comes to recognizing seen faces in a seen-unseen pair [16]; 8 year olds show some inversion effect and 10 year olds exhibit near adult-like performance (see Fig. 17). Experimenters in (Mondloch et al, 2002) selectively manipulated spacing (moving the location of features on a face) versus features (taking eyes or mouth from different faces) and found that it is specifically sensitivity to spacing manipulations that is impaired when faces are inverted. Interestingly, although six year

We Use Computer Vision

Computer vision in living rooms

Microsoft Kinect, Xbox, Sportvision first down line

Vision to explore the world

Image from Microsoft Virtual Earth

Vision to explore other worlds

Image from NASA’s Mars Exploration Rover Spirit

• Panorama stitching
• Stereo imaging
• Navigation
• ....

Vision to explore all worlds

The Matrix movies, ESC Entertainment, XYZRGB, NRC

Including virtual ones!

Organizing Computer Vision

Broad classes of vision applications
Sense, Understand, Interface

Reconstruct, Recognize, Reorganize

Scenes, People

Human-Human, Human-Machine, Machine-Machine

Significant progress in recent years
Sense, Understand, Interface

Advanced Driver Assistance Systems

Deep learning is revolutionizing AI

Tic-tac-toe (1952), Checkers (1994), Chess (1997), Atari (2015), Go (2016)

Computer vision is also riding the wave

• Autonomous driving (Google, Tesla, Mobileye, ....)
• Augmented reality (HoloLens, Oculus, Magic Leap)
• Social networks (Google, Facebook, ....)
• Mobile applications
• Surveillance

Augmented Reality

Vision in augmented reality devices

• Gaze tracking
• Head pose estimation
• Object detection
• Semantic segmentation
• Depth estimation
• Material and lighting estimation

Autonomous Driving

Autonomous navigation

Source: Wired

The hardness of the problem
• Finding locations
• Localize objects
• Estimate distances
• Understand relations
• Be aware of traffic rules
• Predict future behaviors
• Understand intentions
• Interdependent decisions

Novel deep learning frameworks for self-driving

Weakly supervised segmentation

High quality localization of object parts. Synthetic training, robust to occlusions

Distillation networks for object detection

Distillation for compressed CNN (student) to mimic uncompressed CNN (teacher), to achieve greater accuracy at the same speed.
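The slide does not say which loss is used to make the student mimic the teacher. As a minimal sketch, assuming the common soft-target formulation (a KL divergence between temperature-softened class distributions), the idea can be illustrated as follows. The function names, the temperature value, and the toy logits are all hypothetical and are not taken from the lecture.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) between temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl))

# Hypothetical logits for two images and three classes.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
good_student = teacher * 0.9                                # close to the teacher
poor_student = np.array([[0.5, 1.0, 4.0], [3.0, 0.2, 0.1]])  # disagrees with the teacher

print(distillation_loss(good_student, teacher))
print(distillation_loss(poor_student, teacher))
```

A student whose outputs track the teacher's gets a loss near zero, so minimizing this quantity pushes the compressed network toward the uncompressed one. In practice it would likely be combined with the ordinary supervised detection loss.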

Simulations for 3D reconstruction

Multimodal future behavior prediction

DESIRE: Deep Stochastic IOC RNN Encoder-Decoder
• Deep CVAE (autoencoder) to generate diverse hypotheses.
• RNN to rank predictions based on motion, scene and interactions.
• Deep inverse optimal control for long-term future rewards.

Sparse labeling at one-tenth the expense. Similar accuracy with weak supervision.

Real-time monocular 3D scene understanding

              LIDAR      Stereo     Monocular
Cost          ~$70,000   ~$1,000    ~$200
Maintenance   Hard       Moderate   Easy

Need and benefits of monocular

Fundamental challenge: scale drift

3D points, Dense stereo, Object detection

Key technique: learn per-frame adaptive covariances

Large-scale, real-time SFM and 3D localization with a single camera.

Accuracy comparable to stereo systems.

Plenty of Other Applications

Photo-tourism

Source: Snavely et al.

Reconstructing building interiors

Source: Xiao and Furukawa

Games

Movies

Avatar movie, Zoe Saldana emotes Neytiri (Fox Movie Channel)

Mobile phones and tablets

Place recognition, Face recognition

A Few Challenges in Computer Vision

Challenge: Limited views
• Want to relate information across multiple images
• With limited views
  – Have to “guess” missing coverage
  – Correspondence across wide baseline
  – Deal with appearance changes

Challenge: Low resolution

Challenge: Complex appearance

Challenge: Non-rigidity

A. Bronstein and M. Bronstein

Challenge: Complex deformations

Why is computer vision difficult?
• Viewpoint variation
• Illumination
• Scale
• Intra-class variation
• Background clutter
• Motion (Source: S. Lazebnik)
• Occlusion

Machine Learning

Machine learning

[Raquel Urtasun]

• Typically in CS: write a program to execute a set of rules
• Computer vision: sometimes very hard to specify rules
• Machine learning: develop own program based on examples
• Training data: input-output pairs
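To make "develop own program based on examples" concrete, here is a minimal sketch in which the only "program" is a set of input-output pairs and a rule for comparing a new input against them (a 1-nearest-neighbor classifier). The two features and the scene labels are hypothetical toy values chosen for illustration, not data from the lecture.

```python
def nearest_neighbor_predict(train_pairs, query):
    """Return the label of the training input closest to `query` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, best_label = min(train_pairs, key=lambda pair: sq_dist(pair[0], query))
    return best_label

# Hypothetical training data: (mean brightness, fraction of blue pixels) -> scene label.
train_pairs = [
    ((0.9, 0.7), "outdoor"),
    ((0.8, 0.6), "outdoor"),
    ((0.3, 0.1), "indoor"),
    ((0.2, 0.2), "indoor"),
]

print(nearest_neighbor_predict(train_pairs, (0.85, 0.65)))
print(nearest_neighbor_predict(train_pairs, (0.25, 0.15)))
```

No rule such as "bright and blue means outdoor" is written by hand. Changing the behavior means changing the examples, which is the contrast with rule-based programming drawn in the bullets above.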

So what does recognition involve?

Verification: is that a bus?

Detection: are there cars?

Identification: is that a picture of Mao?

Object categorization

sky

building

flag

wall

banner

bus

cars

bus

face

street lamp

Scene categorization
• outdoor
• city
• traffic
• …

Machine learning is a key player
• What is it?
  – Object and scene recognition
• Who is it?
  – Identity recognition
• Where is it?
  – Object detection
• What are they doing?
  – Activities
• All of these are classification problems
  – Choose one class from a list of possible candidates

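Discriminative models

• Direct modeling of $\dfrac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$

Zebra, Non-zebra, Decision boundary

Generative models

• Model $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$

$p(\text{image} \mid \text{zebra})$    $p(\text{image} \mid \text{no zebra})$
Low                                    Middle
High                                   Middle → Low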
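The two views are linked by Bayes' rule: a generative model supplies the class-conditional likelihoods p(image | zebra) and p(image | no zebra), and multiplying each by its class prior gives the posterior ratio that a discriminative model targets directly. The sketch below assumes, purely for illustration, that an image is summarized by a single hypothetical "stripe score" feature with a Gaussian distribution per class. The parameter values are invented.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical generative model of a 1-D "stripe score" feature for each class.
ZEBRA = {"mean": 0.8, "std": 0.1}
NO_ZEBRA = {"mean": 0.3, "std": 0.2}

def posterior_ratio(x, prior_zebra=0.5):
    """p(zebra | x) / p(no zebra | x) via Bayes' rule; the evidence p(x) cancels."""
    like_zebra = gaussian_pdf(x, **ZEBRA)
    like_no_zebra = gaussian_pdf(x, **NO_ZEBRA)
    return (like_zebra * prior_zebra) / (like_no_zebra * (1.0 - prior_zebra))

for x in (0.25, 0.85):
    ratio = posterior_ratio(x)
    label = "zebra" if ratio > 1.0 else "no zebra"
    print(f"x={x}: ratio={ratio:.3g} -> {label}")
```

The decision boundary is where the ratio equals 1. A discriminative method would fit that boundary (or the ratio itself) from labeled data without ever modeling how images of each class are distributed.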

Neural Networks

Traditional Image Categorization: Training phase

Training: Training Images → Image Features → Classifier Training (with Training Labels) → Trained Classifier

Traditional Image Categorization: Testing phase

Testing: Test Image → Image Features → Trained Classifier → Prediction (Outdoor)

Slide credit: Jia-Bin Huang
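The two phases of the traditional pipeline can be sketched in a few lines. This is an illustrative toy, not the method from the slides: the hand-crafted feature is assumed to be an intensity histogram, the classifier is assumed to be nearest-centroid, and the "images" are synthetic random arrays. All function names are hypothetical.

```python
import numpy as np

def extract_features(image, bins=8):
    """Hand-crafted feature: normalized intensity histogram of a grayscale image in [0, 1]."""
    hist, _ = np.histogram(image, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()

def train_classifier(images, labels):
    """Training phase: images -> features -> per-class mean feature (nearest-centroid)."""
    features = np.array([extract_features(img) for img in images])
    labels = np.array(labels)
    return {str(c): features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(classifier, image):
    """Testing phase: same feature extractor, then the class with the closest centroid."""
    f = extract_features(image)
    return min(classifier, key=lambda c: np.linalg.norm(f - classifier[c]))

# Synthetic stand-ins for labeled training images: bright = "outdoor", dark = "indoor".
rng = np.random.default_rng(0)
bright = [rng.uniform(0.6, 1.0, size=(16, 16)) for _ in range(5)]
dark = [rng.uniform(0.0, 0.4, size=(16, 16)) for _ in range(5)]
classifier = train_classifier(bright + dark, ["outdoor"] * 5 + ["indoor"] * 5)

test_image = rng.uniform(0.6, 1.0, size=(16, 16))
print(predict(classifier, test_image))
```

Note that `extract_features` is fixed by hand and shared by both phases, and only the classifier is learned. That split is what end-to-end learned feature hierarchies replace.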

Features have been key

SIFT [Lowe IJCV 04] HOG [Dalal and Triggs CVPR 05]

SPM [Lazebnik et al. CVPR 06] Textons

and many others: SURF, MSER, LBP, GLOH, …

Learning a Hierarchy of Feature Extractors

• Hierarchical and expressive feature representations
• Trained end-to-end, rather than hand-crafted for each task
• Remarkable in transferring knowledge across tasks

Significant recent impact on the field

Big labeled datasets

Deep learning

GPU technology

Deep learning has opened new areas
• Availability of large-scale image and video data
• Availability of computational power
  – Better and cheaper GPUs
  – Cloud computing resources
• Better understanding of how to train deep neural networks
• Advantages available for many areas of computer vision
  – Recognize objects across shape and appearance variations
  – Data-driven priors for 3D reconstruction
  – Predict long-term future behaviors in complex scenes
  – End-to-end training rather than expensive feature design.

Limits of deep learning
• Large scale labeled data is not always available
• Lack of generalization to unseen domains
• Good at narrow “classification”, not at broad “reasoning”
• Lack of interpretability
• Lack of reliability or security guarantees

Data: hardware and models scale more than labels

[Sun et al. 2017]

• More data helps
• 4 TB of data per day from a car
• Training effort
• Rare events matter more
• Purely supervised methods not scalable

Miles to go before....

Seat belts, Speed limits, Traffic lights, Airbags, ABS, Electronic Stability Control

Fatalities, Injuries, Crashes

Interpretability of outputs

[Plot: accuracy versus interpretability for linear regression, nearest neighbors, SVM, decision forests, CNNs, boosting]

Automobile industry wants models built by combining validated components

Trade-offs for various learning approaches

Generative and discriminative methods

Object detection for an autorickshaw

New approaches to overcome limits
• Weak supervision
• Semi-supervised learning
• Self-supervision
• Domain adaptation
• Adversarial learning
• Physical modeling

Generative Adversarial Networks

Diagram labels: Input for generator, Generator, Real data, Discriminator

A two-player game between the generator and the discriminator
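The slide does not state the objective of this two-player game. As a sketch, assuming the standard minimax formulation of Goodfellow et al. (2014), the discriminator D maximizes log D(x) + log(1 − D(G(z))) while the generator G minimizes log(1 − D(G(z))). The code below only evaluates these quantities on hypothetical discriminator outputs. No networks are trained, and all names and numbers are illustrative.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Negated discriminator objective: -(mean log D(x) + mean log(1 - D(G(z))))."""
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    """Minimax generator loss: mean log(1 - D(G(z))), which the generator minimizes."""
    return float(np.mean(np.log(1.0 - d_fake)))

# Hypothetical discriminator outputs: estimated probability that a sample is real.
d_real = np.array([0.9, 0.8, 0.95])        # on real data
fooled = np.array([0.7, 0.6, 0.8])         # on fakes the discriminator believes
not_fooled = np.array([0.1, 0.2, 0.05])    # on fakes the discriminator rejects

print("generator loss, fooled D:     ", generator_loss(fooled))
print("generator loss, not fooled D: ", generator_loss(not_fooled))
print("discriminator loss, fooled:   ", discriminator_loss(d_real, fooled))
print("discriminator loss, not fooled:", discriminator_loss(d_real, not_fooled))
```

The opposing signs are the point: the same fake samples that lower the generator's loss raise the discriminator's, which is what makes the setup a game and not a single optimization problem.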

Application for face recognition

Profile input, Frontal output

Species or weather style transfer

Semantically Meaningful Representations
Gradually and meaningfully transform semantic object parts within and across categories

Shape arithmetic: high-level reconfiguration of semantic object parts

Transfer learning from simulations

Image generation from text

[Zhang et al., CVPR 2017]

New devices
• Time-of-flight sensors
• Structured light systems
• Light field cameras
• Coded apertures

Large-scale optimization
• Internet images pose challenges of scale and outliers
• Reconstructions with millions of images
• Choices to handle data
• Specific optimization approaches

Figure from Agarwal et al.

Real-time computation
• Mobile platforms, embedded systems (IoT devices)
• Stringent demands on computational resources
• Low power platforms (wattage) for automobile ECUs
• Carefully designed and multithreaded architectures

Newcombe et al., CVPR 2015; Song and Chandraker, CVPR 2014

Take-home message
• Computer vision is a key branch of AI
• Enables several modern applications around us
• A lot of highly visible and high-impact activity
• Huge industry interest
• This is a great time to study computer vision!

Course Details

• Each class will cover papers in computer vision
• Examples of topics
  – Correspondence estimation
  – Optical flow
  – Stereo
  – Structure from Motion
  – Image classification
  – Face recognition
  – Object detection
  – Semantic segmentation
  – Action recognition
  – Behavior prediction
  – Human pose estimation
  – Material estimation
  – Adversarial learning
  – Domain adaptation

• Presentation instructions
  – Two students to present in each class
  – Discuss topic with instructor one week in advance
  – Send slides by 9 pm two days before the class
  – Allow speaking time of 30 minutes each (about 20 slides)
  – Presentation should be well-organized and thoughtful
  – Ask questions and encourage discussions along the way
• Each student does 1-2 presentations

• Presentation contents
  – Summarize the topic and how the papers address it
  – Why the topic is interesting, or difficulty of the problem
  – Motivate with applications
  – Key technical ideas, why they are interesting
  – Strengths and weaknesses of proposed methods
  – Detailed analysis of experiments
  – If possible, include own analysis based on author code
  – Open problems, extensions, likely follow-up papers

• Reviews
  – Day before class, send a brief review of 1 paper (email, plain text)
  – Email subject (exactly): “Wi18:CSE291:<PaperTitle>”
• Review format (follow exactly)
  – Summary of the paper (3-5 sentences)
  – Strengths
  – Weaknesses
  – Critique of experiments
  – Suggestions for improvement
  – Possible extensions or follow-ups (come up with at least one)
• Presenters need not send in review for that class
• Ask questions, answer them, engage in discussions

• Final project
  – Pick any topic in computer vision
  – Discuss with instructor to finalize topic and scope
  – Groups of three are recommended
  – Deadline: week 4
• Options for project
  – Pick a research topic and implement
  – Extend existing papers in interesting ways
  – Improve implementation of existing code from papers
  – Conduct critical survey, identifying possible extensions.

• Class webpage:
  – http://cseweb.ucsd.edu/~mkchandraker/classes/CSE291/Winter2018/
• Emails:
  – Instructor: [email protected]
  – TA: Shashank Shastry ([email protected])
• Grading
  – 20% presentation
  – 20% reviews
  – 30% project
  – 20% final exam
  – 10% participation
• Aim is to learn together, discuss and have fun!