ece 5582 computer vision - sce.umkc.edu · ieee computer 1995 special issue on content based image...

Spring 2019: Venu: Haag 315, Time: M/W 4-5:15pm

ECE 5582 Computer VisionLec 01: Introduction

Zhu LiDept of CSEE, UMKC

Office: FH560E, Email: [email protected], Ph: x 2346.http://l.web.umkc.edu/lizhu

Z. Li: ECE 5582 Computer Vision, 2019. p.1

slides created with WPS Office Linux and EqualX LaTex equation editor

Outline

Background Objective of the class Prerequisite Lecture Plan Course Project Q&A


An image is worth a thousand words….

What we observe are pixels…. The story: The train wreck at La Gare

Montparnasse, 1895

What computer can do these days: Figure out the building The train People walking around

Still long way to go to figure out the semantics Train crashes It is an abnormal event (context)


La Gare Montparnasse, 1895

Advances in Image Sensors: pixels and voxels

p.4Z. Li: ECE 5582 Computer Vision, 2019.

More than 20 years of Image Retrieval Research…

IEEE Computer 1995 Special Issue on Content Based Image Retrieval (CBIR)


Dr. Raghavan, VijayDistinguished ProfessorCenter for Adv. Computer StudiesUniv of Louisiana at Lafayette

NSF Digital Librararies Initiative Relevance Feedback in CBIR


MPEG-7 Visual Features (Circa 2003)

Color, Shape, Texture Features for Image Search


Color Texture MotionShape

• Contour Shape

• Region Shape

• Camera motion

• Motion Trajectory

• Parametric motion

• Motion Activity

• Texture Browsing

• Homogeneous texture

• Edge Histogram

1. Histogram

• Scalable Color

• Color Structure

• GOF/GOP

2. Dominant Color

3. Color Layout

ImageNet - Deep Learning Classification

Tasks: Image Classification, Object Detection & Localization 2012: Fisher Vector 2013: Deep Learning ~ Conv Neural Networks (CNN) .e.g. AlexNet 2016: (Very) Deep Learning ~ Residual Neural Networks (ResNet),

K. He, MSRA/FAIR.


MPEG CDVS - Identification Compact Descriptor for Visual Search (CDVS) Object Re-Identification Applications: Navigation, Query by Capture, AR/VR

Technology: Key Point (SIFT) detection Fisher Vector Aggregation and Hashing (for shortlisting) SIFT compression

Performance Verification: 90+% precision on 1% recall Retrieval : mAP in 80~90%.


Depth Sensing/SLAM

Key problems for auto driving cars• Depth from Stereo Images• Optical Flow• Scene Flow• 2D/3D data fusion and registration• Image/3D features for SLAM • Higher level syntactic object/event

recognition


Image Analysis Pipeline Handcrafted Feature Based


Pixels Features Aggregation/Dimension Reduction

Classifier

{Cat, Dog}

Image Analysis Pipeline Holistic Image Analysis Direction Pixel Projection Subspace Models

Convolutional Neural Networks


h

w

X in Rhxw

Y=AX

Outline

Background Objective of the class Prerequisite Lecture Plan Course Project Q&A


Prerequisite & Text book Prerequisite

For senior and graduate students in EE/CS Good Matlab/C programming skills. Some Python is

also desirable. Taken Signal & System, or Digital Signal Processing or

consent of the instructor Will have different expectation for MS/PhD and

undergrad students Textbook:

None required (saving $$) , will distribute relevant chapters, papers, and notes.

Key References: R. Szeliski, Computer Vision: Algorithms and

Applications, Springer, 2014. URL: http://szeliski.org/Book/

J. E. Solem, Programming Computer Vision with Python, O’Reilly, 2015. URL: http://programmingcomputervision.com/downloads/ProgrammingComputerVision_CCdraft.pdf


Tentative Lecture Plan Image Processing Basics

Camera model and image formation Image filtering

Image Features for Retrieval Color Features Texture and Shape Features Basic Image Retrieval System and

Metrics Object Identification in Image

Key Point Detection Key Point Feature Description Fisher Vector Aggregation MPEG Mobile Visual Search

Technology and Standard Holistic Approach in Image

Understanding Subspace methods for face recognition:

Eigenface, Fisherface, Laplacianface. Deep Learning in Image Understanding


Homework 1: Image Filtering and Features

Homework 2: Image Retrieval System

Homework 3:SIFT/VGG Feature Aggregation

Homework 4:Subspace and deep learningmethods in face recognition

Potential Course/MS thesis Project Resources from last year: http://sce2.umkc.edu/csee/lizhu/teaching/2017.spring.image-

analysis/main.html

Potential projects with 25% bonus points Boosting based key point feature detection via box filtering for

Hyperspectral images Compact deep learning models for embedded vision Image + Point Cloud based recognition Image registration with point cloud Image Compression for vision tasks


Embedded Deep Learning for Image Understanding

• Automatic Image Tagging:• Mapping image pixel values to tags• Device based recognition:

• Distilling the knowledge into a compact form• Compact CNN model for knowledge distilling

– BIGDATA training set: >10k images per tag– training set augmentation:

» affine transforms, simulated lighting changes– Pruning filters from the CNN structure


Immersive Visual Communication

Light Field Compression (6-DoF) Sub-Aperture Image Based Tensorial Display Decomposition Based

Point Cloud Compression (6-DoF) Video Based: Views + Depth Binary Tree/Octree Geometry Coding + Graph Signal Compression

360 Video Compression (3-DoF) Advanced Motion Model: Affine motion, Spherical motion models Padding


Deep Learning in Video Compression Deep Learning Chroma prediction (from Luma)

Results: BD rate:

p.19

C1:128@16×16

ConvolutionKernel: 5×5 Convolution

Kernel: 3×3,5×5

C2: (96+32)@16×16

C3: (96+32)@16×16C4: 2@16×16

ConvolutionKernel: 1×1,3×3

ConvolutionKernel: 3×3

F1: 256@1×1F2: 196@1×1

F3: 128@1×1

Tile: 128@16×16

Fusion: 128@16×16

Full Connection Full Connection

Full Connection Tiling

Element wise Product

Input: 1@16×16

Input: 99@1×1

Down-sampled Reconstructed

Luma

Neighboring Reconstructed

Luma & Chroma

Predicted Chroma

deep chroma predicted blocks

Z. Li: ECE 5582 Computer Vision, 2019.

Deep Learning in IBC prediction

IBC with deep learning IBC matching block fuse with intra ref

column/row pixels

Performance :

p.20

Overall, 1.3%, 1.4%, 1.6% bits saving for Y, U, V component, respectively.

Peak performance on Luma component on BQTerrace achieves 3.6% BD-Rate reduction.


Low Light Image Denoising & Understanding

Use cases Low light imaging Low light image understanding

Deep Scaling & Residual learning


Results

Low light image denoising

p.22

SID Unet

Ours


Course Outcome

Upon completion of the course you will be able to: Understand the basic operations in image formation and filtering Understand basic image features for retrieval: color, shape, texture Understand key point features and aggregation in object

identification Understand the holistic appearance modeling approach in image

understanding Understand the latest image analysis and understanding techniques

like deep learning . Can apply the knowledge an algorithms to solve real world image

understanding and retrieval problems Well prepared for conducting advanced research and pursing

career/PhD in this topic area. (PhD qualify required course)


Grading

Homeworks (40%) Image Filtering and Basic Features Image Retrieval System and Performance Metrics Key Point Feature and Fisher Vector Aggregation in Object

Identification Subspace Models in Image Understanding

2 Quizzes (20%) : relax, quiz is actually on me, to see where you guys stand Quiz-1: Part I and II Quiz-2: Part III and IV

Project (40%) Original work leads to publication, discuss with me by the mid of

October. (25% bonus point, worth 50 points) Regular project: assign papers to read, implement certain aspect, and

do a presentation.


Logistics

Office Hour: Tue/Thr 2:30-4:00pm, 560E FH Or by appointment

TA: TBD Lab Sessions are planned to cover certain software tools aspects. Office Hour: TBA

Course Resources: Will share a box.com folder with slides, references, data set, and

software Additional reference, software, and data set will be announced.


NSF Center for Big Learning

Short Bio:

Research Interests: Immersive visual communicaiton: light field, point

cloud and 360 video coding and low latency streaming

Low Light, Res and Quality Image Understanding What DL can do for compression (intra, ibc, sr, inter,

end2end) What compression can do for DL (compression,

acceleration)

signal processing and learning

image understanding visual communication mobile edge computing & communication

p.26

Multimedia Computing & Communication LabUniv of Missouri, Kansas City


Q&A

Q&A


ece 5582 computer vision - sce.umkc.edu · ieee computer 1995 special issue on content based image...

Documents