machine learning for visual analytics and compression

15
1 © 2018 NYU WIRELESS 1 Machine Learning for Visual Analytics and Compression YAO WANG, PROFESSOR, ELECTRICAL AND COMPUTER ENGINEERING AND BIOMEDICAL ENGINEERING TANDON SCHOOL OF ENGINEERING NEW YORK UNIVERSITY HTTPS:// WP.NYU.EDU / VIDEOLAB / OCT. 10, 2018

Upload: others

Post on 13-Apr-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Machine Learning for Visual Analytics and Compression

1© 2018 NYU WIRELESS 1

Machine Learning for Visual Analytics and Compression

YAO WANG, PROFESSOR, ELECTRICAL AND COMPUTER ENGINEERING AND BIOMEDICAL ENGINEERINGTANDON SCHOOL OF ENGINEERINGNEW YORK UNIVERSITYHTTPS:/ /WP.NYU.EDU/VIDEOLAB/OCT. 10, 2018

Page 2: Machine Learning for Visual Analytics and Compression

• Joint optimization of video coding and delivery in networked video applications

• 360 degree video streaming

• Medical image analysis and applications

• Video analytics

• Video coding and video adaptation

• Perceptual video quality modeling

2

Current and Recent Research Interests

Page 3: Machine Learning for Visual Analytics and Compression

Robust Vehicle Tracking at UrbanIntersections

3

• Challenges• Severe occlusion in dense traffic• Vanishing point (non-bird eye view)• Shadows and illumination changes

• Developing a deep learning network that can simultaneously detect and track a video object

• Detect bounding tubes that cover moving objects in short video segments• Extension of faster region-CNN, which detects bounding boxes in

individual frames

• Thanks: Chenge Li, Yilin Song

Page 4: Machine Learning for Visual Analytics and Compression

State of Art: Region-CNN• Extract features from an

image (e.g. VGG)• Generate object

proposals (bounding boxes of vary size)• Each proposal specified

by box position parameters and objectness score

• Refine proposals and classify each detected box

S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towardsreal-time object detection with region proposal networks. InAdvances in neural information processing systems, pages91–99, 2015

Page 5: Machine Learning for Visual Analytics and Compression

Extending Region-CNN For Moving Object Detection in Video

• Consider a video segment consisting of multiple frames

• Use 3D and 2D convolution for feature extraction (C3D and VGG)

• Generate object proposals (bounding tubes of various sizes and orientations)

• Refine proposals and classify each detected tube (car, van, bus, pedestrian, …)

Page 6: Machine Learning for Visual Analytics and Compression
Page 7: Machine Learning for Visual Analytics and Compression

Chenge Li, Gregory Dobler, Yilin Song, Xin Feng, Yao Wang, "TrackNet: Simultaneous Detection and Tracking of Multiple Objects”, http://vision.poly.edu/index.html/uploads/Li_trackNet.pdf

Page 8: Machine Learning for Visual Analytics and Compression

Deep Learning for Image Compression

8

• State-of-art learnt image coding: train different networks for different target rates.

• Our goal: train a single multi-layer network that can provide optimal performance at multiple bit rates or generate layered bit streams

• Each layer learns additional features to reconstruct the residue, optimized for best rate distortion performance

• Important for networked video applications with dynamically varying bandwidths

• Thanks: Chuanmin Jia and Zhaoyi Liu

Page 9: Machine Learning for Visual Analytics and Compression

Auto-Encoder Based Image Compression

9

Loss function considers both distortion and rate:𝐿𝐿 𝜃𝜃𝑓𝑓,𝜃𝜃𝑔𝑔,𝜃𝜃𝑟𝑟 = 𝐷𝐷 𝑥𝑥,𝑔𝑔 𝑓𝑓 𝑥𝑥; 𝜃𝜃𝑓𝑓 + 𝑞𝑞 ; 𝜃𝜃𝑔𝑔 + λ𝑅𝑅(𝑓𝑓 𝑥𝑥;𝜃𝜃𝑓𝑓 + 𝑞𝑞; 𝜃𝜃𝑟𝑟)• Train multiple networks using different λ, to reach different rate-distortion points• Take the set of λ that achieve the lower convex hull of resulting rate-distortion points• Complete different networks for different target rates (:

𝜃𝜃𝑓𝑓 𝜃𝜃𝑔𝑔

𝜃𝜃𝑟𝑟

Page 10: Machine Learning for Visual Analytics and Compression

Scalable Auto Encoder

10

Page 11: Machine Learning for Visual Analytics and Compression

Preliminary Results

11[GDN] Ballé, Johannes, Valero Laparra, and Eero P. Simoncelli. "End-to-end optimized image compression." arXiv preprint arXiv:1611.01704 (ICLR 2017).

Page 12: Machine Learning for Visual Analytics and Compression

Learning for compression and analytics simultaneously

July 2018 12

• No need to decompress and then do visual analytics at receiver• Camera site can do compression and visual analytics

simultaneously• We choose saliency detection as a generic analytics task in the

hope that the learnt features are good for other analytics tasks• The trained compression branch has the potential to yield better

visual quality because the encoded features should contain visually important features.

• Jointly optimize for both rate-distortion and saliency prediction• Thanks: Alp Aygar, Chuanmin Jia

Page 13: Machine Learning for Visual Analytics and Compression

Network Architecture

13

Page 14: Machine Learning for Visual Analytics and Compression

Preliminary Results

15

Left: predicted saliency map, middle: ground truth, right: input color image. Training and testing Data from http://salicon.net

Page 15: Machine Learning for Visual Analytics and Compression

16

Acknowledgement

NYU WIRELESS Industrial Affiliates