
Visual Summary of Egocentric Photostreams by Representative Keyframes

Marc Bolaños, Ricard Mestre, Estefanía Talavera, Xavier Giró-i-Nieto and Petia Radeva

Motivation

Lifelogging wearable cameras can produce 1,500 images/day, i.e., more than 500,000 images/year.

Automatic summarization methods could help in many applications. In particular, we are working on:
● Memory aid for Mild Cognitive Impairment patients.
● Automatic nutrition diary.

Goal

Extract the visual summary of a whole day, capturing the most representative information for describing the day.

Storytelling:
● Have breakfast with the family
● Go for a walk
● Go shopping
● Take the bus
● Have a coffee with a friend

State of the Art

Lu, Zheng, and Kristen Grauman. "Story-driven summarization for egocentric video." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.

High temporal resolution egocentric data.
1. Event segmentation.
2. Detection of salient objects and people.
3. Subset selection of video shots based on:
   a. Story
   b. Importance
   c. Diversity

State of the Art

Doherty, Aiden R., et al. "Investigating keyframe selection methods in the novel domain of passively captured visual lifelogs." Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval. ACM, 2008.

Low temporal resolution egocentric data.
1. Event segmentation.
2. Selection of the keyframes comparing several methods (see the sketch below):
   a. Middle image of each segment.
   b. Image close to the average value in the segment (centroid-like).
   c. Image with highest "quality".
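As an illustration of baselines (a) and (b) above, here is a minimal Python sketch; representing each frame by a feature vector and interpreting "average value" as the mean descriptor are assumptions made for the example, not details taken from the cited paper.

```python
# Hedged sketch of two keyframe selection baselines for one event segment.
# `features` is an (N, D) array with one descriptor per frame of the segment.
import numpy as np

def middle_keyframe(features: np.ndarray) -> int:
    """Baseline (a): simply pick the temporally middle frame of the segment."""
    return len(features) // 2

def centroid_keyframe(features: np.ndarray) -> int:
    """Baseline (b): pick the frame closest to the segment's average descriptor."""
    centroid = features.mean(axis=0)
    return int(np.argmin(np.linalg.norm(features - centroid, axis=1)))
```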

Methodology (I)

Methodology (II)

Frames Characterization

Convolutional Neural Networks (CNN) trained on ImageNet.


Jia, Yangqing, et al. "Caffe: Convolutional architecture for fast feature embedding." Proceedings of the ACM International Conference on Multimedia. ACM, 2014.
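To make this step concrete, below is a minimal sketch of per-frame CNN feature extraction. It uses an ImageNet-pretrained model from torchvision rather than the Caffe pipeline cited above; the backbone choice and preprocessing are assumptions for illustration, not the exact configuration from the slides.

```python
# Hedged sketch: extract one global CNN descriptor per photostream frame.
# Assumption: torchvision stands in for the Caffe-based setup cited in the slide.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# ImageNet-pretrained backbone; the classifier is dropped so the penultimate
# activations serve as the frame descriptor.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def frame_features(image_paths):
    """Return an (N, D) tensor with one CNN feature vector per frame."""
    feats = []
    with torch.no_grad():
        for path in image_paths:
            img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
            feats.append(backbone(img).squeeze(0))
    return torch.stack(feats)
```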

Events Segmentation (I)

Applying agglomerative clustering and adapting the cut-off parameter, we can obtain a good segmentation of all the events in the day (see the sketch below).
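A minimal sketch of this segmentation step, assuming SciPy's hierarchical clustering over the per-frame CNN features; `cutoff` stands in for the cut-off parameter mentioned above, and the linkage method and metric are assumptions.

```python
# Hedged sketch: event segmentation by agglomerative clustering of frame features.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment_events(features: np.ndarray, cutoff: float) -> np.ndarray:
    """Group the (N, D) frame features into events; returns one cluster label per frame."""
    Z = linkage(features, method="average", metric="cosine")  # agglomerative clustering
    return fcluster(Z, t=cutoff, criterion="distance")        # cut the dendrogram at `cutoff`
```

Clustering by visual similarity alone can group frames that are far apart in time, which is what the Division-Fusion post-processing on the next slide addresses.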

Events Segmentation (II)

Division-Fusion post-processing to obtain a more robust segmentation.

(Figure panels: a) after agglomerative clustering, b) after Division, c) after Fusion.)

Division: splits into separately labelled events those visually similar frames that are spaced apart in time.
Fusion: merges very short sub-events not considered relevant enough (both steps are sketched below).
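A minimal sketch of the two post-processing steps, under the assumption that Division relabels each temporally contiguous run of identical cluster labels as its own event, and Fusion absorbs events shorter than a minimum length into the preceding event; the exact rules and thresholds in the original work may differ.

```python
# Hedged sketch: Division-Fusion post-processing on a per-frame label sequence.
import numpy as np

def division(labels: np.ndarray) -> np.ndarray:
    """Division: start a new event id whenever the cluster label changes, so that
    visually similar events spaced apart in time receive different labels."""
    events = np.zeros(len(labels), dtype=int)
    for i in range(1, len(labels)):
        events[i] = events[i - 1] + int(labels[i] != labels[i - 1])
    return events

def fusion(events: np.ndarray, min_len: int = 5) -> np.ndarray:
    """Fusion: merge events shorter than `min_len` frames into the previous event."""
    events = events.copy()
    for event_id, count in zip(*np.unique(events, return_counts=True)):
        idx = np.where(events == event_id)[0]
        if count < min_len and idx[0] > 0:
            events[idx] = events[idx[0] - 1]
    return events
```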

Keyframe Selection

Visual similarity-based keyframe selection criteria, computed from each event's distances matrix (sketched below):
● Minimum Distance.
● Random Walk over similarity-based probabilities.
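A minimal sketch of both criteria over one event. Here the Minimum Distance criterion picks the frame with the smallest summed distance to the rest of the event, and the Random Walk criterion scores frames by the stationary distribution of a Markov chain whose transitions are similarity-based probabilities; the similarity kernel and the exact scoring are assumptions rather than the formulation in the paper.

```python
# Hedged sketch: two keyframe selection criteria over one event's frames.
# `D` is the (N, N) pairwise distances matrix of the event's frame features.
import numpy as np

def min_distance_keyframe(D: np.ndarray) -> int:
    """Minimum Distance: frame with the smallest summed distance to all others."""
    return int(np.argmin(D.sum(axis=1)))

def random_walk_keyframe(D: np.ndarray, sigma: float = 1.0) -> int:
    """Random Walk: frame with the highest stationary probability of a Markov
    chain whose transition probabilities come from the similarities."""
    S = np.exp(-D / sigma)                  # distances -> similarities
    np.fill_diagonal(S, 0.0)
    P = S / S.sum(axis=1, keepdims=True)    # row-stochastic transition matrix
    pi = np.full(len(D), 1.0 / len(D))
    for _ in range(100):                    # power iteration to the stationary distribution
        pi = pi @ P
    return int(np.argmax(pi))
```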

Summary Results


Evaluation (I)

Datasets:
● 5 days
● 3 users
● 4,005 images
● Segmentation ground truth

Clustering:
● Jaccard Index (see the sketch below)

Talavera, E., Dimiccoli, M., Bolaños, M., Aghaei, M., & Radeva, P. "R-clustering for egocentric video segmentation." IbPRIA 2015, Santiago de Compostela, Spain. Proceedings (Vol. 9117, p. 327). Springer.
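A minimal sketch of a Jaccard-style segmentation score, assuming each ground-truth event is matched to the predicted event with the highest frame-level intersection-over-union and the per-event scores are averaged; the exact protocol used in the cited paper may differ.

```python
# Hedged sketch: Jaccard index between predicted and ground-truth event segmentations.
# `pred` and `gt` are per-frame event labels of equal length.
import numpy as np

def jaccard_segmentation(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average, over ground-truth events, of the best intersection-over-union
    achieved by any predicted event."""
    scores = []
    for g in np.unique(gt):
        g_mask = gt == g
        best = 0.0
        for p in np.unique(pred):
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            best = max(best, inter / union)
        scores.append(best)
    return float(np.mean(scores))
```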

Evaluation (II)

Keyframe Selection:
● Blind taste test with 30 users for quality evaluation.

Lu, Zheng, and Kristen Grauman. "Story-driven summarization for egocentric video." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.

(Figure: brandchannel.com)

Individual Keyframes Quality Evaluation (questionnaire):

Representative images of the event #1
● Do you think the image on the left can represent the event? (Yes / No)
● Do you think the image on the center can represent the event? (Yes / No)
● Do you think the image on the right can represent the event? (Yes / No)
● What is the most representative image of the event? (Left / Center / Right)

Evaluation (III)

Keyframe Selection: General Summary Quality Evaluation (questionnaire):

Visual summaries of the day
● Do you think that this set can summarize the whole day? (Yes / No)
● Finally, which one do you think is the best visual summary of the day? (Summary 1 / Summary 2 / Summary 3 / Summary 4)

Note shown to users: some of the summaries you will see might be very similar (differentiable only in some images). In that case you can choose any of them.

Evaluation - Individual Keyframes


What is the most representative image of the event?

Do you think that the image on the left/center/right can represent the event?

Evaluation - General Summary


Can this set of images represent the complete day? Which summary is the best, in your opinion?

Conclusions
● New keyframe selection methodology taking into account visual and temporal information.
● Keyframe selection using CNN-based global information and graph analysis.
● 86-88% user acceptance of our summaries.
● 58% of users chose our summaries as the best option.

Future Work
● Use semantic information (e.g. objects, people, actions).
● Clinical application on Mild Cognitive Impairment patients.