Visual Summary of Egocentric Photostreams by Representative Keyframes
Marc Bolaños, Ricard Mestre, Estefanía Talavera, Xavier Giró-i-Nieto and Petia Radeva
Motivation
Lifelogging wearable cameras can produce 1,500 images/day, more than 500,000 images/year.
Automatic summarization methods could help in many applications. In particular, we are working on:
● Memory aid for Mild Cognitive Impairment patients.
● Automatic nutrition diary.
Goal
Extract a visual summary of a whole day that captures the most representative information for describing the day.
Storytelling
● Have breakfast with the family
● Go for a walk
● Go shopping
● Take the bus
● Have a coffee with a friend
State of the Art
Lu, Zheng, and Kristen Grauman. "Story-driven summarization for egocentric video." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.
High temporal resolution egocentric data.
1. Event segmentation.
2. Detection of salient objects and people.
3. Subset selection of video shots based on:
   a. Story
   b. Importance
   c. Diversity
State of the Art
Doherty, Aiden R., et al. "Investigating keyframe selection methods in the novel domain of passively captured visual lifelogs." Proceedings of the 2008 international conference on Content-based image and video retrieval. ACM, 2008.
Low temporal resolution egocentric data.
1. Event segmentation.
2. Selection of the keyframes comparing several methods:
   a. Middle image of each segment.
   b. Image close to the average value in the segment (centroid-like).
   c. Image with highest “quality”.
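The first two baseline criteria listed above can be sketched in a few lines. This is only an illustration, not the cited authors' code: the per-image feature vectors, the function names, and the tie-breaking for even-length segments are assumptions, and the “quality” criterion is left out because the slide does not define it.

```python
def middle_keyframe(segment):
    """Index of the temporally middle image (upper middle for even lengths: an assumption)."""
    return len(segment) // 2


def centroid_keyframe(segment):
    """Index of the image whose feature vector is closest to the segment mean."""
    dims = len(segment[0])
    mean = [sum(f[d] for f in segment) / len(segment) for d in range(dims)]

    def squared_distance(f):
        return sum((x - m) ** 2 for x, m in zip(f, mean))

    return min(range(len(segment)), key=lambda i: squared_distance(segment[i]))


# Hypothetical 2-D features for the five images of one segment, in temporal order.
segment = [(0.0, 0.0), (4.0, 0.0), (5.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
middle_index = middle_keyframe(segment)
centroid_index = centroid_keyframe(segment)
```

The two criteria can disagree: the middle image depends only on capture order, while the centroid-like image depends on appearance.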
Frames Characterization
Convolutional Neural Networks (CNN) trained on ImageNet.
Jia, Yangqing, et al. "Caffe: Convolutional architecture for fast feature embedding." Proceedings of the ACM International Conference on Multimedia. ACM, 2014.
Events Segmentation ( I )
By applying agglomerative clustering and adapting the cut-off parameter, we can obtain a good segmentation of all the events in the day.
cut-off parameter
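To make the role of the cut-off parameter concrete, here is a minimal pure-Python sketch of agglomerative clustering that stops merging once the closest pair of clusters is farther apart than the cut-off. The average linkage, the Euclidean distance, and the toy 2-D features are assumptions for illustration; the slides do not state which linkage or metric was used.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_labels(features, cutoff):
    """Average-linkage agglomerative clustering stopped at a distance cut-off."""
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        # Average pairwise distance between two clusters (assumed linkage).
        total = sum(euclidean(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cutoff:
            break  # the closest pair is already too far apart: stop merging
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy stand-ins for CNN descriptors, in temporal order.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (0.05, 0.05)]
labels = agglomerative_labels(features, cutoff=1.0)
```

A lower cut-off yields more, smaller clusters and a higher one yields fewer, larger clusters. Note that the last image is grouped with the first three even though it is separated from them in time.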
Events Segmentation ( II )
Division-Fusion post-processing to obtain a more robust segmentation.
a) After Agglomerative Clustering
b) After Division
c) After Fusion
Division: splits similar events that are separated in time and labels them differently.
Fusion: merges very short sub-events not considered relevant enough.
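The two post-processing steps can be sketched on a sequence of per-image cluster labels in temporal order. This is a simplified reading of the slide, not the authors' implementation: the `min_length` threshold and the rule of absorbing a short run into its previous neighbour (or the next one, for the first run) are assumptions.

```python
def divide(labels):
    """Division: give every temporally contiguous run of a label its own segment id."""
    segments = []
    current = -1
    previous = object()  # sentinel that never equals a real label
    for label in labels:
        if label != previous:
            current += 1
            previous = label
        segments.append(current)
    return segments


def fuse(segments, min_length):
    """Fusion: absorb runs shorter than min_length into a neighbouring run."""
    # Collect runs as [start, end) index pairs.
    runs = []
    for i, s in enumerate(segments):
        if runs and segments[i - 1] == s:
            runs[-1][1] = i + 1
        else:
            runs.append([i, i + 1])

    kept = []
    for start, end in runs:
        if end - start < min_length and kept:
            kept[-1][1] = end  # assumed rule: extend the previous run
        else:
            kept.append([start, end])
    # A short first run has no previous neighbour, so merge it into the next one.
    if len(kept) > 1 and kept[0][1] - kept[0][0] < min_length:
        kept[1][0] = kept[0][0]
        kept.pop(0)

    fused = [0] * len(segments)
    for segment_id, (start, end) in enumerate(kept):
        for i in range(start, end):
            fused[i] = segment_id
    return fused


cluster_labels = [0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 1, 1]
divided = divide(cluster_labels)
fused = fuse(divided, min_length=2)
```

In this toy sequence, division separates the two occurrences of cluster 0 into different events, and fusion then absorbs the single-image event into its neighbour.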
Keyframe Selection
Visual similarity-based keyframe selection criteria.
Distances Matrix
Random Walk
Minimum Distance
Similarity-based probabilities
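One plausible reading of the two criteria named on this slide is sketched below for a single event. Minimum Distance is taken as the image with the smallest summed distance to all others; Random Walk is taken as the most visited node of a walk whose transition probabilities come from pairwise similarities. The `exp(-d)` kernel, the fixed number of steps, and the toy features are assumptions, and the slides do not confirm these details.

```python
import math


def distance_matrix(features):
    return [[math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))) for b in features]
            for a in features]


def minimum_distance_keyframe(dist):
    """Index of the image with the smallest total distance to all others."""
    totals = [sum(row) for row in dist]
    return min(range(len(dist)), key=totals.__getitem__)


def random_walk_keyframe(dist, steps=100):
    """Index of the node with the highest visiting probability after a random walk."""
    n = len(dist)
    if n == 1:
        return 0
    # Similarity-based transition probabilities; exp(-d) is an assumed kernel.
    sims = [[math.exp(-dist[i][j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]
    probs = [[s / sum(row) for s in row] for row in sims]
    visit = [1.0 / n] * n  # start the walk from a uniform distribution
    for _ in range(steps):
        visit = [sum(visit[i] * probs[i][j] for i in range(n)) for j in range(n)]
    return max(range(n), key=visit.__getitem__)


# Hypothetical 2-D features of the four images in one event.
event = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
dist = distance_matrix(event)
min_dist_index = minimum_distance_keyframe(dist)
random_walk_index = random_walk_keyframe(dist)
```

On this toy event both criteria pick the central image; on real data they can differ, which is presumably why the two are listed as separate methods.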
Evaluation ( I )
Datasets
● 5 days
● 3 users
● 4005 images
● Segmentation ground truth
Talavera, E., Dimiccoli, M., Bolaños, M., Aghaei, M., & Radeva, P. R-clustering for egocentric video segmentation. IbPRIA 2015, Santiago de Compostela, Spain. Proceedings (Vol. 9117, p. 327). Springer.
Clustering
● Jaccard Index
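As an illustration of how a Jaccard index can score a segmentation against ground truth, the sketch below uses the pair-counting variant: the number of image pairs grouped together in both labelings divided by the number grouped together in at least one. Whether the evaluation used this variant or a per-segment set overlap is not stated on the slide, so treat the choice as an assumption.

```python
from itertools import combinations


def pairwise_jaccard(predicted, truth):
    """Pair-counting Jaccard index between two labelings of the same images."""
    both = either = 0
    for i, j in combinations(range(len(truth)), 2):
        same_predicted = predicted[i] == predicted[j]
        same_truth = truth[i] == truth[j]
        if same_predicted and same_truth:
            both += 1
        if same_predicted or same_truth:
            either += 1
    # With no co-clustered pairs at all, the labelings agree trivially.
    return both / either if either else 1.0


truth = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1]  # one image assigned to the wrong event
score = pairwise_jaccard(predicted, truth)
```

Here 4 pairs are grouped together in both labelings and 9 in at least one, so the score is 4/9; identical segmentations score 1.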
Evaluation ( II )
Keyframe Selection
● Blind taste test with 30 users for quality evaluation.
Lu, Zheng, and Kristen Grauman. "Story-driven summarization for egocentric video." Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.
Figure: brandchannel.com
Individual Keyframes Quality Evaluation
Representative images of the event #1
Do you think the image on the left can represent the event? Yes / No
Do you think the image on the center can represent the event? Yes / No
Do you think the image on the right can represent the event? Yes / No
What is the most representative image of the event? Left / Center / Right
Evaluation ( III )
Keyframe Selection
General Summary Quality Evaluation
Visual summaries of the day
Some of the summaries you will see might be very similar (differentiable only in some images). In that case you can choose any of them.
Summary 1
Do you think that this set can summarize the whole day? Yes / No
Finally, which one do you think is the best visual summary of the day? Summary 1 / Summary 2 / Summary 3 / Summary 4
Evaluation - Individual Keyframes
What is the most representative image of the event?
Do you think that the image on the left/center/right can represent the event?
Evaluation - General Summary
Can this set of images represent the complete day? Which summary is the best, in your opinion?
Conclusions
● New keyframe selection methodology taking into account visual and temporal information.
● Keyframe selection using CNN-based global information and graph analysis.
● 88-86% user acceptance of our summaries.
● 58% of users chose our summaries as the best option.
Future Work
● Use semantic information (e.g. objects, people, actions).
● Clinical application on Mild Cognitive Impairment patients.