on-the-fly specific person retrieval
DESCRIPTION
On-the-fly Specific Person Retrieval. Omkar M. Parkhi, Andrea Vedaldi and Andrew Zisserman. 24 th May 2012. University of Oxford. Overview. Textual Queries. Ranked Shots. Search for:. Large collection of u n-annotated videos. People “Barack Obama” “George Bush” “Courtney Cox”. - PowerPoint PPT PresentationTRANSCRIPT
On-the-fly Specific Person Retrieval
University of Oxford 24th May 2012
Omkar M. Parkhi, Andrea Vedaldi and Andrew Zisserman
Overview
People“Barack Obama”“George Bush”“Courtney Cox”
Ranked ShotsTextual QueriesSearch for:
On-the-flyi.e. with no previous knowledgeor model for these queries
Large collection of
un-annotated videos
Scrubs Data Set
• 12 Episodes from Seasons 1-5 and 8
• 5 hours of video data
• About 400k frames, partitioned into 5k shots
• About 300k near frontal face detections
• 768 x 576 MPEG2 format
Demo
• Search for “Courteney Cox” in Scrubs dataset.
• Steps:
1. Download example images from Google
2. Train a ranking function
3. Apply ranking function to video collection
DEMO
On the fly person retrieval systemText Query“Courteney
Cox” Negative Training Images
Video Collection
Face Tracks
Facial Features &Descriptors
Facial Features &Descriptors
Results
ON-LINE PROCESSING
OFF-LINE PROCESSING
Google Image Search
“Courteney Cox”
Facial Features &
Descriptors
Fast Linear Classifier
Ranking
Detection and Tracking
• Viola-Jones face detection on each frame• Tracking measures “connectedness” of a pair of faces by point
tracks intersecting both• Doesn’t require contiguous detections• No drift• Faces clustered into tracks
[Everingham et al. 2006, Apostoloff & Zisserman, 2007]
Scrubs Data Set
• 12 Episodes from Seasons 1-5 and 8
• 5 hours of video data
• About 400k frames, partitioned into 5k shots
• 300k face detections
• 6k face tracks
• 768 x 576 MPEG2 format
Detecting facial feature points
• Pictorial structure model
• Joint model of feature appearance and position
[Felzenszwalb and Huttenlocher’2004, Everingham et al. 2006]
Face Appearance Representation
Affine transformation of face to canonical frame Independent photometric normalization of parts Represent gradients over circle centred on facial feature points Feature descriptor is a 3849 dimensional vector
[Everingham et al. 2006]
Negative Training Images
• Combination of faces from• Random downloaded images• Labeled Faces in the Wild dataset• Caltech Faces dataset
• About 16k face detections.
Caltech 10, 000 Web Faces:http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/Labeled Faces in the Wild:http://vis-www.cs.umass.edu/lfw/
On-the-fly Person RetrievalText Query“Courteney
Cox” Negative Training Images
Video Collection
Face Tracks
Facial Features &Descriptors
Facial Features &Descriptors
Results
ON-LINE PROCESSING
OFF-LINE PROCESSING
Google Image Search
“Courteney Cox”
Facial Features &
Descriptors
Fast Linear Classifier
Ranking
DEMO
TRECVid 2011 (IACC.1.B)
• About 200 hours of video data.
• 8k videos.
• MPEG4, 320x240 pixels
• 130k shots,
• About 3 million face detections
• 25,535 face tracks.
DEMO
Facial attributes – FaceTracer project
Examples: gender: male, female age: baby, child, youth, middle age, senior race: white, black, asian smiling, mustache, eye-wear, hair colour
N. Kumar, P. N. Belhumeur and S. K. Nayar, FaceTracer: A Search Engine for Large Collections of Images with Faces,European Conference on Computer Vision (ECCV), 2010http://www.cs.columbia.edu/CAVE/projects/face_search/
Method
• person independent training set with attribute• facial feature representation• discriminative training of classifier for attribute
DEMO
Quantitative Performance - Scrubs Dataset
• Performance evaluation for 3 guest actors (Brendan Fraser, Courteney Cox and Michael J Fox)
• 12 dataset videos split into training and test sets (3 Training, 9 Testing)• Annotations:
• Manual labeling of training and test set for each actor• Manual labeling of positive training images from Google• Negative training images from Caltech Faces dataset.
Quantitative Performance - Scrubs Dataset
• Retrieval Average Precision (AP)
Training Examples Source Average Precision
Positive Negative Brendan Fraser
Courteney Cox
Michael J Fox
Scrubs Scrubs 0.56 0.88 0.49
Google Scrubs 0.25 0.62 0.52Google Caltech 0.41 0.56 0.57
Brendan Fraser Courteney Cox Michael J Fox0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
+ve=Scrubs -ve=Scrubs+ve=Google -ve=Scrubs+ve=Google -ve=Caltech
Quantitative Performance - Scrubs Dataset
• Using more training data per track
Training Examples Source # samples
per trackAverage Precision
Positive NegativeBrendan Fraser
Courteney Cox
Michael J Fox
Scrubs Scrubs Single 0.56 0.88 0.49Scrubs Scrubs Multiple 0.6 0.88 0.53
Brendan Fraser Courteney Cox Michael J Fox0
0.10.20.30.40.50.60.70.80.9
1
+ve=Scrubs -ve=Scrubs Single+ve=Scrubs -ve=Scrubs Multiple
Future Work
• Exploring sources for positive examples
• Better feature representations
• Combination of attributes and identities
Any Questions?