on-the-fly specific person retrieval

On-the-fly Specific Person Retrieval

University of Oxford 24th May 2012

Omkar M. Parkhi, Andrea Vedaldi and Andrew Zisserman

Overview

People“Barack Obama”“George Bush”“Courtney Cox”

Ranked ShotsTextual QueriesSearch for:

On-the-flyi.e. with no previous knowledgeor model for these queries

Large collection of

un-annotated videos

Scrubs Data Set

• 12 Episodes from Seasons 1-5 and 8

• 5 hours of video data

• About 400k frames, partitioned into 5k shots

• About 300k near frontal face detections

• 768 x 576 MPEG2 format

Demo

• Search for “Courteney Cox” in Scrubs dataset.

• Steps:

1. Download example images from Google

2. Train a ranking function

3. Apply ranking function to video collection

On the fly person retrieval systemText Query“Courteney

Cox” Negative Training Images

Video Collection

Face Tracks

Facial Features &Descriptors


Results

ON-LINE PROCESSING

OFF-LINE PROCESSING

Google Image Search

“Courteney Cox”

Facial Features &

Descriptors

Fast Linear Classifier

Ranking

Detection and Tracking

• Viola-Jones face detection on each frame• Tracking measures “connectedness” of a pair of faces by point

tracks intersecting both• Doesn’t require contiguous detections• No drift• Faces clustered into tracks

[Everingham et al. 2006, Apostoloff & Zisserman, 2007]

Scrubs Data Set

• 12 Episodes from Seasons 1-5 and 8

• 5 hours of video data

• About 400k frames, partitioned into 5k shots

• 300k face detections

• 6k face tracks

• 768 x 576 MPEG2 format

Detecting facial feature points

• Pictorial structure model

• Joint model of feature appearance and position

[Felzenszwalb and Huttenlocher’2004, Everingham et al. 2006]

Face Appearance Representation

Affine transformation of face to canonical frame Independent photometric normalization of parts Represent gradients over circle centred on facial feature points Feature descriptor is a 3849 dimensional vector

[Everingham et al. 2006]

Negative Training Images

• Combination of faces from• Random downloaded images• Labeled Faces in the Wild dataset• Caltech Faces dataset

• About 16k face detections.

Caltech 10, 000 Web Faces:http://www.vision.caltech.edu/Image_Datasets/Caltech_10K_WebFaces/Labeled Faces in the Wild:http://vis-www.cs.umass.edu/lfw/

On-the-fly Person RetrievalText Query“Courteney

Cox” Negative Training Images

Video Collection

Face Tracks



Results

ON-LINE PROCESSING

OFF-LINE PROCESSING

Google Image Search

“Courteney Cox”

Facial Features &

Descriptors

Fast Linear Classifier

Ranking

TRECVid 2011 (IACC.1.B)

• About 200 hours of video data.

• 8k videos.

• MPEG4, 320x240 pixels

• 130k shots,

• About 3 million face detections

• 25,535 face tracks.

Facial attributes – FaceTracer project

Examples: gender: male, female age: baby, child, youth, middle age, senior race: white, black, asian smiling, mustache, eye-wear, hair colour

N. Kumar, P. N. Belhumeur and S. K. Nayar, FaceTracer: A Search Engine for Large Collections of Images with Faces,European Conference on Computer Vision (ECCV), 2010http://www.cs.columbia.edu/CAVE/projects/face_search/

Method

• person independent training set with attribute• facial feature representation• discriminative training of classifier for attribute

Quantitative Performance - Scrubs Dataset

• Performance evaluation for 3 guest actors (Brendan Fraser, Courteney Cox and Michael J Fox)

• 12 dataset videos split into training and test sets (3 Training, 9 Testing)• Annotations:

• Manual labeling of training and test set for each actor• Manual labeling of positive training images from Google• Negative training images from Caltech Faces dataset.


• Retrieval Average Precision (AP)

Training Examples Source Average Precision

Positive Negative Brendan Fraser

Courteney Cox

Michael J Fox

Scrubs Scrubs 0.56 0.88 0.49

Google Scrubs 0.25 0.62 0.52Google Caltech 0.41 0.56 0.57

Brendan Fraser Courteney Cox Michael J Fox0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

+ve=Scrubs -ve=Scrubs+ve=Google -ve=Scrubs+ve=Google -ve=Caltech


• Using more training data per track

Training Examples Source # samples

per trackAverage Precision

Positive NegativeBrendan Fraser

Courteney Cox

Michael J Fox

Scrubs Scrubs Single 0.56 0.88 0.49Scrubs Scrubs Multiple 0.6 0.88 0.53

Brendan Fraser Courteney Cox Michael J Fox0

0.10.20.30.40.50.60.70.80.9

1

+ve=Scrubs -ve=Scrubs Single+ve=Scrubs -ve=Scrubs Multiple

Future Work

• Exploring sources for positive examples

• Better feature representations

• Combination of attributes and identities

Any Questions?

on-the-fly specific person retrieval

Documents

normalized face

scrubs dataset

frontal face detections768

facial feature appearance

imageslabeled faces

pair of faces

web faces

tracks everingham