TRECVID Evaluations
Mei-Chen Yeh
03/27/2012
Introduction
• Text REtrieval Conference (TREC)
  – Organized by the National Institute of Standards and Technology (NIST)
  – Supported by government agencies
  – Annual evaluation (NOT a competition)
  – Different “tracks” over the years, e.g. web retrieval, email spam filtering, question answering, routing, spoken documents, OCR, video (a standalone conference since 2001)
• TREC Video Retrieval Evaluation (TRECVID)
Introduction
• Objectives of TRECVID
  – Promote progress in content-based analysis and retrieval from digital videos
  – Provide open, metrics-based evaluation
  – Model real-world situations
Introduction
• Evaluation is driven by participants
• The collection is fixed and available in the spring
  – 50% of the data is used for development, 50% for testing
• Test queries are available in July, with one month to submission
• More details: http://trecvid.nist.gov/
TRECVID Video Collections
• Test data
  – Broadcast news
  – TV programs
  – Surveillance videos
  – Video rushes provided by the BBC
  – Documentary and educational materials supplied by the Netherlands Institute for Sound and Vision (2007-2009)
  – Gatwick airport surveillance videos provided by the UK Home Office (2009)
  – Web videos (2010)
• Languages
  – English
  – Arabic
  – Chinese
Collection History
• 2011
  – 19,200 online videos (150 GB, 600 hours)
  – 50 hours of airport surveillance videos
• 2012
  – 27,200 online videos (200 GB, 800 hours)
  – 21,000 equal-length short clips of BBC rush videos
  – Airport surveillance videos (not yet announced)
  – ~4,000-hour collection of Internet multimedia
Tasks
• Semantic indexing (SIN)
• Known-item search (KIS)
• Content-based copy detection (CCD) – through 2011
• Interactive surveillance event detection (SED)
• Instance search (INS)
• Multimedia event detection (MED)
• Multimedia event recounting (MER) – since 2012
Semantic indexing
• System task:
  – Given the test collection, master shot reference, and concept definitions, return for each concept a list of at most 2000 shot IDs from the test collection, ranked according to their likelihood of containing the concept.
• 500 concepts (since 2011)
• “Concept pair” queries (2012)
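To make the output format concrete, here is a minimal Python sketch of how a system might produce the ranked shot list for one concept. The classifier score is a random placeholder and the shot IDs are hypothetical; only the ranking and the 2000-shot cap come from the task definition above.

import random

def score_shot(shot_id, concept):
    """Placeholder concept classifier returning a pseudo-confidence in [0, 1].
    A real system would run a trained visual-concept detector here."""
    random.seed(hash((shot_id, concept)))
    return random.random()

def rank_shots_for_concept(shot_ids, concept, max_results=2000):
    """Rank test-collection shots by decreasing confidence that they
    contain the concept, keeping at most max_results (the SIN limit)."""
    scored = [(score_shot(s, concept), s) for s in shot_ids]
    scored.sort(reverse=True)
    return [shot_id for _, shot_id in scored[:max_results]]

# Hypothetical master-shot-reference IDs such as "shot12_3".
shots = [f"shot{v}_{k}" for v in range(1, 101) for k in range(1, 31)]
print(rank_shots_for_concept(shots, "Handshaking")[:5])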
Examples
• Boy (one or more male children)
• Teenager
• Scientists (images of people who appear to be scientists)
• Dark-skinned people
• Handshaking
• Running
• Throwing
• Eaters (putting food or drink in his/her mouth)
• Sadness
• Anger
• Windy (scenes showing windy weather)
The full list is available on the TRECVID website.
Examples (concept pairs)
• Beach + Mountain
• Old_People + Flags
• Animal + Snow
• Bird + Waterscape_waterfront
• Dog + Indoor
• Driver + Female_Human_Face
• Person + Underwater
• Table + Telephone
• Two_People + Vegetation
• Car + Bicycle
Known-item search
• Models the situation in which someone knows of a video, has seen it before, and believes it is contained in a collection, but doesn't know where to look.
• Inputs
  – A text-only description of the desired video
  – A test collection of videos
• Outputs
  – Top-ranked videos (automatic or interactive mode)
Examples
• Find the video with the guy talking about how it just keeps raining.
• Find the video about some guys in their apartment talking about some cleaning schedule.
• Find the video where a guy talks about the FBI and Britney Spears.
• Find the video with the guy in a yellow T-shirt with the big letter M on it.
• …
More examples: http://www-nlpir.nist.gov/projects/tv2010/ki.examples.html
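As a toy illustration of the automatic mode, the sketch below ranks videos by simple word overlap between a text-only query (like the examples above) and hypothetical per-video metadata. This bag-of-words baseline is an assumption made for illustration, not the method used by any TRECVID participant.

from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; real systems would use proper
    tokenization, stemming, and term weighting (e.g., TF-IDF)."""
    return [t.strip(".,!?") for t in text.lower().split()]

def rank_videos(query, metadata, top_k=10):
    """Rank videos by word overlap between the text-only query and each
    video's metadata: a toy stand-in for KIS automatic search."""
    q = Counter(tokenize(query))
    scores = []
    for video_id, text in metadata.items():
        d = Counter(tokenize(text))
        overlap = sum(min(q[w], d[w]) for w in q)
        scores.append((overlap, video_id))
    scores.sort(reverse=True)
    return [v for _, v in scores[:top_k]]

# Hypothetical metadata keyed by video ID.
metadata = {
    "v001": "a guy in a yellow t-shirt with a big letter M talks to the camera",
    "v002": "weather report: it just keeps raining all week",
}
print(rank_videos("guy talking about how it just keeps raining", metadata))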
Content-based copy detection
• Determines whether a query video contains a copy, possibly transformed, of a video in a reference collection, and if so, where.
Surveillance event detection
• Detects human behaviors in vast amounts of surveillance video, in real time!
• For public safety and security
• Event examples
  – Person runs
  – Cell to ear
  – Object put
  – People meet
  – Embrace
  – Pointing
  – …
Instance search
• Finds video segments of a specific person, object, or place, given a visual example.
Instance search
• Input
  – A collection of test clips
  – A collection of queries, each delimiting a person, object, or place entity in some example video
• Output
  – For each query, up to the 1000 clips most likely to contain a recognizable instance of the entity
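A minimal sketch of one plausible approach, assuming OpenCV is available: match ORB keypoint descriptors between the query example and sampled frames, rank clips by their best-matching frame, and keep at most the 1000-clip limit from the task definition. The frame sampling and the toy data here are illustrative placeholders, not the pipeline of any actual system.

import cv2
import numpy as np

def match_score(query_img, frame_img, orb, matcher):
    """Count cross-checked ORB matches between the query example and one
    frame; more matches suggest the same instance is visible."""
    _, q_des = orb.detectAndCompute(query_img, None)
    _, f_des = orb.detectAndCompute(frame_img, None)
    if q_des is None or f_des is None:
        return 0
    return len(matcher.match(q_des, f_des))

def rank_clips(query_img, clip_frames, top_k=1000):
    """Rank clips by their best-matching frame, keeping at most the
    1000-clip limit from the INS task definition."""
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    scores = [(max(match_score(query_img, f, orb, matcher) for f in frames), cid)
              for cid, frames in clip_frames.items()]
    scores.sort(reverse=True)
    return [cid for _, cid in scores[:top_k]]

# Toy grayscale "frames"; a real system would decode actual video clips.
rng = np.random.default_rng(0)
query = rng.integers(0, 256, (240, 320), dtype=np.uint8)
clips = {"clip1": [query.copy()],
         "clip2": [rng.integers(0, 256, (240, 320), dtype=np.uint8)]}
print(rank_clips(query, clips))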
Query examples
Multimedia event detection
• System task
  – Given a collection of test videos and a list of test events, indicate whether each of the test events is present anywhere in each of the test videos, and give the strength of evidence for each such judgment.
• In 2010
  – Making a cake: one or more people make a cake
  – Batting a run in: within a single play during a baseball-type game, a batter hits a ball and one or more runners (possibly including the batter) score a run
  – Assembling a shelter: one or more people construct a temporary or semi-permanent shelter for humans that could provide protection from the elements
• 15 new events were released for 2011; the 2012 events have not yet been announced.
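The required output, a presence decision plus a strength of evidence for every event/video pair, can be sketched as below. The detector itself is a random placeholder, and the video IDs and decision threshold are assumptions for illustration.

import random

EVENTS = ["Making_a_cake", "Batting_a_run_in", "Assembling_a_shelter"]

def event_score(video_id, event):
    """Placeholder event detector returning a pseudo-confidence in [0, 1];
    a real MED system would fuse audio, visual, and text evidence."""
    random.seed(hash((video_id, event)))
    return random.random()

def detect_events(video_ids, events=EVENTS, threshold=0.5):
    """For every (event, video) pair, report a yes/no presence decision
    and the strength of evidence, as the MED task requires."""
    results = []
    for event in events:
        for vid in video_ids:
            score = event_score(vid, event)
            results.append((event, vid, score >= threshold, round(score, 3)))
    return results

for row in detect_events(["HVC001", "HVC002"]):
    print(row)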
Multimedia event recounting
• New in 2012
• Task
  – Once a multimedia event detection system has found an event in a video clip, it is useful for a human user to be able to examine the evidence on which the system's decision was based. An important goal is for that evidence to be semantically meaningful to a human.
• Input
  – A clip and an event kit: the name, definition, explication (a textual exposition of the terms and concepts), evidential descriptions, and illustrative video exemplars
• Output
  – A clear, concise, text-only (alphanumeric) recounting or summary of the key evidence that the event does in fact occur in the video
Schedule
• Feb.: call for participation
• Apr.: complete the guidelines
• Jun.-Jul.: release query data
• Sep.: submission due
• Oct.: return the results
• Nov.: paper submission due
• Dec.: workshop
Call for partners
• Standardized evaluations and comparisons
• Test on large collections
• Failures are not embarrassing, and can be presented at the TRECVID workshop!
• Anyone can participate!
  – A “priceless” resource for researchers