TRECVID Evaluations
Mei-Chen Yeh
03/27/2012
Introduction
• Text REtrieval Conference (TREC)
  – Organized by the National Institute of Standards and Technology (NIST)
  – Supported by government agencies
  – Annual evaluation (NOT a competition)
  – Different “tracks” over the years, e.g. web retrieval, email spam filtering, question answering, routing, spoken documents, OCR, video (a standalone conference since 2001)
• TREC Video Retrieval Evaluation (TRECVID)
Introduction
• Objectives of TRECVID
  – Promote progress in content-based analysis and retrieval from digital videos
  – Provide open, metrics-based evaluation
  – Model real-world situations
Introduction
• Evaluation is driven by participants
• The collection is fixed and available in the spring
  – 50% of the data is used for development, 50% for testing
• Test queries are available in July, with one month to submission
• More details: http://trecvid.nist.gov/
TRECVID Video Collections
• Test data
  – Broadcast news
  – TV programs
  – Surveillance videos
  – Video rushes provided by the BBC
  – Documentary and educational materials supplied by the Netherlands Institute for Sound and Vision (2007-2009)
  – Gatwick airport surveillance videos provided by the UK Home Office (2009)
  – Web videos (2010)
• Languages
  – English
  – Arabic
  – Chinese
Collection History
• 2011
  – 19,200 online videos (150 GB, 600 hours)
  – 50 hours of airport surveillance videos
• 2012
  – 27,200 online videos (200 GB, 800 hours)
  – 21,000 equal-length short clips of BBC rush videos
  – Airport surveillance videos (not yet announced)
  – ~4,000-hour collection of Internet multimedia
Tasks
• Semantic indexing (SIN)
• Known-item search (KIS)
• Content-based copy detection (CCD) – through 2011
• Interactive surveillance event detection (SED)
• Instance search (INS)
• Multimedia event detection (MED)
• Multimedia event recounting (MER) – since 2012
Semantic indexing
• System task:
  – Given the test collection, master shot reference, and concept definitions, return for each concept a list of at most 2000 shot IDs from the test collection, ranked according to their likelihood of containing the concept.
• 500 concepts (since 2011)
• “Concept pair” queries (2012)
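To make the output format concrete, here is a minimal Python sketch of how a system might produce the ranked shot list for one concept. The classifier score is a random placeholder and the shot IDs are hypothetical; only the ranking and the 2000-shot cap come from the task definition above.

import random

def score_shot(shot_id, concept):
    """Placeholder concept classifier returning a pseudo-confidence in [0, 1].
    A real system would run a trained visual-concept detector here."""
    random.seed(hash((shot_id, concept)))
    return random.random()

def rank_shots_for_concept(shot_ids, concept, max_results=2000):
    """Rank test-collection shots by decreasing confidence that they
    contain the concept, keeping at most max_results (the SIN limit)."""
    scored = [(score_shot(s, concept), s) for s in shot_ids]
    scored.sort(reverse=True)
    return [shot_id for _, shot_id in scored[:max_results]]

# Hypothetical master-shot-reference IDs such as "shot12_3".
shots = [f"shot{v}_{k}" for v in range(1, 101) for k in range(1, 31)]
print(rank_shots_for_concept(shots, "Handshaking")[:5])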
Examples
• Boy (one or more male children)
• Teenager
• Scientists (images of people who appear to be scientists)
• Dark-skinned people
• Handshaking
• Running
• Throwing
• Eaters (putting food or drink in his/her mouth)
• Sadness
• Anger
• Windy (scenes showing windy weather)
The full list is available on the TRECVID website.
Examples (concept pairs)
• Beach + Mountain
• Old_People + Flags
• Animal + Snow
• Bird + Waterscape_waterfront
• Dog + Indoor
• Driver + Female_Human_Face
• Person + Underwater
• Table + Telephone
• Two_People + Vegetation
• Car + Bicycle
Known-item search
• Models the situation in which someone knows of a video, has seen it before, and believes it is contained in a collection, but doesn't know where to look.
• Inputs
  – A text-only description of the desired video
  – A test collection of videos
• Outputs
  – Top-ranked videos (automatic or interactive mode)
Examples
• Find the video with the guy talking about how it just keeps raining.
• Find the video about some guys in their apartment talking about some cleaning schedule.
• Find the video where a guy talks about the FBI and Britney Spears.
• Find the video with the guy in a yellow T-shirt with the big letter M on it.
• …
More examples: http://www-nlpir.nist.gov/projects/tv2010/ki.examples.html
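As a toy illustration of the automatic mode, the sketch below ranks videos by simple word overlap between a text-only query (like the examples above) and hypothetical per-video metadata. This bag-of-words baseline is an assumption made for illustration, not the method used by any TRECVID participant.

from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; real systems would use proper
    tokenization, stemming, and term weighting (e.g., TF-IDF)."""
    return [t.strip(".,!?") for t in text.lower().split()]

def rank_videos(query, metadata, top_k=10):
    """Rank videos by word overlap between the text-only query and each
    video's metadata: a toy stand-in for KIS automatic search."""
    q = Counter(tokenize(query))
    scores = []
    for video_id, text in metadata.items():
        d = Counter(tokenize(text))
        overlap = sum(min(q[w], d[w]) for w in q)
        scores.append((overlap, video_id))
    scores.sort(reverse=True)
    return [v for _, v in scores[:top_k]]

# Hypothetical metadata keyed by video ID.
metadata = {
    "v001": "a guy in a yellow t-shirt with a big letter M talks to the camera",
    "v002": "weather report: it just keeps raining all week",
}
print(rank_videos("guy talking about how it just keeps raining", metadata))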
Content-based copy detection
• Determines whether a query video contains a copy, possibly transformed, of a video in a reference collection, and if so, where.
Surveillance event detection
• Detects human behaviors in vast amounts of surveillance video, in real time!
• For public safety and security
• Event examples
  – Person runs
  – Cell to ear
  – Object put
  – People meet
  – Embrace
  – Pointing
  – …
Instance search
• Finds video segments of a specific person, object, or place, given a visual example.
Instance search
• Input
  – A collection of test clips
  – A collection of queries, each delimiting a person, object, or place entity in some example video
• Output
  – For each query, up to the 1000 clips most likely to contain a recognizable instance of the entity
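A minimal sketch of one plausible approach, assuming OpenCV is available: match ORB keypoint descriptors between the query example and sampled frames, rank clips by their best-matching frame, and keep at most the 1000-clip limit from the task definition. The frame sampling and the toy data here are illustrative placeholders, not the pipeline of any actual system.

import cv2
import numpy as np

def match_score(query_img, frame_img, orb, matcher):
    """Count cross-checked ORB matches between the query example and one
    frame; more matches suggest the same instance is visible."""
    _, q_des = orb.detectAndCompute(query_img, None)
    _, f_des = orb.detectAndCompute(frame_img, None)
    if q_des is None or f_des is None:
        return 0
    return len(matcher.match(q_des, f_des))

def rank_clips(query_img, clip_frames, top_k=1000):
    """Rank clips by their best-matching frame, keeping at most the
    1000-clip limit from the INS task definition."""
    orb = cv2.ORB_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    scores = [(max(match_score(query_img, f, orb, matcher) for f in frames), cid)
              for cid, frames in clip_frames.items()]
    scores.sort(reverse=True)
    return [cid for _, cid in scores[:top_k]]

# Toy grayscale "frames"; a real system would decode actual video clips.
rng = np.random.default_rng(0)
query = rng.integers(0, 256, (240, 320), dtype=np.uint8)
clips = {"clip1": [query.copy()],
         "clip2": [rng.integers(0, 256, (240, 320), dtype=np.uint8)]}
print(rank_clips(query, clips))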
Query examples
Multimedia event detection
• System task
  – Given a collection of test videos and a list of test events, indicate whether each of the test events is present anywhere in each of the test videos, and give the strength of evidence for each such judgment.
• In 2010
  – Making a cake: one or more people make a cake
  – Batting a run in: within a single play during a baseball-type game, a batter hits a ball and one or more runners (possibly including the batter) score a run
  – Assembling a shelter: one or more people construct a temporary or semi-permanent shelter for humans that could provide protection from the elements
• 15 new events were released for 2011; the 2012 events have not yet been announced.
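The required output, a presence decision plus a strength of evidence for every event/video pair, can be sketched as below. The detector itself is a random placeholder, and the video IDs and decision threshold are assumptions for illustration.

import random

EVENTS = ["Making_a_cake", "Batting_a_run_in", "Assembling_a_shelter"]

def event_score(video_id, event):
    """Placeholder event detector returning a pseudo-confidence in [0, 1];
    a real MED system would fuse audio, visual, and text evidence."""
    random.seed(hash((video_id, event)))
    return random.random()

def detect_events(video_ids, events=EVENTS, threshold=0.5):
    """For every (event, video) pair, report a yes/no presence decision
    and the strength of evidence, as the MED task requires."""
    results = []
    for event in events:
        for vid in video_ids:
            score = event_score(vid, event)
            results.append((event, vid, score >= threshold, round(score, 3)))
    return results

for row in detect_events(["HVC001", "HVC002"]):
    print(row)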
Multimedia event recounting
• New in 2012
• Task
  – Once a multimedia event detection system has found an event in a video clip, it is useful for a human user to be able to examine the evidence on which the system's decision was based. An important goal is for that evidence to be semantically meaningful to a human.
• Input
  – A clip and an event kit: the name, definition, explication (a textual exposition of the terms and concepts), evidential descriptions, and illustrative video exemplars
• Output
  – A clear, concise, text-only (alphanumeric) recounting or summary of the key evidence that the event does in fact occur in the video
Schedule
• Feb.: call for participation
• Apr.: complete the guidelines
• Jun.-Jul.: release query data
• Sep.: submission due
• Oct.: return the results
• Nov.: paper submission due
• Dec.: workshop
Call for partners
• Standardized evaluations and comparisons
• Test on large collections
• Failures are not embarrassing, and can be presented at the TRECVID workshop!
• Anyone can participate!
  – A “priceless” resource for researchers