Searching Video Collections: Representation, Indexing, Browsing and Evaluation
Part II

Dulce Ponceleon
Universidad de Chile, December 2002


CIW-DCC, Universidad de Chile

Audio Features in Multimedia

Features depend on audio category
  Speech
  Music
  Sounds (e.g., explosions, street noise)

Features
  Energy, Loudness
  Pitch
  Cepstral Coefficients
  Beat
  Harmonics

Audio Indexing

Features: Pitch, Loudness, Energy, Mel Cepstral Coefficients, Zero-Crossing Rate
Speech-Music Discrimination
Speaker Identification
Music Retrieval
  Query-by-Humming
  Beat Analysis
Foreground/background
  Need to find tiger without regard to background
  Audio sounds are often isolated

Towards the "Google system" for audio retrieval


Examples

Pitch

Vowels

Speech Sounds

Speech sounds are created by vibratory activity in the human vocal tract. Speech is normally transmitted to a listener's ears or to a microphone through the air, where speech and other sounds take the form of waves.

It is not possible to read the phonemes directly from a waveform, but if we analyze the waveform into its frequency components, we obtain a spectrogram, which can be deciphered.

We apply a mathematical technique called Fourier analysis to the speech waveform to discover which frequencies are present at any given moment in the speech signal. The result of Fourier analysis is a spectrum.

Vowels can usually be distinguished easily by the frequency values of the first two or three formants, which are called F1, F2, and F3.

The audible frequency range in human beings extends from 20 Hz to 20,000 Hz (20 kHz).
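To make the waveform-to-spectrogram step concrete, here is a minimal sketch using NumPy's FFT routines. The frame length, hop size, and the synthetic two-tone signal (a crude stand-in for two formants, not real speech) are illustrative assumptions, not values from the slides.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=400, hop=160):
    """Short-time Fourier analysis: one magnitude spectrum per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))           # shape: (frames, bins)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return times, freqs, mags

# Hypothetical test signal: two sinusoids at 800 Hz and 1200 Hz.
sr = 16000
t = np.arange(sr) / sr                                   # one second of samples
tone = np.sin(2 * np.pi * 800 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

times, freqs, mags = spectrogram(tone, sr)
peak_hz = freqs[np.argmax(mags.mean(axis=0))]            # strongest frequency
```

Each row of `mags` is the spectrum of one short frame; stacking the rows over time gives the spectrogram image. For this synthetic signal the strongest bin sits at the louder 800 Hz component.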

Spectrograms

Applications

Recognition of Speech
Recognition of Silence
Recognition of Music
  Music + Security = active area
Newscast Recognition
Recognition of Commercials

Applications ...

Recognition of Music
  Recognition of the song to link to metadata
  Broadcast monitoring:
    Monitor radio programs, scheduled transmission of advertisement spots, ensure composers' royalties for played material, statistical analysis of played material
  Music Sales
    Record signatures of music/sound for small handheld devices
Audio Fingerprinting
  A compact representation of the signal features for matching. It captures the essence of the music item and thus can be used as a fingerprint of the music item.


Audio Classification

Audio-waveform

Cepstral Coefficients

Pitch

Energy/Loudness

Speech Recognition

Music/Sound

Segmentation

Musclefish.com

[Figure: waveform of the utterance "A Huge Tapestry Hung in Her Hallway" (amplitude from -0.5 to 0.5, time axis from 0 to 3.5), with its zero-crossing rate (0 to 200) plotted over the same time axis]

Note: the zero-crossing rate goes way up in the quiet regions (where there is only noise) and way down when the energy is high, which happens during voiced sounds, when the signal is repetitive.
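The behavior described in the note can be reproduced with a small sketch that computes per-frame energy and zero-crossing counts. The frame sizes and the synthetic signal (low-level noise followed by a loud 150 Hz tone standing in for a voiced sound) are assumptions chosen for illustration.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Return per-frame short-time energy and zero-crossing counts."""
    energies, crossings = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(float(np.mean(frame ** 2)))
        signs = np.sign(frame)
        signs[signs == 0] = 1                 # treat exact zeros as positive
        crossings.append(int(np.sum(signs[1:] != signs[:-1])))
    return np.array(energies), np.array(crossings)

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr // 2) / sr
quiet = 0.01 * rng.standard_normal(sr // 2)       # quiet region: noise only
voiced = 0.5 * np.sin(2 * np.pi * 150 * t)        # loud, repetitive signal

energy, zcr = frame_features(np.concatenate([quiet, voiced]))
```

In the first half (noise only) the sign flips on roughly every other sample, so the count per frame is high while the energy is low; in the second half the periodic signal crosses zero only a few times per frame while the energy is high.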

Spectrogram of "A Huge Tapestry Hung in Her Hallway"

[Figure: panels titled Spectrogram, Filterbank output, DCT Coefficients, and DCT Reconstruction; axes: time (frame number) and frequency]

Music Retrieval

[Diagram labels: Music Database (MIDI, Waveform); Query by Humming (Humming, Waveform); Query by Example (hold microphone to the radio); Query; Name]

Audio Indexing Example: Acoustic Identification

Predicted words (stemmed): anim (.108), hors (.105), left (.086), trot (.065), approach (.059), track (.047), walk (.040), depart (.037)
True label: animals: horses, two horses trot past on rough track left to right

Predicted words (stemmed): bird (.11), ambienc (.107), jungl (.104), morn (.094), Africa (.093), anim (.054), bark (.029), dog (.020), cricket (.018)
True label: jungle, Africa, Africa: morning ambience, birds

[Slaney02]

Semantic Audio Retrieval

Acoustic Space
Semantic Space
Mixture of Probability Experts

[Diagram: two panels, each linking a semantic space to an acoustic space, captioned "Acoustic Retrieval" and "Semantic Retrieval"; example labels: Step, Whinny, Horse, Trot]

Speech: A Brief History

Recognition started in the late 1960s to early 1970s
Multimedia indexing: late 1980s
Research has been ongoing
  Carnegie Mellon, Columbia University, Georgia Institute of Technology, and University of Texas
Products have been available for only about five years
  BBN, IBM, CMU, Fast-Talk, and ScanSoft
Acceptable performance and accuracy level for commercial use in the last 18 months
Products are generally integrated into larger systems

Speech Indexing

Large Vocabulary Automatic Speech Recognition (ASR)
Combined with Text Information Retrieval
Combined with Phonetic Retrieval for out-of-vocabulary (OOV) words
Combined with Query Expansion/Document Expansion based on N-best or external corpus

Automatic Speech Recognition (ASR)

Closed vocabulary (100,000 words)
Misses out-of-vocabulary (OOV) words, such as names of people, places, products, companies, acronyms, etc.
Types of errors: word split, word join, word substitution
Typical error rate: 30%-50% word error rate (WER)
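Word error rate is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is one plain implementation of that definition; the example pair is the Boeing training-video sentence quoted elsewhere in this deck, where "arm" is split into "are on".

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + insertions + deletions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all remaining ref words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all remaining hyp words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # match or substitution
    return dist[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate("you can now arm the door and emergency",
                      "you can now are on the door and emergency")
```

The word split counts as one substitution plus one insertion, so this eight-word reference yields a WER of 2/8 = 0.25.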

SpeechBot Retrieval Interface

Copyright Cambridge Research Labs

Phonetic Speech Retrieval

Example: training video, pilot work with Boeing
The original speech: "...you can now arm the door and emergency ..."
Speech recognition result: "...you can now are on the door and emergency ..."
The query "arm the door" is missed by text search, but can be found by phonetic search.

Approaches to Phonetic Speech Retrieval

Phone recognition with phonetic string index [Schauble95]
Combined word and phonetic IR using Phone Lattice Scanning [James95, Jones96]
Inverted index of phonetic substrings [Witbrock97]
Confusion-matrix-based phonetic indexing [van Leeuwen, Srinivasan00]

Word and Phone Lattices

Lattice: directed acyclic graph capturing the multiple hypotheses of an ASR system
Can be generated for words or sub-words (phones)
May be shallow or deep
Hypothetical decoding of the phrase "Please be quite sure": [lattice figure not reproduced]
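Since the lattice figure is not available, the following sketch shows one way a word lattice can be held as a directed acyclic graph and turned into an n-best list. The competing words and all probabilities are hypothetical, chosen only to illustrate the structure for "Please be quite sure".

```python
# Hypothetical lattice: node -> list of (next_node, word, probability).
lattice = {
    0: [(1, "please", 0.9), (1, "pleas", 0.1)],
    1: [(2, "be", 0.7), (2, "bee", 0.3)],
    2: [(3, "quite", 0.6), (3, "quiet", 0.4)],
    3: [(4, "sure", 0.8), (4, "shore", 0.2)],
    4: [],
}

def n_best(lattice, start, end, n):
    """Enumerate every start-to-end path and keep the n most probable."""
    paths = []

    def walk(node, words, prob):
        if node == end:
            paths.append((prob, " ".join(words)))
            return
        for nxt, word, p in lattice[node]:
            walk(nxt, words + [word], prob * p)

    walk(start, [], 1.0)
    return sorted(paths, reverse=True)[:n]

top = n_best(lattice, 0, 4, 3)
```

Exhaustive enumeration is only practical for small ("shallow") lattices like this one; a deep lattice would normally be searched with dynamic programming instead.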

Using n-best Data for Indexing

Expand document/query representation with n-best words
Index sounds-like phones using (n-best) common meta-phone

Group of phones                          | Metaphone
G, D                                     | Gtl
N, NG                                    | Ntl
TH, F                                    | Ttl
B, BD, DD, GD                            | Gtl
EI, IH, IX, IY                           | Etl
CH, JH, SH                               | Ctl
AA, AE, AH, AO, AW, AX, AXR, AY, EH, ER  | Atl
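A minimal sketch of how meta-phone classes can make index keys robust to confusable phones. It uses only four of the phone groups listed on this slide; forming keys as overlapping trigrams of meta-phones is an assumption made for illustration, since the slide does not say how keys are built.

```python
# Subset of the slide's phone groups (meta-phone -> member phones).
GROUPS = {
    "Atl": ["AA", "AE", "AH", "AO", "AW", "AX", "AXR", "AY", "EH", "ER"],
    "Etl": ["EI", "IH", "IX", "IY"],
    "Ctl": ["CH", "JH", "SH"],
    "Ntl": ["N", "NG"],
}
PHONE_TO_META = {p: meta for meta, phones in GROUPS.items() for p in phones}

def to_metaphones(phones):
    """Replace each phone by its class; unlisted phones map to themselves."""
    return [PHONE_TO_META.get(p, p) for p in phones]

def index_keys(phones, n=3):
    """Overlapping n-grams of meta-phones (assumed key format)."""
    meta = to_metaphones(phones)
    return ["-".join(meta[i:i + n]) for i in range(len(meta) - n + 1)]

# Two phone strings that differ only within meta-phone classes.
keys_a = index_keys(["SH", "IH", "N", "AE"])
keys_b = index_keys(["CH", "IY", "NG", "AA"])
```

Both strings produce the same keys, so a recognizer that confuses SH with CH, or IH with IY, would still hit the same index entries.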

Phonetic Retrieval: A System Overview

[Diagram, two pipelines sharing a phonetic index and the timed phonetic transcript]

Indexing: audio input -> Speech to Phones -> timed phonetic transcript -> Generate Keys (use meta-phones) -> keys list -> Populate Index -> Phonetic Index

Retrieval: text query -> Text to Phones -> phonetic query -> Generate Keys (use meta-phones) -> keys list -> Retrieve & Merge -> list of candidates -> Bayesian edit distance -> ranked results

Similarity Matching using Phonetic Confusion Matrix

C(o_i, q_j) is the probability of recognizing phone o_i when the actual phone in the audio is q_j.

The Bayesian Edit Distance D(o, q) is the log-likelihood of the best editing sequence which converts the query string q to the actual string o.

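\[
C(o_i, q_j) =
\begin{bmatrix}
P(\mathrm{AA} \mid \mathrm{AA}) & P(\mathrm{AA} \mid \mathrm{AE}) & \cdots & P(\mathrm{AA} \mid \mathrm{Z}) & P(\mathrm{AA} \mid \mathrm{ZH}) \\
P(\mathrm{AE} \mid \mathrm{AA}) & P(\mathrm{AE} \mid \mathrm{AE}) & \cdots & P(\mathrm{AE} \mid \mathrm{Z}) & P(\mathrm{AE} \mid \mathrm{ZH}) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
P(\mathrm{Z} \mid \mathrm{AA}) & P(\mathrm{Z} \mid \mathrm{AE}) & \cdots & P(\mathrm{Z} \mid \mathrm{Z}) & P(\mathrm{Z} \mid \mathrm{ZH}) \\
P(\mathrm{ZH} \mid \mathrm{AA}) & P(\mathrm{ZH} \mid \mathrm{AE}) & \cdots & P(\mathrm{ZH} \mid \mathrm{Z}) & P(\mathrm{ZH} \mid \mathrm{ZH})
\end{bmatrix}
\]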
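The edit distance defined on this slide can be sketched as a dynamic program that maximizes the summed log-probabilities of the edit operations. The slide only defines the substitution term C(o_i, q_j); the insertion and deletion probabilities, the floor for unseen phone pairs, and every numeric value in the example matrix below are assumptions added for illustration.

```python
import math

def bayesian_edit_distance(observed, query, confusion,
                           p_ins=0.01, p_del=0.01, floor=1e-4):
    """Log-likelihood of the best edit sequence turning `query` into `observed`.

    confusion[(o, q)] is P(recognize o | actual q); missing pairs use `floor`.
    p_ins, p_del and floor are assumed values, not taken from the slides.
    """
    log_ins, log_del = math.log(p_ins), math.log(p_del)
    n, m = len(observed), len(query)
    score = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:            # observed phone with no query counterpart
                score[i][j] = max(score[i][j], score[i - 1][j] + log_ins)
            if j > 0:            # query phone missing from the observation
                score[i][j] = max(score[i][j], score[i][j - 1] + log_del)
            if i > 0 and j > 0:  # query phone recognized as observed phone
                c = confusion.get((observed[i - 1], query[j - 1]), floor)
                score[i][j] = max(score[i][j], score[i - 1][j - 1] + math.log(c))
    return score[n][m]

# Hypothetical confusion probabilities P(observed | actual).
confusion = {("Z", "Z"): 0.8, ("ZH", "Z"): 0.15,
             ("AE", "AE"): 0.7, ("AA", "AE"): 0.2}

query = ["Z", "AE"]
exact = bayesian_edit_distance(["Z", "AE"], query, confusion)
confusable = bayesian_edit_distance(["ZH", "AA"], query, confusion)
unrelated = bayesian_edit_distance(["AA", "Z"], query, confusion)
```

Scores are log-likelihoods, so values closer to zero indicate better matches: the exact match ranks above the acoustically confusable string, which in turn ranks above the unrelated one.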


An ASR Example: ScanSoft


Copyright Virage Video Logger

Virage Video Logging Interface

Speech Indexing Performance

Note: speech corpora, queries, and metrics vary significantly; it is therefore very difficult to compare precision-recall (p-r) numbers directly
Two classes of evaluations:
  Document/story retrieval, where p-r numbers are reported as a % of full-text retrieval
    Cambridge reports 82-85% relative precision compared to perfect text retrieval with WER 47%
    CMU reports ~80.2% relative precision compared to perfect text retrieval with WER 50.7%
  Word spotting, evaluated against manual full-text transcriptions
    VMR from Cambridge reports average precision of 0.315
    TNO reports 100/hr for a 3-phone word and 8/hr for a 6-phone word [van Leeuwen]

In-Vocabulary Words by Length

[Figure: precision-recall curves (both axes from 0 to 1) for short words and long words]

Overall 74% precision at 50% recall.

DATA SET
  100 hours of broadcast news (1.04M words)
QUERIES
  24,000 single-word queries
EVALUATION
  Compare retrieved matches with ground-truth manual, time-aligned transcript (objective)
  Use exact word match (conservative)

Mixing Document Collections

Poorly translated documents are less likely to be retrieved
Problem faced by the Cross-Language Information Retrieval community
Problem: how to address ranking bias at a per-document level
Some ideas: detect collection source, estimate noise/error rate, and compensate in the ranking scheme

Performance Claims

Fast-Talk says it can
  Index a one-hour audio file in 5 min
  Process 30 hours of content per second in response to a specific, 10-phoneme search query (2.53-GHz Pentium CPU)
Recognition faster than 1.5x real time
On 100 hours: how long a query took to run (CIKM talk)

Multiple Languages Support

Demand is slowly growing
IBM's ViaVoice is available in 9 languages
BBN's Audio Indexer generates searchable transcripts in Arabic, Chinese, English or Spanish in real time on a standard PC
Porting a product is expensive and time-consuming
  Collect and transcribe acoustic data
  Train and evaluate new acoustic models

Conquered Challenges

Real-time performance
Speaker-independent technology
  Gender, age, dialect, style, etc.
Acoustic models tuned to different environments: telephony, TV or radio
Language models
Speaker identification
Speaker segmentation
Reduced background noise
Tools to customize your language model
Training
Fast algorithms for indexing and searching

Open Challenges

Users have high expectations
Precision and price might impact widespread adoption in some cases
  Court reporting, medical dictation
  Expensive systems, as high as $100,000
Conversational speech recognition (e.g., meetings)
How to go beyond incremental improvements
What about the killer app?
Multiple languages

Searching Video Collections: Representation, Indexing, Browsing and Evaluation

Introduction to Multimedia Information Retrieval
Effective MMIR
  Multimedia Representation
  Multimedia Indexing
  Query Formulation
  Multimedia Retrieval
  Browsing
  Distribution/Streaming
  Evaluation
Multimedia IR Applications
Conclusions

What is a Multimedia Query?

Keywords
Natural language query
With multimedia:
  Sample image
  Basic art tools (sketch, shape, etc.)
  Query by Humming

MMIR Query Processing

Query formulation beyond words
  Sample image, color images, texture, shapes, etc. (i.e., hard to input)
  Find a picture of a satellite
Ambiguous and incomplete queries
Inexact media representation
Relevance detected easily

Text Retrieval/Search Strategies

Query compared with document representation
Boolean search
Matching functions (SMART system/cosine correlation)
Serial search
Cluster-based retrieval

Retrieval Models

Key elements
  Document and query representation, similarity measure, retrieval function
Retrieval models
  Boolean: Text, Speech
  Statistical vector space: Text, ASR Text, Images
  Probabilistic: Images, Audio, Video
Distinction between model and implementation

Multimedia Retrieval Implementation using Models

Concept Layer: relationships/events
Object Layer: blocks of attributes
Feature Layer: colors, shapes, textures, text
Media/Data Layer: images, video, audio

Similarity Measures

Distance Measures
  Mean Character Difference
  Minkowski Metric
    Manhattan, Euclidean, Chebyshev
Correlation Coefficients
  Correlation Coefficients
  Cosine Measure
Association Coefficients
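As a concrete reference for the distance and cosine measures named above, here is a short NumPy sketch. The Minkowski metric with order p reduces to the Manhattan distance for p = 1, the Euclidean distance for p = 2, and the Chebyshev distance as p goes to infinity. The feature vectors are made-up values.

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p (p = inf gives the Chebyshev distance)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        return float(diff.max())
    return float((diff ** p).sum() ** (1.0 / p))

def cosine(x, y):
    """Cosine of the angle between two feature vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Hypothetical feature vectors (for example, three color-histogram bins).
a, b = [1, 2, 3], [4, 6, 3]

manhattan = minkowski(a, b, 1)             # |3| + |4| + |0|
euclidean = minkowski(a, b, 2)             # sqrt(9 + 16 + 0)
chebyshev = minkowski(a, b, float("inf"))  # max(3, 4, 0)
similarity = cosine(a, b)
```

Note that the Minkowski family measures distance (smaller is more similar), whereas the cosine measure is a similarity (larger is more similar), so the two need opposite sort orders when ranking.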

Combining Indexes: Multimodal Retrieval

Weighted sum with different normalization schemes
Adaptive weights: each modality is query dependent; use thresholds as a measure of similarity
Domain knowledge modeling: represent knowledge as concept trees, frames, semantic nets
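A minimal sketch of the weighted-sum option, assuming min-max normalization as the normalization scheme (only one of several possible schemes). The modality names, raw scores, and weights are hypothetical.

```python
def min_max(scores):
    """Rescale one modality's scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(modality_scores, weights):
    """Weighted sum of normalized per-modality scores, best first."""
    fused = {}
    for name, scores in modality_scores.items():
        for doc, s in min_max(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weights[name] * s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Hypothetical raw scores on very different scales.
scores = {
    "speech": {"clip1": 12.0, "clip2": 3.0, "clip3": 7.5},
    "visual": {"clip1": 0.2, "clip2": 0.9, "clip3": 0.55},
}
ranked = fuse(scores, {"speech": 0.7, "visual": 0.3})
```

Normalizing first keeps the modality with the larger raw scale from dominating the sum. Query-dependent (adaptive) weighting would replace the fixed weights with values chosen per query.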

Semantic-based Retrieval Example

Keywords: rose flower plant leaves
Copyright Berkeley Blobworld system

Semantic-based Retrieval Example

Query on "Rose"
Copyright Berkeley Blobworld system

Semantic-based Retrieval Example

Query on [example image]
Copyright Berkeley Blobworld system

Semantic-based Retrieval Example

Query on [example image] and "Rose"
Copyright Berkeley Blobworld system

Semantic-based Retrieval Example

Appearance counts!
Semantics counts!
Copyright Berkeley Blobworld system

Effective MMIR

Overview
Media Representation
Media Indexing
Query Formulation
Media Retrieval
Browsing
Distribution/Streaming
Evaluation

TREC Goals

To increase research in information retrieval based on large-scale collections
To provide an open forum for exchange of research ideas, to increase communication among research, academia and government
To improve evaluation methodologies and measures for text retrieval
To create a series of collections covering different aspects of text retrieval [Voorhees00]


TREC Tasks and Evaluations

Traditional precision and recall measures with binary relevance judgments
Three types of automatic tasks:
  Ad hoc task: new queries against existing (static) data
  Routing and filtering tasks: old queries against new data
  Specialized tasks: question answering, known-item task
Interactive task: different functional systems are compared by giving tasks to human searchers, who use a real-time interface to an experimental system to obtain the best possible results in under 5 minutes

Past SDR Test Collections

                          | TREC-6 '97               | TREC-7 '98                           | TREC-8 '99
Broadcast News Collection | 43 hours, 1996-97        | 87 hours, 1996-97                    | 557 hours, Jan-Jun 1998
                          | 1,451 stories            | 2,866 stories                        | 21,754 stories
                          | ~276 words/story         | ~269 words/story                     | ~169 words/story
Baseline ASR Transcripts  | IBM (50% WER)            | NIST/CMU SPHINX (33.8% / 46.6% WER)  | NIST/BBN Byblos (27.5% / 26.7% WER)
Paradigm                  | Known-Item (% at rank 1) | Ad Hoc (MAP)                         | Ad Hoc (MAP)
Queries                   | 50                       | 23                                   | 49

MAP = Mean Average Precision

IR Metrics

Traditional TREC ad hoc metric:
  Mean Average Precision (MAP) using TREC_EVAL
  Created assessment pools for each topic using top 100 of all retrieval runs
    Mean pool size: 596 (2.1% of all segments)
    Min pool size: 209
    Max pool size: 1309
  NIST assessors created reference relevance assessments from topic pools
  Somewhat artificial for boundary-unknown conditions
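For readers unfamiliar with the metric, the following sketch computes non-interpolated average precision per topic and its mean over topics, using binary relevance judgments. It illustrates the definition only and is not the TREC_EVAL program itself; the story IDs are made up.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant item found."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical results for two topics.
topic1 = (["s3", "s1", "s7", "s2"], {"s3", "s2"})   # relevant at ranks 1 and 4
topic2 = (["s5", "s9"], {"s9"})                     # relevant at rank 2

ap1 = average_precision(*topic1)                    # (1/1 + 2/4) / 2
ap2 = average_precision(*topic2)                    # (1/2) / 1
map_score = mean_average_precision([topic1, topic2])
```

Relevant items that are never retrieved still count in the denominator, so a run is penalized for missing them.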

Known Story Boundary Condition

Retrieval using pre-segmented news stories
  Systems given index of story boundaries for recognition, with IDs for retrieval
    Excluded non-news segments
  Stories are treated as documents
  Systems produce rank-ordered list of story IDs
Document-based scoring:
  Score as in other TREC ad hoc tests using TREC_EVAL

Unknown Story Boundary Condition

Retrieval using continuous speech stream
  Systems process entire broadcasts for ASR and retrieval with no provided segmentation
  Systems output a single time marker for each relevant excerpt to indicate topical passages
    This task does NOT attempt to determine topic boundaries
Time-based scoring:
  Map each time marker to a story ID ("dummy" ID for retrieved non-stories and duplicates)
  Score as usual using TREC_EVAL
  Penalizes duplicate retrieved stories
  Story-based scoring is somewhat artificial but expedient
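The time-based scoring step can be sketched as a lookup of each returned time marker in the reference story boundaries. The boundary data, the "DUMMY" label, and the function name are hypothetical; the official evaluation scripts may differ in detail.

```python
import bisect

def map_times_to_stories(times, boundaries):
    """Map ranked time markers (seconds) to story IDs.

    boundaries: list of (start, end, story_id), sorted by start time.
    Markers outside any story, or inside a story already retrieved at a
    better rank, are mapped to a dummy ID so they score as non-relevant.
    """
    starts = [b[0] for b in boundaries]
    seen, result = set(), []
    for t in times:
        i = bisect.bisect_right(starts, t) - 1
        story = None
        if i >= 0 and boundaries[i][0] <= t < boundaries[i][1]:
            story = boundaries[i][2]
        if story is None or story in seen:
            result.append("DUMMY")
        else:
            seen.add(story)
            result.append(story)
    return result

# Hypothetical broadcast: three stories with a non-news gap from 150 s to 200 s.
boundaries = [(0, 60, "A"), (60, 150, "B"), (200, 260, "C")]
ranked_ids = map_times_to_stories([30, 70, 45, 170, 210], boundaries)
```

The resulting list of IDs can then be scored like an ordinary document ranking; the third marker is a duplicate hit on story A and the fourth falls in the non-news gap, so both occupy a rank without earning credit.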

TREC SDR Conclusions

Ad hoc retrieval in the broadcast news domain appears to be a "solved problem"
  Systems perform well at finding relevant passages in transcripts produced by a variety of recognizers on full unsegmented news broadcasts
  Performance on own recognizer comparable to human reference
Just beginning to investigate use of non-lexical information
Caveat emptor
  ASR may still pose serious problems for the question answering domain, where content errors are fatal

TREC Video Retrieval

Challenge:
  Answering "semantic" queries for video content
    E.g., "retrieve video showing rocket launches"
    E.g., "retrieve clips of people water skiing"
  Fully automated content analysis (w/o transcript or manual annotation of content)
Types of queries:
  Automatic queries: feature extraction (e.g., color, texture, edges, motion) and content-based retrieval (CBR) using examples
  Interactive queries: CBR + statistical modeling of features
    Off-line: automatic classification of video content using statistical models of generic concepts (e.g., scenes, events, objects)
    Query time: user selection of classifiers (models) and example content (features) in iterative search process
Compare with automatic speech recognition (ASR) approach

Assumptions and Limitations

Assumptions:
  Example content provided with query, and/or
  Statement of information need is available
Statistical modeling of specific semantics:
  Limitations:
    Large number of semantic concepts are relevant to any video
    Insufficient training data (need training video content + labels)
    Not all concepts are easily modeled from simple visual features
  Advantages:
    Feasible to train small number of statistical models for generic concepts (e.g., indoors vs. outdoors, nature vs. man-made)
    Complex concepts composed from generic concepts: e.g., waterskiing = outdoors + water + people + boat
    Complements content-based retrieval (CBR): e.g., CBR -> Model 1 -> Model 2 -> CBR -> ... -> Results

Shot Boundary Detection

SMPTE 00:12:45:20

Detects cuts, dissolves, fades and other gradual changes
Compares multiple pairs of frames: 1, 3 and 7 frames apart
Processes decoded frames
  Supports MPEG, QT, AVI, live feed, ...
No knobs (e.g., "sensitivity") or tuning by user
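To illustrate why comparing frames at several distances helps, the sketch below compares intensity histograms of frames 1, 3 and 7 apart on synthetic video. It is not the detector described on the slide: the histogram feature, the fixed threshold (the slide's system exposes no such knob), and the synthetic frames are all assumptions.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of one decoded grayscale frame."""
    counts, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return counts / counts.sum()

def detect_changes(frames, gaps=(1, 3, 7), threshold=0.5):
    """For each gap, list the frame indices whose histogram differs strongly
    (total variation distance) from the frame `gap` positions earlier."""
    hists = [gray_histogram(f) for f in frames]
    changes = {}
    for gap in gaps:
        changes[gap] = [
            i for i in range(gap, len(hists))
            if 0.5 * np.abs(hists[i] - hists[i - gap]).sum() > threshold
        ]
    return changes

def synthetic_frame(rng, mean):
    """Hypothetical 64x64 frame with Gaussian intensities around `mean`."""
    return np.clip(rng.normal(mean, 30.0, size=(64, 64)), 0, 255)

rng = np.random.default_rng(0)
# Hard cut: five dark frames followed by five bright frames.
cut = [synthetic_frame(rng, m) for m in [40] * 5 + [200] * 5]
# Dissolve: brightness ramps from dark to bright over six frames.
ramp = [40 + 160 * k / 7 for k in range(1, 7)]
dissolve = [synthetic_frame(rng, m) for m in [40] * 4 + ramp + [200] * 4]

cut_changes = detect_changes(cut)
dissolve_changes = detect_changes(dissolve)
```

The hard cut is visible between adjacent frames (gap 1), whereas each step of the dissolve is too small to cross the threshold at gap 1 and only shows up when frames 7 apart are compared.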

TREC Video Shot Boundary Detection Results

[Figure: NIST precision-recall plot for all types of edits; recall and precision axes both span 0.5 to 1; points are labeled by system number (1 through 15) and "Avg"]

TREC Video Retrieval Results

[Figure: TREC Video Retrieval (IBM Research general search results). Number of hits (0 to 20) per TREC Video Retrieval topic (vt11 through vt74) for three runs: IBM_A_ASR, IBM_A_CBR, IBM_A_C+S]

ASR = speech only; CBR = content-based only; C+S = combination

TREC Video Retrieval 2002

Shot boundary detection not a separate task
Retrieval task to use NIST-generated shots
Manual queries only; interactive and automatic retrieval tasks
MAP to be used as evaluation measure
No "Known Item" evaluation (?)

Searching Video Collections: Representation, Indexing, Browsing and Evaluation

Introduction to Multimedia Information Retrieval
Effective MMIR
  Multimedia Representation
  Multimedia Indexing
  Query Formulation
  Multimedia Retrieval
  Browsing
  Distribution/Streaming
  Evaluation
Multimedia IR Applications
Conclusions