context-driven semantic multimedia search

Goportis Conference 2013 on Non-Textual Information - Strategy and Innovation Beyond Text

Context-Driven Semantic Multimedia Search

Dr. Harald Sack

Hasso-Plattner-Institut for IT-Systems Engineering

University of Potsdam

Hannover, 19/03/2013


• Searching Multimedia: Web vs. Archive

• How to Open Up Multimedia Data? Automated Multimedia Analysis

• How to Determine the Meaning of Metadata? Context-Driven Semantic Analysis

• Some Examples of Semantic Search


Searching the Web


Google Multimedia Search


‣Google Multimedia Search relies on text-based metadata and link context

How does Google find Multimedia?


Search by Media Content


The Ordinary Archive is a Small World...

Jules Verne


But wouldn't it be nice if ...

Jules Verne

... but maybe you are also interested in
• George Melies (2 videos)
• Mark Twain (1 video)
• H.G. Wells (2 videos)
• science fiction (11 videos)
• adventure (20 videos)
• France (101 videos)
• Moon (33 videos)
• literature (434 videos)
• art (1,205 videos)


How to Search in Multimedia Archives?


Searching a Multimedia Archive

Step 1: Digitization of analog media

Step 2: Annotation with (text-based) metadata

Step 3: Content-based retrieval based on available metadata


Today: Manual Annotation


(Selected) Automated Media Analysis

Media types: text / images, audio-visual, audio

• Visual Analysis
• Text Recognition
• Visual Concept Detection
• Logo Detection
• Face Detection
• Audio Mining
• Structural Analysis
• Automated Speech Recognition
• Audio Event Detection

Structural Video Analysis

• Decomposition of time-based media into meaningful media fragments of coherent content that can be used as basic elements for indexing and classification

video → scenes → shots → subshots → frames

keyframes
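The decomposition into shots and keyframes can be illustrated with a minimal sketch. It assumes hard cuts only and compares intensity histograms of consecutive frames. The function names, the bin count and the threshold are illustrative assumptions, not the method used in the talk.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flat list of grayscale pixel values in 0..255


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = float(len(frame)) or 1.0
    return [c / total for c in counts]


def shot_boundaries(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Indices i at which a new shot is assumed to start (hard cuts only)."""
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # L1 distance between normalized histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


def keyframes(frames: Sequence[Frame], boundaries: Sequence[int]) -> List[int]:
    """Pick the middle frame index of each shot as a simple keyframe."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]


if __name__ == "__main__":
    dark, bright = [10] * 16, [200] * 16
    video = [dark, dark, dark, bright, bright]
    cuts = shot_boundaries(video)
    print(cuts, keyframes(video, cuts))
```

Gradual transitions (fades, dissolves) and the grouping of shots into scenes would need additional logic that is not shown here.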

Video Optical Character Recognition (OCR)

Fig. 1. Workflow of the proposed text detection method. (b) is the vertical edge map of (a). (c) is the vertical dilation map of (b). (d) is the binary map of (c). (e) is the result map of the subsequent connected component analysis. (f) shows the binary map after the adaptive projection profile refinement. (g) is the final detection result.

for text detection of nature scene images. The operator computes for each pixel the width of the most likely stroke containing the pixel. The output of the operator is a stroke-feature map, which has the same size as the input image, while each pixel represents the corresponding stroke width value of the input image.

3. TEXT DETECTION IN VIDEO IMAGES

Text detection is the first task of video OCR. Our approach determines whether a single frame of a video file contains text lines, for which a tight bounding box is returned. In order to manage detected text lines efficiently, we have defined a class "text line object" with the following properties: bounding box location (the top-left corner position) and bounding box size. After the first round of text detection, the refinement and verification procedures ensure the validity of the detection results in order to reduce false alarms.

3.1. Text detector

Before performing the text detection process, a Gaussian smoothing filter is applied to the images that have an entropy value larger than a predefined threshold T_entr. For our purpose, T_entr = 5.25 has proven to work best.

We have developed an edge-based text detector, subsequently referred to as the edge text detector. The advantage of our detector is its computational efficiency compared to other machine learning based approaches, because no computationally expensive training period is required. However, for visually different video sequences a parameter adaptation has to be performed. The best suited parameter combination for our method was learned from test runs on the given test data.

Fig. 2. Workflow of the proposed adaptive text line refinement procedure

The processing workflow for a single frame is depicted in Fig. 1 (a-e). First, a vertical edge map is produced using a Sobel filter [8] (cf. Fig. 1 (b)). Then, the morphological dilation operation is adopted to link the vertical character edges together (cf. Fig. 1 (c)). Let MinW denote the detected minimal text line width. A rectangular kernel of size 1 × MinW is defined for the vertical dilation operator. Subsequently, a binary mask is generated by using Otsu's thresholding method [9]. Ultimately, we create a binary map after Connected Component

• Video OCR is much more difficult than traditional print OCR
• fast detection/filtering of text candidates
• verification of text candidates
• script separation from background
• visual quality enhancement
• application of standard OCR software
• spell correction w.r.t. context and temporal redundancy
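Two of the building blocks named in the excerpt above, the Sobel-based vertical edge map and Otsu's thresholding, can be sketched in plain Python. This is a simplified illustration under stated assumptions: images are small nested lists of grayscale values, the dilation and connected component steps are omitted, and all function names are hypothetical rather than taken from the authors' implementation.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values in 0..255


def vertical_edge_map(img: Image) -> Image:
    """Horizontal-gradient Sobel response (strong at vertical edges), clipped to 0..255."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            out[y][x] = min(abs(gx), 255)
    return out


def otsu_threshold(img: Image) -> int:
    """Threshold t that maximizes the between-class variance; pixels > t are foreground."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(img: Image, t: int) -> Image:
    """Binary mask with 1 where the pixel value exceeds t."""
    return [[1 if v > t else 0 for v in row] for row in img]


if __name__ == "__main__":
    frame = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
    edges = vertical_edge_map(frame)
    mask = binarize(edges, otsu_threshold(edges))
    for row in mask:
        print(row)
```

Border pixels are left at zero for simplicity; a production pipeline would more likely rely on an image processing library.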

• Face Detection: Detect candidate image regions in a video frame that depict a human face

• Face Tracking: Track a detected face in video over consecutive frames within shot boundaries

• Face Clustering: Group faces detected and tracked in videos into visually similar sets within a single video

• Face Recognition/Identification: Reliable identification of detected faces

Video Face Detection, Tracking & Clustering

person, frontal face: 90%

not a person

person, profile face: 70%

Visual Context Detection

• Adaptation of the traditional 'Bag of Words' approach from text retrieval

• Image is expressed as a vector (histogram) of dictionary codeword frequencies

• Concept assignment (classification) via machine learning (here: Support Vector Machines)
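The histogram representation mentioned above can be sketched as follows. The example assumes that local descriptors and a codebook (the visual dictionary) are already given as plain numeric vectors; the names `nearest_codeword` and `bovw_histogram` and the toy data are hypothetical. The resulting histogram would be the input of a classifier such as a Support Vector Machine, which is not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bovw_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Normalized histogram of codeword frequencies for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (10.0, 10.0)]            # toy visual dictionary
    descriptors = [(1, 0), (0, 1), (9, 10), (1, 1)]  # toy local features of one image
    print(bovw_histogram(descriptors, codebook))
```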


• Authoritative Metadata
  • structured data
  • semi-structured data
  • natural language text

• Non-authoritative Metadata
  • (free) user tags and comments
  • restricted vocabularies

• (Media) Analysis Metadata
  • low level features
  • high level features

• etc.

How to Determine the Meaning of Metadata?

Semantic Analysis

• reliability
• context
• pragmatics
• location dependency
• accuracy
• time dependency
• level of abstraction

Annotation of Audiovisual Data

Metadata Extraction

Metadata (e.g. MPEG-7)

...
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Astronaut</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords> 480 150 620 480 </Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
  ...
</SpatialDecomposition>
...

• Multimedia data with spatiotemporal Annotations
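A spatial annotation like the one above could be read with Python's standard XML parser. The sketch below uses a simplified, well-formed copy of the fragment with the elided parts ("...") dropped. Real MPEG-7 documents use XML namespaces and a richer schema, so the fragment and the helper `read_annotation` are illustrative assumptions only.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Simplified fragment modeled on the slide; elisions removed, no namespaces.
FRAGMENT = """
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Astronaut</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords> 480 150 620 480 </Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
</SpatialDecomposition>
"""


def read_annotation(xml_text: str) -> Tuple[List[str], List[List[int]]]:
    """Return all keywords and all coordinate lists found in the fragment."""
    root = ET.fromstring(xml_text.strip())
    keywords = [k.text.strip() for k in root.iter("Keyword") if k.text]
    coords = [[int(v) for v in c.text.split()] for c in root.iter("Coords") if c.text]
    return keywords, coords


if __name__ == "__main__":
    print(read_annotation(FRAGMENT))
```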

Neil Armstrong

'Neil Armstrong' is more than just a character string

Neil Armstrong

Astronaut

is a

Person

is a

Science Occupation

subClassOf

Employment

subClassOf

Entities

Ontologies

has an


Cosmonaut

same as

Juri Gagarin

is a

is NOT a
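The entity and ontology levels of this graph can be sketched as a small set of triples with transitive subclass inference. The edge directions below are one reading of the slide labels ("is a", "subClassOf") and are an assumption; the "has an", "same as" and "is NOT a" relations are left out because their endpoints are not recoverable from the transcript. All function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed reading of the slide: entity level ("is a") and ontology level ("subClassOf").
TRIPLES: Set[Triple] = {
    ("Neil Armstrong", "is a", "Astronaut"),
    ("Neil Armstrong", "is a", "Person"),
    ("Astronaut", "subClassOf", "Science Occupation"),
    ("Science Occupation", "subClassOf", "Employment"),
}


def superclasses(cls: str, triples: Iterable[Triple]) -> Set[str]:
    """All classes reachable from cls via subClassOf (transitive closure)."""
    triples = list(triples)
    found: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                stack.append(o)
    return found


def types_of(entity: str, triples: Iterable[Triple]) -> Set[str]:
    """Direct types of an entity plus everything inferred through subClassOf."""
    triples = list(triples)
    direct = {o for s, p, o in triples if s == entity and p == "is a"}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


if __name__ == "__main__":
    print(sorted(types_of("Neil Armstrong", TRIPLES)))
```

A search for a general class can then also return material annotated only with the more specific entity, which is the point of treating 'Neil Armstrong' as more than a character string.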

Where does the knowledge come from...?

Web of Data = Linked Open Data

But what if there is no trivial unique identification?

Armstrong (user tag)

Armstrong + Moon

Understanding requires Context

Armstrong

Moon

Eagle

Space

Semantic Analysis

Semantics is determined by Context

SEMEX Multimedia Context Model

N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013

Context Item (Text): „Armstrong landed the Eagle on the Moon.“

Contextual Description (diagram labels as shown on the slide):
• Context Dimensions: Temporal Context, Spatial Context, Provenance Context
• Relevance ("determines")
• Ambiguity ("influences"): Class Diversity, Level of Structure
• Accuracy ("influences"): Source Reliability, Source Diversity

Candidate entities for the terms Armstrong, Eagle and Moon (entity counts shown on the slide: 95, 448 and 156 entities):

• Armstrong: George Armstrong Custer, Neil Armstrong, The Armstrong Twins, Armstrong (Florida), Armstrong (Ontario), Armstrong (British Columbia), Armstrong Automobile, Joe Armstrong, Armstrong County (Texas), Armstrong Gun, Craig Armstrong, Armstrong (Moon Crater), Louis Armstrong, Armstrong Tunnel, Louis Armstrong International Airport, Armstrong's Theorem, Sir Thomas Armstrong, Ian Armstrong, Karen Armstrong, Curtis Armstrong, Gillian Armstrong, Hilary Armstrong, William L. Armstrong

• Eagle: Eagle (Bird), Eagle (heraldry), USCGC Eagle, The Eagle (2011 film), Eagle (song), John H. Eagle, Eagle (typeface), Eagle Falls (Washington), Eagle (Moon Crater), Eagle (comic), Eagle (lunar module), Eagle TV, The Eagle (Pub), War Eagle, The Eagle (newspaper), Eagle (racehorse), Angela Eagle, Linda Eagle, James Philipp Eagle

• Moon: Moon, Man on the Moon (film), Moon (song), Moon Son-Ri, C Moon, The Moon (Tarot card), Edgar Moon, Moon OS, Moon (Band), Moon 44, Man on the Moon (soundtrack), William Moon, Lottie Moon, Mr. Moon (song), Man on the Moon (musical), Darvin Moon, Moon 83, Francis Moon, Gary Moon, Robert Charles Moon, Black Moon, Allan Moon, Ban-Ki Moon, Fly me to the Moon (song)

Semantic Analysis: Named Entity Mapping

„Armstrong landed the Eagle on the Moon.“

Consider all entities within the same context

Select matching entities from all possible candidate entities:
• Popularity-based strategies
• Linguistic strategies
• Statistical strategies
• Semantic-based strategies

General Approach
1. Make an assumption
2. Check whether the strategies support or contradict your assumption
3. Make a decision according to logical and probabilistic rules/constraints
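How a popularity-based and a semantic (coherence-based) strategy might be combined can be sketched with a toy example. Every number and link below is made up for illustration: the popularity priors, the `LINKS` relation standing in for a link or semantic graph, and the scoring formula are assumptions, not the scoring used by the authors.

```python
from itertools import combinations, product
from typing import Dict

# Hypothetical candidates with made-up popularity priors.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Armstrong": {"Neil Armstrong": 0.5, "Louis Armstrong": 0.6},
    "Eagle": {"Eagle (Bird)": 0.7, "Eagle (lunar module)": 0.2},
    "Moon": {"Moon": 0.8, "Fly me to the Moon (song)": 0.3},
}

# Hypothetical "is related to" pairs, standing in for a link or semantic graph.
LINKS = {frozenset(pair) for pair in [
    ("Neil Armstrong", "Eagle (lunar module)"),
    ("Neil Armstrong", "Moon"),
    ("Eagle (lunar module)", "Moon"),
    ("Louis Armstrong", "Fly me to the Moon (song)"),
]}


def score(assignment: Dict[str, str], coherence_weight: float) -> float:
    """Sum of popularity priors plus a bonus for every linked pair of chosen entities."""
    popularity = sum(CANDIDATES[term][entity] for term, entity in assignment.items())
    coherence = sum(1 for a, b in combinations(assignment.values(), 2)
                    if frozenset((a, b)) in LINKS)
    return popularity + coherence_weight * coherence


def disambiguate(coherence_weight: float = 1.0) -> Dict[str, str]:
    """Try every combination of candidates (the assumptions) and keep the best-scoring one."""
    terms = list(CANDIDATES)
    best = max(product(*(CANDIDATES[t] for t in terms)),
               key=lambda combo: score(dict(zip(terms, combo)), coherence_weight))
    return dict(zip(terms, best))


if __name__ == "__main__":
    print(disambiguate(coherence_weight=0.0))  # popularity only
    print(disambiguate(coherence_weight=1.0))  # popularity plus context coherence
```

With these toy values, popularity alone prefers the most frequent reading of each term in isolation, while adding the coherence bonus selects the reading in which all three terms belong to the same context. The exhaustive search is only feasible for a handful of terms.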

Semantic Analysis: Named Entity Recognition

N. Ludwig, H. Sack: Named Entity Recognition for User-Generated Tags, TIR 2011

• reference text corpus (Wikipedia)
• link graph (Wikipedia)
• semantic graph (DBpedia)

Entity Selection Process

Candidates highlighted on the slide: Neil Armstrong, Eagle (lunar module), Moon, Louis Armstrong, Fly me to the Moon (song). The remaining candidate entities for Armstrong, Eagle and Moon are the same as listed before.

Semantic Analysis: Named Entity Recognition

„Armstrong landed the Eagle on the Moon.“

N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013

Entity Selection Process: (Semantic) Graph Analysis



Semantically Annotated Multimedia

Video Analysis / Metadata Extraction

Metadata attached along the video timeline, e.g., person xy, location yz, event abc

e.g., bibliographical data, geographical data, encyclopedic data, ...

Entity Recognition / Mapping

N. Ludwig, H. Sack: Named Entity Recognition for User-Generated Tags. In Proc. of the 8th Int. Workshop on Text-based Information Retrieval, IEEE CS Press, 2011



Entity Based Search

• linguistic ambiguities of traditional keyword based search can be avoided

• enables high precision and high recall retrieval

http://www.yovisto.com/labs/autosuggestion/

• Query string refinement / extension
• entity auto-suggestion
• interpretation of natural language queries

J. Osterhoff, J. Waitelonis, H. Sack, Widen the Peepholes! Entity-Based Auto-Suggestion as a rich and yet immediate Starting Point for Exploratory Search, IVDW 2012

http://mediaglobe.yovisto.com/mggui-dev2/


http://mediaglobe.yovisto.com:8080/mggui-dev2/

search facets

C. Hentschel, H. Sack, et al., Open up cultural heritage in video archives with mediaglobe, I2CS 2012

http://mediaglobe.yovisto.com:8080/mggui/#start


Explorative Search

dbpedia-owl:mission

dbpedia:Neil_Armstrong

dbpedia:Apollo_11

dbpedia-owl:mission

category:Apollo_program

dcterms:subject

dbpedia:Apollo_13

dcterms:subject

yago:Space_accidents_and_incidents

rdf:type

rdf:type

dbpedia:Space_Shuttle_Challenger

dbpedia-owl:mission

dbpedia:Buzz_Collins

dbpedia:Michael_Collins
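The idea of explorative search over such a graph can be sketched as a breadth-first walk from a start entity. The node and property names are taken from the slide, but the assignment of properties to particular node pairs is an assumption, since the figure layout is not recoverable from the transcript. The function `explore` is hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed edges; labels come from the slide, the pairing of nodes is a guess.
TRIPLES = [
    ("dbpedia:Neil_Armstrong", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Michael_Collins", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Apollo_11", "dcterms:subject", "category:Apollo_program"),
    ("dbpedia:Apollo_13", "dcterms:subject", "category:Apollo_program"),
    ("dbpedia:Apollo_13", "rdf:type", "yago:Space_accidents_and_incidents"),
    ("dbpedia:Space_Shuttle_Challenger", "rdf:type", "yago:Space_accidents_and_incidents"),
]


def explore(start: str, triples: Iterable[Triple], max_hops: int) -> Dict[str, int]:
    """Resources reachable from start within max_hops, ignoring edge direction."""
    neighbours: Dict[str, Set[str]] = {}
    for s, _p, o in triples:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]
    return distance


if __name__ == "__main__":
    for resource, hops in explore("dbpedia:Neil_Armstrong", TRIPLES, max_hops=3).items():
        print(hops, resource)
```

In a real system the hop distance would typically be combined with property-specific weights, so that not every connected resource is suggested with equal prominence.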

http://mediaglobe.yovisto.com:8080/

J. Waitelonis, H. Sack: Towards exploratory video search using linked data, MTAP Volume 59, Number 2 (2012), 645-672

http://www.springerlink.com/content/m45r495n634m0q05/


Contact:
Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam

Homepage: http://www.hpi.uni-potsdam.de/meinel/team/sack.html
Blog: http://yovisto.blogspot.com/
E-Mail: [email protected]
Twitter: lysander07 / biblionomicon / yovisto
Slides can be found at http://slideshare.com/lysander07/

Thank you very much for your attention!
