context-driven semantic multimedia search
DESCRIPTION
Presentation at GOPORTIS 2013 Conference, March 19, 2013, Hannover
TRANSCRIPT
Goportis Conference 2013 on Non-Textual Information - Strategy and Innovation Beyond Text
Context-Driven Semantic Multimedia Search
Dr. Harald Sack
Hasso-Plattner-Institut for IT-Systems Engineering
University of Potsdam
Hannover, 19/03/2013
Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, LDW 2011, Magdeburg, 30. Sep. 2011
• Searching Multimedia: Web vs. Archive
• How to Open Up Multimedia Data? Automated Multimedia Analysis
• How to Determine the Meaning of Metadata? Context-Driven Semantic Analysis
• Some Examples of Semantic Search
Context-Driven Semantic Multimedia Search
Vorlesung Semantic Web, Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
Searching the Web
Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, Workshop 'Corporate Semantic Web', XInnovations 2011, Berlin, 19. Sep. 2011
Google Multimedia Search
How does Google find Multimedia?
• Google Multimedia Search relies on text-based metadata and link context
Search by Media Content
The Ordinary Archive is a Small World...
Jules Verne
But wouldn't it be nice if ...
Jules Verne
... but maybe you are also interested in
- Georges Méliès (2 videos)
- Mark Twain (1 video)
- H.G. Wells (2 videos)
- science fiction (11 videos)
- adventure (20 videos)
- France (101 videos)
- Moon (33 videos)
- literature (434 videos)
- art (1,205 videos)
How to Search in Multimedia Archives?
vfm - Seminar: Metadatenmanagement in Medienunternehmen, 05. September 2012, Bonn Jörg Waitelonis, Hasso-Plattner-Institut Potsdam
Searching a Multimedia Archive
Step 1: Digitization of analog media
Step 2: Annotation with (text-based) metadata
Step 3: Content-based retrieval based on available metadata
Today: Manual Annotation
(Selected) Automated Media Analysis
• Media types: audio-visual, text / images, image, audio
• Visual Analysis
• Text Recognition
• Visual Concept Detection
• Face Detection
• Logo Detection
• Audio Mining
• Structural Analysis
• Automated Speech Recognition
• Audio Event Detection
Structural Video Analysis
• Decomposition of time-based media into meaningful media fragments of coherent content that can be used as basic elements for indexing and classification
• Levels of decomposition (figure labels): video, scenes, shots, subshots, frames, keyframes
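The decomposition into shots can be illustrated with a minimal sketch. It assumes a simple hard-cut detector based on grey-level histogram differences between consecutive frames; this is one common technique and not necessarily the method used in the talk. The function names, the bin count, and the threshold are illustrative assumptions.

```python
def histogram(frame, bins=8):
    """Normalised grey-level histogram of a frame given as a flat list of values 0..255."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames, threshold=0.5):
    """Return the indices at which a new shot is assumed to start (hard cuts only)."""
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # Half the L1 distance between two normalised histograms lies in [0, 1].
        distance = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if distance > threshold:
            cuts.append(index)
        previous = current
    return cuts


def split_into_shots(frames, threshold=0.5):
    """Return (start, end) frame index pairs, end exclusive, one pair per shot."""
    if not frames:
        return []
    cuts = shot_boundaries(frames, threshold)
    return list(zip([0] + cuts, cuts + [len(frames)]))
```

Each resulting shot could then be represented by one or more keyframes for indexing, for example the middle frame of the interval.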
Video Optical Character Recognition (OCR)
Fig. 1. Workflow of the proposed text detection method. (b) is the vertical edge map of (a). (c) is the vertical dilation map of (b). (d) is the binary map of (c). (e) is the result map of the subsequent connected component analysis. (f) shows the binary map after the adaptive projection profile refinement. (g) is the final detection result.
for text detection of natural scene images. The operator computes for each pixel the width of the most likely stroke containing the pixel. The output of the operator is a stroke-feature map, which has the same size as the input image, while each pixel represents the corresponding stroke width value of the input image.
3. TEXT DETECTION IN VIDEO IMAGES
Text detection is the first task of video OCR. Our approach determines whether a single frame of a video file contains text lines, for which a tight bounding box is returned. In order to manage detected text lines efficiently, we have defined a class "text line object" with the following properties: bounding box location (the top-left corner position) and bounding box size. After the first round of text detection, the refinement and verification procedures ensure the validity of the detection results in order to reduce false alarms.
3.1. Text detector
Before performing the text detection process, a Gaussian smoothing filter is applied to images that have an entropy value larger than a predefined threshold T_entr. For our purpose, T_entr = 5.25 has proven to work best.
We have developed an edge-based text detector, subsequently referred to as the edge text detector. The advantage of our detector is its computational efficiency compared to machine-learning-based approaches, because no computationally expensive training period is required. However, for visually different video sequences, a parameter adaptation has to be performed. The best-suited parameter combination of our method was learned from test runs on the given test data.
Fig. 2. Workflow of the proposed adaptive text line refinement procedure
The processing workflow for a single frame is depicted in Fig. 1 (a-e). First, a vertical edge map is produced using a Sobel filter [8] (cf. Fig. 1 (b)). Then, the morphological dilation operation is adopted to link the vertical character edges together (cf. Fig. 1 (c)). Let MinW denote the detected minimal text line width. A rectangular kernel of size 1 × MinW is defined for the vertical dilation operator. Subsequently, a binary mask is generated using Otsu's thresholding method [9]. Ultimately, we create a binary map after Connected Component
• Video OCR is much more difficult than traditional print OCR
• fast detection/filtering of text candidates
• verification of text candidates
• script separation from background
• visual quality enhancement
• application of standard OCR software
• spell correction w.r.t. context and temporal redundancy
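The edge-based detection workflow quoted above (Sobel edge map, dilation, Otsu thresholding, connected component analysis) can be sketched in plain Python on a small grey-level image given as a list of rows. This is only an illustration under assumptions: the entropy-dependent Gaussian smoothing and the refinement and verification steps are omitted, the 1 × MinW kernel is read as one row by `min_w` columns, and all function names are hypothetical rather than taken from the authors' code.

```python
from collections import deque


def sobel_vertical_edges(img):
    """Absolute horizontal Sobel response (strong at vertical edges), scaled to 0..255."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            out[y][x] = min(255, abs(gx) // 4)
    return out


def dilate_rows(img, min_w):
    """Grey-level dilation along each row with a window of about min_w pixels."""
    half = min_w // 2
    return [[max(row[max(0, x - half): x + half + 1]) for x in range(len(row))]
            for row in img]


def otsu_threshold(img):
    """Threshold that maximises the between-class variance of the grey values."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = sum_bg = 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        diff = sum_bg / w_bg - (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * diff * diff
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def bounding_boxes(mask):
    """Bounding boxes (x, y, width, height) of the 4-connected components of a mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max - x_min + 1, y_max - y_min + 1))
    return boxes


def detect_text_boxes(img, min_w=3):
    """Edge map -> row dilation -> Otsu binarisation -> connected components."""
    linked = dilate_rows(sobel_vertical_edges(img), min_w)
    threshold = otsu_threshold(linked)
    mask = [[v > threshold for v in row] for row in linked]
    return bounding_boxes(mask)
```

The returned boxes are only text candidates; in the workflow listed above they would still pass through verification, enhancement, and a standard OCR engine.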
Video Face Detection, Tracking & Clustering
• Face Detection: Detect candidate image regions in a video frame that depict a human face
• Face Tracking: Track a detected face in video over consecutive frames within shot boundaries
• Face Clustering: Group faces detected and tracked in videos into visually similar sets within a single video
• Face Recognition/Identification: Reliable identification of detected faces
Example detections (figure labels): person, frontal face: 90%; not a person; person, profile face: 70%
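The face tracking step can be illustrated with a minimal sketch that links per-frame detections into tracks by bounding box overlap (intersection over union). This is an assumed, simplified strategy for illustration, not the tracker used in the talk; the box format `(x, y, width, height)`, the function names, and the `min_iou` threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def track_faces(frames, min_iou=0.5):
    """Link detections of one shot into tracks.

    frames: list with one list of face boxes per frame.
    Returns a list of tracks, each a list of (frame_index, box) pairs.
    """
    tracks = []
    active = []  # indices of tracks that were extended in the previous frame
    for index, boxes in enumerate(frames):
        unmatched = list(boxes)
        next_active = []
        for track_id in active:
            last_box = tracks[track_id][-1][1]
            best = max(unmatched, key=lambda box: iou(last_box, box), default=None)
            if best is not None and iou(last_box, best) >= min_iou:
                tracks[track_id].append((index, best))
                unmatched.remove(best)
                next_active.append(track_id)
        for box in unmatched:  # every unmatched detection starts a new track
            tracks.append([(index, box)])
            next_active.append(len(tracks) - 1)
        active = next_active
    return tracks
```

Because tracks are restricted to one shot, the function would be called once per shot; clustering the resulting tracks by visual similarity is a separate step not shown here.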
Visual Concept Detection
• Adaptation of the traditional 'Bag of Words' approach from text retrieval
• Image is expressed as a vector (histogram) of dictionary codeword frequencies
• Concept assignment via machine learning (here Support Vector Machines)
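The histogram representation can be made concrete with a small sketch. It assumes that local image descriptors and a codebook (dictionary of codewords) are already available; both are toy values here, and the function names are hypothetical. The subsequent classification with a Support Vector Machine is not shown.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bow_histogram(descriptors, codebook):
    """Normalised histogram of codeword frequencies for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)
```

The resulting fixed-length vector is what a classifier would receive as input, one vector per image or keyframe.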
How to Determine the Meaning of Metadata?
• Authoritative Metadata
  • structured data
  • semi-structured data
  • natural language text
• Non-authoritative Metadata
  • (free) user tags and comments
  • restricted vocabularies
• (Media) Analysis Metadata
  • low level features
  • high level features
• etc.
Semantic Analysis (figure labels): reliability, context, pragmatics, location dependency, accuracy, time dependency, level of abstraction
Annotation of Audiovisual Data
Metadata Extraction
Metadata (e.g. MPEG-7)
...
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Astronaut</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords> 480 150 620 480 </Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
  ...
</SpatialDecomposition>
...
• Multimedia data with spatiotemporal Annotations
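As a minimal sketch of how such an annotation could be read, the following uses Python's standard `xml.etree.ElementTree` on a simplified, self-contained copy of the fragment above. Real MPEG-7 documents use XML namespaces and a much richer structure, so the namespace-free element names and the `read_annotation` helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modelled on the slide; the elided parts ("...") are left out.
FRAGMENT = """
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Astronaut</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords> 480 150 620 480 </Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
</SpatialDecomposition>
"""


def read_annotation(xml_text):
    """Return (keywords, coordinates) found in a namespace-free fragment."""
    root = ET.fromstring(xml_text.strip())
    keywords = [k.text.strip() for k in root.iter("Keyword") if k.text]
    coords = [int(v) for c in root.iter("Coords") if c.text for v in c.text.split()]
    return keywords, coords
```

The keyword and the region coordinates together form one spatial annotation; a temporal annotation would add a time range in the same way.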
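'Neil Armstrong' is more than just a character string
• Entities: Neil Armstrong, Juri Gagarin
• Ontologies (classes): Astronaut, Cosmonaut, Person, Science Occupation, Employment
• Relations (figure labels): is a, is NOT a, subClassOf, same as, has an
• Neil Armstrong is a Astronaut; Neil Armstrong is a Person
• Astronaut subClassOf Science Occupation; Science Occupation subClassOf Employment
• Cosmonaut same as Astronaut; Juri Gagarin is a Cosmonaut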
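Why an entity is "more than just a character string" can be shown with a tiny sketch that derives implicit class memberships by following subClassOf links. The triples below are an assumed reading of the diagram labels, not an extract from a real ontology, and `types_of` is a hypothetical helper.

```python
# Assumed reading of the diagram: (subject, predicate, object) triples.
TRIPLES = {
    ("Neil Armstrong", "is a", "Astronaut"),
    ("Neil Armstrong", "is a", "Person"),
    ("Astronaut", "subClassOf", "Science Occupation"),
    ("Science Occupation", "subClassOf", "Employment"),
}


def types_of(entity):
    """All classes of an entity: asserted ones plus their transitive superclasses."""
    pending = [o for s, p, o in TRIPLES if s == entity and p == "is a"]
    found = set()
    while pending:
        cls = pending.pop()
        if cls in found:
            continue
        found.add(cls)
        pending.extend(o for s, p, o in TRIPLES if s == cls and p == "subClassOf")
    return found
```

A search for "Science Occupation" could therefore return media annotated with Neil Armstrong even though that class never appears in the annotation itself.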
Where does the knowledge come from...?
Web of Data = Linked Open Data
But what if there is no trivial unique identification?
Armstrong (user tag)
Semantic Web Technologies , Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam
Armstrong
Armstrong + Moon
Web of Data = Linked Open Data
Understanding requires Context
Armstrong, Moon, Eagle, Space
Semantic Analysis: Semantics is determined by Context
SEMEX Multimedia Context Model
N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013
• Context Item: "Armstrong landed the Eagle on the Moon." (Text)
• Context Dimensions: Temporal Context, Spatial Context, Provenance Context
• Contextual Description: Class Diversity, Level of Structure, Source Reliability, Source Diversity
• Relations (figure labels): determines Relevance; influences Ambiguity; influences Accuracy
Candidate entities for 'Armstrong':
George Armstrong Custer; Neil Armstrong; The Armstrong Twins; Armstrong, Florida; Armstrong, Ontario; Armstrong Automobile; Joe Armstrong; Armstrong County, Texas; Armstrong Gun; Craig Armstrong; Armstrong (Moon Crater); Louis Armstrong; Armstrong Tunnel; Louis Armstrong International Airport; Armstrong's Theorem; Sir Thomas Armstrong; Ian Armstrong; Armstrong (British Columbia); Karen Armstrong; Curtis Armstrong; Gillian Armstrong; Hilary Armstrong; William L. Armstrong
Candidate entities for 'Eagle':
Eagle (Bird); Eagle (heraldry); USCGC Eagle; The Eagle (2011 film); Eagle (song); John H. Eagle; Eagle (typeface); Eagle Falls (Washington); Eagle (Moon Crater); Eagle (comic); Eagle (lunar module); Eagle TV; The Eagle (Pub); War Eagle; The Eagle (newspaper); Eagle (racehorse); Angela Eagle; Linda Eagle; James Philipp Eagle
Candidate entities for 'Moon':
Man on the Moon (film); Moon (song); Moon Son-Ri; C Moon; The Moon (Tarot card); Edgar Moon; Moon OS; Moon (Band); Moon; Moon 44; Man on the Moon (soundtrack); William Moon; Lottie Moon; Mr. Moon (song); Man on the Moon (musical); Darvin Moon; Moon 83; Francis Moon; Gary Moon; Robert Charles Moon; Black Moon; Allan Moon; Ban-Ki Moon; Fly me to the Moon (song)
Candidate counts shown on the slide: 95 entities, 448 entities, 156 entities
Semantic Analysis: Named Entity Mapping
"Armstrong landed the Eagle on the Moon."
Consider all entities within the same context
Select matching entities from all possible candidate entities:
• Popularity-based strategies
• Linguistic strategies
• Statistical strategies
• Semantic-based strategies
General Approach
1. Make an assumption
2. Do the strategies support or contradict your assumption?
3. Make a decision according to logical and probabilistic rules/constraints
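The idea of considering all entities within the same context can be sketched as a joint selection: for every term, one candidate is chosen so that the chosen entities are maximally connected to each other. The candidate lists are shortened, the link set is hypothetical (it is not real Wikipedia or DBpedia data), and exhaustive search over all combinations is an assumption that only works for toy inputs; the talk's actual strategies are not reproduced here.

```python
from itertools import combinations, product

# Shortened candidate lists per term (taken from the slide's candidate names).
CANDIDATES = {
    "Armstrong": ["Neil Armstrong", "Louis Armstrong", "Armstrong (Moon Crater)"],
    "Eagle": ["Eagle (Bird)", "Eagle (lunar module)"],
    "Moon": ["Moon", "Fly me to the Moon (song)"],
}

# Hypothetical undirected links between entities, standing in for a link graph.
LINKS = {
    frozenset(("Neil Armstrong", "Eagle (lunar module)")),
    frozenset(("Neil Armstrong", "Moon")),
    frozenset(("Eagle (lunar module)", "Moon")),
    frozenset(("Armstrong (Moon Crater)", "Moon")),
    frozenset(("Louis Armstrong", "Fly me to the Moon (song)")),
}


def coherence(entities):
    """Number of linked pairs among the chosen entities."""
    return sum(frozenset(pair) in LINKS for pair in combinations(entities, 2))


def disambiguate(terms):
    """Pick one candidate per term so that the selection is most coherent."""
    best = max(product(*(CANDIDATES[t] for t in terms)), key=coherence)
    return dict(zip(terms, best))
```

With real candidate sets of several hundred entities per term, the exhaustive product would be replaced by pruning, for example by popularity, before the joint scoring.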
Semantic Analysis: Named Entity Recognition
N. Ludwig, H. Sack: Named Entity Recognition for User-Generated Tags, TIR 2011
• reference text corpus (Wikipedia)
• link graph (Wikipedia)
• semantic graph (DBpedia)
Entity Selection Process
(same candidate entities for 'Armstrong', 'Eagle', and 'Moon' as listed above)
Highlighted candidates: Neil Armstrong, Eagle (lunar module), Moon, Louis Armstrong, Fly me to the Moon (song)
Semantic Analysis: Named Entity Recognition
"Armstrong landed the Eagle on the Moon."
N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013
Entity Selection Process: (Semantic) Graph Analysis
Semantically Annotated Multimedia
• Video Analysis / Metadata Extraction: metadata along the time axis, e.g., person xy, location yz, event abc
• Entity Recognition / Mapping: e.g., bibliographical data, geographical data, encyclopedic data, ...
N. Ludwig, H. Sack: Named Entity Recognition for User-Generated Tags. In Proc. of the 8th Int. Workshop on Text-based Information Retrieval, IEEE CS Press, 2011
Entity Based Search
• linguistic ambiguities of traditional keyword based search can be avoided
• enables high precision and high recall retrieval
http://www.yovisto.com/labs/autosuggestion/
• Query string refinement / extension
• entity auto-suggestion
• interpretation of natural language queries
J. Osterhoff, J. Waitelonis, H. Sack, Widen the Peepholes! Entity-Based Auto-Suggestion as a rich and yet immediate Starting Point for Exploratory Search, IVDW 2012
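Entity auto-suggestion can be illustrated with a minimal sketch that maps a typed prefix to entities instead of plain keywords. The entity list, the popularity scores, and the `suggest` function are hypothetical and only show the principle; they do not describe the yovisto implementation.

```python
# (label, identifier, popularity) with made-up popularity scores.
ENTITIES = [
    ("Neil Armstrong", "dbpedia:Neil_Armstrong", 90),
    ("Louis Armstrong", "dbpedia:Louis_Armstrong", 95),
    ("Armstrong Tunnel", "dbpedia:Armstrong_Tunnel", 5),
    ("Moon", "dbpedia:Moon", 99),
]


def suggest(prefix, limit=3):
    """Entities with a label word starting with the prefix, most popular first."""
    needle = prefix.casefold()
    hits = [
        entity for entity in ENTITIES
        if any(word.startswith(needle) for word in entity[0].casefold().split())
    ]
    hits.sort(key=lambda entity: -entity[2])
    return [(label, identifier) for label, identifier, _ in hits[:limit]]
```

Because the user selects an identifier rather than a string, the query sent to the index is already unambiguous.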
http://mediaglobe.yovisto.com:8080/mggui-dev2/
search facets
C. Hentschel, H. Sack, et al., Open up cultural heritage in video archives with mediaglobe, I2CS 2012
Explorative Search
dbpedia-owl:mission
dbpedia:Neil_Armstrong
dbpedia:Apollo_11
dbpedia-owl:mission
category:Apollo_program
dcterms:subject
dbpedia:Apollo_13
dcterms:subject
yago:Space_accidents_and_incidents
rdf:type
rdf:type
dbpedia:Space_Shuttle_Challenger
dbpedia-owl:mission
dbpedia:Buzz_Collins
dbpedia:Michael_Collins
http://mediaglobe.yovisto.com:8080/
J. Waitelonis, H. Sack: Towards exploratory video search using linked data, MTAP Volume 59, Number 2 (2012), 645-672
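The graph above suggests how explorative search can propose related entities: two resources are related when they share a property value. The following sketch uses an assumed arrangement of some of the labels shown in the figure as triples (the figure's exact edges cannot be recovered from the transcript), and the `related` helper is hypothetical.

```python
from collections import defaultdict

# Assumed reading of the figure as (subject, predicate, object) triples.
TRIPLES = [
    ("dbpedia:Neil_Armstrong", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Michael_Collins", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Apollo_11", "dcterms:subject", "category:Apollo_program"),
    ("dbpedia:Apollo_13", "dcterms:subject", "category:Apollo_program"),
    ("dbpedia:Apollo_13", "rdf:type", "yago:Space_accidents_and_incidents"),
    ("dbpedia:Space_Shuttle_Challenger", "rdf:type", "yago:Space_accidents_and_incidents"),
]


def related(entity):
    """Map each related entity to the (predicate, object) pairs it shares with entity."""
    subjects_by_value = defaultdict(set)
    for s, p, o in TRIPLES:
        subjects_by_value[(p, o)].add(s)
    result = {}
    for s, p, o in TRIPLES:
        if s == entity:
            for other in subjects_by_value[(p, o)] - {entity}:
                result.setdefault(other, []).append((p, o))
    return result
```

The shared property pairs can be shown to the user as the explanation for each suggestion, for example "also part of the Apollo program".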
Contact:
Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam
Homepage: http://www.hpi.uni-potsdam.de/meinel/team/sack.html
Blog: http://yovisto.blogspot.com/
E-Mail: [email protected]
Twitter: lysander07 / biblionomicon / yovisto
Slides can be found at http://slideshare.com/lysander07/
Thank you very much for your attention!