Cross-media Intelligent Searching in Digital Library
Yueting Zhuang
Zhejiang University, China
Nov. 18, 2006, Egypt
ICUDL06, YT Zhuang
Outline
1. CADAL: China digital library
2. Our Vision to next generation of digital library
3. From Multimedia Retrieval to Cross-media Retrieval
4. Retrieval of Chinese calligraphy character: a cross-media practice
5. Building Personalized Portal
6. Conclusion
1. CADAL: China Digital Library
China-US One Million Book Digital Library Project
a unique library resource for scholars, students, and citizens
contains over one million scanned books
A big step towards the goal: create a universal, free-to-read digital library
• Make knowledge available on the web to anyone, anytime, anywhere
http://www.cadal.zju.edu.cn
As of today, CADAL has achieved:
1.023 million books were digitized, including:
• degree dissertations
• modern Chinese books
• traditional cultural resources
• English books
Supporting multimedia resources:
• image
• audio
• video
• 3D model
• Chinese calligraphy
about 200,000 clicks a day (http://www.cadal.zju.edu.cn)
users spread over 70 countries and regions
16 scanning centers in China, occupying more than 2000 square meters
Service structure of CADAL:
CALIS Integration
Unified Authentication
Personal Portal
Personal Service
Unified Quick Search
Advanced Search
Knowledge Map
Sign Language
Movie Search
Calligraphy Search
Image Search
Cultural Relics
Illustration Search
Bilingual Translation
Help System
Full-Text Search
Metadata Harvesting
Resource Location
Access Control Policy
User Management Logging
Current services provided by CADAL:
(1) Metadata searching
Digital resources are classified into 8 classes according to publication time and type.
Both unified and advanced search are provided for all resources.
(3) advanced search
Users can choose the search scope, how results are combined, and the result style.
A second search, full texts and detailed information are available on the result page.
(4) full-text search
Full-text search uses the text produced by OCR.
2. Our Vision to Next Generation of Digital Library
What does the next generation of DL look like?
Typical features of existing DLs:
• books are indexed by title, author, keywords…
• users query books by keyword input
• mostly only text information is returned
• multimodal data is not fully supported
The next generation of DL should:
• support multimodal sources
• enable cross-media retrieval
Extension to the concept of “Book”
The key to our vision of the next generation of digital library is the extension of the “book” concept.
• A book is regarded as not only the written symbols on paper, but also any type of multimedia “item”, such as:
  a video clip
  an audio clip
  a painting
  ……
So in the next generation of DL, a “book” can be multimodal:
Scenery Image, Chinese Calligraphy, Video fragment, Audio clips ……
feature analysis, knowledge mining
a general data representation for multimodal data
We can find a general data structure to represent multimodal “books”.
Supporting multimodal data is an important trend in multimedia retrieval:
We get multimodal information from the real world; can we then get multimodal data from the digital world, especially from a digital library?
[Figure: real world vs. digital world (texts, image, audio, video……); is the digital world multimodal?]
Cross-media retrieval
After the extension of the “Book” concept, the retrieval shall also be extended.
We call it “cross-media retrieval”.
Scenario: a simple example of cross-media retrieval:
User can start a query from any type of media, and relevant multimedia data would be returned.
[Figure: three starting queries (“Giant Panda” image, “Giant Panda” text, “Giant Panda” audio), each connected to the others by cross-media retrieval]
Textual description of the giant panda: the Panda is a kind of cat which ……
Cross-media retrieval is a useful way to access multimodal data:
[Figure: texts, image, audio, video, ……, each marked as available]
Cross-media retrieval can be regarded as a simulation of the real world, and it helps us get multimodal data in a more flexible and more informative way!
What does cross-media retrieval need to do?
user query interface: submit a query example
It can be an image, audio or keywords…
cross-media search engine
raw data: texts, image, audio, video
knowledge base
multimodal representation & index
query results: texts, images, audios…
3. From Multimedia Retrieval to Cross-media Retrieval
(1) Image retrieval: Content-based
negative example
query example
Searching images
relevance feedback
positive example
multimedia retrieval
(2) Image retrieval: text-based
Query text
(3) Motion retrieval
Given a query example of motion data, we can find similar motion data from database.
multimedia retrieval
(4) Audio retrieval: Content-based
multimedia retrieval
content-based audio search engine
audio depository
audio query example
user
submit
adjust feature weight
adjust query center
returned audio results
return
relevance feedback
user judge
System Framework
audio retrieval: key techniques
multimedia retrieval
• extract auditory features in the compressed domain from audio clips
• cluster the fuzzy auditory features
• represent audio clips with the cluster centers
• retrieve similar audio clips by cluster center matching
• introduce relevance feedback techniques
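The cluster-center idea in the audio retrieval steps can be sketched roughly as follows. This is only an illustration under stated assumptions: plain k-means is swapped in for the fuzzy clustering named on the slide, the per-frame feature vectors are synthetic rather than extracted from compressed audio, and the helper names `kmeans`, `clip_distance` and `rank_clips` are hypothetical.

```python
import numpy as np

def kmeans(frames, k, iters=20, seed=0):
    """Plain k-means over per-frame feature vectors; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = frames[rng.choice(len(frames), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every frame to its nearest center.
        dists = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old center if a cluster goes empty
                centers[j] = members.mean(axis=0)
    return centers

def clip_distance(centers_a, centers_b):
    """Symmetric average nearest-center distance between two clips."""
    d = np.linalg.norm(centers_a[:, None, :] - centers_b[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def rank_clips(query_centers, database):
    """Return clip names sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: clip_distance(query_centers, database[name]))

# Synthetic stand-ins for auditory features (assumption: 4-dimensional frames).
rng = np.random.default_rng(1)
speech = rng.normal(0.0, 0.1, size=(200, 4))
music = rng.normal(3.0, 0.1, size=(200, 4))
database = {"speech": kmeans(speech, 3), "music": kmeans(music, 3)}
query = kmeans(rng.normal(3.0, 0.1, size=(50, 4)), 3)
ranking = rank_clips(query, database)
```

Each clip is stored only as its few cluster centers, so matching a query costs a handful of distance computations per clip instead of a frame-by-frame comparison.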
audio retrieval: an example
multimedia retrieval
[Figure: query example, feature weight, relevance feedback, weight adjusting]
(5) video retrieval: Overview
multimedia retrieval
unlike text resources, video is unstructured:
• rich in visual content;
• poor in semantic understanding;
the challenging issues:
• summarization & structuring;
• video mining
(5) video retrieval: key techniques
multimedia retrieval
video structuring:
• construct a video table-of-contents (VTOC)
• make the video physically structured
video summarization:
• help the user quickly grasp the content of video clips
• support video browsing
• video encoding/compression
video
Scene
group
shot
key frame
concept clustering
video stream
temporal features
spatial features
table of contents
shot boundary detection
Key Frame Extraction
grouping
scene construction
video structuring
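The first two stages of the structuring pipeline, shot boundary detection and key frame extraction, could be sketched as below. The slide does not say which detector is used, so the histogram-difference test, the threshold value and the middle-frame rule are all assumptions chosen for illustration, as are the function names.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of one grayscale frame (values 0..255)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def detect_shot_boundaries(frames, threshold=0.5):
    """Return every index i at which a new shot starts at frames[i]."""
    boundaries = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        # L1 distance between consecutive histograms lies in [0, 2].
        if np.abs(cur - prev).sum() > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries

def key_frames(frames, boundaries):
    """Pick the middle frame of every shot as its key frame (an assumed rule)."""
    starts = [0] + boundaries
    ends = boundaries + [len(frames)]
    return [(s + e) // 2 for s, e in zip(starts, ends)]

# Toy video: five dark frames followed by four bright frames.
frames = [np.full((8, 8), 20)] * 5 + [np.full((8, 8), 200)] * 4
cuts = detect_shot_boundaries(frames)
keys = key_frames(frames, cuts)
```

Grouping shots and constructing scenes would then operate on the key frames rather than on the raw stream.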
video summary: video content mining
original video (redundant)
summarized video (concise and informative)
video content mining
Find meaningful patterns to support efficient video browsing
video summary: an example
Two news videos are segmented into 6 video shots (the following are the key frames), and their total length is 3 minutes.
After video summarization, the video is 3 seconds long.
It consists of the 3 key frames below.
video shot clustering result
video shot
original video
similar video shots are clustered together
(6) 3D model retrieval: overview
multimedia retrieval
compare 3D models by shape similarity
(6) 3D model retrieval: an example
multimedia retrieval
query example
As shown above, multimedia retrieval is generally content-based X retrieval (CBXR).
towards cross-media Retrieval
Motivation
CBXR: image retrieval, audio retrieval, video retrieval, motion retrieval, 3D model retrieval ……
intelligent integration
Cross-media retrieval
We can provide a more flexible and efficient way to access multimodal data.
We call it cross-media retrieval.
Support multimodal sources:
• smooth integration of multimodal data;
• query media objects by examples of different modalities;
Challenging issues:
• texts, images, audios, etc. are represented with different features
• different features are heterogeneous
• cross-media similarity can’t be measured by content features
• there is a semantic gap between low-level features and semantics
Our Solution to Cross-media retrieval
• build cross-indexing from multimodal data
• organize multimedia documents
• explore cross-media correlations
• ……
Cross-indexing-based retrieval: General idea
text
image
audio
video
graphics
text search engine
image search engine
audio search engine
video search engine
graphics search engine
preprocessing
cross-index graph
cross-index multimodal search engine
SVM based clustering
Retrieval interface
query
search results fusion
results
relevance feedback
……
(1) Cross-index retrieval: interface
The system now supports images, audios and videos. Users can submit any of these media objects, and the system returns relevant images, audios and videos.
[Figure: an image query example with retrieved images, retrieved video and retrieved audio]
Building multimedia document: General idea
definition of multimedia document:
• a logical representation of multimodal data;
• consists of semantically related media objects;
formal structure:
Document := <ID, Title, URI, KeywordList, ElementSet, LinkSet>
ElementSet := { (Audio | Image | Text | Video)_i | i ∈ N }
Audio := <ID, ParentID, URI, Size, KeywordList, AudioFeature>
Image := <ID, ParentID, URI, Size, KeywordList, ImageFeature>
Text := <ID, ParentID, URI, KeywordList>
Video := <ID, ParentID, URI, Frames, KeywordList, VideoFeature>
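One way to read the formal structure is as a set of record types. The sketch below mirrors the tuples field by field; the Python field types, the representation of LinkSet as pairs of element IDs, and the `add` and `elements_of` helpers are assumptions, since the slide defines only the field names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

@dataclass
class Text:
    id: str
    parent_id: str
    uri: str
    keyword_list: List[str] = field(default_factory=list)

@dataclass
class Image:
    id: str
    parent_id: str
    uri: str
    size: int
    keyword_list: List[str] = field(default_factory=list)
    image_feature: List[float] = field(default_factory=list)

@dataclass
class Audio:
    id: str
    parent_id: str
    uri: str
    size: int
    keyword_list: List[str] = field(default_factory=list)
    audio_feature: List[float] = field(default_factory=list)

@dataclass
class Video:
    id: str
    parent_id: str
    uri: str
    frames: int
    keyword_list: List[str] = field(default_factory=list)
    video_feature: List[float] = field(default_factory=list)

Element = Union[Audio, Image, Text, Video]

@dataclass
class Document:
    id: str
    title: str
    uri: str
    keyword_list: List[str] = field(default_factory=list)
    element_set: List[Element] = field(default_factory=list)
    # Assumption: a link is a pair of element IDs inside this document.
    link_set: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, element):
        """Attach a media object, checking that ParentID points back to this document."""
        if element.parent_id != self.id:
            raise ValueError("element.parent_id must equal the document ID")
        self.element_set.append(element)

    def elements_of(self, kind):
        """Return all elements of one media type, e.g. Image."""
        return [e for e in self.element_set if isinstance(e, kind)]

doc = Document("d1", "Water", "doc://d1", ["water"])
doc.add(Text("t1", "d1", "doc://d1/t1", ["river"]))
doc.add(Image("i1", "d1", "doc://d1/i1", 2048, ["lake"], [0.1, 0.7]))
doc.link_set.append(("t1", "i1"))
```

Keeping ParentID on every element lets a content-based hit on a single image or audio clip be traced back to the whole multimedia document.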
Build multimedia document: framework
text
image
audio
video
graphics
Semantic skeleton base
Storage Subsystem
Multimedia document
Preprocessing
Learning and Relevance feedback subsystem
Query Processor (multimedia document + media objects)
keyword
Building multimedia document: retrieval interface
Besides keyword-based search, the user can perform a content-based search with a specific media object as the query example.
A multimedia document is visualized as its sketch, i.e. text, images and key-frame lists for videos.
[Figure: image, video, text, multimedia document]
The left figure shows the relevant media data retrieved by the query “water”.
Exploring cross-media correlations: challenges
Challenges:
1. multimodal data reside in heterogeneous feature spaces
2. the semantic gap
Gap 1: Content gap (visual feature space vs. auditory feature space)
Gap 2: Semantic gap (feature spaces vs. high-level semantics: war, dog, bird, car, tiger)
Exploring Cross-media Correlations: Solutions
Images and audios represent high-level semantics from different perspectives. If we can find the correlation between different perspectives, we can enable cross-media retrieval with the bridge of correlations.
[Figure: correlation between images and audios for bird, explosion, tiger, dog, car]
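Exploring cross-media correlations: mathematical realization
Canonical correlation analysis
Basic idea:
Input: image feature matrix $X_{n \times p}$ and audio feature matrix $Y_{n \times q}$
$$
X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix},
\qquad
Y = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1q} \\ y_{21} & y_{22} & \cdots & y_{2q} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nq} \end{pmatrix}
$$
X and Y are of different dimension!
Output:
$$
X' = \begin{pmatrix} x'_{11} & x'_{12} & \cdots & x'_{1m} \\ x'_{21} & x'_{22} & \cdots & x'_{2m} \\ \vdots & \vdots & & \vdots \\ x'_{n1} & x'_{n2} & \cdots & x'_{nm} \end{pmatrix},
\qquad
Y' = \begin{pmatrix} y'_{11} & y'_{12} & \cdots & y'_{1m} \\ y'_{21} & y'_{22} & \cdots & y'_{2m} \\ \vdots & \vdots & & \vdots \\ y'_{n1} & y'_{n2} & \cdots & y'_{nm} \end{pmatrix}
$$
X’ and Y’ are of the same dimension!
At the same time, the correlation between X and Y maximally coincides with the correlation between X’ and Y’.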
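A compact numerical sketch of canonical correlation analysis may make the input and output concrete. It uses the textbook whitening-plus-SVD formulation, which is not necessarily the variant used in the talk; the small ridge term `reg`, the synthetic features and the function names are assumptions.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(X, Y, m, reg=1e-6):
    """Project X (n x p) and Y (n x q) to X', Y' (both n x m), with m <= min(p, q)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Covariance blocks; the ridge term keeps the inverses well defined.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, corr, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
    A = Wx @ U[:, :m]      # p x m projection for image features
    B = Wy @ Vt.T[:, :m]   # q x m projection for audio features
    return Xc @ A, Yc @ B, corr[:m]

# Synthetic data: image (p = 5) and audio (q = 3) features sharing 2 hidden factors.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
X = z @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
Y = z @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(200, 3))
Xp, Yp, corr = cca(X, Y, m=2)
```

After the projection, an image row of `Xp` and an audio row of `Yp` live in the same m-dimensional space, so an ordinary distance between them can serve as a cross-media similarity.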
Exploring cross-media correlations: subsequent challenges
1. how to measure both intra- and inter-media correlations?
2. how to introduce new media objects into the system?
[Figure: the correlation network in the subspace, with intra-media and cross-media links; testing data are located in the network]
4. Retrieval of Chinese Calligraphy Character
motivation: Original calligraphy works are unique. They exist on paper and bamboo slips, and are easily destroyed.
How to search?
In our digital library, we digitize Chinese calligraphy works and design retrieval systems to make them sharable by everyone on the Internet.
the objective:
1. to query similar characters
Similar characters can be found and returned to users. This is like traditional content-based image retrieval.
2. to find out where a character comes from
We aim to provide an intelligent way to find the surrounding characters and present them to users.
Character “其” comes from this work
System Overview
segmentation
individual
characters
feature extraction
Database
feature data
raw data
scanner
Ancient Books
digitize
search engine
(1). segmentation:
• noise elimination
• page-image analysis
• smoothing
(2). retrieval:
• feature extraction
• shape matching
• speed up
(1) segmentation
We segment each page into columns, and cut the columns into individual characters, each enclosed in its minimum bounding box.
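A simple way to approximate this column-then-character cut is with projection profiles. The sketch below assumes a clean binary page (1 = ink) with blank gaps between columns and between characters; real scans would need the noise elimination and smoothing steps first. The function names are hypothetical and the projection-profile method itself is an assumption, not confirmed by the slides.

```python
import numpy as np

def runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment_page(page):
    """page: 2D array, 1 = ink. Returns boxes (top, bottom, left, right), ends exclusive."""
    boxes = []
    for left, right in runs(page.sum(axis=0)):            # vertical text columns
        column = page[:, left:right]
        for top, bottom in runs(column.sum(axis=1)):      # characters in the column
            glyph = column[top:bottom]
            xs = np.flatnonzero(glyph.sum(axis=0))
            # Tighten left/right so the box is the minimum bounding box.
            boxes.append((top, bottom, left + int(xs[0]), left + int(xs[-1]) + 1))
    return boxes

# Toy page with two columns and three "characters".
page = np.zeros((8, 8), dtype=int)
page[1:3, 1:3] = 1
page[4:7, 1:2] = 1
page[2:5, 5:7] = 1
boxes = segment_page(page)
```

Each returned box can be cropped from the page and passed on to feature extraction as one individual character.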
(2) Retrieval of Chinese Calligraphy Characters
feature extraction:
We use contour points to represent a calligraphy character, and keep the features of each individual calligraphy character in the database.
Calligraphy characters are written with a brush instead of a hard pen. The brush makes strokes vary in shape and thickness. Ancient calligraphy is also often degraded by natural aging.
shape matching:
• use polar coordinates to represent the characters:
divide the directions equally into 8 bins, and divide each bin into 4 areas. Then count the points in every bin, as shown in the picture.
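The 8 x 4 polar binning can be sketched as follows. Several details are not given on the slide and are assumed here: the pole is placed at the centroid of the contour points, the 4 areas are equal-width rings scaled by the largest radius, and two histograms are compared with a chi-square distance.

```python
import numpy as np

def polar_histogram(points, n_angles=8, n_rings=4):
    """Count contour points in an n_angles x n_rings polar grid around the centroid."""
    pts = np.asarray(points, dtype=float)
    rel = pts - pts.mean(axis=0)
    radius = np.hypot(rel[:, 0], rel[:, 1])
    angle = np.arctan2(rel[:, 1], rel[:, 0]) % (2 * np.pi)
    # Clip so that values on the upper edge fall into the last bin.
    a_bin = np.minimum((angle / (2 * np.pi) * n_angles).astype(int), n_angles - 1)
    r_max = radius.max() or 1.0
    r_bin = np.minimum((radius / r_max * n_rings).astype(int), n_rings - 1)
    hist = np.zeros((n_angles, n_rings), dtype=int)
    np.add.at(hist, (a_bin, r_bin), 1)
    return hist

def histogram_distance(h1, h2):
    """Chi-square distance between two histograms after normalization."""
    p, q = h1 / h1.sum(), h2 / h2.sum()
    denom = p + q
    mask = denom > 0
    return 0.5 * np.sum((p[mask] - q[mask]) ** 2 / denom[mask])

# Hypothetical contour points of one character.
contour = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 5), (0, 5), (0, 3), (2, 0)]
h = polar_histogram(contour)
other = polar_histogram([(0, 0), (5, 0), (5, 5), (0, 5)])
d_same = histogram_distance(h, h)
d_other = histogram_distance(h, other)
```

Dividing the radius by its maximum is meant to make the histogram insensitive to the size at which a character was written.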
speed up strategy:
• coarse-to-fine strategy
• improved shape matching algorithm
  • Dynamic Time Warping (DTW) of projection histograms
  • extended DTW for 2D calligraphy contour warping
• high-dimensional indexing
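The DTW of projection histograms can be illustrated with the classic one-dimensional algorithm. This sketch covers only the 1D case; the extended 2D contour warping mentioned above is not shown, and the absolute-difference cost and the helper names are assumptions.

```python
import numpy as np

def projection_histogram(glyph, axis):
    """Ink count per column (axis=0) or per row (axis=1) of a binary glyph."""
    return glyph.sum(axis=axis).astype(float)

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |a_i - b_j| as local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# A glyph and the same glyph stretched to twice its width.
glyph = np.array([[1, 1, 0],
                  [1, 0, 0]])
stretched = np.repeat(glyph, 2, axis=1)
d_stretch = dtw_distance(projection_histogram(glyph, 0),
                         projection_histogram(stretched, 0))
d_shift = dtw_distance([0, 0], [1, 1])
```

Because the 1D comparison is cheap, it could serve as the coarse stage that filters candidates before a finer and more expensive contour match, which is one possible reading of the coarse-to-fine strategy.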
Visualization of Chinese Calligraphy
Shape-based character retrieval
Retrieval result
Submit Example
5. Building Personalized Portal
Personalized portal
Web personalization is the technique to help users quickly locate interesting information which features multimedia and cross-media.
• Service integration around the content
• Information filtering based recommendation
Show me the information that I really need!
personalized portal
Personalization services provided by the portal:
• my bookshelf
• my bookmark
• my rules
• personal profile setting
[Screenshots: My bookshelf; My bookmark; Books recommended by rules]
service integration around the content
• detailed information about a book
• translate metadata
• full-text search
• my bookshelf management
• ranking
• CALIS union catalog and inter-library loan
• “My bookshelf” management
• “my bookmark” management
• bilingual translation
• full-text search
information filtering based recommendation
the classification of Web data:
• content data: texts, images……
• structure data: XML/HTML tags
• usage data: Web access log
• user profile: preferences, demographic information
implementing information filtering techniques:
• content-based filtering method
• collaborative filtering method
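As a small illustration of the collaborative filtering method, the sketch below recommends books from the bookshelves of similar users. The slides do not describe the actual recommender, so the user-based scheme, the cosine similarity, the 0/1 bookshelf matrix and the function name `recommend` are all assumptions.

```python
import numpy as np

def recommend(bookshelves, user, top_n=2):
    """bookshelves: users x books 0/1 matrix. Returns book indices suggested for `user`."""
    m = np.asarray(bookshelves, dtype=float)
    norms = np.linalg.norm(m, axis=1)
    norms[norms == 0] = 1.0                     # avoid dividing by zero for empty shelves
    sims = (m @ m[user]) / (norms * norms[user])  # cosine similarity to every user
    sims[user] = 0.0                            # ignore the user's own row
    scores = sims @ m                           # similarity-weighted votes per book
    scores[m[user] > 0] = -np.inf               # never recommend books already owned
    ranked = np.argsort(-scores)
    return [int(b) for b in ranked[:top_n] if np.isfinite(scores[b]) and scores[b] > 0]

# Rows are users, columns are books; a 1 means the book is on "my bookshelf".
shelves = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
suggested = recommend(shelves, user=0)
```

A content-based filter would instead compare book metadata or text against the user profile; the two approaches are often combined.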
6. Conclusion
• The next generation of digital library shall focus more on multimedia, and finally on cross-media retrieval.
• But more research issues remain to be faced……
  • Cross-Media Representation Framework
  • Cross-Media Knowledge-based Reasoning
  • Analysis and Recognition
  • Complex retrieval