84079707-nemo-ppt (1)
TRANSCRIPT
8/23/2019 84079707-nemo-ppt (1)
http://slidepdf.com/reader/full/84079707-nemo-ppt-1 1/48
Sailing the Web with Captain Nemo: a Personalized Metasearch Engine
(http://www.dblab.ntua.gr/~stef/nemo)
Stefanos Souldatos, Theodore Dalamagas, Timos Sellis
(National Technical University of Athens, Greece)
INTRODUCTION
Metasearching
Personalization
Metasearching & Personalization
Metasearching
[Figure, built up over several slides: the WEB; Search Engine 1, Search Engine 2 and Search Engine 3; a Metasearch Engine]
Personalization
Metasearching & Personalization
Result Retrieval
Result Presentation
Result Administration
INTRODUCTION TO CAPTAIN NEMO
Personalization in Captain Nemo
Contribution
Personalization in Captain Nemo
Result Retrieval: Personal Retrieval Model (search engines, #pages, timeout)
Result Presentation: Personal Presentation Style (grouping, ranking, appearance)
Result Administration: Topics of Personal Interest (semi-automatic classification)
Contribution
• We present personalization techniques for metasearch engines (presentation style, retrieval model, ranking algorithm).
• We suggest semi-automatic classification techniques in order to recommend relevant topics of interest to classify the retrieved Web pages.
• We present a fully-functional metasearch engine, called Captain Nemo, that implements the above framework.
RELATED WORK
Personalization in Retrieval
Engines listed: WebCrawler, Search, Ixquick, Infogrid, Mamma, Profusion, Query Server
User defines the:
• search engines to be used
• timeout option (i.e. max time to wait for search engine results)
• number of pages to be retrieved by each engine
Personalization in Presentation
Search Engines
• Alltheweb: personal stylesheets to customize the look ‘n’ feel
• AltaVista: high or low details in the description of the results
Metasearch Engines
• WebCrawler, MetaCrawler, Dogpile: result grouping by search engine that retrieved them
Topics of Personal Interest
• Buntine et al. (2004): topic-based open source search engine
• Northern Light: organizes search results into custom folders
• Inquirus2: recognises categories and improves queries towards a category
• Chakrabarti et al. (1998): exploit link information for hypertext categorization
CAPTAIN NEMO
User Profile
Personal Retrieval Model
Personal Presentation Style
Topics of Personal Interest
Personal Retrieval Model
Setting                  Search Engine 1   Search Engine 2   Search Engine 3
Search engine used       ✓                 ✓                 ✓
Number of results        20                30                10
Search engine timeout    6                 8                 4
Search engine weight     7                 10                5
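The settings on this slide could be held in a small per-user structure. The sketch below is illustrative only: the class and field names are hypothetical, and the timeout unit (seconds) is an assumption, since the slide does not state one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineSettings:
    """Per-engine settings from the slide (all names are hypothetical)."""
    name: str
    enabled: bool
    num_results: int  # number of results to request from this engine
    timeout: int      # max wait for this engine; unit assumed to be seconds
    weight: int       # user-assigned weight, used later for ranking


# Values taken from the slide's example.
retrieval_model = [
    EngineSettings("Search Engine 1", True, 20, 6, 7),
    EngineSettings("Search Engine 2", True, 30, 8, 10),
    EngineSettings("Search Engine 3", True, 10, 4, 5),
]

# Engines that would actually be queried for this user.
active = [e for e in retrieval_model if e.enabled]
# Longest time any enabled engine is waited for.
max_wait = max(e.timeout for e in active)
```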
CAPTAIN NEMO: Personal Presentation Style
Result Grouping
• Merged in a single list
• Grouped by search engine
• Grouped by relevant topic of interest
Result Content
• Title
• Title, URL
• Title, URL, Description
Look ‘n’ Feel
• Color Themes (XSL Stylesheets)
• Page Layout
• Font Size
CAPTAIN NEMO: Topics of Personal Interest
Topics Administration
• The user defines topics of personal interest (i.e. thematic categories).
• Each thematic category has a name and a description of 10-20 words.
• The system offers an environment for the administration of the thematic categories and their content.
Semi-automatic Classification
• The system proposes the most appropriate thematic category for each result.
• The user can save the results in the proposed category or in another one.
• The classification implements a Nearest Neighbor algorithm (Witten et al., 1999), comparing the title and description of results with the name and description of the thematic categories.
Classification Example
Topics of Interest
(t1) Sports: football basketball baseball swimming tennis soccer game
(t2) Science: scientific maths physics computer technology
(t3) Arts: decorating art painting poetry sculpture music
Result
Alen Computer Co. can teach you the art of programming... Technology is just a game now... computer science for beginners
Scores of the result against each topic: t1 = 0.287, t2 = 0.892, t3 = 0.368
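The recommendation step in this example can be sketched as a nearest-neighbour lookup over the topic descriptions. The slides do not say which similarity measure Captain Nemo uses, so the sketch assumes plain term-frequency vectors and cosine similarity; its scores will therefore differ from the figures on the slide. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_topic(result_text: str, topics: dict) -> tuple:
    """Return (best topic name, similarity per topic) for one result."""
    result_vec = bag_of_words(result_text)
    scores = {
        # Each topic is represented by its name plus its description.
        name: cosine(result_vec, bag_of_words(f"{name} {description}"))
        for name, description in topics.items()
    }
    return max(scores, key=scores.get), scores


# Topics and result text copied from the slide.
topics = {
    "Sports": "football basketball baseball swimming tennis soccer game",
    "Science": "scientific maths physics computer technology",
    "Arts": "decorating art painting poetry sculpture music",
}
result = ("Alen Computer Co. can teach you the art of programming... "
          "Technology is just a game now... computer science for beginners")

best, scores = recommend_topic(result, topics)
print(best, scores)
```

With these assumptions the sketch recommends Science, the topic that also has the highest score on the slide; the user would still be free to save the result under a different topic.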
METASEARCH RANKING
Two Ranking Approaches
• Using Initial Scores of Search Engines
• Not Using Initial Scores of Search Engines
Using Initial Scores
• Rasolofo et al. (2001) believe that the initial scores of the search engines can be exploited.
• Normalization is required in order to achieve a common measure of comparison.
• A weight factor incorporates the reliability of each search engine. Search engines that return more Web pages should receive higher weight. This is due to the perception that the number of relevant Web pages retrieved is proportional to the total number of Web pages retrieved as relevant.
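A rough sketch of the two ingredients named above, normalization and a weight that grows with the number of returned pages. The slides do not give the formulas of Rasolofo et al., so both choices here (min-max normalization, and a weight equal to the engine's share of all returned results) are assumptions made for illustration, as are the function names and the sample scores.

```python
def min_max_normalize(scores):
    """Rescale one engine's raw scores to [0, 1] (expects a non-empty list)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_scores(raw_by_engine):
    """Normalize each engine's scores, then scale them by an engine weight.

    Assumed weight: the engine's share of all returned results, so that
    engines returning more pages count more.
    """
    total = sum(len(scores) for scores in raw_by_engine.values())
    combined = {}
    for engine, scores in raw_by_engine.items():
        weight = len(scores) / total
        combined[engine] = [weight * s for s in min_max_normalize(scores)]
    return combined


# Made-up raw scores on two incompatible scales.
raw = {"A": [900.0, 500.0, 100.0], "B": [0.8]}
combined = weighted_scores(raw)
```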
Not Using Initial Scores
• The scores of various search engines are not compatible or comparable, even when normalized.
• Towell et al. (1995) note that the same document receives different scores in various search engines.
• Gravano and Papakonstantinou (1998) point out that the comparison is not feasible even among engines using the same ranking algorithm.
• Dumais (1994) concludes that scores depend on the document collection used by a search engine.
Aslam and Montague (2001)
• Bayes-fuse uses probability theory to calculate the probability that a result is relevant to a query.
• Borda-fuse is based on democratic voting. It considers that each search engine gives votes to the results it returns (N votes to the first result, N-1 to the second, etc.). The metasearch engine gathers the votes, and the ranking is determined democratically by summing up the votes.
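The voting scheme described for Borda-fuse can be sketched as follows. This follows the slide's wording rather than the full method of Aslam and Montague: N is assumed to be the length of the longest result list, results an engine did not return get no votes from it, and the result identifiers are made up.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Merge ranked result lists by summing Borda votes.

    rankings: list of ranked lists of result identifiers, best first.
    N is assumed here to be the length of the longest list.
    """
    n = max(len(ranking) for ranking in rankings)
    votes = defaultdict(int)
    for ranking in rankings:
        for position, result in enumerate(ranking):
            votes[result] += n - position  # N, N-1, N-2, ...
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical result identifiers from three engines.
merged = borda_fuse([
    ["a", "b", "c"],
    ["b", "a"],
    ["c", "b", "d"],
])
```

Here "b" wins because all three engines returned it near the top, even though only one engine ranked it first.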
Aslam and Montague (2001)
• Weighted borda-fuse: a weighted alternative of borda-fuse, in which search engines are not treated equally; their votes are weighted according to the reliability of each search engine.
Weighted Borda-Fuse
$V(r_{i,j}) = w_j \cdot (\max_k(r_k) - i + 1)$
• $V(r_{i,j})$: votes of the i-th result of the j-th search engine
• $w_j$: weight of the j-th search engine (set by user)
• $\max_k(r_k)$: maximum number of results
• Example:
Search engine   Weight     max_k(r_k) - i + 1   Votes
SE1             W1 = 7     5, 4, 3, 2           35, 28, 21, 14
SE2             W2 = 10    5, 4, 3              50, 40, 30
SE3             W3 = 5     5, 4, 3, 2, 1        25, 20, 15, 10, 5
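The vote formula can be checked against the slide's example with a few lines of code. The function and parameter names are hypothetical; the result counts per engine (4, 3 and 5) are read off the example, and the weights are the ones shown on the slide.

```python
def weighted_borda_votes(num_results, weights):
    """Votes V(r_i,j) = w_j * (max_k(r_k) - i + 1) for every result.

    num_results: {engine: number of results it returned}
    weights:     {engine: user-set weight w_j}
    """
    longest = max(num_results.values())  # max_k(r_k)
    return {
        engine: [weights[engine] * (longest - i + 1)
                 for i in range(1, count + 1)]
        for engine, count in num_results.items()
    }


votes = weighted_borda_votes(
    num_results={"SE1": 4, "SE2": 3, "SE3": 5},
    weights={"SE1": 7, "SE2": 10, "SE3": 5},
)
```

A complete ranking step would additionally sum the votes that the same page receives from different engines; that merging is left out here because the example lists votes per engine only.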
CONCLUSION – FUTURE WORK
Conclusion
• We presented Captain Nemo, a fully-functional metasearch engine with personal search spaces.
• Users can define their personal retrieval model, presentation style and topics of interest.
• Captain Nemo recommends a relevant topic of interest to classify each result, exploiting Nearest-Neighbour classification techniques.
Future Work
• To replace the flat model of topics of interest by a hierarchy of topics in the spirit of Kunz and Botsch (2002).
• To improve the classification process, exploiting background knowledge in the form of ontologies (Bloehdorn & Hotho, 2004).
Captain Nemo
http://www.dblab.ntua.gr/~stef/nemo
Links
Introduction
Introduction to Captain Nemo
Related work
Captain Nemo: Personal Retrieval Model
Captain Nemo: Personal Presentation Style
Captain Nemo: Topics of Personal Interest
Metasearch Ranking
Conclusion – Future Work