Posted on 15-Jan-2016
Site Explorer Server: an integrated, client-server, query system for Web sites
Giancarlo Bongiovanni, Flavio Fontana, Stefano Borghetti
Dept. of Computer Science, University of Rome “La Sapienza”; ENEA’s Usability Lab
Summary:
•Introduction
•Information Retrieval Systems and keyword score
•Search engines
•Internet now and the future
•Java
•Site Explorer Server v2.0
•Conclusion and experimental results
•Future works
The Internet is the largest and most widespread network
Millions of heterogeneous users
Billions of information sources provided by the Web
Exponential growth in the number of Web sites
Growing network access by end users
Growing Web browser functionality
Improving search engine performance
33 million in the United States, 1.3 million in Germany, 371 thousand in Italy
Only 11% of users have been using the Internet for more than 3 years
Information search on the Internet
158 million accesses in January ’99
A forecast of 200 million in 2000
Users' problems with information search:
•Many users don’t know the Web information model
•Users have trouble finding valid tools able to locate the relevant information
•Users have trouble describing the information they are looking for in correct and concise terms
•Users have trouble using advanced search tools (e.g. Site Explorer Server is more difficult to use than a browser)
The problem of searching for information on the Web
Issue: could searching for information on the Internet be a problem for particular types of users?
Today, a better scenario
SiteExplorer v1.1
IR
Implementation of a client/server tool for Web IR in Java, tested experimentally at ENEA
Tool integrated with browser
Analysis of user requirements
Network service
New search and exploration tools
A new Web approach, alternative to the traditional browser
Information Retrieval Systems: general structure
Gerard Salton, Introduction to Modern Information Retrieval, 1983, McGraw-Hill, Inc.
[Figure: general structure of an IRS: user, query, documents, indexing, data structures in a pre-defined language, similarity matching, result]
•Query formulation by user
•Indexing process
•Result formulation
Information Retrieval Systems: query formulation
A query is a list of terms that express and summarize the topic being searched for.
Boolean systems combine the terms using Boolean operators:
•and
•or
•andnot
Extended Boolean systems use additional operators:
•term proximity
•term truncation
•search within particular fields
In ranking systems, the query is formulated as a natural-language phrase.
Boolean operators, examples:
•Information and retrieval
•Information or retrieval
•Information andnot retrieval
Extended operators, examples:
•Information adj retrieval
•Inform*
•Information [in title]
Ranking, example:
•“Human influence in Information Retrieval systems”
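The Boolean operators above can be illustrated with a minimal Java sketch. It assumes each term maps to the set of IDs of the documents containing it (a posting set); `BooleanQuery` and its methods are hypothetical names, not part of Site Explorer Server. `and`, `or` and `andnot` become set intersection, union and difference, and truncation (`Inform*`) is approximated by a prefix match over the vocabulary.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical Boolean evaluator over posting sets (term -> IDs of documents containing it).
class BooleanQuery {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Records that document docId contains term (terms are case-insensitive).
    void add(String term, int docId) {
        postings.computeIfAbsent(term.toLowerCase(), t -> new TreeSet<>()).add(docId);
    }

    // Documents containing the term (empty set if the term is unknown).
    Set<Integer> docs(String term) {
        return new TreeSet<>(postings.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    // "a and b": documents in both sets (intersection).
    Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // "a or b": documents in at least one set (union).
    Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    // "a andnot b": documents in a but not in b (difference).
    Set<Integer> andNot(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    // Truncation ("Inform*"): union of the postings of every term starting with the prefix.
    Set<Integer> prefix(String prefix) {
        Set<Integer> r = new TreeSet<>();
        String p = prefix.toLowerCase();
        for (Map.Entry<String, Set<Integer>> e : postings.entrySet()) {
            if (e.getKey().startsWith(p)) {
                r.addAll(e.getValue());
            }
        }
        return r;
    }
}
```

For instance, `andNot(docs("retrieval"), docs("information"))` returns the documents that mention retrieval but not information. Proximity (`adj`) and field restrictions would need term positions and field names in the postings, which this sketch leaves out.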
Information Retrieval Systems: indexing
Indexing is the process of analysing documents and producing a short representation of their contents.
Terms vector: the representation is based on a keyword vector; the keywords are either chosen manually or extracted automatically.
Example: “Information Retrieval Data Structure & Algorithms” gives <information, retrieval, data-structure, algorithms>
Data structures: data structures that contain the document representation.
Example: list, tree, index file, etc.
Inverted indexing: a file in which every record describes the records related to a particular term.
Example:
              Doc. 1   Doc. 2   Doc. 3
Information     1        0        1
Retrieval       1        1        1
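The inverted-indexing example above can be reproduced with a short Java sketch. It assumes a deliberately naive automatic keyword extraction (lowercasing and splitting on non-letters, with no stop words or stemming); `InvertedIndex` and its methods are hypothetical names, not part of Site Explorer Server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical inverted-index builder: term -> sorted IDs of the documents that contain it.
class InvertedIndex {
    private final Map<String, Set<Integer>> index = new TreeMap<>();
    private int docCount = 0;

    // Indexes one document (IDs are assigned 1, 2, 3, ...) and returns its ID.
    int addDocument(String text) {
        int id = ++docCount;
        // Simplified automatic keyword extraction: lowercase, split on non-letters.
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // One row of the term-document incidence matrix: 1 if document j contains the term, else 0.
    List<Integer> incidenceRow(String term) {
        Set<Integer> docs = index.getOrDefault(term.toLowerCase(), Set.of());
        List<Integer> row = new ArrayList<>();
        for (int j = 1; j <= docCount; j++) {
            row.add(docs.contains(j) ? 1 : 0);
        }
        return row;
    }
}
```

Indexing three hypothetical documents, "Information retrieval data structures", "Retrieval algorithms" and "Information retrieval", gives `incidenceRow("information")` = `[1, 0, 1]` and `incidenceRow("retrieval")` = `[1, 1, 1]`, the two rows of the table above.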
In traditional IRSs the result is a list of potentially relevant documents
Explicit measure of relevance level (score)
Information Retrieval Systems: formulation and presentation of results
William B. Frakes, Ricardo Baeza-Yates, Information Retrieval: Data Structures & Algorithms, 1992, Prentice Hall, Inc.
Gerard Salton, Introduction to Modern Information Retrieval, 1983, McGraw-Hill, Inc.
Result ordering: documents ordered by relevance level
New features:
•Dynamic presentation (results manipulation)
•Graphical and direct presentation methods
•Use of windows (different ways to present the results)
•Multimedia integration
Information Retrieval Systems: score computation
Computing the weight of a term for a document: term frequency in the document × term relevance weight in the collection
Key point in score computation
Computing the score:
•Boolean systems: use the SOP method
•Ranking systems: use specific formulas
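The term-weight rule above (frequency in the document × relevance weight in the collection) can be sketched in Java. Assumptions: documents arrive as already-extracted keyword lists, the collection weight is the inverse document frequency IDF = log2(N / n) + 1 (Sparck Jones, 1972), and a document's score for a query is the sum of its query-term weights, used to order the results by decreasing relevance. `TfIdfScorer` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tf x idf scorer: weight(term, doc) = freq(term, doc) * IDF(term),
// IDF(term) = log2(N / n) + 1, N = documents in the collection, n = documents containing term.
class TfIdfScorer {
    private final List<Map<String, Integer>> docFreqs = new ArrayList<>();

    // Adds a document given as a list of already-extracted keywords.
    void addDocument(List<String> keywords) {
        Map<String, Integer> tf = new HashMap<>();
        for (String k : keywords) {
            tf.merge(k.toLowerCase(), 1, Integer::sum);
        }
        docFreqs.add(tf);
    }

    // Collection weight of a term; 0 when no document contains it.
    double idf(String term) {
        String t = term.toLowerCase();
        long n = docFreqs.stream().filter(d -> d.containsKey(t)).count();
        if (n == 0) {
            return 0.0;
        }
        return Math.log((double) docFreqs.size() / n) / Math.log(2) + 1.0;
    }

    // Score of one document for a query: sum of the query terms' weights in that document.
    double score(int docIndex, List<String> query) {
        double s = 0.0;
        for (String q : query) {
            String t = q.toLowerCase();
            s += docFreqs.get(docIndex).getOrDefault(t, 0) * idf(t);
        }
        return s;
    }

    // Document indices ordered by decreasing score (the ranked result list).
    List<Integer> rank(List<String> query) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docFreqs.size(); i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Integer i) -> score(i, query)).reversed());
        return ids;
    }
}
```

A term present in every document gets IDF = log2(1) + 1 = 1, so frequent, non-discriminating terms still count but never dominate rarer ones.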
Information Retrieval Systems: score computation
Score computation aims to measure the relevance of specific terms in specific documents
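A method to weight term relevance across the whole document collection. Examples:
$$\mathrm{IDF}_i = \log_2\frac{N}{n_i} + 1 \qquad \text{(Sparck Jones, 1972)}$$
$$\mathrm{signal}_k = \log_2 \mathrm{totfreq}_k - \mathrm{noise}_k \qquad \text{(Dennis, 1967)}$$
$$\mathrm{discvalue}_k = \mathrm{avgsim}_k - \mathrm{avgsim}$$
Frequency normalization for a particular document collection. Examples:
$$\mathrm{cfreq}_{ij} = K + (1-K)\,\frac{\mathrm{freq}_{ij}}{\mathrm{maxfreq}_j} \qquad \text{(Croft, 1983)}$$
$$\mathrm{nfreq}_{ij} = \frac{\log_2(\mathrm{freq}_{ij} + 1)}{\log_2 \mathrm{length}_j} \qquad \text{(Harman, 1986)}$$
where $N$ is the number of documents in the collection, $n_i$ the number of documents containing term $i$, $\mathrm{freq}_{ij}$ the frequency of term $i$ in document $j$, $\mathrm{maxfreq}_j$ the highest term frequency in document $j$, $\mathrm{length}_j$ the length of document $j$, and $K$ a constant.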
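The two frequency normalizations (Croft, 1983; Harman, 1986) are closed-form expressions; a minimal Java sketch makes their inputs explicit. The class and parameter names are hypothetical, and the guard conditions (a positive maximum frequency, a document length greater than 1 so that log2 of the length is non-zero) are added assumptions.

```java
// Hypothetical helpers for the Croft and Harman frequency normalizations.
final class FreqNormalization {
    private FreqNormalization() {}

    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Croft (1983): cfreq_ij = K + (1 - K) * freq_ij / maxfreq_j,
    // maxfreq_j = highest frequency of any term in document j.
    static double croft(int freqIJ, int maxFreqJ, double k) {
        if (maxFreqJ <= 0) {
            throw new IllegalArgumentException("maxFreqJ must be positive");
        }
        return k + (1.0 - k) * freqIJ / maxFreqJ;
    }

    // Harman (1986): nfreq_ij = log2(freq_ij + 1) / log2(length_j).
    static double harman(int freqIJ, int lengthJ) {
        if (lengthJ <= 1) {
            throw new IllegalArgumentException("lengthJ must be greater than 1");
        }
        return log2(freqIJ + 1) / log2(lengthJ);
    }
}
```

For example, with K = 0.3 a term occurring 5 times in a document whose most frequent term occurs 10 times gets cfreq = 0.3 + 0.7 × 0.5 = 0.65, and a term occurring 3 times in a document of length 16 gets nfreq = log2 4 / log2 16 = 0.5.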
Search engines
[Figure: search engine structure: Web interface (query and results), index DB, automatic indexing system, similarity matching, Web pages]
New functionality in the most popular search engines:
•Site classification
•Integration of new advanced services for searching information in particular formats (pictures, sounds, MP3, e-mail, etc.)
•Few search engines provide a document score
•Migration from search services to online shopping guides
Media Metrix, June 1999
Search engine   Position
Yahoo           1
Excite          8
Lycos           9
AltaVista       19
About           21
HotBot          23
LookSmart       25
GoTo            32
[Chart by region: Europe, Asia-Pacific, North America, Others; values 23%, 19%, 55%, 3%]
Source: FIND/ITPD, III, January 1999 - NII project, supported by DOIT, MOEA
Internet: from thirty years ago to today
30 years:
•1969: first transmission on ARPANET
•1978: TCP/IP made official
•1983: NSFNet
•1991: World Wide Web
•1992: ISOC
•1999: INET ’99
[Chart: users by length of Internet use: more than 3 years, 0.5 years, 1-3 years, 0.5-1 years; values 11%, 30%, 31%, 28%]
Source: FIND/ITPD, III, January 1999 - NII project, supported by DOIT, MOEA
[Chart by place of access: Home, School, Office, Computer Education Class, Public access points, Laptop, Others; values 50.7%, 28.8%, 16.8%, 1.6%, 0.9%, 0.7%, 0.5%]
Source: FIND/ITPD, III, January 1999 - NII project, supported by DOIT, MOEA
[Chart: One, Two, Three, Four, Five or more; values 21%, 39%, 26%, 9%, 5%]
Source: FIND/ITPD, III, January 1999 - NII project, supported by DOIT, MOEA
Internet: towards tomorrow
Future tracks:
•Research and technologies
•Education
•Public administration
•E-commerce
[Chart: forecast for 1999-2003, scale 0-600]
Site Explorer Server v2.0
Java
Main features:
•Multithreaded
•Object-oriented
•Dynamic
•Portable
•Platform-independent
Technologies:
•Applets
•Rich networking functionality
•Oriented to implementing graphical user interfaces
•Oriented to implementing client/server systems
[Figure: client and server]
Site Explorer Server v2.0: goals
Goal: to implement a new system
•able to work directly on the Web
•able to help the user find interesting documents on the Web
•able to integrate search functions, an approach alternative to the browser, management functions, and a single, uniform way for the user to access heterogeneous Web data
•with a high degree of usability
Site Explorer Server v2.0: a client/server system, implemented in Java, that automatically analyses a Web site and returns as its result the site's tree structure, whose root node represents the site's home page.
•Focused on information search and retrieval through a keyword-search approach
•An easy information-filtering service
•A score computation service
•User management
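The site-tree result described above can be sketched in Java as a breadth-first visit from the home page. This is a minimal sketch under stated assumptions: the site's link graph is already available as a map from page to the pages it links to (in SES it would come from fetching and parsing pages), `SiteTree.build` is a hypothetical name, and each page is attached to the first page through which it is reached, which turns a link graph with cycles into a tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical site-tree builder: breadth-first visit of the link graph from the home page.
class SiteTree {
    // Returns parent -> children; the home page is the root.
    static Map<String, List<String>> build(String homePage, Map<String, List<String>> links) {
        Map<String, List<String>> tree = new LinkedHashMap<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(homePage);
        queue.add(homePage);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            tree.putIfAbsent(page, new ArrayList<>());
            for (String target : links.getOrDefault(page, List.of())) {
                // Each page is expanded once, so cycles in the link graph cannot loop forever.
                if (visited.add(target)) {
                    tree.get(page).add(target);
                    queue.add(target);
                }
            }
        }
        return tree;
    }
}
```

Breadth-first order places each page at its minimum click distance from the home page, which keeps the resulting tree shallow and close to how a visitor would reach the pages.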
Site Explorer Server v2.0
[Figure: Internet, Web site, Site Explorer Server, client user interface]
Additional features:
•A network service
•An accessible (open to everybody), open and multi-platform service
Site Explorer Server v2.0: external architecture and configuration
[Figure: Web sites #1, #2, …, #n on the Internet, reached over HTTP; SEP; HTTP server (SES); SEC; browser (SEJA) running the SEJA applet; users 1 to m on Windows, Unix and Mac-OS]
Technical features:
•Client/server system
•The server (SES) is a Java application
•The client (SEJA) is a Java applet
•SES and SEJA communicate through a dedicated application-layer protocol (SEP)
Site Explorer Server v2.0: operation and processes
[Figure: processes: query selector, HTTP connection, links extraction, contents extraction, keywords analysis, score, result builder, result display; exchanged between the user (client user interface) and the Web sites: query, next site's page, result]
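The links-extraction process in the pipeline above can be sketched in Java. This is an illustration under assumptions, not the SES code: it uses a regular expression over `<a href>` attributes instead of a real HTML 4 parser, resolves relative links with `java.net.URI.resolve`, and treats "same site" as "same host name"; `LinkExtractor` is a hypothetical name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical link-extraction step: finds href values in <a> tags, resolves them against
// the page URL and keeps only links on the same host (pages of the same site).
class LinkExtractor {
    // Captures the href value up to a quote or a '#' fragment marker.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*href\\s*=\\s*[\"']([^\"'#]+)[^\"']*[\"']", Pattern.CASE_INSENSITIVE);

    static List<String> sameSiteLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI target = base.resolve(m.group(1).trim());
                boolean sameHost = base.getHost() != null
                        && base.getHost().equalsIgnoreCase(target.getHost());
                if (sameHost && !result.contains(target.toString())) {
                    result.add(target.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href (e.g. containing spaces): skip it.
            }
        }
        return result;
    }
}
```

A regular expression misses links generated by scripts and can be fooled by unusual markup, which is why a production crawler would normally rely on an HTML parser; the sketch keeps to the standard library.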
Site Explorer Server v2.0: SES subcomponents
[Figure: SES subcomponents: Main, Communicator, Function manager, Retriever, User manager, Site analyser, Page analyser; connection requests, queries and results exchanged with the client; Internet]
Features:
•Full-text document analysis
•Link checking using connection requests
•HTML 4 oriented
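The page analyser's full-text analysis can be illustrated with a minimal Java sketch under stated assumptions: script and style blocks and all tags are removed with regular expressions, only two common entities are decoded, and words are sequences of letters or digits; `PageAnalyser.termFrequencies` is a hypothetical name. The resulting term frequencies are the raw input a keyword score needs.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical full-text page analysis: strips scripts, styles and tags, decodes two
// common entities, then counts word occurrences (term frequencies) in the visible text.
class PageAnalyser {
    static Map<String, Integer> termFrequencies(String html) {
        String text = html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ") // drop non-visible blocks
                .replaceAll("<[^>]*>", " ")                             // drop remaining tags
                .replace("&nbsp;", " ")
                .replace("&amp;", "&");
        Map<String, Integer> tf = new HashMap<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                tf.merge(w, 1, Integer::sum);
            }
        }
        return tf;
    }
}
```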
Site Explorer Server v2.0: the Site Explorer Server score
Three score levels:
•Level 1 score: based only on the keyword occurrences inside the Web page.
•Level 2 score: also based on the keyword distribution across the whole Web site.
•Level 3 score: also based on the position of the keyword occurrences within the Web page structure.
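The three score levels can be made concrete with a Java sketch. The slides give no formulas, so everything quantitative here is an assumption: level 1 counts keyword occurrences in the page, level 2 multiplies them by a site-wide weight of the form log2(N / n) + 1 over the pages of the same site, and level 3 additionally counts occurrences in the title or headings with a hypothetical factor `TITLE_BOOST`. Class, method and constant names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical three-level page score. The weights are assumptions, not the SES formulas.
class ThreeLevelScore {
    // Keyword occurrences of one page, split by position: body text vs title/headings.
    static final class Page {
        final Map<String, Integer> body;
        final Map<String, Integer> title;

        Page(Map<String, Integer> body, Map<String, Integer> title) {
            this.body = body;
            this.title = title;
        }

        int count(String k) {
            return body.getOrDefault(k, 0) + title.getOrDefault(k, 0);
        }
    }

    // Hypothetical extra weight for occurrences in the title or headings (level 3).
    static final double TITLE_BOOST = 3.0;

    // Level 1: keyword occurrences inside the page only.
    static double level1(Page p, List<String> keywords) {
        double s = 0;
        for (String k : keywords) {
            s += p.count(k);
        }
        return s;
    }

    // Site-wide keyword weight: log2(N / n) + 1, N = pages in the site, n = pages containing k.
    static double siteWeight(List<Page> site, String k) {
        long n = site.stream().filter(p -> p.count(k) > 0).count();
        return n == 0 ? 0.0 : Math.log((double) site.size() / n) / Math.log(2) + 1.0;
    }

    // Level 2: page occurrences weighted by the keyword's distribution across the site.
    static double level2(Page p, List<Page> site, List<String> keywords) {
        double s = 0;
        for (String k : keywords) {
            s += p.count(k) * siteWeight(site, k);
        }
        return s;
    }

    // Level 3: as level 2, but title/heading occurrences count TITLE_BOOST times.
    static double level3(Page p, List<Page> site, List<String> keywords) {
        double s = 0;
        for (String k : keywords) {
            double positional = p.body.getOrDefault(k, 0) + TITLE_BOOST * p.title.getOrDefault(k, 0);
            s += positional * siteWeight(site, k);
        }
        return s;
    }
}
```

With a two-page site in which only page A contains "information" (twice in the body, once in the title), the sketch gives A the scores 3, 6 and 10 at levels 1, 2 and 3.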
Site Explorer Server v2.0: Site Explorer Java applet GUI
[Figure: applet GUI with menu bar, tool bar, tree structure area (objects retrieved in the Web site), multimedia area, textual area, displayed result, state indicator and state bar]
Site Explorer Server v2.0: site exploration
Screenshots:
•Connection to the server
•Active connection indicator
•New site analysis request
•Use of a favorite site analysis request
•Use of a pre-defined site analysis request
•Receiving the result
•Results navigation (relevant page indicator, score level)
•Results browsing
Site Explorer Server v2.0: the pilot center
The Usability Lab (Ulab), established in 1992 at the pilot center of the ESPRIT III VENUS project, carries out research and development on advanced visual interfaces to databases and networked multimedia information systems.
Development and test machines:
•Intel Pentium II 350 MHz / Windows 98 (Netlab)
•Intel Pentium MMX 166 MHz / Windows 95 (Fontanaulab)
•AMD K6 300 MHz / Windows 98 (Ulab)
•Sun SPARCstation 5 / Unix Solaris 2.5 (Venus)
•Sun SPARCstation 10 / Unix Solaris 2.5 (Dafne)
Software tools:
•JDK v1.1.6, v1.1.7, v1.1.7a, v1.1.7b, v1.1.8
•Edit+, NetBeans
•Java Swing v1.0.3, Java Media Framework v1.1
Site Explorer Server v2.0: conclusion and experimental results
•A robust system
•Good to excellent degree of usability
•Good response time (analysis and result building)
[Chart: general user satisfaction degree per Site Explorer Server functionality, scale 0-10: Connection, Query, Tree, Textual contents, Multimedia contents, Icons, Browsing; values 8.9, 9.4, 8.4, 7.9, 8.7, 7.6, 9.1]
50 users selected using the ENEA/VENUS methodology:
•Random users: occasional system use
•Professional users: system use related to their work
•Expert users
Site Explorer Server v2.0: ENEA applications
G7 Global-Inventory project: a collection of project data cards
•Site search engine vs Site Explorer Server
Plus - Prosoma LinkUp Service: a collection of multimedia data cards
Experimental sites: ULAB sites
Future testing:
•Virtual Lab Site
•FAD
Site Explorer Server v2.0 and other existing systems
Link exploration:
•LinkBot: link analysis
•Site Explorer: builds a tree for a single site
Applets for map navigation:
•SurfMap
•JavaNavigator
Search on a site:
•PersonalSearch: an applet acting as a search engine for a site
•Virgilio: search function on a site
Map navigation and search function:
•MerzeScope: an applet for navigating a graph, with a search function for a single site
Site exploration and representation:
•HyperSystem Net40: explores a site and gives a tree representation of it, allowing navigation
Site Explorer Server v2.0: future works
•The insertion of a new system agent able to perform automatic off-line Web site analysis and to suggest to the user, based on his profile information, a set of queries on specific themes.
•A fully modular internal architecture, so that new modules and new functions can be added in the simplest and most dynamic way.
•The implementation of a user profile system based on the user’s interests, constantly updated through a feedback technique.
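The feedback-updated user profile is only named as future work. As a purely illustrative Java sketch, assume a profile is a map from keyword to interest weight and that each explicit judgement moves it toward (positive feedback) or away from (negative feedback) the judged page's keyword weights with an exponential moving average. This update rule is a swapped-in assumption, related in spirit to relevance feedback, not the authors' design; `UserProfile`, `feedback` and `alpha` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical user profile: keyword -> interest weight, updated from explicit feedback.
// Assumed update rule: profile = (1 - alpha) * profile + alpha * (+/-1) * document weights.
class UserProfile {
    private final Map<String, Double> weights = new HashMap<>();
    private final double alpha; // learning rate in (0, 1]

    UserProfile(double alpha) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    // relevant = true for positive feedback, false for negative feedback.
    void feedback(Map<String, Double> docKeywords, boolean relevant) {
        double sign = relevant ? 1.0 : -1.0;
        Set<String> terms = new HashSet<>(weights.keySet());
        terms.addAll(docKeywords.keySet());
        for (String t : terms) {
            double old = weights.getOrDefault(t, 0.0);
            double doc = docKeywords.getOrDefault(t, 0.0);
            weights.put(t, (1 - alpha) * old + alpha * sign * doc);
        }
    }

    double weight(String term) {
        return weights.getOrDefault(term, 0.0);
    }
}
```

Older interests decay geometrically with each judgement, so the profile keeps up with changing interests; a smaller `alpha` makes it change more slowly.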