The Journal of Systems and Software 69 (2004) 195–206
www.elsevier.com/locate/jss
Brute force web search for wireless devices using mobile agents
Konstantinos G. Zerfiridis, Helen D. Karatza *
Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
Received 24 November 2001; received in revised form 19 March 2002; accepted 5 May 2002
Abstract
Web based search engines have been with us for a long time now. They have proved to be an irreplaceable tool for researchers and
Internet users all over the world. The exponential growth of the Internet has posed great challenges to these engines, as it is hard
to maintain an accurate database of so many web pages over time. The problem becomes wearisome when it is necessary to
browse through several results before locating a web page that matches the given query. As today's mobile devices connect
to the Internet through high-cost, low-bandwidth wireless networks, this tactic can become very expensive. Motivated by these issues,
we designed and implemented SearchSweep, a mobile agent based client-server system that uses existing search engine systems on
the web to locate and download web pages. A refinement system on the server makes this solution ideal for mobile users, or users
with limited bandwidth. The structure of the SearchSweep platform is presented, and the use of mobile agents on wireless devices is
proposed as a way of attacking their limitations.
© 2003 Elsevier Inc. All rights reserved.
Keywords: Mobile agents; Wireless networks; Document retrieval
1. Introduction
Ever since they were created, search engines have been used
to locate URLs on the web, according to the given
keywords. They have been a useful tool to researchers
all over the world, as the exponential growth of the
Internet became overwhelming. But along with the
evolution of the net, certain problems arose. Namely, a good percentage of the URLs returned by the search
engines are either non-existent or no longer carry
the required content. Searching the Internet can become
a costly and time consuming task because most wireless
devices connect to the Internet through a high-cost low-
bandwidth connection.
Typically, web based search engines (Kamei et al.,
1997) use a web spider, which constantly collects new or updated web pages, and a database in which each
page's information (URL, words, links, date, etc.) is archived. Most of these search engines
use a combination of keywords and boolean op-
*Corresponding author. Tel.: +30-2310-997974; fax: +30-2310-
998310.
E-mail addresses: [email protected] (K.G. Zerfiridis), karatza@csd.
auth.gr (H.D. Karatza).
0164-1212/$ - see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/S0164-1212(03)00085-2
erators to locate the appropriate web pages in their
database. This carries some drawbacks. For example,
there is no way to specify the importance of one key-
word over another using boolean logic. Furthermore,
because of the heterogeneity of the syntax used in the
web search engines, the same query may produce dif-
ferent results in different search engines. Therefore, the
user is forced to be familiar with each one.
When looking for certain content on the Internet, it is often necessary to use a search engine. Today there is
a variety of search engines on the net, each one utilizing
different ways of indexing, retrieving and searching for
content. However, the rapid evolution of the Internet
renders a great percentage of a search engine’s database
invalid as outdated web pages are often removed or
changed. Additionally, the overwhelming expansion of the net forces them to index more content in less time.
Therefore, increasing the size of the database often takes precedence over keeping it up to date. This creates a
lot of problems, as it is often necessary to go through many of the web pages that a search engine suggests,
only to find that most of them no longer carry the
required content.
One solution is to refine the results by downloading the web pages and verifying the search query over each
one. But to our knowledge there is no search engine that
does that, because such a task would greatly diminish
the bandwidth, the processing power, and therefore the
quality of service of the search engines. There are cur-
rently many applications available (for example Coper-
nic, BullsEye) that do this task. The downside is that these applications must run on the user's computer,
in many cases causing resource depletion. A simple
query search using these applications could cause en-
ough traffic to clog a dial-up connection for several
minutes. Thus, using such tools with wireless devices
could become very expensive.
We propose the use of mobile agents for document
retrieval as middleware between the web search engines and the wireless users. We implemented Search-
Sweep, a client-server platform which utilizes mobile
agents’ inherent ability of state and code migration. An
advantage of this approach is that the user does not
need to be connected while the search is under way, and
the total amount of communication is only a fraction of
the alternative. To our knowledge, nothing similar has ever been implemented.
The structure of this paper is as follows. Section 2
examines some of the techniques used by search engines,
their advantages and disadvantages, and the limitations
of the wireless networks. In Section 3 basic concepts of
mobile agents are briefly reviewed and the structure of
SearchSweep is discussed. Section 4 shows the benefits
of using such a platform in comparison to existing ap-
proaches by means of quantitative experiments. A qualitative report of the results is given in Section 5.
Section 6 briefly discusses the advantages of
using mobile agents on a wireless network and on a
cluster of servers, and presents ongoing and future
work. Finally, Section 7 summarizes the paper.
2. The problem
2.1. Search engines
To find out why search engines have such erratic
behavior, the way they work should be examined. There
are currently numerous engines each designed for a
specific purpose and each having its advantages and
disadvantages. They utilize several technologies in order to locate the required content. As different engines use
different strategies, their results often deviate widely. As
a result, it is often necessary to use more than one in
order to locate a satisfactory page.
One approach to unifying several search engines
into one is the meta-search engine (Chignell et al., 1999).
These engines retrieve the results from several web based
search engines when a single query is given. But this method has the disadvantage that it may return the same
results over and over again from each one of the search
engines. Even in the case where a refinement is done by
the meta-search engine in order to prevent duplicate hits,
the results may not contain the required content.
Many search engines organize the web pages in self-
organizing maps (Ritter and Kohonen, 1989; Kohonen,
1998) in order to save space and thereby speed up the search process. SOMs are neural networks that are able
to organize the information according to relevance. This
has the advantage that search engines can find web
pages relevant to the subject of the given query, but on
the downside, many of the results may not contain the
required keywords.
Most search engines remove articles, prepositions, or even some clauses from the retrieved web pages before processing them into the database, as their
participation in a query is possibly insignificant. This
has the disadvantage that an ‘‘exact phrase’’ search will
not always match the given query.
A great number of search engines run suffix-
stripping algorithms on the words of the retrieved web
pages before processing them further. This is also
known as stemming (Frakes and Beaza-Yates, 1992; Porter, 1980), and by using it, a smaller keyword dic-
tionary is produced. This results in faster searching. An
additional benefit of such algorithms is that the search
engine may return more pages relevant to the query. But
in the case that only pages that contain the exact key-
word are needed, this technique might produce un-
wanted results.
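To make this trade-off concrete, a minimal suffix-stripping sketch is shown below. The rules are illustrative assumptions only, not the actual Porter algorithm; they merely show how distinct word forms collapse to one dictionary key.

```java
// Minimal suffix-stripping sketch: a few illustrative rules only,
// NOT the full Porter algorithm. "searching" and "searched" both
// collapse to the key "search", so a query for either matches both,
// while an exact-keyword search loses precision.
public class SimpleStemmer {
    public static String stem(String word) {
        String w = word.toLowerCase();
        // Try the longest suffixes first so "ing" is not stripped as "g".
        String[] suffixes = {"ingly", "edly", "ing", "ed", "ly", "es", "s"};
        for (String s : suffixes) {
            // Keep at least a 3-letter stem to avoid over-stripping.
            if (w.endsWith(s) && w.length() - s.length() >= 3) {
                return w.substring(0, w.length() - s.length());
            }
        }
        return w;
    }
}
```

With such a function, the index stores only the stems, which is why the dictionary shrinks and exact-phrase matching suffers.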
Search engines are constantly retrieving content from new URLs because their aim is to include as much
content as possible. Most of these engines implement
algorithms for repeated retrievals of the same URL in
order to determine how often it is being updated. This
way, search engines can have a relatively current state of
web pages that are updated every hour, every day, every
month, or remain the same over time. As those algo-
rithms cannot predict with accuracy when a page is going to be updated, there is always a possibility that the
search engine may return addresses that no longer carry
the required content. Furthermore, these algorithms
cannot efficiently account for removal of web pages that
are not updated at all. That is because when the algo-
rithm determines that a web page does not change over
time, it gives a low priority for rechecking for updates at
this address. Therefore, even if this page is removed from the server, the search engine will include it in the
results. The ‘‘non-existing URL’’ problem can also
occur if part of the network that connects the target
server with the rest of the Internet is temporarily down.
Several web based search engines emerged that can
receive a query from a user in the form of a question,
and derive URLs that possibly answer the user's question. Such search engines use sophisticated A.I. algorithms and other search engines, inheriting
their disadvantages as a result.
Additionally, because of the quality-for-speed trade-
offs often made, the results of web search engines can
become highly unstable. Search engines have proved to
have radical behavior with regard to their results. De-
pending on the number of matches, they can act quite
differently.
When a query comprising a single keyword was sent to seven of the most well known search engines, a
great amount of hits were produced. Each engine pro-
duced thousands of results. As shown in Fig. 1, by
examining the first hundred results of each query––ac-
cording to each engine’s sorting––it is derived that the
valid pages could amount in some cases to just 5% of the
total pages examined. In the examination process, each URL was downloaded and labeled: (a) valid if it contained the requested keyword, (b) duplicate if it had the
exact same content as another URL that was labeled
valid, (c) invalid if the keyword did not appear in the
URL’s content, and (d) unreachable if the retriever was
unable to download it.
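The four-way labeling used in this examination can be sketched as follows. This is a simplified reconstruction: the actual retriever, and its exact notion of "same content", are not described in that detail in the text, so the hash-based equality check here is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the four-way labeling of downloaded results.
// Page content is passed in directly; a null content stands for a
// failed download. Content equality via hashCode is an assumption.
public class ResultLabeler {
    public enum Label { VALID, DUPLICATE, INVALID, UNREACHABLE }

    private final Set<Integer> seenValidContent = new HashSet<>();

    public Label label(String content, String keyword) {
        if (content == null) {
            return Label.UNREACHABLE;           // (d) could not download
        }
        if (!content.toLowerCase().contains(keyword.toLowerCase())) {
            return Label.INVALID;               // (c) keyword absent
        }
        // (b) exact same content as an already-valid URL.
        if (!seenValidContent.add(content.hashCode())) {
            return Label.DUPLICATE;
        }
        return Label.VALID;                     // (a) keyword present, new content
    }
}
```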
Fig. 1. Amount and quality of results for queries that produce several results. (Stacked percentages of valid, invalid, duplicate, and unreachable hits for Engines 1-7.)
Fig. 2. Amount and quality of results for queries with limited results. (Counts of valid, duplicate, and unreachable hits for Engines 1-7.)
This was not the case when more complex queries
were used (Fig. 2). For this experiment, queries of sev-
eral irrelevant keywords were submitted in order to
produce limited results. The same set of queries was used
for all the engines. In this case, invalid pages were
practically non-existent, but duplicate and unreachable pages accounted in some cases for half the hits.
Additionally, Selberg and Etzioni (2000) show that
the results of the search engines may change significantly
over time. It therefore becomes obvious that the complex algorithms used lead search engines to erratic behavior.
2.2. Wireless networks
In recent years, wireless communications
have enjoyed thriving success. Today wireless telephones are
widespread, helping people communicate more easily than
ever. The ascending popularity of portable devices
shifted the attention from phones to communications in
general, in order to have instant access to information.
Modern wireless networks are designed to be highly
flexible in order to be easily deployed and accessed from
all over the world.
The success of PDAs and portable devices created the
need for wireless access to the Internet. The popularity of GSM networks for wireless voice communication,
used by today’s cellular phones, made them the obvious
solution. But having a top speed of 9600 bps, it is im-
practical and very expensive to use such networks for
Internet browsing. However, this speed is adequate for
sending and reading e-mail and viewing simple web
pages. An implementation based on those characteristics is the
WAP browser on modern cellular phones. Currently, a limited number of GSM network providers can support
speeds which in theory reach up to 43.2 Kbps
using HSCSD.
The successor of GSM is GPRS (Lin et al., 2001),
a mobile phone system that has been commercially
available since 2002. It utilizes various radio channel
coding schemes to achieve raw radio link bit rates of 171
Kbps.
Another emerging wireless communication system is
Bluetooth. It connects devices at speeds up to 1 Mbps at
a maximum distance of 10 m. Its low cost and limited
power consumption make it an ideal communication
medium for wireless devices of any kind. Although its
short range makes it impractical for outdoor use, it
can be used to provide access to the Internet with the
help of other devices. Such is the case with the new cellular phones that support high-speed data and
Bluetooth technology for Internet access from notebook
computers. Similar is the case for HomeRF at speeds of
1.6 Mbps (it is expected to reach 10 Mbps) and an ef-
fective range of 50 m. Bluetooth and HomeRF are fre-
quency-hopping technologies that are ideal for
streaming audio and data over home networks. They
cannot really substitute for LANs, so they often coexist with other technologies such as 802.11b. Wireless net-
works based on the industry standard IEEE 802.11b support raw radio link bit rates of up to 11
Mbps, while IEEE 802.11a can provide 54 Mbps. These
networks are designed to cover a relatively confined
area. It should be noted that the actual throughput of
such networks may vary widely depending on any
number of variables. Transfer rates of 5 Mbps for 802.11b and 25 Mbps for 802.11a are quite common for
distances of 100 m between the sender and the receiver.
Although there are currently many wireless local area
network (LAN) and metropolitan area network (MAN)
solutions providing us with a range of bandwidths, the
most widespread use of digital wireless networks is ex-
pected to come from the GSM or GPRS network. They
provide wide area coverage and require relatively low power consumption. Their bandwidth is sufficient for a
number of applications, so they are the obvious choice for
the mobile user. Their major disadvantages are high cost
of use and low transfer rates. Using such devices to lo-
cate content on the net can prove to be very expensive.
3. The solution
Software agents are programs that act on behalf of
people. They are able to perform specified tasks that are
assigned to them and they can accomplish that with or
without the supervision of the user, according to the
requirements of the given job.
Mobile agents have an additional property (Chess
et al., 1995a): the ability to transport themselves to different systems during execution, carrying with
them their program code, their current state of execution and
any data that they have obtained. This gives them the unique
capacity of living on a distributed network rather than
on a distant stationary system, and to take advantage of
the services that each host has to offer locally. Fur-
thermore, mobile agents allow proprietary code to be
used on the hosts, allowing complete customization of the retrieved results. The hosts should implement a
specified environment that can authenticate the origin
and credentials of the arriving mobile agents, provide
for them the necessary execution machine and limit their
access to system resources (Chess et al., 1995b).
Aglets Workbench (IBM, 1997; Lange and Oshima,
1998) is a framework developed by IBM Japan research
group for programming and deploying mobile agents. ''Aglets'' are Java objects that can move from one sys-
tem on the Internet to another autonomously. Although
they can carry an itinerary, aglets can change it dy-
namically as they roam the Internet. They can transport
themselves, spawn new aglets, interact with other aglets
on the same or a distant context or even clone them-
selves. Implemented on Java, they have inherited the
property of being able to exist on heterogeneous networks. This makes them ideal for flexible client-server
solutions, because in most cases clients are thinner than
the servers. Such is the case with wireless devices.
Furthermore, the Aglets platform is implemented to
use OMG’s MASIF (Mobile Agent System Interoper-
ability Facility) interface, allowing them to interoperate
with different agent systems. Additionally, Aglets can
use IBM's JKQML (Java Knowledge Query and Manipulation Language) for developing intelligent mobile
agents.
Although the Aglet platform is not the most opti-
mized for performance (Silva et al., 2000), it recently
became an open source project when IBM released the
source code. This makes it the most likely platform to be
adopted by a great number of mobile devices. Further-
more, its evolution into a faster platform with even more features has already begun. The SearchSweep platform
described in this paper is a versatile and expandable
Fig. 3. Searcher’s graphic interface.
project; therefore it has a lot to gain from the potential of this developing technology. SearchSweep
makes use of the policy based security system of the
Aglets platform and benefits from its widespread use on
heterogeneous systems.
The Aglets platform is written entirely in Java; therefore it can be run on any system that incorporates
Java 2 Standard Edition. Currently there are many
personal digital assistants that include the full version of
J2SE. Additionally there is a Java 2 Micro Edition
version of Aglets under development as there are many
mobile phones that have embedded J2ME.
3.1. SearchSweep structure
SearchSweep is designed using mobile agent tech-
nology. This gives it a straightforward structure, which
is easy to use and extend. From the developer's point of
view, the major advantage that the mobile agent plat-
form provides is the modularity of each subsystem. The
messaging capabilities of agents, and their inherent
ability to dispatch their code to a different system, make such solutions ideal for systems that need to be
highly scalable.
Mobile agents as a middleware can be used in a
variety of ways. The SearchSweep platform uses the
client-server approach. The client is responsible for in-
stantiating a Searcher mobile agent. This agent has the
inherent abilities to utilize certain services if provided by
the current context. Initially, the agent presumably exists on a mobile device with limited resources. In our
implementation, the Searcher agent implements a simple
graphical interface (Fig. 3). Through that, the user is
able to configure the agent properly and to send it to a
server.
As a client cannot always know what services can be
provided by the server, the Searcher agent can create a
BotProbe agent, which is immediately dispatched to the server, where it queries for available services. Afterwards
it sends a report back to the agent that created it and
dies. This is depicted in Fig. 4. After the desired services
are selected and the appropriate keywords are supplied,
the Searcher agent is submitted to the server.
The server should have appropriate resources and
infrastructure to support multiple Searcher agents. In
this case, a Manager agent is responsible for starting and
Fig. 4. The searcher aglet creates and dispatches the BotProbe aglet to the server (1), where it locates the Search Engine Registry, retrieves the necessary information (2), and sends a message with that information back to the searcher aglet (3).
maintaining a number of Retriever agents. Retrievers are
the only agents that have access to the Internet and their
job is to download any given page from the web. The Manager, which is configurable through its own inter-
face (Fig. 5), is able to create more Retrievers if there is a
high demand, or destroy some of them if they are no
longer needed in order to release system resources. Fig. 6
shows that the server’s context contains five free Re-
trievers. The Manager agent also maintains a cache of
the retrieved web pages on the local file system for future
use. This proved to speed up the process of retrieving web pages when similar requests were made.
On the server side, a Search Engine Registry agent
exists, which is in charge of subscribing new SearchBot
agents to the system. SearchBots are agents, which have
the necessary knowledge to access a web based search
engine and parse its results. This approach gives
the system the flexibility to accept new services on the
fly. If a SearchBot is sent to the server that contains the Search Engine Registry agent, it will be verified and
subscribed so that future Searcher agents can use it.
When the Searcher agent reaches the server, it com-
municates with the SearchBots to negotiate a proper
query for each of the search engines. Afterwards, it re-
quests from the Manager to retrieve the results from the
Fig. 5. The Manager can be monitored and configured via its own
graphical interface.
appropriate web sites. When the results arrive, the
Searcher parses through them and eliminates the du-
plicate URLs. It sends to the Manager a request to re-
trieve the results, and when all of them are available, it validates them by scanning through each retrieved web
page to verify the existence (or absence) of the keywords
(Fig. 7). Searcher repeats this process as many times as
necessary to meet the criteria given by the user.
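The duplicate-URL elimination step described above can be sketched as follows. The normalization rules here (lowercasing, dropping a trailing slash) are illustrative assumptions, not SearchSweep's actual ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of merging hits from several SearchBots while dropping
// URLs that are textually identical after trivial normalization.
// Insertion order is preserved so each engine's ranking survives.
public class UrlMerger {
    static String normalize(String url) {
        String u = url.trim().toLowerCase();
        if (u.endsWith("/")) {
            u = u.substring(0, u.length() - 1);  // drop trailing slash
        }
        return u;
    }

    public static List<String> merge(List<String> hits) {
        Set<String> seen = new LinkedHashSet<>();
        for (String h : hits) {
            seen.add(normalize(h));              // duplicates are dropped
        }
        return new ArrayList<>(seen);
    }
}
```

Note that this only removes textually duplicate URLs; pages with identical content under different URLs are caught later by the signature-file comparison.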
Fig. 6. Aglet's context at the server.
Fig. 7. Verification process of the mobile Searcher at the server.
It then starts the refinement process. Initially the
agent strips the unnecessary HTML tags, and creates a
signature file from the text of each downloaded page.
These signature files are compared so that the duplicate
or almost identical pages can be located. In that case,
only one is kept, and the rest of the URLs are marked as alternative sources of that page. The next step is to sort
the retrieved pages according to a marking scheme. The
marking scheme used by the Searcher takes into account
two parameters: how many times each keyword is found
in a page, and how many times this page is referenced by
the other pages that were downloaded by this Searcher.
These two parameters have equal weight. Thus, the
most referenced page with the fewest keyword appearances scores the same as the page which has not
been referenced at all but has the most keyword appearances.
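Under the assumption that each parameter is normalized to its maximum observed value (the exact normalization is not stated in the text), the marking scheme can be sketched as:

```java
// Sketch of the two-parameter marking scheme: keyword occurrences
// and incoming references from the other downloaded pages, each
// normalized to the maximum observed value so both carry equal
// weight. The normalization choice is an assumption.
public class PageMarker {
    public static double score(int keywordHits, int maxKeywordHits,
                               int references, int maxReferences) {
        double kw  = maxKeywordHits == 0 ? 0.0 : (double) keywordHits / maxKeywordHits;
        double ref = maxReferences == 0 ? 0.0 : (double) references / maxReferences;
        return kw + ref;  // equal weight for both parameters
    }
}
```

This sketch reproduces the property stated above: a page with the maximum reference count and near-zero keyword hits scores roughly the same as a page with zero references and the maximum keyword hits.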
When the Searcher retrieves the requested amount of
relevant pages, it is able to automatically return to the
device from where it was dispatched. This process is
depicted in Fig. 8. Alternatively, it may stay dormant on
the server until the user manually retracts it. This might prove useful for users that cannot stay on-line for the
duration of this process. Although the Searcher agent is
carrying back the web addresses that meet the given
Fig. 8. The searcher aglet dispatches from the wireless device to the server (1), the server retrieves the requested results from the search engines (2), the
searcher requests from the server to download the appropriate web pages (3), they are evaluated, and the searcher traverses to the wireless device with
the results (4).
Fig. 9. SearchSweep Micro Browser.
criteria, it can also carry properly refined versions of the
top rated web pages for viewing on mobile devices (Fig.
9). In that case, the use of compression decreases the
amount of transferred data even further.
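The text does not name the compression algorithm used; as one plausible sketch, the refined pages (plain text/HTML, which compresses well) could be packed with the JDK's built-in GZIP support before the agent carries them back over the wireless link:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch: compress a refined page with GZIP so the
// agent carries fewer bytes back over the low-bandwidth link.
public class PagePacker {
    public static byte[] pack(String page) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(page.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }
}
```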
The SearchSweep platform uses the web based search engines, and therefore it is subject to their shortcomings.
Nevertheless, as the Searcher agent always downloads
and verifies each page before processing it any further, it avoids most of the common problems that search engines
have. In essence, they are used only as a first step in a
search, providing the system with possible hits. Pages
that do not contain the given keywords but are included
in a search engine’s results, either because of the algo-
rithm used (stemming, SOMs, article removal, speed-up
tradeoffs), or because they are outdated, are removed by
the SearchSweep. Searchers can combine the power of several engines by querying more than one search engine
at a time. The end user does not have to be familiar with
the syntax used by each one, as the agent provides a
unified GUI. Any duplicate pages are eliminated in the
refinement process. By utilizing multiple specialized
search engines, SearchSweep can target specific areas of
interest (for example auctions) and minimize their often
observed erratic behavior.
Because the SearchSweep platform is built on mobile
agent technology, it is highly scalable. Fig. 10 shows that
new services can be added on the server context in the
form of agents. This is done easily as long as the pro-
grammer follows certain guidelines. The agent has to
subscribe its services to the server's context, and supply information about the services that it can provide when
asked. The structure of such an agent is shown below:
public class CompressionAgent extends Aglet {
    public void onCreation(Object init) {
        subscribeMessage("ServiceProviderAgent");
    }

    public boolean handleMessage(Message msg) {
        if (msg.sameKind("getSPProxy")) {
            msg.sendReply(getProxy());
            return true;
        }
        if (msg.sameKind("moreInfo")) {
            Object[] o;
            ...
            o[0] = "Compression";
            msg.sendReply(o);
            return true;
        }
        if (msg.sameKind("...")) {
            ...
        }
        ...
        return false;
    }
}

Fig. 10. Diagram of the server's context. (The context contains the Manager and its cache, the Retriever Registry with Retrievers #1 to #n, the Search Engine Registry with SearchBots #1 to #n, the Searcher, and additional service agents such as Compression and RTF2TXT.)
4. Experimental results
The average amount of traffic that a refined search
could produce can exceed 5 MB in a matter of seconds,
                              Idle server                                 Saturated server
                              No cache & no proxy   Using cache & proxy   Using cache & proxy
Dispatch to server            5.46 s                5.54 s                6.01 s
Dispatch to server traffic    6120 bytes            6111 bytes            6134 bytes
Average delay in queue        0 s                   0 s                   2663.56 s
Mean queue length             0                     0                     199
Mean serve time               12.87 s               12.23 s               13.42 s
Server network traffic        740,311 bytes         727,470 bytes         723,157 bytes
Dispatch to client            40.23 s               36.69 s               41.21 s
Dispatch to client traffic    47,912 bytes          43,483 bytes          49,120 bytes
Total client traffic          54,032 bytes          49,594 bytes          55,254 bytes
Service time                  58.56 s               54.47 s               2724.20 s
depending on how many search engines are used. Al-
though this kind of search returns results with the exact
content that was requested, low-bandwidth users could not afford to use it. The SearchSweep platform uses a high-
bandwidth server to retrieve the content, refine it and
return to the user only the highest-ranking results.
In this section experimental results of the Search-
Sweep platform are presented. The agent used requested
up to 10 results from five search engines. After the re-
fining process, it selected the five highest-ranking results,
compressed them, and returned to the user. The agent did not retrieve the images of the processed web
pages, as this can be done directly by the user. The
keywords used were chosen at random from a properly
processed text file. The two main metrics measured were
the traffic that was produced, and the service time. The
server used was a 500 MHz PC with 256 MB RAM and
the client was a 266 MHz PC. The server had a T1
connection to the Internet, and the communication be-tween the server and the client was restricted to 9600
bps.
4.1. Experiment #1
In this experiment the SearchSweep platform was
tested without making use of the local cache or the
proxy server. Fifty tests were conducted using randomly
generated queries. In order to test the raw behavior of the platform, it was made certain that the server was
available at the beginning of each test.
The results show that an average of 723 KB of data
was retrieved from each test, and the total traffic be-
tween the client and the server was only 53 KB, which is
almost 14 times less than the amount of traffic that the
query produced at the server. The total service time was
less than one minute. However, if we were to download 723 KB over a 9.6 Kbps connection we would have to
be on-line for more than 10 min.
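These figures can be checked with simple arithmetic (assuming 1 KB = 1024 bytes):

```java
// Back-of-the-envelope check of the on-line time: transferring the
// 53 KB of agent traffic versus downloading the full 723 KB over a
// 9600 bps link. Assumes 1 KB = 1024 bytes and ignores protocol
// overhead, so the real times would be slightly longer.
public class LinkTime {
    public static double seconds(double kilobytes, double bitsPerSecond) {
        return kilobytes * 1024 * 8 / bitsPerSecond;
    }

    public static void main(String[] args) {
        System.out.printf("723 KB: %.0f s%n", seconds(723, 9600)); // ~617 s, over 10 min
        System.out.printf(" 53 KB: %.0f s%n", seconds(53, 9600));  // ~45 s
    }
}
```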
This experiment shows the definite advantage of the
SearchSweep platform for wireless devices with limited
bandwidth. Although our assumption that the server is
always free is not realistic, the results show that
the platform works and has practical value on high-priced
wireless connections.
4.2. Experiment #2
For this experiment SearchSweep’s cache and the
LAN's proxy server were used. Again, all the tests were
conducted when the server was free. This experiment
provided us with information on the usefulness of
the local cache and the speed-up or slowdown of the server due to the use of the proxy.
After 50 tests, the cache file reached 40 MB. A point
that should be noted here is that although there were
1739 new entries to the cache file, there was not a single
cache hit. Therefore, we came to the conclusion that
the use of a local cache is not necessary and that it even
slowed down the server in some cases. The use of the
LAN's proxy did not seem to slow down the performance of the platform. The response time of the platform to this set of tests was slightly faster, but this can
be attributed to the fact that the total amount of data
retrieved was less than the amount in the first experi-
ment.
4.3. Experiment #3
For the final experiment it was made sure that the
server was busy all the time. It was found that the per-
formance of the current Aglet platform had an adequate
response when up to 200 agents were waiting idle and
the rest of the system worked to serve the agents that
came in first. However there was a significant slowdown
after an undefined threshold. The system became unstable, and a reliable server could not be maintained with more than 400 agents.
To test the system when it had a significant number of
agents waiting to be served, the queue was limited to 200
entries and populated with agents. Then we started
sending a new agent when each agent returned. We
started taking measurements when the last agent of the
initial 200 returned. This was done in order to take an accurate measurement of the average delay in queue.
The results show that on a busy server, the Search-
Sweep platform can be up to 5 times slower than the
alternative. Downloading 706 KB over a 9600 bps
connection can take 10 min, and in this experiment the
average wait in queue was 44.39 min. Nevertheless,
mobile agents are designed for disconnected operation.
Therefore a user can submit an agent, disconnect andreconnect to retrieve it, reducing the total on-line time to
the level of the first experiment. Additionally, modern
wireless networks like GPRS are charging the content
and not the on-line time. The total traffic in this exper-
iment between the client and the server was 15 times
smaller that the network traffic produced by the server.
5. Qualitative report
Although the SearchSweep platform was designed to
relieve mobile users of the painstaking and expensive
task of searching through several web pages to
find the required content, certain intelligence was built
into the system. As described previously, the dispatched
agent can evaluate each page locally at the server and carry back only a selected number of results. The mobile
agent may use any number of algorithms on the set of
pages retrieved by the server to evaluate their quality.
In order to rate the quality of the results returned
by the Searcher agent, we conducted a number of experiments.
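The paper does not fix a particular local evaluation algorithm; a minimal sketch of one possibility, using hypothetical names and simple keyword matching, could look like the following. The agent would run such a filter at the server before compressing and carrying back the selected pages.

```java
import java.util.Vector;

// Hypothetical local evaluator: scores retrieved pages by how many
// distinct query keywords they contain, so the agent can carry back
// only the best-matching results. Names are illustrative, not part
// of the actual SearchSweep API.
public class PageEvaluator {
    // Counts how many distinct keywords occur in the page source.
    public static int score(String pageSource, String[] keywords) {
        String lower = pageSource.toLowerCase();
        int hits = 0;
        for (String k : keywords) {
            if (lower.indexOf(k.toLowerCase()) >= 0) {
                hits++;
            }
        }
        return hits;
    }

    // Keeps only pages that match at least minHits keywords.
    public static Vector filter(Vector pages, String[] keywords, int minHits) {
        Vector selected = new Vector();
        for (int i = 0; i < pages.size(); i++) {
            String page = (String) pages.elementAt(i);
            if (score(page, keywords) >= minHits) {
                selected.addElement(page);
            }
        }
        return selected;
    }
}
```

Such a filter runs entirely on the server's high-bandwidth side, so only the pages that pass it are transferred over the wireless link.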
We used four queries that produced different amounts
of results on the web search engines. This was done so that
we could get a clear picture of how the platform behaves
under different circumstances. The first query retrieved
pages about finding accommodation in Athens,
and produced the most hits. The second query aimed at
finding web pages which contained drivers for an obsolete
video card, and the third was about finding reports
on the adaptation to the new euro currency. The
fourth query, which produced the fewest hits, retrieved pages about a very specific custom-made cell phone accessory.
Additionally, we varied the depth of the search by
increasing each time the number of engines used and the
number of hits that the system requested from each one.
Although the Searcher agent returned a set of URLs, we
only evaluated the first five pages that the agent compressed
and brought back. If a page was irrelevant to
the subject of the given query, it was given no marks at all.
The top score for each web page was 20, so in theory a query could score up to 100. The
results were judged according to the quantity, quality
and relevance of information that they contained.
            Hits per engine (engines used)        Average hits
            10 (3)      30 (5)      100 (7)
Query 1       22          57          75            43,000
Query 2       31          46          63             5000
Query 3       25          64          60             1000
Query 4       54          85          85              200
The test results show that as the depth of the search
increases, the agent returns results that more closely
match our criteria. The only exception was the
third query, where the last test scored poorly in comparison
to the second test. We traced this exception and
found that the ''most-referenced'' scheme used by
the agent was responsible. Although this scheme
did not have any effect on the third test of the last query
either, it produced favorable results in all tests of the
first two queries.
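The ''most-referenced'' scheme is not spelled out in the text; one plausible reading, sketched here with hypothetical names, is to rank each URL by how many search engines returned it, on the assumption that a URL listed by several engines is more likely to be relevant:

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical sketch of a "most-referenced" ranking: a URL that
// appears in the result lists of several engines is ranked above one
// that only a single engine returned.
public class MostReferenced {
    // engineResults holds one Vector of URL strings per engine.
    public static Vector rank(Vector engineResults) {
        Hashtable counts = new Hashtable();
        for (int i = 0; i < engineResults.size(); i++) {
            Vector urls = (Vector) engineResults.elementAt(i);
            for (int j = 0; j < urls.size(); j++) {
                String url = (String) urls.elementAt(j);
                Integer c = (Integer) counts.get(url);
                counts.put(url, new Integer(c == null ? 1 : c.intValue() + 1));
            }
        }
        // Selection sort by count: repeatedly pick the most-referenced URL.
        Vector ranked = new Vector();
        while (!counts.isEmpty()) {
            String best = null;
            int bestCount = -1;
            for (Enumeration e = counts.keys(); e.hasMoreElements();) {
                String url = (String) e.nextElement();
                int c = ((Integer) counts.get(url)).intValue();
                if (c > bestCount) { bestCount = c; best = url; }
            }
            ranked.addElement(best);
            counts.remove(best);
        }
        return ranked;
    }
}
```

A scheme of this kind would explain the observed behavior: it helps when the engines overlap heavily (the popular first two queries) and adds little for obscure queries where each engine returns distinct URLs.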
Overall, the agent returned at least one page containing sufficient information about the requested query
in each case. It should be noted that the third set of tests was
time consuming. In order to prevent flooding the server
with such queries, it is recommended not to request
more than thirty hits per engine, as the results of the
second set of tests were satisfactory in all cases.
public class SearcherExample extends Searcher{
    public void setResults(Vector results){
        // Called before preparing the Vector for dispatching
    }
    public Vector getResults(){
        …
    }
    public void setSearchParameters(String[] shouldIncludePhrase,
            String[] shouldNotIncludePhrase, Integer pages){
        …
    }
    public boolean Verify(String s){
        // Verifies that s fulfills the search parameters
    }
    public void dispatchToManager(){
        // called before the agent is dispatched to the manager.
    }
    public void dispatchToHome(){
        // called before the agent is dispatched back to home
    }
    public void showGUI(){
        // creates a transient interface.
    }
    public void arrivedHome() {
        // called when the agent arrives to the device that dispatched it
    }
}

Fig. 11. Example of creating a new mobile SearchSweep searcher, by inheriting from Searcher.
6. Future work
Because the SearchSweep platform was implemented
using mobile agents, it is highly scalable. The current
version runs the server on one machine, but agents have
a number of properties that make them ideal for balancing the load of an application across several servers.
Thus, when for example the Manager agent reaches a
certain threshold of requests waiting in the
queue, it can clone itself and send its clone to an
available server to continue in parallel. A version of
SearchSweep implemented on a heterogeneous cluster of
eight interconnected servers is under way. It uses Epoch
Load Sharing (Karatza and Hilzer, 2001) with a small epoch size, as prior knowledge of each agent's execution
time is not available. This is expected to alleviate to a
certain degree the scalability issues raised in Section 4.
We are also working towards creating a version that will
be able to handle disconnected operation, increased
queue length and smaller service times.
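The threshold-based cloning decision described above could be sketched as follows. The names and the threshold value are illustrative, not part of the actual Aglets API; in a real implementation the clone would be created and dispatched through the agent platform's own primitives.

```java
// Hypothetical sketch of the Manager's load-balancing policy: when
// the number of queued requests exceeds a threshold and another
// server is available, the Manager decides to clone itself so that
// the clone can serve requests in parallel on that server.
public class ManagerPolicy {
    // Experiment #3 suggested the platform degrades past a few
    // hundred queued agents; 200 is used here as an illustrative limit.
    public static final int QUEUE_THRESHOLD = 200;

    // Returns true when the Manager should clone itself to another server.
    public static boolean shouldClone(int queuedRequests, int availableServers) {
        return queuedRequests > QUEUE_THRESHOLD && availableServers > 0;
    }
}
```

Keeping the decision in a small policy class like this would also make it easy to swap in the Epoch Load Sharing heuristic on the clustered version.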
Systems such as SearchSweep are ideal for Internet
Service Providers that give access to users with wireless devices. An extension to the platform, which
logs resource usage for each user's agent, could be easily
implemented. This way a billing system could be installed,
and the platform could be used commercially. Vendors
could also design proprietary Searcher agents as shown
in Fig. 11, and implement custom functions such as
charging techniques.
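The per-user accounting extension mentioned above could be as simple as the following sketch (class and method names are hypothetical): the server records the resources consumed on behalf of each user's agent, and a billing system reads the totals.

```java
import java.util.Hashtable;

// Hypothetical sketch of the resource-usage log: the server records
// the bytes it downloads on behalf of each user's agent, so a billing
// system could later charge per unit of server-side traffic.
public class UsageLog {
    private final Hashtable usage = new Hashtable(); // user -> total bytes

    // Adds bytes to the running total for this user.
    public void record(String user, long bytes) {
        Long prev = (Long) usage.get(user);
        long total = (prev == null ? 0L : prev.longValue()) + bytes;
        usage.put(user, new Long(total));
    }

    // Returns the accumulated byte count for this user.
    public long totalBytes(String user) {
        Long total = (Long) usage.get(user);
        return total == null ? 0L : total.longValue();
    }
}
```

Since the server already mediates every page the agent retrieves, the log can be updated at a single point without changes to the agents themselves.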
public class ExampleBot extends SearchBot{
    public String getSearchURL(String[] shouldIncludePhrase,
            String[] shouldNotIncludePhrase, Integer pages){
        // Creates a proper query URL, understood by
        // this search engine
    }

    public Object[] parseResults(String wpage){
        Boolean morePages;
        Vector URLs = new Vector();
        // Given the html source of a web page, it
        // determines which URLs are the results, and if
        // there are more results available
        Object[] r = {URLs, morePages};
        return r;
    }
}

Fig. 12. Example of creating a Search Engine Broker, by inheriting from SearchBot.
The SearchSweep platform can be used in the
emerging mobile e-commerce (or m-commerce) market
to provide highly accurate content to wireless devices.
We believe that it can greatly assist the development of
more mobile agent solutions for mobile users.
SearchSweep is designed for mobile devices, as they suffer from limited network bandwidth and processing
power, but in order to be used in m-commerce some
security issues must be resolved.
As shown in Fig. 12, new SearchBots can be easily
implemented and incorporated in the SearchSweep
platform at runtime. A future addition to the platform
would be to create communities of SearchBots according to their relevance. For example, SearchBots that
represent auction search engines could be grouped together to provide unified services. Moreover, the
Searcher agent is a Java class which can be easily
overridden in order to provide more sophisticated
search patterns. For example, another version of
Searcher could be implemented which rates each result
according to where certain words appear in a page.
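The position-aware rating just described is not implemented in the paper; a minimal sketch under that idea, with hypothetical names and weights, might rate a keyword found in the page title higher than one found only in the body:

```java
// Hypothetical sketch of a position-aware rating: a keyword that
// appears in the HTML title is weighted more heavily than one that
// appears only in the body. The weights (10 and 1) are illustrative.
public class PositionRater {
    public static int rate(String html, String keyword) {
        String lower = html.toLowerCase();
        String k = keyword.toLowerCase();
        int score = 0;
        int titleStart = lower.indexOf("<title>");
        int titleEnd = lower.indexOf("</title>");
        if (titleStart >= 0 && titleEnd > titleStart) {
            String title = lower.substring(titleStart + 7, titleEnd);
            if (title.indexOf(k) >= 0) score += 10; // title match weighs more
        }
        if (lower.indexOf(k) >= 0) score += 1; // any occurrence at all
        return score;
    }
}
```

An overridden Searcher could call such a method on each retrieved page and sort the results by the returned score before dispatching back to the device.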
7. Conclusions
There are many professions that would benefit from
wireless networks, as LANs are restrictive because of
their physical infrastructure. Many of the inherent disadvantages
of such networks can be treated with the use
of mobile agents. In a nutshell, our idea was to use mobile agents in a wireless network between many clients
and a server, in order to take advantage of the
ability of mobile agents to operate asynchronously
and independently of the process that created them on
the client computer. We showed that mobile agent
technology can be very beneficial to wireless devices,
because they can dispatch to a server any complex task
that needs resources which are either limited or unavailable
on the device. With this in mind, we designed
and implemented SearchSweep, a search refining platform
which utilizes the high bandwidth available to a
server. The inherent advantages of using mobile agents
make the SearchSweep platform highly scalable.
References
Chignell, M., Gwizdka, J., Bodner, R., 1999. Discriminating meta-
search: a framework for evaluation. Information Processing &
Management 35 (3), 337–362.
Chess, D., Grosof, B., Harrison, C., Levine, D., Parris, C., Tsudik, G.,
1995a. Itinerant agents for mobile computing. Journal of IEEE
Personal Communications 2 (5).
Chess, D., Harrison, C., Kershenbaum, A., 1995. Mobile Agents: Are
They a Good Idea? IBM Research Division, research report.
Frakes, W.B., Baeza-Yates, R., 1992. Information Retrieval: Data
Structures and Algorithms. Prentice-Hall, Inc.
IBM Tokyo Research Labs, 1997. The Aglet Workbench: Program-
ming Mobile Agents in Java. http://www.trl.ibm.co.jp/aglets/.
Kamei, S., Kawano, H., Hasegawa, T., 1997. Effectiveness of
cooperative resource collecting robots for web search engines.
Proceedings of IEEE Pacific RIM Conference on Communications,
Computers, and Signal Processing, 410–413.
Karatza, H., Hilzer, R., 2001. Epoch load sharing in a network of
workstations. In: Proceedings of 34th Annual Simulation Sympo-
sium 2001. IEEE Computer Society, Seattle, WA.
Kohonen, T., 1998. Self-organization of very large document collec-
tions: State of the art. In: Proceedings of the 8th International
Conference on Artificial Neural Networks. Springer, pp. 65–74.
Lange, D., Oshima, M., 1998. Programming and Deploying Java
Mobile Agents with Aglets. Addison-Wesley.
Lin, B., Rao, H., Chlamtac, I., 2001. General Packet Radio Service
(GPRS): architecture, interfaces, and deployment. Wireless Com-
munications and Mobile Computing 1 (1), 77–92.
Porter, M., 1980. An algorithm for suffix stripping. Program 14 (3),
130–137.
Ritter, H., Kohonen, T., 1989. Self-organizing semantic map. Bio-
logical Cybernetics 61, 241–254.
Selberg, E., Etzioni, O., 2000. On the instability of Web Search
Engines. In: RIAO’2000 Content-Based Multimedia Information
Access. Collège de France, Paris, France.
Silva, L., Soares, G., Martins, P., Batista, V., Santos, L., 2000.
Comparing the performance of mobile agent systems: a study of
benchmarking. Journal of Computer Communications 23 (8).
Konstantinos G. Zerfiridis received his Diploma degree in Mathematics in June 1998 at the Aristotle University of Thessaloniki. In 1999 he received his M.Sc. degree in computer science from the University of Edinburgh. He is currently a researcher and working towards a Ph.D. at the Aristotle University of Thessaloniki. His research interests are mobile computing, mobile agents and distributed systems. His email and web address are <[email protected]> and <agent.csd.auth.gr/~zerf>.

Helen D. Karatza is an Associate Professor at the Department of Informatics at the Aristotle University of Thessaloniki, Greece. Her research interests mainly include performance evaluation of parallel and distributed systems, multiprocessor scheduling and simulation. Her email and web address are <[email protected]> and <agent.csd.auth.gr/~karatza>.