Abstract—Conventional web caching systems based on the client-server model often suffer from limited cache space and a single point of failure. In this paper, we present a novel peer-to-peer client web caching system in which end-hosts collectively share their web cache contents. By aggregating these individual web caches, a huge virtual cache space is formed, and the burden on web servers can be greatly lightened. We design an efficient algorithm for managing and searching the aggregated cache. We also implement consistency control to prevent sharing stale web objects in peers' caches. Finally, and most importantly, considering that end-hosts are generally not as trustworthy as servers or proxies, we employ an opinion-based sampling technique to minimize the chance of distributing forged copies from malicious nodes. We have built a prototype of the proposed system, and our experimental results demonstrate that it achieves fast response time with low overhead and can effectively identify and block malicious peers.
I. INTRODUCTION
In the past decade, the web has grown at tremendous speed and its contents have become enormously rich. To reduce network traffic and user latency, web caching systems have been widely deployed [14,15]. However, existing caching systems often suffer from limited cache space and the risk of a single point of failure. There have been many proposals on cooperative caching among proxies [2]-[6], yet they may still suffer from similar problems inherent in the traditional client-server model.
In this paper, we present a novel peer-to-peer (P2P) client web caching system, in which end-hosts in the network collectively share their web cache contents. By aggregating these individual web caches, a huge virtual cache space is formed, and the burden on web servers can be greatly lightened. Yet several critical issues must be solved: First, how shall we manage the client caches for ease of search? Second, how do we control consistency with dynamic nodes? Third, how do we maintain a reasonable trust level in the system, especially considering that clients are generally not as trustworthy as servers or proxies?
To this end, we design an efficient algorithm for managing and searching the aggregated cache. We also implement consistency control to prevent sharing stale web objects in peers' caches. Finally, and most importantly, we propose an opinion-based sampling technique to minimize the chance of distributing forged copies from malicious nodes. We have built a prototype of the proposed system, and our experimental results demonstrate that it achieves fast response time with low overhead and can effectively identify and block malicious nodes.

1 J. Liu's work is partially supported by a Canadian NSERC Discovery Grant and an SFU President's Research Grant.
2 X. Chu's work is partially supported by the Research Grants Council, Hong Kong, China, under Grant RGC/HKBU2159/04E, and a Faculty Research Grant of Hong Kong Baptist University (FRG/03-04/II-22).
The remainder of this paper is organized as follows. In Section II, we present our trustable P2P web caching system in detail. The performance evaluation of the system is presented in Section III. Finally, Section IV concludes the paper.
II. THE P2P CLIENT WEB CACHING SYSTEM

Fig. 1 depicts a generic P2P web caching system. With a P2P
network, the storage spaces of several machines are virtually
combined to form a huge web cache space to serve all the peers
(clients). We now detail the operations of the system, including
discovering neighboring nodes, searching desired web objects,
and maintaining the trust level.
Fig. 1. Overview of the P2P web caching system
A. Neighbor Discovery
Discovering other online nodes is an important issue in a decentralized P2P network. A careless design of the discovery protocol, such as pinging a range of IP addresses and ports, would cause heavy network traffic overhead. Motivated by the JXTA Peer Discovery Protocol (PDP) [11], we have implemented two ways of discovering peers. One is active, in which a peer is allowed to request new peer information from its existing neighbors. The other is passive, in which a peer advertises itself to other peers periodically.
There is no single dedicated bootstrap server in our system. Every peer keeps a list of cached addresses for start-up. We assume that every peer in the network knows at least one other
On Peer-to-Peer Client Web Cache Sharing
Jiangchuan Liu1, Xiaowen Chu2, Ke Xu3
1School of Computing Science, Simon Fraser University, BC, Canada
2Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
3Department of Computer Science and Technology, Tsinghua University, Beijing, China
0-7803-8938-7/05/$20.00 (C) 2005 IEEE
peer, i.e., there is at least one cached-address entry in the list, which can be configured manually at the very beginning. Afterwards, maintaining the number of valid entries in the list relies on the neighbor discovery protocol. A lower bound on the number of online neighbors is also configured. Once the number of neighbors drops below this lower bound, the peer must obtain information about new peers actively: it asks its neighbors for other peers' information, and a list of nodes is returned. The peer can then try to connect with the new neighbors.
In the passive method, broadcast is adopted for a peer to advertise itself. An advertisement is forwarded peer by peer, limited by a TTL, within a certain radius of the network. Peers receiving the advertisement add the advertising node to their address lists, so that they may connect to it in the future. The advertisement interval and the TTL value should be chosen carefully to avoid creating too much traffic on the network.
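The two discovery modes can be sketched as follows. This is a minimal in-memory sketch under our own assumptions: the class and method names are ours, not from the paper's prototype, and the real system exchanges these messages over the network rather than through a dictionary.

```python
import random

class Peer:
    """A peer that keeps a list of cached neighbor addresses (sketch)."""

    def __init__(self, address, seed_addresses, lower_bound=2):
        self.address = address
        self.neighbors = set(seed_addresses)  # cached addresses for start-up
        self.lower_bound = lower_bound        # minimum number of online neighbors

    def active_discover(self, network):
        """Active mode: ask existing neighbors for other peers' addresses
        whenever the neighbor count drops below the lower bound."""
        while len(self.neighbors) < self.lower_bound:
            helper = network[random.choice(sorted(self.neighbors))]
            candidates = helper.neighbors - {self.address} - self.neighbors
            if not candidates:
                break
            self.neighbors.add(candidates.pop())

    def advertise(self, network, ttl=2):
        """Passive mode: periodically broadcast our own address."""
        for addr in sorted(self.neighbors):
            network[addr].receive_advertisement(network, self.address, ttl)

    def receive_advertisement(self, network, origin, ttl):
        """An advertisement is forwarded peer by peer, limited by a TTL,
        within a certain radius of the network."""
        if origin != self.address:
            self.neighbors.add(origin)
        if ttl > 0:
            for addr in sorted(self.neighbors - {origin}):
                network[addr].receive_advertisement(network, origin, ttl - 1)
```

A small TTL keeps the advertisement within a bounded radius, mirroring the traffic concern noted above.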
B. Searching
In our web caching system, every node has a 16-byte node ID generated by applying the SHA-1 hash function [8, 9] to the string of its IP address concatenated with a colon and the port number dedicated to P2P communication. Searching is initiated by a web client with a search request. Upon receiving the request, a node first passes the URL to the SHA-1 hash function to obtain a 16-byte GUID for the requested object. The GUID is then used to search for the object in the local cache. If the object is found, the search is finished and an OK response is sent back to the web browser with the object attached.
If the local search misses, the node determines the neighbor whose node ID is closest to the GUID by comparing the two strings lexicographically. It then sends a search request to ask that neighbor for the object. If that neighbor has the object, the search is finished, and the neighbor sends the node a copy of the object. The node can cache a copy in its own cache and forward a copy to the web client. One point to note is that the node whose node ID is closest to the GUID must be a trustworthy node. If this closest neighboring node is distrusted, the node with the ID second closest to the GUID is asked instead.
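The ID scheme and closest-neighbor selection can be sketched as follows, with two hedges: SHA-1 produces a 20-byte digest, so we assume the paper's 16-byte IDs are the digest truncated to 16 bytes; and we read "closest by comparing the two strings lexicographically" as treating the hex IDs as numbers and minimizing the absolute difference. The helper names are ours.

```python
import hashlib

def node_id(ip: str, port: int) -> str:
    """16-byte node ID: SHA-1 over "ip:port", truncated to 16 bytes
    (32 hex characters) -- the truncation is our assumption."""
    return hashlib.sha1(f"{ip}:{port}".encode()).hexdigest()[:32]

def object_guid(url: str) -> str:
    """16-byte GUID of a web object, derived from its URL the same way."""
    return hashlib.sha1(url.encode()).hexdigest()[:32]

def closest_neighbor(neighbor_ids, guid, distrusted=frozenset()):
    """Pick the trusted neighbor whose ID is closest to the GUID.
    Distrusted nodes are skipped, so the second-closest ID (and so on)
    is used instead; with no trusted neighbor, fall back to the origin."""
    trusted = [nid for nid in neighbor_ids if nid not in distrusted]
    if not trusted:
        return None
    target = int(guid, 16)
    return min(trusted, key=lambda nid: abs(int(nid, 16) - target))
```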
If the neighbor does not have the object, it looks into its search history of other peers. Adopting the Probabilistic Search Protocol [10], each node maintains a search history of the other nodes. If more than one node has asked for the object, the neighbor replies with the address of the latest node that asked for the object, as asking this node gives a greater chance of a search hit and a lower chance that the target object has already been replaced in its cache. The node receiving the reply can then ask that node for the object. If the object is found in that node's cache, a copy of the object can be retrieved and the search finished. Otherwise, the object is retrieved from the originating server immediately.
When looking into the search history, the neighbor may not find a node entry for the requested object. In such a case, the neighbor replies with a miss, indicating that it can provide no information for the search. The node receiving the miss then retrieves the object from the originating server. See Fig. 2 for an illustration of these operations.
Fig. 2. An illustration of the search operation. Node BBB is responsible for getting object c for Client B. It has not cached the object and thus sends a search request to Node CCC. Node CCC knows that Node AAA has previously asked for object c, so it redirects Node BBB to ask Node AAA for the object.
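The redirect logic illustrated in Fig. 2 can be sketched with a minimal data structure of our own devising (not the paper's implementation): each node records the latest requester per GUID and answers a search with a hit, a redirect, or a miss.

```python
class CachingNode:
    """A node with a local cache and a search history recording who last
    asked for each object (in the spirit of the Probabilistic Search
    Protocol)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}    # guid -> object bytes
        self.history = {}  # guid -> name of the latest requester

    def handle_search(self, guid, requester):
        """Return ('hit', object), ('redirect', node_name), or ('miss', None).
        On a miss, the requester falls back to the origin server."""
        previous = self.history.get(guid)
        # Record the requester: it may later be introduced to other
        # nodes as a likely holder of this object.
        self.history[guid] = requester
        if guid in self.cache:
            return ("hit", self.cache[guid])
        if previous is not None and previous != requester:
            # Redirect to the latest node that asked for this object.
            return ("redirect", previous)
        return ("miss", None)
```

In the Fig. 2 scenario, Node CCC misses on the first request from Node AAA but redirects the later request from Node BBB to Node AAA.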
We are currently incorporating a Distributed Hash Table into our system to further improve its search efficiency. In addition, during the above operations, the neighbor should record the requesting nodes in its search history. Later, if other nodes ask for the same object, these recorded nodes have a chance of being introduced as possible holders of the object. Therefore each node has the responsibility of providing objects in its cache to other peers and of keeping the search history. In the long run, a large request history is generated, and all nodes in this history list have a chance of holding an object.

Finally, for each search, if there are no replies from peers under any circumstances, the searching node should get the object from the original source immediately so as to minimize the object retrieval time.
C. Opinion-based Sampling
To increase the trustworthiness of our system, we introduce a sampling technique to prevent dishonest nodes from distributing fake copies of web files to other peers. Before implementing sampling, we need a representation of the trustworthiness of a peer in the network. Therefore we apply the concept of opinion to express the subjective beliefs of a node about the others [12]. An opinion
consists of three elements: belief, disbelief, and uncertainty.
This concept of opinion differs from the traditional probability calculation used in many existing trust models, which does not express uncertainty. When a node first joins the network, it has no idea about the trustworthiness of its neighboring nodes. This state of uncertainty about a peer is represented by the uncertainty part of the opinion. A peer can start to evaluate the trustworthiness of another peer once it retrieves file resources from that peer's cache. If the returned object from a peer is verified to be genuine against the
copy on the original server, the peer gains trust from the peer issuing the request. This means the belief part of the opinion about the corresponding peer increases. However, if there exist malicious nodes in the network that return fake copies of files to others, and the forgery is detected, the disbelief
about this peer will increase. Let $\omega^A_B = (b^A_B, d^A_B, u^A_B)$ be the opinion of peer A about peer B, consisting of belief, disbelief, and uncertainty, respectively. These three elements always satisfy the following condition:

$$b^A_B + d^A_B + u^A_B = 1$$
When a new peer joins the network, it is uncertain about the trustworthiness of all its neighboring peers, so its opinion about all the others starts from (0, 0, 1). Similarly, the opinions of all existing peers about the new peer start from (0, 0, 1).
The evaluation of a peer's trustworthiness involves checking the accuracy of a returned file object. To do this, a peer needs to get a copy of the object from the original web server immediately and compare the two copies. It is not wise to check the object every time, as this would degrade the performance of the system and violate the spirit of our design, which aims to reduce requests to the original server by obtaining cached copies from peers. Therefore we adopt a sampling scheme, which
performs the checking process only occasionally. If peer A already has a relatively high belief in its opinion about a neighboring peer B, peer A reduces the probability of checking B's returned objects. On the other hand, if peer A has a relatively low belief about peer B, the probability of checking B's returned objects increases. A simple representation of this probability is $1 - b^A_B$. Every time we get a file from a peer, we generate a random number between 0 and 1. If this number is smaller than $1 - b^A_B$, we take this file as a sample and perform the accuracy check.
An accuracy check has exactly two possible results: the file object from the peer is exactly the same as the one from the original server, which is regarded as a "positive event" (p); or the file is found to be different from that on the original server, which is regarded as a "negative event" (n). Now we can express the elements of the opinion of node A about a neighboring node B as functions of p and n:

$$b^A_B = \frac{p}{p+n+2}, \qquad d^A_B = \frac{n}{p+n+2}, \qquad u^A_B = \frac{2}{p+n+2}$$
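The mapping from event counts to opinions, together with the sampling rule of checking a returned file with probability 1 minus the belief, can be sketched as follows (a sketch; the helper names are ours):

```python
import random

def opinion(p, n):
    """Map positive/negative event counts to (belief, disbelief,
    uncertainty): b = p/(p+n+2), d = n/(p+n+2), u = 2/(p+n+2).
    With no evidence (p = n = 0), the opinion is (0, 0, 1)."""
    s = p + n + 2
    return (p / s, n / s, 2 / s)

def should_check(belief, rng=random):
    """Sample a returned file with probability 1 - belief: a fresh peer
    (belief 0) is always checked, a fully trusted one never."""
    return rng.random() < 1 - belief
```

Note that the three components always sum to 1, consistent with the constraint above, and that the uncertainty shrinks as evidence accumulates.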
Apart from this, we have implemented a feature that allows a peer to collect neighboring peers' opinions about the others periodically. Combining the others' opinions with its own, a peer can form a relatively objective opinion about the other peers. The
combination method we use is discounting combination, which works as follows. Suppose node A already has an opinion about node B, and A wants to know node C's trustworthiness, so it asks B for B's opinion on C. There are two opinions here: A's opinion about B, and B's opinion about C. Let the opinion of A about B be $\omega^A_B = (b^A_B, d^A_B, u^A_B)$ and the opinion of B about C be $\omega^B_C = (b^B_C, d^B_C, u^B_C)$. We define the discounting of $\omega^B_C$ by $\omega^A_B$ to be $\omega^{AB}_C = (b^{AB}_C, d^{AB}_C, u^{AB}_C)$, calculated by the following equations:

$$b^{AB}_C = b^A_B\, b^B_C, \qquad d^{AB}_C = b^A_B\, d^B_C, \qquad u^{AB}_C = d^A_B + u^A_B + b^A_B\, u^B_C$$

Then the final opinion of peer A about peer C, $({b'}^A_C, {d'}^A_C, {u'}^A_C)$, is a combination of $(b^{AB}_C, d^{AB}_C, u^{AB}_C)$ and the original opinion of peer A about peer C, $(b^A_C, d^A_C, u^A_C)$:

$${b'}^A_C = \frac{b^A_C + b^{AB}_C}{2}, \qquad {d'}^A_C = \frac{d^A_C + d^{AB}_C}{2}, \qquad {u'}^A_C = \frac{u^A_C + u^{AB}_C}{2}$$
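A direct transcription of the discounting and averaging steps, written as a sketch with opinions represented as (belief, disbelief, uncertainty) tuples:

```python
def discount(w_ab, w_bc):
    """Discount B's opinion about C by A's opinion about B:
    the result's uncertainty absorbs A's disbelief and uncertainty
    in B, so advice from a distrusted referrer carries little weight."""
    b_ab, d_ab, u_ab = w_ab
    b_bc, d_bc, u_bc = w_bc
    return (b_ab * b_bc,
            b_ab * d_bc,
            d_ab + u_ab + b_ab * u_bc)

def combine(w_ac, w_abc):
    """Average A's own opinion about C with the discounted
    second-hand opinion, component by component."""
    return tuple((x + y) / 2 for x, y in zip(w_ac, w_abc))
```

Since both inputs sum to 1, the discounted and combined opinions also sum to 1, preserving the invariant.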
Through this calculation, peer A may be totally uncertain about peer C's trustworthiness in the beginning, yet its opinion about C is influenced by peer B's opinion about C once peer A has already built up some belief about peer B.
As mentioned above, the higher the belief about a peer, the lower the chance of performing accuracy checks on that peer's returned files. However, if a peer is found to return forged files, the corresponding disbelief about that peer will increase. If the disbelief about a peer exceeds a certain threshold, the peer is distrusted for an expiration time, during which the opinion about this node is set to (0, 1, 0). In every step of a search we must make sure that the node being requested is trustworthy. Files will not be requested from a distrusted node even if its ID is closest to the GUID of the file, and a distrusted node will not be introduced to other nodes upon a search request. A neighbor may introduce a node it considers trustworthy, while the searching node finds that it should distrust this node according to its own previous opinion; in this case the searching node will not request the object from this distrusted node either. When the distrust period expires, the opinion about the node is set back to (0, 0, 1). In our implementation the distrust threshold is set to 0.3.
D. Consistency Control
For consistency control, we let the clients periodically check
back with the server to determine if cached objects are still
valid. A state diagram for the operations is shown in Fig. 3.
Assuming that older files expire after a longer time and younger files after a shorter time, each file is associated with an expiration period and an expiration timestamp. The expiration timestamp is the sum of the file's last modified time and its expiration period.
By arranging the expiration timestamps in ascending order, a job list is formed. The list is checked periodically to find out which files have expired. For an expired file, a request with the file's last modified date in the "If-Modified-Since" header field is sent to the original server. This indicates that the server should return the requested document only if it has changed since the
specified date. Thus, if no new file is returned, the expiration period should be longer, and it is multiplied by a factor. If a file is returned, the expiration period should be shorter: the old file is replaced by the new one and the expiration period is divided by a factor. The expiration period for each file is estimated in this way and changes dynamically.
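A sketch of this adaptive scheme (the factor value and the clamping bounds are our assumptions; the paper does not give concrete values):

```python
def update_expiration(period, modified, factor=2.0,
                      min_period=60.0, max_period=7 * 24 * 3600.0):
    """Adapt a cached file's expiration period (in seconds) after an
    If-Modified-Since revalidation: lengthen it when the server reports
    the file unchanged, shorten it when a new copy is returned.
    The factor of 2 and the clamping bounds are our assumptions."""
    if modified:
        period /= factor   # a new file was returned: revalidate sooner
    else:
        period *= factor   # not modified: revalidate later
    # Clamp so the period neither collapses to zero nor grows unbounded.
    return max(min_period, min(period, max_period))
```

Frequently changing files thus converge toward short periods, while stable files are revalidated less and less often.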
Fig. 3. State diagram illustrating the dynamic changes of
expiration period of a file in the cache.
III. PERFORMANCE EVALUATION
We have built a prototype of the proposed system. To investigate its search efficiency and trustworthiness, we have conducted a series of tests with a simulator that synthesizes requests as well as other node activities. As shown in Fig. 4, the simulator keeps a list of node addresses and ports through which it can request web objects from the web caching system. It also keeps a list of URLs. Periodically, it randomly selects a URL and a node from which to request the web object, then acts as a common web browser. To create a dishonest node in the network, another program is implemented to forge cached files.
To evaluate the search efficiency, hit rate is measured in the experiment. A request is issued every 500 ms; hence the request rate in our experiment is 120 requests per minute. The variation of hit rate over time with different numbers of nodes is shown in Fig. 5. We also show the time to reach a 99% hit rate in Table 1.
Fig. 4. Experiment setup.
Fig. 5. Variation of hit rate against time for 4, 8, 12, and 16 nodes.
Table 1. Time needed to reach a 99% hit rate for different numbers of nodes in the network.
It can be seen that the hit rate generally increases and approaches 100% in each test. However, with fewer nodes, the hit rate approaches 100% in a shorter time. There are several causes for this. First, the number of requests reaching each node decreases when the number of nodes increases (as the number of requests generated by the simulator per minute is fixed); therefore, the number of cached objects in each node is smaller. Second, when the peer-to-peer network becomes larger, the node with the desired object may not be in the vicinity. Both factors contribute to a lower hit rate.
To investigate the effectiveness of the sampling algorithm, we then measure the disbelief level of the dishonest node in each network. The disbelief levels against time in each test are shown in Figs. 6-9. A dishonest node is identified if its disbelief exceeds a threshold, which depends on the application. It can be seen that the dishonest node is identified in a shorter interval if the network is smaller. Moreover, in a smaller network, the frequency of identifying the dishonest node is higher. The causes for these trends are similar to those for searching: the number of requests reaching each node decreases as the number of nodes increases. Thus, the frequency of file retrieval from the dishonest node decreases, leading to a slower increase in disbelief.
Number of nodes                     4    8    12    16
Time to reach 99% hit rate (min)    5    15   15    >24
Fig. 6. Disbelief level against time for 4 nodes (curves for Node0, Node1, Node2).
Fig. 7. Disbelief level against time for 8 nodes (curves for Node4, Node5, Node9).
Fig. 8. Disbelief level against time for 12 nodes (curves for Node9, Node10, Node12).
IV. CONCLUSION
In this paper, we have presented a trustable peer-to-peer web
caching system, in which peers in the network share their web
cache contents. To increase the trust-level of the system, we
propose to use a sampling technique to minimize the chance of
distributing fake web file copies among the peers. We further
introduce the concept of opinion to represent the trustworthiness
of individual peers. We have built a prototype of the proposed
system, and our experimental results demonstrate that it has fast
response time with low overhead, and can effectively identify
and block malicious peers.
Fig. 9. Disbelief level against time for 16 nodes (curves for Node0, Node1, Node5, Node7, Node8, Node9, Node15).
ACKNOWLEDGEMENT
The authors would like to thank C. Ng and P. Choi for their effort in building the prototype.
REFERENCES

[1] A. Luotonen, Web Proxy Servers, Prentice Hall, Englewood Cliffs, New Jersey, 1998.
[2] D. Wessels and K. Claffy, "Internet Cache Protocol (v2)", RFC 2186, September 1997.
[3] D. Wessels and K. Claffy, "Application of Internet Cache Protocol (v2)", RFC 2187, September 1997.
[4] K. W. Ross, "Hash Routing for Collections of Shared Web Caches", IEEE Network, vol. 11, pp. 37-44, November/December 1997.
[5] T. Asaka, H. Miwa, and Y. Tanaka, "Distributed Web Caching Using Hash-based Query Caching Method", in Proceedings of IEEE International Conference on Control Applications, vol. 2, pp. 1620-1625, 1999.
[6] I. Clarke et al., "Protecting Free Expression Online with Freenet", IEEE Internet Computing, vol. 5, no. 1, pp. 40-49, 2002.
[7] S. Iyer, A. Rowstron, and P. Druschel, "Squirrel: A Decentralized Peer-to-peer Web Cache", in Proceedings of ACM Symposium on Principles of Distributed Computing, 2002.
[8] National Institute of Standards and Technology, "Secure Hash Standard", April 17, 1995.
[9] D. Eastlake 3rd and P. Jones, "US Secure Hash Algorithm 1 (SHA1)", RFC 3174, September 2001.
[10] D. A. Menascé, "Scalable P2P Search", IEEE Internet Computing, vol. 7, pp. 83-87, March/April 2003.
[11] R. Flenner, Java P2P Unleashed, Indianapolis: Sams, pp. 122-135, 2003.
[12] X. Li, M. R. Lyu, and J. Liu, "A Trust Model Based Routing Protocol for Secure Ad Hoc Networks", in Proceedings of IEEE Aerospace Conference, Big Sky, MT, March 6-13, 2004.
[13] J. Gwertzman and M. Seltzer, "World-Wide Web Cache Consistency", in Proceedings of the USENIX Annual Technical Conference, pp. 141-152, San Diego, CA, January 1996.
[14] J. Liu and J. Xu, "Proxy Caching for Media Streaming over the Internet," IEEE Communications, Feature Topic on Proxy Support for Streaming on the Internet, August 2004.
[15] J. Xu, J. Liu, B. Li, and X. Jia, "Caching and Prefetching for Web Content Distribution," IEEE Computing in Science and Engineering, Special Issue on Web Engineering, August 2004.