Abstract—Conventional web caching systems based on the client-server model often suffer from limited cache space and a single point of failure. In this paper, we present a novel peer-to-peer client web caching system in which end-hosts collectively share their web cache contents. By aggregating these individual web caches, a huge virtual cache space is formed, and the burden on web servers can be greatly lightened. We design an efficient algorithm for managing and searching the aggregated cache. We also implement consistency control to prevent sharing stale web objects in peers' caches. Finally, and most importantly, considering that end-hosts are generally not as trustworthy as servers or proxies, we employ an opinion-based sampling technique to minimize the chance of distributing forged copies from malicious nodes. We have built a prototype of the proposed system, and our experimental results demonstrate that it achieves fast response time with low overhead and can effectively identify and block malicious peers.
I. INTRODUCTION
In the past decade, the web has grown at tremendous speed and its contents have become enormously rich. To reduce network traffic and user latency, web caching systems have been widely deployed [14,15]. However, existing caching systems often suffer from limited cache space and the risk of a single point of failure. There have been many proposals on cooperative caching among proxies [2]-[6], yet they may still suffer from similar problems inherent in the traditional client-server model.
In this paper, we present a novel peer-to-peer (P2P) client web caching system, in which end-hosts in the network collectively share their web cache contents. By aggregating these individual web caches, a huge virtual cache space is formed, and the burden on web servers can be greatly lightened. Yet several critical issues must be solved: First, how shall we manage the client caches for ease of search? Second, how do we control consistency with dynamic nodes? Third, how do we maintain a reasonable trust level in the system, especially considering that clients are generally not as trustworthy as servers or proxies?
To this end, we design an efficient algorithm for managing and searching the aggregated cache. We also implement consistency control to prevent sharing stale web objects in peers' caches. Finally, and most importantly, we propose an opinion-based sampling technique to minimize the chance of distributing forged copies from malicious nodes. We have built a prototype of the proposed system, and our experimental results demonstrate that it achieves fast response time with low overhead and can effectively identify and block malicious nodes.

1 J. Liu's work is partially supported by a Canadian NSERC Discovery Grant and an SFU President's Research Grant.
2 X. Chu's work is partially supported by the Research Grants Council, Hong Kong, China, under Grant RGC/HKBU2159/04E, and a Faculty Research Grant of Hong Kong Baptist University (FRG/03-04/II-22).
The remainder of this paper is organized as follows. In Section II, we present our trustable P2P web caching system in detail. The performance evaluation of the system is presented in Section III. Finally, Section IV concludes the paper.
II. THE P2P CLIENT WEB CACHING SYSTEM

Fig. 1 depicts a generic P2P web caching system. With a P2P
network, the storage spaces of several machines are virtually
combined to form a huge web cache space to serve all the peers
(clients). We now detail the operations of the system, including
discovering neighboring nodes, searching desired web objects,
and maintaining the trust level.
Fig. 1. Overview of the P2P web caching system
A. Neighbor Discovery
Discovering other online nodes is an important issue in a decentralized P2P network. A careless design of the discovery protocol, such as pinging a range of IP addresses and ports, would cause heavy network traffic overhead. Motivated by the JXTA Peer Discovery Protocol (PDP) [11], we have implemented two ways of discovering peers. One is active, in which a peer is allowed to request new peer information from its existing neighbors. The other is passive, in which a peer advertises itself to other peers periodically.
There is no single dedicated bootstrap server in our system. Every peer keeps a list of cached addresses for start-up. We assume that every peer in the network knows at least one other
On Peer-to-Peer Client Web Cache Sharing
Jiangchuan Liu1, Xiaowen Chu2, Ke Xu3
1School of Computing Science, Simon Fraser University, BC, Canada
2Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
3Department of Computer Science and Technology, Tsinghua University, Beijing, China
0-7803-8938-7/05/$20.00 (C) 2005 IEEE
peer, i.e., there is at least one cached-address entry in the list, which can be configured manually at the very beginning. Afterwards, maintaining the number of valid entries in the list relies on the neighbor discovery protocol. A lower bound on the number of online neighbors is also configured. Once the number of neighbors drops below this lower bound, the peer must obtain information about new peers actively: it asks its neighbors for other peers' information, and a list of nodes is returned. The peer can then try to connect with the new neighbors.
In the passive method, broadcast is adopted for a peer to advertise itself. An advertisement is forwarded peer by peer, limited by a TTL, within a certain radius of the network. Peers receiving the advertisement add the advertising node to their address lists, so that they may connect to it in the future. The advertisement interval and the TTL value should be chosen carefully to avoid creating too much traffic on the network.
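The two discovery modes can be sketched as follows. This is a minimal in-memory sketch under our own assumptions: the class and method names are ours, not from the paper's prototype, and the real system exchanges these messages over the network rather than through a dictionary.

```python
import random

class Peer:
    """A peer that keeps a list of cached neighbor addresses (sketch)."""

    def __init__(self, address, seed_addresses, lower_bound=2):
        self.address = address
        self.neighbors = set(seed_addresses)  # cached addresses for start-up
        self.lower_bound = lower_bound        # minimum number of online neighbors

    def active_discover(self, network):
        """Active mode: ask existing neighbors for other peers' addresses
        whenever the neighbor count drops below the lower bound."""
        while len(self.neighbors) < self.lower_bound:
            helper = network[random.choice(sorted(self.neighbors))]
            candidates = helper.neighbors - {self.address} - self.neighbors
            if not candidates:
                break
            self.neighbors.add(candidates.pop())

    def advertise(self, network, ttl=2):
        """Passive mode: periodically broadcast our own address."""
        for addr in sorted(self.neighbors):
            network[addr].receive_advertisement(network, self.address, ttl)

    def receive_advertisement(self, network, origin, ttl):
        """An advertisement is forwarded peer by peer, limited by a TTL,
        within a certain radius of the network."""
        if origin != self.address:
            self.neighbors.add(origin)
        if ttl > 0:
            for addr in sorted(self.neighbors - {origin}):
                network[addr].receive_advertisement(network, origin, ttl - 1)
```

A small TTL keeps the advertisement within a bounded radius, mirroring the traffic concern noted above.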
B. Searching
In our web caching system, every node has a 16-byte node ID generated by applying the SHA-1 hash function [8, 9] to the string of its IP address concatenated with a colon and the port number dedicated to P2P communication. Searching is initiated by a web client with a search request. Upon receiving the request, a node first passes the URL to the SHA-1 hash function to obtain a 16-byte GUID for the requested object. The GUID is then used to search for the object in the local cache. If the object is found, the search is finished and an OK response is sent back to the web browser with the object attached.
If the local search misses, the node determines the neighbor whose node ID is closest to the GUID by comparing the two strings lexicographically. It then sends a search request to ask that neighbor for the object. If that neighbor has the object, the search is finished, and the neighbor sends the node a copy of the object. The node can cache a copy in its own cache and forward a copy to the web client. One point to note is that the node whose node ID is closest to the GUID must be a trustworthy node. If this closest neighboring node is distrusted, the node with the ID second closest to the GUID is asked instead.
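The ID scheme and closest-neighbor selection can be sketched as follows, with two hedges: SHA-1 produces a 20-byte digest, so we assume the paper's 16-byte IDs are the digest truncated to 16 bytes; and we read "closest by comparing the two strings lexicographically" as treating the hex IDs as numbers and minimizing the absolute difference. The helper names are ours.

```python
import hashlib

def node_id(ip: str, port: int) -> str:
    """16-byte node ID: SHA-1 over "ip:port", truncated to 16 bytes
    (32 hex characters) -- the truncation is our assumption."""
    return hashlib.sha1(f"{ip}:{port}".encode()).hexdigest()[:32]

def object_guid(url: str) -> str:
    """16-byte GUID of a web object, derived from its URL the same way."""
    return hashlib.sha1(url.encode()).hexdigest()[:32]

def closest_neighbor(neighbor_ids, guid, distrusted=frozenset()):
    """Pick the trusted neighbor whose ID is closest to the GUID.
    Distrusted nodes are skipped, so the second-closest ID (and so on)
    is used instead; with no trusted neighbor, fall back to the origin."""
    trusted = [nid for nid in neighbor_ids if nid not in distrusted]
    if not trusted:
        return None
    target = int(guid, 16)
    return min(trusted, key=lambda nid: abs(int(nid, 16) - target))
```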
If the neighbor does not have the object, it looks into its search history of other peers. Adopting the Probabilistic Search Protocol [10], each node maintains a search history of the other nodes. If more than one node has asked for the object, the neighbor replies with the address of the latest node that asked for the object, as asking this node gives a greater chance of a search hit and a lower chance that the target object has already been replaced in its cache. The node receiving the reply can then ask that node for the object. If the object is found in that node's cache, a copy of the object can be retrieved and the search finished. Otherwise, the object is retrieved from the originating server immediately.
When looking into the search history, the neighbor may not find a node entry for the requested object. In such a case, the neighbor replies with a miss, indicating that it can provide no information for the search. The node receiving the miss then retrieves the object from the originating server. See Fig. 2 for an illustration of these operations.
Fig. 2. An illustration of the search operation. Node BBB is responsible for getting object c for Client B. It has not cached the object and thus sends a search request to Node CCC. Node CCC knows that Node AAA has previously asked for object c, so it redirects Node BBB to ask Node AAA for the object.
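The redirect logic illustrated in Fig. 2 can be sketched with a minimal data structure of our own devising (not the paper's implementation): each node records the latest requester per GUID and answers a search with a hit, a redirect, or a miss.

```python
class CachingNode:
    """A node with a local cache and a search history recording who last
    asked for each object (in the spirit of the Probabilistic Search
    Protocol)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}    # guid -> object bytes
        self.history = {}  # guid -> name of the latest requester

    def handle_search(self, guid, requester):
        """Return ('hit', object), ('redirect', node_name), or ('miss', None).
        On a miss, the requester falls back to the origin server."""
        previous = self.history.get(guid)
        # Record the requester: it may later be introduced to other
        # nodes as a likely holder of this object.
        self.history[guid] = requester
        if guid in self.cache:
            return ("hit", self.cache[guid])
        if previous is not None and previous != requester:
            # Redirect to the latest node that asked for this object.
            return ("redirect", previous)
        return ("miss", None)
```

In the Fig. 2 scenario, Node CCC misses on the first request from Node AAA but redirects the later request from Node BBB to Node AAA.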
We are currently incorporating a Distributed Hash Table into our system to further improve its search efficiency. In addition, during the above operations, the neighbor should record the requesting nodes in its search history. Later, if other nodes ask for the same object, these recorded nodes have a chance of being introduced as possible holders of the object. Therefore each node has the responsibility of providing objects in its cache to other peers and of keeping the search history. In the long run, a large request history is generated, and all nodes in this history list have a chance of holding an object.

Finally, for each search, if there are no replies from peers under any circumstances, the searching node should get the object from the original source immediately so as to minimize the object retrieval time.
C. Opinion-based Sampling
To increase the trustworthiness of our system, we introduce a sampling technique to prevent dishonest nodes from distributing fake copies of web files to other peers. Before implementing sampling, we need a representation of the trustworthiness of a peer in the network. Therefore we apply the concept of opinion to express the subjective beliefs of a node about the others [12]. An opinion
consists of three elements: belief, disbelief, and uncertainty.
This concept of opinion differs from the traditional probability calculation used in many existing trust models, which does not express uncertainty. When a node first joins the network, it has no idea about the trustworthiness of its neighboring nodes. This state of uncertainty about a peer is represented by the uncertainty part of the opinion. A peer can start to evaluate the trustworthiness of another peer once it retrieves file resources from that peer's cache. If the returned object from a peer is verified to be genuine against the
copy on the original server, the peer gains trust from the peer issuing the request. This means the belief part of the opinion about the corresponding peer increases. However, if there exist malicious nodes in the network that return fake copies of files to others, and the forgery is detected, the disbelief
about this peer will increase. Let $\omega^A_B = (b^A_B, d^A_B, u^A_B)$ be the opinion of peer A about peer B, consisting of belief, disbelief, and uncertainty, respectively. These three elements always satisfy the following condition:

$$b^A_B + d^A_B + u^A_B = 1$$
When a new peer joins the network, it is uncertain about the trustworthiness of all its neighboring peers, so its opinion about all the others starts from (0, 0, 1). Similarly, the opinions of all existing peers about the new peer start from (0, 0, 1).
The evaluation of a peer's trustworthiness involves checking the accuracy of a returned file object. To do this, a peer needs to get a copy of the object from the original web server immediately and compare the two copies. It is not wise to check the object every time, as this would degrade the performance of the system and violate the spirit of our design, which aims to reduce requests to the original server by obtaining cached copies from peers. Therefore we adopt a sampling scheme, which
performs the checking process only occasionally. If peer A already has a relatively high belief in its opinion about a neighboring peer B, peer A reduces the probability of checking B's returned objects. On the other hand, if peer A has a relatively low belief about peer B, the probability of checking B's returned objects increases. A simple representation of this probability is $1 - b^A_B$. Every time we get a file from a peer, we generate a random number between 0 and 1. If this number is smaller than $1 - b^A_B$, we take this file as a sample and perform the accuracy check.
An accuracy check has exactly two possible results: the file object from the peer is exactly the same as the one from the original server, which is regarded as a "positive event" (p); or the file is found to be different from that on the original server, which is regarded as a "negative event" (n). Now we can express the elements of the opinion of node A about a neighboring node B as functions of p and n:

$$b^A_B = \frac{p}{p+n+2}, \qquad d^A_B = \frac{n}{p+n+2}, \qquad u^A_B = \frac{2}{p+n+2}$$
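The mapping from event counts to opinions, together with the sampling rule of checking a returned file with probability 1 minus the belief, can be sketched as follows (a sketch; the helper names are ours):

```python
import random

def opinion(p, n):
    """Map positive/negative event counts to (belief, disbelief,
    uncertainty): b = p/(p+n+2), d = n/(p+n+2), u = 2/(p+n+2).
    With no evidence (p = n = 0), the opinion is (0, 0, 1)."""
    s = p + n + 2
    return (p / s, n / s, 2 / s)

def should_check(belief, rng=random):
    """Sample a returned file with probability 1 - belief: a fresh peer
    (belief 0) is always checked, a fully trusted one never."""
    return rng.random() < 1 - belief
```

Note that the three components always sum to 1, consistent with the constraint above, and that the uncertainty shrinks as evidence accumulates.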
Apart from this, we have implemented a feature that allows a peer to collect neighboring peers' opinions about the others periodically. Combining the others' opinions with its own, a peer can form a relatively objective opinion about the other peers. The
combination method we use is discounting combination, which works as follows. Suppose node A already has an opinion about node B, and A wants to know node C's trustworthiness, so it asks B for B's opinion on C. There are two opinions here: A's opinion about B, and B's opinion about C. Let the opinion of A about B be $\omega^A_B = (b^A_B, d^A_B, u^A_B)$ and the opinion of B about C be $\omega^B_C = (b^B_C, d^B_C, u^B_C)$. We define the discounting of $\omega^B_C$ by $\omega^A_B$ to be $\omega^{AB}_C = (b^{AB}_C, d^{AB}_C, u^{AB}_C)$, calculated by the following equations:

$$b^{AB}_C = b^A_B\, b^B_C, \qquad d^{AB}_C = b^A_B\, d^B_C, \qquad u^{AB}_C = d^A_B + u^A_B + b^A_B\, u^B_C$$

Then the final opinion of peer A about peer C, $({b'}^A_C, {d'}^A_C, {u'}^A_C)$, is a combination of $(b^{AB}_C, d^{AB}_C, u^{AB}_C)$ and the original opinion of peer A about peer C, $(b^A_C, d^A_C, u^A_C)$:

$${b'}^A_C = \frac{b^A_C + b^{AB}_C}{2}, \qquad {d'}^A_C = \frac{d^A_C + d^{AB}_C}{2}, \qquad {u'}^A_C = \frac{u^A_C + u^{AB}_C}{2}$$
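A direct transcription of the discounting and averaging steps, written as a sketch with opinions represented as (belief, disbelief, uncertainty) tuples:

```python
def discount(w_ab, w_bc):
    """Discount B's opinion about C by A's opinion about B:
    the result's uncertainty absorbs A's disbelief and uncertainty
    in B, so advice from a distrusted referrer carries little weight."""
    b_ab, d_ab, u_ab = w_ab
    b_bc, d_bc, u_bc = w_bc
    return (b_ab * b_bc,
            b_ab * d_bc,
            d_ab + u_ab + b_ab * u_bc)

def combine(w_ac, w_abc):
    """Average A's own opinion about C with the discounted
    second-hand opinion, component by component."""
    return tuple((x + y) / 2 for x, y in zip(w_ac, w_abc))
```

Since both inputs sum to 1, the discounted and combined opinions also sum to 1, preserving the invariant.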
Through this calculation, peer A may be totally uncertain about peer C's trustworthiness in the beginning, yet its opinion about C is influenced by peer B's opinion about C once peer A has already built up some belief about peer B.
As mentioned above, the higher the belief about a peer, the lower the chance of performing accuracy checks on that peer's returned files. However, if a peer is found to return forged files, the corresponding disbelief about that peer will increase. If the disbelief about a peer exceeds a certain threshold, the peer is distrusted for an expiration time, during which the opinion about this node is set to (0, 1, 0). In every step of a search we must make sure that the node being requested is trustworthy. Files will not be requested from a distrusted node even if its ID is closest to the GUID of the file, and a distrusted node will not be introduced to other nodes upon a search request. A neighbor may introduce a node it considers trustworthy, while the searching node finds that it should distrust this node according to its own previous opinion; in this case the searching node will not request the object from this distrusted node either. When the distrust period expires, the opinion about the node is set back to (0, 0, 1). In our implementation the distrust threshold is set to 0.3.
D. Consistency Control
For consistency control, we let the clients periodically check
back with the server to determine if cached objects are still
valid. A state diagram for the operations is shown in Fig. 3.
Assuming that older files expire after a longer time and younger files after a shorter time, each file is associated with an expiration period and an expiration timestamp. The expiration timestamp is the sum of the file's last modified time and its expiration period.
By arranging the expiration timestamps in ascending order, a job list is formed. The list is checked periodically to find out which files have expired. For an expired file, a request with the file's last modified date in the "If-Modified-Since" header field is sent to the original server. This indicates that the server should return the requested document only if it has changed since the
specified date. Thus, if no new file is returned, the expiration period should be longer, and it is multiplied by a factor. If a file is returned, the expiration period should be shorter: the old file is replaced by the new one and the expiration period is divided by a factor. The expiration period for each file is estimated in this way and changes dynamically.
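A sketch of this adaptive scheme (the factor value and the clamping bounds are our assumptions; the paper does not give concrete values):

```python
def update_expiration(period, modified, factor=2.0,
                      min_period=60.0, max_period=7 * 24 * 3600.0):
    """Adapt a cached file's expiration period (in seconds) after an
    If-Modified-Since revalidation: lengthen it when the server reports
    the file unchanged, shorten it when a new copy is returned.
    The factor of 2 and the clamping bounds are our assumptions."""
    if modified:
        period /= factor   # a new file was returned: revalidate sooner
    else:
        period *= factor   # not modified: revalidate later
    # Clamp so the period neither collapses to zero nor grows unbounded.
    return max(min_period, min(period, max_period))
```

Frequently changing files thus converge toward short periods, while stable files are revalidated less and less often.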
Fig. 3. State diagram illustrating the dynamic changes of
expiration period of a file in the cache.
III. PERFORMANCE EVALUATION
We have built a prototype of the proposed system. To investigate its search efficiency and trustworthiness, we have conducted a series of tests with a simulator that synthesizes requests as well as other node activities. As shown in Fig. 4, the simulator keeps a list of node addresses and ports through which it can request web objects from the web caching system. It also keeps a list of URLs. Periodically, it randomly selects a URL and a node from which to request the web object, then acts as a common web browser. To create a dishonest node in the network, another program is implemented to forge cached files.
To evaluate the search efficiency, hit rate is measured in the experiment. A request is issued every 500 ms; hence the request rate in our experiment is 120 requests per minute. The variation of hit rate over time with different numbers of nodes is shown in Fig. 5. We also show the time to reach a 99% hit rate in Table 1.
Fig. 4. Experiment setup.
Fig. 5. Variation of hit rate against time for 4, 8, 12, and 16 nodes.
Table 1. Time needed to reach a 99% hit rate for different numbers of nodes in the network.
It can be seen that the hit rate generally increases and approaches 100% in each test. However, with fewer nodes, the hit rate approaches 100% in a shorter time. There are several causes for this. First, the number of requests reaching each node decreases when the number of nodes increases (as the number of requests generated by the simulator per minute is fixed); therefore, the number of cached objects in each node is smaller. Second, when the peer-to-peer network becomes larger, the node with the desired object may not be in the vicinity. Both factors contribute to a lower hit rate.
To investigate the effectiveness of the sampling algorithm, we then measure the disbelief level of the dishonest node in each network. The disbelief levels against time in each test are shown in Figs. 6-9. A dishonest node is identified if its disbelief exceeds a threshold, which depends on the application. It can be seen that the dishonest node is identified in a shorter interval if the network is smaller. Moreover, in a smaller network, the frequency of identifying the dishonest node is higher. The causes for these trends are similar to those for searching: the number of requests reaching each node decreases as the number of nodes increases. Thus, the frequency of file retrieval from the dishonest node decreases, leading to a slower increase in disbelief.
Number of nodes                     4    8    12    16
Time to reach 99% hit rate (min)    5    15   15    >24
Fig. 6. Disbelief level against time for 4 nodes (curves for Node0, Node1, Node2).
Fig. 7. Disbelief level against time for 8 nodes (curves for Node4, Node5, Node9).
Fig. 8. Disbelief level against time for 12 nodes (curves for Node9, Node10, Node12).
IV. CONCLUSION
In this paper, we have presented a trustable peer-to-peer web
caching system, in which peers in the network share their web
cache contents. To increase the trust-level of the system, we
propose to use a sampling technique to minimize the chance of
distributing fake web file copies among the peers. We further
introduce the concept of opinion to represent the trustworthiness
of individual peers. We have built a prototype of the proposed
system, and our experimental results demonstrate that it has fast
response time with low overhead, and can effectively identify
and block malicious peers.
Fig. 9. Disbelief level against time for 16 nodes (curves for Node0, Node1, Node5, Node7, Node8, Node9, Node15).
ACKNOWLEDGEMENT
The authors would like to thank C. Ng and P. Choi for their effort in building the prototype.
REFERENCES

[1] A. Luotonen, Web Proxy Servers, Prentice Hall, Englewood Cliffs, New Jersey, 1998.
[2] D. Wessels and K. Claffy, "Internet Cache Protocol (v2)", RFC 2186, September 1997.
[3] D. Wessels and K. Claffy, "Application of Internet Cache Protocol (v2)", RFC 2187, September 1997.
[4] K. W. Ross, "Hash Routing for Collections of Shared Web Caches", IEEE Network, vol. 11, pp. 37-44, November/December 1997.
[5] T. Asaka, H. Miwa, and Y. Tanaka, "Distributed Web Caching Using Hash-based Query Caching Method", in Proceedings of IEEE International Conference on Control Applications, vol. 2, pp. 1620-1625, 1999.
[6] I. Clarke et al., "Protecting Free Expression Online with Freenet", IEEE Internet Computing, vol. 5, no. 1, pp. 40-49, 2002.
[7] S. Iyer, A. Rowstron, and P. Druschel, "Squirrel: A Decentralized Peer-to-peer Web Cache", in Proceedings of ACM Symposium on Principles of Distributed Computing, 2002.
[8] National Institute of Standards and Technology, "Secure Hash Standard", April 17, 1995.
[9] D. Eastlake 3rd and P. Jones, "US Secure Hash Algorithm 1 (SHA1)", RFC 3174, September 2001.
[10] D. A. Menascé, "Scalable P2P Search", IEEE Internet Computing, vol. 7, pp. 83-87, March/April 2003.
[11] R. Flenner, Java P2P Unleashed, Indianapolis: Sams, pp. 122-135, 2003.
[12] X. Li, M. R. Lyu, and J. Liu, "A Trust Model Based Routing Protocol for Secure Ad Hoc Networks", in Proceedings of IEEE Aerospace Conference, Big Sky, MT, March 6-13, 2004.
[13] J. Gwertzman and M. Seltzer, "World-Wide Web Cache Consistency", in Proceedings of the USENIX Annual Technical Conference, pp. 141-152, San Diego, CA, January 1996.
[14] J. Liu and J. Xu, "Proxy Caching for Media Streaming over the Internet," IEEE Communications, Feature Topic on Proxy Support for Streaming on the Internet, August 2004.
[15] J. Xu, J. Liu, B. Li, and X. Jia, "Caching and Prefetching for Web Content Distribution," IEEE Computing in Science and Engineering, Special Issue on Web Engineering, August 2004.