175 high performance p2p web caching
Post on 03-Apr-2018
-
7/28/2019 175 High Performance P2P Web Caching
High Performance P2P Web Caching
Erik Garrison, Jared Friedman
CS264 Presentation, May 2, 2006
-
SETI@Home
Basic idea: people donate computer time to look for aliens
Delivered more than 9 million CPU-years
Guinness Book of World Records: largest computation ever
Many other successful projects (BOINC, Google Compute)
The point: many people are willing to donate computer resources for a good cause
-
Wikipedia
About 200 servers are required to keep the site live
Hosting and hardware costs exceed $1M per year
All revenue comes from donations
Hard to make ends meet
Other not-for-profit websites are in a similar situation
-
HelpWikipedia@Home
What if people could donate idle computer resources to help host not-for-profit websites?
They probably would!
This is the goal of our project
-
Prior Work
This doesn't exist, but some things are similar:
Content Distribution Networks (Akamai): distributed web hosting for big companies
CoralCDN/CoDeeN: P2P web caching like our idea, but with a very different design
Both have some problems
-
Akamai, the opportunity
Internet traffic is 'bursty'
Expensive to build infrastructure to handle flash crowds
International audience, local servers: sites run slowly in other countries
-
Akamai, how it works
Akamai put >10,000 servers around the globe
Companies subscribe as Akamai clients
Client content (mostly images and other media) is cached on Akamai's servers
Tricks with DNS make viewers download content from nearby Akamai servers
Result: website runs fast everywhere, no worries about flash crowds
But VERY expensive!
-
CoralCDN
P2P web caching
Probably the closest system to our goal
Currently in late-stage testing on PlanetLab
Uses an overlay and a 'distributed sloppy hash table'
Very easy to use: just append '.nyud.net' to a URL's hostname and Coral handles it
Unfortunately ...
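The "just append '.nyud.net'" usage can be sketched as a URL rewrite. This is a minimal illustration assuming the suffix goes on the hostname (as in http://example.com.nyud.net/page) and that any explicit port is kept; `coralize` is a hypothetical helper, not part of Coral itself.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix named on the slide


def coralize(url: str) -> str:
    """Route a URL through Coral by appending CORAL_SUFFIX to its hostname."""
    parts = urlsplit(url)
    host = parts.hostname  # lowercased; any user:pass@ prefix is dropped
    if host is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    if host.endswith(CORAL_SUFFIX):
        return url  # already Coralized; leave it alone
    netloc = host + CORAL_SUFFIX
    if parts.port is not None:
        netloc += f":{parts.port}"  # keep an explicit port if one was given
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The same one-line rewrite applied to a site's internal links is what lets any site push its load onto Coral.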
-
Coral: Problems
Currently very slow
This might improve in later versions, or it might be due to the overlay structure
Security: volunteer nodes can respond with fake data
Any site can use Coral to help reduce load: just append .nyud.net to their internal links
Decentralization makes optimization hard (more on this later)
-
Our Design Goals
Fast: Akamai-level performance
Secure: pages served are always genuine
Fast updates possible
Must greatly reduce demands on the main site, but this cannot compromise the first three goals
-
Our Design
Node/supernode structure: take advantage of extremely heterogeneous performance characteristics
Custom DNS server redirects incoming requests to a nearby supernode
Supernode forwards the request to a nearby ordinary node
Node replies to the user
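The "nearby supernode" step can be sketched as a nearest-neighbor lookup. This assumes the DNS server already knows the client's approximate coordinates (for example from a geo-IP lookup) and uses great-circle distance as its proxy for "nearby"; a real deployment would more likely measure network latency. The `SupernodeLocation` type, names, and coordinates are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SupernodeLocation:
    name: str   # hypothetical supernode identifier
    lat: float  # degrees
    lon: float  # degrees


EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points on Earth, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def resolve(client_lat, client_lon, supernodes):
    """What the custom DNS server would answer: the closest supernode."""
    return min(supernodes,
               key=lambda s: great_circle_km(client_lat, client_lon, s.lat, s.lon))
```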
-
Our Design
User goes to wikipedia.org
DNS server resolves wikipedia.org to a supernode
Supernode forwards the request to an ordinary node that has the requested document
Node retrieves the document and sends it to the user
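The supernode's part of this flow amounts to keeping an index from document to the nodes that hold it. The sketch below models that index; the miss policy (fetch once from the central site, then cache on one node chosen round-robin) is an assumption, since the slides do not say what happens when no node has the document.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Ordinary volunteer node: holds cached pages and serves them."""
    name: str
    store: dict = field(default_factory=dict)  # path -> page body


@dataclass
class Supernode:
    """Knows which of its nodes hold which documents."""
    nodes: dict                                # name -> Node
    index: dict = field(default_factory=dict)  # path -> [node names]

    def handle(self, path, fetch_from_origin):
        """Forward one request to a node holding `path`; return (node name, body)."""
        holders = self.index.get(path)
        if not holders:
            # Assumed miss policy: fetch once from the central site and
            # cache the page on one node, chosen round-robin.
            name = sorted(self.nodes)[len(self.index) % len(self.nodes)]
            self.nodes[name].store[path] = fetch_from_origin(path)
            holders = self.index[path] = [name]
        node = self.nodes[holders[0]]
        return node.name, node.store[path]
```

Under this policy the central site sees a document once; every later request is answered by a volunteer node.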
-
Performance
Requests are answered in only 2 hops
DNS server resolves to a geographically close supernode
Supernode avoids sending requests to slow or overloaded nodes
All parts of a page (e.g., HTML and images) should be served by a single node
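The last two points can be combined in one routing rule. This sketch assumes the supernode has a recent latency figure and an in-flight request count per node, skips nodes at capacity, picks the fastest of the rest, and sends every object of a page to that one node. `NodeStats`, its fields, and the fall-back-to-central-server behavior are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeStats:
    name: str
    latency_ms: float  # recently measured response time (hypothetical metric)
    active: int        # requests currently in progress
    capacity: int      # concurrent requests the node is willing to serve


def choose_node(candidates: list) -> Optional[NodeStats]:
    """Fastest candidate that still has spare capacity, or None if all are full."""
    available = [n for n in candidates if n.active < n.capacity]
    return min(available, key=lambda n: n.latency_ms, default=None)


def route_page(page_objects: list, candidates: list) -> dict:
    """Assign every object of one page (HTML and images) to the same node.

    Returns {} when every candidate is overloaded; the caller would then
    fall back to the central server (an assumption, not from the slides).
    """
    node = choose_node(candidates)
    if node is None:
        return {}
    node.active += 1  # count the page as one more in-flight request
    return {obj: node.name for obj in page_objects}
```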
-
Security
Have to check nodes' accuracy
First line of defense: encrypt local content
May delay attacks, but won't stop them
-
Security
More serious defense: let users check the volunteer nodes!
Add a JavaScript wrapper to the website that requests the pages using AJAX
With some probability, the AJAX script computes the MD5 hash of the page it received and sends it to a trusted central node
Central node kicks out nodes that frequently get invalid MD5 sums reported against them
Offloads processing not just to nodes but to users, with zero install
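The spot-check scheme can be modeled end to end. The sketch below is Python standing in for both sides: `client_check` plays the role of the in-page AJAX script, and `CentralVerifier` the trusted central node. The sampling probability `p`, the ban threshold `max_bad_fraction`, and the minimum sample size `min_reports` are hypothetical parameters. MD5 follows the slide; `hashlib.sha256` would be a drop-in swap, since MD5 is no longer collision-resistant.

```python
import hashlib
import random
from collections import defaultdict


class CentralVerifier:
    """Trusted central node: knows the genuine hash of each page, tallies reports."""

    def __init__(self, genuine_pages, max_bad_fraction=0.1, min_reports=20):
        self.genuine = {path: hashlib.md5(body).hexdigest()
                        for path, body in genuine_pages.items()}
        self.max_bad_fraction = max_bad_fraction  # hypothetical ban threshold
        self.min_reports = min_reports            # don't judge on tiny samples
        self.reports = defaultdict(int)
        self.bad = defaultdict(int)
        self.banned = set()

    def report(self, node, path, digest):
        """Record one user's hash of a page served by `node`."""
        self.reports[node] += 1
        if digest != self.genuine.get(path):
            self.bad[node] += 1
        enough = self.reports[node] >= self.min_reports
        if enough and self.bad[node] / self.reports[node] > self.max_bad_fraction:
            self.banned.add(node)  # "kick out" the node


def client_check(node, path, body, verifier, p=0.05, rng=random):
    """Models the in-page script: with probability p, hash and report the page."""
    if rng.random() < p:
        verifier.report(node, path, hashlib.md5(body).hexdigest())
```

Because only a fraction `p` of page views is checked, a node serving fake data is caught after roughly `min_reports / p` of its responses, at almost no cost to any single user.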
-
A Tricky Part
Supernodes get requests and have to decide which node should answer which requests
Have to load-balance nodes: no overloading
Popular documents should be replicated across many nodes
But don't want to replicate unpopular documents much: conserve storage space
Lots of conflicting goals!
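One way to make the replication trade-off concrete is a rule like the following, assumed here rather than taken from the slides: give each document just enough copies that no single node has to serve more than `per_node_rate` of its requests, cap the count at `max_replicas` to bound storage, and never go below one copy. The request rates are hypothetical inputs.

```python
import math


def replica_counts(request_rates: dict, per_node_rate: float, max_replicas: int) -> dict:
    """How many copies of each document to keep.

    request_rates: document -> expected requests per second
    per_node_rate: requests per second one node can absorb for a document
    Popular documents get enough copies to spread their load; unpopular
    ones keep a single copy, which conserves storage.
    """
    counts = {}
    for doc, rate in request_rates.items():
        needed = math.ceil(rate / per_node_rate)
        counts[doc] = max(1, min(max_replicas, needed))
    return counts
```

The cap is where the goals collide: a document that needs more than `max_replicas` copies will overload its holders unless storage is given up elsewhere.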
-
On the plus side...
Unlike Coral & CoDeeN, supernodes know a lot of nodes (maybe 100-1000?)
They can track the performance characteristics of each node
Make object placement decisions from a central point
Lots of opportunity to make really intelligent decisions: better use of resources, higher total system capacity, faster response times
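Tracking each node's performance from the supernode could be as simple as an exponentially weighted moving average (EWMA) of its response times, which follows recent behavior without storing a history. The `NodeTracker` class and the smoothing factor `alpha` are hypothetical choices for illustration.

```python
class NodeTracker:
    """Per-node exponentially weighted moving average of response time."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha  # weight given to the newest sample (hypothetical)
        self.avg_ms = {}    # node -> smoothed response time in ms

    def record(self, node: str, response_ms: float) -> None:
        """Fold one observed response time into the node's running average."""
        prev = self.avg_ms.get(node)
        if prev is None:
            self.avg_ms[node] = response_ms  # first sample seeds the average
        else:
            self.avg_ms[node] = (1 - self.alpha) * prev + self.alpha * response_ms

    def fastest(self, k: int) -> list:
        """The k nodes with the lowest smoothed response time."""
        return sorted(self.avg_ms, key=self.avg_ms.get)[:k]
```

A larger `alpha` reacts faster to a node that suddenly slows down; a smaller one is less swayed by a single slow response.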
-
Object Placement Problem
This kind of problem is known as an object placement problem: what nodes do we put what files on?
Also related to the request routing problem: given the files currently on the nodes, what node do we send this particular request to?
These problems are basically unsolved for our scenario
Analytical solutions exist only for very simplified, somewhat different cases
We suspect a useful analytic solution is impossible here
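For a feel of what an object placement heuristic looks like, here is one simple greedy rule, offered only as an illustration (the slides do not say which heuristic the authors used): place the documents with the most requested copies first, and send each copy to the roomiest node that can still fit it, never putting two copies on one node. `doc_sizes`, `replicas`, and `node_free_mb` are hypothetical inputs.

```python
def greedy_place(doc_sizes: dict, replicas: dict, node_free_mb: dict) -> dict:
    """Greedy object placement sketch.

    doc_sizes:    document -> size in MB
    replicas:     document -> number of copies wanted
    node_free_mb: node -> free storage in MB
    Returns document -> list of distinct nodes holding a copy.
    """
    free = dict(node_free_mb)  # don't mutate the caller's dict
    placement = {}
    # Most-replicated (i.e. most popular) documents get first pick of space.
    for doc in sorted(replicas, key=replicas.get, reverse=True):
        size = doc_sizes[doc]
        # Nodes with room for this document, roomiest first, so load spreads out.
        fits = sorted((n for n in free if free[n] >= size),
                      key=lambda n: free[n], reverse=True)
        chosen = fits[:replicas[doc]]  # distinct nodes, possibly fewer than asked
        for n in chosen:
            free[n] -= size
        placement[doc] = chosen
    return placement
```

In the test case below, document B asks for two copies but only one node still has room: a small instance of the conflicting goals from the previous slides.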
-
Simulation
Too hard to solve analytically, so do a simulation
Goal is to explore different object placement algorithms under realistic scenarios
Also want to model the performance of the whole system:
What cache hit ratios can we get?
How does the number/quality of peers affect cache hit ratios?
How is user latency affected?
Built a pretty involved simulation in Erlang
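A toy version of the cache-hit-ratio question, in Python and far simpler than the authors' Erlang simulator, fits in a few lines. Its assumptions: requests follow a Zipf popularity distribution with exponent `s`, the volunteers' pooled storage acts as one LRU cache of `cache_slots` documents, and every miss costs one fetch from the central server.

```python
import random
from collections import OrderedDict


def zipf_weights(n_docs: int, s: float = 1.0) -> list:
    """Unnormalized Zipf weights: the document of rank r gets weight 1 / r**s."""
    return [1.0 / (rank ** s) for rank in range(1, n_docs + 1)]


def simulate_hit_ratio(n_docs: int, n_requests: int, cache_slots: int,
                       s: float = 1.0, seed: int = 0) -> float:
    """Fraction of requests served by peer caches instead of the central server."""
    rng = random.Random(seed)
    requests = rng.choices(range(n_docs), weights=zipf_weights(n_docs, s), k=n_requests)
    cache = OrderedDict()  # doc -> True, least recently used first
    hits = 0
    for doc in requests:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)  # mark as most recently used
        else:
            cache[doc] = True  # miss: fetched from the central server
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict least recently used
    return hits / n_requests
```

Because LRU is a stack algorithm, growing `cache_slots` on the same request stream can only raise the hit ratio, which makes a handy sanity check on the harness before adding node churn or latency.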
-
Simulation Results
So far, encouraging!
Main results, using a heuristic object placement algorithm:
Can load-balance without creating hotspots up to about 90% of theoretical capacity
Documents rarely requested more than once from the central server
Close to theoretical optimum
-
Next Steps
Add more detail to the simulation: node churn, better internet topology
Explore update strategies
Obviously, an actual implementation would be nice, but it's not likely to happen this week
What do you think?