Performance Analysis of a Parallel Downloading Scheme from Mirror
Sites Throughout the Internet
Allen Miu, Eugene Shih
6.892 Class Project
December 3, 1999
Overview
• Problem Statement
• Advantages/Disadvantages
• Operation of Paraloading
• Goals of Experiment
• Setup of Experiment
• Current Results
• Summary
• Questions
Problem Statement: Is “Paraloading” Good?
Paraloading is downloading a file from multiple mirror sites in parallel.
[Figure: a Paraloader downloading in parallel from Mirror A, Mirror B, and Mirror C]
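The idea above can be sketched in Java (the language the paraloader was written in). This is a minimal illustration, not the authors' code: it only divides a file of known length into the fixed-size byte ranges that would be requested across the mirrors; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: divide a file of fileLength bytes into fixed-size
// blocks, each described by an HTTP-style "start-end" byte range.
public class RangeSplit {
    static List<String> blockRanges(long fileLength, int blockSize) {
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            long end = Math.min(start + blockSize, fileLength) - 1;
            ranges.add(start + "-" + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 1 MB file with 32 KB blocks (the experiment's block size)
        // yields 32 ranges.
        List<String> r = blockRanges(1024 * 1024, 32 * 1024);
        System.out.println(r.size() + " blocks, first = " + r.get(0));
    }
}
```

Each range would then be fetched from whichever mirror connection is free, so a fast mirror naturally ends up serving more blocks than a slow one.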
Advantages of Paraloading
• Performance is proportional to the realized aggregate bandwidth of the parallel connections
• Less prone to complete download failures than a single-connection download
• Facilitates dynamic load balancing among parallel connections
• Facilitates reliable, out-of-order delivery (similar to Netscape)
Disadvantages of Paraloading
• Can be overly aggressive
• Consumes more server resources
• Overhead costs for scheduling, maintaining buffers, and sending block request messages
• Only effective when mirror servers are available
Goals of Experiment
• Main goal: To compare the performance of serial and parallel downloading
• To verify the results of Rodriguez et al.
• To examine whether varying the degree of parallelism (the number of mirror servers used) affects performance
• To gain experience with paraloading and to find out what issues are involved in designing efficient paraloading systems
Experiment Setup
• Implemented a paraloader application in Java, using HTTP/1.1 (range requests and persistent connections)
• Files were downloaded at MIT from 3 different sets (kernel, mars, tucows) of 7 mirror servers
• Degree of parallelism examined: M = 1, 3, 5, 7
• Downloaded a 1 MB and a 300 KB file (S = 1 MB, 300 KB) at 1-hour intervals for 7 days
• Block size = 32 KB
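An HTTP/1.1 range request asks a server for only a slice of a file, which is what lets the paraloader assign different blocks to different mirrors. The sketch below, under stated assumptions (the `rangeHeader` helper is hypothetical, not part of the authors' implementation), shows the header value for block i of the 32 KB block size used in the experiment:

```java
// Hypothetical helper: build the HTTP/1.1 Range header value for block
// number blockIndex, with blocks of blockSize bytes each.
public class BlockRequest {
    static String rangeHeader(int blockIndex, int blockSize) {
        long start = (long) blockIndex * blockSize;
        long end = start + blockSize - 1;       // ranges are inclusive
        return "bytes=" + start + "-" + end;
    }

    public static void main(String[] args) {
        // Header for block 2 with 32 KB blocks.
        System.out.println("Range: " + rangeHeader(2, 32 * 1024));
        // With java.net.HttpURLConnection one would then set:
        //   conn.setRequestProperty("Range", rangeHeader(i, 32 * 1024));
        // and reuse the persistent connection for subsequent blocks.
    }
}
```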
Results
• Paraloading decreases download time compared to the average single-connection case
• Speedup is far from the optimal case (aggregate bandwidth)
– Gaps between block requests result in wasted bandwidth
• Gaps are proportional to RTT
– Congestion at the client? Possible, but unlikely.
Summary of Contributions
• Implemented a paraloader
• Verified that paraloading indeed provides a performance gain… sometimes
– Increasing the degree of parallelism improves overall performance
• Performance gains are not as good as those reported by Rodriguez et al.
Future Work
• Examine how block size affects performance gain
• Examine the cost of paraloading
• Implement and test various optimization techniques
• Perform measurements at different client sites
Paraloading Will Not Be Effective In All Situations
• Clients should have enough “slack” bandwidth capacity to open more than one connection
• Parallel connections are bottleneck-disjoint
• Target data on mirror servers is consistent and static
• Security and authentication services are installed where appropriate
• Data transport is reliable
• Mirror locations are quickly and easily obtained
Step-by-step Process of the Block Scheduling Paraloading Scheme
1. Obtain a list of mirror sites
2. Open a connection to a mirror server and obtain the file length
3. Divide the file length into blocks
4. Send a block request to each open connection
5. Wait for a response
6. Send a new block request to the first connection that finished downloading a block
7. Loop back to 5 until all blocks are retrieved
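Steps 4-7 above can be sketched with Java's `ExecutorCompletionService`, whose `take()` returns whichever task finishes first, exactly the "first connection that finished" rule. This is a simulation, not the authors' implementation: the submitted task here just returns its block index, where a real paraloader would issue an HTTP range request over one of the M connections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation of the block-scheduling loop with m parallel
// "connections" fetching numBlocks blocks.
public class Scheduler {
    static List<Integer> paraload(int numBlocks, int m) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(m);
        CompletionService<Integer> done = new ExecutorCompletionService<>(pool);
        List<Integer> completed = new ArrayList<>();
        int next = 0;
        // Step 4: one outstanding block request per open connection.
        for (; next < Math.min(m, numBlocks); next++) {
            final int block = next;
            done.submit(() -> block);          // stand-in for a block fetch
        }
        // Steps 5-7: wait for any response, then reissue a request on the
        // connection that finished, until every block has been retrieved.
        while (completed.size() < numBlocks) {
            completed.add(done.take().get());  // first finished connection
            if (next < numBlocks) {
                final int block = next++;
                done.submit(() -> block);
            }
        }
        pool.shutdown();
        return completed;                      // blocks may arrive out of order
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Scheduler.paraload(8, 3));
    }
}
```

Because a finishing connection immediately gets the next block, faster mirrors serve more blocks, which is the dynamic load balancing the advantages slide refers to.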