Performance Analysis of a Parallel Downloading Scheme from Mirror
Sites Throughout the Internet
Allen Miu, Eugene Shih
6.892 Class Project
December 3, 1999
Overview
• Problem Statement
• Advantages/Disadvantages
• Operation of Paraloading
• Goals of Experiment
• Setup of Experiment
• Current Results
• Summary
• Questions
Problem Statement: Is “Paraloading” Good?
Paraloading is downloading a file from multiple mirror sites in parallel.
[Figure: Paraloader, Mirror A, Mirror B, Mirror C]
Advantages of Paraloading
• Performance is proportional to the realized aggregate bandwidth of the parallel connections
• Less prone to complete download failures compared to the single connection download
• Facilitates dynamic load balancing among parallel connections
• Facilitates reliable, out-of-order delivery (similar to Netscape)
Disadvantages of Paraloading
• Can be overly aggressive
• Consumes more server resources
• Overhead costs for scheduling, maintaining buffers, and sending block request messages
• Only effective when mirror servers are available
Step 1: Obtain Mirror List
[Figure: Paraloader, Mirror List, Mirror A, Mirror B, Mirror C]
• Hard-coded
• DNS?
Step 2: Obtain File Length
[Figure: Paraloader, Mirror A, Mirror B, Mirror C]
Step 3: Send Block Requests
[Figure: Paraloader, Mirror A, Mirror B, Mirror C]
Step 4: Re-order
[Figure: Paraloader, Mirror A, Mirror B, Mirror C]
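The re-ordering step can be illustrated with a small buffer that holds blocks arriving ahead of their turn and releases them in file order. This is a minimal sketch, not the project's Java code; the class name `ReorderBuffer` and its `add` method are hypothetical.

```python
class ReorderBuffer:
    """Hold blocks that arrive early and release them in file order."""

    def __init__(self):
        self.pending = {}    # block index -> bytes, for blocks that arrived early
        self.next_index = 0  # index of the next block the output is waiting for

    def add(self, index, data):
        """Store one block and return the blocks that are now deliverable in order."""
        self.pending[index] = data
        ready = []
        while self.next_index in self.pending:
            ready.append(self.pending.pop(self.next_index))
            self.next_index += 1
        return ready


buf = ReorderBuffer()
first = buf.add(1, b"B")   # block 1 arrives before block 0, so nothing is released
second = buf.add(0, b"A")  # block 0 arrives, releasing blocks 0 and 1 together
third = buf.add(2, b"C")   # block 2 is next in order and is released at once
```

A dictionary keyed by block index keeps the buffer small when only a few blocks are outstanding, which matches one in-flight block per connection.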
Step 5: Send Next Request
[Figure: Paraloader, Mirror A, Mirror B, Mirror C]
Goals of Experiment
• Main goal: To compare the performance of serial and parallel downloading
• To verify the results of Rodriguez et al.
• To examine whether varying the degree of parallelism (the number of mirror servers used) affects performance
• To gain experience with paraloading and to find out what issues are involved in designing efficient paraloading systems
Experiment Setup
• Implemented a paraloader application in Java, using HTTP/1.1 (range requests and persistent connections)
• Files are downloaded at MIT from 3 different sets (kernel, mars, tucows) of 7 mirror servers
• Degree of parallelism examined: M = 1, 3, 5, 7
• Downloaded a 1MB and a 300KB file (S = 1MB, 300KB) at 1-hour intervals for 7 days
• Block size = 32KB
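To make the block layout concrete, the sketch below splits a file into 32KB blocks and formats each as an HTTP/1.1 `Range` header value. It assumes 1KB = 1024 bytes and 1MB = 1024 × 1024 bytes, which the slides do not state; the function names are hypothetical and this is not the project's Java code.

```python
def block_ranges(file_length, block_size=32 * 1024):
    """Return inclusive (first_byte, last_byte) pairs that cover the file."""
    ranges = []
    for start in range(0, file_length, block_size):
        # HTTP byte ranges are inclusive, so the last byte is one less than the end.
        end = min(start + block_size, file_length) - 1
        ranges.append((start, end))
    return ranges


def range_header(first_byte, last_byte):
    """Format one block as the value of an HTTP/1.1 Range request header."""
    return f"bytes={first_byte}-{last_byte}"


one_mb = block_ranges(1024 * 1024)  # assumed size of the 1MB file
small = block_ranges(300 * 1024)    # assumed size of the 300KB file
print(len(one_mb), range_header(*one_mb[0]))
print(len(small), range_header(*small[-1]))
```

Under these assumptions the 1MB file splits into 32 full blocks, while the 300KB file needs 10 blocks with a shorter final block.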
Results
• Paraloading decreases download time over the average single connection case
• Speedup is far from optimal case (aggregate bandwidth)
  – Block request gaps result in wasted bandwidth
    • Gaps are proportional to RTT
  – Congestion at client? Possible but unlikely.
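One way to see why request gaps cost bandwidth is a simple model in which a connection sits idle for about one RTT between finishing a block and receiving the first byte of the next. The sketch below computes the fraction of time such a connection spends transferring data. The model and the function name are assumptions for illustration, not measurements or code from the experiment.

```python
def connection_utilization(block_bytes, bandwidth_bytes_per_s, rtt_s):
    """Fraction of time a connection carries data when each block costs one idle RTT."""
    transfer_s = block_bytes / bandwidth_bytes_per_s
    return transfer_s / (transfer_s + rtt_s)


# Hypothetical link: 32KB blocks over a connection that sustains 100,000 bytes/s.
for rtt_s in (0.01, 0.05, 0.2):
    share = connection_utilization(32 * 1024, 100_000, rtt_s)
    print(f"RTT {rtt_s:.2f}s -> utilization {share:.2f}")
```

In this model the wasted share grows with RTT and shrinks with larger blocks, which is consistent with block size being a natural parameter to study.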
[Two figures: S = 1MB]
[Figure: S = 763KB, B = 30, M = 4]
Acknowledgements
• Dave Anderson
• Dorothy Curtis
• Wendi Heinzelmann
• WIND Group
Questions
Summary of Contributions
• Implemented a paraloader
• Verified that paraloading indeed provides performance gain… sometimes
  – Increasing degree of parallelism improves overall performance
• Performance gains are not as good as those reported by Rodriguez et al.
Future Work
• Examine how block size affects performance gain
• Examine cost of paraloading
• Implement and test various optimization techniques
• Perform measurements at different client sites
Paraloading Will Not Be Effective In All Situations
• Clients should have enough “slack” bandwidth capacity to open more than one connection
• Parallel connections are bottleneck-disjoint
• Target data on mirror servers is consistent and static
• Security and authentication services are installed where appropriate
• Data transport is reliable
• Mirror locations are quickly and easily obtained
Step-by-step Process of the Block Scheduling Paraloading Scheme
1. Obtain a list of mirror sites
2. Open a connection to a mirror server and obtain file length
3. Divide file length into blocks
4. Send a block request to each open connection
5. Wait for a response
6. Send a new block request to the first connection that finished downloading a block
7. Loop back to 5 until all blocks are retrieved
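The scheduling loop in steps 3 to 7 can be sketched as a small event-driven simulation. This is an illustration under stated assumptions, not the project's Java implementation: the mirror list and file length (steps 1 and 2) are passed in as inputs, each mirror is described by a hypothetical (bandwidth, RTT) pair, and each block is assumed to cost one RTT plus its transfer time.

```python
import heapq


def simulate_paraload(file_length, block_size, mirrors):
    """Simulate block scheduling over mirrors given as (bandwidth_bytes_per_s, rtt_s).

    Returns (total_time_s, blocks_served_per_mirror).
    """
    # Step 3: divide the file into blocks (the last block may be shorter).
    sizes = [min(block_size, file_length - start)
             for start in range(0, file_length, block_size)]
    next_block = 0
    served = [0] * len(mirrors)
    events = []  # min-heap of (finish_time, mirror_index)

    def request(mirror_index, now):
        """Send the next block request on one connection, if any blocks remain."""
        nonlocal next_block
        if next_block >= len(sizes):
            return
        bandwidth, rtt = mirrors[mirror_index]
        # Assumed cost model: one RTT for the request plus the transfer time.
        finish = now + rtt + sizes[next_block] / bandwidth
        heapq.heappush(events, (finish, mirror_index))
        served[mirror_index] += 1
        next_block += 1

    # Step 4: send one block request on each open connection.
    for i in range(len(mirrors)):
        request(i, 0.0)

    # Steps 5-7: wait for a response, then reuse the connection that finished first.
    now = 0.0
    while events:
        now, i = heapq.heappop(events)
        request(i, now)
    return now, served


# Hypothetical mirrors: fast and near, medium, slow and far.
mirrors = [(100_000, 0.05), (50_000, 0.10), (25_000, 0.20)]
total, served = simulate_paraload(1024 * 1024, 32 * 1024, mirrors)
print(f"{total:.2f}s, blocks per mirror: {served}")
```

Because a new request always goes to whichever connection finishes first, faster mirrors end up serving more blocks without any explicit bandwidth estimate, which is the dynamic load balancing listed among the advantages.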
Paraloading is Not a Well-studied Concept
• Byers et al. proposed using Tornado codes to facilitate paraloading.
• Rodriguez et al. proposed the block scheduling paraloading scheme that is used in our project