Performance Analysis of a Parallel Downloading Scheme from Mirror Sites Throughout the Internet Allen Miu, Eugene Shih 6.892 Class Project December 3, 1999

Performance Analysis of a Parallel Downloading Scheme from Mirror Sites Throughout the Internet

Allen Miu, Eugene Shih
6.892 Class Project
December 3, 1999

Overview
• Problem Statement
• Advantages/Disadvantages
• Operation of Paraloading
• Goals of Experiment
• Setup of Experiment
• Current Results
• Summary
• Questions

Problem Statement: Is “Paraloading” Good?

Paraloading is downloading a file from multiple mirror sites in parallel.

[Figure: Paraloader, Mirror A, Mirror B, Mirror C]

Advantages of Paraloading
• Performance is proportional to the realized aggregate bandwidth of the parallel connections
• Less prone to complete download failure than a single-connection download
• Facilitates dynamic load balancing among parallel connections
• Facilitates reliable, out-of-order delivery (similar to Netscape)

Disadvantages of Paraloading
• Can be overly aggressive
• Consumes more server resources
• Overhead costs for scheduling, maintaining buffers, and sending block request messages
• Only effective when mirror servers are available

Step 1: Obtain Mirror List

[Figure: Paraloader, Mirror List, Mirror A, Mirror B, Mirror C]

• Hard-coded
• DNS?

Step 2: Obtain File Length

[Figure: Paraloader, Mirror A, Mirror B, Mirror C]

Step 3: Send Block Requests

[Figure: Paraloader, Mirror A, Mirror B, Mirror C]

Step 4: Re-order

[Figure: Paraloader, Mirror A, Mirror B, Mirror C]
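The re-ordering step can be pictured with a small buffer that accepts blocks in whatever order the mirrors return them and releases them in file order. This is a minimal Python sketch, not the authors' Java implementation; the class name `ReorderBuffer` and its methods are hypothetical.

```python
class ReorderBuffer:
    """Collects blocks that arrive in any order and releases them in file order."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.pending = {}        # block index -> bytes, waiting for earlier blocks
        self.next_index = 0      # index of the next block to release
        self.output = bytearray()

    def add(self, index, data):
        """Store one block, then flush every block that is now contiguous."""
        if index < self.next_index or index in self.pending:
            return  # duplicate block, ignore it
        self.pending[index] = data
        while self.next_index in self.pending:
            self.output += self.pending.pop(self.next_index)
            self.next_index += 1

    def complete(self):
        """True once every block has been released in order."""
        return self.next_index == self.num_blocks


buf = ReorderBuffer(num_blocks=3)
buf.add(2, b"cc")   # held back: blocks 0 and 1 are still missing
buf.add(0, b"aa")   # releases block 0 only
buf.add(1, b"bb")   # releases block 1, then the held block 2
print(bytes(buf.output), buf.complete())
```

Blocks that arrive early stay in memory until the gap before them is filled, which is one source of the buffer-maintenance overhead listed among the disadvantages.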

Step 5: Send Next Request

[Figure: Paraloader, Mirror A, Mirror B, Mirror C]

Goals of Experiment
• Main goal: To compare the performance of serial and parallel downloading
• To verify the results of Rodriguez et al.
• To examine whether varying the degree of parallelism (the number of mirror servers used) affects performance
• To gain experience with paraloading and to find out what issues are involved in designing efficient paraloading systems

Experiment Setup
• Implemented a paraloader application in Java, using HTTP/1.1 (range requests and persistent connections)
• Files are downloaded at MIT from 3 different sets (kernel, mars, tucows) of 7 mirror servers
• Degree of parallelism examined: M = 1, 3, 5, 7
• Downloaded a 1MB and a 300KB file (S = 1MB, 300KB) at 1-hour intervals for 7 days
• Block Size = 32KB
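To make the block parameters concrete, the sketch below splits a file into 32KB blocks and formats the HTTP/1.1 `Range` header value for each one. It is illustrative Python rather than the authors' Java code; the helper names `block_ranges` and `range_header` are hypothetical, and it assumes 1KB = 1024 bytes and 1MB = 1024KB, which the slides do not state.

```python
def block_ranges(file_length, block_size):
    """Return inclusive (first_byte, last_byte) pairs that cover the file."""
    ranges = []
    for start in range(0, file_length, block_size):
        end = min(start + block_size, file_length) - 1
        ranges.append((start, end))
    return ranges


def range_header(first_byte, last_byte):
    """Format the value of an HTTP/1.1 Range header for one block."""
    return f"bytes={first_byte}-{last_byte}"


KB = 1024            # assumption: binary kilobytes
BLOCK = 32 * KB      # block size from the experiment setup

for size in (1024 * KB, 300 * KB):
    blocks = block_ranges(size, BLOCK)
    first, last = blocks[-1]
    print(size, len(blocks), range_header(first, last), last - first + 1)
```

Under these assumptions the 1MB file splits into 32 full blocks, while the 300KB file splits into 10 blocks, the last of which is only 12KB.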

Results
• Paraloading decreases download time relative to the average single-connection case
• Speedup is far from the optimal case (aggregate bandwidth)
  – Block request gaps result in wasted bandwidth
    • Gaps are proportional to RTT
  – Congestion at client? Possible but unlikely.
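One way to see why request gaps hurt is a simplified model in which each connection sits idle for one RTT between finishing a block and receiving the first byte of the next. The Python sketch below computes the fraction of time a connection spends transferring data under that assumption. The model, the function name `link_utilization`, and the example numbers are illustrative assumptions, not measurements from the experiment, and TCP effects such as slow start are ignored.

```python
def link_utilization(block_bytes, bandwidth_bytes_per_s, rtt_s):
    """Fraction of time spent transferring when each block costs one idle RTT."""
    transfer_s = block_bytes / bandwidth_bytes_per_s
    return transfer_s / (rtt_s + transfer_s)


# Hypothetical connection: 32KB blocks over a 100 KB/s path.
for rtt in (0.0, 0.05, 0.1, 0.2):
    print(rtt, round(link_utilization(32 * 1024, 100 * 1024, rtt), 3))
```

In this model the wasted share grows with RTT and shrinks with larger blocks, which matches the observation that gaps are proportional to RTT.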

[Figure: S = 1MB]

[Figure: S = 1MB]

[Figure: S = 763KB, B = 30, M = 4]

Acknowledgements
• Dave Anderson
• Dorothy Curtis
• Wendi Heinzelmann
• WIND Group

Questions

Summary of Contributions
• Implemented a paraloader
• Verified that paraloading indeed provides performance gain… sometimes
  – Increasing degree of parallelism improves overall performance
• Performance gains are not as good as those reported by Rodriguez et al.

Future Work
• Examine how block size affects performance gain
• Examine cost of paraloading
• Implement and test various optimization techniques
• Perform measurements at different client sites

Paraloading Will Not Be Effective In All Situations
• Clients should have enough “slack” bandwidth capacity to open more than one connection
• Parallel connections are bottleneck-disjoint
• Target data on mirror servers is consistent and static
• Security and authentication services are installed where appropriate
• Data transport is reliable
• Mirror locations are quickly and easily obtained

Step-by-step Process of the Block Scheduling Paraloading Scheme
1. Obtain a list of mirror sites
2. Open a connection to a mirror server and obtain file length
3. Divide file length into blocks
4. Send a block request to each open connection
5. Wait for a response
6. Send a new block request to the first connection that finished downloading a block
7. Loop back to 5 until all blocks are retrieved
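Steps 4 through 7 can be sketched as an event-driven simulation in which each mirror is reduced to a bandwidth and an RTT, and the next block always goes to whichever connection finishes first. This is an illustrative Python model, not the authors' Java paraloader: the function name `simulate_paraload`, the per-block cost of one RTT plus transfer time, and the example mirror parameters are all assumptions.

```python
import heapq


def simulate_paraload(num_blocks, block_bytes, mirrors):
    """Simulate block scheduling over mirrors given as (bytes_per_s, rtt_s).

    Returns (finish_time_s, blocks_served_per_mirror).
    """
    def block_time(m):
        bandwidth, rtt = mirrors[m]
        return rtt + block_bytes / bandwidth   # assumed cost of one block

    next_block = 0
    served = [0] * len(mirrors)
    events = []  # min-heap of (completion_time, mirror_index)

    # Step 4: send one block request on each open connection.
    for m in range(len(mirrors)):
        if next_block < num_blocks:
            heapq.heappush(events, (block_time(m), m))
            next_block += 1

    finish = 0.0
    # Steps 5-7: wait for the earliest response, then reuse that connection.
    while events:
        finish, m = heapq.heappop(events)
        served[m] += 1
        if next_block < num_blocks:
            heapq.heappush(events, (finish + block_time(m), m))
            next_block += 1
    return finish, served


# Hypothetical mirrors: one four times faster than the other, zero RTT.
print(simulate_paraload(5, 4, [(4.0, 0.0), (1.0, 0.0)]))
print(simulate_paraload(5, 4, [(1.0, 0.0)]))
```

Because a new request is issued only when a connection becomes free, faster mirrors naturally end up serving more blocks, which is the dynamic load balancing listed among the advantages.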

Paraloading is Not a Well-studied Concept

• Byers et al. proposed using Tornado codes to facilitate paraloading.

• Rodriguez et al. proposed the block scheduling paraloading scheme that is used in our project