StorkCloud
Data Transfer Scheduling and Optimization as a Service
Presented by: Brandon Ross
Contents
1. Introduction
2. Components
3. Optimization
4. Conclusion
StorkCloud?
✓ Multi-protocol data transfer scheduler
✓ Remote metadata retrieval and caching service
✓ Dynamic, protocol-agnostic transfer optimization to improve speed
✓ All in the cloud – accessible through thin client GUIs and public REST API
Why StorkCloud?
✓ Storage and computation are both in the cloud – cloud transfer is the missing link
✓ Data transfer is usually inconvenient
  – Many different protocols
  – Many different applications
✓ Transferring large files requires monitoring whole process
  – StorkCloud is “fire-and-forget”
Why optimize?
✓ Transfers are usually suboptimal
  – Inadequacies of underlying protocols
  – End-system misconfiguration
✓ Not designed for high-speed networks
✓ Some applications can be specially configured, but it’s mostly guesswork
✓ Network environments can vary – dynamic optimization is important
✓ StorkCloud aims to solve these problems
Who?
✓ Scientists in data-oriented fields (astrophysics, genomics, climatology, biochemistry, etc.)
✓ Data centers looking to outsource replication and data placement
✓ Application developers who might want to offload data transfer tasks
✓ Anyone with a lot of data to move
Similar Work
✓ Only similar service we know of is Globus Online
  – Mature, popular service
    • Over 18.3 petabytes transferred!
  – Designed to support FTP and GridFTP
  – Statically optimized transfers
  – No prefetching/caching available for directory listings
Stork Data Scheduler
✓ Accepts file transfer jobs – source, destination, and other options
✓ Places job in queue for processing
✓ Passes jobs off to transfer module
✓ Job status can be queried by clients
✓ Currently first come, first served
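The scheduler behavior listed on this slide can be sketched as a small first-come, first-served queue. This is only an illustration under stated assumptions: the class name `FifoScheduler`, the `Job` fields, and the status strings are hypothetical and are not taken from the Stork implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Job:
    """A transfer job: source, destination, and other options (hypothetical fields)."""
    job_id: int
    source: str
    destination: str
    options: dict = field(default_factory=dict)
    status: str = "queued"


class FifoScheduler:
    """First come, first served scheduler that hands jobs to a transfer callable."""

    def __init__(self, transfer_module):
        self._queue = deque()          # jobs waiting for processing, oldest first
        self._jobs = {}                # every job ever submitted, by id
        self._ids = count(1)
        self._transfer = transfer_module

    def submit(self, source, destination, **options):
        job = Job(next(self._ids), source, destination, options)
        self._jobs[job.job_id] = job
        self._queue.append(job)
        return job.job_id

    def status(self, job_id):
        """Clients can query job status at any time."""
        return self._jobs[job_id].status

    def run_next(self):
        """Pass the oldest queued job off to the transfer module."""
        if not self._queue:
            return None
        job = self._queue.popleft()
        job.status = "processing"
        try:
            self._transfer(job)
            job.status = "complete"
        except Exception:
            job.status = "failed"
        return job.job_id
```

A client would call `submit(...)`, keep the returned id, and poll `status(job_id)` while the server drains the queue with `run_next()`.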
Directory Listing Service
✓ Conceptually: unified metadata interface to many file systems
✓ Retrieves file and directory metadata from remote file systems
✓ Returns results as JSON
✓ Uses caching and prefetching to improve responsiveness
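A rough sketch of how caching and prefetching might work for directory listings follows. Everything here is an assumption made for illustration: the `ListingCache` class, the `fetch` callable standing in for a remote file system, and the JSON field names are hypothetical, and a real service would likely prefetch in the background and expire stale entries.

```python
import json


class ListingCache:
    """Caches directory listings and prefetches subdirectories one level ahead."""

    def __init__(self, fetch, prefetch_depth=1):
        # fetch: callable(path) -> list of {"name": str, "dir": bool}
        self._fetch = fetch
        self._cache = {}
        self._depth = prefetch_depth

    def list(self, path):
        """Return the listing for `path` as a JSON string."""
        entries = self._load(path)
        self._prefetch(path, entries, self._depth)
        return json.dumps({"path": path, "files": entries})

    def _load(self, path):
        # Only contact the remote system on a cache miss.
        if path not in self._cache:
            self._cache[path] = self._fetch(path)
        return self._cache[path]

    def _prefetch(self, path, entries, depth):
        # Warm the cache for subdirectories the user is likely to open next.
        if depth <= 0:
            return
        for entry in entries:
            if entry["dir"]:
                child = path.rstrip("/") + "/" + entry["name"]
                self._prefetch(child, self._load(child), depth - 1)
```

After one call to `list("/")`, a later request for a subdirectory of `/` is answered from the cache without another remote round trip.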
DLS Performance
Transfer Modules
✓ Pluggable transfer modules perform transfers for specific protocols
✓ Communicate progress and messages back to scheduler
✓ Either Java bytecode or external executable – can be any language!
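The pluggable-module idea can be sketched as a small interface plus a registry keyed by URI scheme. This is a sketch, not the Stork plugin API: `TransferModule`, `ModuleRegistry`, the `demo` protocol, and the message dictionaries are all hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class TransferModule(ABC):
    """Interface a protocol-specific module would implement (hypothetical)."""

    protocols = ()  # URI schemes this module can handle

    @abstractmethod
    def transfer(self, source, destination, report):
        """Perform the transfer, calling report(message) with progress updates."""


class DemoModule(TransferModule):
    """Stand-in module that moves no data and only emits progress messages."""

    protocols = ("demo",)

    def transfer(self, source, destination, report):
        for fraction in (0.5, 1.0):
            report({"progress": fraction})
        report({"message": "complete"})


class ModuleRegistry:
    """Maps a URI scheme to the module that handles it."""

    def __init__(self):
        self._by_protocol = {}

    def register(self, module):
        for protocol in module.protocols:
            self._by_protocol[protocol] = module

    def module_for(self, uri):
        return self._by_protocol[urlparse(uri).scheme]
```

The scheduler would look up a module with `module_for(job.source)` and pass its own callback as `report`, which is how progress and messages flow back.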
Client Interfaces
✓ Thin clients communicate with server REST API using JSON
  – Starting or canceling transfers, or querying transfer status
  – Browsing remote directories
  – Initializing credentials (e.g. GSI proxies)
✓ Currently have Android and web applications, and command line tools
✓ Our GUI applications can browse remote files and check transfer progress
Optimization in StorkCloud
✓ Each pluggable optimizer is an implementation of a different algorithm
✓ Each targets a set of file transfer parameters
✓ Feedback loop: optimizer and transfer module (TM) work together to optimize transfer
✓ Optimizers are protocol-agnostic – only care about whether transfers support targeted parameters
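The feedback loop between an optimizer and a transfer module can be sketched as follows. The doubling hill-climb used here is a deliberately simple stand-in, not one of the StorkCloud algorithms, and the names `HillClimbOptimizer`, `optimize`, `supported`, and `sample` are hypothetical.

```python
class HillClimbOptimizer:
    """Doubles parallelism until measured throughput stops improving."""

    targets = {"parallelism"}  # the transfer parameters this optimizer tunes

    def __init__(self, start=1, limit=32):
        self.value = start
        self.limit = limit
        self.best = None       # (value, throughput) of the best sample so far
        self.done = False

    def propose(self):
        return {"parallelism": self.value}

    def report(self, throughput):
        """Feedback from the transfer module for the last proposal."""
        if self.best is not None and throughput <= self.best[1]:
            self.value = self.best[0]   # fall back to the best setting seen
            self.done = True
            return
        self.best = (self.value, throughput)
        if self.value * 2 > self.limit:
            self.done = True
        else:
            self.value *= 2


def optimize(optimizer, supported, sample):
    """Run the loop; `sample(params)` transfers a piece and returns throughput."""
    # Protocol-agnostic check: only the supported parameter names matter.
    if not optimizer.targets <= supported:
        raise ValueError("transfer module does not support the targeted parameters")
    while not optimizer.done:
        optimizer.report(sample(optimizer.propose()))
    return optimizer.propose()
```

Note that `optimize` never inspects the protocol itself, only the set of parameter names the transfer module says it supports.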
Parameters and Techniques
✓ Pipelining – “queuing up” transfer commands at a remote system
✓ Parallelism – transferring file data over multiple connections
✓ Concurrency – transferring multiple files at a time
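Two of these parameters can be illustrated with a few lines of code: parallelism as splitting one file into byte ranges, one per connection, and concurrency as limiting how many files are in flight at once. The helper names `split_ranges` and `batches` are hypothetical, and pipelining is not modeled because it depends on the protocol's command channel.

```python
def split_ranges(size, parallelism):
    """Parallelism: divide one file of `size` bytes into half-open byte
    ranges (start, end), one per connection, as evenly as possible."""
    base, extra = divmod(size, parallelism)
    ranges, start = [], 0
    for i in range(parallelism):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def batches(files, concurrency):
    """Concurrency: group files so that at most `concurrency` files are
    transferred at the same time."""
    return [files[i:i + concurrency] for i in range(0, len(files), concurrency)]
```

For example, `split_ranges(10, 3)` gives three contiguous ranges that together cover bytes 0 through 9.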
Parameters Visualized
Algorithms
✓ Optimal parallelism prediction:
  – Samples points, performs regression analysis to predict optimal parallelism
  – 2nd order and c-order analysis
✓ Parallelism-Concurrency-Pipelining
  – Uses historical database and clustering
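The sample-then-predict idea can be sketched with a second-order model. The slides do not state the model, so the form used here, Th(n) = n / sqrt(a·n² + b·n + c), is an assumption made for illustration. With three sampled (parallelism, throughput) points, the coefficients follow from a 3×3 linear system, since n²/Th² = a·n² + b·n + c, and the predicted optimum is n* = −2c/b when b < 0 and c > 0. All function names are hypothetical.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_parallelism_model(samples):
    """samples: three (parallelism, throughput) pairs. Returns (a, b, c) for the
    assumed model Th(n) = n / sqrt(a*n^2 + b*n + c)."""
    rows = [[n * n, n, 1.0] for n, _ in samples]
    rhs = [(n / th) ** 2 for n, th in samples]
    return tuple(solve3(rows, rhs))


def predict_optimal_parallelism(samples):
    """Return the parallelism that maximizes the fitted model, or None when the
    fitted curve has no interior maximum."""
    _, b, c = fit_parallelism_model(samples)
    if b >= 0 or c <= 0:
        return None
    return -2.0 * c / b
```

In practice the prediction would be rounded to an integer number of streams, and more than three samples would call for a least-squares fit rather than an exact solve.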
Algorithms
✓ Single/Multi Chunk Concurrency
  – Newest algorithms, designed for multi-file transfers with mixed sizes
  – Partition file sets into “chunks” based on file size and transfer concurrently on multiple channels
  – Each channel is configured heuristically
  – SCC: transfer chunks one at a time, split across all concurrent channels
  – MCC: each chunk gets a dedicated channel; reduces effect of small files
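The difference between SCC and MCC can be sketched as two ways of assigning size-based chunks to channels. The size boundaries, the round-robin split, and the function names are assumptions for illustration; the heuristic per-channel configuration mentioned on the slide is not modeled.

```python
def partition_into_chunks(files, boundaries):
    """Group files into chunks by size.

    files: dict mapping file name -> size in bytes.
    boundaries: ascending size thresholds (hypothetical values chosen by the caller).
    Returns non-empty chunks ordered from smallest files to largest.
    """
    chunks = [[] for _ in range(len(boundaries) + 1)]
    for name, size in sorted(files.items()):
        index = sum(size >= b for b in boundaries)
        chunks[index].append(name)
    return [chunk for chunk in chunks if chunk]


def scc_plan(chunks, channels):
    """SCC: handle chunks one at a time; within each step, the chunk's files
    are dealt round-robin across all concurrent channels."""
    return [[chunk[i::channels] for i in range(channels)] for chunk in chunks]


def mcc_plan(chunks):
    """MCC: every chunk gets a dedicated channel and all channels run at once,
    so small files do not hold up the large ones."""
    return {f"channel-{i}": chunk for i, chunk in enumerate(chunks)}
```

With SCC the result is a sequence of steps, each using every channel; with MCC it is a single mapping from channel to chunk.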
SCC/MCC Performance
[Figure: throughput (Mbps) versus concurrency (1, 2, 4, 6, 8, 10). Panel (a), XSEDE: Globus-Online, SCC, MCC, SCC_buffer, MCC_buffer. Panel (b), LONI: SCC, MCC.]
Effects of Parameters
Future Work
✓ Additional scheduling algorithms and priority
✓ Date-based scheduling and routine jobs
✓ Direct file upload/download through client interfaces – currently only 3rd party transfers
✓ SSL/TLS for secure communications (HTTPS)
✓ Additional protocols (SFTP, HTTP, AFP, etc.)
✓ Temporary file parking – storage on server in case of destination issues
✓ Historical performance database
The End
Thank you! Any questions?