TRANSCRIPT
In the name of God, the Most Gracious, the Most Merciful

Stork data scheduler: mitigating the data bottleneck in e-Science

Hamdy Fadl
500512011
Dr. Esma YILDIRIM
Outline
- Introduction
- The motivation of the paper
- What is Stork
- Previous studies
- Efforts to achieve reliable and efficient data placement
- Why the Stork data scheduler
- Initial end-to-end implementation using Stork
- Pipelining
- Estimation of data movement speed
- The effect of connection time in data transfers
- Two unique features of Stork: request aggregation and throughput optimization
- Concurrency vs. parallelism
- Conclusion
Introduction
• The inadequacy of traditional distributed computing systems in dealing with complex data-handling problems in our new data-rich world has motivated this new paradigm.
• In traditional distributed computing, data is treated as a second-class resource, and data access as a side effect of computation.
• Data placement is typically an afterthought: it causes delays and is delegated to ad hoc scripts running with their own privileges.
• The DOE report says: 'It is essential that at all levels data placement tasks be treated in the same way computing tasks are treated.'
• Further research and development is still required to create schedulers and planners for storage space allocation and the transfer of data.
The motivation of the paper
• Enhancing the data management infrastructure, so that users can focus their attention on the information itself rather than on how to discover, access, and use it.
What is Stork?
• Stork is considered one of the very first examples of 'data-aware scheduling' and has been very actively used in many e-Science application areas, including:
– coastal hazard prediction
– storm surge modeling
– oil flow and reservoir uncertainty analysis
– digital sky imaging
– educational video processing and behavioral assessment
– numerical relativity and black hole collisions
– multi-scale computational fluid dynamics
Using Stork, users can transfer very large datasets via a single command.
Previous studies
• Several previous studies address data management for large-scale applications. However, the scheduling of data storage and networking resources, and the optimization of data transfer tasks, has remained an open problem.
Previous studies: optimal number of streams
• Hacker et al.'s model cannot predict congestion.
• Lu et al. need two throughput measurements with different stream numbers to predict the others.
• Others claim that three streams are sufficient to reach 90 per cent utilization.
Previous studies
• The studies on the optimal number of streams for data scheduling are limited and based on approximate theoretical models.
• None of the existing studies can accurately predict the optimal number of parallel streams for the best data throughput in a congested network.
Efforts to achieve reliable and efficient data placement
• GridFTP: the Grid file transfer protocol, a secure and reliable data transfer protocol especially developed for high-bandwidth WANs.
• RFT (reliable file transfer): allows byte streams to be transferred in a reliable manner.
• LDR (lightweight data replicator) and DRS (data replication service): replication services layered on top of GridFTP.
Why the Stork data scheduler?
Implements techniques for:
• scheduling
• queuing
• optimization of data placement jobs
Why the Stork data scheduler?
Provides:
• high reliability in data transfers, and
• a level of abstraction between the user applications and the underlying data transfer and storage resources, via a modular, uniform interface.
– Reliability
– Abstraction
Why the Stork data scheduler?
• The checkpointing, error recovery and retry mechanisms ensure the completion of tasks even in the case of unexpected failures.
• Multi-protocol support makes Stork a very powerful data transfer tool.
• This feature allows Stork not only to access and manage different data storage systems, but also to act as a fall-back mechanism when one of the protocols fails to transfer the desired data.
Why the Stork data scheduler?
• Optimizations such as concurrent transfers, parallel streaming, request aggregation and data fusion provide enhanced performance compared with other data transfer tools.
• Stork can interact with higher-level planners and workflow managers for the coordination of compute and data tasks.
• This allows users to schedule both CPU resources and storage resources asynchronously, as two parallel universes, overlapping computation and I/O.
Initial end-to-end implementation using Stork
• data subsystem (data management and scheduling)
• compute subsystem (compute management and scheduling)
Initial end-to-end implementation using Stork
•In cases where multiple workflows need to be executed on the same system, users may want to prioritize among multiple workflows or make other scheduling decisions. This is handled by the workflow scheduler at the highest level.
• An example for such a case would be hurricanes of different urgency levels arriving at the same time.
Pipelining
Generic 4-stage pipeline; the colored boxes represent instructions independent of each other.
Stork can queue different types of jobs and execute them concurrently while preserving the task sequence and dependencies.
How to use pipelining here?
A distributed workflow system can be viewed as a large pipeline consisting of many tasks divided into sub-stages, where the main bottleneck is remote data access/retrieval, owing to network latency and communication overhead.
How to use pipelining here?
• By pipelining, we can order and schedule data movement jobs in distributed systems independently of compute tasks, to exploit parallel execution while preserving data dependencies.
• The scientific workflows will include the necessary data-placement steps, such as stage-in and stage-out, as well as other important steps that support data movement, such as allocating and de-allocating storage space for the data, and reserving and releasing the network links.
Estimation of data movement speed
In order to estimate the speed at which data movement can take place, it is necessary to estimate the bandwidth capability at the source storage system, at the target storage system, and of the network in between.
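The estimate reduces to taking the minimum over the three components; a minimal sketch in Python (function and parameter names are illustrative, not from Stork):

```python
# Minimal sketch of the estimation rule above: the achievable end-to-end
# rate is bounded by the slowest of the source storage system, the target
# storage system and the network in between. Names are illustrative only.
def estimate_transfer_speed(source_storage_mbps, network_mbps, target_storage_mbps):
    """Return the achievable end-to-end data movement rate in Mbps."""
    return min(source_storage_mbps, network_mbps, target_storage_mbps)

# Example: a 1 Gbps link between an 800 Mbps source and a 400 Mbps target
# is limited by the target storage system.
print(estimate_transfer_speed(800.0, 1000.0, 400.0))  # 400.0
```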
The effect of connection time in data transfers
[Figure] Plus symbols, file size 10 MB (RTT 5.131 ms); crosses, file size 10 MB (RTT 0.548 ms); asterisks, file size 100 MB (RTT 5.131 ms); squares, file size 100 MB (RTT 0.548 ms); circles, file size 1 GB (RTT 0.548 ms).
The effect of connection time in data transfers
[Figure] Plus symbols, file size 10 MB (RTT 5.131 ms); crosses, file size 10 MB (RTT 0.548 ms); asterisks, file size 100 MB (RTT 5.131 ms); squares, file size 100 MB (RTT 0.548 ms); circles, file size 1 GB (RTT 0.548 ms); filled squares, file size 1 GB (RTT 5.131 ms).
Two unique features of Stork
• Aggregation of data transfer jobs, considering their source and destination addresses.
• An application-level throughput estimation and optimization service.
Stork feature: request aggregation
• The goal is to minimize the overhead of connection set-up and tear-down for each transfer.
• Multiple data placement jobs are combined and embedded into a single request, which increases the overall performance, especially for transfers of datasets with small file sizes.
– minimize the overhead
– increase the overall performance
How to aggregate
• According to file size and source/destination pairs, data placement requests are combined and processed as a single transfer job.
• Information about aggregated requests is stored in a transparent manner.
• A main job that includes multiple requests is defined virtually, and it is used to perform the transfer operation.
• Therefore, users can query and get status reports individually, without knowing that their requests have been aggregated and are being executed with others.
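The grouping step can be sketched as follows. This is a hypothetical illustration, not Stork's actual implementation; the request fields and the size threshold are assumptions:

```python
from collections import defaultdict

# Hypothetical sketch: combine pending data placement requests that share a
# source/destination pair (and a coarse file-size class) into one virtual
# "main job", while remembering the member requests so that each one can
# still be queried individually.
def aggregate(requests, small_threshold=100 * 1024 * 1024):
    groups = defaultdict(list)
    for req in requests:
        size_class = "small" if req["size"] < small_threshold else "large"
        groups[(req["src"], req["dst"], size_class)].append(req)
    return [
        {"src": src, "dst": dst, "members": members}
        for (src, dst, _size_class), members in groups.items()
    ]

reqs = [
    {"id": 1, "src": "hostA", "dst": "hostB", "size": 10_000_000},
    {"id": 2, "src": "hostA", "dst": "hostB", "size": 12_000_000},
    {"id": 3, "src": "hostA", "dst": "hostC", "size": 9_000_000},
]
jobs = aggregate(reqs)
print(len(jobs))  # 2: requests 1 and 2 share a source/destination pair
```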
How to aggregate
[Diagram] Four requests (Req1 to Req4), each of the same size and with the same source/destination pair, are combined into a single aggregated request.
How to aggregate
• Connection time is small, but it becomes quite important when there are hundreds of jobs to be scheduled.
• We have minimized the load imposed by t_setup across multiple transfer jobs. Instead of having separate set-up operations, such as
  t_1 = t_setup + t_transfer,1,   t_2 = t_setup + t_transfer,2,   ...,   t_n = t_setup + t_transfer,n,
• we aggregate multiple requests to improve the total transfer time:
  t = t_setup + t_transfer,1 + t_transfer,2 + ... + t_transfer,n.
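The saving is simple arithmetic; a small sketch of the timing model above (the example values are made up):

```python
# Sketch of the timing model above: n separate transfers each pay the
# set-up cost t_setup, while one aggregated transfer pays it only once.
def separate_total(t_setup, transfer_times):
    return sum(t_setup + t for t in transfer_times)

def aggregated_total(t_setup, transfer_times):
    return t_setup + sum(transfer_times)

# Example: four 2-second transfers with a 0.5 s connection set-up each.
transfers = [2.0, 2.0, 2.0, 2.0]
print(separate_total(0.5, transfers))    # 10.0 seconds
print(aggregated_total(0.5, transfers))  # 8.5 seconds
```

The gap grows with the number of files, which is why aggregation matters most for datasets of many small files.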
Benefits of aggregation
• Aggregating data placement jobs and combining data transfer requests into a single operation also improves the overall scheduler performance, by reducing the total number of requests that the data scheduler needs to execute separately.
– Instead of initiating each request one at a time, it is more beneficial to execute them as a single operation if they request data from the same storage site or use the same protocol to access the data.
Results: request aggregation and the overhead of connection time in data transfers
[Figure] fnum: number of data files sent in a single transfer operation.
• The test environment includes host machines from the Louisiana Optical Networking Initiative (LONI) network. Transfer operations are performed using the GridFTP protocol. The average round-trip delay times between test hosts are 5.131 and 0.548 ms.
• As latency increases, the effect of the connection-time overhead increases.
• We see better throughput results with aggregated requests.
• The main performance gain comes from decreasing the amount of protocol usage and reducing the number of independent network connections.
Note about aggregation results
• We have successfully applied job aggregation in the Stork scheduler, such that the total throughput is increased by reducing the number of transfer operations.
Stork feature: throughput optimization
• In data scheduling, the effective use of the available network throughput and the optimization of data transfer speed are crucial for end-to-end application performance.
How to do throughput optimization
• By opening parallel streams and setting the optimal parallel stream number specific to each transfer.
• The intelligent selection of this number is based on a novel mathematical model the authors have developed, which predicts the peak point of the throughput of parallel streams and the corresponding stream number for that value.
The throughput of n streams (Th_n) is calculated by the following equation:

  Th_n = n / sqrt(a n^2 + b n + c)

The unknown variables a, b and c are calculated from the throughput samplings of three different parallelism data points.
The throughput increases as the stream number is increased; it reaches its peak point either when congestion occurs or when the end-system capacities are reached.
Further increasing the stream number does not increase the throughput; instead, it causes a drop, owing to losses.
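A sketch of how the three-point fit could work, assuming the throughput of n streams follows Th_n = n / sqrt(a n^2 + b n + c), the model form used in the authors' related work on parallel streams. The peak is found here by a simple search over stream counts rather than Newton's method:

```python
import math

def fit_model(samples):
    """Fit a, b, c from three (n, throughput) samples, using the fact that
    a*n^2 + b*n + c = n^2 / Th_n^2 is linear in (a, b, c)."""
    rows = [(n * n, n, 1.0, n * n / (th * th)) for n, th in samples]

    def det3(m):  # determinant of a 3x3 matrix
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3([r[:3] for r in rows])
    solution = []
    for col in range(3):  # Cramer's rule, one column at a time
        m = [list(r[:3]) for r in rows]
        for i, r in enumerate(rows):
            m[i][col] = r[3]
        solution.append(det3(m) / d)
    return solution  # [a, b, c]

def predict(n, a, b, c):
    return n / math.sqrt(a * n * n + b * n + c)

def optimal_streams(a, b, c, max_n=64):
    """Pick the stream count with the highest predicted throughput."""
    return max(range(1, max_n + 1), key=lambda n: predict(n, a, b, c))

# Example with synthetic measurements generated from known coefficients:
true_a, true_b, true_c = 0.001, -0.05, 1.0
samples = [(n, predict(n, true_a, true_b, true_c)) for n in (1, 8, 16)]
a, b, c = fit_model(samples)
print(optimal_streams(a, b, c))  # 40 for these synthetic coefficients
```

Because the model is linear in (a, b, c) after the substitution, three samplings at different parallelism levels (here 1, 8 and 16 streams, as in the Newton 1_8_16 curve) determine it exactly.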
The characteristics of the throughput curve for local area network (LAN) and WAN transfers
LAN-LAN, Newton's method model.
[Figure] Red solid line, GridFTP; green dashed line, Newton 1_8_16; blue dashed line, Dinda 1_16.
The characteristics of the throughput curve for local area network (LAN) and WAN transfers
LAN–WAN Newton’s method model.
Notes about the characteristics of the throughput curve
• In both cases, the throughput has a peak point, which indicates the optimal parallel stream number, and it starts to drop for larger numbers of streams.
Concurrency
• In cases where network optimization is not sufficient and the end-system characteristics play a decisive role in limiting the achievable throughput, making concurrent requests for multiple data transfers can improve the total throughput.
• The Stork data scheduler also enables users to set a concurrency level in the server set-up, and allows multiple data transfer jobs to be handled concurrently.
– However, it is up to the user to set this property.
Concurrency vs. parallelism
• Parallelism: a single file is divided into multiple streams (for example, one TCP connection per stream).
• Concurrency: multiple files are transferred to multiple destinations at the same time.
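The distinction can be shown with a toy sketch; this is purely illustrative, and transfer() is a stand-in for a real data mover, not any Stork API:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy illustration only: transfer() is a stand-in for a real data mover
# and simply reports how many bytes it "moved".
def transfer(name, nbytes):
    return nbytes

# Parallelism: ONE file is split into several streams (chunks), each
# handled by its own worker (think: one TCP stream per chunk).
def parallel_transfer(name, size, streams):
    chunk = size // streams
    sizes = [chunk] * (streams - 1) + [size - chunk * (streams - 1)]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(lambda s: transfer(name, s), sizes))

# Concurrency: SEVERAL independent files are handled at the same time.
def concurrent_transfer(files, level):
    with ThreadPoolExecutor(max_workers=level) as pool:
        return sum(pool.map(lambda f: transfer(*f), files))

print(parallel_transfer("big.dat", 1000, 4))           # 1000
print(concurrent_transfer([("a", 10), ("b", 20)], 2))  # 30
```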
Results: concurrency versus parallelism in memory-to-memory transfers
Dsl-condor to Eric, 100 Mbps (figure 3a).
[Figure] Red solid line, con-level = 1; green dashed line, con-level = 2; blue dashed line, con-level = 4.
• In figure 3a, increasing the concurrency level does not provide a more significant improvement than increasing the parallel stream values.
• Thus, the throughput of four parallel streams is almost as good as the throughput of concurrency levels 2 and 4, because the throughput is bound by the network interface, while the CPU is fast enough not to present a bottleneck.
Results: concurrency versus parallelism in memory-to-memory transfers
Oliver to Poseidon, 10 Gbps (figure 3b).
[Figure] Red solid line, con-level = 1; green dashed line, con-level = 2; blue dashed line, con-level = 4.
• With parallel streams alone, they were able to achieve a throughput of 6 Gbps. However, when they increased the concurrency level, they were able to achieve around 8 Gbps, with a combination of a concurrency level of 2 and a parallel stream number of 32.
• This significant increase is due to the fact that a single CPU reaches its upper limit with a single request; through concurrency, multiple CPUs are used until the network limit is reached.
Concurrency for disk-to-disk transfers
• The effect of concurrency is much more significant for disk-to-disk transfers, where multiple parallel disks are available and managed by parallel file systems.
Results: concurrency versus parallelism in disk-to-disk transfers
[Figure 4a: total throughput; figure 4b: average throughput] Red solid line, con-level/data size = 1/12G; green dashed line, 2/6G; blue dashed line, 4/3G; magenta dashed line, 6/2G; cyan dash-dotted line, 8/1.5G; yellow dash-dotted line, 10/1.2G.
• Parallel streams improve the throughput for a single concurrency level, increasing it from 500 to 750 Mbps. However, owing to serial disk access, this improvement is limited. Only by increasing the concurrency level can we improve the total throughput.
• The throughput in figure 4a increases to 2 Gbps at a concurrency level of 4. Beyond that point, increasing the concurrency level causes the throughput to become unstable, with sudden ups and downs, though it stays around 2 Gbps. This ceiling is due to end-system CPU limitations; with more nodes, better throughput results could be seen.
• Looking at the figure for average throughput, the transfer speed per request falls as the concurrency level increases (figure 4b).
Results: concurrency versus parallel stream optimization in the Stork data scheduler
[Figure 5a: total throughput; figure 5b: average throughput; figure 5c: average stream number]
• The optimization service provides 750 Mbps throughput for a single concurrency level, and this increases up to 2 Gbps for a concurrency level of 4. The average throughput decreases more steeply after that level (figure 5b).
• The optimal parallel stream number also decreases, adapting to the concurrency level (figure 5c).
• From the experiments, it can be seen that an appropriately chosen concurrency level may improve the transfer throughput significantly.
Conclusion
• Their results show that the optimizations performed by the Stork data scheduler help to achieve much higher end-to-end throughput in data transfers than non-optimized approaches.
• They believe that the Stork data scheduler, and this new 'data-aware distributed computing' paradigm, will have an impact on all traditional compute- and data-intensive e-Science disciplines, as well as on new emerging computational areas in the arts, humanities, business and education that need to deal with increasingly large amounts of data.