TRANSCRIPT
HEC FSIO 2009
Parallel File System Simulator
To test Parallel File System (PFS) scheduling algorithms in a lightweight manner, we have developed a PFS simulator.
We combine a discrete event simulator with accurate disk modeling software to construct an agile parallel file system simulator.
The simulator captures enough detail for our tests while keeping the simulation time acceptable.
Existing PFS simulators
The IMPIOUS simulator, by E. Molina-Estolano et al. [1]: it does not model the metadata server, and it lacks scheduler modules, so scheduling algorithms are hard to model.
The simulator developed for the PVFS improvement paper by P. H. Carns et al. [2]: it simulates the network in excessive detail, which lengthens the simulation time, and it uses the real PVFS in simulation, which introduces too much detail and is not flexible.
Simulator Details
We use the discrete event simulation library OMNeT++ 4.0 to simulate the network. It models the network topology together with link bandwidth and delay. We use Disksim to simulate the data server disks.
Disksim accurately estimates the time for data transactions on the physical disks.
Disksim allows users to extract disk characteristics from real disks.
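A discrete event simulator advances simulated time by processing timestamped events in order. A minimal Python sketch of this principle (illustrative only, not OMNeT++ code):

```python
import heapq

# Minimal discrete-event loop: events are (time, seq, callback) tuples
# popped in timestamp order; the seq counter breaks ties deterministically.
events = []
seq = 0
log = []

def schedule(t, fn):
    global seq
    heapq.heappush(events, (t, seq, fn))
    seq += 1

def send(t):                      # model a message with a 0.5 s network delay
    log.append(("send", t))
    schedule(t + 0.5, recv)

def recv(t):
    log.append(("recv", t))

schedule(0.0, send)
while events:                     # advance simulated time event by event
    t, _, fn = heapq.heappop(events)
    fn(t)

print(log)                        # [('send', 0.0), ('recv', 0.5)]
```

OMNeT++ implements this loop (plus topology, modules, and messages) so that simulated time is decoupled from wall-clock time.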
Simulator Architecture
[Architecture diagram] Multiple clients, each driven by a trace file, issue requests through the simulated network (OMNeT++) to the data servers. A metadata server holds the striping strategy. Each data server runs a scheduling algorithm on top of a local FS with a disk queue; the disks are modeled by Disksim instances attached via socket connections.
Request flow: a new request arrives; the client queries the file layout; the request is sent to a data server; the server dispatches the job; the results are sent back.
Implementation of Scheduling Algorithms
We have implemented the FIFO, SFQ, Distributed SFQ [3], and MinSFQ [4] algorithms, among others.
/* SFQ algorithm, at each data server */
systime = 0
waitQ.initiate()
while (!simulation_end) {
    if reqArrive(), then:
        R = getReq()
        R.start_tag  = max { R.getPrevReq().finish_tag, systime }
        R.finish_tag = R.start_tag + R.cost / R.getFlow().weight
        pushReq(R, waitQ)
    if diskHasSlot(), then:
        R = popReqwithMinStartTag(waitQ)
        systime = R.start_tag
        dispatch(R)
}
The request from client arrives at the data server.
Disksim tells OMNeT++ that it still has a free slot.
OMNeT++ dispatches the request to Disksim.
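The per-server SFQ bookkeeping can be sketched in Python (hypothetical class and method names, not the simulator's code): the start tag is the maximum of the virtual time and the flow's previous finish tag, the finish tag adds cost divided by weight, and requests are dispatched in increasing start-tag order.

```python
import heapq

class SFQServer:
    """Start-time Fair Queueing bookkeeping for one data server (sketch)."""

    def __init__(self, weights):
        self.weights = weights                       # flow id -> weight
        self.last_finish = {f: 0.0 for f in weights} # finish tag per flow
        self.vtime = 0.0        # virtual time = start tag of request in service
        self.waitq = []         # heap of (start_tag, seq, flow)
        self.seq = 0            # tie-breaker for equal start tags

    def arrive(self, flow, cost):
        start = max(self.vtime, self.last_finish[flow])
        self.last_finish[flow] = start + cost / self.weights[flow]
        heapq.heappush(self.waitq, (start, self.seq, flow))
        self.seq += 1

    def dispatch(self):         # called when Disksim reports a free slot
        start, _, flow = heapq.heappop(self.waitq)
        self.vtime = start      # advance virtual time to the dispatched tag
        return flow

# Two backlogged flows with weights 2:1 and equal-cost requests: within the
# first six dispatches, flow "a" is served four times and "b" twice.
srv = SFQServer({"a": 2.0, "b": 1.0})
for _ in range(4):
    srv.arrive("a", 1.0)
    srv.arrive("b", 1.0)
order = [srv.dispatch() for _ in range(8)]
print(order)
```

With equal weights this degenerates to FIFO ordering by arrival; the weight only enters through the finish-tag increment.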
Simulation 1: Weighted Scheduling
Two groups of clients, 16 each; 4 data servers and 1 metadata server. Parallel Virtual File System (v2) is modeled. All files are striped across all servers with a stripe size of 256KB. The trace files are generated by IOR.
Each client performs a 100MB checkpoint-write operation. SFQ with different weight assignments has been simulated; the simulated weight ratios are 1:1, 2:1, and 10:1.
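When both groups stay backlogged, SFQ's expected bandwidth split follows directly from the weights: group i receives weight_i / (weight_1 + weight_2) of the aggregate throughput. A quick check for the three simulated ratios:

```python
# Expected steady-state throughput split under SFQ with both groups
# continuously backlogged: share_i = w_i / (w_1 + w_2).
for w1, w2 in [(1, 1), (2, 1), (10, 1)]:
    share1 = w1 / (w1 + w2)
    print(f"{w1}:{w2} -> Group 1 {share1:.1%}, Group 2 {1 - share1:.1%}")
# 1:1  -> Group 1 50.0%, Group 2 50.0%
# 2:1  -> Group 1 66.7%, Group 2 33.3%
# 10:1 -> Group 1 90.9%, Group 2 9.1%
```

The simulated throughput curves can be compared against these expected shares.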
Throughput Results
[Four throughput-vs-time plots, throughput (byte/s) over time (s) for Group 1 and Group 2: FIFO, SFQ (1:1), SFQ (2:1), SFQ (10:1).]
Simulation 2: Scheduling Fairness
Same as Simulation 1, except that the two groups do not have the same resources: Group 1's files are striped across all servers [1, 2, 3, 4], while Group 2's files are striped across 3 servers [1, 2, 3].
We add the Distributed SFQ algorithm [3], in which all servers share scheduling information.
We show that when the workloads are not evenly distributed, the Distributed SFQ algorithm achieves better fairness than FIFO and SFQ.
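A back-of-envelope calculation (not simulator output) shows why purely per-server SFQ is unfair in this setup: servers 1-3 are split evenly between the groups, but server 4 serves only Group 1.

```python
# Fairness gap under per-server SFQ (1:1) with unbalanced striping.
C = 1.0                  # bandwidth of one data server (arbitrary unit)
g1 = 3 * (C / 2) + C     # Group 1: half of servers 1-3, plus all of server 4
g2 = 3 * (C / 2)         # Group 2: half of servers 1-3 only
print(g1 / (g1 + g2))    # 0.625 -> Group 1 gets 62.5% despite equal weights
```

DSFQ propagates service information across servers, so the aggregate split is driven back toward the intended 1:1 ratio.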
Throughput Results
[Three throughput-vs-time plots, throughput (byte/s) over time (s) for Group 1 and Group 2: FIFO, SFQ (1:1), DSFQ (1:1).]
References
[1] E. Molina-Estolano, C. Maltzahn, J. Bent and S. A. Brandt, "Building a parallel file system simulator", Journal of Physics: Conference Series 180 (2009) 012050.
[2] P. H. Carns, W. B. Ligon, et al., "Using Server-to-Server Communication in Parallel File Systems to Simplify Consistency and Improve Performance", Proceedings of the 4th Annual Linux Showcase and Conference (Atlanta, GA), pp. 317-327.
[3] Y. Wang and A. Merchant, "Proportional Share Scheduling for Distributed Storage Systems", File and Storage Technologies (FAST '07), San Jose, CA, February 2007.
[4] W. Jin, J. S. Chase and J. Kaur, "Interposed Proportional Sharing for a Storage Service Utility", SIGMETRICS 2004, ACM, pp. 37-48.