1
A Workflow-Aware Storage System
Emalayan Vairavanathan
Samer Al-Kiswany, Lauro Beltrão Costa, Zhao Zhang, Daniel S. Katz, Michael Wilde, Matei Ripeanu
2
Workflow Example - ModFTDock
• Protein docking application
• Simulates a more complex protein model from two known proteins
• Applications
Drug design
Protein interaction prediction
Background – ModFTDock in Argonne BG/P
3
• Scale: 40,960 compute nodes, each running an application task with local storage
• File-based communication through the backend file system (e.g., GPFS, NFS)
• Large IO volume: 1.2 M docking tasks driven by the workflow runtime engine
• IO rate: 8 GBps shared across all cores = 51 KBps / core
[Figure: compute nodes with local storage reading and writing through the shared backend file system]
Source: [Zhao et al.]
4
Background – Backend Storage Bottleneck
• Storage is one of the main bottlenecks for workflows
Montage workflow (512 BG/P cores, GPFS backend file system)
[Chart: breakdown of total time – data management 30%, execution 29%, scheduling and idle 40%]
Intermediate Storage Approach
5
• Application tasks run on the compute nodes (scale: 40,960 nodes), each with local storage
• Tasks share an intermediate storage layer, accessed through a POSIX API and coordinated by the workflow runtime engine
• Input data is staged in from, and results are staged out to, the backend file system (e.g., GPFS, NFS)
[Figure: compute nodes with local storage forming an intermediate storage layer above the backend file system]
Source: [Zhao et al., MTAGS 2008]
6
Research Question
How can we improve the storage performance for workflow applications?
7
IO Patterns in Workflow Applications – Justin Wozniak et al., PDSW'09
• Pipeline → locality and location-aware scheduling
• Broadcast → replication
• Reduce → collocation and location-aware scheduling
• Scatter and Gather → block-level data placement
IO-Patterns in ModFTDock
• 1.2 M Dock, 12,000 Merge and Score instances in a large run
• Average file size: 100 KB – 75 MB
• Stage 1: broadcast pattern
• Stage 2: reduce pattern
• Stage 3: pipeline pattern
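The three stages above can be sketched as a toy dataflow. This is only an illustration of how the patterns compose (one model broadcast to many dock tasks, dock outputs reduced by merge, each merge output piped to score); the task names come from the slide, but their granularity and arguments are assumptions.

```python
# Toy sketch of the ModFTDock dataflow implied by the three stages:
# broadcast -> reduce -> pipeline. Task semantics are illustrative only.

def modftdock(protein_model, fragments):
    # Stage 1 (broadcast): the same model file is read by every dock task.
    dock_results = [f"dock({protein_model},{frag})" for frag in fragments]

    # Stage 2 (reduce): a merge task combines many dock outputs into one file.
    merged = "merge(" + ",".join(dock_results) + ")"

    # Stage 3 (pipeline): each merge output feeds exactly one score task.
    return f"score({merged})"

print(modftdock("model.pdb", ["f1", "f2"]))
# -> score(merge(dock(model.pdb,f1),dock(model.pdb,f2)))
```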
8
ModFTDock
9
Research Question
How can we improve the storage performance for workflow applications?
Workflow-aware storage: Optimizing the storage for IO patterns
Our Answer
Traditional approach: one size fits all
Our approach: file / block-level optimizations
10
Integrating with the workflow runtime engine
Backend file system (e.g., GPFS, NFS)
Workflow Runtime
Engine
App. task
Local storage
App. task
Local storage
App. task
Local storage
Workflow-aware storage (shared)
Compute Nodes
…
Stage In/Out
Storage hints (e.g., location information)
Application hints (e.g., indicating access patterns)
POSIX API
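The bi-directional hint exchange in the figure above could be sketched as a shared key-value metadata interface: the runtime engine attaches application hints (access pattern) to a file, and the storage attaches storage hints (location) for the engine to read. This is a minimal sketch; the key-value interface and hint names are illustrative assumptions, not the actual WASS API.

```python
# Sketch of bi-directional hints between the workflow runtime engine and
# the storage system. Interface and hint names are assumptions.

class HintStore:
    """Per-file hint metadata shared by the runtime engine and storage."""
    def __init__(self):
        self.hints = {}                      # path -> {key: value}

    def set_hint(self, path, key, value):
        self.hints.setdefault(path, {})[key] = value

    def get_hint(self, path, key, default=None):
        return self.hints.get(path, {}).get(key, default)

store = HintStore()

# Application hint: the engine tags an intermediate file with its expected
# access pattern before the producing task writes it.
store.set_hint("/ws/dock.out", "access-pattern", "pipeline")

# Storage hint: after the write, storage exposes the file's location so the
# engine can schedule the consumer task on that node.
store.set_hint("/ws/dock.out", "location", "node-17")
print(store.get_hint("/ws/dock.out", "location"))   # -> node-17
```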
11
Outline
• Background
• IO Patterns
• Workflow-aware storage system: Implementation
• Evaluation
12
Implementation: MosaStore
• Files are divided into fixed-size chunks
• Chunks are stored on the storage nodes
• The manager maintains a block-map for each file
• POSIX interface for accessing the system
MosaStore distributed storage architecture
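The chunk-based organization above can be sketched in a few lines: files are split into fixed-size chunks, chunks are placed on storage nodes, and a central manager keeps a block-map per file. The 1 MiB chunk size and round-robin placement are illustrative assumptions, not MosaStore's actual defaults.

```python
# Sketch of MosaStore's chunk organization as described on the slide.
# Chunk size and round-robin placement are assumptions for illustration.

CHUNK_SIZE = 1 << 20                         # assume 1 MiB chunks

class Manager:
    def __init__(self, storage_nodes):
        self.nodes = storage_nodes           # e.g. ["node-0", "node-1", ...]
        self.block_map = {}                  # path -> [(chunk_index, node)]

    def write(self, path, data):
        # Split the file into fixed-size chunks.
        chunks = [data[i:i + CHUNK_SIZE]
                  for i in range(0, len(data), CHUNK_SIZE)]
        placement = []
        for idx, _chunk in enumerate(chunks):
            node = self.nodes[idx % len(self.nodes)]   # round-robin placement
            placement.append((idx, node))              # chunk lives on `node`
        self.block_map[path] = placement               # manager's block-map

    def locate(self, path):
        """Return the block-map: which node holds each chunk of `path`."""
        return self.block_map[path]

mgr = Manager(["node-0", "node-1", "node-2"])
mgr.write("/ws/out.dat", b"x" * (3 * CHUNK_SIZE + 1))  # 4 chunks
print(mgr.locate("/ws/out.dat"))
# -> [(0, 'node-0'), (1, 'node-1'), (2, 'node-2'), (3, 'node-0')]
```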
13
Implementation: Workflow-aware Storage System
Workflow-aware storage architecture
14
Implementation: Workflow-aware Storage System
• Optimized data placement for the pipeline pattern
Priority to local writes and reads
• Optimized data placement for the reduce pattern
Collocating files in a single storage node
• Replication mechanism optimized for the broadcast pattern
Parallel replication
• Exposing file location to workflow runtime engine
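The pattern-specific placement policies listed above can be condensed into one decision function. The policy logic follows the slide (local writes for pipeline, collocation for reduce, eager replication for broadcast); the function interface and node naming are illustrative assumptions.

```python
# Sketch of the pattern-specific data placement policies named above.
# The interface is an assumption; the policies match the slide text.

def place_chunks(pattern, writer_node, nodes, n_replicas=3):
    """Pick target node(s) for a new file given its access-pattern hint."""
    if pattern == "pipeline":
        # Priority to local writes and reads: keep data on the producer's
        # node so the consumer task can be scheduled there (locality).
        return [writer_node]
    if pattern == "reduce":
        # Collocate all input files of the reduce task on a single node.
        return [nodes[0]]                    # a designated collocation node
    if pattern == "broadcast":
        # Replicate eagerly so many consumers can read in parallel.
        return nodes[:n_replicas]
    return [writer_node]                     # default: write locally

print(place_chunks("pipeline", "node-7", ["node-0", "node-1", "node-2"]))
print(place_chunks("broadcast", "node-7", ["node-0", "node-1", "node-2"]))
```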
15
Outline
• Background
• IO Patterns
• Workflow-aware storage system: Implementation
• Evaluation
16
Evaluation - Baselines
MosaStore, NFS, and node-local storage
vs. workflow-aware storage
[Figure: compute nodes with local storage, a shared intermediate storage layer with stage in/out to the backend file system (e.g., GPFS, NFS); the intermediate layer is instantiated as MosaStore, NFS, node-local storage, or workflow-aware storage]
17
Evaluation - Platform
• Cluster of 20 machines: Intel Xeon 4-core 2.33-GHz CPU, 4-GB RAM, 1-Gbps NIC, and RAID-1 on two 300-GB 7200-rpm SATA disks
• Backend storage: NFS server with an Intel Xeon E5345 8-core 2.33-GHz CPU, 8-GB RAM, 1-Gbps NIC, and six SATA disks in a RAID-5 configuration
NFS server is better provisioned
18
Evaluation – Benchmarks and Application
Synthetic benchmark
Application and workflow runtime engine: ModFTDock
Workload   Pipeline                 Broadcast       Reduce
Small      100 KB, 200 KB, 10 KB    100 KB, 1 KB    10 KB, 100 KB
Medium     100 MB, 200 MB, 1 MB     100 MB, 1 MB    10 MB, 200 MB
Large      1 GB, 2 GB, 10 MB        1 GB, 10 MB     100 MB, 2 GB
19
Synthetic Benchmark - Pipeline
Average runtime for medium workload
Optimization: Locality and location-aware scheduling
20
Synthetic Benchmarks - Reduce
Optimization: Collocation and location-aware scheduling
Average runtime for medium workload
Synthetic Benchmarks - Broadcast
21
Optimization: Replication
Average runtime for medium workload
22
Not everything is perfect!
Average runtime for small workload (pipeline, broadcast and reduce benchmarks)
23
Evaluation – ModFTDock
ModFTDock workflow
Total application time on three different systems
24
Evaluation – Highlights
• WASS shows considerable performance gains on all benchmarks with medium and large workloads (up to 18x faster than NFS and up to 2x faster than MosaStore).
• ModFTDock runs 20% faster on WASS than on MosaStore, and more than 2x faster than on NFS.
• WASS delivers lower performance on the small benchmarks due to metadata overheads and manager latency.
25
Summary
Problem
• How can we improve the storage performance for workflow applications?
Approach
• Workflow-aware storage system (WASS)
• From backend storage to intermediate storage
• Bi-directional communication using hints
Future work
• Integrating more applications
• Large-scale evaluation
26
THANK YOU
MosaStore: netsyslab.ece.ubc.ca/wiki/index.php/MosaStore
Networked Systems Laboratory: netsyslab.ece.ubc.ca