A Comparative Experimental Study of Parallel File Systems for Large-Scale Data Processing
Z. Sebepou, K. Magoutis, M. Marazakis, A. Bilas
Institute of Computer Science (ICS)
Foundation for Research and Technology Hellas (FORTH)
Heraklion, Crete, Greece
FORTH-ICS First USENIX LaSCo Workshop 2008
Evolution
• Distributed file systems
– NFS versions 2, 3, 4; AFS; etc.
• Shared-disk parallel file systems
– Frangipani/Petal; GPFS; GFS; etc.
• Separating data/metadata paths to object storage
– NASD; pNFS; Panasas; Lustre; PVFS; etc.
NASD-style Parallel File Systems
[Figure: metadata path (Open, Lookup, etc.) and data path (Read, Write)]
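The split between the two paths can be illustrated with a minimal in-memory sketch. It assumes a client that asks a metadata server where a file's objects live and then moves data directly to and from object servers. The class and function names (`MetadataServer`, `ObjectServer`, `write_file`, `read_file`) are hypothetical and do not correspond to the Lustre or PVFS APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectServer:
    """Holds object data; serves Read/Write directly to clients."""
    objects: dict = field(default_factory=dict)

    def write(self, obj_id: str, data: bytes) -> None:
        self.objects[obj_id] = data

    def read(self, obj_id: str) -> bytes:
        return self.objects[obj_id]


@dataclass
class MetadataServer:
    """Maps file names to object locations; never touches file data."""
    layouts: dict = field(default_factory=dict)

    def create(self, path: str, server_ids: list) -> list:
        # One object per listed server; the object name is a made-up convention.
        layout = [(sid, f"{path}#{i}") for i, sid in enumerate(server_ids)]
        self.layouts[path] = layout
        return layout

    def lookup(self, path: str) -> list:
        return self.layouts[path]


def write_file(mds, servers, path, chunks):
    """Metadata path: create. Data path: one write per object server."""
    layout = mds.create(path, list(range(len(chunks))))
    for (sid, obj_id), chunk in zip(layout, chunks):
        servers[sid].write(obj_id, chunk)


def read_file(mds, servers, path):
    """Metadata path: lookup. Data path: reads go straight to object servers."""
    layout = mds.lookup(path)
    return b"".join(servers[sid].read(obj_id) for sid, obj_id in layout)
```

In this sketch the metadata server is contacted once per open or create, and all data transfers bypass it.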
Lustre, PVFS
• Open-source systems, following the NASD paradigm
– Lustre: Cluster File Systems, Inc.; acquired (2007) by Sun
– PVFS: Clemson University, ANL, OSC
• Targeted for large-scale data processing
– LLNL, ORNL, ANL, CERN
• Representative of different approaches to filesystem design
– Client caching
– Statelessness
– Consistency and file access semantics
– Portability
Lustre Architecture
[Figure: Lustre architecture; diagram labels follow]
• Linux ext3 (modified) (two instances in the diagram)
• Intent-based locking
• POSIX semantics
• Pre-allocation of MD objs
• Active/backup
• Striping
• Data/MD caching
• TCP/IP, RDMA, etc.
• Client structure: kernel-based
PVFS2 Architecture
[Figure: PVFS2 architecture; diagram labels follow]
• Striping
• No data caching
• DBPF (“database plus files”)
• Any Linux file system
• No file locking; “non-conflicting writes”
• Multiple active
• Client structure: user-level
• TCP/IP, RDMA, etc.
Benchmarks
• Streaming: one client, many servers
– IOZone
• Streaming: many clients, many servers
– Parallel I/O (MPI)
• Metadata-intensive
– Cluster PostMark
• Near-random I/O, optionally with data overlap
– Tile I/O (MPI)
• User-perceived response time
– ls -lR on Linux kernel tree
Experimental Testbed
• Software: PVFS 2.6.3, Lustre 1.6.0.1, MPICH 2
• [Figure: one Metadata Server and several Object Servers]
• Server file system: Linux ext3 or modified ext3
• Lustre stripe: 64KB, 256KB, 1MB (default)
• PVFS2 stripe: 64KB (default)
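To make the effect of the stripe size concrete, here is a small sketch that maps a file offset to an object server. It assumes plain round-robin striping, which is an assumption of this example; the slides do not spell out the exact layout of either system. The function names `locate` and `split_request` are hypothetical.

```python
KB = 1024


def locate(offset: int, stripe_size: int, num_servers: int) -> tuple:
    """Return (server index, offset inside that server's object)."""
    stripe_no = offset // stripe_size
    server = stripe_no % num_servers
    local = (stripe_no // num_servers) * stripe_size + offset % stripe_size
    return server, local


def split_request(offset: int, length: int, stripe_size: int, num_servers: int) -> list:
    """Split a byte range into (server, local_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local = locate(offset, stripe_size, num_servers)
        # Stop at the stripe boundary or at the end of the request.
        nbytes = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((server, local, nbytes))
        offset += nbytes
    return pieces
```

Under these assumptions, a 1MB request aligned at offset 0 with a 64KB stripe over 12 servers splits into 16 pieces that touch all 12 servers, while the same request with a 1MB stripe goes to a single server.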
Streaming: One Client, Many Servers
(IOZone Benchmark)
1, 4 threads
Block sizes 1KB, 64KB, 1MB, 4MB
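The kind of measurement involved can be sketched as a sequential write and read of one file at a given block size. This is not IOZone itself, only a single-threaded illustration with hypothetical function names; it runs against whatever file system holds `path`, and a read right after a write may be served from the local page cache.

```python
import os
import time

KB, MB = 1024, 1024 * 1024
BLOCK_SIZES = [1 * KB, 64 * KB, 1 * MB, 4 * MB]  # block sizes listed on the slide


def stream_write(path: str, total_bytes: int, block_size: int) -> float:
    """Write total_bytes sequentially in block_size chunks; return MB/s."""
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_bytes // block_size):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time to reach stable storage
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / MB / elapsed


def stream_read(path: str, block_size: int) -> float:
    """Read the whole file sequentially in block_size chunks; return MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / MB / elapsed
```

A multi-threaded run, as in the 4-thread configuration, would call these functions from several threads on separate files; that part is omitted here.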
Streaming: One Client, Many Servers
(IOZone Benchmark) – Lustre
[Figure: IOZone results on Lustre with 1 thread and 4 threads; 1 client, 12 servers; CPU utilization annotations: 95% and 70%]
Streaming: One Client, Many Servers
(IOZone Benchmark) – PVFS
[Figure: IOZone results on PVFS with 1 thread and 4 threads; 1 client, 12 servers; CPU utilization annotations: 50% and 20%]
Streaming: Many Clients, Many Servers
(Parallel I/O Benchmark)
Block sizes 64KB-16MB
1. Writes to separate files
2. Reads from single file
3. Read-modify-write to single file
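The third pattern can be sketched with threads standing in for MPI client processes. The sketch assumes that each client owns one disjoint block of the shared file, at offset `rank * block_size`; the slides do not state how the benchmark partitions the file, so that layout and the function names are hypothetical. It relies on `os.pread` and `os.pwrite`, which are available on Unix-like systems.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_modify_write(path: str, rank: int, block_size: int) -> None:
    """One client: read its block, flip every byte, write it back in place."""
    fd = os.open(path, os.O_RDWR)
    try:
        offset = rank * block_size
        data = os.pread(fd, block_size, offset)
        os.pwrite(fd, bytes(b ^ 0xFF for b in data), offset)
    finally:
        os.close(fd)


def run_clients(path: str, num_clients: int, block_size: int) -> None:
    """Run all clients concurrently against the same file."""
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda r: read_modify_write(path, r, block_size),
                      range(num_clients)))
```

Because the regions are disjoint in this sketch, the writes never conflict; a file system that locks byte ranges still pays for acquiring those locks.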
Streaming: Many Clients, Many Servers
(Parallel I/O Benchmark)
[Figure: Lustre and PVFS2 panels]
Writes to Separate Files; 12 clients, 12 servers
Streaming: Many Clients, Many Servers
(Parallel I/O Benchmark)
[Figure: Lustre and PVFS2 panels]
Reads from Single File; 12 clients, 12 servers
Cluster PostMark
• Three configurations:
(a) Few (100), large (10-100MB) files
(b) Moderate number (800) of medium-size (1-10MB) files
(c) Many (8000), small (4-128KB) files
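The three configurations can be written down as data, together with a scaled-down PostMark-style loop. The loop follows the structure of the original single-node PostMark (a pool of files, then transactions that pair a read or append with a create or delete); how Cluster PostMark distributes this work across clients is not described on the slide, so everything beyond the three configurations is an assumption, and the names are hypothetical.

```python
import os
import random
from dataclasses import dataclass

KB, MB = 1024, 1024 * 1024


@dataclass(frozen=True)
class PostMarkConfig:
    name: str
    num_files: int
    min_size: int
    max_size: int


CONFIGS = [
    PostMarkConfig("a", 100, 10 * MB, 100 * MB),   # few, large files
    PostMarkConfig("b", 800, 1 * MB, 10 * MB),     # moderate number, medium size
    PostMarkConfig("c", 8000, 4 * KB, 128 * KB),   # many, small files
]


def run(cfg: PostMarkConfig, directory: str, transactions: int,
        block_size: int, rng: random.Random) -> int:
    """Create the pool, run transactions, clean up; return bytes moved."""
    pool, moved, serial = [], 0, 0

    def create() -> None:
        nonlocal moved, serial
        path = os.path.join(directory, f"f{serial}")
        serial += 1
        size = rng.randint(cfg.min_size, cfg.max_size)
        with open(path, "wb") as f:
            f.write(b"x" * size)
        pool.append(path)
        moved += size

    for _ in range(cfg.num_files):
        create()
    for _ in range(transactions):
        path = rng.choice(pool)
        if rng.random() < 0.5:                      # read the whole file
            with open(path, "rb") as f:
                while chunk := f.read(block_size):
                    moved += len(chunk)
        else:                                       # append one block
            with open(path, "ab") as f:
                f.write(b"y" * block_size)
            moved += block_size
        if rng.random() < 0.5 or len(pool) == 1:    # create, or keep pool non-empty
            create()
        else:                                       # delete a random file
            os.remove(pool.pop(rng.randrange(len(pool))))
    for path in pool:
        os.remove(path)
    return moved
```

Configuration (c) is dominated by create, delete, and open operations, so it mainly stresses metadata handling, while configuration (a) mainly stresses data bandwidth.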
Cluster PostMark – Bandwidth
[Figure: bandwidth versus block size; 12 clients, 12 servers]
Cluster PostMark – Transactions
[Figure: transactions versus block size; 12 clients, 12 servers]
Near-random I/O, optionally with data overlap
(Tile I/O)
[Figure: file layout showing elements, blocks, and tiles]
• two-dimensional logical structure overlaid on a single file
• each tile assigned to a separate client process
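The access pattern this produces can be sketched by computing the byte extents one process touches. The sketch assumes a row-major array of fixed-size elements in which neighbouring tiles share `overlap_x` columns and `overlap_y` rows; this is a simplified model and not necessarily the exact layout used by the benchmark. The function name `tile_extents` is hypothetical.

```python
def tile_extents(tx: int, ty: int, tiles_x: int, tile_w: int, tile_h: int,
                 elem_size: int, overlap_x: int = 0, overlap_y: int = 0) -> list:
    """Return [(file_offset, nbytes), ...], one extent per row of tile (tx, ty)."""
    step_x = tile_w - overlap_x            # horizontal distance between tile origins
    step_y = tile_h - overlap_y            # vertical distance between tile origins
    row_elems = tiles_x * step_x + overlap_x   # elements per row of the whole array
    x0, y0 = tx * step_x, ty * step_y
    return [((y * row_elems + x0) * elem_size, tile_w * elem_size)
            for y in range(y0, y0 + tile_h)]
```

Each tile row is a separate, non-contiguous extent in the file, which is why the access pattern looks close to random to the file system. With a non-zero overlap, extents of neighbouring tiles intersect, so concurrent writers touch the same bytes.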
Tile I/O - Reads
Tile I/O - Writes
User-Perceived Response Time
File System           Response Time (sec)
Linux ext3 (local)    5.5
Lustre                58
PVFS2                 80
ls -lR on Linux 2.6.12 kernel tree (~25,000 files)
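What this test exercises can be approximated by walking a directory tree and calling `lstat` on every entry, since a long listing needs the attributes of each file. The following is a rough stand-in for timing `ls -lR`, not the procedure used for the numbers above; the function name is hypothetical.

```python
import os
import time


def timed_recursive_stat(root: str) -> tuple:
    """Walk root, lstat every entry, and return (entry count, seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))  # attributes, as in a long listing
            count += 1
    return count, time.perf_counter() - start
```

On a networked file system each attribute lookup typically costs at least one round trip to a server, so the total time grows with the number of files rather than with their size.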
Conclusions
• Scalable I/O bandwidth is achievable through parallel I/O paths to file servers
• Lustre’s efficient metadata management is critical for metadata-intensive applications
• Lustre’s consistency semantics are useful to some applications but impose unnecessary overhead on others that do not require them
Thank You!