Client Configuration Lustre Benchmarking


Page 1: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client Configuration
Lustre Benchmarking

Page 2: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

• Client Setup and Package Installation
• Client Lustre Configuration
• Client Tuning Parameters
• Lustre Striping
• Benchmarking Base-lining for MDRAID

• obdfilter-survey
• IOR

Agenda

Page 3: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client Configuration for SSH
Setting up SSH keys

Page 4: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•All clients must have a functioning SSH server that allows both direct root access and key-based authentication.

•You then need to generate a master key on the head node and copy its public key into the ~/.ssh/authorized_keys file on each client.

Required SSH keys on Clients

Page 5: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Log in to the client head node
•Generate the master key on the client head node

•# ssh-keygen -t rsa
Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

8c:d1:39:a7:68:a3:e1:5f:d9:95:b3:e6:13:6a:8e:cc [email protected]

•Read the generated public key on the head node
•# cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAwZsS68UMWSXaybwAxnaHq30VIL0uM54VVgiJmTLZQ/qFhH0/GP6WTSUPk5U/eiRRc1Lhfp7AY3VWdKQ2wv084EMC+9uPuFht9ugOPaPI4yVFYskZ+NNYKb6v07hGW10wD25jMPZ/omxsVx1cHt25KlDc+FA2Wj1mxK6x61vQayPxQh4WFHhCgM30TsllrAB9SHh37+ookHTeY8xpQpbunRGCyBrRFqVLcusnho4P5zZrtSrKlPLjKIy1kg43hVgzSk6ae5FVSvaYQmubQb1Q31ftrwne7zqCLjfhudkgsETBDJtteWZPFUpRZYpbtvOkfCqa/XiSrOY8Xc/nxq0Dvw== [email protected]

How to set up SSH keys on the Client

Page 6: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Log on to each client, create the .ssh directory, and append the public key generated on the head node to ~/.ssh/authorized_keys.

•For example, here is the procedure for one client, client2:

[root@fvt-client1 ~]# ssh client2

root@client2's password: <type Xyratex>

Last login: Fri May 17 01:52:53 2013 from 10.106.54.18

[root@fvt-client2 ~]# mkdir -m 0700 ~/.ssh

[root@fvt-client2 ~]# cat >> ~/.ssh/authorized_keys

<Paste the key copied from Head node>

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAwZsS68UMWSXaybwAxnaHq30VIL0uM54VVgiJmTLZQ/qFhH0/GP6WTSUPk5U/eiRRc1Lhfp7AY3VWdKQ2wv084EMC+9uPuFht9ugOPaPI4yVFYskZ+NNYKb6v07hGW10wD25jMPZ/omxsVx1cHt25KlDc+FA2Wj1mxK6x61vQayPxQh4WFHhCgM30TsllrAB9SHh37+ookHTeY8xpQpbunRGCyBrRFqVLcusnho4P5zZrtSrKlPLjKIy1kg43hVgzSk6ae5FVSvaYQmubQb1Q31ftrwne7zqCLjfhudkgsETBDJtteWZPFUpRZYpbtvOkfCqa/XiSrOY8Xc/nxq0Dvw== [email protected]

<ctrl-d>

[root@fvt-client2 ~]# exit

Use the key to configure all clients
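•As a minimal sketch, the key distribution can also be scripted from the head node, assuming clients named client1 through client12 and password authentication still enabled; ssh-copy-id appends the public key to each client's ~/.ssh/authorized_keys:

# for i in $(seq 1 12); do ssh-copy-id -i ~/.ssh/id_rsa.pub root@client$i; done

# ssh client2 hostname    <should now return without a password prompt>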

Page 7: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client Configuration for pdsh
Setting up pdsh

Page 8: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Install pdsh on the head node
•# yum install -y pdsh
•/etc/hosts is already configured with the hostnames for you
•How to use pdsh for each group to confirm SSH is configured correctly

•Group 1: # pdsh -w super[00-03] date

Installing and using pdsh
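•As a quick sanity check (a sketch, assuming clients named client1 through client12): run a trivial command everywhere and collapse identical output with dshbak, which ships with pdsh. Any host that prompts for a password or is missing from the output needs its SSH key setup revisited before continuing.

# pdsh -w client[1-12] uname -r | dshbak -c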

Page 9: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client Installation of MPI

Page 10: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•OpenMPI is needed to execute IOR on all clients using the command 'mpirun'

•From the head node, run the following command:
[root@fvt-client1 .ssh]# pdsh -w client[1-12] 'yum install -y openmpi openmpi-devel'

•Add OpenMPI to your binary and library paths
[root@fvt-client1 .ssh]# pdsh -w client[1-12] 'ldconfig /usr/lib64/openmpi/lib'

[root@fvt-client1 .ssh]# pdsh -w client[1-12] 'export PATH=$PATH:/usr/lib64/openmpi/bin:/usr/lib64/openmpi/lib'

•Check the PATH before continuing
[root@fvt-client1 .ssh]# pdsh -w client[1-12] 'echo $PATH'

•If the PATH is not correct, you might need to check the shell, edit the .bashrc file (for example), source it, and then copy it to all clients

•export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib

Install openmpi on all nodes
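•A sketch of making the paths persistent, assuming all clients use bash and root's ~/.bashrc (adjust for the shell actually in use), then pushing the file to the other clients with pdcp:

# cat >> ~/.bashrc <<'EOF'
export PATH=$PATH:/usr/lib64/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/lib
EOF
# source ~/.bashrc
# pdcp -w client[2-12] ~/.bashrc /root/.bashrc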

Page 11: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client InfiniBand Installation
Stock OFED

Page 12: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•InfiniBand is essential to run IOR using RDMA over IB. From the head node, run the following commands to install IB.

[root@fvt-client1 ~]# pdsh -w client[1-12] 'yum groupinstall -y "Infiniband Support"'

[root@fvt-client1 ~]# pdsh -w client[1-12] 'yum install -y infiniband-diags'

•Start RDMA and bring up the ib0 interface
[root@fvt-client1 ~]# pdsh -w client[1-12] 'service rdma start'

[root@fvt-client1 ~]# pdsh -w client[1-12] 'ifup ib0'

Install the OFED/InfiniBand Packages
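•As a quick check (a sketch using ibstat from infiniband-diags), confirm each HCA port reports an Active state and that ib0 has an address:

# pdsh -w client[1-12] 'ibstat | grep -i state' | dshbak -c
# pdsh -w client[1-12] 'ip addr show ib0 | grep "inet "'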

Page 13: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client Download and Installation of IOR

Page 14: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•From the head node, download and build IOR
•To build IOR, install the autoconf and make packages, plus the Development Tools group, which provides the make utility
[root@fvt-client1 ~]# yum install -y make autoconf
[root@fvt-client1 ~]# yum groupinstall -y "Development Tools"

•Download the IOR tool
[root@fvt-client1 ~]# wget http://downloads.sourceforge.net/project/ior-sio/IOR%20latest/IOR-2.10.3/IOR-2.10.3.tgz

Download and build IOR

Page 15: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•IOR is downloaded to the local directory; untar/ungzip it and run the make utility to build IOR

[root@fvt-client1 ~]# tar -zxvf IOR-2.10.3.tgz

Go to the IOR directory that was just extracted

[root@fvt-client1 IOR]# make

(cd ./src/C && make posix)

make[1]: Entering directory `/root/IOR/src/C’

mpicc -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -c IOR.c

mpicc -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -c utilities.c

mpicc -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -c parse_options.c

mpicc -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -c aiori-POSIX.c

mpicc -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -c aiori-noMPIIO.c

mpicc -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -c aiori-noHDF5.c

mpicc -g -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -c aiori-noNCMPI.c

mpicc -o IOR IOR.o utilities.o parse_options.o \
aiori-POSIX.o aiori-noMPIIO.o aiori-noHDF5.o aiori-noNCMPI.o \
-lm

make[1]: Leaving directory `/root/IOR/src/C’

[root@fvt-client1 ~]# cp /root/IOR/src/C/IOR .

Download and build IOR

Page 16: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Copy IOR to all clients
[root@fvt-client1 ~]# pdcp -w client[2-12] IOR /root/.

•Confirm IOR is linked with the correct MPI library
[root@fvt-client1 ~]# pdsh -w client[1-12] "ldd IOR | grep mpi"

client5: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x0000003b45800000)

client7: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00000031e6c00000)

client8: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x000000375a400000)

client6: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00000038e7200000)

client1: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x0000003ebee00000)

client3: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00000033dbe00000)

client2: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x0000003fe0000000)

client11: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00007fa077379000)

client12: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00007f8c5bf6e000)

client4: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00000030a8400000)

client9: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00007f8f658e3000)

client10: libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x00007f475a019000)

Download and build IOR
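•As an optional sanity check (a sketch run on the head node; assumes mpirun is reachable via the full OpenMPI path and writes small scratch files under /tmp), a tiny local IOR run confirms the binary executes before moving on to Lustre:

[root@fvt-client1 ~]# /usr/lib64/openmpi/bin/mpirun -np 2 ./IOR -F -t 1m -b 16m -o /tmp/ior.sanity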

Page 17: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client Lustre Client Installation
Installing Lustre Client RPMs

Page 18: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Two packages are required to be installed or built on the Clients

•lustre-client-modules-<release-version>.rpm -- Lustre Patchless Client Modules

•lustre-client-<release-version>.rpm -- Lustre utilities

•The required version needs to be confirmed through the site survey; work with the ClusterStor Support organization to obtain the correct RPMs to install at the customer site

•If you don't have a pre-built Lustre client for your particular client, then you will have to build the client using the SRC RPM package

•Download clients from:
•http://downloads.whamcloud.com/public/lustre/

Client Lustre Packages

Page 19: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking


•Just to make sure:
•Unmount Lustre (if already mounted)

•# umount /mnt/lustre

•Unload the Lustre modules
•# lustre_rmmod

•Install kernel development packages and compilers
• [root@hvt-super00 ]# yum install -y kernel-devel libselinux-devel rpm-build

• [root@hvt-super00 ]# yum groupinstall -y "Development Tools"

•Download the Lustre SRC RPM and install it
•# rpm -ivh --nodeps lustre-client-2.4.3-2.6.32_358.23.2.el6.x86_64.src.rpm
1:lustre-client warning: user jenkins does not exist - using root

warning: group jenkins does not exist - using root

########################################### [100%]

warning: user jenkins does not exist - using root

warning: group jenkins does not exist - using root

• The output above indicates that the rpmbuild directory on this client is under /root; the jenkins messages are only warnings and can be ignored

Building Lustre Clients

Page 20: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking


•Go to the following directory and extract the source
[root@hvt-super00 SOURCES]# cd /root/rpmbuild/SOURCES/

[root@hvt-super00 SOURCES]# ls

lustre-2.4.3.tar.gz

[root@hvt-super00 SOURCES]# gunzip lustre-2.4.3.tar.gz

[root@hvt-super00 SOURCES]# tar xvf lustre-2.4.3.tar

•cd into the extracted directory:
[root@hvt-super00 SOURCES]# cd lustre-2.4.3

[root@hvt-super00 lustre-2.4.3]# pwd

/root/rpmbuild/SOURCES/lustre-2.4.3

•From the unpacked source directory, run the following commands to build the RPMs for your specific kernel, assuming the stock OS OFED

•# make distclean

•# ./configure --disable-server --with-linux=/usr/src/kernels/2.6.32-358.el6.x86_64

•# make && make rpms

Building Lustre Clients

Page 21: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking


•The RPMs just built can be found in the following directory:
# /root/rpmbuild/RPMS/x86_64

•Two packages are required to be installed or built on the Clients

•lustre-client-modules-<release-version>.rpm -- Lustre Patchless Client Modules

•lustre-client-<release-version>.rpm -- Lustre utilities

•If all clients run the same OS and kernel, copy the built RPMs to the other clients and install them

# rpm -ivh lustre-client-modules-2.4.3-2.6.32_358.el6.x86_64.x86_64.rpm

# rpm -ivh lustre-client-2.4.3-2.6.32_358.el6.x86_64.x86_64.rpm

•NOTE: the 2.4.3 version shown here may instead be 1.8.9 or 2.5.1, depending on the client release

Install the Lustre Client RPMs
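•A sketch of distributing and installing the two RPMs on all clients with pdcp/pdsh, assuming the same client[1-12] naming used earlier, the RPMs sitting in /root on the head node, and identical kernels everywhere:

# pdcp -w client[2-12] lustre-client-modules-2.4.3-2.6.32_358.el6.x86_64.x86_64.rpm lustre-client-2.4.3-2.6.32_358.el6.x86_64.x86_64.rpm /root/
# pdsh -w client[1-12] 'rpm -ivh /root/lustre-client-modules-*.rpm /root/lustre-client-2*.rpm'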

Page 22: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking


•Edit or create /etc/modprobe.d/lnet.conf
•options lnet networks="o2ib0(ib0)"  - for IB nodes
•options lnet networks="tcp(eth20)"  - for Ethernet nodes

•Install the two Lustre client RPMs just built
•Start Lustre

•# modprobe lustre

•Possibly start RDMA on an IB system and bring up ib0
•# service rdma start
•# ifup ib0

Configuring and Starting Lustre on Clients
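•As a quick check (a sketch), confirm the client's own NID and that it can reach the MGS over LNET; the MGS NID here is only an example taken from the mount examples that follow, so substitute your own:

# lctl list_nids
# lctl ping 172.18.1.3@o2ib0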

Page 23: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

# ssh [email protected]

[root@tsesys2n00 ~]# cscli fs_info

-------------------------------------------------------------------------------------
Information about "tsefs2" file system:
-------------------------------------------------------------------------------------
Node        Node type  Targets  Failover partner  Devices
-------------------------------------------------------------------------------------
tsesys2n02  mgs        0 / 0    tsesys2n03

tsesys2n03 mds 1 / 1 tsesys2n02 /dev/md66

tsesys2n04 oss 1 / 1 tsesys2n05 /dev/md0

tsesys2n05 oss 1 / 1 tsesys2n04 /dev/md1

tsesys2n06 oss 1 / 1 tsesys2n07 /dev/md0

tsesys2n07 oss 1 / 1 tsesys2n06 /dev/md1

[root@tsesys2n00 ~]# ssh tsesys2n02 'lctl list_nids'

172.21.2.3@o2ib

Mount Command from Clients:

mount -t lustre 172.21.2.3@o2ib:172.21.2.4@o2ib:/tsefs2 /mnt/tsefs2

CS6000 GridRAID System

Page 24: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

# ssh [email protected]

[root@hvt1sys00 ~]# cscli fs_info

----------------------------------------------------------------------------------------------------

Information about "fs1" file system:

----------------------------------------------------------------------------------------------------

Node Node type Targets Failover partner Devices

----------------------------------------------------------------------------------------------------

hvt1sys02 mgs 0 / 0 hvt1sys03

hvt1sys03 mds 1 / 1 hvt1sys02 /dev/md66

hvt1sys04 oss 4 / 4 hvt1sys05 /dev/md0, /dev/md2, /dev/md4, /dev/md6

hvt1sys05 oss 4 / 4 hvt1sys04 /dev/md1, /dev/md3, /dev/md5, /dev/md7

[root@hvt1sys00 ~]# ssh hvt1sys02 'lctl list_nids'

172.18.1.4@o2ib

Mount Command from Clients:

mount -t lustre 172.18.1.4@o2ib:172.18.1.5@o2ib:/fs1 /mnt/fs1

CS6000 MDRAID System

Page 25: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•To find out the InfiniBand IP address and LNET name needed to mount Lustre on the client, log on to the MGS node that has the MGT mounted and run the following command

[root@mgs ~]# lctl list_nids

172.18.1.3@o2ib0

172.18.1.3@tcp

•172.18.1.3 – IP address of ib0 on MGS node (MGS and MDS can run on the same node or different nodes)

•o2ib0 - the default LNET network for RDMA over IB
•Good practice is to list both the primary and secondary MGS NIDs in the mount option, so clients can still access the filesystem if the MGS target fails over from the primary to the secondary node

mount -t lustre 172.18.1.3@o2ib0:172.18.1.4@o2ib0:/fsname /mnt/lustre

Mount Lustre on the Clients

Page 26: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•We first need to create the mount point, then issue the mount command

[root@client~]# pdsh -w client[1-200] 'mkdir /mnt/lustre'

[root@client~]# pdsh -w client[1-200] 'mount -t lustre 172.18.1.3@o2ib0:172.18.1.4@o2ib0:/fsname /mnt/lustre'

•Check whether all clients mounted successfully
[root@client~]# pdsh -w client[1-200] 'mount -t lustre' | wc -l

200

•Check the state of the filesystem from one client with the following command. With 36 OSS servers the output should be 144 OSTs

[root@fvt-client1 ~]# lfs check servers | grep OST | wc -l

146

Example of Mounting Lustre on All Clients
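•Another quick check (a sketch): lfs df lists every OST the client can see along with its usage, so a short list points at missing OSTs or connectivity problems:

[root@fvt-client1 ~]# lfs df -h /mnt/lustre | grep OST | wc -l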

Page 27: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client Lustre Tuning

Page 28: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Network Checksums
• The default is on, and it impacts performance. Disabling checksums is the first thing we do for performance

•LRU Size
• Typically we disable this parameter
• Parameter used to control the number of client-side locks in an LRU queue

•Max RPCs in Flight
• Default is 8
• RPC is remote procedure call
• This tunable is the maximum number of concurrent RPCs in flight from clients

•Max Dirty MB
• Default is 32; a good rule of thumb is 4x the value of max_rpcs_in_flight
• Defines the amount of dirty data, in MB, that can be written and queued up on the client

Client Lustre Parameters

Page 29: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•The first things to do are always to disable wire checksums on the client and disable the LRU

•max_rpcs_in_flight and max_dirty_mb are a function of the number of clients available for benchmarks

•Typically, we increase max_rpcs_in_flight to 32 for 1.8.9 clients and to 256 for 2.4.x/2.5.x clients

•In some cases, if we still don't get the expected performance, we then increase max_dirty_mb to 4x its current value for 1.8.9 clients, or to the same value as max_rpcs_in_flight for 2.4.x/2.5.x clients

•Procedure for Benchmarking
1. Disable checksums

2. Disable LRU

3. Increase max_rpcs_in_flight for the specific client

4. Increase max_dirty_mb for the specific client

Procedure to optimize Client Side Tuning

Page 30: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Disable client checksums, with the specific FS name of cstorfs

[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/checksums; do echo 0 > $n; done'

•Increase max RPCs in flight from the default 8 to 32
[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_rpcs_in_flight; do echo 32 > $n; done'

•Disable the LRU
[root@fvt-client1 ~]# pdsh -w client[1-12] 'lctl set_param ldlm.namespaces.*osc*.lru_size=$((NR_CPU*100))'

•Increase max dirty MB from 32 to 128
[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_dirty_mb; do echo 128 > $n; done'

•NOTE: These settings are not persistent and will need to be reset if Lustre is re-mounted or the client is rebooted.

1.8.9 Client Lustre tuning Parameters
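•Equivalently (a sketch, not part of the original procedure), the same OSC tunables can be set through lctl set_param instead of echoing into /proc; the filesystem name cstorfs is still assumed:

[root@fvt-client1 ~]# pdsh -w client[1-12] 'lctl set_param osc.cstorfs-OST00*.checksums=0'
[root@fvt-client1 ~]# pdsh -w client[1-12] 'lctl set_param osc.cstorfs-OST00*.max_rpcs_in_flight=32'
[root@fvt-client1 ~]# pdsh -w client[1-12] 'lctl set_param osc.cstorfs-OST00*.max_dirty_mb=128'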

Page 31: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Disable client checksums, with the specific FS name of cstorfs

[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/checksums; do echo 0 > $n; done'

•Increase max RPCs in flight from the default 8 to 256
[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_rpcs_in_flight; do echo 256 > $n; done'

•Disable the LRU
[root@fvt-client1 ~]# pdsh -w client[1-12] 'lctl set_param ldlm.namespaces.*osc*.lru_size=$((NR_CPU*100))'

•Increase max dirty MB from 32 to 256
[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_dirty_mb; do echo 256 > $n; done'

•NOTE: These settings are not persistent and will need to be reset if Lustre is re-mounted or the client is rebooted.

2.4.x/2.5.x Client Lustre tuning Parameters

Page 32: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Based on the LUG 2014 client performance comparison, keeping checksums enabled surprisingly has only up to about a 5% impact on performance

•The default algorithm is Adler32, but CRC32 is also available; we suggest CRC32 because of hardware acceleration support on current CPU technologies

•Enable client checksums, with the specific FS name of cstorfs

[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/checksums; do echo 1 > $n; done'

•Select the CRC32 client checksum type, with the specific FS name of cstorfs

[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/checksum_type; do echo crc32 > $n; done'

A Note on Checksums

Page 33: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Jumbo Frames give a >= 30% improvement in Lustre performance compared to the standard MTU of 1500

•Change the MTU on clients and servers to 9000
•Change the MTU on the switches to 9214 (or the maximum MTU size) to accommodate payload overhead

•Never set the switch MTU to the same value as the clients and servers; the switch needs the extra headroom for frame overhead

Ethernet Tuning
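•A sketch of applying the client-side MTU change on RHEL/CentOS 6, assuming the Lustre traffic runs over eth20 as in the lnet.conf example (substitute the real interface name):

# ip link set dev eth20 mtu 9000
# echo "MTU=9000" >> /etc/sysconfig/network-scripts/ifcfg-eth20    <makes the change persistent across reboots>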

Page 34: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Server Side Benchmark for MDRAID

Using obdfilter-survey

Page 35: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•obdfilter-survey is a Lustre benchmark tool that measures OSS and backend OST performance; it does not measure LNET or client performance

•This is a good benchmark to isolate the network and clients from the servers

•This is run from the primary management node
•You must be root to execute obdfilter-survey on the OSS nodes.

Server Side Benchmark

Page 36: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Before running obdfilter-survey, we want to make sure all targets are mounted on their primary servers.

•CS6000 health check
[root@lmtest400 ~]# pdsh -g lustre 'grep -c lustre /proc/mounts' | dshbak -c

----------------

lmtest[402-403]

----------------

1

----------------

lmtest[404-409]

----------------

4

•If the output is different from above, use HA to failover/failback resources to their primary servers before proceeding

Check CS6000 Configuration for Health

Page 37: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

nobjlo=1 nobjhi=1 thrlo=256 thrhi=256 size=65536 obdfilter-survey

• Parameter definitions

• size=65536        // file size, which is 2x the controller memory

• nobjhi=1 nobjlo=1    // number of files

• thrhi=256 thrlo=256   // number of worker threads when testing the OSS and SSU

•The results for each OSS should be in the range of 3000MB/s on writes and 3500MB/s on reads

• If you see significantly lower results, rerun the test multiple times to confirm the anomaly is not consistent.

•NOTE: obdfilter-survey is intrusive, must be run as root, and can occasionally induce an LBUG on the OSS node; don't be alarmed.

obdfilter-survey Setup for CS6000 MDRAID

Page 38: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

[root@lmtest400 ~]# pdsh -g oss 'nobjlo=1 nobjhi=1 thrlo=256 thrhi=256 size=65536 obdfilter-survey'

lmtest408: Sun May 19 17:01:47 PDT 2013 Obdfilter-survey for case=disk from lmtest408

lmtest409: Sun May 19 17:01:47 PDT 2013 Obdfilter-survey for case=disk from lmtest409

lmtest404: Sun May 19 17:01:47 PDT 2013 Obdfilter-survey for case=disk from lmtest404

lmtest406: Sun May 19 17:01:47 PDT 2013 Obdfilter-survey for case=disk from lmtest406

lmtest407: Sun May 19 17:01:48 PDT 2013 Obdfilter-survey for case=disk from lmtest407

lmtest405: Sun May 19 17:01:47 PDT 2013 Obdfilter-survey for case=disk from lmtest405

lmtest406: ost 4 sz 134217728K rsz 1024K obj 4 thr 1024 write 3271.13 [ 670.75, 953.31] rewrite 3224.33 [ 664.84, 966.80] read 3840.65 [ 647.85,1228.86]

lmtest409: ost 4 sz 134217728K rsz 1024K obj 4 thr 1024 write 3151.11 [ 557.89, 910.83] rewrite 3130.93 [ 649.86, 911.84] read 4004.42 [ 966.89,1040.84]

lmtest408: ost 4 sz 134217728K rsz 1024K obj 4 thr 1024 write 3131.36 [ 574.93, 926.86] rewrite 3127.69 [ 585.89, 923.82] read 4016.74 [ 965.76,1053.87]

lmtest407: ost 4 sz 134217728K rsz 1024K obj 4 thr 1024 write 3258.88 [ 607.92, 941.74] rewrite 3159.85 [ 669.24, 909.82] read 3766.40 [ 753.85,1233.84]

lmtest406: done!

lmtest408: done!

lmtest409: done!

lmtest407: done!

lmtest405: ost 4 sz 134217728K rsz 1024K obj 4 thr 1024 write 3121.14 [ 583.42, 920.82] rewrite 2967.64 [ 618.92, 902.00] read 3605.18 [ 769.83,1207.71]

lmtest404: ost 4 sz 134217728K rsz 1024K obj 4 thr 1024 write 3119.20 [ 607.75, 916.85] rewrite 2951.78 [ 539.93, 913.83] read 3560.45 [ 505.90,1231.84]

lmtest404: done!

lmtest405: done!

obdfilter-survey Results for CS6000 MDRAID with 256 worker threads

Page 39: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Server Side Benchmark for GridRAID

Using obdfilter-survey

Page 40: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•obdfilter-survey is a Lustre benchmark tool that measures OSS and backend OST performance; it does not measure LNET or client performance

•This is a good benchmark to isolate the network and clients from the servers

•This is run from the primary management node
•You must be root to execute obdfilter-survey on the OSS nodes.

Server Side Benchmark

Page 41: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Before running obdfilter-survey, we want to make sure all targets are mounted on their primary servers.

•CS9000 health check
[root@lmtest400 ~]# pdsh -g lustre 'grep -c lustre /proc/mounts' | dshbak -c

----------------

lmtest[402-403]

----------------

1

----------------

lmtest[404-409]

----------------

1

•If the output is different from above, use HA to failover/failback resources to their primary servers before proceeding

Check ClusterStor Configuration for Health

Page 42: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•The pre-allocation size of LDISKFS is not set correctly for GridRAID and needs to be changed for optimal performance.

•NOTE: This is fixed in ClusterStor 1.5

•First, check the pre-allocation size on each OSS:
[root@tsesys2n00 ~]# pdsh -g oss 'cat /proc/fs/ldiskfs/md*/prealloc_table'

tsesys2n04: 32 64 128

tsesys2n06: 32 64 128

tsesys2n05: 32 64 128

tsesys2n07: 32 64 128

•To correct the pre-allocation size:
•[root@tsesys2n00 ~]# pdsh -g oss 'echo "256 512 1024 2048 4096" > /proc/fs/ldiskfs/md*/prealloc_table'

ClusterStor 1.4 Tuning Change Required
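•To verify the change took effect (a sketch), re-read the table and collapse identical output with dshbak:

[root@tsesys2n00 ~]# pdsh -g oss 'cat /proc/fs/ldiskfs/md*/prealloc_table' | dshbak -c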

Page 43: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

nobjlo=1 nobjhi=1 thrlo=512 thrhi=512 size=131072 obdfilter-survey

• Parameter definitions

• size=131072        // file size, which is 2x the controller memory

• nobjhi=1 nobjlo=1    // number of files

• thrhi=512 thrlo=512   // number of worker threads when testing the OSS and SSU

•The results for each CS9000 OSS should be in the range of 4300MB/s on both writes and reads

•The results for each CS6000 OSS should be in the range of 3100MB/s on writes and 3700MB/s on reads

• If you see significantly lower results, rerun the test multiple times to confirm the anomaly is not consistent.

•NOTE: obdfilter-survey is intrusive, must be run as root, and can occasionally induce an LBUG on the OSS node; don't be alarmed.

•Use this for CS6000 and CS9000 SSUs

obdfilter-survey Setup for ClusterStor GridRAID SSU

Page 44: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

[root@tsesys2n00 ~]# pdsh -g oss 'nobjlo=1 nobjhi=1 thrlo=512 thrhi=512 size=131072 obdfilter-survey'

tsesys2n05: Mon May 5 10:43:04 PDT 2014 Obdfilter-survey for case=disk from tsesys2n05

tsesys2n07: Mon May 5 10:43:04 PDT 2014 Obdfilter-survey for case=disk from tsesys2n07

tsesys2n06: Mon May 5 10:43:04 PDT 2014 Obdfilter-survey for case=disk from tsesys2n06

tsesys2n04: Mon May 5 10:43:04 PDT 2014 Obdfilter-survey for case=disk from tsesys2n04

tsesys2n06: ost 1 sz 134217728K rsz 1024K obj 1 thr 512 write 3339.93 [3300.63,3505.66] rewrite 3315.21 [3220.80,3464.34] read 3815.69 [3691.21,4115.70]

tsesys2n05: ost 1 sz 134217728K rsz 1024K obj 1 thr 512 write 3313.65 [3245.19,3500.17] rewrite 3296.45 [2877.87,3495.95] read 3783.49 [3508.85,4071.69]

tsesys2n07: ost 1 sz 134217728K rsz 1024K obj 1 thr 512 write 3294.08 [3246.90,3459.60] rewrite 3280.59 [3231.68,3434.76] read 3802.78 [3519.90,3990.34]

tsesys2n04: ost 1 sz 134217728K rsz 1024K obj 1 thr 512 write 3271.72 [3195.85,3422.54] rewrite 3269.96 [3256.71,3464.56] read 3772.00 [3600.70,3972.05]

tsesys2n06: done!

tsesys2n05: done!

tsesys2n07: done!

tsesys2n04: done!

obdfilter-survey Results for GridRAID with 512 worker threads

Page 45: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client Side Benchmark for MDRAID

Typical IOR Client Configuration

Page 46: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•At customer sites, typically all clients have the same architecture, same number of CPU cores, and same amount of memory.

• NOTE: Our configuration for this training is a bit unique and required additional thought to get performance per SSU

•With a uniform client architecture, the parameters for IOR are simpler to tune and optimize for benchmarking

•A minimum of 8 clients per MDRAID SSU

Typical Client Configuration

Page 47: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•We always want to transfer 2x the total memory of all clients used, to avoid any client-side caching effects

•In our example:•(8_clients*32GB_memory)*2 = 512GB

•Total file size for the IOR benchmark will be 512GB

•NOTE: Typically all nodes are uniform.

IOR Rule of Thumb
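•A quick sketch of the arithmetic, assuming the 8 clients with 32GB each run 8 IOR tasks apiece (-np 64, which is an assumption for this example): the per-task block size is the total transfer divided by the task count:

# echo $(( 8 * 32 * 2 / 64 ))g    <prints 8g per task for a 512GB total>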

Page 48: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•mpirun is used to execute IOR across all clients from the head node
•Within mpirun we use the following options

•-machinefile machinefile.txt
•-np : the total number of tasks to execute across all clients (e.g. -np 64 means 8 tasks per client with 8 clients)

•--byslot
•The -machinefile option points to a simple text file listing all clients that will execute the IOR benchmark

•-np defines the number of tasks
•--byslot defines how many tasks are executed on the first node before starting additional tasks on the second node, and so forth. This is tied to how the machinefile options are defined

•--bynode is another option, which executes 1 task per node before executing additional tasks on each node.

Using MPI to execute IOR

Page 49: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Create a simple file called 'machinefile.txt' listing all the clients with the following options

•slots=4
•max_slots='Max Number of CPU Cores'

• In this example, 16 cores.

•Because we added the client IP addresses to the /etc/hosts file, we only need to use the associated hostname for each client listed in /etc/hosts.

•This is also true if DNS is used at the customer site; there is no need to define node names in /etc/hosts, or one can use the IPv4 addresses in the machine file.

Creating the machinefile on the head node

Page 50: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

[root@fvt-client1 ~]# vi machinefile.txt

client1 slots=4 max_slots=16

client2 slots=4 max_slots=16

client3 slots=4 max_slots=16

client4 slots=4 max_slots=16

…………

client13 slots=4 max_slots=16

client14 slots=4 max_slots=16

client15 slots=4 max_slots=16

client16 slots=4 max_slots=16

:wq!

•With the Xyratex-defined machinefile, and using the --byslot option, we will use 4 slots on the first node, then 4 slots on the second node, and so forth

•If using --bynode, we will round-robin the number of slots per node regardless of the machinefile configuration

•Slots = Tasks per node

Sample machinefile for ClusterStor

Page 51: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•A typical IOR parameter set for 16 nodes with 32GB of memory each is

/usr/lib64/openmpi/bin/mpirun -machinefile machinefile.txt -np 128 --byslot ./IOR -v -F -t 1m -b 8g -o /mnt/lustre/test.`date +"%Y%m%d.%H%M%S"`

•-np 128 = all 16 nodes used, 8 tasks (slots) per node

•-b 8g = (2x32GB*16_Clients)/128_tasks
•Typically all nodes will be uniform, so we have to use the lowest common denominator

•-o /mnt/lustre/test.`date +"%Y%m%d.%H%M%S"`
•We found that using a different output test file for each run provides better performance than reusing the same filename

•-F : file per process
•-t 1m: transfer size of 1m
•-v : verbose output

Defining IOR Parameters

Page 52: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Jumbo Frames give a >= 30% improvement in Lustre performance compared to the standard MTU of 1500

•Change the MTU on clients and servers to 9000
•Change the MTU on the switches to 9214 (or the maximum MTU size) to accommodate payload overhead

•Never set the switch MTU to the same value as the clients and servers; the switch needs the extra headroom for frame overhead

Ethernet Tuning

Page 53: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Writes: Buffered IO
•-F : file per process
•-t 1m
•Default is Buffered IO
•--byslot option
•-np and -b are chosen together (their product sets the total data moved) to achieve 6GB/s per SSU or better

•Reads: Direct IO
•-F: file per process
•-t 64m
•-B : Direct IO instead of the default Buffered IO
•-np and -b are chosen together to achieve 6GB/s per SSU or better

•--byslot option

Baseline IOR Performance for CS6000 MDRAID

Page 54: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Use the -w flag in IOR for write-only results with Buffered IO

•-F : file per process
•-t 1m
•Write-only operation: -w
•Default is Buffered IO
•--byslot option
•-np and -b are chosen together to achieve 6GB/s per SSU or better

•First, ensure a Lustre stripe size of 1m and a stripe count of 1
lfs setstripe -s 1m -c 1 /mnt/lustre/fpp

•Execute IOR for writes
mpirun -np 64 --byslot /bin/IOR -F -t 1mb -b 16g -w -o /mnt/lustre/fpp/testw.out

IOR Write for MDRAID

Page 55: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•We first need to write the data, then read it back using Direct IO

•-F: file per process
•Use the write and read flags: -w -r
•-t 64m
•-B : Direct IO instead of the default Buffered IO
•-np and -b are chosen together to achieve 6GB/s per SSU or better

•First, ensure a Lustre stripe size of 1m and a stripe count of 1
lfs setstripe -s 1m -c 1 /mnt/lustre/fpp

•Execute IOR for reads
mpirun -np 64 --byslot /bin/IOR -F -B -t 64mb -b 16g -w -r -o /mnt/lustre/fpp/testr.out

IOR Reads for MDRAID

Page 56: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Stonewall can be used to perform a write or read test for a fixed time in seconds to ensure only maximum performance is measured

•Removes unbalanced task completion that can affect performance results

•Good to use when clients have a non-uniform architecture

•If used, be sure to specify a much bigger block size in IOR (-b) and a long enough time to write or read 2x client memory

•Typically, I increase -b in IOR by a factor of 4x or more

Advanced IOR Option: Stonewall

Page 57: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Use the -w flag in IOR for write-only results with Buffered IO

•-F : file per process
•-t 1m
•Write-only operation: -w
•Default is Buffered IO
•-D 240 (4 min)
•-np and -b are chosen together to achieve 6GB/s per SSU or better

•First, ensure a Lustre stripe size of 1m and a stripe count of 1
lfs setstripe -s 1m -c 1 /mnt/lustre/fpp

•Execute IOR for writes
mpirun -np 64 --byslot /bin/IOR -F -t 1mb -b 512g -w -o /mnt/lustre/fpp/testw.out -D 240

IOR Write for MDRAID w/ Stonewall

Page 58: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•We first need to write the data, then read it back using Direct IO

•-F: file per process

• Use the write and read flags: -w -r
• -t 64m
• With Stonewall, we need to write without Stonewall first, then read back in a separate IOR command using Stonewall

• -k: keep the write output test files to read back using Stonewall
• -B : Direct IO instead of the default Buffered IO
• -np and -b are chosen together to achieve 6GB/s per SSU or better

IOR Reads for MDRAID w/ Stonewall

Page 59: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Step 1: Set the Lustre stripe size and stripe count
lfs setstripe -s 1m -c 1 /mnt/lustre/fpp

•Step 2: IOR write with Direct IO, with a block size large enough to read back at least 2x client memory, keeping the output file (we want the write to complete)

mpirun -np 64 --byslot /bin/IOR -F -B -t 64mb -b 64g -w -k -o /mnt/lustre/fpp/testr.out

•Step 3: IOR read back of the output test file from Step 2 with the Stonewall option

mpirun -np 64 --byslot /bin/IOR -F -B -t 64mb -b 64g -r -o /mnt/lustre/fpp/testr.out -D 240

IOR Reads for MDRAID w/ Stonewall

Page 60: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Client Side Benchmark for GridRAID

Typical IOR Client Configuration

Page 61: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•At customer sites, typically all clients have the same architecture, same number of CPU cores, and same amount of memory.

• NOTE: Our configuration for this training is a bit unique and required additional thought to get performance per SSU

•With a uniform client architecture, the parameters for IOR are simpler to tune and optimize for benchmarking

•A minimum of 16 clients per GridRAID SSU

Typical Client Configuration

Page 62: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•We always want to transfer 2x the total memory of all clients used, to avoid any client-side caching effects

•In our example:•(16*32GB)*2 = 1024GB

•Total file size for the IOR benchmark will be 1024GB

•NOTE: Typically all nodes are uniform.

IOR Rule of Thumb

Page 63: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•mpirun is used to execute IOR across all clients from the head node
•Within mpirun we use the following options

•-machinefile machinefile.txt
•-np : the total number of tasks to execute across all clients (e.g. -np 64 means 8 tasks per client with 8 clients)

•--byslot
•The -machinefile option points to a simple text file listing all clients that will execute the IOR benchmark

•-np defines the number of tasks
•--byslot defines how many tasks are executed on the first node before starting additional tasks on the second node, and so forth. This is tied to how the machinefile options are defined

•--bynode is another option, which executes 1 task per node before executing additional tasks on each node.

Using MPI to execute IOR

Page 64: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Create a simple file called 'machinefile.txt' listing all the clients with the following options

•slots=4
•max_slots='Max Number of CPU Cores'

• In this example, 16 cores.

•Because we added the client IP addresses to the /etc/hosts file, we only need to use the associated hostname for each client listed in /etc/hosts.

•This is also true if DNS is used at the customer site; there is no need to define node names in /etc/hosts, or one can use the IPv4 addresses in the machine file.

Creating the machinefile on the head node

Page 65: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

[root@fvt-client1 ~]# vi machinefile.txt

client1 slots=4 max_slots=16

client2 slots=4 max_slots=16

client3 slots=4 max_slots=16

client4 slots=4 max_slots=16

…………

client13 slots=4 max_slots=16

client14 slots=4 max_slots=16

client15 slots=4 max_slots=16

client16 slots=4 max_slots=16

:wq!

•With the Xyratex-defined machinefile, and using the --byslot option, we will use 4 slots on the first node, then 4 slots on the second node, and so forth

•If using --bynode, we will round-robin the number of slots per node regardless of the machinefile configuration

•Slots = Tasks per node

Sample machinefile for ClusterStor

Page 66: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•A typical IOR parameter set for 16 nodes with 32GB of memory each is

/usr/lib64/openmpi/bin/mpirun -machinefile machinefile.txt -np 128 --byslot ./IOR -v -F -t 1m -b 8g -o /mnt/lustre/test.`date +"%Y%m%d.%H%M%S"`

•-np 128 = all 16 nodes used, 8 tasks (slots) per node

•-b 8g = (2x32GB*16_Clients)/128_tasks
•Typically all nodes will be uniform, so we have to use the lowest common denominator

•-o /mnt/lustre/test.`date +"%Y%m%d.%H%M%S"`
•We found that using a different output test file for each run provides better performance than reusing the same filename

•-F : file per process
•-t 1m: transfer size of 1m
•-v : verbose output

Defining IOR Parameters

Page 67: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Jumbo Frames give a >= 30% improvement in Lustre performance compared to the standard MTU of 1500

•Change the MTU on clients and servers to 9000
•Change the MTU on the switches to 9214 (or the maximum MTU size) to accommodate payload overhead

•Never set the switch MTU to the same value as the clients and servers; the switch needs the extra headroom for frame overhead

Ethernet Tuning

Page 68: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Writes and reads in a single operation: Direct IO
•-F : file per process
•-t 64m
•-B : Direct IO instead of the default Buffered IO
•--byslot option
•-np and -b are chosen together to achieve 6GB/s per SSU or better

Baseline IOR Performance for ClusterStor GridRAID

Page 69: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Use the -w -r flags in IOR for write and read results with Direct IO

•-F : file per process
•-t 64m
•Write and read operation: -w -r
•Direct IO: -B
•--byslot option
•-np and -b are chosen together to achieve 6GB/s per SSU or better

•First, ensure a Lustre stripe size of 1m and a stripe count of 1
lfs setstripe -s 1m -c 1 /mnt/lustre/fpp

•Execute IOR for writes and reads
mpirun -np 256 --byslot /bin/IOR -F -B -t 64mb -b 16g -w -r -o /mnt/lustre/fpp/testw.out

IOR Write/Read for GridRAID

Page 70: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Stonewall can be used to perform a write or read test for a fixed time in seconds to ensure only maximum performance is measured

•Removes unbalanced task completion that can affect performance results

•Good to use when clients have a non-uniform architecture

•If used, be sure to specify a much bigger block size in IOR (-b) and a long enough time to write or read 2x client memory

•Typically, I increase -b in IOR by a factor of 4x or more

Advanced IOR Option: Stonewall

Page 71: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Use the -w flag in IOR for write-only results with Direct IO

•-F : file per process
•-t 64m
•Write-only operation: -w
•Direct IO: -B
•-D 240 (4 min)
•Keep the output files: -k
•-np and -b are chosen together to achieve 6GB/s per SSU or better

•First, ensure a Lustre stripe size of 1m and a stripe count of 1
lfs setstripe -s 1m -c 1 /mnt/lustre/fpp

•Execute IOR for writes
mpirun -np 64 --byslot /bin/IOR -F -B -t 64mb -b 512g -w -k -o /mnt/lustre/fpp/testw.out -D 240

IOR Write for GridRAID w/ Stonewall

Page 72: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Use the -r flag in IOR for read-only results with Direct IO

•-F : file per process
•-t 64m
•Read-only operation: -r
•Direct IO: -B
•-D 240 (4 min)
•-np and -b are chosen together to achieve 6GB/s per SSU or better

•First, ensure a Lustre stripe size of 1m and a stripe count of 1
lfs setstripe -s 1m -c 1 /mnt/lustre/fpp

•Execute IOR for reads
mpirun -np 64 --byslot /bin/IOR -F -B -t 64mb -b 512g -r -o /mnt/lustre/fpp/testw.out -D 120

IOR Read for GridRAID w/ Stonewall

Page 73: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Training Systems

Page 74: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

10.0.159.100 hvt-super00 super00 hvt-client001 (SL6.4, 2.6.32-358.el6.x86_64, 128GB, 8 Cores)

10.0.159.101 hvt-super01 super01 hvt-client002 (SL6.4, 2.6.32-358.el6.x86_64, 128GB, 8 Cores)

10.0.159.102 hvt-super02 super02 hvt-client00 (SL6.4, 2.6.32-358.el6.x86_64, 128GB, 8 Cores)

10.0.159.103 hvt-super03 super03 hvt-client004 (SL6.4, 2.6.32-358.el6.x86_64, 128GB, 8 Cores)

Group 1: Ron C. / Tony

10.0.159.104 hvt-asus100 asus100 hvt-client005 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

10.0.159.105 hvt-asus101 asus101 hvt-client006 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

10.0.159.106 hvt-asus102 asus102 hvt-client007 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

Group 2: Rex T / Bill L

10.0.159.107 hvt-asus103 asus103 hvt-client008 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

10.0.159.108 hvt-asus200 asus200 hvt-client009 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

10.0.159.109 hvt-asus201 asus201 hvt-client010 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

Group 3: Randy / Mike S

10.0.159.112 hvt-asus300 asus300 hvt-client013 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

10.0.159.113 hvt-asus301 asus301 hvt-client014 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

10.0.159.114 hvt-asus302 asus302 hvt-client016 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

10.0.159.115 hvt-asus303 asus303 hvt-client016 (SL6.4, 2.6.32-358.el6.x86_64, 256GB, 24 Cores)

Group 4: Ron M / Dan N

Compute Clients

Page 75: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

# ssh [email protected]

[root@tsesys2n00 ~]# cscli fs_info

-------------------------------------------------------------------------------------
Information about "tsefs2" file system:
-------------------------------------------------------------------------------------
Node        Node type  Targets  Failover partner  Devices
-------------------------------------------------------------------------------------
tsesys2n02  mgs        0 / 0    tsesys2n03

tsesys2n03 mds 1 / 1 tsesys2n02 /dev/md66

tsesys2n04 oss 1 / 1 tsesys2n05 /dev/md0

tsesys2n05 oss 1 / 1 tsesys2n04 /dev/md1

tsesys2n06 oss 1 / 1 tsesys2n07 /dev/md0

tsesys2n07 oss 1 / 1 tsesys2n06 /dev/md1

[root@tsesys2n00 ~]# ssh tsesys2n02 'lctl list_nids'

172.21.2.3@o2ib

Mount Command from Clients:

mount -t lustre 172.21.2.3@o2ib:172.21.2.4@o2ib:/tsefs2 /mnt/tsefs2

CS6000 GridRAID System

Page 76: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

# ssh [email protected]

[root@hvt1sys00 ~]# cscli fs_info

----------------------------------------------------------------------------------------------------

Information about "fs1" file system:

----------------------------------------------------------------------------------------------------

Node Node type Targets Failover partner Devices

----------------------------------------------------------------------------------------------------

hvt1sys02 mgs 0 / 0 hvt1sys03

hvt1sys03 mds 1 / 1 hvt1sys02 /dev/md66

hvt1sys04 oss 4 / 4 hvt1sys05 /dev/md0, /dev/md2, /dev/md4, /dev/md6

hvt1sys05 oss 4 / 4 hvt1sys04 /dev/md1, /dev/md3, /dev/md5, /dev/md7

[root@hvt1sys00 ~]# ssh hvt1sys02 'lctl list_nids'

172.18.1.4@o2ib

Mount Command from Clients:

mount -t lustre 172.18.1.4@o2ib:172.18.1.5@o2ib:/fs1 /mnt/fs1

CS6000 MDRAID System

Page 77: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•We always want to transfer 2x the total memory of all clients used, to avoid any client-side caching effects

•In our example:
•Group 1: (4_nodes*128GB)*2 = 1024GB
•Group 2: (3_nodes*256GB)*2 = 1536GB
•Group 3: (3_nodes*256GB)*2 = 1536GB
•Group 4: (4_nodes*256GB)*2 = 2048GB

•This will be used to determine the IOR -b flag based on the mpirun -np flag and the number of tasks

IOR Transfer Size

Page 78: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Group 1

[root@fvt-super00 ~]# vi machinefile.txt

super00 slots=4 max_slots=8

super01 slots=4 max_slots=8

super02 slots=4 max_slots=8

super03 slots=4 max_slots=8

Group 2

[root@fvt-asus100 ~]# vi machinefile.txt

asus100 slots=4 max_slots=24

asus101 slots=4 max_slots=24

asus102 slots=4 max_slots=24

Group 3

[root@fvt-asus103 ~]# vi machinefile.txt

asus103 slots=4 max_slots=24

asus201 slots=4 max_slots=24

asus202 slots=4 max_slots=24

Group 4

[root@fvt-asus300 ~]# vi machinefile.txt

asus300 slots=4 max_slots=24

asus301 slots=4 max_slots=24

asus302 slots=4 max_slots=24

asus303 slots=4 max_slots=24

Lab machinefile for Each Group

Page 79: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Before we run IOR, we want to confirm the configuration we just created.

•For baselining the performance of each SSU using IOR, we need to make sure the stripe count for each directory is set to 1 and the stripe size is set to 1M

•The commands to do this are shown below.
•For example, create a directory called benchmark under the Lustre mount point on a client

# mkdir /mnt/lustre/benchmark

•Set the Lustre stripe count to 1 and stripe size to 1m on a client

# lfs setstripe -c 1 -s 1m /mnt/lustre/benchmark

Create and Confirm Lustre Stripe Size/Count from the Client
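•To confirm the settings took effect (a sketch), lfs getstripe reports the directory's default stripe count and stripe size:

# lfs getstripe /mnt/lustre/benchmark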

Page 80: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Group 1
•mpirun flags

• -np 32 = 4 clients will execute 8 tasks (slots) each
•--byslot distribution

•IOR flags
• -b 32g = We are transferring 1024GB total, 1024GB/32 = 32g

• -o /mnt/lustre/test.`date +"%Y%m%d.%H%M%S"`
• We found that using a different output test file for each run provides better performance than reusing the same filename
• Or explicitly use -o /mnt/lustre/test.0

• -F : file per process
•-t 1m: transfer size of 1m
•-w -r: write and read flags in IOR
•-k: keep output files

Defining IOR Parameters for Group 1
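•Putting the Group 1 flags together, a sketch of the full command line (assuming the Group 1 machinefile shown earlier, IOR copied to /root on every node, and the striped /mnt/lustre/benchmark directory from the previous slide):

/usr/lib64/openmpi/bin/mpirun -machinefile machinefile.txt -np 32 --byslot ./IOR -v -F -t 1m -b 32g -w -r -k -o /mnt/lustre/benchmark/test.`date +"%Y%m%d.%H%M%S"`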

Page 81: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Group 2
•mpirun flags

• -np 24 = 3 clients will execute 8 tasks (slots) each
•--byslot distribution

•IOR flags
• -b 64g = We are transferring 1536GB total, 1536GB/24 = 64g

• -o /mnt/lustre/test.`date +"%Y%m%d.%H%M%S"`
• We found that using a different output test file for each run provides better performance than reusing the same filename
• Or explicitly use -o /mnt/lustre/test.0

• -F : file per process
•-t 1m: transfer size of 1m
•-w -r: write and read flags in IOR
•-k: keep output files

Defining IOR Parameters for Group 2

Page 82: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Group 3
•mpirun flags

• -np 24 = 3 clients will execute 8 tasks (slots) each
•--byslot distribution

•IOR flags
• -b 64g = We are transferring 1536GB total, 1536GB/24 = 64g

• -o /mnt/lustre/test.`date +"%Y%m%d.%H%M%S"`
• We found that using a different output test file for each run provides better performance than reusing the same filename
• Or explicitly use -o /mnt/lustre/test.0

• -F : file per process
•-t 1m: transfer size of 1m
•-w -r: write and read flags in IOR
•-k: keep output files

Defining IOR Parameters for Group 3

Page 83: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Group 4
•mpirun flags
• -np 32 = 4 clients will execute 8 tasks (slots) each
• --byslot distribution
•IOR flags
• -b 64g = We are transferring 2048GB total, 2048GB/32 = 64g
• -o /mnt/lustre/test.`date +"%Y%m%d.%H%M%S"`
• Using a different output file for each run was found to give better performance than reusing the same filename
• Or explicitly use -o /mnt/lustre/test.0
• -F: File per process
• -t 1m: Transfer size of 1m
• -w -r: Write and read flags in IOR
• -k: Keep output files

Defining IOR Parameters for Group 4

Page 84: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•First parameters changed:
• Disable wire checksums
• Disable LRU
• Increase max_rpcs_in_flight to 32 or 256 (1.8.9 or 2.4.x/2.5.1)
• Increase max_dirty_mb to 128 or 256 (1.8.9 or 2.4.x/2.5.1)

Procedure for Client-Side Tuning

Page 85: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Disable Client Checksums with the specific FS name of cstorfs

[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/checksums; do echo 0 > $n; done'

•Increase Max RPCs in Flight from the default 8 to 32
[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_rpcs_in_flight; do echo 32 > $n; done'

•Disable dynamic LRU resizing by setting a static LRU size (each client computes its own CPU count with nproc)
[root@fvt-client1 ~]# pdsh -w client[1-12] 'lctl set_param ldlm.namespaces.*osc*.lru_size=$(( $(nproc) * 100 ))'

•Increase Max Dirty MB from 32 to 128
[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_dirty_mb; do echo 128 > $n; done'

•NOTE: These settings are not persistent and will need to be reapplied if Lustre is re-mounted or the client is rebooted. A quick way to confirm the current values is shown below.
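•A minimal way to confirm the tunables took effect on every client, assuming the same pdsh node list and the cstorfs file system name used above:

pdsh -w client[1-12] 'lctl get_param osc.cstorfs-OST00*.max_rpcs_in_flight osc.cstorfs-OST00*.max_dirty_mb osc.cstorfs-OST00*.checksums'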

Client Lustre Tuning Parameters for 1.8.9

Page 86: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Disable Client Checksums with the specific FS name of cstorfs

[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/checksums; do echo 0 > $n; done'

•Increase Max RPCs in Flight from the default 8 to 256
[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_rpcs_in_flight; do echo 256 > $n; done'

•Disable dynamic LRU resizing by setting a static LRU size (each client computes its own CPU count with nproc)
[root@fvt-client1 ~]# pdsh -w client[1-12] 'lctl set_param ldlm.namespaces.*osc*.lru_size=$(( $(nproc) * 100 ))'

•Increase Max Dirty MB from 32 to 256
[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_dirty_mb; do echo 256 > $n; done'

•NOTE: These settings are not persistent and will need to be reapplied if Lustre is re-mounted or the client is rebooted.

Client Lustre Tuning Parameters for 2.4.x/2.5.1

Page 87: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Checking the checksum algorithm on the client
[root@fvt-client1 ~]# pdsh -w client[1-12] 'cat /proc/fs/lustre/osc/cstorfs-OST00*/checksum_type'

•The default is adler, but it can be changed to crc32c
[root@fvt-client1 ~]# pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/checksum_type; do echo crc32c > $n; done'
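•The same change can also be made through lctl instead of writing to /proc directly; a minimal sketch, assuming the cstorfs file system name used throughout this deck:

pdsh -w client[1-12] 'lctl set_param osc.cstorfs-OST00*.checksum_type=crc32c'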

Checksum Algorithm on Client

Page 88: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

[root@fvt-client1 ~]# cat clientprep.sh

#! /bin/bash

echo "Mount Lustre"

pdsh -w client[1-12] 'mount -t lustre 172.18.1.3@o2ib0:172.18.1.4@o2ib0:/cstorfs /mnt/lustre'

sleep 3

echo "Disable Client Checksums"

pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/checksums; do echo 0 > $n; done'

sleep 3

echo "Increase Max RPCs in Flight from 8 to 32"

pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_rpcs_in_flight; do echo 32 > $n; done'

sleep 3

echo "Increase Max Dirty MB from 32 to 128"

pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_dirty_mb; do echo 128 > $n; done'

sleep 3

echo "Disable LRU"

pdsh -w client[1-12] 'lctl set_param ldlm.namespaces.*osc*.lru_size=$(( $(nproc) * 100 ))'

sleep 3

echo "Done Prepping Lustre Clients[1-12] for Benchmarking"

Script to Mount and Tune Clients for 1.8.9

Page 89: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

[root@fvt-client1 ~]# cat clientprep.sh

#! /bin/bash

echo "Mount Lustre"

pdsh -w client[1-12] 'mount -t lustre 172.18.1.3@o2ib0:172.18.1.4@o2ib0:/cstorfs /mnt/lustre'

sleep 3

echo "Disable Client Checksums"

pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/checksums; do echo 0 > $n; done'

sleep 3

echo "Increase Max RPCs in Flight from 8 to 256"

pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_rpcs_in_flight; do echo 256 > $n; done'

sleep 3

echo "Increase Max Dirty MB from 32 to 256"

pdsh -w client[1-12] 'for n in /proc/fs/lustre/osc/cstorfs-OST00*/max_dirty_mb; do echo 256 > $n; done'

sleep 3

echo "Disable LRU"

pdsh -w client[1-12] 'lctl set_param ldlm.namespaces.*osc*.lru_size=$(( $(nproc) * 100 ))'

sleep 3

echo "Done Prepping Lustre Clients[1-12] for Benchmarking"

Script to Mount and Tune Clients for 2.4.x/2.5.x

Page 90: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

References

Page 91: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Benchmarking Best Practices (http://goo.gl/3wSY8M)
•Rice Oil and Gas Talk: Tuning and Measuring Performance (http://goo.gl/CKoodO)

•LUG 2014 Client Performance Results (http://goo.gl/e7XVLG)

•IEEE Paper, Torben Kling-Petersen and John Fragalla, on Optimizing Performance for an HPC Storage System (http://goo.gl/e7XVLG)

•Neo Performance Folder (http://goo.gl/CKoodO)

References

Page 92: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Training Module: Client Connectivity/Configuration

Benchmarking

[email protected]

[email protected]

Thank You

Page 93: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Server Lustre Configuration
Server Side Storage Pool Configuration

Page 94: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•On the client, use lsof to find the process(es) still attached to the Lustre mount point

# lsof | grep /mnt/lustre

•Kill the process, using -9 if needed
# kill -9 <PID_from_LSOF>
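•A minimal one-liner combining both steps (lsof -t prints only PIDs); use with care, since it kills every process holding files open on the mount point:

# kill -9 $(lsof -t /mnt/lustre)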

Can't umount Lustre on Client

Page 95: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Listing all OSTs can be done by running the following command from the MDS server (node 3).

[admin@lmtest403 ~]$ lctl dl

0 UP mgc MGC172.18.1.3@o2ib c857b2ef-f624-e14a-9fb1-8ad7525e4fe4 5

1 UP lov cstorfs-MDT0000-mdtlov cstorfs-MDT0000-mdtlov_UUID 4

2 UP mdt cstorfs-MDT0000 cstorfs-MDT0000_UUID 3

3 UP mds mdd_obd-cstorfs-MDT0000 mdd_obd_uuid-cstorfs-MDT0000 3

4 UP osc cstorfs-OST0015-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

5 UP osc cstorfs-OST000d-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

6 UP osc cstorfs-OST0005-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

7 UP osc cstorfs-OST0012-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

8 UP osc cstorfs-OST000a-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

9 UP osc cstorfs-OST0002-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

10 UP osc cstorfs-OST0016-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

11 UP osc cstorfs-OST000e-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

12 UP osc cstorfs-OST0006-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

13 UP osc cstorfs-OST0013-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

14 UP osc cstorfs-OST000b-osc-MDT0000 cstorfs-MDT0000-mdtlov_UUID 5

•<TOO MANY TO PRINT OUT>
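•When the device list is long, a sorted summary of just the OSC device names can make the per-SSU mapping on the next slide easier to read; a minimal sketch, assuming the lctl dl output format shown above:

[admin@lmtest403 ~]$ lctl dl | awk '$3 == "osc" {print $4}' | sort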

Storage Pools: List all OSTs

Page 96: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•From the output from the MDS node, we see the following OST assignment per SSU

•SSU1: OST[0000-0007]
•SSU2: OST[0008-000f]
•SSU3: OST[0010-0017]

Storage Pools: OSTs per SSU

Page 97: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Creating storage pools is done on the node where the MGS target is mounted, which is primarily node 02. We will create 3 storage pools, 1 per SSU, with the following commands run as root

[root@lmtest402 ~]# lctl pool_new cstorfs.ssu1

[root@lmtest402 ~]# lctl pool_new cstorfs.ssu2

[root@lmtest402 ~]# lctl pool_new cstorfs.ssu3

[root@lmtest402 ~]# lctl pool_add cstorfs.ssu1 OST[0000-0007]

[root@lmtest402 ~]# lctl pool_add cstorfs.ssu2 OST[0008-000f]

[root@lmtest402 ~]# lctl pool_add cstorfs.ssu3 OST[0010-0017]

•We will see output similar to the following:
Pool cstorfs.ssu3 not found

Can't verify pool command since there is no local MDT or client, proceeding anyhow…

Pool cstorfs.ssu3 not found

Can't verify pool command since there is no local MDT or client, proceeding anyhow...

•This is OK because the MGS is not mounted with the MDT. If the MGS and MDT were mounted on the same node, we would not see the above warnings.

Storage Pools: Creation

Page 98: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•First, list each pool's contents, then remove the OSTs from the storage pools created on the MGS node

[root@lmtest402 ~]# lctl pool_list cstorfs.ssu1

[root@lmtest402 ~]# lctl pool_list cstorfs.ssu2

[root@lmtest402 ~]# lctl pool_list cstorfs.ssu3

[root@lmtest402 ~]# lctl pool_remove cstorfs.ssu1 OST[0000-0007]

[root@lmtest402 ~]# lctl pool_remove cstorfs.ssu2 OST[0008-000f]

[root@lmtest402 ~]# lctl pool_remove cstorfs.ssu3 OST[0010-0017]

•Destroy (delete) the storage pools
[root@lmtest402 ~]# lctl pool_destroy cstorfs.ssu1

[root@lmtest402 ~]# lctl pool_destroy cstorfs.ssu2

[root@lmtest402 ~]# lctl pool_destroy cstorfs.ssu3

Storage Pools: Deletion

Page 99: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Server Lustre Configuration
Client Side Storage Pool Configuration

Page 100: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Pick any client with Lustre mounted and change directory to the Lustre mount point

[root@fvt-client1 lustre]# cd /mnt/lustre

•Let's first list, from this client, the storage pools we just created on the MGS node

[root@fvt-client1 lustre]# lctl pool_list cstorfs

Pools from cstorfs:

cstorfs.ssu3

cstorfs.ssu2

cstorfs.ssu1

•Let's check the OST assignment of one of the storage pools
[root@fvt-client1 lustre]# lctl pool_list cstorfs.ssu1

Pool: cstorfs.ssu1

cstorfs-OST0000_UUID

cstorfs-OST0001_UUID

cstorfs-OST0002_UUID

cstorfs-OST0003_UUID

cstorfs-OST0004_UUID

cstorfs-OST0005_UUID

cstorfs-OST0006_UUID

cstorfs-OST0007_UUID

Configuration Lustre to use Storage Pools

Page 101: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Still in the /mnt/lustre directory on the same Lustre client, create 3 subdirectories, one for each storage pool

[root@fvt-client1 lustre]# mkdir ssu1

[root@fvt-client1 lustre]# mkdir ssu2

[root@fvt-client1 lustre]# mkdir ssu3

•Assign each storage pool we created to its directory, with a stripe count of 1 and a stripe size of 1m

[root@fvt-client1 lustre]# lfs setstripe -p cstorfs.ssu1 -c 1 -s 1m /mnt/lustre/ssu1

[root@fvt-client1 lustre]# lfs setstripe -p cstorfs.ssu2 -c 1 -s 1m /mnt/lustre/ssu2

[root@fvt-client1 lustre]# lfs setstripe -p cstorfs.ssu3 -c 1 -s 1m /mnt/lustre/ssu3
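•With each pool attached to its own directory, each SSU can be baselined in turn simply by pointing the IOR output file into that directory; a minimal sketch, reusing the Group 1 mpirun/IOR flags and the hypothetical IOR path assumed earlier:

for ssu in ssu1 ssu2 ssu3; do
    mpirun -np 32 --hostfile machinefile.txt --byslot /usr/local/bin/IOR -F -t 1m -b 32g -w -r -k -o /mnt/lustre/${ssu}/test.`date +"%Y%m%d.%H%M%S"`
done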

Assigning a Storage Pool to a Sub Directory

Page 102: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•Before we run IOR, we want to confirm our configuration we just created.

•For baselining performance of each SSU using IOR, we need to make sure the stripe count for each directory is set to 1 and the stripe size to 1M

•The command and output to do this is on the next slide.

Confirm Lustre Stripe/Count and Pools

Page 103: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•[root@fvt-client1 lustre]# lfs getstripe /mnt/lustre

•/mnt/lustre

•stripe_count: 1 stripe_size: 1048576 stripe_offset: -1

•/mnt/lustre/ssu3

•stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 pool: ssu3

•/mnt/lustre/ssu1

•stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 pool: ssu1

•/mnt/lustre/ssu2

•stripe_count: 1 stripe_size: 1048576 stripe_offset: -1 pool: ssu2

Confirm Lustre Stripe/Count and Pools

Page 104: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

Server Lustre Configuration
Creating Storage Pools for Testing

Page 105: Client Configuration Lustre Benchmarking. Client Setup and Package Installation Client Lustre Configuration Client Tuning Parameters Lustre Striping Benchmarking

•We need to create Lustre Storage Pools because 8 clients cannot stress our 3 SSU test system.

•Storage pools allow us to test each SSU individually by directing the IOR output to a single pool, so each SSU can be baselined on its own.

•Configuring Storage Pools is a combination of MGS and Client side configurations.

•NOTE: This is typically not needed for baseline performance at Customer sites.

Lustre Storage Pools