
Linux I/O performance
An end-to-end methodology for maximizing Linux I/O performance on the IBM System x servers in a typical SAN environment.

David Quenzler
IBM Systems and Technology Group, ISV Enablement
June 2012

© Copyright IBM Corporation, 2012


Table of contents

Abstract
Introduction
External storage subsystem - XIV
External SAN switches
    Bottleneck monitoring
    Fabric parameters
    Basic port configuration
    Advanced port configuration
Host adapter placement rules
System BIOS settings
HBA BIOS settings
Linux kernel parameters
Linux memory settings
    Page size
    Transparent huge pages
Linux module settings - qla2xxx
Linux SCSI subsystem tuning - /sys
Linux XFS file system create options
Linux XFS file system mount options
Red Hat tuned
    ktune.sh
    ktune.sysconfig
    sysctl.ktune
    tuned.conf
Linux multipath
Sample scripts
Summary
Resources
About the author
Trademarks and special notices


Abstract

This white paper discusses an end-to-end approach for Linux I/O tuning in a typical data center environment consisting of external storage subsystems, storage area network (SAN) switches, IBM System x Intel servers, Fibre Channel host bus adapters (HBAs) and 64-bit Red Hat Enterprise Linux.

Anyone with an interest in I/O tuning is welcome to read this white paper.

Introduction

Linux® I/O tuning is complex. In a typical environment, I/O makes several transitions from the client application out to disk and vice versa. There are many pieces to the puzzle.

We will examine the following topics in detail:

External storage subsystems
External SAN switches
Host adapter placement rules
System BIOS settings
Adapter BIOS settings
Linux kernel parameters
Linux memory settings
Linux module settings
Linux SCSI subsystem settings
Linux file system create options
Linux file system mount options
Red Hat tuned
Linux multipath

You should follow an end-to-end tuning methodology in order to minimize the risk of poor tuning.

Recommendations in this white paper are based on the following environment under test:

IBM® System x® 3850 (64 processors and 640 GB RAM)
Red Hat Enterprise Linux 6.1 x86_64
The Linux XFS file system
IBM XIV® external storage subsystem, Fibre Channel (FC) attached

An architecture comprising IBM hardware and Red Hat Linux provides a solid framework for maximizing I/O performance.


External storage subsystem - XIV

The XIV has few manual tunables. Here are a few tips:

Familiarize yourself with the XIV command-line interface (XCLI) as documented in the IBM XIV Storage System User Manual.
Ensure that you connect the XIV system to your environment in the FC fully redundant configuration as documented in the XIV Storage System: Host Attachment and Interoperability guide from IBM Redbooks®.

Figure 1: FC fully redundant configuration

Although you can define up to 12 paths per host, a maximum of six paths per host provides sufficient redundancy and performance.

Useful XCLI commands:

# module_list -t all

# module_list -x

# fc_port_list

The XIV storage subsystem contains six FC data modules (4 to 9), each with 8 GB memory. The FC rate is 4 Gbps and the data partition size is 1 MB.

Check the XIV HBA queue depth setting: the higher the host HBA queue depth, the more parallel I/O goes to the XIV system, but each XIV port can sustain only up to 1400 concurrent I/Os to the same target or logical unit (LUN). Therefore, the number of connections multiplied by the host HBA queue depth should not exceed that value. The number of connections should take the multipath configuration into account.

Note: The XIV queue limit is 1400 per XIV FC host port and 256 per LUN per worldwide port name (WWPN) per port.


Twenty-four multipath connections to the XIV system would dictate that the host queue depth be set to 58 (24 * 58 = 1392, which stays under the 1400 limit).
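The same arithmetic can be scripted as a quick check. This is only a sketch; the path count is an assumed input:

#!/bin/sh
# Sketch only: largest per-host HBA queue depth that keeps paths * queue_depth
# under the 1400 concurrent I/O limit per XIV FC port
paths=24                  # assumed number of multipath connections
echo $(( 1400 / paths ))  # prints 58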

Check the operating system (OS) disk queue depth (see below).
Make use of the XIV host attachment kit for RHEL.

Useful commands:

# xiv_devlist


External SAN switches

As a best practice, set SAN switch port speeds to Auto (auto-negotiate).

Typical bottlenecks are:

Latency bottleneck
Congestion bottleneck

Latency bottlenecks occur when frames are sent faster than they can be received. This can be due to buffer credit starvation or slow-drain devices in the fabric.

Congestion bottlenecks occur when the required throughput exceeds the physical data rate for the connection.

Most SAN switch web interfaces can be used to monitor basic performance metrics, such as throughput utilization, aggregate throughput, and percentage of utilization.

The Fabric OS command-line interface (CLI) can also be used to create frame monitors. These monitors analyze the first 64 bytes of each frame and can detect various types of protocols. Some performance features, such as frame monitor configuration (fmconfig), require a license.

Some useful commands:

switch:admin> perfhelp
switch:admin> perfmonitorshow
switch:admin> perfaddeemonitor
switch:admin> fmconfig

Bottleneck monitoring

Enable bottleneck monitoring on SAN switches by using the following command:

switch:admin> bottleneckmon --enable -alert

Useful commands

switch:admin> bottleneckmon --status

switch:admin> bottleneckmon --show -interval 5 -span 300

switch:admin> switchstatusshow

switch:admin> switchshow

switch:admin> configshow

switch:admin> configshow -pattern "fabric"

switch:admin> diagshow

switch:admin> porterrshow


Fabric parameters

Fabric parameters are described in the following table. Default values are in brackets []:

Fabric parameter | Description
BBCredit | Increasing the buffer-to-buffer (BB) credit parameter may increase performance by buffering FC frames coming from 8 Gbps FC server ports and going to 4 Gbps FC ports on the XIV. SAN segments can run at different rates. Frame pacing (BB credit starvation) occurs when no more BB credits are available. The frame pacing delay (AVG FRAME PACING) should always be zero; if it is not, increase the buffer credits. Over-increasing the number of BB credits does not increase performance. [16]
E_D_TOV | Error Detect TimeOut Value [2000]
R_A_TOV | Resource Allocation TimeOut Value [10000]
dataFieldSize | 512, 1024, 2048, or 2112 [2112]
Sequence Level Switching | Under normal conditions, disable for better performance (interleave frames, do not group frames) [0]
Disable Device Probing | Set this mode only if N_Port discovery causes attached devices to fail [0]
Per-Frame Routing Priority | [0]
Suppress Class F Traffic | Used with ATM gateways only [0]
Insistent Domain ID Mode | fabric.ididmode [0]

Table 1: Fabric parameters (default values are in brackets [])

Basic port configuration

Target rate limiting (ratelim) is used to minimize congestion at the adapter port caused by a slow-drain device operating in the fabric at a slower speed (for example, a 4 Gbps XIV system).

Advanced port configuration

Turning on Interrupt Control Coalesce and increasing the latency monitor timeout value can improve performance by reducing interrupts and processor utilization.


Host adapter placement rules

It is extremely important for you to follow the adapter placement rules for your server in order to minimize PCI bus saturation.

System BIOS settings

Use recommended CMOS settings for your IBM System x server.

You can use the IBM Advanced Settings Utility (asu64) to modify the System x BIOS settings from the Linux command line. It is normally installed in /opt/ibm/toolscenter/asu.

ASU normally tries to communicate over the LAN through the USB interface. Disable the LAN over USB interface with the following command:

# asu64 set IMM.LanOverUsb Disabled --kcs

The following settings can result in better performance

uEFI.TurboModeEnable=Enable

uEFI.PerformanceStates=Enable

uEFI.PackageCState=ACPI C3

uEFI.ProcessorC1eEnable=Disable

uEFI.DDRspeed=Max Performance

uEFI.QPISpeed=Max Performance

uEFI.EnergyManager=Disable

uEFI.OperatingMode=Performance Mode
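The settings above can also be applied in a loop with asu64. This is only a sketch: it assumes the asu64 set <setting> <value> form shown in this section, and the exact setting names and values should be verified with asu64 show for your firmware level.

#!/bin/sh
# Sketch: apply the recommended uEFI settings (verify names and values with 'asu64 show')
asu64 set uEFI.TurboModeEnable "Enable"
asu64 set uEFI.PerformanceStates "Enable"
asu64 set uEFI.PackageCState "ACPI C3"
asu64 set uEFI.ProcessorC1eEnable "Disable"
asu64 set uEFI.DDRspeed "Max Performance"
asu64 set uEFI.QPISpeed "Max Performance"
asu64 set uEFI.EnergyManager "Disable"
asu64 set uEFI.OperatingMode "Performance Mode"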

Also, depending on the workload, either enabling or disabling Hyper-Threading can improve application performance.

Useful commands:

# asu64 show

# asu64 show --help

# asu64 set IMM.LanOverUsb Disabled --kcs

# asu64 set uEFI.OperatingMode Performance

HBA BIOS settings

You can use the QLogic SANSurfer command-line utility (scli) to show or modify HBA settings.


Display current HBA parameter settings:
# scli -c

Display WWPNs only:
# scli -c | grep WWPN

Display settings only:
# scli -c | grep \: | grep -v WWPN | sort | uniq -c

Restore default settings:
# scli -n all default

Table 2: Modifying HBA settings

WWPNs can also be determined from the Linux command line by using a small script:

#!/bin/sh
###
# Print the WWPN of each FC HBA reported by lspci
hba_location=$(lspci | grep HBA | awk '{print $1}')

for adapter in $hba_location
do
    # Each PCI address maps to an fc_host entry under /sys/devices
    cat $(find /sys/devices -name \*${adapter})/host*/fc_host/host*/port_name
done

Listing 1: Determining WWPNs

HBA parameters as reported by the scli command appear in the following table:

Parameter | Default value
Connection Options | 2 - Loop Preferred, Otherwise Point-to-Point
Data Rate | Auto
Enable FC Tape Support | Disabled
Enable Hard Loop ID | Disabled
Enable Host HBA BIOS | Disabled
Enable LIP Full Login | Yes
Enable Target Reset | Yes
Execution Throttle | 16
Frame Size | 2048
Hard Loop ID | 0
Interrupt Delay Timer (100ms) | 0
Link Down Timeout (seconds) | 30
Login Retry Count | 8
Loop Reset Delay (seconds) | 5
LUNs Per Target | 128
Operation Mode | 0
Out Of Order Frame Assembly | Disabled
Port Down Retry Count | 30 seconds

Table 3: HBA BIOS tunable parameters (sorted)

Use the lspci command to show which type(s) of Fibre Channel adapters exist in the system. For example:

# lspci | grep HBA

Note: Adapters from different vendors have different default values.


Linux kernel parameters

The available options for the Linux I/O scheduler (elevator) are noop, anticipatory, deadline, or cfq.

echo "Linux: SCHEDULER"

cat /sys/block/*/queue/scheduler | grep -v none | sort | uniq -c

echo ""

Listing 2: Determining the Linux scheduler for block devices

The Red Hat enterprise-storage tuned profile uses the deadline scheduler. The deadline scheduler can be enabled by adding the elevator=deadline parameter to the kernel command line in grub.conf.

Useful commands:

# cat /proc/cmdline
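The scheduler can also be changed for an individual device at run time through sysfs. This is a sketch only: sdb is a placeholder device name, and the change does not persist across a reboot (unlike the elevator= kernel parameter).

# Sketch: switch one block device (placeholder name sdb) to the deadline scheduler
echo deadline > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler   # the active scheduler appears in brackets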

Linux memory settings

This section shows you the Linux memory settings.

Page size

The default page size for Red Hat Linux is 4096 bytes.

# getconf PAGESIZE

Transparent huge pages

The default size for huge pages is 2048 KB for most large systems.

echo "Linux: HUGEPAGES"

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled

echo ""

Listing 3: Determining the Linux huge page setting

The Red Hat enterprise-storage tuned profile enables huge pages.
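Transparent huge pages can also be toggled at run time through the same sysfs path shown in Listing 3. This is a sketch only; the setting reverts at reboot unless it is made persistent (for example, through the tuned profile or a kernel command-line option).

# Sketch: disable, then re-enable, transparent huge pages at run time
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo always > /sys/kernel/mm/redhat_transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled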


Linux module settings - qla2xxx

You can see the parameters for the qla2xxx module using the following script:

#!/bin/sh
###
# Print each qla2xxx module parameter and its current value
for param in $(ls /sys/module/qla2xxx/parameters)
do
    echo -n "${param} = "
    cat /sys/module/qla2xxx/parameters/${param}
done

Listing 4: Determining qla2xxx module parameters

Disable QLogic failover. If the output of the following command shows the -k driver (not the -fo driver), then failover is disabled:

# modinfo qla2xxx | grep -w ^version

version: <some_version>-k

Qlogic lists the highlights of the 2400 series HBAs:

150,000 IOPS per port
Out-of-order frame reassembly
T10 CRC for end-to-end data integrity

Useful commands:

# modinfo -p qla2xxx

The qla_os.c file in the Linux kernel source contains information on many of the qla2xxx module parameters. Some parameters listed by modinfo -p do not exist in the Linux source code. Others are not explicitly defined but may be initialized by the adapter firmware.

Descriptions of module parameters appear in the following table:

Parameter | Description | Linux kernel source | Default value
ql2xallocfwdump | Allocate memory for a firmware dump during HBA initialization | 1 | 1 - allocate memory
ql2xasynctmfenable | Issue TM IOCBs asynchronously via IOCB mechanism | does not exist | 0 - issue TM IOCBs via mailbox mechanism
ql2xdbwr | Scheme for request queue posting | does not exist | 1 - CAMRAM doorbell (faster)
ql2xdontresethba | Reset behavior | does not exist | 0 - reset on failure
ql2xenabledif | T10-CRC-DIF | does not exist | 1 - DIF support
ql2xenablehba_err_chk | T10-CRC-DIF error isolation by HBA | does not exist | 0 - disabled
ql2xetsenable | Firmware ETS burst | does not exist | 0 - skip ETS enablement
ql2xextended_error_logging | Extended error logging | not explicitly defined | 0 - no logging
ql2xfdmienable | FDMI registrations | 1 | 0 - no FDMI
ql2xfwloadbin | Location from which to load firmware | not explicitly defined | 0 - use default semantics
ql2xgffidenable | GFF_ID checks of port type | does not exist | 0 - do not use GFF_ID
ql2xiidmaenable | iIDMA setting | 1 | 1 - perform iIDMA
ql2xloginretrycount | Alternate value for NVRAM login retry count | 0 | 0
ql2xlogintimeout | Login timeout value in seconds | 20 | 20
ql2xmaxqdepth | Maximum queue depth for target devices -- used to seed queue depth for SCSI devices | 32 | 32
ql2xmaxqueues | MQ | 1 | 1 - single queue
ql2xmultique_tag | CPU affinity | not defined | 0 - no affinity
ql2xplogiabsentdevice | PLOGI | not defined | 0 - no PLOGI
ql2xqfullrampup | Time in seconds to wait before ramping up the queue depth for a device after a queue-full condition has been detected | does not exist | 120 seconds
ql2xqfulltracking | Track and dynamically adjust queue depth for SCSI devices | does not exist | 1 - perform tracking
ql2xshiftctondsd | Control shifting of command type processing based on total number of SG elements | does not exist | 6
ql2xtargetreset | Target reset | does not exist | 1 - use hw defaults
qlport_down_retry | Maximum number of command retries to a port in PORT-DOWN state | not defined | 0

Table 4: qla2xxx module parameters
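Module parameters can be made persistent through a file in /etc/modprobe.d. The following is a sketch only: the file name and the queue-depth value are illustrative, and ql2xmaxqdepth is used simply because it appears in Table 4.

# Sketch: persist a qla2xxx parameter (illustrative value) across reboots
echo "options qla2xxx ql2xmaxqdepth=64" > /etc/modprobe.d/qla2xxx.conf

# The driver is typically loaded from the initramfs, so rebuild it and reboot
dracut -f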

Linux SCSI subsystem tuning - /sys

See /sys/block/<device>/queue/<parameter>.

Block device parameter values can be determined using a small script:

#!/bin/sh
###
# Report each /sys/block queue parameter for all FC-attached and device-mapper block devices
param_list=$(find /sys/block/sda/queue -maxdepth 1 -type f -exec basename '{}' \; | sort)
dev_list=$(ls -l /dev/disk/by-path | grep -w fc | awk -F \/ '{print $3}')
dm_list=$(ls -d /sys/block/dm-* | awk -F \/ '{print $NF}')

for param in ${param_list}
do
    echo -n "${param} = "
    for dev in ${dev_list} ${dm_list}
    do
        cat /sys/block/${dev}/queue/${param}
    done | sort | uniq -c
done

# queue_depth lives under device/, not queue/, so report it separately
echo -n "queue_depth = "
for dev in ${dev_list}
do
    cat /sys/block/${dev}/device/queue_depth
done | sort | uniq -c

Determining block device parameters

To send down large-size requests (greater than 512 KB on 4 KB page size systems):

Consider increasing max_segments to 1024 or greater.
Set max_sectors_kb equal to max_hw_sectors_kb.

SCSI device parameters appear in the following table. Values that can be changed are shown as (rw):

Parameter | Description | Value
hw_sector_size (ro) | Hardware sector size in bytes | 512
max_hw_sectors_kb (ro) | Maximum number of kilobytes supported in a single data transfer | 32767
max_sectors_kb (rw) | Maximum number of kilobytes that the block layer will allow for a file system request | 512
nomerges (rw) | Enable or disable lookup logic | 0 - all merges are enabled
nr_requests (rw) | Number of read or write requests that can be allocated in the block layer | 128
read_ahead_kb (rw) | Maximum read-ahead for the device in kilobytes | 8192
rq_affinity (rw) | Complete a request on the same CPU (group) that queued it: 1 - CPU group affinity, 2 - strict CPU affinity | 1 - CPU group affinity
scheduler (rw) | Active I/O scheduler | deadline

Table 5: SCSI subsystem tunable parameters


Using max_sectors_kb: By default, Linux devices are configured for a maximum 512 KB I/O size. When using a larger file system block size, increase the max_sectors_kb parameter. max_sectors_kb must be less than or equal to max_hw_sectors_kb.

The default queue_depth is 32 and represents the total number of transfers that can be queued to a device. You can check the queue depth by examining /sys/block/<device>/device/queue_depth.
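The following sketch applies that guidance by copying max_hw_sectors_kb into max_sectors_kb for every FC-attached sd device. It reuses the device-discovery pattern from the scripts in this paper, and the change is not persistent across reboots.

#!/bin/sh
# Sketch: raise max_sectors_kb to the hardware limit for FC-attached devices (not persistent)
for dev in $(ls -l /dev/disk/by-path | grep -w fc | awk -F'/' '{print $3}')
do
    cat /sys/block/${dev}/queue/max_hw_sectors_kb > /sys/block/${dev}/queue/max_sectors_kb
done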

Linux XFS file system create options

Useful commands:

# getconf PAGESIZE

# man mkfs.xfs

Note: XFS writes are not guaranteed to be committed unless the program issues a fsync() call afterwards.

Red Hat: Optimizing for a large number of files

If necessary, you can increase the amount of space allowed for inodes using the mkfs.xfs -i maxpct= option. The default percentage of space allowed for inodes varies by file system size. For example, a file system between 1 TB and 50 TB in size will allocate 5% of the total space for inodes.

Red Hat: Optimizing for a large number of files in a single directory

Normally, the XFS file system directory block size is the same as the file system block size. Choose a larger value for the mkfs.xfs -n size= option, if there are many millions of directory entries.

Red Hat: Optimizing for concurrency

Increase the number of allocation groups on systems with many processors.

Red Hat: Optimizing for applications that use extended attributes

1. Increasing inode size might be necessary if applications use extended attributes.

2. Multiple attributes can be stored in an inode provided that they do not exceed the maximum size limit (in bytes) for attribute+value.


Red Hat: Optimizing for sustained metadata modifications

1. Systems with large amounts of RAM could benefit from larger XFS log sizes.

2. The log should be aligned with the device stripe size (the mkfs command may do this automatically)

The metadata log can be placed on another device, for example, a solid-state drive (SSD) to reduce disk seeks.

Specify the stripe unit and width for hardware RAID devices
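As an illustration, the following command is a sketch only: the multipath device path and the RAID geometry (a 256 KB stripe unit across 8 data disks, 32 allocation groups) are assumptions, and the -N flag makes it a dry run so that the computed layout can be reviewed before anything is written.

# Sketch: preview an XFS layout for a hypothetical RAID LUN
mkfs.xfs -N -d su=256k,sw=8,agcount=32 /dev/mapper/mpatha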

Syntax (options not related to performance are omitted):

# mkfs.xfs [ options ] device

-b block_size_options
    size=<int> -- size in bytes (default 4096, minimum 512, maximum 65536; must be <= PAGESIZE)

-d data_section_options
    More allocation groups imply that more parallelism can be achieved when allocating blocks and inodes.
    agcount=<int> -- number of allocation groups
    agsize, name, file, size, sunit, su, swidth, sw

-i inode_options
    size, log, perblock, maxpct, align, attr

-l log_section_options
    internal, logdev, size, version, sunit, su, lazy-count

-n naming_options
    size, log, version

-r realtime_section_options
    rtdev, extsize, size

-s sector_size
    log, size

-N
    Dry run. Print out file system parameters without creating the file system.

Listing 5: Create options for XFS file systems

Linux XFS file system mount options

Useful commands:

# xfs_info
# xfs_quota
# grep xfs /proc/mounts
# mount | grep xfs

nobarrier
noatime
inode64 – XFS is allowed to create inodes at any location in the file system. Starting from kernel 2.6.35, XFS file systems will mount either with or without the inode64 option.
logbsize – Larger values can improve performance. Smaller values should be used with fsync-heavy workloads.
delaylog – RAM is used to reduce the number of changes to the log.

The Red Hat 6.2 Release Notes mention that XFS has been improved in order to better handle metadata-intensive workloads. The default mount options have been updated to use delayed logging.
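A sketch of how these options might appear in /etc/fstab follows. The device, mount point, and logbsize value are placeholders; nobarrier should only be used when the storage protects its write cache (for example, with battery backup).

# Sketch /etc/fstab entry (placeholder device and mount point; option values are illustrative)
/dev/mapper/mpatha  /data  xfs  noatime,nobarrier,inode64,logbsize=256k  0 0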


Red Hat tuned

Red Hat Enterprise Linux has a tuning package called “tuned” which sets certain parameters based on a chosen profile.

Useful commands:

# tuned-adm help

# tuned-adm list

# tuned-adm active
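To activate the profile discussed in this section (assuming the tuned packages are installed) and confirm that it is in effect:

# tuned-adm profile enterprise-storage
# tuned-adm active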

The enterprise-storage profile contains the following files. When comparing the enterprise-storage profile with the throughput-performance profile, some files are identical:

# cd /etc/tune-profiles

# ls enterprise-storage/

ktune.sh ktune.sysconfig sysctl.ktune tuned.conf

# sum throughput-performance/* enterprise-storage/* | sort

03295 2 throughput-performance/sysctl.s390x.ktune

08073 2 enterprise-storage/sysctl.ktune

15419 2 enterprise-storage/ktune.sysconfig

15419 2 throughput-performance/ktune.sysconfig

15570 1 enterprise-storage/ktune.sh

43756 1 enterprise-storage/tuned.conf

43756 1 throughput-performance/tuned.conf

47739 2 throughput-performance/sysctl.ktune

57787 1 throughput-performance/ktune.sh

ktune.sh

The enterprise-storage ktune.sh is the same as the throughput-performance ktune.sh but adds functionality for disabling or enabling I/O barriers. The enterprise-storage profile is preferred when using XIV storage. Important functions include:

set_cpu_governor performance -- uses cpuspeed to set the governor
enable_transparent_hugepages -- does what it says


remount_partitions nobarrier -- disables write barriers
multiply_disk_readahead -- modifies /sys/block/sd*/queue/read_ahead_kb

ktune.sysconfig

ktune.sysconfig is identical for both throughput-performance and enterprise-storage profiles:

# grep -h ^[A-Za-z] enterprise-storage/ktune.sysconfig \ throughput-performance/ktune.sysconfig | sort | uniq -c

2 ELEVATOR="deadline"

2 ELEVATOR_TUNE_DEVS="/sys/block/{sd,cciss,dm-}*/queue/scheduler"

2 SYSCTL_POST="/etc/sysctl.conf"

2 USE_KTUNE_D="yes"

Listing 6: Sorting the ktune.sysconfig file

sysctl.ktune

sysctl.ktune is functionally identical for both throughput-performance and enterprise-storage profiles:

# grep -h ^[A-Za-z] enterprise-storage/sysctl.ktune \ throughput-performance/sysctl.ktune | sort | uniq -c

2 kernel.sched_min_granularity_ns = 10000000

2 kernel.sched_wakeup_granularity_ns = 15000000

2 vm.dirty_ratio = 40

Listing 7: Sorting the sysctl.ktune file

tuned.conf

tuned.conf is identical for both throughput-performance and enterprise-storage profiles:

# grep -h ^[A-Za-z] enterprise-storage/tuned.conf \throughput-performance/tuned.conf | sort | uniq -c

12 enabled=False

Listing 8: Sorting the tuned.conf file


Linux multipath

Keep it simple: configure just enough paths for redundancy and performance.

Multipath output such as the following indicates that a device is configured to queue I/O indefinitely when all paths are lost (queue_if_no_path):

features='1 queue_if_no_path' hwhandler='0' wp=rw
policy='round-robin 0' prio=-1

Set 'no_path_retry N', and then either remove the features='1 queue_if_no_path' option or set 'features "0"'.
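A minimal sketch of that change in /etc/multipath.conf follows; the retry count is illustrative, and the stanza should be merged into the existing defaults section rather than added as a duplicate.

defaults {
        # Replace indefinite queueing with a bounded number of retries (illustrative value)
        features        "0"
        no_path_retry   5
}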

Multipath configuration defaults

Parameter | Default value
polling_interval | 5
udev_dir | /dev
multipath_dir | /lib/multipath
find_multipaths | no
verbosity | 2
path_selector | round-robin 0
path_grouping_policy | failover
getuid_callout | /lib/udev/scsi_id --whitelisted --device=/dev/%n
prio | const
features | queue_if_no_path
path_checker | directio
failback | manual
rr_min_io | 1000
rr_weight | uniform
no_path_retry | 0
user_friendly_names | no
queue_without_daemon | yes
flush_on_last_del | no
max_fds | determined by the calling process
checker_timeout | /sys/block/sdX/device/timeout
fast_io_fail_tmo | determined by the OS
dev_loss_tmo | determined by the OS
mode | determined by the process
uid | determined by the process
gid | determined by the process

Table 6: Multipath configuration options

The default load balancing policy (path_selector) is round-robin 0. Other choices are queue-length 0 and service-time 0.

Consider using the XIV Linux host attachment kit to create the multipath configuration file.

# cat /etc/multipath.conf

devices {
    device {
        vendor                "IBM"
        product               "2810XIV"
        path_selector         "round-robin 0"
        path_grouping_policy  multibus
        rr_min_io             15
        path_checker          tur
        failback              15
        no_path_retry         5
        #polling_interval     3
    }
}

defaults {
    ...
    user_friendly_names yes
    ...
}

Listing 9: A sample multipath.conf file

Sample scripts

You can use the following script to query various settings related to I/O tuning:

#!/bin/sh
# Query scheduler, hugepages, and readahead settings for Fibre Channel SCSI devices
###
#hba_pci_loc=$(lspci | grep HBA | awk '{print $1}')

echo "Linux: HUGEPAGES"
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo ""

echo "Linux: SCHEDULER"
cat /sys/block/*/queue/scheduler | grep -v none | sort | uniq -c
echo ""

echo "FC: max_sectors_kb"
ls -l /dev/disk/by-path | grep -w fc | awk -F'/' '{print $3}' | xargs -n1 -i cat /sys/block/{}/queue/max_sectors_kb | sort | uniq -c
echo ""

echo "Linux: dm-* READAHEAD"
ls /dev/dm-* | xargs -n1 -i blockdev --getra {} | sort | uniq -c
blockdev --report /dev/dm-*
echo ""

echo "Linux: FC disk sd* READAHEAD"
ls -l /dev/disk/by-path | grep -w fc | awk -F'/' '{print $3}' | xargs -n1 -i blockdev --getra /dev/{} | sort | uniq -c
ls -l /dev/disk/by-path | grep -w fc | awk -F'/' '{print $3}' | xargs -n1 -i blockdev --report /dev/{} | grep dev
echo ""

Listing 10: Querying I/O tuning settings for Fibre Channel devices


Summary

This white paper presented an end-to-end approach for Linux I/O tuning in a typical data center environment consisting of external storage subsystems, storage area network (SAN) switches, IBM System x Intel servers, Fibre Channel HBAs, and 64-bit Red Hat Enterprise Linux.

Visit the links in the “Resources” section for more information on topics presented in this white paper.


Resources

The following websites provide useful references to supplement the information contained in this paper:

XIV Redbooks
ibm.com/redbooks/abstracts/sg247659.html
ibm.com/redbooks/abstracts/sg247904.html
Note: IBM Redbooks are not official IBM product documentation.

XIV Infocenter
http://publib.boulder.ibm.com/infocenter/ibmxiv/r2

XIV Host Attachment Kit for RHEL (downloadable from Fix Central)
ibm.com/support/fixcentral

Qlogic
http://driverdownloads.qlogic.com
ftp://ftp.qlogic.com/outgoing/linux/firmware/rpms

Red Hat Enterprise Linux Documentation
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux

IBM Advanced Settings Utility
ibm.com/support/entry/portal/docdisplay?lndocid=TOOL-ASU

Linux documentation
Documentation/kernel-parameters.txt
Documentation/block/queue-sysfs.txt
Documentation/filesystems/xfs.txt
drivers/scsi/qla2xxx
http://xfs.org/index.php/XFS_FAQ


About the author

David Quenzler is a consultant in the IBM Systems and Technology Group ISV Enablement Organization. He has more than 15 years’ experience working with the IBM System x (Linux) and IBM Power Systems (IBM AIX®) platforms. You can reach David at [email protected].


Trademarks and special notices

© Copyright IBM Corporation 2012.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

SET and the SET Logo are trademarks owned by SET Secure Electronic Transaction LLC.

Other company, product, or service names may be trademarks or service marks of others.

Information is provided "AS IS" without warranty of any kind.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of the specific Statement of Direction.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Photographs shown are of engineering prototypes. Changes may be incorporated in production models.

Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.