Performance tuning a public mirror server - Mike Hulsman (Proxy)
DESCRIPTION
Mike Hulsman (Proxy) on performance tuning a public mirror server. ftp.nluug.nl is a public mirror server that has been running since the beginning of the 1990s and currently transfers 4 TB of data every day. This talk provides some background on the service it offers and discusses the history of the environment. In 2013, we chose to rebuild the server and made a number of design decisions. The talk covers the process of performance tuning the ftp.nluug.nl mirror server and how we reached the current performance.
TRANSCRIPT
NLUUG 15 May 2014
My background
• Been a sysadmin since 1990
• Around 1995 started using Linux
• Since 2012 supporting ftp.nluug.nl
• Working @ Proxy in the Managed Services Team
Hobby
History of ftp.nluug.nl
• December 1992: total download of 448,575,834 bytes
Former hardware
• 2 servers with round-robin DNS
• All SCSI storage attached to 1 server
• Storage exported with NFS and GFS
Current Hardware
• 1 server
• Xeon E3-1220 3.1 GHz
• 16 GB memory
• LSI MegaRAID SAS 9271-4i
• 4 x 1 Gb Ethernet
• 13 x 2 TB disks, 12 disks in RAID5
• 1 hot spare drive
• 11 free positions for extra drives
Performance of the servers (1-day average)
• The 2 servers: max 480 Mb/s, avg 160 Mb/s, 2 TB/day
• The current server: max 800 Mb/s, avg 340 Mb/s, 4 TB/day
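As a sanity check on these figures, the average rate and the daily volume can be converted into each other. This small Python sketch assumes decimal units (1 Mb = 10^6 bits, 1 TB = 10^12 bytes) and treats the average as flat over 24 hours; the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def mbps_to_tb_per_day(avg_mbps: float) -> float:
    """Convert an average rate in megabits/s to decimal terabytes per day."""
    bits_per_day = avg_mbps * 1_000_000 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1_000_000_000_000


def tb_per_day_to_mbps(tb_per_day: float) -> float:
    """Convert decimal terabytes per day to an average rate in megabits/s."""
    bits_per_day = tb_per_day * 1_000_000_000_000 * 8
    return bits_per_day / SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    for label, avg in (("2 servers", 160), ("current server", 340)):
        print(f"{label}: {avg} Mb/s average is about {mbps_to_tb_per_day(avg):.2f} TB/day")
    print(f"4 TB/day needs about {tb_per_day_to_mbps(4):.0f} Mb/s on average")
```

Under these assumptions 160 Mb/s and 340 Mb/s come to roughly 1.7 and 3.7 TB/day, in line with the rounded 2 and 4 TB/day figures.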
What we serve
• 17 TB of used storage
• 13,800,038 files
• 12,224,856 files smaller than 1 MB
• 681,468 directories
• 1,606,561 links (hard and soft links)
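These counts help explain why the design treats the machine as I/O-bound: most files are small, so many requests likely cost disk seeks rather than long sequential reads. A quick Python calculation, assuming "17 TB" means 17 x 10^12 bytes:

```python
TOTAL_BYTES = 17 * 10**12   # "17 TB of used storage", assuming decimal terabytes
TOTAL_FILES = 13_800_038
SMALL_FILES = 12_224_856    # files smaller than 1 MB


def small_file_share(small: int, total: int) -> float:
    """Fraction of files that fall below the size threshold."""
    return small / total


def average_file_size(total_bytes: int, files: int) -> float:
    """Mean file size in bytes."""
    return total_bytes / files


if __name__ == "__main__":
    print(f"{small_file_share(SMALL_FILES, TOTAL_FILES):.1%} of files are smaller than 1 MB")
    print(f"average file size: {average_file_size(TOTAL_BYTES, TOTAL_FILES) / 1e6:.2f} MB")
```

About 88.6% of the files are below 1 MB while the mean size is about 1.23 MB, so a minority of large files holds most of the bytes.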
Max of TB/month
[Chart: Max of TB/month for ten months (Apr 2008, May 2013, Jun 2013 and Oct 2013 to Apr 2014), ranging from 93.67 to 128.62 TB/month]
Atoptool with netatop
• Atoptool together with the netatop kernel module
Mrtg
Munin
Nagios
Design decisions new server
• The machine is I/O-bound, not CPU- or memory-bound
• Disk I/O should be as fast as possible
• 4 x 1 Gb Ethernet cards in bonding mode
• It is a public mirror server, so downtime is not critical
• Cost-effective: the whole server including disks cost about 5K
Raid setup
• The machine was built in November 2012
• Hitachi Ultrastar 7K3000 2 TB drives
– 2.0 million hours MTBF, 5-year warranty
– 64 MB cache
• LSI MegaRAID SAS 9271-4i card
– 1 GB cache memory
– CacheVault, NV flash cache and battery
• 256 kB stripe size
• 12 disks in RAID5, 1 hot spare
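For this RAID setup, the usable capacity and the matching ext4 alignment values follow from a little arithmetic. The Python sketch assumes that the 256 kB "stripe size" is the per-disk chunk (LSI calls it the strip size), 4 KiB ext4 blocks, and the 12 RAID5 disks without the hot spare; the slides do not say whether the filesystems were created with these values.

```python
def raid5_layout(disks: int, disk_tb: float, chunk_kib: int, fs_block_kib: int = 4) -> dict:
    """Capacity and ext4 alignment hints for a RAID5 array (hot spares excluded)."""
    data_disks = disks - 1                # RAID5 spends one disk's worth of space on parity
    stride = chunk_kib // fs_block_kib    # filesystem blocks per chunk on a single disk
    stripe_width = stride * data_disks    # filesystem blocks per full stripe over the data disks
    return {
        "data_disks": data_disks,
        "usable_tb": data_disks * disk_tb,
        "full_stripe_kib": chunk_kib * data_disks,
        "stride": stride,
        "stripe_width": stripe_width,
    }


if __name__ == "__main__":
    layout = raid5_layout(disks=12, disk_tb=2.0, chunk_kib=256)
    print(layout)
    # <device> is a placeholder, not a path from the talk.
    print(f"mkfs.ext4 -E stride={layout['stride']},stripe_width={layout['stripe_width']} <device>")
```

With 11 data disks this gives 22 TB of usable space (decimal), stride=64 and stripe_width=704, which only help if the filesystems start on stripe boundaries.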
What was changed
Problem #1
• 4 x 1 Gb Ethernet interfaces
– Bonded as balance-alb
• Send and receive are bonded
– Did not work out as we thought
– Maximum speed of all interfaces together did not exceed 960 Mb/s
• In June 2013, with assistance from Surfnet, we moved to 802.3ad and xmit_hash_policy=layer3+4
– With atop we could see that the balancing was working
– Did not work out as expected
– Maximum speed at that time was 993 Mb/s
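To see what layer3+4 balancing can and cannot do, here is a simplified Python model of the formula given in the Linux bonding documentation (ports word, XOR source and destination IP, fold with right shifts by 16 and 8, modulo the slave count). Real kernels compute the hash differently in detail, so the slave numbers will not match a live system; the addresses are hypothetical documentation addresses.

```python
import ipaddress
from collections import Counter


def layer34_slave(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n_slaves: int) -> int:
    """Pick a slave index following the documented layer3+4 formula (simplified model)."""
    # "source port, destination port (as in the header)": pack both 16-bit ports into
    # one 32-bit word; putting the source port in the high half is an assumption.
    h = (src_port << 16) | dst_port
    h ^= int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    h ^= h >> 16
    h ^= h >> 8
    return h % n_slaves


if __name__ == "__main__":
    server, client = "192.0.2.10", "198.51.100.7"  # hypothetical addresses
    # One download is one TCP flow, which always maps to the same slave (max 1 Gb/s).
    print(layer34_slave(server, client, 80, 40000, 4))
    # Many connections with different ephemeral ports spread over all four slaves.
    spread = Counter(layer34_slave(server, client, 80, port, 4) for port in range(32768, 61000))
    print(sorted(spread.items()))
```

The hash only decides how the server sends; how traffic toward the server is spread is up to the switch's own hashing.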
Solution
• Changes in the bonding parameters
• Upgraded / downgraded Ethernet driver versions
• Tuning kernel parameters
• "Crap," said the Surfnet engineer ;-)
• The actual bottleneck: we were connected to a Cisco module where each group of 8 ports has only one 1 Gb/s backend port
• We were on ports Gi9/3, Gi9/4, Gi9/5 and Gi9/6
• After rerouting to ports Gi9/3, Gi9/11, Gi9/17 and Gi9/28, peaks of 2.4 Gb/s were seen
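Why moving the cables helped can be modelled in a few lines of Python. The grouping of 8 consecutive ports per backend link (1-8, 9-16, and so on) is a hypothetical layout for illustration; the slide only says that 8 ports share a 1 Gb/s backend. The model also ignores other hosts attached to the same port groups.

```python
from collections import Counter


def port_group(port: int, ports_per_group: int = 8) -> int:
    """Hypothetical layout: ports 1-8 share backend 0, ports 9-16 backend 1, and so on."""
    return (port - 1) // ports_per_group


def bond_ceiling_gbps(ports, link_gbps: float = 1.0, backend_gbps: float = 1.0) -> float:
    """Upper bound on aggregate throughput when each port group shares one backend link."""
    per_group = Counter(port_group(p) for p in ports)
    return sum(min(n * link_gbps, backend_gbps) for n in per_group.values())


if __name__ == "__main__":
    before = [3, 4, 5, 6]      # Gi9/3 to Gi9/6
    after = [3, 11, 17, 28]    # Gi9/3, Gi9/11, Gi9/17, Gi9/28
    print(bond_ceiling_gbps(before), bond_ceiling_gbps(after))
```

Under that assumption the old ports all sat in one group (a ceiling of about 1 Gb/s, consistent with the 960 to 993 Mb/s observed), while the new ports fall into four groups (a 4 Gb/s ceiling), leaving room for the 2.4 Gb/s peaks.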
The whole process of tuning
• At first I did not document, only implement
– Now I document what I change, including a time stamp
• Buffers were too small
• Timeouts were too long
• Did not know where to start
• So many performance tuning articles
• The problem with a public mirror server is that it is public
• I even changed parameters while writing this presentation
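Documenting each change with a time stamp is easy to automate. A minimal Python sketch, assuming changes are applied by writing to /proc/sys (which needs root) and logged to a plain text file of your choosing; it does not make the setting persistent, which still requires /etc/sysctl.conf. The function names are hypothetical.

```python
import datetime
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name such as net.core.somaxconn to its file under /proc/sys."""
    return Path(root, *name.split("."))


def set_and_log(name: str, value, logfile: str, root: str = "/proc/sys") -> str:
    """Write a new sysctl value and append 'timestamp name old -> new' to a change log."""
    path = sysctl_path(name, root)
    old = path.read_text().strip()
    path.write_text(f"{value}\n")  # needs root privileges on a real /proc/sys
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(logfile, "a", encoding="utf-8") as log:
        log.write(f"{stamp} {name} {old} -> {value}\n")
    return old
```

For example, set_and_log("net.netfilter.nf_conntrack_tcp_timeout_established", 600, "changes.log") records the previous value next to the new one, so a change can be traced back or reverted later.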
Firewall
• Lots of "ip_conntrack: table full" messages
– net.netfilter.nf_conntrack_max = 1048576
– Count the current entries with wc -l /proc/net/nf_conntrack
• net.netfilter.nf_conntrack_tcp_timeout_established = 600
– The default is 432000 seconds (5 days)
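Why the shorter established timeout matters can be estimated with Little's law: average table occupancy equals the rate at which entries arrive times how long each one stays. The Python sketch below assumes, hypothetically, that 5 connections per second are abandoned by clients without a clean close and therefore linger until the timeout. It also reads the kernel's live counter nf_conntrack_count, a cheaper alternative to counting lines in /proc/net/nf_conntrack.

```python
from pathlib import Path
from typing import Optional, Tuple


def expected_entries(new_per_second: float, lifetime_s: float) -> float:
    """Little's law: average table occupancy = arrival rate x time each entry stays."""
    return new_per_second * lifetime_s


def conntrack_usage(base: str = "/proc/sys/net/netfilter") -> Optional[Tuple[int, int]]:
    """Return (current entries, table limit), or None if nf_conntrack is not loaded."""
    try:
        count = int(Path(base, "nf_conntrack_count").read_text())
        limit = int(Path(base, "nf_conntrack_max").read_text())
    except (OSError, ValueError):
        return None
    return count, limit


if __name__ == "__main__":
    # Hypothetical rate: 5 connections per second vanish without a proper close.
    for timeout in (432_000, 600):
        print(timeout, expected_entries(5, timeout))
    print(conntrack_usage())
```

With the 5-day default that is 2,160,000 entries, more than the 1,048,576 limit; with 600 seconds it is 3,000.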
We can get throughput
Networking buffers
• ethtool -G ethX rx 4096 tx 4096
• ifconfig ethX txqueuelen 20000
– Also for the bonding interface (not sure if needed)
• ethtool -K ethX gso on (generic-segmentation-offload)
• ethtool -K ethX gro on (generic-receive-offload)
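Applying the same settings to every bonded interface by hand is error-prone. This Python sketch only prints the commands (a dry run) so they can be reviewed and then run as root. It assumes the bond is called bond0 and that the NIC supports 4096-entry rings (ethtool -g shows the hardware maximum), and it uses ip link, the iproute2 equivalent of ifconfig, for the queue length.

```python
import shlex
from pathlib import Path
from typing import List


def bond_slaves(bond: str = "bond0") -> List[str]:
    """Slave interfaces of a Linux bond, read from sysfs (empty list if absent)."""
    try:
        return Path("/sys/class/net", bond, "bonding", "slaves").read_text().split()
    except OSError:
        return []


def tuning_commands(ifaces: List[str], ring: int = 4096, txqueuelen: int = 20000) -> List[List[str]]:
    """Build the ring-buffer, queue-length and offload commands for each interface."""
    commands = []
    for iface in ifaces:
        commands.append(["ethtool", "-G", iface, "rx", str(ring), "tx", str(ring)])
        # iproute2 equivalent of: ifconfig <iface> txqueuelen <n>
        commands.append(["ip", "link", "set", "dev", iface, "txqueuelen", str(txqueuelen)])
        commands.append(["ethtool", "-K", iface, "gso", "on", "gro", "on"])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in tuning_commands(bond_slaves()):
        print(shlex.join(cmd))
```

Note that these settings are lost at reboot unless they are also added to the network startup scripts.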
Filesystems
• Filesystems limited to 3 TB each
• echo noop > /sys/block/sda/queue/scheduler
– Also tried deadline
• All are ext4
– Mount options noatime,nodiratime,noacl,commit=15
• LSI CacheFlushInterval=10 (default 5)
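Because a scheduler set with echo into /sys does not survive a reboot, and more than one scheduler was tried, it helps to check what each device actually uses. The kernel lists the available schedulers in that file and marks the active one with brackets. A Python sketch; on newer multi-queue kernels the names are none and mq-deadline rather than noop and deadline.

```python
import re
from pathlib import Path
from typing import Dict, Optional


def active_scheduler(text: str) -> Optional[str]:
    """Return the bracketed (active) entry, e.g. 'cfq' from 'noop deadline [cfq]'."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def schedulers(sys_block: str = "/sys/block") -> Dict[str, Optional[str]]:
    """Map each block device to its active I/O scheduler."""
    result = {}
    for path in sorted(Path(sys_block).glob("*/queue/scheduler")):
        try:
            result[path.parts[-3]] = active_scheduler(path.read_text())
        except OSError:
            continue  # device vanished or is unreadable
    return result


if __name__ == "__main__":
    print(schedulers())
```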
OS level
• irqbalance set to oneshot
• chkconfig --del cpuspeed; service cpuspeed stop
• vm.min_free_kbytes=204800
– To prevent out-of-memory errors
– To prevent deadlocks under high load
• ulimit: max open files
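The slide does not say how the open-files limit was raised; on CentOS that is usually done in /etc/security/limits.conf or with ulimit -n in a service's init script. For a quick check from a script, Python's standard resource module exposes the same per-process limit. A sketch that raises the soft limit toward the hard limit:

```python
import resource
from typing import Optional, Tuple


def raise_open_files_limit(target: Optional[int] = None) -> Tuple[int, int]:
    """Raise this process's soft RLIMIT_NOFILE toward the hard limit; return (old, new)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = hard if target is None else target
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)  # without privileges the hard limit cannot be exceeded
    if soft == resource.RLIM_INFINITY or (wanted != resource.RLIM_INFINITY and wanted <= soft):
        return soft, soft  # already high enough
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        return soft, soft  # the platform refused the new value
    return soft, wanted


if __name__ == "__main__":
    old, new = raise_open_files_limit()
    print(f"open files soft limit: {old} -> {new}")
```

This only affects the calling process and its children; daemons such as Apache or vsftpd still need the limit set in their own startup environment.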
Yum-plugin-fastestmirror
• centos.mirror1.spango.com 1.624 ms
• ftp.nluug.nl 1.533 ms
• mirror.prolocation.net 1.44 ms
• mirror.widexs.nl 1.371 ms
• Add "prefer=ftp.nluug.nl" to /etc/yum/pluginconf.d/fastestmirror.conf
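The effect of the prefer option can be illustrated with a simplified model of mirror selection. This is not the plugin's actual code: it assumes the plugin just picks the lowest measured latency unless a preferred host is configured. With only about 0.25 ms between the fastest and slowest mirror in the list, small measurement differences decide the ranking.

```python
from typing import Dict, Optional

# Latencies from the slide, in milliseconds.
LATENCIES_MS = {
    "centos.mirror1.spango.com": 1.624,
    "ftp.nluug.nl": 1.533,
    "mirror.prolocation.net": 1.44,
    "mirror.widexs.nl": 1.371,
}


def pick_mirror(latencies: Dict[str, float], prefer: Optional[str] = None) -> str:
    """Simplified model: a preferred host wins if present, otherwise the lowest latency."""
    if prefer is not None and prefer in latencies:
        return prefer
    return min(latencies, key=latencies.get)


if __name__ == "__main__":
    print(pick_mirror(LATENCIES_MS))
    print(pick_mirror(LATENCIES_MS, prefer="ftp.nluug.nl"))
```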
Application level
• Rsync: just the defaults
• Vsftpd: just the defaults
• Apache 2.2.15
– Was running 4 instances, 1 for every IPv4 and IPv6 address
– Now reduced to 1 instance
– KeepAlive On
– MaxKeepAliveRequests 1000
– ServerLimit 1024
– MaxClients 1024
– MaxRequestsPerChild 800
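Assuming Apache runs with the prefork MPM (one process per client slot), these settings can be sanity-checked against the 16 GB of RAM. In prefork, MaxClients cannot exceed ServerLimit, and in the worst case MaxClients children are alive at once. The per-child memory sizes and the 4 GB kept aside for the OS and page cache are hypothetical values for illustration; on a mirror, memory used by Apache children is memory the page cache cannot use for hot files.

```python
from typing import Dict, List


def check_prefork(settings: Dict[str, int], ram_mb: int, reserve_mb: int, per_child_mb: float) -> List[str]:
    """Flag obvious problems in prefork sizing (simplified model)."""
    problems = []
    if settings["MaxClients"] > settings["ServerLimit"]:
        problems.append("MaxClients is capped at ServerLimit")
    worst_case_mb = settings["MaxClients"] * per_child_mb
    budget_mb = ram_mb - reserve_mb
    if worst_case_mb > budget_mb:
        problems.append(f"{worst_case_mb:.0f} MB for children exceeds the {budget_mb} MB budget")
    return problems


SLIDE_SETTINGS = {"ServerLimit": 1024, "MaxClients": 1024, "MaxRequestsPerChild": 800}

if __name__ == "__main__":
    # Hypothetical per-child sizes of 10 MB and 15 MB; 4 GB reserved is also an assumption.
    for per_child in (10, 15):
        print(per_child, check_prefork(SLIDE_SETTINGS, 16_384, 4_096, per_child))
```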
Future
• Hardware
– Memory from 16 GB to 32 GB
• Add more open source projects
– Then we need more disks
• Nginx
• Maybe I should try XFS in the future
• We just need more hits!!
Some of the mirrors
• Most Linux distributions are mirrored (currently 163 different)
• BSD (FreeBSD, NetBSD, OpenBSD)
• OpenIndiana, OpenSolaris, illumos
• Jenkins
• MariaDB
• Vim
• Blender, GIMP, ImageMagick
• Apache
• Qt, Perl, GCC
• VLC, XBMC, OpenELEC
• Questions......