Post on 08-Jan-2016

Page 1: OptiPortal Configuration Considerations

OptiPortal Configuration Considerations

Ashley Wright

High Performance Computing and Research Support (QUT)

Page 2: OptiPortal Configuration Considerations

Our OptiPortal

Page 3: OptiPortal Configuration Considerations

Our OptiPortal

6x Dell Precision T3500, each with:
    Intel Xeon E5520 (2.27GHz)
    4GB RAM
    nVidia FX 1800
    Onboard 1Gb/s network
    PCIe 1Gb/s network card (supports jumbo frames)
    300GB HDD

22x Dell 24” Monitors (4x5 configuration)

Page 4: OptiPortal Configuration Considerations

Considerations

Wish to be able to keep the cluster in a known state.
To be able to recover quickly when something goes wrong.
Need to be able to install applications fast.
Compile code on the OptiPortal.
Fast.
Easy to use.

Page 5: OptiPortal Configuration Considerations

ROCKS with Viz Roll

Fairly easy to install.
Used initially to test the OptiPortal and software which can run on a Vis Wall.
Software was out of date (CentOS 5 vs Fedora 12).
Difficult to customise.
Difficult to install our own software.

Page 6: OptiPortal Configuration Considerations

Similarities to HPC clusters.

Lots of applications.
Each node of the cluster is identical.
Need performance.
Need to minimise downtime.

Page 7: OptiPortal Configuration Considerations

HPC Cluster

Network boot and install.
Shared file system across nodes.
Nodes are generally identical.
Multiple networks for different uses (e.g. management vs MPI).

Page 8: OptiPortal Configuration Considerations

Installing nodes

Network boot and auto-install scripts make reinstalling easy.
Fedora 11 & 12 used.
Cobbler (https://fedorahosted.org/cobbler/):
    HTTP/PXE/TFTP
    DHCP/DNS
    Yum mirror
    Also customisation of the install process.

Page 9: OptiPortal Configuration Considerations

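Installing nodes - Cobbler

# Install the NVIDIA driver.
# The http_server variable is substituted by Cobbler when the template is rendered.
pushd /root/

# Fetch the driver installer and make it executable.
wget http://$http_server/files/NVIDIA-Linux-x86_64-190.53-pkg2.run -O /root/NVIDIA-Linux-x86_64-190.53-pkg2.run
chmod +x /root/NVIDIA-Linux-x86_64-190.53-pkg2.run

# Fetch the nvidia-install.sh init script and register it to run at boot.
wget http://$http_server/files/nvidia-install.sh -O /etc/init.d/nvidia-install.sh
chmod +x /etc/init.d/nvidia-install.sh
chkconfig --add nvidia-install.sh
chkconfig nvidia-install.sh on

# Return to the directory we started in.
popd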

Page 10: OptiPortal Configuration Considerations

File Server

Hosts non-volatile, shared home directories (/home), software directories (/pkg), and a Fedora mirror.
Built with an old Dell 2900 server:
    6x 1.5TB HDD (RAID 1+0: striped across mirrored pairs).
    4x 1Gb/s aggregated network links.
    250MB/s throughput.

Page 11: OptiPortal Configuration Considerations

Keeping nodes in 'sync'

When you change something on one node, you want it the same on the other nodes.
Having a shared home and application directory makes this easy.
Puppet (http://www.puppetlabs.com/) manages files in /etc:
    Automated configuration management.
    Makes sure files and services are in a known state.
    If they are not, Puppet fixes them.
    Updates every 30 minutes (default).
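The "known state" idea can be sketched in a few lines of Python: compare a file with a master copy and repair it if it has drifted. This is only an illustration of the concept, not how Puppet is implemented. The function `converge_file` and the file names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, or None if it is missing."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def converge_file(reference, target, mode=0o600):
    """Make `target` match `reference`; return True if a change was needed."""
    changed = False
    if sha256_of(reference) != sha256_of(target):
        with open(reference, "rb") as src, open(target, "wb") as dst:
            dst.write(src.read())
        changed = True
    if (os.stat(target).st_mode & 0o777) != mode:
        os.chmod(target, mode)
        changed = True
    return changed


# Demonstration in a scratch directory (hypothetical file names).
workdir = tempfile.mkdtemp()
reference = os.path.join(workdir, "sshd_config.master")
target = os.path.join(workdir, "sshd_config")
with open(reference, "w") as fh:
    fh.write("PermitRootLogin no\n")

first_run = converge_file(reference, target)    # target missing: gets created
second_run = converge_file(reference, target)   # already in the known state
with open(target, "w") as fh:
    fh.write("PermitRootLogin yes\n")           # simulate a manual edit (drift)
third_run = converge_file(reference, target)    # drift is repaired
```

Running the check repeatedly is safe: a run that finds nothing to fix changes nothing, which is what makes a periodic agent practical.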

Page 12: OptiPortal Configuration Considerations

Nodes in 'sync' - Puppet

class sshd {
  # Keep sshd_config identical to the master copy on the Puppet server.
  file { "/etc/ssh/sshd_config":
    owner  => root,
    group  => root,
    mode   => "0600",
    ensure => present,
    source => "puppet:///files/ssh/sshd_config",
  }

  # Reload sshd only when the config file above has changed.
  exec { "/etc/init.d/sshd reload":
    subscribe   => File["/etc/ssh/sshd_config"],
    refreshonly => true,
  }

  # Make sure the sshd service is running.
  service { "sshd":
    status => "/etc/init.d/sshd status",
    ensure => running,
  }
}

Page 13: OptiPortal Configuration Considerations

Network

One network for management (DNS/DHCP).
    Onboard network; can network boot.
One network for Internet.
    PCIe network card; supports jumbo frames.
    Internet network is outside the QUT firewall.
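As a rough illustration of what jumbo frames buy on a 1Gb/s link, the sketch below estimates the best-case payload rate for TCP over IPv4 at a standard 1500-byte MTU and a 9000-byte jumbo MTU. The MTU values and header sizes are standard Ethernet/IPv4/TCP figures assumed here, not numbers from the slides. The function names are illustrative.

```python
# Per-frame overhead on the wire for Ethernet carrying TCP over IPv4.
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12   # preamble + header + FCS + inter-frame gap
IP_TCP_HEADERS = 20 + 20              # IPv4 + TCP headers without options


def payload_efficiency(mtu):
    """Fraction of wire bytes that carry application payload at a given MTU."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETHERNET_OVERHEAD
    return payload / on_wire


def payload_rate_mb_s(mtu, link_bits_per_s=1_000_000_000):
    """Best-case payload rate in MB/s (10^6 bytes) for one link."""
    return link_bits_per_s / 8 * payload_efficiency(mtu) / 1e6


standard = payload_rate_mb_s(1500)
jumbo = payload_rate_mb_s(9000)
print(f"MTU 1500: {standard:.1f} MB/s, MTU 9000: {jumbo:.1f} MB/s")
```

Under these assumptions the ceiling rises from roughly 119 MB/s to roughly 124 MB/s. Every device on the path has to support the larger MTU for this to apply.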

Page 14: OptiPortal Configuration Considerations

Performance

Aim to render 10-25 frames per second.
9600x4800 pixels = 175MB/frame (at 4 bytes per pixel).
Bottlenecks everywhere, mostly I/O (bus, disk and network):
    1x PCIe (Gen 2) = 500MB/s
    1Gb/s network = 120MB/s
    1.5TB hard disk = 150MB/s (maximum)
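The figures above can be checked with a short calculation. This sketch assumes 32-bit (4-byte) pixels, which is what reproduces the 175MB/frame figure when a megabyte is counted as 2^20 bytes. It treats the quoted link rates as directly comparable, as the slide does.

```python
WIDTH, HEIGHT = 9600, 4800          # wall resolution from the slide
BYTES_PER_PIXEL = 4                 # assumption: 32-bit pixels
MIB = 1024 * 1024

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
frame_mib = frame_bytes / MIB       # size of one uncompressed full-wall frame

# Link and device rates quoted on the slide, in MB/s.
rates_mb_s = {
    "1x PCIe Gen 2": 500,
    "1Gb/s network": 120,
    "1.5TB hard disk": 150,
}


def required_mb_s(fps):
    """Uncompressed data rate for the whole wall at a given frame rate."""
    return frame_mib * fps


for fps in (10, 25):
    print(f"{fps} fps needs about {required_mb_s(fps):.0f} MB/s for the whole wall")

for name, rate in rates_mb_s.items():
    print(f"{name}: {rate / frame_mib:.2f} full-wall frames per second")
```

No single link in the list can carry even one uncompressed full-wall frame per second at the slowest points (network and disk), which is why the work has to be split across nodes and the data kept small.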

Page 15: OptiPortal Configuration Considerations

Performance - Disk

First file server.
    OpenSolaris + ZFS.
    RAID-Z (RAID 5-like) across 6 disks.
    ZFS makes all reads random seeks.
    <100 MB/s read performance.
    Single 1Gb/s network.

Page 16: OptiPortal Configuration Considerations

Performance - Disk

Second server.
    Fedora 12.
    SW RAID 0 (3 pairs) across HW RAID 1 (2 disks).
    Reads mostly sequential.
    250 MB/s read performance.
    4x 1Gb/s network.

Page 17: OptiPortal Configuration Considerations

Performance - Compression

Compressing data files reduces disk I/O.
CPU time to decompress is negligible.
Better use of I/O cache.
Decompress straight to memory.
Can get you over the line (2x-5x improvement).
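A minimal sketch of "decompress straight to memory", using Python's standard gzip module. The sample data is synthetic and far more regular than real data files, so its compression ratio says nothing about the 2x-5x figure above. The helper `effective_read_mb_s` is a hypothetical name and simply assumes decompression costs nothing.

```python
import gzip

# Synthetic, highly regular data standing in for a real data file (assumption).
raw = bytes(range(256)) * 4096          # 1 MiB of repeating bytes

# Compress once, ahead of time, as would happen when preparing a data set.
compressed = gzip.compress(raw)

# At load time, decompress directly into memory without a temporary file.
restored = gzip.decompress(compressed)

ratio = len(raw) / len(compressed)


def effective_read_mb_s(disk_mb_s, compression_ratio):
    """Uncompressed MB/s delivered if decompression cost is negligible."""
    return disk_mb_s * compression_ratio


print(f"compression ratio on this sample: {ratio:.1f}x")
print(f"250 MB/s disk at 2x: {effective_read_mb_s(250, 2):.0f} MB/s effective")
print(f"250 MB/s disk at 5x: {effective_read_mb_s(250, 5):.0f} MB/s effective")
```

The same reasoning applies to the I/O cache: a cache of a given size holds more frames when they are stored compressed.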

Page 18: OptiPortal Configuration Considerations

Issues

SSH and Puppet security keys change on rebuild.
Upgrading major OS versions is still a lot of work.
File server needs more RAM (I/O cache).
1 Gb/s is not enough (at times).
Need to remember to add changes to build scripts.

Page 19: OptiPortal Configuration Considerations

Issues - Multiple Networks

Some software does not like multiple networks.
    Looks up the hostname and will only use that IP address.
    Should be able to override this in a config file.
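The behaviour described above, and the kind of override that would fix it, might look like the following sketch. The `bind_address` setting, the `choose_bind_address` function and the addresses are all hypothetical. A stand-in resolver is used so the example does not depend on the local network setup.

```python
import socket


def choose_bind_address(config, resolver=socket.gethostbyname):
    """Pick the local IP address to use.

    `config` is a hypothetical settings dict; a "bind_address" entry overrides
    the hostname lookup, which otherwise returns only one of the node's IPs.
    """
    override = config.get("bind_address")
    if override:
        return override
    return resolver(socket.gethostname())


# Stand-in resolver for the demonstration: the hostname maps to the
# management network address (example addresses, not from the slides).
def fake_resolver(_hostname):
    return "10.0.0.11"


default_ip = choose_bind_address({}, resolver=fake_resolver)
overridden_ip = choose_bind_address({"bind_address": "192.0.2.11"},
                                    resolver=fake_resolver)
print(default_ip, overridden_ip)
```

Without the override, the software ends up on whichever network the hostname resolves to, even when the other network is the one that should carry the traffic.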

Page 20: OptiPortal Configuration Considerations

Questions?