Hotfoot HPC Cluster
March 31, 2011
Topics
• Overview
• Execute Nodes
• Manager/Submit Nodes
• NFS Server
• Storage
• Networking
• Performance
Overview - Hotfoot Pilot
• Launched May 2009
• Original Partnership
– Astronomy
– Statistics
– CUIT
– Office of the Executive Vice President for Research
Overview - Hotfoot Expansion
• Expanded March 2011
– More Nodes
– More Storage
– Changed Scheduler
• New Participant
– Social Science Computing Committee (SSCC)
Overview - Cluster Components
• 52 Execute Nodes
• 520 Total Cores
• 2 Manager Nodes
• 1 NFS Server (1 Cold Spare)
• 52 TB Storage (72 TB Raw)
Overview
Overview - Architecture
Hotfoot Components
• Manager/Submit Node 1 (Haddock), Manager/Submit Node 2 (Mahimahi)
– One Manager/Submit node is active. Failover is manual.
• NFS Servers (Herring, Sardine)
– NFS server provides working storage for all other systems in the cluster.
– Second server available to provide NFS services. Currently not connected.
• RAID
– 72 TB raw storage. Approximately 52 TB usable under RAID 5.
• Blade Chassis
– Original blade chassis containing 32 Execute nodes.
– New blade chassis containing 24 Execute nodes.
Execute Nodes
| Model | Quantity | CPU Cores | Total Cores | Memory |
|---|---|---|---|---|
| BL2x220c G5 | 32 | Dual 4 core | 256 | 16 GB |
| BL2x220c G6 | 14 | Dual 6 core | 168 | 24 GB |
| BL2x220c G6 | 8 | Dual 6 core | 96 | 96 GB |
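The Total Cores column follows from quantity × sockets × cores per socket. A minimal Python sketch of that arithmetic, using only the figures in the table above; the tuple layout and names are illustrative, and "Dual" is assumed to mean two CPU sockets per node:

```python
# (model, node count, sockets per node, cores per socket), taken from the table.
# Assumption: "Dual" means two CPU sockets per node.
NODE_GROUPS = [
    ("BL2x220c G5", 32, 2, 4),
    ("BL2x220c G6", 14, 2, 6),
    ("BL2x220c G6", 8, 2, 6),
]


def cores_per_group(groups):
    """Return the total core count for each row of the table."""
    return [count * sockets * cores for _, count, sockets, cores in groups]


group_cores = cores_per_group(NODE_GROUPS)
total_cores = sum(group_cores)
print(group_cores, total_cores)  # [256, 168, 96] 520
```

The per-row results match the Total Cores column, and their sum matches the 520 total cores listed under Cluster Components.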
Manager/Submit Nodes
• HP DL360 G5, 4 GB RAM
• Torque Resource Manager (OpenPBS descendant)
• Maui Cluster Scheduler
• User Access via virtual interface (vif)
• Failover via Torque High Availability (HA)
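Since jobs reach the cluster through Torque on the submit node, it may help to see what a job request looks like. The sketch below builds the text of a generic Torque/PBS job script in Python. The `#PBS -N` and `#PBS -l` directives and the `PBS_O_WORKDIR` variable are standard Torque features; the job name, the command, and the default resource values are hypothetical, and nothing here reflects Hotfoot's actual queues or site policy.

```python
def build_job_script(name, command, nodes=1, ppn=8, walltime="01:00:00"):
    """Return the text of a generic Torque/PBS job script.

    ppn=8 is an illustrative default that would fill one dual 4-core node.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {name}",                   # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit
        "cd $PBS_O_WORKDIR",                 # directory the job was submitted from
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job name and program, for illustration only.
script = build_job_script("example_job", "./my_program")
print(script)
```

A script like this would typically be saved to a file and handed to `qsub` on the submit node, after which the scheduler decides where and when it runs.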
NFS Servers
• Primary
– HP DL360 G7
– 2 x 4 cores
– 16 GB RAM
• Backup
– HP DL360 G5
– 1 x 2 cores
– 8 GB RAM
Storage
• HP P2000 Storage Array
• 32 x 2 TB Drives
• RAID 5
• ~52 TB Usable
Networking
• Execute Nodes
– Channel-bonding mode 2 (load-balancing and fault tolerance)
– 1 Gb connection to chassis switches
– Usage records suggested this was sufficient
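In the Linux bonding driver, mode 2 is balance-xor. Its documentation describes the default layer2 transmit policy, in simplified form, as (source MAC XOR destination MAC) modulo the number of bonded interfaces. The slides do not say which hash policy was configured, so the Python sketch below only illustrates that simplified rule; the function names and MAC addresses are made up.

```python
def mac_to_int(mac: str) -> int:
    """Convert a colon-separated MAC address string to an integer."""
    return int(mac.replace(":", ""), 16)


def select_slave(src_mac: str, dst_mac: str, slave_count: int) -> int:
    """Pick the outgoing interface index using the simplified layer2 rule:
    (source MAC XOR destination MAC) modulo slave count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


# Made-up addresses: one sender, two different peers, two bonded interfaces.
to_peer_a = select_slave("00:11:22:33:44:01", "00:11:22:33:44:02", 2)
to_peer_b = select_slave("00:11:22:33:44:01", "00:11:22:33:44:03", 2)
print(to_peer_a, to_peer_b)  # 1 0
```

Traffic to a given peer always leaves on the same interface, so load is balanced across peers rather than per packet, and a failed interface can be dropped from the set.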
Networking
Sample Traffic for an Execute Node
Networking
• Chassis
– Each chassis has four Cisco 3020 switches
– 1 Gb connection to Edge switches
– Usage records suggested this was sufficient
Networking
Sample Traffic for a Chassis Switch
Networking
Original Chassis, Showing Network Connections for Two Servers
Performance
• Concern about the ability of NFS to handle I/O demands.
• Reviewed performance of pilot system.
• Ran tests on expanded system.
Performance
Memory Usage on Old NFS Server
Performance
Load Average on Old NFS Server
Performance
Performance