
Page 1: Linux Clusters in ITD

Linux Clusters in ITD

Efstratios Efstathiadis
Information Technology Division

Brookhaven Science Associates, U.S. Department of Energy

Page 2: Linux Clusters in ITD


Outline

- Linux in Scientific Computing
- Large Scale Linux Installation & Configuration
- File Sharing: NAS/SAN, NFS, PVFS
- Cluster Interconnects
- Load Management Systems
- Parallel Computing
- System Monitoring Tools
- Linux Clusters in ITD
- Thoughts, Conclusions

Page 3: Linux Clusters in ITD


Linux in Scientific Computing

Features of scientific computing:
- Floating-point performance is very important.
- Users write their own code; Fortran is common.
- GUIs and user-friendly interfaces are not required.
- The goal is science, not computer science.

Page 4: Linux Clusters in ITD


Linux in Scientific Computing

Scientific computing is one of the first areas where Linux has had a major impact on production and mission-critical computing. Why:
- Access to cheap hardware.
- License issues.
- Vendor response/support is slow.
- Access to source code is needed to implement desired features.
- Availability of manpower.
- Availability of scientific tools/resources.

Page 5: Linux Clusters in ITD


Linux Endorsement

SUN (www.sun.com/linux):
- Porting its software products to Linux (Java 2, Forte for Java, OpenOffice, Grid Engine).
- Porting Linux to the UltraSPARC architecture.
- Provides common utilities for Solaris and Linux so that users can move between the two.
- Improves compatibility between the two so that applications can run on both.
- Sun StorEdge T3 arrays are compatible with Linux.

IBM (www-1.ibm.com/linux):
- AFS, storage devices, Linux pre-installed (20% of its Intel-based servers are Linux).
- Open Cluster Group; spends over $1.3B supporting Linux.

Page 6: Linux Clusters in ITD


Processor Support in Linux

Most popular: x86, Alpha, Sparc, PowerPC, MIPS.

Which processor? It depends on:
- Cost
- Performance
- Availability of software

Page 7: Linux Clusters in ITD


Performance

SPEC: Standard Performance Evaluation Corporation (http://www.spec.org)
- SPECint95: 8 integer-intensive C codes
- SPECfp95: 10 floating-point scientific FORTRAN codes

Processor      MHz   SPECfp95   SPECint95
Alpha 21264    500   48.4       23.6
UltraSparc     450   27.0       19.7
Athlon         650   22.4       29.4
PIII/500       500   15.1       21.6

Page 8: Linux Clusters in ITD


Software

Compilers gcc, g++, g77 are available on all platforms, but:
- Generated code is not very fast.
- No parallelization for SMPs.
- g77 is Fortran 77 only.
- g++ has its limitations.

x86 compilers:
- Portland Group (www.pgroup.com): Fortran 90/95, OpenMP parallelization, HPF, better performance (15%).
- Kuck and Associates (www.kai.com): C++, OpenMP parallelization.
- NAG, Absoft, Fujitsu, etc.

Page 9: Linux Clusters in ITD


What is a Cluster?

International Data Corp. (IDC) cluster requirements:
- Software must provide an environment that looks, to all users, as much like a single system as possible.
- The environment must provide higher data and application availability than is possible on single systems.
- Developers must not have to use special APIs for an application to work in a clustered environment.
- Administrators must be able to treat the configuration as a single management domain.
- There must be facilities for components of one application, or entire applications, to run in parallel on many different processors, to improve single-application performance or the overall scalability of the environment.

Page 10: Linux Clusters in ITD


What is a Cluster?

A cluster is a collection of interconnected computers that can be viewed and used as a single, unified computing resource.

Page 11: Linux Clusters in ITD


Large Scale Linux Installation

Choice of:
- Diskless install
  - One copy of Linux to maintain.
  - Requires special tools.
  - Doesn't scale to a large number of nodes.
- Local install
  - Kickstart
  - System Imager
  - LUI (Linux Utilities: Installation)
  - g4u (Ghost for Unix: http://www.feyrer.de/g4u/)

Page 12: Linux Clusters in ITD


Large Scale Linux Installation: Kickstart

Kickstart pulls and installs a list of RPM files from a Red Hat mirror site (such as linux.bnl.gov) specified in a configuration file (ks.cfg). A sample ks.cfg is sketched below.

Drawbacks:
- Cluster nodes must be on a public network.
- Several ks.cfg configuration files have to be maintained.
- Red Hat only.
- No easy way to propagate configuration changes.
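For illustration, a minimal ks.cfg sketch. The directives follow Red Hat 6.x-era kickstart syntax; the mirror path, partition sizes, and package selection are hypothetical.

    # ks.cfg -- minimal kickstart sketch (Red Hat 6.x-era syntax;
    # mirror path, partition sizes, and package groups are hypothetical)
    lang en_US
    keyboard us
    network --bootproto dhcp
    nfs --server linux.bnl.gov --dir /pub/redhat
    rootpw --iscrypted <hash>
    timezone America/New_York
    zerombr yes
    clearpart --all
    part / --size 1500
    part swap --size 256
    %packages
    @ Base
    %post
    # site-specific post-install commands go here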

Page 13: Linux Clusters in ITD


Large Scale Linux Installation: System Imager

System Imager (http://systemimager.sourceforge.net)
- The Image Server "pulls" the system image of a master client; cluster nodes can then pull the image of their choice from the Image Server.
- Cluster nodes use rsync and tftp to pull images from the Image Server.
- Can be done on a private network.
- Supports several Linux distributions.
- Configuration changes can be easily propagated to clients through rsync.
- rsync (http://rsync.samba.org) can pull just the new and modified files off the server rather than the whole system image.

Page 14: Linux Clusters in ITD


File Sharing: DAS

DAS: Direct Attached Storage

Page 15: Linux Clusters in ITD


File Sharing: NAS

NAS: Network Attached Storage

Page 16: Linux Clusters in ITD


File Sharing: NAS

Network Attached Storage (NAS): shared storage on a network; a dedicated, high-performance, single-purpose machine.
- Separates data servers from application servers.
- Provides centralized data management.
- Scalability.
- Dynamic growth of filesystems (LVM).
- Journaling filesystems.
- RAID controllers.
- Support for multiple protocols (NFS, CIFS, HTTP, FTP, etc.).
- Multiple network interfaces.
- Uses existing network infrastructure.
- Web admin/monitoring GUI (netattach).
- Redundant power supplies/fans/cables.
- Linux support.

Page 17: Linux Clusters in ITD


File Sharing: SAN

SAN: Storage Area Network

Page 18: Linux Clusters in ITD


File Sharing: SAN

Storage Area Network (SAN): shared storage on a network; a dedicated high-performance network connecting storage elements to the back ends of the servers.
- Provides the benefits of NAS and also isolates storage traffic onto a dedicated high-performance network.
- Disk drives are attached directly to a Fibre Channel network (not acceptable on a TCP/IP network).

SAN disadvantages:
- Expensive: must build a dedicated, high-performance network.
- Lack of strong standards.
- Proprietary solutions only.

Page 19: Linux Clusters in ITD


File Sharing: NFS

What is NFS?

The Network File System (NFS) protocol provides transparent remote access to shared file systems across networks. The NFS protocol is designed to be machine, operating system, network architecture, and transport protocol independent. This independence is achieved through the use of Remote Procedure Call (RPC) primitives built on top of the eXternal Data Representation (XDR).

How is NFSv3 different from NFSv2?

- 64-bit file system support (version 2 is limited to 32 bits).
- Reliable asynchronous writes (version 2 supports only synchronous writes).
- Better cache consistency, by providing attribute information before and after an operation.
- Better performance on directory lookups: READDIRPLUS calls reduce the number of messages passed between client and server by returning file handles and attributes in addition to directory entries.
- The maximum data transfer size, fixed at 8 KB in NFSv2, is set by values in the FSINFO return structure.
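For reference, this is roughly how a client selects the NFS version at mount time (a sketch: nfsvers, rsize, and wsize are standard Linux NFS mount options, but the 8 KB sizes shown are illustrative):

    # Mount /scratch from the Solaris server, forcing NFS version 3
    mount -t nfs -o nfsvers=3,rsize=8192,wsize=8192 sun3.bnl.gov:/scratch /scratch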

Page 20: Linux Clusters in ITD


File Sharing: NFS

NFS benchmarking with Bonnie (http://www.textuality.com/bonnie). The same filesystem (/scratch) is mounted on the Linux client using different NFS versions. The NFS server runs Solaris 7 (sun3.bnl.gov); the client is a dual 800 MHz RedHat 6.2 host (BLC).

             ------- Sequential Output -------- ---Sequential Input--- --Random--
             -Per Char- --Block--- -Rewrite--   -Per Char- --Block---  --Seeks---
          MB K/sec %CPU K/sec %CPU K/sec %CPU   K/sec %CPU K/sec %CPU   /sec %CPU
NFSv2   1000   698  2.2   619  0.5   655  1.1    5615 17.3  9994 12.9  176.2  2.8
NFSv3   1000  4630 15.6  4631  4.4  2329  4.6   11233 39.3 11185 16.6  695.4 11.5
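The invocation behind numbers like these is along the lines of the following (a sketch: -d, -s, and -m are classic Bonnie options for the scratch directory, file size in MB, and row label; check the local build's usage):

    # Run Bonnie against the NFS-mounted /scratch with a 1000 MB test file
    Bonnie -d /scratch -s 1000 -m NFSv3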

Page 21: Linux Clusters in ITD


File Sharing: NFS

NFS benchmarking with Bonnie: Linux server, Linux client (RedHat 6.2, 2.2.18).

              ------- Sequential Output -------- ---Sequential Input--- --Random--
              -Per Char- --Block--- -Rewrite--   -Per Char- --Block---  --Seeks---
           MB K/sec %CPU K/sec %CPU K/sec %CPU   K/sec %CPU K/sec %CPU   /sec %CPU
NFSv3-v2 1000  9787 33.5  9886  8.7  3222  5.6    8630 29.3  9110 13.6  115.9  0.9
NFSv3-v3 1000  9848 34.1  9911  8.9  3227  5.6    8740 30.0  9087 12.7  116.5  1.0
Local    1000 19643 61.5 25040 11.7  8297 12.7   15651 40.9 18885 11.1 1150.4  6.9

Page 22: Linux Clusters in ITD


File Sharing: NFS

Network Attached Storage (NAS) benchmark:
- Linux server (VA 9450NAS): dual PIII Xeon 700 MHz, 2.0 GB RAM, RedHat 6.2, NFSv3, Mylex extremeRAID 2000, ext3.
- Linux client (VA 2200): dual PIII 800 MHz, 0.5 GB RAM, RedHat 6.2, 2.2.18 kernel with NFSv3.
- Server and client are on the same network switch (Cisco 4006).

              ------- Sequential Output -------- ---Sequential Input--- --Random--
              -Per Char- --Block--- -Rewrite--   -Per Char- --Block---  --Seeks---
           MB K/sec %CPU K/sec %CPU K/sec %CPU   K/sec %CPU K/sec %CPU   /sec %CPU
NFSv3-v3 1000  9848 34.1  9911  8.9  3227  5.6    8740 30.0  9087 12.7  116.5  1.0
NASv3-v3 1024  7551 25.8  7544  6.8  5048 10.7   11504 36.9 11488 15.9 1640.2 22.1
NASv3-v3 2047  7428 25.5  7427  7.0  4021  7.8   10297 33.7  9940 13.9  740.5 13.5
NFSv3-v2 1000  9787 33.5  9886  8.7  3222  5.6    8630 29.3  9110 13.6  115.9  0.9
NASv3-v2  512  7430 25.1  7280  7.0  6500 10.4   26453 63.8 19378 16.2 6881.1 87.7
NASv3-v2 1024  7360 25.3  7431  7.0  4811  9.1   11357 36.5 10985 16.3 1742.8 24.0
NASv3-v2 2047  7348 25.1  7398  6.9  4154  8.2   10813 34.9 10474 15.1  981.3 13.5

Page 23: Linux Clusters in ITD


File Sharing: NFS

The setup of having a SUN workstation as a "main node" serving home directories is pretty common: quark.phy.bnl.gov, sun1.sns.bnl.gov, sun2.bnl.gov (sun65.bnl.gov), etc.

http://linux.itd.bnl.gov/NFS
http://nfs.sourceforge.net

Page 24: Linux Clusters in ITD


File Sharing: PVFS

PVFS: Parallel Virtual File System (http://www.parl.clemson.edu/pvfs/desc.html). PVFS stripes file data across multiple disks in different nodes (I/O nodes) in a cluster; this way large files can be created and bandwidth is increased. There are four major components to the PVFS system:

- Metadata server (mgr)
- I/O server (iod)
- PVFS native API (libpvfs)
- PVFS Linux kernel support
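To make the striping idea concrete, here is a conceptual sketch (not PVFS source code) of how a byte offset maps to an I/O node under simple round-robin striping; the stripe size and node count are assumptions:

    /* striping.c -- conceptual sketch of round-robin file striping;
     * not actual PVFS code. Stripe size and node count are assumptions. */
    #include <stdio.h>

    #define STRIPE_SIZE (64 * 1024)   /* bytes per stripe unit (assumed) */
    #define NUM_IO_NODES 8            /* I/O nodes in the cluster (assumed) */

    /* Which I/O node holds a given byte offset of a striped file. */
    static int io_node_for_offset(long offset)
    {
        return (int)((offset / STRIPE_SIZE) % NUM_IO_NODES);
    }

    int main(void)
    {
        long offsets[] = {0, 65536, 524288, 1048576};
        for (int i = 0; i < 4; i++)
            printf("offset %8ld -> I/O node %d\n",
                   offsets[i], io_node_for_offset(offsets[i]));
        return 0;
    }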

Page 25: Linux Clusters in ITD


Cluster Network

- Fast Ethernet: transmission speed 0.1 Gbps; latency ~100 µs; cost/connection < $1,000.
- Gigabit Ethernet: maximum bandwidth 1.0 Gbps; cost $1,650/connection (based on 64 ports, copper).
- Myrinet: a low-latency, short-distance network (System Area Network); maximum bandwidth 1.2 Gbps; latency 9 µs; cost > $2,500/connection; single-vendor hardware.

CDIC cluster interconnect: Cisco 4006, a 48x3-port full-duplex Fast Ethernet switch.
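These numbers can be compared with the usual first-order model, time = latency + bytes/bandwidth (the model and the message sizes below are illustrative, not from the slides):

    /* xfer_time.c -- first-order model: time = latency + bytes/bandwidth.
     * The model and the message sizes are illustrative assumptions. */
    #include <stdio.h>

    /* Estimated one-way transfer time in microseconds.
     * 1 Gbps = 1e9 bits/s = 1e3 bits/us. */
    static double xfer_us(double latency_us, double gbps, double bytes)
    {
        return latency_us + (bytes * 8.0) / (gbps * 1e3);
    }

    int main(void)
    {
        double sizes[] = {64, 4096, 1048576}; /* bytes */
        for (int i = 0; i < 3; i++) {
            double b = sizes[i];
            printf("%8.0f B: FastE %9.1f us  GigE %9.1f us  Myrinet %9.1f us\n",
                   b,
                   xfer_us(100.0, 0.1, b),  /* Fast Ethernet: 100 us, 0.1 Gbps */
                   xfer_us(100.0, 1.0, b),  /* GigE: latency not quoted; assumed */
                   xfer_us(9.0, 1.2, b));   /* Myrinet: 9 us, 1.2 Gbps */
        }
        return 0;
    }

Small messages are dominated by latency, which is why Myrinet's 9 µs matters more than its bandwidth for fine-grained codes.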

Page 26: Linux Clusters in ITD


Network Graph

http://linux.itd.bnl.gov/netpipe

Page 27: Linux Clusters in ITD


Network Signature

Page 28: Linux Clusters in ITD


Cluster Network: Private vs. Public

Private network:
- Cluster security/setup/administration is much easier.
- Applications cannot interact with the outside world.

Public network:
- Security/setup/administration is difficult.
- IP addresses are needed.
- Interaction is possible.

Page 29: Linux Clusters in ITD


Load Management Systems (LMS)

Transparent load sharing:
- Users submit jobs without being concerned with which cluster resource is used to process the job.

Control over resource sharing:
- Rather than leaving it up to individuals to search the network for available resources and capacity to run their jobs, the LMS controls the resources in the cluster. It takes into account the specifications or requirements of the job when assigning resources, matching the requirements with the resources available.

Implement policies:
- Rules can be established that automatically set priorities for jobs among groups or teams, enabling the LMS to implement resource sharing between groups.

Page 30: Linux Clusters in ITD


Load Management Systems (LMS)

LMS features:
- Batch queuing
- Load balancing
- Failover capability
- Job accounting/statistics
- User-specifiable resources
- Relinking/recompiling of application programs
- Fault tolerance
- Suspend/resume jobs
- Job status
- Host status
- Meta-job capability
- Cluster-wide resources
- Job migration
- Central control

Page 31: Linux Clusters in ITD


Load Management Systems (LMS)

Numerous LMS (or CMS) packages are available on Linux:
- Portable Batch System (PBS) (http://pbs.mrj.com): developed by NASA; freely distributed by a commercial company, which can also provide service and support.
- Load Share Facility (LSF) (http://www.platform.com)
- Sun Grid Engine (CODINE) (http://www.sun.com/software/gridware/linux/)
- Distributed Queuing System (DQS) (http://www.scri.fsu.edu/~pasko/dqs.html)
- Generic Network Queuing System (GNQS) (http://www.gnqs.org/)
- LoadLeveler (http://www.austin.ibm.com/software/sp_products/loadlev.html): developed by IBM; a modified version of the Condor batch queuing system (http://www.cs.wisc.edu/condor/).

Page 32: Linux Clusters in ITD


Load Management Systems (LMS): PBS

The Portable Batch System (PBS) was designed and developed by NASA to provide control over the initiation, scheduling, and execution of batch jobs. A sample job script is sketched below.
- User interfaces: GUI (xPBS) and command-line interface (CLI).
- Heterogeneous clusters.
- Interactive jobs (debugging sessions or jobs that require user command-line input) and batch jobs.
- Parallel code support for MPI, PVM, HPF.
- File staging.
- Automatic load leveling: the PBS scheduler has numerous ways to distribute workload across the cluster, based on hardware configuration, resource availability, and keyboard activity.
- Job accounting.
- Cross-system scheduling.
- SPF (Single Point of Failure).
- Web site: http://pbs.mrj.com; short introduction at BNL: http://www.itd.bnl.gov/bcf/cluster/pbs/
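For illustration, a minimal PBS job script (a sketch: the #PBS directives, $PBS_O_WORKDIR, and $PBS_NODEFILE are standard PBS, while the job name, resource amounts, and executable are hypothetical):

    #!/bin/sh
    # Minimal PBS job sketch: 4 dual-CPU nodes, 8 MPI processes.
    # Job name, walltime, and executable are hypothetical.
    #PBS -N myjob
    #PBS -l nodes=4:ppn=2
    #PBS -l walltime=01:00:00
    cd $PBS_O_WORKDIR
    mpirun -np 8 -machinefile $PBS_NODEFILE ./a.out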

Page 33: Linux Clusters in ITD


Parallel Processing

The use of multiple processors to execute different parts of a program simultaneously. The main goal is to reduce wall-clock time (also cost, memory constraints, etc.).

Things to consider:
- Is the problem parallelizable? (e.g., F(k+2) = F(k+1) + F(k): each term depends on the two before it, so successive iterations cannot run concurrently.)
- Parallel overhead (the amount of time required to coordinate parallel tasks).
- Synchronization of parallel tasks (two or more tasks waiting until they reach a specified point).
- Granularity of the problem.
- SMP vs. DMP (is the network a factor?).

Page 34: Linux Clusters in ITD


Parallel Processing

- Threads: used on SMP hosts only; not widely used in scientific computing.
- Compiler-generated parallel programs: the compiler detects concurrency in loops and distributes the work in a loop to different threads; it is usually assisted by compiler directives.
- MPI, PVM.
- Embarrassing parallelism: independent processes can be executed in parallel with little or no coupling between them (see the sketch below).
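A minimal illustration (not from the presentation) of the contrast between a loop whose iterations are independent, and therefore trivially parallelizable, and the Fibonacci-style recurrence from the previous slide, which is not parallelizable as written:

    /* parallelizable.c -- illustrative sketch, not from the presentation. */
    #include <stdio.h>

    #define N 1000
    double a[N], b[N], f[N];

    void independent(void)
    {
        /* Each iteration touches only its own elements, so iterations can
         * be split across threads or MPI ranks with no coordination. */
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i];
    }

    void recurrence(void)
    {
        /* f[k+2] = f[k+1] + f[k]: each iteration needs the results of the
         * two previous ones, so the loop cannot simply run concurrently. */
        f[0] = f[1] = 1.0;
        for (int k = 0; k + 2 < N; k++)
            f[k + 2] = f[k + 1] + f[k];
    }

    int main(void)
    {
        independent();
        recurrence();
        printf("a[10] = %g, f[10] = %g\n", a[10], f[10]);
        return 0;
    }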

Page 35: Linux Clusters in ITD


Message Passing Interface (MPI)

MPI is a message-passing library: a collection of subroutines that facilitate communication (exchange of data and synchronization) among the processes of a distributed-memory program.
- MPI offers portability and performance; it is not a true standard.
- Messages are the actual data that you send/receive plus an envelope of information that helps route the data. In MPI message-passing calls, three parameters describe the data and another three specify the routing (envelope):
  - Data: startbuf, count, datatype
  - Envelope: dest, tag, communicator
- Messages are sent over TCP sockets.

A minimal send/receive example follows.
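A minimal C sketch of the six parameters named above (the program itself is illustrative; build and launch details such as mpicc and mpirun depend on the local MPI installation):

    /* mpi_sendrecv.c -- minimal sketch of MPI's data and envelope
     * parameters; run with two processes, e.g. mpirun -np 2. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /*       startbuf count datatype dest tag communicator */
            MPI_Send(&value,  1,    MPI_INT, 1,   0,  MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* The receive's envelope (source 0, tag 0) must match. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }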

Page 36: Linux Clusters in ITD


Message Passing Interface (MPI)

Tuning considerations:
- Number of processors: are we overdoing it with too many processes, creating too much message passing over the network and increasing synchronization time?
- Message size: is it optimal for our network technology? What bandwidth do we get when we pass messages of different sizes?
- Design: is our problem very fine-grained? Do we take advantage of loop unrolling?
- Use the right compiler flags.
- Take advantage of the resources; avoid nodes that are busy. Sometimes a slow node can do the job as well as a fast but busy one.
- Benchmark. Get the numbers. What is important to your code: memory, CPU, etc.?

Page 37: Linux Clusters in ITD


Benchmarking MPI

- Tools included in the MPICH distribution.
- LLCbench (http://icl.cs.utk.edu/projects/llcbench/index.htm)
- Vampir MPI Performance Analysis Tool
- NAS Parallel Benchmarks (NPB), http://www.nas.nasa.gov/software/NPB (NAS: Numerical Aerospace Simulation)
  - NPB are installed on BGC under /opt/pgi/bench/NPB2.3.
  - NPB results: only one program seems to benefit from the Gigabit interconnect upgrade, due to large message passing.
  - NAS serial version (helps understand the architecture).

A minimal ping-pong timing sketch follows.
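The simplest benchmark of this kind is a two-rank ping-pong; the sketch below (not from the presentation; the dedicated tools above are more careful about warm-up and statistics) measures round-trip time and bandwidth for one assumed message size:

    /* pingpong.c -- minimal MPI ping-pong sketch; run with mpirun -np 2.
     * Message size and repetition count are illustrative assumptions. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        enum { REPS = 100 };
        int rank, bytes = 1 << 20;      /* 1 MB message (assumed) */
        char *buf;
        double t0, t1;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(bytes);
        memset(buf, 0, bytes);

        t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("avg round trip %.1f us, bandwidth %.1f MB/s\n",
                   (t1 - t0) / REPS * 1e6,
                   2.0 * bytes * REPS / (t1 - t0) / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }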

Page 38: Linux Clusters in ITD


[Chart: NPB 2.3 serial benchmarks, Class W. MFlops (0-250) for the BT, LU, SP, CG, and MG kernels on BLC-W, BGC-W, SUN2-W, and SUN65-W.]

Page 39: Linux Clusters in ITD


Cluster Monitors

Administrators want:
- Cluster usage
- Log file scans
- Intrusion detection
- Hardware and software inventories, etc.

Users want:
- How many nodes are in the cluster?
- What type of nodes (architecture)? PC, Sparc, SGI.
- Available resources (CPU speed, disk space, memory, etc.).
- What is being used? Which nodes are "empty"?

Page 40: Linux Clusters in ITD


Cluster Monitors

Load management systems provide some sort of monitoring. HP OpenView.

Open-source products:
- Pong3 (Perl, www.megacity.org/pong3)
- System-Info (Perl, http://blaine.res.wpi.net/)
- spong (Perl, spong.sourceforge.net)
- bWatch (Tcl/Tk and ssh, user-customizable)
- Vacum (VA product)
- rps (Perl, rps.sourceforge.net)

Page 41: Linux Clusters in ITD


HP OpenView

IT Operations (ITO):
- ITO agent running on the client; ITO central console.
- ITO agents monitor client log files and report events back to the central console.
- Actions can be taken either manually or automatically at the console in response to events.

RADIA (Novadigm product):
- Hardware and software inventories.
- Software distribution.

NNM (Network Node Manager):
- Net Metrics: report network statistics.
- Net Rangers: intrusion detection.


Page 44: Linux Clusters in ITD


Spong (spong.sourceforge.net)

- Provides CPU, memory, and disk utilization.
- Checks availability of services (ssh, http, PBS, etc.). (*)
- Lists running jobs, sorted by CPU usage. (*)
- Keeps a history of events per host.
- Warns admins of status changes by email.
- Usage graphs (per hour/day/month/year). (**)
- Scans log files.
- Per-host configuration.
- Open source.

(*) Modified/enhanced. (**) Unstable.


Page 46: Linux Clusters in ITD


Remote ps (http://rps.sourceforge.net)

Page 47: Linux Clusters in ITD


The CDIC Cluster (BGC)

- 49 nodes on a local network.
- bgc000: master node, 2 NICs.
- 001-029: dual PIII 700 MHz, 1 GB memory, 8 GB disk.
- 030-047: dual PIII 500 MHz, 0.5 GB memory, 2 GB disk.
- bgc-f1: fileserver hosting "local" user home directories (50 GB); "public" home directories are mounted on the master node only, under /itd.
- RedHat 6.2, PBS (with MPI support), MPICH-1.2.1.
- Initial installation with Kickstart; rsync is used to propagate updates.
- Portland Group compilers (3.2): 16 CPUs, 2 users.
- Monitors: bWatch, spong, pong3.
- Interim backup solution.

Page 48: Linux Clusters in ITD


The Brookhaven Cluster (BLC)

General-purpose cluster.
- 60 nodes on a public network.
- blc000.bnl.gov: master node.
- 001-040: dual PIII 800 MHz, 0.5 GB memory, 2x9 GB disk.
- 041-059: dual PIII 500 MHz, 0.5 GB memory, 18 GB disk.
- RedHat 6.2, PBS (with MPI support), MPICH-1.2.1.
- Home directories are hosted on a Solaris file server (userdata.bnl.gov).
- Initial installation, configuration, and updates with System Imager (host images are kept on the master node).
- Portland Group compilers (3.2): 64 CPUs, 4 users.
- Monitors: bWatch, spong, pong3, HPOV.

Page 49: Linux Clusters in ITD


The SNS Cluster (SNSC)

- 6 nodes on a public network.
- snsc00.sns.bnl.gov: master node.
- 01-05: dual PIII 700 MHz, 0.5 GB memory, 18 GB disk.
- RedHat 7.0, PBS (with MPI support) (?), MPICH-1.2.1.
- Home directories are hosted on a Solaris file server (sun1.sns.bnl.gov).
- Initial installation, configuration, and updates with System Imager (host images are kept on the master node).

Page 50: Linux Clusters in ITD


Thoughts ...

Need for centralized cluster management and homogeneity:
- Easier monitoring, administration, maintenance, and recovery.
- Users will have the option to share resources (idle CPUs, filesystems (NAS), network switches, printers, etc.) and costs (licenses, software).
- Increased user interaction.
- Faster integration of new groups.

Page 51: Linux Clusters in ITD


[Diagram: proposed cluster layout. Public network: BLC 1, BLC 2, BLC 3, ..., BLC N and a gateway. Private network (192.168.50.0): a gateway, Fileserver1, Software1, and the BGC 1-N, SNSC 1-N, and VIS 1-N nodes.]

Page 52: Linux Clusters in ITD


Linux in ITD

- Mail gateway, DNS, DHCP, proxies, etc.
- Linux clusters are growing in size and number.
- Linux is becoming a player (backups, HPOV, SAN, security, manpower, mirror sites, etc.).
- Linux is becoming a common platform for scientific computing (RHIC, CDIC, SNS, g-2, Physics Theory, ...).