Inria Sophia Nef cluster
Inria Sophia Antipolis Méditerranée – « ateliers thématiques » (thematic workshops)
SIC – SED v2.1 29 March 2016
Nef platform
Nef is a cluster computing platform :
● nodes now : 3 node types, 69 nodes, 1004 cores, 18 GPUs
● nodes 04-2016 : 7 node types, 118 nodes, 2148 cores, 18 GPUs
Legacy Nef : stops on 17 April 2016, its hardware is re-installed into Nef
● storage (~15TB for homes, ~150TB for data)
● fast network interconnect (Infiniband QDR, 32 Gbit/s)
● front-end servers
What for ?
● all computation needs for Inria Sophia teams' activity
● includes experimentation, "production", big data, parallel computations, sequential jobs, GPU
Nef platform
For whom ?
● all Inria Sophia research team users
● other Inria users
● academic and industrial partners of Inria (under agreement)
By whom ?
● financed by Inria, CPER, research teams
● scientific piloting committee – CSPP « Cluster, Grid, Cloud, HPC »
● technical team : https://helpdesk.inria.fr
What future ?
● a perennial, evolving platform
● CPER OPAL 2015-2020 : distributed meso-center with regional academic partners
Nef evolution in a nutshell
What changes from Legacy Nef to Nef :
             Legacy Nef                           Nef
scheduler    Torque/Maui                          OAR
queues       many, complex (see documentation)    default, besteffort, big
storage      /dfs                                 /data
nodes                                             Legacy Nef nodes + added hardware
system       Fedora 16                            CentOS 7
software                                          new versions, environment modules
Accessing Nef – account request
A Nef account is distinct from the Inria account (it requires a request and renewal)
Kali web portal https://kali.inria.fr is the preferred account management interface :
● click Sign in/up with CAS and use your Inria credentials
● go to Clusters > Overview page and apply for an account on Sophia New Nef
Kali is also a web portal for simple cluster usage :
● follow Kali online help to prepare and launch your jobs
Accessing Nef – ssh (1/2)
front-end             access from   job submission   development tools
nef-frontal           internet      yes              no
nef-devel2            Inria         yes              yes
nef-devel (04-2016)   Inria         (Legacy Nef)     (Legacy Nef)
Accessing Nef – ssh (2/2)
Example : successful connection from outside (better : use ~/.ssh/config)
mylaptop$ ssh myneflogin@nef-frontal.inria.fr   ## not needed from Inria network or VPN
nef-frontal$ ssh nef-devel2
nef-devel2$
Example : bad ssh key configuration in ~/.ssh/authorized_keys
mylaptop$ ssh myneflogin@nef-devel2
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
mylaptop$
Example : direct connection to nef-devel2 fails when not on the Inria network (go through nef-frontal, or configure ~/.ssh/config)
mylaptop$ ssh myneflogin@nef-devel2.inria.fr
ssh: connect to host nef-devel2.inria.fr port 22: Connection timed out
mylaptop$
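A minimal ~/.ssh/config sketch for reaching nef-devel2 through nef-frontal from outside Inria (the host aliases and the ProxyCommand hop are an assumption, adapt myneflogin to your login) :

Host nef-frontal
    HostName nef-frontal.inria.fr
    User myneflogin
Host nef-devel2
    HostName nef-devel2.inria.fr
    User myneflogin
    ProxyCommand ssh -W %h:%p nef-frontal   ## hop through nef-frontal, only needed outside the Inria network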
Resource manager / scheduler - OAR
OAR is an open source batch scheduler :
● submit a job to request resources for a given amount of time (walltime)
● core = most basic resource
● hierarchy of resources : /cluster/nodes/cpu/core
● when you reserve a core, you get a fraction of the memory of the node : total mem / total cores
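For example, on a Dell C6100 node with 12 cores and 96 GB of RAM (see the appendix), reserving one core grants roughly 96 / 12 = 8 GB of memory.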
(A few) changes from Torque :
● you can't ssh to the nodes (but you can use oarsh or oarsub -C)
● you can get all the cores of a node without specifying the number of cores
● you can reserve a given number of cores whatever the number of nodes
Requesting resources – oarsub (1/4)
Batch mode job (default, recommended) :
● oarsub -l /nodes=3,walltime=02:00:00 /path/to/my/script
Interactive mode job :
● oarsub -l /core=10 -I
Advance reservation (don't use it if you don't need it) :
● oarsub -l /nodes=1 -r "2016-06-12 14:00:00" /path/to/my/script
Requesting resources – oarsub (2/4)
Example - resource specification :
● oarsub -l /core=32 -I
● oarsub -l /nodes=4/core=2,walltime=00:30:00 ./runme
Example - properties :
● oarsub -p 'mem > 100000' -l /nodes=1 -I
● oarsub -p "cputype='xeon' and not cluster='dellc6220' " -l
"{mem_core > 6000}"/core=6 ./runme
● oarsub -p "gpu='YES' " -l /nodes=1 -I
GPUs inside a node are shared among cores, so you should reserve
complete nodes and not a few cores !
Requesting resources – oarsub (3/4)
Example - moldable jobs (either-or) :
● oarsub -l /nodes=4,walltime=2 -l /nodes=2,walltime=4 ./runme
Example - submission script :
● oarsub -S ./test2.sh
nef-devel2 $ cat ./test2.sh
#OAR -l /nodes=2,walltime=1
#OAR -p ibswitch='ibswy1nef'
#OAR -q default
/path/to/my/command
Example – job array with param file (one line per job) :
● oarsub --array-param-file ./param_file ./runme
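A minimal sketch of a param file, assuming ./runme takes two arguments per run (file names and values are illustrative) :

nef-devel2$ cat ./param_file
input1.dat 10
input2.dat 20
input3.dat 30
## launches an array of 3 jobs : ./runme input1.dat 10, ./runme input2.dat 20, ./runme input3.dat 30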
Requesting resources – oarsub (4/4)
Example - bad request, can't be satisfied by the cluster
● oarsub -l /nodes=20 -p "cluster='dellr900'" -I
● fails with "There are not enough resources for your request"
Example - bad request, doesn't comply with per user resource allocation limits
● oarsub -l /core=300 -I
● remains in "Waiting" state
Hint : use oarstat -fj OAR_JOB_ID or Monika for detailed information
Warning : with oarsub -l /core=4 you can get 1 core on each of 4 different nodes
For multithreaded runs, use oarsub -l /nodes=1/core=4
Obtaining resources – queues (1/3)
Jobs are submitted to a queue ; available queues and limits are :

queue name   max user resources   max duration (days)   priority   max user (hours*resources)
default      256                  30                    10         21504
big          1024                 30                    5          2000
besteffort   -                    30                    0          -

Example : 128 cores with 2x the default RAM/core during 3.5 days = 128 × 2 × 3.5 × 24 = 21504 hours*resources (the default queue limit)

Best effort jobs : not subject to per user limits, but can be killed while running
● Good practice : use them when appropriate (eg many short jobs)
Obtaining resources – queues (2/3)
Job priority order :
● higher priority queue first
● then user's Karma : last 30 days resource consumption
● includes resource consumed + resource requested (used and unused)
Good practice : adjust requested walltime, RAM and CPU to what the job actually uses :
● Colmet : http://nef-devel2.inria.fr:5000/ (from Inria network)
"Why is my job still 'Waiting' while there are unused resources ?"
"Why is my job still 'Waiting' while other jobs go 'Running' ?"
● hint : "best fit", per user limits, specific resource request, etc.
Obtaining resources – queues (3/3)
Submit a job to the default queue :
● oarsub ./myscript
Submit a job to the big queue :
● oarsub -q big ./myscript
Submit a best effort job :
● oarsub -t besteffort ./myscript
● oarsub -t besteffort -t idempotent ./myscript
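Note : in the second form, -t idempotent asks OAR to automatically resubmit the best effort job if it is killed, which suits restartable computations.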
Monitoring jobs and interacting (1/3)
Monika : view jobs/nodes status and properties
Monitoring jobs and interacting (2/3)
Drawgantt : display a Gantt chart of nodes and jobs, past and future
Monitoring jobs and interacting (3/3)
oarstat : print info about jobs
oardel : delete a job
oarpeek : show the stdout/stderr of a running job
oarnodes : print info about cluster nodes
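A minimal session sketch (the job id 1234567 is illustrative) :

nef-devel2$ oarstat -u myneflogin   ## list my jobs
nef-devel2$ oarstat -fj 1234567     ## full details of one job
nef-devel2$ oarpeek 1234567         ## stdout of the running job
nef-devel2$ oardel 1234567          ## delete the job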
Connect to a cluster node where jobid is running from nef-devel2 or nef-frontal
● oarsub -C jobid
● OAR_JOB_ID=jobid oarsh nodename
Managing data (1/4)
Data stored on the cluster is NOT backed up.
/home/myneflogin : home (default) directory
● visible cluster-wide (nodes, nef-devel2, nef-frontal), long term storage
● quota 150GB/user, check usage with quota -s
● hard limit 600GB, grace period of 4 weeks
Local storage on nodes (for a job's temporary files) :
● /tmp : local hard disk
● /dev/shm : RAM filesystem
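A minimal job script sketch using node-local storage ($OAR_JOB_ID is set by OAR ; the directory name and the --workdir option are illustrative assumptions) :

#!/bin/bash
TMP=/tmp/myneflogin.$OAR_JOB_ID      ## per-job scratch directory on the local disk
mkdir -p $TMP
/path/to/my/command --workdir $TMP
rm -rf $TMP                          ## clean up before the job ends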
Managing data (2/4)
Data stored on the cluster is NOT backed up.
/data : distributed scalable filesystem
● seen cluster wide (nodes, nef-devel2, nef-frontal)
● team directory : /data/myteamgroup/share
● user directory : /data/myteamgroup/user/myneflogin
● long term storage : 1TB/team + quota bought by the team
● tag with chgrp myteamgroup ./long_term_file (Unix group)
● scratch storage : no quota, variable size, may be purged periodically
● tag with chgrp scratch ./scratch_file (Unix group)
● check quota with sudo nef-getquota -g myteamgroup
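A minimal sketch of tagging data in the team space (paths and file names are illustrative) :

nef-devel2$ mkdir -p /data/myteamgroup/user/myneflogin/results
nef-devel2$ cp result.dat /data/myteamgroup/user/myneflogin/results/
nef-devel2$ chgrp myteamgroup /data/myteamgroup/user/myneflogin/results/result.dat   ## counted on the team quota
nef-devel2$ chgrp scratch /data/myteamgroup/user/myneflogin/tmp_output.dat           ## scratch, may be purged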
Managing data (3/4)
Copying files to/from the cluster using rsync :
# example : from nef to mylaptop on Inria Sophia network
# or user customized ~/.ssh/config
mylaptop$ rsync -av myneflogin@nef-devel2.inria.fr:nef_source_dir ./laptop_dest_dir
# example : from mylaptop on the Internet to nef
mylaptop$ rsync -av ./laptop_source_dir myneflogin@nef-frontal.inria.fr:nef_dest_dir
Good practice : avoid scp -r (follows symlinks)
Good practice : copy to/from nef-devel2 when possible (performance)
Managing data (4/4)
Accessing files on the cluster using sshfs :
# example : from mylaptop, Fedora, on Inria Sophia network
# or user customized ~/.ssh/config
mylaptop$ mkdir $XDG_RUNTIME_DIR/nef
mylaptop$ sshfs -o transform_symlinks nef-devel2:/ $XDG_RUNTIME_DIR/nef
mylaptop$ fusermount -u $XDG_RUNTIME_DIR/nef
Using software (1/4)
Overview of the tools available :
● Allinea DDT (debugger for OpenMP/MPI) & MAP (profiler)
● Intel Parallel Studio (c/c++/fortran compilers, MPI, Vtune)
● Scientific libraries (petsc, trilinos, hypre, mumps, openblas, gmsh, …)
● GPU : cuda 7.5, caffe
● Many languages : GCC, Matlab, R, Python (scipy, numpy, pip), java, ...
● Recommended MPI : openmpi 1.10.1
● Visualization : Paraview ; vnc & virtualGL on a GPU node
You can also install your own software in your home directory :
e.g. with Python : pip install --user <package>
Using software (2/4)
Nef nodes and nef-devel2 run the Linux CentOS 7 64-bit distribution
Compilation : use nef-devel2 (or a node)
Environment modules : configure the user environment for using a tool
● module avail : list all available modules
● module load module_name : configure the current session for module_name
● module list : show loaded modules
● module purge : unload all modules
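A minimal session sketch (the MPI module name is the one used in the example on the next slides) :

nef-devel2$ module avail
nef-devel2$ module load mpi/openmpi-1.10.1-gcc
nef-devel2$ module list
Currently Loaded Modulefiles:
  1) mpi/openmpi-1.10.1-gcc
nef-devel2$ module purge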
Using software (3/4)
Example : PETSc / OpenMPI test code from the PETSc distribution
Compilation :
nef-devel2$ module load mpi/openmpi-1.10.1-gcc
nef-devel2$ module load petsc/3.6.3
nef-devel2$ ./configure   ## openmpi and petsc PATH/params come from the loaded modules
nef-devel2$ make test_code
nef-devel2$
Using software (4/4)
Example : PETSc / OpenMPI test code from the PETSc distribution (continued)
Job script :
nef-devel2$ cat job_script
#!/bin/bash
source /etc/profile.d/module.sh
module load mpi/openmpi-1.10.1-gcc
module load petsc/3.6.3
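## assumption : for a run spanning several nodes, pass the OAR node list to mpirun, e.g. with -machinefile $OAR_NODEFILE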
mpirun --prefix $MPI_HOME /path/to/test_code
nef-devel2$
Submitting job :
nef-devel2$ oarsub -l /core=20 /path/to/job_script
nef-devel2$
Appendix : Nef nodes (1/2)
Current Nef nodes :

nodes                CPU type         cores           memory   GPU                   HDD
8x Dell C6220        Xeon E5-2650v2   2x8 @ 2.6GHz    256 GB   -                     1TB SATA
44x Dell C6100       Xeon X5670       2x6 @ 2.93GHz   96 GB    -                     250GB SATA
13x Dell R900        Xeon E7450       4x6 @ 2.4GHz    64 GB    -                     146GB SAS
2x Carri 5600XLR8    Xeon X5650       2x6 @ 2.66GHz   72 GB    7 GPU (C2050/C2070)   160GB SSD
2x Dell C6100        Xeon X5670       1x6 @ 2.66GHz   24 GB    2 GPU (M2050)         250GB SATA
Appendix : Nef nodes (2/2)
Nodes to be added 04/2016 (currently in Legacy Nef) :

nodes              CPU type          cores           memory       GPU   HDD
16x Dell C6220     Xeon E5-2680 v2   2x10 @ 2.6GHz   192 GB       -     2TB SATA
6x Dell C6145      Opteron 6376      4x16 @ 2.3GHz   256 GB       -     500GB SATA
6x Dell R815       Opteron 6174      4x12 @ 2.2GHz   256/512 GB   -     600GB SAS
19x Dell PE1950    Xeon X5670        2x4             16 GB        -     73GB SAS
Thank you
wiki.inria.fr/ClustersSophia
Inria Sophia Antipolis Méditerranée
29/03/2016