worker node

Page 1: WORKER NODE

www.eu-eela.org

E-science grid facility for Europe and Latin America

WORKER NODE

GIUSEPPE PLATANIA

INFN Catania

30 June - 4 July, 2008

Page 2: WORKER NODE

www.eu-eela.eu Catania (Italy), Joint EELA/EGEE-III Tutorial for Trainers, 30.06.2008 – 04.07.2008

OUTLINE

• OVERVIEW

• INSTALLATION & CONFIGURATION

• TESTING

• FIREWALL SETUP

• TROUBLESHOOTING

Page 3: WORKER NODE

OVERVIEW

• The Worker Node (WN) is the service where the jobs run.

• Its main functions are:

– execute the jobs
– report the status of the jobs to the Computing Element (CE)

• It can run several kinds of batch system client:

– Torque
– LSF
– SGE
– Condor

Page 4: WORKER NODE

TORQUE client

• The Torque client consists of:

– pbs_mom, which places the job into execution. It is also responsible for returning the job’s output to the user.

Page 5: WORKER NODE

Worker Node installation & configuration using YAIM

Page 6: WORKER NODE

WHAT KIND OF WN?

There are several metapackages to choose from:

ig_WN – “Generic” Worker Node.

ig_WN_noafs – Like ig_WN but without AFS.

ig_WN_LSF – LSF Worker Node. IMPORTANT: provided for consistency; it does not install the LSF software, but it applies some fixes via ig_configure_node.

ig_WN_LSF_noafs – Like ig_WN_LSF but without AFS.

ig_WN_torque – Torque Worker Node.

ig_WN_torque_noafs – Like ig_WN_torque but without AFS.

Page 7: WORKER NODE

Repository settings

• REPOS="ca dag ig jpackage gilda glite-wn_torque"

Download and store the repo files (the loop appends the .repo suffix to each name):

• for name in $REPOS; do wget \
http://grid018.ct.infn.it/mrepo/repos/$name.repo -O \
/etc/yum.repos.d/$name.repo; done
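
The download loop can also be written so that it stops at the first failed download. The following is a minimal sketch, not part of the original slides: the helper names repo_url and fetch_repos and the DEST_DIR variable are assumptions introduced for illustration, and the repository names are listed without the .repo suffix because the loop appends it.

```shell
#!/bin/sh
# Repository names as on the slide, without the .repo suffix.
REPOS="ca dag ig jpackage gilda glite-wn_torque"
BASE_URL="http://grid018.ct.infn.it/mrepo/repos"
# DEST_DIR is an assumed override hook; the default is the path used on the slide.
DEST_DIR="${DEST_DIR:-/etc/yum.repos.d}"

# Print the download URL for one repository name.
repo_url() {
    printf '%s/%s.repo\n' "$BASE_URL" "$1"
}

# Download every repo file; stop at the first failure.
fetch_repos() {
    for name in $REPOS; do
        wget "$(repo_url "$name")" -O "$DEST_DIR/$name.repo" || {
            echo "failed to fetch $name.repo" >&2
            return 1
        }
    done
}

# Dry run: list what would be fetched without touching the network.
for name in $REPOS; do
    repo_url "$name"
done
```

Running the script only prints the URLs; call fetch_repos as root to perform the actual download.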

Page 8: WORKER NODE

INSTALLATION

• yum install jdk java-1.5.0-sun-compat

• yum install lcg-CA

• yum install ig_WN_torque_noafs

If you want AFS installed, use instead:

• yum install openafs openafs-client kernel-module-openafs-`uname -r`

• yum install ig_WN_torque

Gilda rpms:

• yum install gilda_utils gilda_applications
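
The installation order for the two variants can be summarised in a small script. This is only a sketch: install_plan is a hypothetical helper that prints the yum commands from this slide in order and does not run them.

```shell
#!/bin/sh
# Print the yum commands for a Torque WN, with or without AFS.
# Argument: "afs" to include the OpenAFS packages, anything else for the noafs variant.
install_plan() {
    echo "yum install jdk java-1.5.0-sun-compat"
    echo "yum install lcg-CA"
    if [ "$1" = "afs" ]; then
        # The kernel module package name depends on the running kernel.
        echo "yum install openafs openafs-client kernel-module-openafs-$(uname -r)"
        echo "yum install ig_WN_torque"
    else
        echo "yum install ig_WN_torque_noafs"
    fi
    echo "yum install gilda_utils gilda_applications"
}

install_plan noafs
```

Piping the output to sh as root (for example install_plan noafs | sh) would execute the plan; yum will then ask for confirmation at each step unless -y is added.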

Page 9: WORKER NODE

Customize ig-site-info.def

• Copy the users and groups example files to /opt/glite/yaim/etc/gilda/

cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/

• Append the gilda users and groups definitions to ig-users.conf and ig-groups.conf in /opt/glite/yaim/etc/gilda/

cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf
cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf
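
The copy and append steps can be wrapped in one function. The sketch below assumes the same paths as the slide; YAIM_DIR and prepare_gilda_conf are names introduced here for illustration. Because cp overwrites the target before the append, running the function twice does not duplicate the gilda entries, which would happen if only the cat commands were repeated.

```shell
#!/bin/sh
# YAIM_DIR is an assumed override hook; the default is the path used on the slide.
YAIM_DIR="${YAIM_DIR:-/opt/glite/yaim}"

# Copy the example users/groups files and append the gilda definitions to them.
prepare_gilda_conf() {
    conf_dir="$YAIM_DIR/etc/gilda"
    for kind in users groups; do
        # Start from a fresh copy of the example file.
        cp "$YAIM_DIR/examples/ig-$kind.conf" "$conf_dir/" || return 1
        # Append the gilda-specific definitions.
        cat "$conf_dir/gilda_ig-$kind.conf" >> "$conf_dir/ig-$kind.conf" || return 1
    done
}
```

Call prepare_gilda_conf as root once the gilda packages are installed.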

Page 10: WORKER NODE

Customize ig-site-info.def

• Copy the ig-site-info.def template file provided by ig_yaim into the gilda directory and customize it:

cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/<your_site-info.def>

• Open /opt/glite/yaim/etc/gilda/<your_site-info.def> in a text editor and set the following values according to your grid environment:

CE_HOST=<write the hostname of your CE>

TORQUE_SERVER=$CE_HOST

Page 11: WORKER NODE

Customize ig-site-info.def

WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf

The file specified in WN_LIST must contain the hostnames of all your WNs.

WARNING: it is important to set this file up before running the configure command.
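
A minimal sketch of how the WN list file could be generated, assuming the file holds one hostname per line. The helper write_wn_list and the hostnames in the usage note are placeholders, not values from the tutorial.

```shell
#!/bin/sh
# Write the given hostnames, one per line, to the WN list file.
# Usage: write_wn_list <list-file> <host> [<host> ...]
write_wn_list() {
    list_file=$1
    shift
    # Truncate (or create) the file first so stale entries are dropped.
    : > "$list_file" || return 1
    for host in "$@"; do
        printf '%s\n' "$host" >> "$list_file"
    done
}
```

For example, write_wn_list /opt/glite/yaim/etc/gilda/wn-list.conf wn1.example.org wn2.example.org would list two nodes.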

Page 12: WORKER NODE

Customize ig-site-info.def

GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
JAVA_LOCATION="/usr/java/j2sdk1.4.2_12"

JOB_MANAGER=lcgpbs
BATCH_BIN_DIR=/usr/bin
BATCH_VERSION=torque-2.1.9-4
VOS="gilda"
ALL_VOMS="gilda"

Page 13: WORKER NODE

Customize ig-site-info.def

QUEUES="short long infinite"

SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS

To configure a queue for a single VO:

QUEUES="short long infinite gilda"

SHORT_GROUP_ENABLE=$VOS
LONG_GROUP_ENABLE=$VOS
INFINITE_GROUP_ENABLE=$VOS
GILDA_GROUP_ENABLE="gilda"
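
Each queue in QUEUES needs a matching <QUEUE>_GROUP_ENABLE variable. The sketch below prints the expected variable names so they can be compared with the site-info file. It assumes the name is the queue name in upper case with '.' and '-' mapped to '_'; group_enable_var is a hypothetical helper.

```shell
#!/bin/sh
QUEUES="short long infinite gilda"

# Derive the <QUEUE>_GROUP_ENABLE variable name for one queue.
# Assumption: upper-case the queue name and map '.' and '-' to '_'.
group_enable_var() {
    printf '%s_GROUP_ENABLE\n' "$(printf '%s' "$1" | tr 'a-z.-' 'A-Z__')"
}

# Print the variable name expected for every configured queue.
for queue in $QUEUES; do
    group_enable_var "$queue"
done
```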

Page 14: WORKER NODE

WN Torque CONFIGURATION

• Now we can configure the node:

/opt/glite/yaim/bin/ig_yaim -c -s /opt/glite/yaim/etc/gilda/<your_site-info.def> -n ig_WN_torque_noafs
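
Since the WN list must exist before the configure command runs, a small pre-flight check can catch the most common omission. This is a sketch under the assumption that WN_LIST is set with a plain WN_LIST=<path> line in the site-info file; preflight is a hypothetical helper, not part of YAIM.

```shell
#!/bin/sh
# Succeed only if the site-info file is readable and the WN_LIST file it names is non-empty.
preflight() {
    site_info=$1
    [ -r "$site_info" ] || { echo "cannot read $site_info" >&2; return 1; }
    # Take the last WN_LIST= assignment and strip optional quotes.
    wn_list=$(sed -n 's/^WN_LIST=//p' "$site_info" | tail -n 1 | tr -d "\"'")
    [ -n "$wn_list" ] || { echo "WN_LIST is not set in $site_info" >&2; return 1; }
    [ -s "$wn_list" ] || { echo "$wn_list is missing or empty" >&2; return 1; }
    echo "ok: $wn_list lists $(grep -c . "$wn_list") host(s)"
}
```

A typical use would be to run preflight on the site-info file and start ig_yaim only if it succeeds.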

Page 15: WORKER NODE

Worker Node testing

Page 16: WORKER NODE

Testing

• Verify that pbs_mom is active and that the node state is free:

[root@wn root]# /etc/init.d/pbs_mom status
pbs_mom (pid 3692) is running...

[root@wn root]# pbsnodes -a
wn.localdomain
     state = free
     np = 2
     properties = lcgpro
     ntype = cluster
     status = arch=linux,uname=Linux wn.localdomain 2.4.21-37.EL.cern 1 Tue Oct 4 16:45:05 CEST 2005 i686,sessions=5892 5910 563 1703 2649,3584,nsessions=6,nusers=1,idletime=1569,totmem=254024kb,availmem=69852kb,physmem=254024kb,ncpus=1,loadave=0.30,rectime=1159016111
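
With many WNs it can help to filter the pbsnodes output instead of reading it by eye. The sketch below assumes the usual layout, with the node name at the start of a line and the attributes indented under it; nodes_not_free is a hypothetical helper.

```shell
#!/bin/sh
# Read `pbsnodes -a` output on stdin and print "<node> <state>" for every node
# whose state is anything other than "free".
nodes_not_free() {
    awk '
        # An unindented, non-empty line starts a new node record.
        /^[^ \t]/ { node = $1 }
        # Indented "state = X" line: report the node unless X is free.
        /^[ \t]+state = / && $3 != "free" { print node, $3 }
    '
}
```

Used as pbsnodes -a | nodes_not_free, it prints nothing when every node is free. Note that a node that is busy running jobs is also reported, since its state is not free either.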

Page 17: WORKER NODE

Testing

• First of all, check that a generic user on the WN can ssh to the CE without typing a password:

[root@wn root]# su - gilda001
[gilda001@wn gilda001]$ ssh ce
[gilda001@ce gilda001]$

• The same test has to pass between the WNs in order to run MPI jobs:

[gilda001@wn gilda001]$ ssh wn1
[gilda001@wn1 gilda001]$
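
The ssh checks can be automated for a list of hosts. The sketch below relies on the OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password; check_passwordless and the SSH_CMD override are names introduced here for illustration.

```shell
#!/bin/sh
# SSH_CMD can be overridden, for example to try the function without a network.
SSH_CMD="${SSH_CMD:-ssh}"

# Try a password-less login to each host; return non-zero if any login fails.
check_passwordless() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing trust setup fails fast.
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "ok $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return $failed
}
```

Run it as a pool user on the WN, for example check_passwordless ce wn1, after su - gilda001.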

Page 18: WORKER NODE

FIREWALL setup

Page 19: WORKER NODE

/etc/sysconfig/iptables

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -s <ip_you_want> --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p all -s <your CE ip address> -j ACCEPT
-A RH-Firewall-1-INPUT -p all -s <your WN ip address> -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
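
When the site has several WNs, the per-host ACCEPT lines can be generated rather than typed. This is a sketch only: accept_rules is a hypothetical helper, and the addresses below come from the 192.0.2.0/24 documentation range, so replace them with the real CE and WN addresses.

```shell
#!/bin/sh
# Print one "accept everything from this address" rule per argument,
# in the same form as the CE/WN lines of /etc/sysconfig/iptables.
accept_rules() {
    for addr in "$@"; do
        printf '%s\n' "-A RH-Firewall-1-INPUT -p all -s $addr -j ACCEPT"
    done
}

# Example: one CE and two WNs (placeholder addresses).
accept_rules 192.0.2.10 192.0.2.21 192.0.2.22
```

The printed lines belong before the two final REJECT rules, since iptables evaluates the chain in order.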

Page 20: WORKER NODE

IPTABLES STARTUP

/sbin/chkconfig iptables on

/etc/init.d/iptables start

Page 21: WORKER NODE

Troubleshooting

Page 22: WORKER NODE

Troubleshooting

[root@wn root]# su - gilda001
[gilda001@wn gilda001]$ ssh ce
gilda001@ce’s password:

Probably this WN's hostname is not in /etc/ssh/shosts.equiv, or its ssh keys were not created and stored in /etc/ssh/ssh_known_hosts on the CE.

Solution (to run on the CE):

• Ensure that the WN is in the pbs node list:

[root@ce root]# pbsnodes -a

• And then:

[root@ce root]# /opt/edg/sbin/edg-pbs-shostsequiv
[root@ce root]# /opt/edg/sbin/edg-pbs-known-hosts
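
Before or after running the two edg scripts, the two files on the CE can be checked for a given WN. The sketch below is not from the slides: check_wn_trust is a hypothetical helper, and it assumes that shosts.equiv holds one hostname per line and that ssh_known_hosts uses unhashed, comma-separated host names in the first field.

```shell
#!/bin/sh
# Report which of the two trust files lack an entry for the given WN hostname.
# Usage: check_wn_trust <wn-hostname> [<shosts.equiv> [<ssh_known_hosts>]]
check_wn_trust() {
    wn=$1
    equiv=${2:-/etc/ssh/shosts.equiv}
    known=${3:-/etc/ssh/ssh_known_hosts}
    # shosts.equiv: look for the hostname as a complete line.
    grep -qxF -- "$wn" "$equiv" 2>/dev/null || echo "$wn missing from $equiv"
    # ssh_known_hosts: compare against each name in the comma-separated first field.
    awk -v h="$wn" '
        { n = split($1, names, ","); for (i = 1; i <= n; i++) if (names[i] == h) found = 1 }
        END { exit !found }
    ' "$known" 2>/dev/null || echo "$wn missing from $known"
}
```

An empty output from check_wn_trust wn.localdomain, run as root on the CE, means both entries were found.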

Page 23: WORKER NODE

Troubleshooting

[root@wn root]# pbsnodes -a
wn.localdomain
     state = down
     np = 2
     properties = lcgpro
     ntype = cluster

Solution:

[root@wn root]# /etc/init.d/pbs_mom restart
